In at least one embodiment, multiple instances of GPGPU 3530 can be configured to operate as a compute cluster. In at least one embodiment, compute clusters 3536A-3536H may implement any technically feasible communication techniques for synchronization and data exchange. In at least one embodiment, multiple instances of GPGPU 3530 communicate over host interface 3532. In at least one embodiment, GPGPU 3530 includes an I/O hub 3539 that couples GPGPU 3530 with a GPU link 3540 that enables a direct connection to other instances of GPGPU 3530. In at least one embodiment, GPU link 3540 is coupled to a dedicated GPU-to-GPU bridge that enables communication and synchronization between multiple instances of GPGPU 3530. In at least one embodiment GPU link 3540 couples with a high speed interconnect to transmit and receive data to other GPGPUs 3530 or parallel processors. In at least one embodiment, multiple instances of GPGPU 3530 are located in separate data processing systems and communicate via a network device that is accessible via host interface 3532. In at least one embodiment GPU link 3540 can be configured to enable a connection to a host processor in addition to or as an alternative to host interface 3532. In at least one embodiment, GPGPU 3530 can be configured to execute a CUDA program.