FIG. 24 illustrates a data processing system 2400 according to embodiments described herein. The data processing system 2400 is a heterogeneous processing system having a processor 2402, unified memory 2410, and a GPGPU 2420. The processor 2402 and the GPGPU 2420 can be any of the processors and GPGPUs or parallel processors described herein. The unified memory 2410 represents a unified address space that may be accessed by both the processor 2402 and the GPGPU 2420. The unified memory 2410 includes system memory 2412 as well as GPGPU memory 2418. In some embodiments, the GPGPU memory 2418 includes GPGPU local memory 2428 within the GPGPU 2420 and can also include some or all of system memory 2412. For example, compiled code 2414B stored in system memory 2412 can also be mapped into GPGPU memory 2418 for access by the GPGPU 2420. In one embodiment, a runtime library 2416 in system memory 2412 can facilitate the compilation and/or execution of compiled code 2414B. The processor 2402 can execute instructions for a compiler 2415 stored in system memory 2412. The compiler 2415 can compile source code 2414A into compiled code 2414B for execution by the processor 2402 and/or the GPGPU 2420. In one embodiment, the compiler 2415 is, or can include, a shader compiler to compile shader programs specifically for execution by the GPGPU 2420.
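The idea of a single address space spanning both system memory and GPGPU local memory can be illustrated with a toy model. The sketch below is not the mechanism of the data processing system 2400; it assumes a single shared page table in which each virtual page is backed either by system memory or by GPGPU local memory, so that the processor and the GPGPU resolve the same virtual address to the same bytes. All names (`UnifiedMemory`, `Agent`, `PAGE_SIZE`, `map_page`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical page size used only by this toy model


@dataclass
class UnifiedMemory:
    """Toy model of one virtual address space shared by a CPU and a GPGPU.

    Each virtual page maps to a frame in one of two backing stores:
    "system" (host system memory) or "gpu_local" (memory inside the GPGPU).
    """
    stores: dict = field(default_factory=lambda: {"system": {}, "gpu_local": {}})
    page_table: dict = field(default_factory=dict)  # vpage -> (store, frame)
    next_frame: dict = field(default_factory=lambda: {"system": 0, "gpu_local": 0})

    def map_page(self, vpage, store):
        # Allocate a fresh zeroed frame in the chosen backing store and map it.
        frame = self.next_frame[store]
        self.next_frame[store] += 1
        self.stores[store][frame] = bytearray(PAGE_SIZE)
        self.page_table[vpage] = (store, frame)

    def _locate(self, vaddr):
        # Translate a virtual address; a KeyError models an unmapped page.
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        store, frame = self.page_table[vpage]
        return self.stores[store][frame], offset

    def write(self, vaddr, data: bytes):
        page, offset = self._locate(vaddr)
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("access crosses a page boundary (not modeled)")
        page[offset:offset + len(data)] = data

    def read(self, vaddr, n):
        page, offset = self._locate(vaddr)
        if offset + n > PAGE_SIZE:
            raise ValueError("access crosses a page boundary (not modeled)")
        return bytes(page[offset:offset + n])


@dataclass
class Agent:
    """A processor (CPU or GPGPU) that accesses memory only via the shared space."""
    name: str
    memory: UnifiedMemory

    def store(self, vaddr, data):
        self.memory.write(vaddr, data)

    def load(self, vaddr, n):
        return self.memory.read(vaddr, n)


um = UnifiedMemory()
um.map_page(0, "system")     # e.g. a page holding compiled code in system memory
um.map_page(1, "gpu_local")  # a page backed by GPGPU local memory
cpu, gpu = Agent("processor", um), Agent("gpgpu", um)

cpu.store(0x10, b"kernel")                  # CPU writes into a system-memory page
assert gpu.load(0x10, 6) == b"kernel"       # GPGPU reads it at the same address
gpu.store(PAGE_SIZE + 8, b"result")         # GPGPU writes into its local memory
assert cpu.load(PAGE_SIZE + 8, 6) == b"result"
```

Because both agents go through the same page table, a pointer produced on one side is meaningful on the other, which is the property a unified address space provides. Real systems add per-agent access rights, caching, coherence, and page migration, none of which this sketch attempts to model.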