In various implementations, systems and methods are provided for generating instructions for an integrated circuit device. The integrated circuit device includes multiple execution engines, which may be able to operate independently but whose operations may have data and/or resource dependencies. In various examples, the techniques discussed herein can include receiving an input data set that describes the operations to be performed by the integrated circuit device. The input data set can be, for example, a dataflow graph. From the input data set, a memory operation to be performed by a first execution engine can be identified, as well as an operation that is to be performed by a second execution engine and that requires the memory operation to be completed first. To accommodate this dependency, the instructions for the first execution engine can include a checkpoint set instruction and the instructions for the second execution engine can include a checkpoint wait instruction. The checkpoint wait instruction can cause the second execution engine to wait for the first execution engine to reach the checkpoint set instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
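The instruction generation described above can be illustrated with a minimal sketch. The sketch is not the claimed implementation: the `Op` record, the engine names `dma` and `pe`, the instruction tuples, and the function `generate_instructions` are all hypothetical names chosen for illustration. It also assumes that the operations arrive in topological order and that the checkpoint set instruction is placed immediately after the memory operation it guards.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One node of a (hypothetical) dataflow graph."""
    name: str                   # operation identifier
    engine: str                 # execution engine that performs the operation
    is_memory_op: bool = False  # True for memory operations
    deps: list = field(default_factory=list)  # names of operations this one requires


def generate_instructions(ops):
    """Return {engine: [instruction, ...]} with checkpoint instructions added.

    Assumes `ops` is already in topological order.
    """
    by_name = {op.name: op for op in ops}

    # Pass 1: assign a checkpoint to every memory operation that an
    # operation on a different engine depends on.
    checkpoint_of = {}
    for op in ops:
        for dep_name in op.deps:
            dep = by_name[dep_name]
            if dep.is_memory_op and dep.engine != op.engine:
                checkpoint_of.setdefault(dep_name, len(checkpoint_of))

    # Pass 2: emit per-engine instruction lists.
    programs = {}
    for op in ops:
        program = programs.setdefault(op.engine, [])
        for dep_name in op.deps:
            dep = by_name[dep_name]
            if dep_name in checkpoint_of and dep.engine != op.engine:
                # The dependent engine waits for the checkpoint.
                program.append(("CHECKPOINT_WAIT", checkpoint_of[dep_name]))
        program.append(("EXEC", op.name))
        if op.name in checkpoint_of:
            # The engine performing the memory operation sets the checkpoint.
            program.append(("CHECKPOINT_SET", checkpoint_of[op.name]))
    return programs


# Hypothetical two-engine example.
example_ops = [
    Op("load_weights", "dma", is_memory_op=True),
    Op("matmul", "pe", deps=["load_weights"]),
]
example_programs = generate_instructions(example_ops)
```

In this example, the instruction list for the `dma` engine ends with a checkpoint set instruction after `load_weights`, and the instruction list for the `pe` engine begins with the matching checkpoint wait instruction before `matmul`. Dependencies between operations on the same engine produce no checkpoint instructions in this sketch, on the assumption that each engine performs its own instructions in order.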
In various examples, the integrated circuit device can implement checkpoints using hardware registers. In these examples, a checkpoint may be set by writing a value to the register or by incrementing or decrementing the value in the register. Hardware registers can have a small footprint on the chip die, and little circuitry is needed to write or check a register value. Thus, using the techniques discussed herein, synchronization of the execution engines in the integrated circuit device can be accomplished with minimal additional circuitry on the integrated circuit device.
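The register-based checkpoint can be modeled in software for illustration. The following is a minimal sketch in which two threads stand in for the two execution engines and a `CheckpointRegister` class stands in for the hardware register. The class, its method names, and the wait condition (the wait completes once the register value is at least an expected value) are assumptions made for this sketch and are not taken from the description above.

```python
import threading


class CheckpointRegister:
    """Software stand-in for a hardware checkpoint register (hypothetical)."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def write(self, value):
        """Set the checkpoint by writing a value to the register."""
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def increment(self):
        """Set the checkpoint by incrementing the register value."""
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def read(self):
        """Return the current register value."""
        with self._cond:
            return self._value

    def wait_for(self, expected):
        """Block until the register value is at least `expected` (assumed condition)."""
        with self._cond:
            self._cond.wait_for(lambda: self._value >= expected)


def run_demo():
    """Run two simulated engines and return the order in which they finished."""
    reg = CheckpointRegister()
    log = []

    def first_engine():
        log.append("memory operation done")
        reg.increment()  # checkpoint set

    def second_engine():
        reg.wait_for(1)  # checkpoint wait
        log.append("dependent operation done")

    # Start the waiting engine first to show that it blocks on the register.
    waiter = threading.Thread(target=second_engine)
    setter = threading.Thread(target=first_engine)
    waiter.start()
    setter.start()
    setter.join()
    waiter.join()
    return log
```

Because the second simulated engine proceeds only after the register has been incremented, the log returned by `run_demo()` always lists the memory operation before the dependent operation. A decrementing variant would follow the same pattern with the comparison reversed.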