What is claimed is:

1. An integrated circuit device, comprising:
a first execution engine operable to execute a first set of instructions;
a second execution engine operable to execute a second set of instructions;
a specified number of checkpoint registers; and
synchronization logic,
wherein the first set of instructions includes a first instruction that causes the first execution engine to set a value in the checkpoint register,
wherein the second set of instructions includes a second instruction that causes the second execution engine to wait for a condition corresponding to the checkpoint value set in the checkpoint register by the first instruction executed by the first execution engine,
wherein the value corresponds to a data dependency between the first execution engine and the second execution engine or a resource dependency between the first execution engine and the second execution engine, and wherein the first execution engine sets the value in the checkpoint register upon completion of an operation upon which the second execution engine depends,
wherein the synchronization logic is operable to broadcast the value set in the checkpoint register by the first execution engine to the second execution engine, and
wherein when the condition corresponding to the checkpoint value in the checkpoint register is met, the second execution engine resumes execution of the second set of instructions, and
wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine.

2. The integrated circuit device of claim 1, wherein the first execution engine is configured to set the checkpoint value in the checkpoint register by setting a specific value to the checkpoint register, incrementing a current value of the checkpoint register by a specified value, or decrementing the current value of the checkpoint register by a specified value.

3. The integrated circuit device of claim 1, wherein the first execution engine or the second execution engine includes an array of processing engines, a computation engine executing a pooling operation, a computation engine executing an activation function, or a Direct Memory Access (DMA) engine.

4. An integrated circuit device, comprising:
a first execution engine operable to execute a first set of instructions;
a second execution engine operable to execute a second set of instructions, wherein execution of the second set of instructions depends on completion of a given operation by the first execution engine; and
a specified number of checkpoint registers,
wherein the first set of instructions includes a first instruction that causes the first execution engine to set a value in a checkpoint register of the specified number of checkpoint registers upon completion of the given operation,
wherein the second set of instructions includes a second instruction that causes the second execution engine to wait for a condition corresponding to the checkpoint value in the checkpoint register,
wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine, and
wherein the integrated circuit device further comprises synchronization logic operable to broadcast the value set in the checkpoint register by the first execution engine to the second execution engine.

5. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of addresses corresponding to the checkpoint register, and
wherein upon a value being written to one of the plurality of addresses corresponding to the checkpoint register, the checkpoint register is operable to set a specific value to the checkpoint register, increment a current value of the checkpoint register by a specified value, and decrement the current value of the checkpoint register by a specified value.

6. The integrated circuit device of claim 4, wherein the first execution engine and the second execution engine are operable to set or wait for a condition corresponding to a checkpoint value in any of the specified number of checkpoint registers.

7. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of checkpoint registers,
a first checkpoint register of the plurality of checkpoint registers corresponds to the first execution engine,
a second checkpoint register of the plurality of checkpoint registers corresponds to the second execution engine, and
each of the plurality of checkpoint registers is accessible to both the first execution engine and the second execution engine.

8. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of checkpoint registers,
a first checkpoint register of the plurality of checkpoint registers corresponds to the first execution engine, and
a second checkpoint register of the plurality of checkpoint registers corresponds to the second execution engine.

9. The integrated circuit device of claim 8, wherein the first execution engine is operable to remotely set a first value in the second checkpoint register but not to wait for the first value remotely set in the second checkpoint register, and
wherein the second execution engine is operable to remotely set a second value in the first checkpoint register but not to wait for the second value remotely set in the first checkpoint register.

10. The integrated circuit device of claim 9, further comprising synchronization logic,
wherein the synchronization logic is operable to coordinate remote setting of the first value in the second checkpoint register corresponding to the second execution engine by the first execution engine, and coordinate remote setting of the second value in the first checkpoint register corresponding to the first execution engine by the second execution engine.

11. The integrated circuit device of claim 8, further comprising a third execution engine and a third checkpoint register corresponding to the third execution engine, the third execution engine operable to execute a third set of instructions comprising a third instruction that causes the third execution engine to wait for a first value to be remotely set in the third checkpoint register by the first execution engine and a second value to be remotely set in the third checkpoint register by the second execution engine,
wherein the third execution engine depends upon completion of a first operation performed by the first execution engine and completion of a second operation performed by the second execution engine,
wherein the first execution engine remotely sets the first value in the third checkpoint register upon completion of the first operation upon which the third execution engine depends, and
wherein the second execution engine remotely sets the second value in the third checkpoint register upon completion of the second operation upon which the third execution engine depends.

12. The integrated circuit device of claim 4, wherein the first execution engine or the second execution engine includes an array of processing engines, a computation engine executing a pooling operation, a computation engine executing an activation function, or a Direct Memory Access (DMA) engine.

13. The integrated circuit device of claim 4, wherein the first execution engine and the second execution engine are operable to set monotonically increasing values in the checkpoint register.

14. The integrated circuit device of claim 4, wherein the integrated circuit device is a neural network processor.

15. A computer-implemented method, comprising:
generating a first set of instructions to be executed by a first execution engine;
generating a second set of instructions to be executed by a second execution engine, wherein execution of the second set of instructions depends on completion of an operation by the first execution engine, wherein a first instruction in the second set of instructions causes the second execution engine to wait for the first execution engine to complete the operation;
executing, by the first execution engine, the first set of instructions to complete the operation upon which execution of the second set of instructions by the second execution engine depends, wherein a last instruction in the first set of instructions is an instruction to set a value in a checkpoint register of a specified number of checkpoint registers; and
sending, by the checkpoint register, the value set in the checkpoint register by the first execution engine to the second execution engine, wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine, and
broadcasting, by synchronization logic, the value set in the checkpoint register by the first execution engine to the second execution engine.

16. The computer-implemented method of claim 15, wherein the first execution engine is operable to set the checkpoint value in the checkpoint register by setting a specific value to the checkpoint register, incrementing a current value of the checkpoint register by a specified value, or decrementing the current value of the checkpoint register by a specified value.

17. The computer-implemented method of claim 15, wherein the first execution engine is operable to set monotonically increasing values in the checkpoint register for dependencies by the second execution engine on completion of subsequent operations by the first execution engine.

18. The computer-implemented method of claim 15, wherein the second execution engine is operable to wait for a value equal to a value set in the checkpoint register, a value greater than a value set in the checkpoint register, a value greater than or equal to a value set in the checkpoint register, a value less than a value set in the checkpoint register, or a value less than or equal to a value set in the checkpoint register.
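
The set-and-wait behavior recited in the claims above can be pictured, for illustration only, with a small software model. The sketch below is not the claimed hardware and is not taken from the specification. It assumes Python threads stand in for the two execution engines and a condition variable stands in for the synchronization logic that broadcasts an updated value. The names `CheckpointRegister`, `CONDITIONS`, `wait_for`, and `run_demo` are hypothetical, and the direction of each comparison (register value compared against a target) is likewise an assumption.

```python
import operator
import threading

# Hypothetical names for the five wait conditions listed in claim 18.
# Assumption: the register's current value is compared against a target.
CONDITIONS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


class CheckpointRegister:
    """Software stand-in for one shared checkpoint register (illustrative only)."""

    def __init__(self, value=0):
        self._value = value
        # The condition variable plays the role of the broadcast to waiters.
        self._cond = threading.Condition()

    def set(self, value):
        """Set a specific value and wake every waiting engine."""
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def increment(self, amount=1):
        """Increment the current value and wake every waiting engine."""
        with self._cond:
            self._value += amount
            self._cond.notify_all()

    def decrement(self, amount=1):
        """Decrement the current value and wake every waiting engine."""
        with self._cond:
            self._value -= amount
            self._cond.notify_all()

    def wait_for(self, condition, target, timeout=None):
        """Block until the condition holds; return False if the timeout expires."""
        compare = CONDITIONS[condition]
        with self._cond:
            return self._cond.wait_for(
                lambda: compare(self._value, target), timeout
            )


def run_demo():
    """Two threads model a producer engine and a dependent consumer engine."""
    register = CheckpointRegister()
    log = []

    def first_engine():
        log.append("first: operation complete")
        register.increment()  # last instruction: signal completion

    def second_engine():
        # First instruction: wait until the register reaches at least 1.
        if register.wait_for("ge", 1, timeout=5):
            log.append("second: resumed")

    consumer = threading.Thread(target=second_engine)
    producer = threading.Thread(target=first_engine)
    consumer.start()
    producer.start()
    consumer.join()
    producer.join()
    return log, register
```

In this model, calling `run_demo()` starts the waiting thread first, yet the log still records the first engine's completion before the second engine resumes, because the second engine proceeds only after the register value satisfies its condition.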