
Synchronization of concurrent computation engines

Patent number: US11175919B1
Publication date: 2021-11-16
Applicant: Amazon Technologies, Inc. (US, WA, Seattle)
Inventors: Ilya Minkin; Ron Diamant; Drazen Borkovic; Jindrich Zejda; Dana Michelle Vantrease
IPC classification: G06F9/30; G06F9/35; G06F13/28; G06F9/38; G06F9/52; G06N3/06
Technical field (keywords): checkpoint, engine, execution, register, ckpt1, engines, in, wait, value, can
Region: Seattle, WA

ABSTRACT

Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
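The set-and-wait handshake in the abstract can be illustrated with a minimal software sketch. It assumes Python threads stand in for the two execution engines and a `threading.Condition` stands in for the checkpoint register together with the synchronization logic that broadcasts its value. `CheckpointRegister`, `set`, `wait_for`, `first_engine` and `second_engine` are hypothetical names, not an interface described in the patent, and the wait condition used here ("register value is at least the target") is only one of several possible conditions.

```python
import threading
from typing import List


class CheckpointRegister:
    """Hypothetical software stand-in for one shared checkpoint register."""

    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def set(self, value: int) -> None:
        # Producer side: write the value and wake every waiting engine
        # (a stand-in for the broadcast performed by synchronization logic).
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait_for(self, target: int) -> int:
        # Consumer side: block until the register holds at least `target`.
        with self._cond:
            self._cond.wait_for(lambda: self._value >= target)
            return self._value


def first_engine(reg: CheckpointRegister, buffer: List[int]) -> None:
    buffer.extend([1, 2, 3])  # the operation the second engine depends on
    reg.set(1)                # last instruction: signal completion


def second_engine(reg: CheckpointRegister, buffer: List[int], out: List[int]) -> None:
    reg.wait_for(1)           # stall until the dependency is satisfied
    out.append(sum(buffer))   # now safe to consume the first engine's result


reg = CheckpointRegister()
buf: List[int] = []
result: List[int] = []
consumer = threading.Thread(target=second_engine, args=(reg, buf, result))
producer = threading.Thread(target=first_engine, args=(reg, buf))
consumer.start()              # the consumer may start first; it simply waits
producer.start()
producer.join()
consumer.join()
print(result)  # [6]
```

Starting the consumer first shows the point of the mechanism: whichever thread runs first, the second engine cannot read the buffer until the first engine has finished its operation and set the register.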

DESCRIPTION

BACKGROUND

Integrated circuit devices, such as processors, accelerators, and others, can include multiple execution engines. For example, the integrated circuit device can include parallel execution engines that are capable of performing large, multi-stage computations, such as convolutions. As another example, the integrated circuit device can include execution engines for more specific operations, such as accumulating values or floating-point math.

The data on which the execution engines operate can be retrieved from a memory of the integrated circuit device. Results produced by the execution engines can further be written to the memory. The memory may be limited in size, due to considerations such as the available space on the chip for the memory.

BRIEF DESCRIPTION OF THE DRAWINGS

Various examples in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 is a diagram illustrating an example dataflow graph with data and/or resource dependencies;

FIG. 2 is a diagram illustrating the operations in the dataflow graph of FIG. 1 as these operations may be executed by a first execution engine and a second execution engine;

FIG. 3 is a diagram illustrating an example of setting a value in a checkpoint register;

FIG. 4 is a sequence diagram illustrating an example of using checkpoints to synchronize execution engines in a global checkpoint register implementation;
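FIG. 1 and FIG. 2 describe a dataflow graph whose operations are split between two execution engines. The method claim describes generating per-engine instruction sets in which the dependent engine waits and the producing engine's last instruction sets a checkpoint value. The Python sketch below shows one way a compiler might insert those instructions. It assumes one checkpoint register per producing engine with monotonically increasing values, and that `ops` is already a valid program order. The function `insert_sync`, the `EXEC`/`SET`/`WAIT` mnemonics, and the `ckpt[...]` notation are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List, Tuple


def insert_sync(
    ops: List[Tuple[str, str]], deps: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Build per-engine programs with SET/WAIT around cross-engine dependencies.

    ops:  (operation, engine) pairs in program order.
    deps: (producer_op, consumer_op) pairs; the consumer needs the producer.
    """
    engine_of = {op: eng for op, eng in ops}
    programs: Dict[str, List[str]] = {eng: [] for _, eng in ops}
    counter = {eng: 0 for eng in programs}  # last value each engine will set
    signal: Dict[str, int] = {}             # producer op -> value set after it

    # Pass 1: each op with a consumer on another engine gets the next
    # (monotonically increasing) value in its own engine's register.
    for op, eng in ops:
        if any(p == op and engine_of[c] != eng for p, c in deps):
            counter[eng] += 1
            signal[op] = counter[eng]

    # Pass 2: emit instructions, waiting only for the largest value needed
    # from each producing engine, since smaller values are implied.
    for op, eng in ops:
        need: Dict[str, int] = {}
        for p, c in deps:
            if c == op and engine_of[p] != eng:
                src = engine_of[p]
                need[src] = max(need.get(src, 0), signal[p])
        for src, val in sorted(need.items()):
            programs[eng].append(f"WAIT ckpt[{src}] >= {val}")
        programs[eng].append(f"EXEC {op}")
        if op in signal:
            programs[eng].append(f"SET ckpt[{eng}] = {signal[op]}")
    return programs


ops = [("load_a", "E1"), ("load_b", "E1"), ("conv", "E2"), ("pool", "E2")]
deps = [("load_a", "conv"), ("load_b", "conv"), ("conv", "pool")]
for engine, program in insert_sync(ops, deps).items():
    print(engine, program)
# E1 ['EXEC load_a', 'SET ckpt[E1] = 1', 'EXEC load_b', 'SET ckpt[E1] = 2']
# E2 ['WAIT ckpt[E1] >= 2', 'EXEC conv', 'EXEC pool']
```

Because each engine's register only ever increases, a consumer that depends on several operations from the same engine needs just one wait for the largest value. Here `conv` waits once for `ckpt[E1] >= 2` instead of separately for `load_a` and `load_b`, and the same-engine dependency of `pool` on `conv` needs no synchronization at all.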

CLAIMS

What is claimed is:

1. An integrated circuit device, comprising:
a first execution engine operable to execute a first set of instructions;
a second execution engine operable to execute a second set of instructions;
a specified number of checkpoint registers; and
synchronization logic,
wherein the first set of instructions includes a first instruction that causes the first execution engine to set a value in the checkpoint register,
wherein the second set of instructions includes a second instruction that causes the second execution engine to wait for a condition corresponding to the checkpoint value set in the checkpoint register by the first instruction executed by the first execution engine,
wherein the value corresponds to a data dependency between the first execution engine and the second execution engine or a resource dependency between the first execution engine and the second execution engine, and wherein the first execution engine sets the value in the checkpoint register upon completion of an operation upon which the second execution engine depends,
wherein the synchronization logic is operable to broadcast the value set in the checkpoint register by the first execution engine to the second execution engine, and
wherein when the condition corresponding to the checkpoint value in the checkpoint register is met, the second execution engine resumes execution of the second set of instructions, and
wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine.

2. The integrated circuit device of claim 1, wherein the first execution engine is configured to set the checkpoint value in the checkpoint register by setting a specific value to the checkpoint register, incrementing a current value of the checkpoint register by a specified value, or decrementing the current value of the checkpoint register by a specified value.

3. The integrated circuit device of claim 1, wherein the first execution engine or the second execution engine includes an array of processing engines, a computation engine executing a pooling operation, a computation engine executing an activation function, or a Direct Memory Access (DMA) engine.

4. An integrated circuit device, comprising:
a first execution engine operable to execute a first set of instructions;
a second execution engine operable to execute a second set of instructions, wherein execution of the second set of instructions depends on completion of a given operation by the first execution engine; and
a specified number of checkpoint registers,
wherein the first set of instructions includes a first instruction that causes the first execution engine to set a value in a checkpoint register of the specified number of checkpoint registers upon completion of the given operation,
wherein the second set of instructions includes a second instruction that causes the second execution engine to wait for a condition corresponding to the checkpoint value in the checkpoint register,
wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine, and
wherein the integrated circuit device further comprises synchronization logic operable to broadcast the value set in the checkpoint register by the first execution engine to the second execution engine.

5. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of addresses corresponding to the checkpoint register, and
wherein upon a value being written to one of the plurality of addresses corresponding to the checkpoint register, the checkpoint register is operable to set a specific value to the checkpoint register, increment a current value of the checkpoint register by a specified value, and decrement the current value of the checkpoint register by a specified value.

6. The integrated circuit device of claim 4, wherein the first execution engine and the second execution engine are operable to set or wait for a condition corresponding to a checkpoint value in any of the specified number of checkpoint registers.

7. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of checkpoint registers,
a first checkpoint register of the plurality of checkpoint registers corresponds to the first execution engine,
a second checkpoint register of the plurality of checkpoint registers corresponds to the second execution engine, and
each of the plurality of checkpoint registers is accessible to both the first execution engine and the second execution engine.

8. The integrated circuit device of claim 4, wherein the checkpoint register comprises a plurality of checkpoint registers,
a first checkpoint register of the plurality of checkpoint registers corresponds to the first execution engine, and
a second checkpoint register of the plurality of checkpoint registers corresponds to the second execution engine.

9. The integrated circuit device of claim 8, wherein the first execution engine is operable to remotely set a first value in the second checkpoint register but not to wait for the first value remotely set in the second checkpoint register, and
wherein the second execution engine is operable to remotely set a second value in the first checkpoint register but not to wait for the second value remotely set in the first checkpoint register.

10. The integrated circuit device of claim 9, further comprising synchronization logic,
wherein the synchronization logic is operable to coordinate remote setting of the first value in the second checkpoint register corresponding to the second execution engine by the first execution engine, and coordinate remote setting of the second value in the first checkpoint register corresponding to the first execution engine by the second execution engine.

11. The integrated circuit device of claim 8, further comprising a third execution engine and a third checkpoint register corresponding to the third execution engine, the third execution engine operable to execute a third set of instructions comprising a third instruction that causes the third execution engine to wait for a first value to be remotely set in the third checkpoint register by the first execution engine and a second value to be remotely set in the third checkpoint register by the second execution engine,
wherein the third execution engine depends upon completion of a first operation performed by the first execution engine and completion of a second operation performed by the second execution engine,
wherein the first execution engine remotely sets the first value in the third checkpoint register upon completion of the first operation upon which the third execution engine depends, and
wherein the second execution engine remotely sets the second value in the third checkpoint register upon completion of the second operation upon which the third execution engine depends.

12. The integrated circuit device of claim 4, wherein the first execution engine or the second execution engine includes an array of processing engines, a computation engine executing a pooling operation, a computation engine executing an activation function, or a Direct Memory Access (DMA) engine.

13. The integrated circuit device of claim 4, wherein the first execution engine and the second execution engine are operable to set monotonically increasing values in the checkpoint register.

14. The integrated circuit device of claim 4, wherein the integrated circuit device is a neural network processor.

15. A computer implemented method, comprising:
generating a first set of instructions to be executed by a first execution engine;
generating a second set of instructions to be executed by a second execution engine, wherein execution of the second set of instructions depends on completion of an operation by the first execution engine, wherein a first instruction in the second set of instructions causes the second execution engine to wait for the first execution engine to complete the operation;
executing, by the first execution engine, the first set of instructions to complete the operation upon which execution of the second set of instructions by the second execution engine depends, wherein a last instruction in the first set of instructions is an instruction to set a value in a checkpoint register of a specified number of checkpoint registers; and
sending, by the checkpoint register, the value set in the checkpoint register by the first execution engine to the second execution engine, wherein each of the specified number of checkpoint registers is accessible to both the first execution engine and the second execution engine, and
broadcasting, by synchronization logic, the value set in the checkpoint register by the first execution engine to the second execution engine.

16. The computer-implemented method of claim 15, wherein the first execution engine is operable to set the checkpoint value in the checkpoint register by setting a specific value to the checkpoint register, incrementing a current value of the checkpoint register by a specified value, or decrementing the current value of the checkpoint register by a specified value.

17. The computer-implemented method of claim 15, wherein the first execution engine is operable to set monotonically increasing values in the checkpoint register for dependencies by the second execution engine on completion of subsequent operations by the first execution engine.

18. The computer-implemented method of claim 15, wherein the second execution engine is operable to wait for a value equal to a value set in the checkpoint register, a value greater than a value set in the checkpoint register, a value greater than or equal to a value set in the checkpoint register, a value less than a value set in the checkpoint register, or a value less than or equal to a value set in the checkpoint register.
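The register operations and wait conditions recited in the claims can also be modeled in software. The sketch below is a hypothetical Python model, not the claimed hardware. It assumes that a write to one of three addresses of a register selects set, increment, or decrement (offsets 0, 1, and 2 are an arbitrary choice, loosely following claim 5), and that each wait condition compares the register's current value against a target named by the waiting engine. Claim 18's wording does not pin down which side of the comparison is which, so that orientation is also an assumption. The last part models claim 11's two-producer dependency as a counter that both producers increment, which simplifies the claimed remote setting of two values.

```python
import operator
import threading
from typing import Callable, Dict, List, Optional

# Wait conditions in the spirit of claim 18; each compares the current
# register value (left operand) with the waiting engine's target (right).
CONDITIONS: Dict[str, Callable[[int, int], bool]] = {
    "eq": operator.eq,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


class CheckpointRegister:
    """Hypothetical software model of one checkpoint register."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._cond = threading.Condition()

    def _update(self, fn: Callable[[int], int]) -> None:
        with self._cond:
            self._value = fn(self._value)
            self._cond.notify_all()  # "broadcast" the new value to all waiters

    def set(self, value: int) -> None:
        self._update(lambda _old: value)

    def increment(self, step: int = 1) -> None:
        self._update(lambda old: old + step)

    def decrement(self, step: int = 1) -> None:
        self._update(lambda old: old - step)

    def write(self, offset: int, data: int) -> None:
        # Assumed address map: the address written selects the operation.
        {0: self.set, 1: self.increment, 2: self.decrement}[offset](data)

    def wait(self, cond: str, target: int, timeout: Optional[float] = None) -> bool:
        # Block until CONDITIONS[cond](value, target) holds; False on timeout.
        check = CONDITIONS[cond]
        with self._cond:
            return self._cond.wait_for(lambda: check(self._value, target), timeout)

    @property
    def value(self) -> int:
        with self._cond:
            return self._value


# Address-mapped updates: set to 5, add 3, subtract 2.
reg = CheckpointRegister()
reg.write(0, 5)
reg.write(1, 3)
reg.write(2, 2)
print(reg.value)  # 6

# Join modeled as a counter: two producers each increment the consumer's
# register, and the consumer waits until the count reaches 2.
join_reg = CheckpointRegister()
done: List[str] = []


def producer(name: str) -> None:
    done.append(name)     # the operation the third engine depends on
    join_reg.increment()  # signal completion in the consumer's register


workers = [threading.Thread(target=producer, args=(n,)) for n in ("first", "second")]
for w in workers:
    w.start()
ready = join_reg.wait("ge", 2, timeout=5.0)
for w in workers:
    w.join()
print(ready, sorted(done))  # True ['first', 'second']
```

With monotonically increasing values (claims 13 and 17), a producer can set 1, 2, 3, and so on after successive operations, and a consumer that needs the k-th result waits with `wait("ge", k)`. Because the value never decreases, a consumer that starts waiting late is not confused by a register that has already moved past its target, which an `"eq"` wait could miss.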