
Hardware accelerator for executing a computation task

Patent number
US11175957B1
Publication date
2021-11-16
Applicant
International Business Machines Corporation (US NY Armonk)
Inventors
Dionysios Diamantopoulos; Florian Michael Scheidegger; Adelmo Cristiano Innocenza Malossi; Christoph Hagleitner; Konstantinos Bekas
IPC classification
G06F9/30; G06F9/50; G06F9/38
Technical field
bit, may, unit, input, be, units, data, tensor, operands, hardware
Region: Armonk, NY

Abstract

The present disclosure relates to a hardware accelerator for executing a computation task composed of a set of operations. The hardware accelerator comprises a controller and a set of computation units. Each computation unit of the set of computation units is configured to receive input data of an operation of the set of operations and to perform the operation, wherein the input data is represented with a distinct bit length associated with each computation unit. The controller is configured to receive the input data represented with a certain bit length of the bit lengths and to select one of the set of computation units that can deliver a valid result and that is associated with a bit length smaller than or equal to the certain bit length.

Description

BACKGROUND

The present invention relates to the field of digital computer systems, and more specifically to a hardware accelerator.

Hardware acceleration enables the use of computer hardware specially made to perform some functions more efficiently than is possible in software running on a general-purpose central processing unit (CPU). However, there is a need to improve the logic utilization of hardware accelerators.

SUMMARY

Various embodiments provide a hardware accelerator, a method for a hardware accelerator, and a computer program product for a hardware accelerator. Embodiments of the present invention can be freely combined with each other if they are not mutually exclusive.

In one aspect, the invention relates to a hardware accelerator for executing a computation task composed of a set of operations. The hardware accelerator comprises a controller and a set of computation units. Each computation unit of the set of computation units is configured to receive input data of an operation of the set of operations and to perform the operation, wherein the input data is represented by a distinct bit length associated with each computation unit, so that the set of computation units is associated with a set of distinct bit lengths. The controller is configured to receive the input data represented with a certain bit length of the set of bit lengths and to select the computation unit of the set of computation units that can deliver a valid result, wherein the selected computation unit is associated with a bit length smaller than or equal to the certain bit length. An output of the selected computation unit is provided as a result of the operation.
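The selection behaviour described in this aspect can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented implementation: the operation is assumed to be an unsigned integer multiplication, "can deliver a valid result" is assumed to mean that both operands fit in the unit's bit length, and the names `ComputationUnit`, `select_unit`, and `execute` are hypothetical. In hardware the units would start in parallel; the model only runs the selected one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ComputationUnit:
    """Hypothetical model of one computation unit with a fixed operand width."""
    bit_length: int

    def run(self, a: int, b: int) -> int:
        # The unit only sees the k least significant bits of each operand.
        mask = (1 << self.bit_length) - 1
        return (a & mask) * (b & mask)


def select_unit(units: List[ComputationUnit], a: int, b: int) -> ComputationUnit:
    """Controller sketch: pick the narrowest unit that still covers both operands.

    Assumption: a result is "valid" when no significant operand bit is dropped.
    """
    needed = max(a.bit_length(), b.bit_length(), 1)
    candidates = [u for u in units if u.bit_length >= needed]
    if not candidates:
        raise ValueError("operands exceed the widest computation unit")
    return min(candidates, key=lambda u: u.bit_length)


def execute(units: List[ComputationUnit], a: int, b: int) -> Tuple[int, int]:
    """Return (bit length of the selected unit, result of the operation)."""
    unit = select_unit(units, a, b)
    return unit.bit_length, unit.run(a, b)


if __name__ == "__main__":
    units = [ComputationUnit(k) for k in (2, 4, 8)]
    for a, b in [(3, 2), (9, 3), (200, 17)]:
        width, result = execute(units, a, b)
        print(f"{a} * {b} -> {result} (computed by the {width}-bit unit)")
```

With units of 2, 4, and 8 bits, small operands are served by the narrow units and only operands that need the full width fall through to the 8-bit unit.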

Claims

The invention claimed is:

1. A hardware accelerator for executing a computation task composed of a set of operations, the hardware accelerator comprising:
a controller and a set of computation units, wherein each computation unit of the set of computation units being configured to receive input data of an operation of the set of operations and to perform the operation, wherein the input data is represented by a distinct bit length associated with each computation unit, and wherein the set of computation units is associated with a set of bit lengths; and
the controller being configured to receive the input data represented with a certain bit length of the set of bit lengths and to select the computation unit of the set of computation units that can deliver a valid result and that is associated with a bit length smaller than or equal to the certain bit length, wherein an output of the selected computation unit is provided as a result of the operation.

2. The hardware accelerator of claim 1, the controller being configured to stop execution of the operation by non-selected computation unit(s).

3. The hardware accelerator of claim 1, the input data being received simultaneously at the set of computation units and the controller.

4. The hardware accelerator of claim 1, the selected computation unit being associated with a smallest bit length that is smaller than or equal to the certain bit length.

5. The hardware accelerator of claim 1, the certain bit length being a highest bit length of the set of bit lengths.

6. The hardware accelerator of claim 1, the set of computation units comprising a first computation unit associated with the certain bit length, wherein each computation unit of the set of computation units that is different from the first computation unit is a replication unit of the first computation unit.

7. The hardware accelerator of claim 1, the certain bit length of the set of bit lengths being n-bit, wherein each computation unit of the set of computation units that is associated with a bit length k-bit is configured to read the k least significant bits (LSB) of the received input data, wherein k-bit is smaller than n-bit.

8. The hardware accelerator of claim 1, being selected from the group consisting of a field-programmable gate array (FPGA), a graphics processing unit (GPU), and an application-specific integrated circuit (ASIC).

9. The hardware accelerator of claim 8, the set of computation units comprising a minimum number of computation units such that a logic utilization of the FPGA is higher than a predefined threshold.

10. The hardware accelerator of claim 1, a number of computation units of the set of computation units being the number of bits of the highest bit length of the set of bit lengths.

11. The hardware accelerator of claim 1, the computation task comprising one of: training a deep neural network, inference of a deep neural network, matrix-vector multiplication, and matrix-matrix multiplication.

12. The hardware accelerator of claim 1, the input data of the operation comprising two operands, the controller comprising logic gates to determine a maximum number of leading zeros that is present in the operands of the input data, wherein the maximum number of leading zeros is indicative of a bit length of the selected computation unit.

13. A method for executing a computation task composed of a set of operations, the method comprising:
providing a hardware accelerator comprising a controller and a set of computation units;
receiving, at each computation of the set of computation units, input data of an operation of the set of operations and starting the operation, wherein the input data is represented with a distinct bit length associated with each computation unit, and wherein the set of computation units is associated with a set of bit lengths;
receiving, at the controller, the input data represented with a certain bit length of the set of bit lengths;
selecting, by the controller, the computation unit of the set of computation units that can deliver a valid result, the selected computation unit being associated with a bit length smaller than or equal to the certain bit length; and
providing the output of the selected computation unit as a result of the operation.

14. The method of claim 13, wherein providing the hardware accelerator comprises:
providing the hardware accelerator comprising the controller and a first computation unit configured to receive the input data of the operation and to perform the operation, wherein the input data is represented with a bit length n-bit;
creating one or more replication units of the first computation unit, wherein each created replication unit of the replication units is configured to receive the input data of the operation and to perform the operation, wherein the input data is represented with a bit length k-bit, where k is smaller than n, k<n; and
the set of computation units comprising the first computation unit and the created replication units.

15. The method of claim 14, the hardware accelerator comprising a field-programmable gate array (FPGA), wherein creating each replication unit of the replication units comprises:
generating a bitstream file, and
programming the FPGA in accordance with the bitstream file so that a portion of the FPGA is configured as the replication unit.

16. The method of claim 15, being automatically performed.

17. The method of claim 13, further comprising repeating the receiving step, the selection step, and the providing step for each operation of the set of operations.

18. A computer program product for executing a computation task composed of a set of operations, the computer program product comprising:
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising:
program instructions to provide a hardware accelerator comprising a controller and a set of computation units;
program instructions to receive, at each computation of the set of computation units, input data of an operation of the set of operations and starting the operation, wherein the input data is represented with a distinct bit length associated with each computation unit, and wherein the set of computation units is associated with a set of bit lengths;
program instructions to receive, at the controller, the input data represented with a certain bit length of the set of bit lengths;
program instructions to select, by the controller, the computation unit of the set of computation units that can deliver a valid result, the selected computation unit being associated with a bit length smaller than or equal to the certain bit length; and
program instructions to provide the output of the selected computation unit as a result of the operation.
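
Claims 7 and 12 mention two bit-level steps: reading the k least significant bits of the input, and deriving the selected unit's bit length from the leading zeros of the two operands. The following is a minimal illustrative sketch, not the claimed circuit. It assumes unsigned n-bit operands and reads "leading zeros present in the operands" as the leading zeros shared by both operands; the function names are hypothetical.

```python
def common_leading_zeros(a: int, b: int, n: int) -> int:
    """Count the leading zeros shared by two unsigned n-bit operands."""
    limit = 1 << n
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must be unsigned n-bit values")
    # OR-ing the operands mimics a row of OR gates: a bit position stays 0
    # only if it is 0 in both operands.
    combined = a | b
    return n - combined.bit_length()


def required_bit_length(a: int, b: int, n: int) -> int:
    """Bit length indicated by the shared leading zeros (at least 1 bit)."""
    return max(n - common_leading_zeros(a, b, n), 1)


def low_bits(value: int, k: int) -> int:
    """Return the k least significant bits of value, as a k-bit unit would read them."""
    return value & ((1 << k) - 1)


if __name__ == "__main__":
    a, b, n = 0b00000101, 0b00000011, 8
    k = required_bit_length(a, b, n)
    print(common_leading_zeros(a, b, n), k, low_bits(a, k), low_bits(b, k))
```

For the 8-bit operands `00000101` and `00000011`, the five shared leading zeros indicate that a 3-bit unit suffices, and reading the 3 least significant bits preserves both operand values.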