Hardware accelerator for executing a computation task

Patent number
US11175957B1
Publication date
2021-11-16
Applicant
International Business Machines Corporation (US NY Armonk)
Inventors
Dionysios Diamantopoulos; Florian Michael Scheidegger; Adelmo Cristiano Innocenza Malossi; Christoph Hagleitner; Konstantinos Bekas
IPC classification
G06F9/30; G06F9/50; G06F9/38
Technical field
bit, may, unit, input, be, units, data, operands, tensor, hardware
Region: Armonk, NY

Abstract

The present disclosure relates to a hardware accelerator for executing a computation task composed of a set of operations. The hardware accelerator comprises a controller and a set of computation units. Each computation unit of the set of computation units is configured to receive input data of an operation of the set of operations and to perform the operation, wherein the input data is represented with a distinct bit length associated with each computation unit. The controller is configured to receive the input data represented with a certain bit length of the bit lengths and to select one of the set of computation units that can deliver a valid result and that is associated with a bit length smaller than or equal to the certain bit length.

Description

The present invention may speed up the computations performed by hardware accelerators. To that end, the present invention speculates on precision in order to deliver results faster whenever the precision of the partial product allows it, with no compromise in accuracy. Speculation of precision means that a computation unit reads only a number of least significant bits (LSB) of the input data, thereby speculating that the ignored most significant bits (MSB) are 0. The present invention may thus provide parallel processing units with different precision capabilities. The selected computation unit may operate on a lower number of bits, which means a smaller memory footprint, more efficient arithmetic units, lower latency, and higher memory bandwidth. The benefit of using a reduced precision format may lie in the efficiency of the multiply and accumulate operations, e.g., in deep learning inference or training. The hardware accelerator may thus enable a competitive inference system built on a fast and efficient matrix multiplier.
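The speculation of precision described above can be illustrated with a minimal software sketch. It is not the patented circuit: it assumes unsigned integer operands and a hypothetical set of unit bit lengths (2, 4 and 8), and the function names are introduced here for illustration only. A unit's speculation holds exactly when the bits it ignores are all 0, and the controller is modeled as picking the narrowest unit for which that holds.

```python
# Hypothetical bit lengths of the available computation units (an assumption).
UNIT_BIT_LENGTHS = (2, 4, 8)


def low_bits(value, bits):
    """Return the `bits` least significant bits of an unsigned integer."""
    return value & ((1 << bits) - 1)


def speculation_is_valid(value, bits):
    """True if every bit above the `bits` LSBs of `value` is 0."""
    return value >> bits == 0


def select_unit(operands, input_bits, unit_bit_lengths=UNIT_BIT_LENGTHS):
    """Pick the narrowest unit whose speculation holds for all operands.

    Only units whose bit length is smaller than or equal to the bit
    length of the received input data are considered.
    """
    for bits in sorted(unit_bit_lengths):
        if bits > input_bits:
            break
        if all(speculation_is_valid(x, bits) for x in operands):
            return bits
    raise ValueError("no computation unit can deliver a valid result")
```

For example, with 8-bit inputs, `select_unit((3, 9), 8)` skips the 2-bit unit because 9 needs four bits, and settles on the 4-bit unit.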

According to one embodiment, the input data is received simultaneously at the set of computation units and the controller. This embodiment causes the computation units to (speculatively) start the execution at the same time. In parallel, the controller can decide, or select, which of the computation units can deliver the results faster when the precision of the partial product allows it with no compromise in accuracy.
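A rough software analogy of this embodiment follows, under the same assumptions of unsigned operands and hypothetical unit bit lengths of 2, 4 and 8. Threads stand in for the hardware units, every unit starts multiplying its truncated operands at once, and a controller function decides in parallel which result is valid. The names `unit_multiply`, `controller_select` and `run` are illustrative and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical bit lengths of the computation units (an assumption).
UNIT_BIT_LENGTHS = (2, 4, 8)


def unit_multiply(bits, a, b):
    """Model one unit: multiply using only the `bits` LSBs of each operand."""
    mask = (1 << bits) - 1
    return (a & mask) * (b & mask)


def controller_select(a, b, unit_bit_lengths):
    """Model the controller: narrowest unit whose ignored MSBs are all 0."""
    for bits in sorted(unit_bit_lengths):
        if a >> bits == 0 and b >> bits == 0:
            return bits
    raise ValueError("no computation unit can deliver a valid result")


def run(a, b, unit_bit_lengths=UNIT_BIT_LENGTHS):
    """Hand the same operands to all units and the controller at once."""
    with ThreadPoolExecutor() as pool:
        # Every unit starts speculatively, before the controller has decided.
        results = {
            bits: pool.submit(unit_multiply, bits, a, b)
            for bits in unit_bit_lengths
        }
        chosen = pool.submit(controller_select, a, b, unit_bit_lengths).result()
        return chosen, results[chosen].result()
```

Calling `run(9, 5)` returns the 4-bit unit together with the product 45, while the 2-bit unit's speculative result (1) is discarded because its ignored bits were not 0. In hardware the gain would come from the narrower unit finishing earlier; this sketch models only the selection logic, not the latency.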

Claims

1