According to one embodiment, the set of computation units comprises a first computation unit associated with the highest bit length of the set of bit lengths, wherein each computation unit of the set of computation units other than the first computation unit is a replication unit of the first computation unit. Computation unit replication refers to determining one or more copies of the first computation unit; each replication unit is such a copy.
The first computation unit may be associated with the bit length n-bit. Each computation unit of the remaining N−1 computation units may be associated with a distinct bit length (n−j)-bit, where j has a value varying from 1 to n−1.
According to one embodiment, the highest bit length of the set of bit lengths is n-bit, wherein each computation unit of the set of computation units that is associated with a bit length k-bit smaller than n-bit is configured to read the k least significant bits (LSB) of the received input data. This embodiment may enable the inputs of all of the computation units to be provided from the same loaded data. That is, instead of converting all parameter values from a high precision to a low precision, the same input data may be loaded once and used as input to each computation unit of the set. This may enable a speculation of precision, namely speculating that the ignored (unread) most significant bits (MSB) are 0.
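By way of illustration only, the reading of the k least significant bits and the associated speculation may be sketched in software as follows. The sketch assumes unsigned integer input data and n = 8. The names read_lsb, speculation_holds and N_BITS are hypothetical and do not appear in the embodiments, and a hardware implementation may differ.

```python
N_BITS = 8  # assumed highest bit length n (hypothetical value)


def read_lsb(value: int, k: int) -> int:
    """Return the k least significant bits of an unsigned integer."""
    if k <= 0:
        raise ValueError("k must be positive")
    return value & ((1 << k) - 1)


def speculation_holds(value: int, k: int) -> bool:
    """Return True when every ignored (unread) upper bit of value is 0."""
    return (value >> k) == 0


# The same loaded word is presented to every unit; each k-bit unit
# reads only its own k least significant bits.
loaded = 0b00001011
views = {k: read_lsb(loaded, k) for k in range(N_BITS, 0, -1)}

# Bit lengths for which the speculation is correct for this word.
exact = [k for k in views if speculation_holds(loaded, k)]
```

For the word shown, a unit of bit length 4 or more reads the full value, whereas a 3-bit unit reads a truncated value because a non-zero bit is left unread.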
According to one embodiment, the hardware accelerator comprises an FPGA, a GPU, an ASIC, a neuromorphic device, or a bit-addressable device.