The diagram of FIG. 5 shows the data paths associated with the components involved in the computation of the data operation. The data path 501 indicates the time at which the input operands are ready in the DMA loads 321 and 322. The data operation starts on the 8-bit GEMM and the 3-bit GEMM at the same time t0, as indicated by data paths 502 and 504 respectively. The data paths 503 and 505 indicate the times at which the results of the 8-bit GEMM and the 3-bit GEMM are ready, respectively. The data path 506 indicates that the selector 317 selects the 3-bit GEMM at time t1, before either the 8-bit GEMM result or the 3-bit GEMM result was ready. Thus, the controller 315 may be ready at the same time the selected 3-bit GEMM result is ready. This is indicated in data path 507. The result of the data operation may be provided at time t2 as the output of the controller 315, as shown in data path 508. As shown in FIG. 5, the present invention may enable a time gain equal to the difference between time t2 and the time t3 at which the 8-bit GEMM result was ready.
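The timing behavior described above can be illustrated with a minimal software simulation. The sketch below is purely hypothetical: the latencies, the quantization scheme, and the function names are assumptions used to model the idea of launching the full-precision and reduced-precision GEMM units at the same time t0 and having the selector forward the chosen (here, 3-bit) result without waiting for the slower unit to finish.

```python
import concurrent.futures
import time

def gemm(a, b, bits, latency_s):
    """Toy stand-in for a GEMM unit: quantizes the operands to `bits` bits
    and computes a dot product; sleep models the unit's (hypothetical) latency."""
    time.sleep(latency_s)
    scale = (1 << (bits - 1)) - 1
    qa = [round(x * scale) / scale for x in a]
    qb = [round(x * scale) / scale for x in b]
    return sum(x * y for x, y in zip(qa, qb))

def run_operation(a, b, select_low_precision=True):
    """Launch both GEMMs at the same time t0; the selector's choice determines
    which result the controller waits for, so selecting the faster 3-bit unit
    yields the output at t2 instead of the later t3 of the 8-bit unit."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        t0 = time.monotonic()
        f8 = pool.submit(gemm, a, b, 8, 0.05)  # slower, full precision
        f3 = pool.submit(gemm, a, b, 3, 0.01)  # faster, reduced precision
        chosen = f3 if select_low_precision else f8
        result = chosen.result()  # controller is ready when the selected result is
        elapsed = time.monotonic() - t0
    return result, elapsed
```

Running `run_operation` once with the 3-bit selection and once with the 8-bit selection shows the elapsed time difference corresponding to the gain between t3 and t2.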
FIG. 6 is a flowchart of a method for performing a computation task using a hardware accelerator in accordance with an embodiment of the present invention. The computation task may be an inference of a neural network. The hardware accelerator may be an FPGA tensor accelerator. The hardware accelerator may comprise a first computation unit that is configured to perform operations of the computation task at full precision, e.g., 8-bit precision. For the purpose of explanation, the method may be implemented in the hardware acceleration system 100 illustrated in FIGS. 1-2, but is not limited to this implementation.