I have read Teacher nihui’s article:
https://aijishu.com/a/1060000000503889
I am also curious: can the NPU of the Radxa Orion O6 actually reach its nominal 28.8 TOPS?
Constructing a high computational density conv model
Following nihui’s suggestion, I constructed an ONNX model made up mainly of convolutions and used onnx_tool to count the number of MACs.
import onnx_tool
onnx_tool.model_profile('./model.onnx')
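For this model, onnx_tool reports:
Forward_MACs=36270243840
Counting each MAC as two operations (one multiply and one add):
OPs = MACs * 2 = 72540487680, i.e. 72540.48768 (MOPs)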
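For intuition about where a MAC count like this comes from, here is a minimal sketch of the standard MAC formula for a single 2D convolution, together with the MACs-to-MOPs conversion used above. The function names and the example layer shape are hypothetical and only for illustration; they are not taken from the actual model or from onnx_tool.

```python
def conv2d_macs(c_in, c_out, k_h, k_w, h_out, w_out, groups=1):
    """MACs for one 2D convolution, ignoring the bias add.

    Each output element needs (c_in / groups) * k_h * k_w multiply-accumulates,
    and there are h_out * w_out * c_out output elements.
    """
    return h_out * w_out * c_out * (c_in // groups) * k_h * k_w


def macs_to_mops(macs):
    """Convert a MAC count to MOPs, counting 1 MAC as 2 operations."""
    return macs * 2 / 1e6


# Hypothetical layer: 3x3 conv, 64 -> 64 channels, 56x56 output feature map.
layer_macs = conv2d_macs(c_in=64, c_out=64, k_h=3, k_w=3, h_out=56, w_out=56)
print("MACs for the example layer:", layer_macs)

# The whole-model figure reported by onnx_tool in this post.
model_mops = macs_to_mops(36270243840)
print("Model OPs in MOPs:", model_mops)
```

Summing `conv2d_macs` over every convolution in a model should give a figure close to what a profiler reports for a convolution-dominated network.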
Inference with the NPU backend, deployed on the Radxa Orion O6 board
The FPS measured on the board is 277 (frames per second)
OPs: 72540.48768 (MOPs), which matches onnx_tool’s value.
The formula for calculating NPU computing power is FPS * OPs:
277.01631332144507 (frames/s) * 72540.48768 (MOPs) = 20.094898463653305 (TOPS)
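The arithmetic can be reproduced with a few lines of Python. This is only a sketch of the FPS * OPs formula using the numbers quoted in this post; the helper name `achieved_tops` is made up for illustration.

```python
def achieved_tops(fps, mops_per_frame):
    """Achieved throughput in TOPS.

    frames/s * MOPs/frame gives MOPS; dividing by 1e6 converts MOPS to TOPS.
    """
    return fps * mops_per_frame / 1e6


NOMINAL_TOPS = 28.8  # nominal figure for the Orion O6 NPU quoted in this post

measured = achieved_tops(277.01631332144507, 72540.48768)
fraction_of_nominal = measured / NOMINAL_TOPS
print(f"{measured:.2f} TOPS, {fraction_of_nominal:.1%} of nominal")
```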
Why hasn’t it reached 28.8 TOPS?
My guess at this point is that the MAC units are not fully utilized.
Profiling report data
Using the profiling tool provided by the SDK, the model’s performance was measured again:
The FPS during profiling is 306.7 (frames per second)
OPs are 72509 (MOPs)
Applying the same formula, FPS * OPs:
306.70377782378335 (frames/s) * 72509.0304 (MOPs) = 22.23879355 (TOPS)
Average MAC Utilization is 75.4%
Calculate the maximum computing power of NPU
Estimate the maximum computing power of the NPU from the values in the profiling report by dividing the achieved throughput by the average MAC utilization:
22.23879355 / 75.4% = 29.5 (TOPS)
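The same estimate as a short Python sketch. It assumes that throughput scales linearly with MAC utilization and that the reported average covers the whole inference, so the result is a rough extrapolation rather than a measured peak. The function name is hypothetical.

```python
def estimated_peak_tops(achieved_tops, mac_utilization):
    """Extrapolate peak TOPS from achieved TOPS and average MAC utilization.

    Assumes a linear relation: achieved = peak * utilization.
    """
    if not 0 < mac_utilization <= 1:
        raise ValueError("mac_utilization must be in (0, 1]")
    return achieved_tops / mac_utilization


# Values from the profiling report quoted in this post.
achieved = 306.70377782378335 * 72509.0304 / 1e6  # MOPS -> TOPS
peak = estimated_peak_tops(achieved, 0.754)
print(f"achieved {achieved:.2f} TOPS, estimated peak {peak:.1f} TOPS")
```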