Radxa Orion O6 NPU Computing Power Evaluation

I have read Teacher nihui’s article:
https://aijishu.com/a/1060000000503889


I was also curious: can the NPU on the Radxa Orion O6 actually reach its nominal 28.8 TOPS?

Constructing a high computational density conv model

Following nihui’s suggestion, I constructed an ONNX model made up mainly of convolutions and used onnx_tool to count its MACs.

import onnx_tool

onnx_tool.model_profile('./model.onnx')
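
For intuition about where a MAC count like this comes from, here is a minimal sketch of the usual per-layer formula for a 2D convolution. It assumes the common convention of one MAC per multiply-add and ignores bias; the function name and the layer shape are hypothetical and are not taken from the model above.

```python
def conv2d_macs(c_in, c_out, kernel_h, kernel_w, out_h, out_w, groups=1):
    """MACs for one 2D convolution, ignoring bias.

    Each output element needs (c_in / groups) * kernel_h * kernel_w
    multiply-accumulates, and there are c_out * out_h * out_w output elements.
    """
    macs_per_output = (c_in // groups) * kernel_h * kernel_w
    return c_out * out_h * out_w * macs_per_output


# Hypothetical layer: 3x3 conv, 256 -> 256 channels, 64x64 output feature map.
layer_macs = conv2d_macs(256, 256, 3, 3, 64, 64)
print(layer_macs)
```

Summing this over a stack of large convolutions is what pushes the total into the tens of billions of MACs per frame.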


In this model, Forward_MACs = 36270243840. Counting each MAC as two operations (one multiply and one add):
OPs = MACs * 2 = 72540487680, i.e. 72540.48768 MOPs

Inference with the NPU backend, deployed on the Radxa Orion O6 board

The FPS measured on the board is 277 (frames per second).
The reported OPs are 72540.48768 (MOPs), which matches the value from onnx_tool.

The NPU's achieved computing power is FPS * OPs:
277.01631332144507 (frames/s) * 72540.48768 (MOPs) = 20.094898463653305 (TOPS)
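
The same arithmetic as a small script, mainly to make the unit conversions explicit. The numbers are the ones quoted in this post; the variable names are my own.

```python
forward_macs = 36_270_243_840   # Forward_MACs reported by onnx_tool
fps = 277.01631332144507        # frames per second measured on the board

ops_per_frame = forward_macs * 2            # 1 MAC = 1 multiply + 1 add
mops_per_frame = ops_per_frame / 1e6        # operations -> MOPs
tops = fps * ops_per_frame / 1e12           # operations per second -> TOPS

print(f"{mops_per_frame:.5f} MOPs per frame")
print(f"{tops:.3f} TOPS")
```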

Why doesn't it reach 28.8 TOPS?
My guess at this point is that the MAC units are not fully utilized.
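
To put a number on the gap, a quick check of what fraction of the nominal figure the board measurement reaches. This is only the ratio of the two values quoted above and says nothing yet about the cause.

```python
nominal_tops = 28.8                      # vendor's nominal figure
measured_tops = 20.094898463653305       # FPS * OPs from the board run

fraction_of_nominal = measured_tops / nominal_tops
print(f"{fraction_of_nominal:.1%} of nominal")
```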

Profiling report data

I then tested the model with the profiling tool provided by the SDK.
The results were as follows:


The FPS under profiling is 306.7 (frames per second).
The reported OPs are 72509 (MOPs).

Applying the same formula, FPS * OPs:
306.70377782378335 (frames/s) * 72509.0304 (MOPs) = 22.23879355 (TOPS)
The average MAC utilization reported is 75.4%.

Calculating the maximum computing power of the NPU

Dividing the throughput from the profiling report by the MAC utilization gives an estimate of the NPU's maximum computing power:
22.23879355 / 75.4% = 29.5 (TOPS)
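
A short sketch of this extrapolation, using the profiling numbers quoted above. It assumes throughput scales linearly with MAC utilization, which is my simplification; the variable names are my own.

```python
profiled_fps = 306.70377782378335   # frames per second from the profiling report
profiled_mops = 72509.0304          # MOPs per frame from the profiling report
mac_utilization = 0.754             # average MAC utilization from the report
nominal_tops = 28.8                 # vendor's nominal figure

achieved_tops = profiled_fps * profiled_mops / 1e6        # MOPs/s -> TOPS
estimated_peak_tops = achieved_tops / mac_utilization     # assumes linear scaling

print(f"achieved:       {achieved_tops:.2f} TOPS")
print(f"estimated peak: {estimated_peak_tops:.2f} TOPS")
print(f"peak / nominal: {estimated_peak_tops / nominal_tops:.3f}")
```

Under that assumption the estimate lands slightly above the nominal 28.8 TOPS, so the nominal figure looks consistent with what the hardware could deliver at full MAC utilization.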