I have completed a C++ example that runs a YOLOv8 model on the NPU. It walks through the full process, from the source PyTorch model through to the final object detection results.
Compared with CIX's Python example code, the C++ version achieves a significant performance improvement:
| Timing | Python | C++ |
|---|---|---|
| Setting input tensors | 17.22 ms | 3.07 ms |
| Inference pass on NPU | 55.22 ms | 55.54 ms |
| Retrieving output tensors | 42.57 ms | 6.72 ms |
| Total time | 115.01 ms | 65.33 ms |
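For reference, the three timed rows correspond to the usual phases of an NPU inference loop. Below is a minimal sketch of how such a loop can be timed with `std::chrono`; the functions `set_input_tensors()`, `run_inference()` and `get_output_tensors()` are hypothetical placeholders standing in for the actual CIX NOE API calls, not the real SDK names.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical placeholders for the CIX NOE runtime calls used in the
// example; the real SDK function names and signatures differ.
void set_input_tensors()  { /* copy preprocessed image into NPU input buffers */ }
void run_inference()      { /* submit the job to the NPU and wait for it */ }
void get_output_tensors() { /* copy raw detection outputs back to host memory */ }

// Run one phase and return its duration in milliseconds.
template <typename F>
double time_phase(F&& phase) {
    auto start = std::chrono::steady_clock::now();
    phase();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
    double t_in  = time_phase(set_input_tensors);   // "Setting input tensors"
    double t_run = time_phase(run_inference);       // "Inference pass on NPU"
    double t_out = time_phase(get_output_tensors);  // "Retrieving output tensors"
    std::printf("input: %.2f ms, inference: %.2f ms, output: %.2f ms, total: %.2f ms\n",
                t_in, t_run, t_out, t_in + t_run + t_out);
    return 0;
}
```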
I have also outlined how to work out the magic numbers needed for quantization with the CIX compiler, so you can derive them for models of other sizes.
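As background for what those numbers do: a quantized model maps float values to integers with a scale (and optionally a zero point). The sketch below shows the standard affine quantize/dequantize formulas; the actual scale and zero-point values are the model-specific "magic numbers" the CIX compiler bakes in, so the constants here are illustrative assumptions only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Standard affine quantization: q = round(x / scale) + zero_point.
// The scale and zero point are the model-specific "magic numbers"
// produced by the compiler's quantization step.
int8_t quantize(float x, float scale, int zero_point) {
    int q = static_cast<int>(std::lround(x / scale)) + zero_point;
    return static_cast<int8_t>(std::clamp(q, -128, 127));
}

// Inverse mapping, used when converting the NPU's integer outputs
// back into floats: x = scale * (q - zero_point).
float dequantize(int8_t q, float scale, int zero_point) {
    return scale * (static_cast<int>(q) - zero_point);
}

int main() {
    const float scale = 0.0123f; // illustrative value, not from a real model
    const int zero_point = 0;    // illustrative value

    int8_t q = quantize(0.5f, scale, zero_point);
    std::printf("quantized: %d, round trip: %f\n",
                q, dequantize(q, scale, zero_point));
    return 0;
}
```

The same formulas apply in both directions, so once you know the scale and zero point for each input and output tensor you can convert between the model's integer domain and ordinary floats.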