Using YOLOv8 on the RK3588 NPU

Yeah, we are still lacking empirical benchmarks of input size against the resulting mAP and FPS.
It seems some conversions benchmark ms (FPS) but forget to test mAP (mean Average Precision) or to mention the input size.

@stuartiannaylor

I managed to get yolov8n (v1.6) running on all 3 NPU cores and compared it to yolov5s (v1.5), so this might interest you.
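For anyone wanting to reproduce the 3-core part: the cores are selected via the NPU core mask. Here is a minimal sketch with the rknn-toolkit-lite2 Python API (the C++ runtime has the equivalent rknn_set_core_mask()); the model path is just a placeholder:

from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn('./model/RK3588/yolov8n.rknn')  # placeholder path

# RK3588 only: run on all three NPU cores (other options include
# NPU_CORE_AUTO, NPU_CORE_0, NPU_CORE_1, NPU_CORE_2, NPU_CORE_0_1).
rknn.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)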

Note: my observations are from a programmer's perspective, not from an AI programmer's perspective.
YOLOv8 seems to be more accurate, but it misses some detections in the video sample (cars), which is presumably why it runs a bit faster.

The test setup: X11 (dual-head), libmali, performance governor.

YOLOv5s:

DISPLAY=:0.0 ./rknn_yolov5_demo ./model/RK3588/yolov5s-640-640.rknn ../../../../../videos_rknn/h264.FVDO_Freeway_720p.264 
Model name:	./model/RK3588/yolov5s-640-640.rknn
Threads:	12
Loading mode...
model is NHWC input fmt
Loading mode...
rga_api version 1.9.1_[4]
loadLabelName ./model/coco_80_labels_list.txt
model is NHWC input fmt
[... the "Loading mode..." / "model is NHWC input fmt" pair repeats for each of the 12 threads ...]
60 frames avg:	90.909091 frames
60 frames avg:	109.689214 frames
60 frames avg:	118.577075 frames

avg:	105.616897 frames

YOLOv8n:

DISPLAY=:0.0 ./rknn_yolov8_demo ./model/RK3588/yolov8n.rknn ../../../../../videos_rknn/h264.FVDO_Freeway_720p.264 
Model name:	./model/RK3588/yolov8n.rknn
Threads:	12
Loading mode...
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading mode...
rga_api version 1.9.1_[4]
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading mode...
loadLabelName ./model/coco_80_labels_list.txt
model is NHWC input fmt
model input height=640, width=640, channel=3
[... the "Loading mode..." / "model is NHWC input fmt" / "model input height=640, width=640, channel=3" triple repeats for each of the 12 threads ...]
60 frames avg:	102.389078 frames
60 frames avg:	113.851992 frames
60 frames avg:	112.994350 frames

avg:	108.534776 frames

Regarding the 320 input resolution, I unfortunately lack the AI background to conduct a real test.

320 doesn't matter much, as the models are trained at 640; you can squash the model down to 320 and it obviously runs faster, but less accurately.
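If you want to try it, here is a minimal sketch of re-exporting the stock model at 320 with the Ultralytics Python API; the ONNX output would then still need to be converted to .rknn with rknn-toolkit2:

from ultralytics import YOLO

# Export yolov8n with a 320x320 input instead of the default 640x640.
model = YOLO('yolov8n.pt')
model.export(format='onnx', imgsz=320)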

You can use https://docs.ultralytics.com/modes/val/#introduction to get the mAP of the same model at 640 vs 320; when run below its training resolution of 640 it invariably loses mAP (mean Average Precision).
Likely the ratio will be much the same for the RKNN models.
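A minimal sketch of that 640-vs-320 comparison, assuming the small coco128.yaml validation set that ships with Ultralytics:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Validate at the training resolution and at 320, then compare mAP50-95.
for size in (640, 320):
    metrics = model.val(data='coco128.yaml', imgsz=size)
    print(size, metrics.box.map)  # metrics.box.map50 gives mAP@0.5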

There is a dataset at https://cocodataset.org/#home that includes the images together with metadata describing what should be detected, so you can download it and run the model against it. I have totally forgotten how mAP is calculated; it is not just whether an object was detected but also how well the bounding box fits: https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52
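The bounding-box part of that calculation is the IoU (intersection over union); a rough sketch, assuming boxes in (x1, y1, x2, y2) format:

def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0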

I am not sure what avg: 108.534776 frames means. Is that 108 FPS?

I apologise @avaf, as I meant to test https://github.com/airockchip/rknn_model_zoo but forgot.

Yes, 108 fps…

How were you able to achieve 108 FPS at 640x640? Would you mind sharing which changes you made to get yolov8n working on 3 cores? I am using the yolov8 example from rknn_model_zoo, but because they took the DFL part out of the model, the post-processing is slow; in total (inference + post-processing) it takes around 30-40 ms per frame (in Python).
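For context, the DFL head that the model zoo moved into post-processing is essentially a softmax-weighted expectation over reg_max distance bins per box side. A rough numpy sketch, assuming the usual reg_max of 16:

import numpy as np

def dfl_decode(dist, reg_max=16):
    # dist: (n_anchors, 4 * reg_max) raw distribution logits
    dist = dist.reshape(-1, 4, reg_max)
    e = np.exp(dist - dist.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)       # softmax over the bins
    # Expected distance (in grid units) for each of the 4 box sides.
    return (probs * np.arange(reg_max)).sum(axis=-1)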

First off, you should try multi-threading (example found here). It didn't yield great results for me until I lowered Python's niceness. I did it from the terminal (see man nice), but you might as well try something like os.nice() within Python itself.
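A minimal sketch of the in-process variant; note that lowering niceness (raising priority) normally needs elevated privileges:

import os

try:
    # Negative increment = higher priority; usually needs root/CAP_SYS_NICE.
    os.nice(-10)
except PermissionError:
    print('not permitted to lower niceness; try: sudo nice -n -10 python3 ...')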