Best option for YOLOv8 object detection?

I’m looking for a Raspberry Pi alternative for a computer vision project. Hoping to run YOLOv8s on-device 640x640 at 15+ FPS.

Would the ROCK 5A 4GB be a good fit for my project? Any other recommendations? Any advice on getting real-time object detection to work well on a Radxa device? tx!

The RK3588 can handle 720p @ 30 FPS using YOLOv8s no problem.

Why would the FPS drop? The camera stream itself runs at 60 FPS at 1920x1080, but the same stream shows only 20 FPS with yolov8n. Is that because of the OpenCV code or the RK3588 NPU?

Probably because the code you have is processing the frames sequentially. To get 60 FPS you need to process frames in parallel and use a pool of YOLO models across all 3 NPU cores.
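To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. It assumes perfect scaling across workers, which real hardware will not reach, and the function name and parameters are made up for illustration. If a sequential loop shows 20 FPS, each frame takes about 50 ms end to end, so the only way to get closer to the camera rate is to keep several frames in flight at once.

```python
def pipeline_fps(camera_fps, frame_latency_ms, workers):
    """Idealised upper bound on processed frames per second.

    Each worker finishes one frame every frame_latency_ms milliseconds,
    so `workers` of them finish workers * 1000 / frame_latency_ms frames
    per second, capped by how fast the camera delivers frames.
    """
    if frame_latency_ms <= 0 or workers < 1:
        raise ValueError("latency must be positive and workers >= 1")
    return min(camera_fps, workers * 1000.0 / frame_latency_ms)


# 20 FPS observed in a sequential loop implies about 50 ms per frame.
sequential = pipeline_fps(60.0, 50.0, workers=1)
pooled = pipeline_fps(60.0, 50.0, workers=3)
print(sequential, pooled)
```

This is only a ceiling. Shared CPU work such as resizing and post-processing will pull the real number down.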

Do you have any sample deployment code?

But in my case, regardless of whether the camera stream supports 30 FPS or 60 FPS, RKNN always shows 20 FPS.

The link above is demo code that implements parallel processing of frames with a pooled model runtime. Deploying it goes approximately as follows:

  1. Make sure the RKNN NPU driver is installed: dmesg | grep -i rknpu
  2. Install Go (depends on OS installed, but can be done via APT).
  3. Install GoCV using vendor instructions.
  4. Run stream server example.

Also, if you post the code you're using, I could confirm whether the 20 FPS problem comes from sequential processing.

Here is the code I’m using (OpenCV, C++). I did set the runtime to use all 3 RKNPU cores, which is not shown in the code.

Yeah, that code is processing frames sequentially. Also, setting the runtime to 3 cores using RKNN_NPU_CORE_0_1_2 does little, as a single YOLO model does not scale well over multiple NPU cores. You need to run multiple instances of the same model in a pool to get maximum performance out of the NPU.
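As a minimal sketch of the pool idea, the following uses a stand-in FakeModel class rather than the real RKNN runtime. The class names, the 10 ms sleep, and the round-robin core assignment are all assumptions for illustration. Each model instance serves one frame at a time, and a thread pool keeps several frames in flight.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Stand-in for one loaded model instance pinned to one NPU core."""

    def __init__(self, core):
        self.core = core

    def infer(self, frame):
        time.sleep(0.01)  # pretend inference takes 10 ms
        return frame, self.core


class ModelPool:
    """Hands out model instances so each one handles one frame at a time."""

    def __init__(self, size, cores=3):
        self._idle = queue.Queue()
        for i in range(size):
            # Spread instances over the cores round-robin.
            self._idle.put(FakeModel(core=i % cores))

    def run(self, frame):
        model = self._idle.get()  # blocks until an instance is free
        try:
            return model.infer(frame)
        finally:
            self._idle.put(model)  # always return the instance to the pool


def process(frames, pool_size=3):
    pool = ModelPool(pool_size)
    with ThreadPoolExecutor(max_workers=pool_size) as workers:
        # map() yields results in submission order, so frames stay ordered.
        return list(workers.map(pool.run, frames))


results = process(range(12))
print(results)
```

With the real runtime, each pool entry would be a separately loaded copy of the same model file, which is why memory use grows with the pool size.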

Here is a C++ example that uses multithreading and a pool.

=> ./rknn_yolov5_demo model/RK3588/yolov5s-640-640.rknn 11
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Average:	 nan fps/s
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
QSettings::value: Empty key passed
QSettings::value: Empty key passed
[ WARN:0@0.638] global ./modules/videoio/src/cap_gstreamer.cpp (1127) open OpenCV | GStreamer warning: Error opening bin: no element "11"
[ WARN:0@0.638] global ./modules/videoio/src/cap_gstreamer.cpp (862) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created

Looks like we need to add additional code to handle the camera video number.

@3djelly Can the tracker code or the YOLOv8 code here take camera input?

Yes, it can take a webcam. The C++ example would need some adjustment to handle double-digit camera device numbers.
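The GStreamer warning in the log (no element "11") suggests the argument was passed through as a string and treated as a pipeline description instead of a device index. A small sketch of the kind of adjustment meant here, written in Python for brevity; parse_source is a hypothetical helper and not part of the demo:

```python
def parse_source(arg):
    """Turn a command-line argument into a capture source.

    An argument made only of decimal digits (any length) becomes an
    integer device index; anything else is kept as a path or URL string.
    """
    arg = arg.strip()
    if arg.isdecimal():
        return int(arg)
    return arg


# The result would then go to the capture constructor, for example
# cv2.VideoCapture(parse_source("11")), which accepts an int or a str.
print(parse_source("0"), parse_source("11"), parse_source("video.mp4"))
```

The same check in C++ would test every character of the argument, not just the first one, before converting it to an integer.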

The code out there would require adjusting to suit your own needs.

I added the camera input changes to the C++ code, but I'm still only getting 40 FPS max out of 60 FPS.

loadLabelName ./model/coco_80_labels_list.txt
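Average frame rate over 120 frames:         39.787798 fps/s
Average frame rate over 120 frames:         40.000000 fps/s
Average frame rate over 120 frames:         39.986671 fps/s
Average frame rate over 120 frames:         39.933444 fps/s
Average frame rate over 120 frames:         38.759690 fps/s
Average frame rate over 120 frames:         40.040040 fps/s
Average frame rate over 120 frames:         39.960040 fps/s
Average frame rate over 120 frames:         40.040040 fps/s
Average frame rate over 120 frames:         39.960040 fps/s
Average frame rate over 120 frames:         40.040040 fps/s
Average frame rate over 120 frames:         40.000000 fps/s
Average frame rate over 120 frames:         40.000000 fps/s
Average frame rate over 120 frames:         39.960040 fps/s
Average frame rate over 120 frames:         40.013338 fps/s
Average frame rate over 120 frames:         40.013338 fps/s
Average frame rate over 120 frames:         39.946738 fps/s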

Increase the thread number to 6 or 9.

no change - I tried 6 and 9 with performance enabled as well

Try go-rknnlite for comparison.

I tried using go-rknnlite and it can stream 1080p video at 60 FPS.

It saturates 60% of all three NPU cores, but playback over HTTP is jittery. The performance issues are in the following areas:

  1. You need to stream over a wired Ethernet connection, as Wi-Fi is too slow.
  2. Scaling video frames from 1920x1080 to the input tensor size of 640x640 uses a lot of CPU: about 80% across all 8 cores of the RK3588.
  3. YOLOv8 post-processing offloads a lot of the work to the CPU compared to, for example, YOLOv5.
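For the scaling step in item 2, a sketch of the geometry involved may help. It assumes the common letterbox approach (scale to fit, then pad to a square), which the thread does not confirm is what the Go code does; the function names are made up for illustration.

```python
def letterbox_geometry(src_w, src_h, dst=640):
    """Size and padding needed to fit a frame into a dst x dst tensor."""
    longest = max(src_w, src_h)
    new_w = round(src_w * dst / longest)  # keep the aspect ratio
    new_h = round(src_h * dst / longest)
    pad_x = (dst - new_w) // 2  # padding on the left (and right)
    pad_y = (dst - new_h) // 2  # padding on the top (and bottom)
    return new_w, new_h, pad_x, pad_y


def to_source_coords(x, y, src_w, src_h, dst=640):
    """Map a point from tensor coordinates back to the original frame."""
    _, _, pad_x, pad_y = letterbox_geometry(src_w, src_h, dst)
    factor = max(src_w, src_h) / dst
    return (x - pad_x) * factor, (y - pad_y) * factor


print(letterbox_geometry(1920, 1080))
print(to_source_coords(320, 320, 1920, 1080))
```

A 1920x1080 frame and a 1280x720 frame end up with the same 640x360 image area inside the tensor, but the 720p frame has less than half the source pixels to read, which is one reason dropping the resolution lowers CPU load.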

You will get smoother playback by dropping to 720p @ 60 FPS with YOLOv8. Performance will also depend on your application: do you play back locally or need to stream over a network connection?

I tried yesterday but ran into a lot of build issues with Go. Are there prerequisite installation steps for go-rknnlite? Where can I find the NPU load, via sysfs or some tool? I’m aiming for a network stream, but I can develop locally first.
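On the NPU load question: Rockchip vendor kernels typically expose per-core load through debugfs at /sys/kernel/debug/rknpu/load, readable as root. Both the path and the line format used below are assumptions to verify on your own board, and the sample string is hypothetical rather than captured output. A small sketch for parsing it:

```python
import re

# Assumed location on Rockchip vendor kernels; reading it needs root.
NPU_LOAD_PATH = "/sys/kernel/debug/rknpu/load"


def parse_npu_load(text):
    """Return {core_index: percent} from text like 'Core0: 60%, Core1: 58%'."""
    return {int(core): int(pct)
            for core, pct in re.findall(r"Core(\d+):\s*(\d+)%", text)}


def read_npu_load(path=NPU_LOAD_PATH):
    """Read and parse the load file; raises OSError if it is not available."""
    with open(path) as f:
        return parse_npu_load(f.read())


# Hypothetical sample in the assumed format, not taken from a real board.
sample = "NPU load:  Core0: 60%, Core1: 58%, Core2: 61%,"
print(parse_npu_load(sample))
```

Polling read_npu_load() once a second while the demo runs would show whether all three cores are actually busy.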

What OS and version are you using?
I have instructions for Debian Bookworm v6.1 (Radxa official OS).

Same. Debian 12 6.1