Best option for YOLOv8 object detection?

oliver_something · April 1, 2024, 11:04pm

I’m looking for a Raspberry Pi alternative for a computer vision project. I’m hoping to run YOLOv8s on-device at 640x640 and 15+ FPS.

Would the ROCK 5A 4GB be a good fit for my project? Any other recommendations? Any advice on getting real-time object detection to work well on a Radxa device? tx!

3djelly · August 6, 2024, 4:25am

The RK3588 can handle 720p @ 30 FPS using YOLOv8s no problem.

Jagan · November 8, 2024, 2:35pm

Why would the FPS drop? The raw camera stream at 1920x1080 gives 60 FPS, but the same stream shows only 20 FPS with YOLOv8n. Is it because of the OpenCV code or the RK3588 NPU?

3djelly · November 10, 2024, 6:11am

Probably because the code you have is processing the frames sequentially. To get 60 FPS you need to process frames in parallel and use a pool of YOLO models across all 3 NPU cores.
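
To illustrate the difference between sequential and pooled processing, here is a minimal Python sketch. It is not the code from this thread: `fake_infer` is a hypothetical stand-in for one model's inference call, simulated with `time.sleep`, and the 50 ms latency is an assumption chosen so that a single model yields roughly 20 FPS.

```python
import time
from concurrent.futures import ThreadPoolExecutor

INFER_SECONDS = 0.05  # assumed per-frame latency of one model (50 ms, about 20 FPS)


def fake_infer(frame_id):
    """Hypothetical stand-in for a real inference call on one model instance."""
    time.sleep(INFER_SECONDS)  # time.sleep releases the GIL, so threads overlap here
    return frame_id


def run_sequential(frame_ids):
    """Process frames one at a time; throughput is bounded by 1 / latency."""
    return [fake_infer(f) for f in frame_ids]


def run_pooled(frame_ids, workers=3):
    """Process frames with a pool of workers; map() returns results in frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_infer, frame_ids))


def measure_fps(fn, frame_ids):
    """Return frames processed per second of wall-clock time."""
    start = time.perf_counter()
    fn(frame_ids)
    return len(frame_ids) / (time.perf_counter() - start)


if __name__ == "__main__":
    frames = list(range(30))
    print(f"sequential: {measure_fps(run_sequential, frames):.1f} FPS")
    print(f"pooled x3:  {measure_fps(run_pooled, frames):.1f} FPS")
```

With three workers the simulated throughput should come out at roughly three times the sequential figure. A real pipeline also spends time on capture, resizing and post-processing, so actual gains will be smaller.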

Jagan · November 10, 2024, 3:59pm

Do you have any sample deployment code?

But in my case, regardless of whether the camera stream runs at 30 FPS or 60 FPS, RKNN always shows 20 FPS.

3djelly · November 10, 2024, 6:53pm

The link above is demo code that implements parallel processing of frames with a pooled model runtime. Deploying it goes approximately as follows:

1. Make sure the RKNN NPU driver is installed: dmesg | grep -i rknpu
2. Install Go (depends on the OS installed, but can be done via APT).
3. Install GoCV using the vendor instructions.
4. Run the stream server example.
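
The driver check in the first step can also be scripted. Below is a small Python sketch that does the same filtering as `dmesg | grep -i rknpu`; the helper names `rknpu_lines` and `read_kernel_log` are made up for this example, and reading the kernel log may need root on some systems.

```python
import subprocess


def rknpu_lines(kernel_log):
    """Return kernel log lines that mention rknpu, case-insensitively (like grep -i)."""
    return [line for line in kernel_log.splitlines() if "rknpu" in line.lower()]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg; return an empty string if that fails."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    hits = rknpu_lines(read_kernel_log())
    print("\n".join(hits) if hits else "no rknpu messages found")
```

An empty result does not prove the driver is missing (the ring buffer may have rotated or be unreadable without root), so treat it as a hint only.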

Also, if you post the code you’re using, I could confirm that the 20 FPS problem comes from sequential processing.

Jagan · November 10, 2024, 7:08pm

Here is the code I’m using (OpenCV, C++). I did set all 3 RKNPU cores, although that change is not in the posted code.

3djelly · November 10, 2024, 7:33pm

Yeah, that code is processing frames sequentially. Also, setting the runtime to 3 cores using RKNN_NPU_CORE_0_1_2 does little, as a single YOLO model does not scale well over multiple NPU cores. You need to run multiple instances of the same model in a pool to get maximum performance out of the NPU.
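
One way to picture the pool: each model instance gets its own single-core mask instead of one instance spanning all three cores. The Python sketch below only shows the assignment and a naive throughput estimate. The mask values are assumed to mirror the `rknn_core_mask` enum in `rknn_api.h` (check your SDK header), and `assign_cores` and `ideal_fps` are hypothetical helpers, not part of any RKNN API.

```python
# Assumed to mirror rknn_core_mask in rknn_api.h; verify against your SDK version.
RKNN_NPU_CORE_0 = 1
RKNN_NPU_CORE_1 = 2
RKNN_NPU_CORE_2 = 4
SINGLE_CORES = [RKNN_NPU_CORE_0, RKNN_NPU_CORE_1, RKNN_NPU_CORE_2]


def assign_cores(pool_size):
    """Give each model instance in the pool a single-core mask, round-robin."""
    return [SINGLE_CORES[i % len(SINGLE_CORES)] for i in range(pool_size)]


def ideal_fps(per_model_fps, pool_size, cores=3):
    """Naive upper bound: instances on different cores run fully in parallel.

    Ignores CPU-side resizing and post-processing, so real numbers will be lower.
    """
    return per_model_fps * min(pool_size, cores)


if __name__ == "__main__":
    print(assign_cores(6))
    print(ideal_fps(20, 3))
```

Under this simplified model, three instances at 20 FPS each would reach the 60 FPS the camera delivers. Running more instances than cores does not raise the bound here, although in practice extra threads can help hide CPU-side latency.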

Here is a C++ example that uses multithreading and a pool.

Jagan · November 10, 2024, 8:15pm

=> ./rknn_yolov5_demo model/RK3588/yolov5s-640-640.rknn 11
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Loading model...
sdk version: 1.5.2 (c6b7b351a@2023-08-23T15:28:22) driver version: 0.9.6
model input num: 1, output num: 3
model is NHWC input fmt
model input height=640, width=640, channel=3
Average:	 nan fps/s
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
QSettings::value: Empty key passed
QSettings::value: Empty key passed
[ WARN:0@0.638] global ./modules/videoio/src/cap_gstreamer.cpp (1127) open OpenCV | GStreamer warning: Error opening bin: no element "11"
[ WARN:0@0.638] global ./modules/videoio/src/cap_gstreamer.cpp (862) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created

Jagan · November 11, 2024, 12:51pm

Looks like we need to add extra code to handle the camera video device number.

Jagan · November 11, 2024, 1:45pm

@3djelly Can the tracker code or the YOLOv8 code here take camera input?

3djelly · November 11, 2024, 6:15pm

Yes, it can take a webcam. The C++ example would need some adjustment to handle double-digit camera device numbers.

The code out there would require adjusting to suit your own needs.
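
As a rough idea of the adjustment (shown in Python rather than in the C++ demo itself): the GStreamer warning in the log above suggests the argument "11" was handed over as a pipeline string instead of a device index. A minimal sketch of argument handling that avoids this, where `parse_source` is a hypothetical helper:

```python
def parse_source(arg):
    """Map a command-line argument to a capture source.

    All-digit strings, including multi-digit ones such as "11", become an
    integer device index. Anything else is passed through unchanged as a
    file path, URL, or pipeline string.
    """
    text = arg.strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return text


if __name__ == "__main__":
    for example in ["0", "11", "/dev/video11", "video.mp4"]:
        print(repr(example), "->", repr(parse_source(example)))
```

The result can then be passed to the capture constructor, for example `cv2.VideoCapture(parse_source(sys.argv[2]))` in OpenCV's Python bindings, which accept either an integer index or a string.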

Jagan · November 11, 2024, 6:31pm

I added the camera input changes to the C++ code, but I’m still getting only 40 FPS at most, out of 60 FPS:

loadLabelName ./model/coco_80_labels_list.txt
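Average frame rate over 120 frames: 39.787798 fps/s
Average frame rate over 120 frames: 40.000000 fps/s
Average frame rate over 120 frames: 39.986671 fps/s
Average frame rate over 120 frames: 39.933444 fps/s
Average frame rate over 120 frames: 38.759690 fps/s
Average frame rate over 120 frames: 40.040040 fps/s
Average frame rate over 120 frames: 39.960040 fps/s
Average frame rate over 120 frames: 40.040040 fps/s
Average frame rate over 120 frames: 39.960040 fps/s
Average frame rate over 120 frames: 40.040040 fps/s
Average frame rate over 120 frames: 40.000000 fps/s
Average frame rate over 120 frames: 40.000000 fps/s
Average frame rate over 120 frames: 39.960040 fps/s
Average frame rate over 120 frames: 40.013338 fps/s
Average frame rate over 120 frames: 40.013338 fps/s
Average frame rate over 120 frames: 39.946738 fps/s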
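
For reference, each log line above reports an average over a 120-frame window. The Python sketch below shows one way such a figure can be computed. It is not the demo's actual code, and `FpsMeter` is a made-up name.

```python
import time


class FpsMeter:
    """Report the average frame rate once per fixed-size window of frames."""

    def __init__(self, window=120, clock=time.perf_counter):
        self.window = window
        self.clock = clock  # injectable so the meter can be tested without waiting
        self.count = 0
        self.start = clock()

    def tick(self):
        """Count one frame; return the window's average FPS when it completes, else None."""
        self.count += 1
        if self.count < self.window:
            return None
        now = self.clock()
        fps = self.count / (now - self.start)
        self.count, self.start = 0, now  # start the next window
        return fps


if __name__ == "__main__":
    meter = FpsMeter(window=20)
    for _ in range(60):
        time.sleep(0.005)  # stand-in for capturing and processing one frame
        fps = meter.tick()
        if fps is not None:
            print(f"average over 20 frames: {fps:.2f} fps")
```

A window that takes 3 seconds to fill gives 120 / 3 = 40 FPS, which matches the values in the log.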

3djelly · November 11, 2024, 6:36pm

Increase the thread number to 6 or 9.

Jagan · November 11, 2024, 7:11pm

No change. I tried 6 and 9, with performance mode enabled as well.

3djelly · November 11, 2024, 7:23pm

Try go-rknnlite for comparison.

3djelly · November 12, 2024, 2:24am

I tried using go-rknnlite and it can stream 1080p video at 60 FPS.

It drives all three NPU cores to about 60% load, but playback over HTTP is jittery. The performance issues are in the following areas:

You need to stream over a wired Ethernet connection, as Wi-Fi is too slow.
Scaling video frames from 1920x1080 to the input tensor size of 640x640 uses a lot of CPU resources (about 80% across all 8 cores of the RK3588).
Post-processing for YOLOv8 offloads a lot of the work to the CPU compared to, for example, YOLOv5.
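
To put numbers on the scaling step, here is a small Python sketch of the resize arithmetic. It assumes an aspect-preserving "letterbox" resize, which is a common convention for YOLO inputs. Whether go-rknnlite letterboxes or stretches the frame is not stated in this thread, and `letterbox_params` and `pixels` are hypothetical helpers.

```python
def letterbox_params(src_w, src_h, dst_w=640, dst_h=640):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    pad_x and pad_y are the border sizes on the left and top; the remaining
    padding goes on the right and bottom.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst_w - new_w) // 2, (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def pixels(width, height):
    """Number of source pixels that must be read for each frame."""
    return width * height


if __name__ == "__main__":
    print(letterbox_params(1920, 1080))
    print(letterbox_params(1280, 720))
    print(pixels(1920, 1080) / pixels(1280, 720))
```

Both 1080p and 720p frames end up as a 640x360 image inside the 640x640 tensor, but a 1080p source has 2.25 times as many pixels to read per frame, which is at least consistent with 720p being lighter on the CPU.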

You will get smoother playback by dropping to 720p @ 60 FPS with YOLOv8. Performance will also depend on your application: do you play back locally, or do you need to stream over a network connection?

Jagan · November 12, 2024, 2:51am

I tried it yesterday, but ran into a lot of build issues with Go. Are there prerequisite installation steps for go-rknnlite? Where can I find the NPU load, via sysfs or any tool? I’m aiming for a network stream, but I can develop locally first.
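
On the NPU load question, a hedged Python sketch: on Rockchip vendor kernels the per-core load is commonly exposed through debugfs at /sys/kernel/debug/rknpu/load. Both that path and the "CoreN: X%" text format are assumptions here, so check them on your own image. Reading the file normally requires root.

```python
import re
from pathlib import Path

# Assumed location on Rockchip vendor kernels; needs root and a mounted debugfs.
LOAD_PATH = Path("/sys/kernel/debug/rknpu/load")


def parse_npu_load(text):
    """Extract {core_index: percent} from text such as 'Core0: 12%, Core1: 0%'."""
    return {int(core): int(pct) for core, pct in re.findall(r"Core(\d+):\s*(\d+)%", text)}


if __name__ == "__main__":
    try:
        print(parse_npu_load(LOAD_PATH.read_text()))
    except OSError as err:
        print(f"could not read {LOAD_PATH}: {err}")
```

Polling this once a second while the detector runs would show whether all three cores are actually being used.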

3djelly · November 12, 2024, 2:57am

What OS and version are you using?
I have instructions for Debian Bookworm v6.1 (Radxa official OS).

Jagan · November 12, 2024, 4:43am

Same. Debian 12 6.1