Changed to OpenCV and got better results… 9.5 ~ 10 FPS, no latency.
IMX219 + NPU real-time object detection on Zero 3W (experimental)
I dumped OpenCV and FFmpeg, optimized the code a bit (still single-threaded), moved to SDK 1.5, and got some improvement: 15 FPS (1920x1080). I still need to draw some widgets on screen, so I don’t think I can get better results than that with a single thread.
./rknn-v4l2 -f v4l2 -p NV12 -s 1920x1080 -i /dev/video0 -x 1920 -y 1080 -b 28 -a 40 -m ./model/RK356X/yolov5s-640-640.rknn
Model: ./model/RK356X/yolov5s-640-640.rknn - size: 7624064.
sdk version: 1.5.0 (e6fe0c678@2023-05-25T08:09:20) driver version: 0.8.8
model input num: 1, output num: 3
model: 640x640x3
INFO: SDL: compiled with=2.30.0 linked against=2.30.0
arm_release_ver of this libmali is 'g2p0-01eac0', rk_so_ver is '10'.
rga_api version 1.3.2_[0] (RGA is compiling with meson base: $PRODUCT_BASE)
loadLabelName ./model/coco_80_labels_list.txt
INFO: Program quit after 13539 ticks
INFO: Stop sensor device
INFO: Close sensor device
INFO: Free resize_buf: 0xffff8973f010
INFO: Destroy renderer: 0xaaaad6fb3e70
INFO: Destroy window: 0xaaaad6fbc280
Free rknn ctx: 187650726442880)
Free model data: 0xffff95792010
Avg FPS: 15.0
Monitoring:
top - 20:05:49 up 7:20, 0 users, load average: 1.07, 0.67, 0.32
Tasks: 157 total, 1 running, 156 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.6 us, 2.1 sy, 0.0 ni, 93.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 1977.7 total, 1337.8 free, 206.0 used, 434.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1590.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13311 rock 1 -19 364668 100392 72424 D 17.5 5.0 0:41.35 rknn-v4+
289 root 20 0 1081888 7592 3240 S 12.9 0.4 3:16.71 rkaiq_3+
11 root 20 0 0 0 0 I 0.3 0.0 0:13.48 rcu_sch+
190 root -51 0 0 0 0 S 0.3 0.0 0:07.89 irq/30-+
12165 root 0 -20 0 0 0 I 0.3 0.0 0:00.78 kworker+
12935 root 20 0 0 0 0 I 0.3 0.0 0:01.01 kworker+
13197 root 20 0 0 0 0 I 0.3 0.0 0:00.26 kworker+
13310 root 20 0 7392 2724 2136 R 0.3 0.1 0:01.68 top
13321 root 20 0 0 0 0 I 0.3 0.0 0:00.32 kworker+
1 root 20 0 165964 7780 5032 S 0.0 0.4 0:05.66 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.11 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_perc+
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tas+
10 root 20 0 0 0 0 S 0.0 0.0 0:01.52 ksoftir+
12 root rt 0 0 0 0 S 0.0 0.0 0:00.32 migrati+
CPU Temp:
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72777
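The sysfs value is reported in millidegrees Celsius, so 72777 is roughly 72.8 C. A minimal sketch of the conversion, assuming the same sysfs path as in the command above; the function names are made up for illustration and other boards may expose the SoC sensor under a different zone:

```python
from pathlib import Path

# Path copied from the command above; treat it as an assumption on other boards.
THERMAL_ZONE = Path("/sys/devices/virtual/thermal/thermal_zone0/temp")


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_soc_temp(path: Path = THERMAL_ZONE) -> float:
    """Read the current temperature in degrees Celsius from sysfs."""
    return millideg_to_celsius(path.read_text())


if __name__ == "__main__":
    # Uses the reading shown above instead of touching the real sensor.
    print(f"{millideg_to_celsius('72777'):.1f} C")  # prints "72.8 C"
```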
Opengles2:
Hello, Avaf. 15FPS is a great result! Would you mind sharing your code?
These experiments are sponsored for a possible project, and at this time I have permission to share the results. If the project doesn’t go ahead and I get the green light, I can share more code than has been shared here: IMX415 + NPU demo on ROCK 5B
If you are interested, follow the results and comparisons that will be posted here and there.
Hey, but you can become a sponsor too.
Actually, I get 19 FPS as a POC, but I need some real-world results to see if it’s viable.
Low-light condition test. The $10 camera (shipping included) is for close-ups anyway.
The drawback of such a high FPS on this tiny board is that it reaches ~85 C on a long run. They sent me a big radiator; let’s see if it fits and can reduce the temp.
See how it runs on HDMI USB touch (7 inch):
https://mega.nz/file/4GQWkIwJ#gkVQpYJ6nPlZUBnFmMyN-kE9Wy2--hkw7y2rdWcKDnw
The next step is to check if I can run a second instance and get rid of the cables.
My suggestion to the Radxa team: launch a new board with a similar layout, but with the RK3568 instead of the RK3566-T, 3 USB-C ports instead of 2, and the CSI connector at one end and DSI at the other. People may complain that it would get a bit bigger and would not fit into an RPi Zero W case, but I think the current layout does not fit either (I haven’t tried it yet…).
Update:
- include RTC backup
- micro HDMI (optional?? or without it to save space and cool down the temp)
I would be willing to pay $10 more for such a board and use it with 8" or 10" display
And finally, would it be too much to ask for rk3588s on such a board?
Today I got the heatsink installed; it fits like this:
It looks messy but it works, and I am glad no shorts occurred.
Running for 30 min… (performance, max npu and gpu freq.)
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/type
soc-thermal
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
64444
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
65000
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
65000
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
66250
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
66875
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
66875
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
67500
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
68125
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
71111
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72222
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72222
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72222
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72777
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72777
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72222
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
72222
It does the job, but if there’s no air flowing through the fins, I think it can get to 75 C and stabilize.
Update:
uptime 2:06 , no airflow or ventilation:
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
75625
Update 2:
uptime 3:27, no airflow or ventilation, temp is stable now:
root@rzero-3w:/home/rock# cat /sys/devices/virtual/thermal/thermal_zone0/temp
76250
How can I get in touch with you to discuss becoming a sponsor?
yolov5s is way slower than some of the other models out there. You can probably get up to 2x or 2.5x framerate with yolov5n or yolov6n
See this repo and look at the benchmarks
I also did the comparisons myself and I did get a gigantic boost by using yolov5n
I stopped researching / testing inference a while ago, so it would be nice to see some benchmarks about this. It would be great if you could document your findings and the code you used.
I expect to get a rock-5c next month so I can review my code on this board with kernel 6.1 and, hopefully, a working CSI. Arace seems to take a long time to dispatch the goods, but I asked for special treatment, so this could be the reason. Allnet was quick to ship and handle in the same situation.
BTW, the RK3566-T HDMI output is extremely slow; I hope you are not comparing against my results above.
You can see code and benchmarks on the repo I listed. I think I had around 0.02ms inference, but I was bottlenecked by CPU (I used OpenCV).
What is RK3566-T? I assume it is RK3566. I use MIPI DSI at 93 Hz.
Man, you’re asking for trouble…
Let’s just say it is. By HDMI output I mean the GPU. Slower CPU, slower GPU, and sometimes no NPU.
I mean practical benchmarks not theoretical. I did not see any improvement with yolov8 on rk3588 for example.
My inference times went down significantly when I used yolov5n, to around 0.02s as I said. I do a lot of experimentation every day, so I would have to dig to find that code. Anyway, I’ll share my results when I get the SDL3 code working.
Yeah RK3566 is slower than RK3568 but not by a lot. Never heard of any RK3566-T so I don’t know if that’s something new. I am using RK3566. Custom board ( https://imgur.com/a/86vrNDY )
Are you sure about your metric?
30 FPS is 33.33 ms for each frame.
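The unit check can be written out as a per-frame time budget. This is only an illustrative sketch; the helper names are hypothetical and it ignores capture, pre-processing, post-processing and display time:

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def fits_budget(inference_ms: float, fps: float) -> bool:
    """True if inference alone fits inside one frame period at the given rate."""
    return inference_ms <= frame_budget_ms(fps)


if __name__ == "__main__":
    print(f"{frame_budget_ms(30):.2f} ms per frame at 30 FPS")  # 33.33 ms
    # 0.02 ms would be a tiny fraction of the budget; 20 ms is the same order.
    print(fits_budget(0.02, 30), fits_budget(20.0, 30), fits_budget(60.97, 30))
```

A figure of 0.02 ms would leave essentially the whole 33.33 ms budget unused, which is why the unit is worth double-checking.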
Not quite sure, it was many months ago. But it was around that number. In any case I was heavily CPU-bottlenecked back then, and pre-processing / post-processing tanked my framerate.
I am working hard on getting the SDL code working and then I’ll share my results
EDIT: sorry I meant 0.02s not 0.02ms. That would be insane
I apparently have 28.4fps on the SDL3 code running on my RK3566
Avg FPS: 28.4
Playback looks very smooth, but I am having an issue: my camera is apparently running at 37.5fps, or at least that’s what the framerate calculation says when I delete all NPU operations from the program. This means my playback has an ugly 2 second delay, because it is not discarding any frames from the camera and is trying to keep up.
I also have some weird color issues, but that may just be my very old RGA version, I don’t know.
Anyway framerate is good but I would really need to slow down the camera to not have delay on the playback
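One generic way to avoid that growing delay without slowing the camera is to let the capture side overwrite a single "latest frame" slot, so the processing side always gets the newest frame and older ones are dropped. With a 37.5fps camera and a 28.4fps pipeline, about 9 frames per second would otherwise pile up. The sketch below is not the SDL3/V4L2 code from this thread; `LatestFrame` is a hypothetical helper and the integers stand in for real frame buffers:

```python
import threading
from typing import Any, Optional


class LatestFrame:
    """Single-slot mailbox: a new frame replaces any frame not yet consumed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self.dropped = 0  # frames overwritten before the consumer saw them

    def put(self, frame: Any) -> None:
        """Called by the capture side for every frame from the camera."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def get(self) -> Optional[Any]:
        """Called by the processing side; returns None if nothing new arrived."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


if __name__ == "__main__":
    box = LatestFrame()
    for n in range(3):  # camera delivers frames 0, 1, 2 before the NPU is ready
        box.put(n)
    print(box.get(), box.dropped)  # prints "2 2": newest frame kept, two dropped
```

The trade-off is that dropped frames are never displayed, so playback stays current at the cost of skipping some camera frames.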
By the way CPU usage is 7%. Awesome! When I tried to use OpenCV with this resolution I got like 10fps and 45% CPU usage, which was a bottleneck as it was single threaded
Looks good. My IMX219 setup maxes out at 30 fps (1920x1080). I know Rockchip had a patch for 48 fps, but I could not find it (they deleted it, I guess). Do you have that patch?
For reference, here are my findings regarding inference:
Inference on bus.jpg, single core, single thread
|Board / model| Yolov5s | Yolov8n |
|-------------|------------|------------|
|Zero 3W |60.970000 ms|52.175000 ms|
|Rock 5B |21.875000 ms|21.251000 ms|
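As a rough reading of the table, each inference time gives a ceiling on frame rate if inference were the only per-frame cost. The sketch below just does that arithmetic on the numbers above; the helper names are hypothetical, and real pipelines will land below these ceilings because of capture, resize and drawing:

```python
# Inference times in milliseconds, copied from the table above.
INFERENCE_MS = {
    "Zero 3W": {"Yolov5s": 60.970, "Yolov8n": 52.175},
    "Rock 5B": {"Yolov5s": 21.875, "Yolov8n": 21.251},
}


def fps_ceiling(inference_ms: float) -> float:
    """Upper bound on FPS if inference were the only per-frame cost."""
    return 1000.0 / inference_ms


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the second measurement is than the first."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    for board, models in INFERENCE_MS.items():
        for model, ms in models.items():
            print(f"{board:8s} {model:8s} {ms:7.3f} ms -> at most {fps_ceiling(ms):5.1f} FPS")
    ratio = speedup(INFERENCE_MS["Zero 3W"]["Yolov5s"], INFERENCE_MS["Rock 5B"]["Yolov5s"])
    print(f"Rock 5B vs Zero 3W on Yolov5s: {ratio:.2f}x")
```

For the Zero 3W with Yolov5s this gives a ceiling of about 16.4 FPS from inference alone.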
BTW, I posted how to fix the latency: you need to consume the frames buffered by FFmpeg. 2 seconds at 30 fps is something like 60 frames, so read 65 frames and discard them before you start processing.
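A minimal sketch of that drain step, assuming a generic `read_frame` callable that returns the next frame or `None` when no frame is available. Both function names are hypothetical, the margin of 5 extra frames simply mirrors the "60 frames, read 65" figure, and this is not an FFmpeg API:

```python
from typing import Any, Callable


def frames_to_discard(buffered_seconds: float, fps: float, margin: int = 5) -> int:
    """Estimate how many stale frames sit in the buffer, plus a small margin."""
    return int(round(buffered_seconds * fps)) + margin


def drain(read_frame: Callable[[], Any], count: int) -> int:
    """Read and throw away up to `count` frames; return how many were discarded."""
    discarded = 0
    for _ in range(count):
        if read_frame() is None:  # source ran dry before `count` frames
            break
        discarded += 1
    return discarded


if __name__ == "__main__":
    fake_camera = iter(range(100))  # stand-in for a real capture source
    n = frames_to_discard(2, 30)  # 2 s at 30 fps -> 60 frames + 5 margin
    print(n, drain(lambda: next(fake_camera, None), n))  # prints "65 65"
    print(next(fake_camera))  # prints "65": first frame that gets processed
```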
I have 16ms-20ms on my custom trained yolov5n model. I don’t think it’s being miscalculated
Inference time = 20 ms
model is NHWC input fmt
So certainly look into switching to a faster model to improve framerate.
I’ll be looking into how to fix the latency as you say
By the way, I had to do some weird stuff to the SDL3 code to get it to work, and it still has a couple of issues with colors. But I’ll fix that later.
Looks pretty good, mind sharing your model?