IMX219 + NPU real-time object detection on Zero 3W (experimental)

avaf · April 24, 2024, 8:17pm

Looks pretty good, mind sharing your model?

Avinadad_Mendez · April 24, 2024, 8:55pm

Looks good. My IMX219 setup maxes out at 30 fps (1920x1080). I know Rockchip had a patch for 48 fps, but I could not find it (they deleted it, I guess). Do you have that patch?

I don’t have it, but I think such a patch would be to the IMX219 driver. You would just need to find the correct register map for 60 fps and plug it into the imx219.c driver (drivers/media/i2c/imx219.c). It should be possible to find that register map somewhere, probably on Alibaba, as crazy as it sounds. I sourced my IMX586 from Alibaba, and my seller gave me a lot of classified documentation and multiple register maps for different settings.

avaf · April 25, 2024, 4:17pm

Zero 3W, Real-time, custom model yolov5n.rknn - 80 objects (1920x1080) result:

Tasks: 157 total,   1 running, 156 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.5 us,  6.8 sy,  0.0 ni, 81.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :   1977.7 total,   1343.1 free,    222.4 used,    412.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1568.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
   4731 rock       1 -19  510184 109092  84504 D  50.2   5.4   0:10.19 rknn-v4+ 
    291 root      20   0 1081888   8788   4748 S  13.2   0.4   2:17.71 rkaiq_3+ 
    801 rock      20   0  425580  57808  35592 S   9.2   2.9   0:55.45 weston   
   3478 root       0 -20       0      0      0 I   1.3   0.0   0:02.38 kworker+ 
    193 root     -51   0       0      0      0 S   1.0   0.0   0:03.27 irq/30-+ 
   4305 root       0 -20       0      0      0 I   1.0   0.0   0:01.27 kworker+ 
   4765 rock      20   0    7392   3200   2612 R   0.7   0.2   0:00.07 top      
     11 root      20   0       0      0      0 I   0.3   0.0   0:04.39 rcu_sch+ 
   4491 root      20   0       0      0      0 I   0.3   0.0   0:00.14 kworker+ 
   4655 root      20   0       0      0      0 D   0.3   0.0   0:00.15 kworker+ 
      1 root      20   0  166004  10168   7432 S   0.0   0.5   0:04.09 systemd  
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.04 kthreadd 
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp   
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par+ 
      8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_perc+ 
      9 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tas+ 
     10 root      20   0       0      0      0 S   0.0   0.0   0:00.37 ksoftir+

3djelly · April 26, 2024, 8:28pm

On your screenshot you have 18.3 FPS frame rate and 34ms inference time.

Do you have additional post-processing going on outside of the 34ms inference time that is causing the lower FPS? 1000/18.3 ≈ 54.6ms per frame, which suggests to me that your code spends roughly another 20ms outside inference.
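The arithmetic behind that estimate can be checked with a short Python sketch. The function name `frame_budget` is made up for illustration; the two inputs are the figures quoted from the screenshot.

```python
def frame_budget(fps, inference_ms):
    """Return (total ms per frame, ms spent outside inference)."""
    frame_ms = 1000.0 / fps
    return frame_ms, frame_ms - inference_ms

# Figures quoted above: 18.3 FPS and 34 ms inference time.
frame_ms, other_ms = frame_budget(fps=18.3, inference_ms=34.0)
print(f"{frame_ms:.1f} ms per frame, {other_ms:.1f} ms outside inference")
```

Whatever is left after subtracting inference covers capture, drawing and display combined, so it is an upper bound on the post-processing cost and not a measurement of it.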

avaf · April 26, 2024, 8:38pm

Drawing TTF text and rects; I think that is expensive. This happens after inference.
Not to mention the GPU is slow, or runs slow. I don’t have any other rk3566 to compare to, only an rk3568, which is much, much faster.

Do you have Zero 3W around to check if my findings below are correct?

cat /sys/class/devfreq/fde40000.npu/cur_freq 
900000000
cat /sys/class/devfreq/fde60000.gpu/cur_freq 
400000000
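For anyone who wants to compare boards, here is a minimal Python sketch that reads the same two nodes and prints them in MHz. The sysfs paths are copied from the commands above and are specific to this board and kernel; the function name and the `sysfs_root` parameter are hypothetical, added only so the code can be pointed at a different directory.

```python
from pathlib import Path

# Paths taken from the commands above; other boards or kernels may differ.
DEVFREQ_NODES = {
    "npu": "class/devfreq/fde40000.npu/cur_freq",
    "gpu": "class/devfreq/fde60000.gpu/cur_freq",
}

def read_freqs_mhz(sysfs_root="/sys"):
    """Return {name: MHz} per node, or None where the file cannot be read."""
    result = {}
    for name, rel in DEVFREQ_NODES.items():
        path = Path(sysfs_root) / rel
        try:
            # devfreq reports cur_freq in Hz.
            result[name] = int(path.read_text().strip()) / 1_000_000
        except (OSError, ValueError):
            result[name] = None
    return result

if __name__ == "__main__":
    for name, mhz in read_freqs_mhz().items():
        print(name, "unavailable" if mhz is None else f"{mhz:.0f} MHz")
```

With the values shown above this would report the NPU at 900 MHz and the GPU at 400 MHz.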

avaf · April 26, 2024, 9:48pm

cat ./devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
408000 600000 816000 1104000 1416000

cat ./devices/system/cpu/cpufreq/policy0/scaling_max_freq
1416000

avaf · April 26, 2024, 10:14pm

I have my DT like this:

cpu0_opp_table: cpu0-opp-table {
  compatible = "operating-points-v2";
  opp-shared;

  mbist-vmin = <825000 900000 950000>;
  nvmem-cells = <&cpu_leakage>, <&core_pvtm>, <&mbist_vmin>, <&cpu_opp_info>;
  nvmem-cell-names = "leakage", "pvtm", "mbist-vmin", "opp-info";
  rockchip,max-volt = <1200000>;
  rockchip,pvtm-voltage-sel = <
   0 84000 0
   84001 87000 1
   87001 91000 2
   91001 100000 3
  >;
  rockchip,pvtm-freq = <408000>;
  rockchip,pvtm-volt = <900000>;
  rockchip,pvtm-ch = <0 5>;
  rockchip,pvtm-sample-time = <1000>;
  rockchip,pvtm-number = <10>;
  rockchip,pvtm-error = <1000>;
  rockchip,pvtm-ref-temp = <40>;
  rockchip,pvtm-temp-prop = <26 26>;
  rockchip,thermal-zone = "soc-thermal";
  rockchip,temp-hysteresis = <5000>;
  rockchip,low-temp = <0>;
  rockchip,low-temp-adjust-volt = <
   0 1992 75000
  >;

  opp-408000000 {
   opp-hz = /bits/ 64 <408000000>;
   opp-microvolt = <850000 850000 1150000>;
   opp-microvolt-L3 = <900000 900000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-600000000 {
   opp-hz = /bits/ 64 <600000000>;
   opp-microvolt = <850000 850000 1150000>;
   opp-microvolt-L3 = <900000 900000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-816000000 {
   opp-hz = /bits/ 64 <816000000>;
   opp-microvolt = <850000 850000 1150000>;
   opp-microvolt-L3 = <900000 900000 1150000>;
   clock-latency-ns = <40000>;
   opp-suspend;
  };
  opp-1104000000 {
   opp-hz = /bits/ 64 <1104000000>;
   opp-microvolt = <900000 900000 1150000>;
   opp-microvolt-L0 = <900000 900000 1150000>;
   opp-microvolt-L1 = <850000 850000 1150000>;
   opp-microvolt-L2 = <850000 850000 1150000>;
   opp-microvolt-L3 = <900000 900000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-1416000000 {
   opp-hz = /bits/ 64 <1416000000>;
   opp-microvolt = <1025000 1025000 1150000>;
   opp-microvolt-L0 = <1025000 1025000 1150000>;
   opp-microvolt-L1 = <975000 975000 1150000>;
   opp-microvolt-L2 = <950000 950000 1150000>;
   opp-microvolt-L3 = <1000000 1000000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-1608000000 {
   opp-hz = /bits/ 64 <1608000000>;
   opp-microvolt = <1100000 1100000 1150000>;
   opp-microvolt-L0 = <1100000 1100000 1150000>;
   opp-microvolt-L1 = <1050000 1050000 1150000>;
   opp-microvolt-L2 = <1025000 1025000 1150000>;
   opp-microvolt-L3 = <1000000 1000000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-1800000000 {
   opp-hz = /bits/ 64 <1800000000>;
   opp-microvolt = <1150000 1150000 1150000>;
   opp-microvolt-L0 = <1150000 1150000 1150000>;
   opp-microvolt-L1 = <1100000 1100000 1150000>;
   opp-microvolt-L2 = <1075000 1075000 1150000>;
   opp-microvolt-L3 = <1050000 1050000 1150000>;
   clock-latency-ns = <40000>;
  };
  opp-1992000000 {
   opp-hz = /bits/ 64 <1992000000>;
   opp-microvolt = <1150000 1150000 1150000>;
   opp-microvolt-L0 = <1150000 1150000 1150000>;
   opp-microvolt-L1 = <1150000 1150000 1150000>;
   opp-microvolt-L2 = <1125000 1125000 1150000>;
   opp-microvolt-L3 = <1100000 1100000 1150000>;
   clock-latency-ns = <40000>;
  };
 };

How do I set 1.8 GHz, any idea?

stuartiannaylor · April 26, 2024, 11:48pm

Yeah, I forget exactly, but there is Rockchip/Radxa kernel code that excludes the 1800000000 opp for some reason, which can be confusing. I forget where I got that info and whether it was Rockchip or Radxa, but it had an if statement.

avaf · April 26, 2024, 11:56pm

But my board is running at 1.4 GHz i think.
Someone here on the forum claimed Joshua’s Ubuntu Image runs at 1.8 GHz on 3W but i don’t have any SD card available to try out. And Ubuntu Desktop is bloated…

stuartiannaylor · April 27, 2024, 12:05am

Dunno avaf, my memory is terrible, but I did have it running at 1.8.

kernel/drivers/soc/rockchip/rockchip_opp_select.c

avaf · April 27, 2024, 12:13am

Thanks for the info, i will try to find how to do that.

stuartiannaylor · April 27, 2024, 12:21am

I think you just change the opp node name in the table, so that opp-1608000000 becomes opp-1600000000, and likewise rename the 1.8 GHz one to something else.
Apologies for forgetting, but it was something like that, as I didn’t run a custom kernel.

avaf · April 27, 2024, 12:31am

Do you still have your 3W around with Joshua’s Image? If you do, is it possible to post the running dtb (zipped) here? Thank you!

I found out about the rk3566t limitation in the code; maybe I need to recompile it.

stuartiannaylor · April 27, 2024, 1:07am

Prob not as they now seem to lack SD cards.
You can hack the code, but in the Discord conversation lower down I made a comment on 1.6 GHz and posted the dts changes. I think the OS just loads the opps by ordinal and doesn’t care about the opp name, so it was just a dts-to-dtb change and no kernel compile was needed.
It’s just the code that looks for specific opp names to delete.
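As a rough illustration of that rename, the Python sketch below changes only the node name in a decompiled dts and leaves the `opp-hz` value alone. It assumes the kernel check really does match on node names, which is recalled from memory here and not confirmed. The helper `rename_opp_node` is hypothetical, and the new label 1600000000 is the one suggested earlier in the thread.

```python
import re

def rename_opp_node(dts_text, old_hz, new_label_hz):
    """Rename node 'opp-<old_hz>' to 'opp-<new_label_hz>'; opp-hz is untouched."""
    pattern = re.compile(r"^([ \t]*)opp-%d(\s*\{)" % old_hz, re.MULTILINE)
    return pattern.sub(r"\g<1>opp-%d\g<2>" % new_label_hz, dts_text)

sample = """
  opp-1608000000 {
   opp-hz = /bits/ 64 <1608000000>;
   clock-latency-ns = <40000>;
  };
"""

patched = rename_opp_node(sample, 1608000000, 1600000000)
print(patched)
```

The patched text would still need to be compiled back to a dtb and installed. Running above the vendor's intended limit is at your own risk.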

Avinadad_Mendez · April 27, 2024, 1:41am

I don’t think CPU usage is the issue. I have around 7-10% CPU usage on my end and get 28-32 fps with my 16 ms inference model. Tomorrow I’ll try other models.

I think this would greatly benefit from multithreading, even if just to run the video display async from the RKNN thread, at least the video would always look smooth even with low inference framerate
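A minimal Python sketch of that idea follows: the display loop always shows the newest frame and overlays whatever detections are most recent, while a worker thread runs inference at its own pace. Everything here is a stand-in. `fake_inference` only sleeps to imitate a slow model, there is no real RKNN or camera call, and all names are hypothetical.

```python
import threading
import time

class LatestSlot:
    """Holds only the most recent item; older ones are overwritten."""
    def __init__(self):
        self._lock = threading.Lock()
        self._item = None

    def put(self, item):
        with self._lock:
            self._item = item

    def get(self):
        with self._lock:
            return self._item

def fake_inference(frame):
    time.sleep(0.01)  # stand-in for the NPU call
    return f"boxes for frame {frame}"

def inference_worker(frames, results, stop, inferred):
    last = None
    while not stop.is_set():
        frame = frames.get()
        if frame is None or frame == last:
            time.sleep(0.001)  # nothing new yet
            continue
        results.put(fake_inference(frame))
        inferred.append(frame)
        last = frame

def run(num_frames=50):
    frames, results = LatestSlot(), LatestSlot()
    stop, inferred = threading.Event(), []
    worker = threading.Thread(
        target=inference_worker, args=(frames, results, stop, inferred))
    worker.start()
    displayed = 0
    for frame in range(1, num_frames + 1):
        frames.put(frame)         # newest camera frame
        _overlay = results.get()  # latest detections, possibly for an older frame
        displayed += 1            # a real loop would draw frame + overlay here
        time.sleep(0.002)         # imitate display pacing
    stop.set()
    worker.join()
    return displayed, len(inferred)

if __name__ == "__main__":
    shown, inferred = run()
    print(f"displayed {shown} frames, ran inference on {inferred}")
```

The trade-off is that boxes can lag the picture by a frame or more, but the video itself never waits on the model.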

EDIT: 50% CPU usage??? That may well be the issue on your end!

Are you using nyanmisaka zero-copy FFMPEG?

Avinadad_Mendez · April 27, 2024, 1:43am

By the way, there is always the possibility that my 16 ms inference time calculation was wrong somehow. I’ll definitely double-check that tomorrow. In any case, I am satisfied with my current framerate; multithreading the RKNN section would fix any possible issue down the road.

avaf · April 27, 2024, 1:30pm

@stuartiannaylor
Following your suggestion i managed to get 1.8 available, thanks.
Let’s see how it performs now.

cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies 
408000 600000 816000 1104000 1416000 1608000 1800000 

cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
1800000

@Avinadad_Mendez

I have 5 FFmpeg builds installed. I need to review the code and see what is wrong with that, but 10% CPU usage on X11 is pretty impressive. I could only achieve that with FFmpeg using DRM (no Wayland, GBM or X11).
Anyway, I think FFmpeg converts the buffer to a DRM_PRIME buffer, whereas I am handling and rendering an RGB24 buffer. On my first post I had ~17% CPU usage.

I need to double-check if i made some mistakes and see if i can run with 1.8 GHz without damaging the board.

The custom rknn model had a performance increase of 20%, your 16 ms may be correct. Thanks.

Avinadad_Mendez · April 27, 2024, 2:09pm

I got 10% with plain X but without any desktop environment. With Enlightenment I’d probably get around 30%. I’ll test more today.

avaf · April 27, 2024, 10:48pm

Board running 6 hrs, CPU freq 1.8 GHz, at least stable.

cat /sys/class/devfreq/fde60000.gpu/cur_freq
800000000
cat /sys/class/devfreq/fde40000.npu/cur_freq
900000000
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
1800000

Temp (on idle):

cat /sys/devices/virtual/thermal/thermal_zone0/temp
55555
cat /sys/devices/virtual/thermal/thermal_zone1/temp
54375
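The thermal_zone `temp` files report millidegrees Celsius, so the readings above are about 55.6 and 54.4 degrees. A small Python sketch of the conversion, with a made-up helper name:

```python
def millicelsius_to_celsius(raw):
    """Convert a thermal_zone 'temp' reading (millidegrees Celsius) to degrees."""
    return int(raw.strip()) / 1000.0

# Raw strings as returned by the two cat commands above.
readings = {"thermal_zone0": "55555\n", "thermal_zone1": "54375\n"}
for zone, raw in readings.items():
    print(f"{zone}: {millicelsius_to_celsius(raw):.1f} C")
```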

@Avinadad_Mendez
Sorry for the wrong info: rknn-v4l2 is not using FFmpeg, which may be the reason for the high CPU usage.

Avinadad_Mendez · April 27, 2024, 11:41pm

No idea honestly. I saw the rknn-v4l2 code and it was very similar to the ff-rknn code. I just did what you told me to in that other thread

You should make sure you’re using Nyanmisaka FFMPEG and Jellyfin-RGA / Jellyfin-MPP just in case