ROCK 5B Debug Party Invitation

gnattu · October 27, 2022, 2:31am

The rendering stack is hard to explain, but I will try to keep it simple.

The libmali shipped with Radxa Debian only implements OpenGL ES, not the desktop OpenGL that most desktop environments require, so the X11 desktop is rendered with llvmpipe. Wayland supports an EGL backend, so you can try weston and you will notice a significantly smoother desktop.

Hardware-accelerated video playback is provided by a different driver (not part of the GPU). On a regular PC the video codec accelerators come with the GPU, but they are actually two separate subsystems. It works on Radxa Debian with the pre-installed Chromium, which has patches applied, and with the pre-installed mpv.
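One way to check which renderer the X11 session ended up with is to look at the "OpenGL renderer" line that glxinfo prints. The helper below is only a sketch: the function name is made up for illustration, and it assumes glxinfo is installed (on Debian it comes with mesa-utils).

```shell
# Hypothetical helper: reads glxinfo output on stdin and reports whether the
# "OpenGL renderer" line names a Mesa software rasterizer (llvmpipe/softpipe).
is_software_renderer() {
    if grep -i 'OpenGL renderer' | grep -Eqi 'llvmpipe|softpipe'; then
        echo "software rendering"
    else
        echo "hardware rendering (or no renderer line found)"
    fi
}

# Usage on the board, from inside the X11 session:
#   glxinfo | is_software_renderer
```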

NGBRO · October 27, 2022, 8:58am

Oh right… I’ll give it a try when I have time.

However, are there any flags I need to set in Chromium in order to get it working? It doesn’t seem to work “out of the box” for me, as I see high CPU usage when I play YT videos.

gnattu · October 27, 2022, 9:18am

It should work “out of the box” for h264, vp8 and vp9 videos, as that version includes the v4l2 plugin. You can check whether that YT video is av1, which Google prefers nowadays but for which the hardware support is still lacking.

stuartiannaylor · October 27, 2022, 10:58am

Just wondered if you guys know the state of play with the Radxa image and the NPU, as I haven’t tried it due to this dmesg output:

[    6.438392] RKNPU fdab0000.npu: Adding to iommu group 0
[    6.438641] RKNPU fdab0000.npu: RKNPU: rknpu iommu is enabled, using iommu mode
[    6.438856] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[    6.439767] RKNPU fdab0000.npu: Looking up mem-supply from device tree
[    6.440383] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdab0000-0xfdabffff]
[    6.440429] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdac0000-0xfdacffff]
[    6.440456] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdad0000-0xfdadffff]
[    6.442727] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[    6.443276] RKNPU fdab0000.npu: Looking up mem-supply from device tree
[    6.444370] RKNPU fdab0000.npu: leakage=16
[    6.444412] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[    6.451294] RKNPU fdab0000.npu: pvtm=911
[    6.455335] RKNPU fdab0000.npu: pvtm-volt-sel=5
[    6.456469] RKNPU fdab0000.npu: avs=0
[    6.456672] RKNPU fdab0000.npu: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=800000000 h_table=0
[    6.466163] RKNPU fdab0000.npu: failed to find power_model node
[    6.466174] RKNPU fdab0000.npu: RKNPU: failed to initialize power model
[    6.466183] RKNPU fdab0000.npu: RKNPU: failed to get dynamic-coefficient

I also forgot what the Mali blob is called, but do any of you think this might work? I would love to get a bench of something with the G610 doing some ML.

https://developer.arm.com/documentation/102603/2108/Device-specific-installation/Install-on-Odroid-N2-Plus

Is it just libmali-valhall-g610-g6p0-x11 (Mali GPU User-Space Binary Drivers) that I need to install on Ubuntu? My preorder will likely arrive soon and I am thinking of some ML projects to play with.
I really wish RKNN-Toolkit had a TensorFlow delegate, but I will probably give that a whirl as well. I would love to be able to partition a model across CPU/GPU/NPU purely as a bench.

I have been checking out OpenAI’s Whisper, which is a monster of a model, but on CPU it’s 5x a Pi4b.

amazingfate · October 28, 2022, 8:03am

@jack please take a look at this issue. My board also dies when running at a DRAM frequency of 2112 MHz. This is the output of the command dmesg |grep volt-sel:

[    4.613369] cpu cpu0: pvtm-volt-sel=4
[    4.624359] cpu cpu4: pvtm-volt-sel=4
[    4.636125] cpu cpu6: pvtm-volt-sel=4
[    4.896723] rockchip-dmc dmc: leakage-volt-sel=0
[    4.897287] mali fb000000.gpu: pvtm-volt-sel=3
[    5.104413] RKNPU fdab0000.npu: pvtm-volt-sel=4

My DMC has a very low volt-sel. For now I have to decrease my DRAM frequency to 1560 MHz for daily use.
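To compare boards more easily, the volt-sel lines can be reduced to one row per device. This is a small sketch that only relies on the line layout visible in the dmesg output above; the function name is hypothetical.

```shell
# Hypothetical helper: reads dmesg output on stdin and prints
# "<device> TAB <pvtm|leakage> TAB <volt-sel value>" for each volt-sel line.
volt_sel_summary() {
    awk '/-volt-sel=/ {
        line = $0
        # Drop a leading "[    4.613369] " timestamp if present.
        if (substr(line, 1, 1) == "[") {
            end = index(line, "] ")
            if (end > 0) line = substr(line, end + 2)
        }
        sep = index(line, ": ")
        if (sep == 0) next
        dev = substr(line, 1, sep - 1)            # e.g. "rockchip-dmc dmc"
        split(substr(line, sep + 2), kv, "-volt-sel=")
        printf "%s\t%s\t%s\n", dev, kv[1], kv[2]
    }'
}

# Usage on the board:
#   dmesg | volt_sel_summary
```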

icecream95 · October 28, 2022, 8:41am

I have leakage-volt-sel=1 for DMC.

Note that lower values mean that a higher voltage is used… maybe your SoC does not even meet the requirements for L0 and so would like even higher voltages. But then why do I also have problems, when my SoC is good enough to use the lower L1 voltages? So perhaps that isn’t the problem.

One thing I find odd is that the voltage for 2112 MHz actually comes from the 2750 MHz OPP, though the difference in voltage between 1560 MHz and “2750” MHz seems roughly equal to that between 1068 MHz and 1560 MHz.

For an idea of the performance gained by moving from 1560 MHz to 2112 MHz, it makes compilation of Mesa about 5% faster. Different tasks may see different impacts; 10% or even higher could be realistic.

amazingfate · October 28, 2022, 8:56am

The 2112 MHz clk should come from scmi. I did try to increase the voltage to 1.0V but the board still hangs. We both have v1.3 boards, so let’s wait and see whether v1.42 boards have this issue.

tkaiser · October 28, 2022, 9:43am

Are you aware a new BL31 BLOB is flying around?

amazingfate · October 28, 2022, 11:20am

I just built a new u-boot with bl31 v1.28 and ddr v1.08, and now my system doesn’t hang at a DDR frequency of 2112 MHz. Thank you for pointing this out! I will create a pull request to armbian to update these binaries. @icecream95 you can try building u-boot with the new binaries: https://github.com/radxa/rkbin/tree/master/bin/rk35.

tkaiser · October 28, 2022, 12:24pm

Would be interesting to get sbc-bench outputs from before and after with otherwise identical settings (you were a bit into ‘overclocking’ the A76?) to check how numbers differ (especially ramlat / tinymembench).

solaris3308 · October 30, 2022, 10:54pm

I have seen a lot about pvtm, but I still can’t understand the relationship between pvtm, leakage and “/sys/kernel/debug/pvtm/*/value”, and how they work together. The TRM contains no description of the mechanism. Can you give me some info by running the following commands:
dmesg | grep cpu.cpu
dmesg | grep dmc

tkaiser · October 31, 2022, 6:17am

TRM part 2, chapters 17 and 18.

This change Radxa made recently is just cosmetics BTW. Clueless users will be happy that cpufreq scaling now lists the 2400 MHz OPP while an MCU inside the SoC still rejects higher clockspeeds. All that changes is the difference between reported and real clockspeeds.

Before:

Cpufreq OPP: 2256    Measured: 2250 (2250.953/2250.855/2250.806)

After:

Cpufreq OPP: 2400    Measured: 2250 (2250.512/2250.316/2249.826)     (-6.2%)

To get higher clockspeeds higher supply voltage is needed…
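The percentage in the “After” line is simply the gap between the measured clock and the advertised OPP, relative to the OPP. A minimal sketch of that arithmetic (the function name is made up for illustration, and both arguments are in MHz):

```shell
# Hypothetical helper: prints (measured - opp) / opp in percent, one decimal.
opp_deviation() {
    awk -v opp="$1" -v measured="$2" \
        'BEGIN { printf "%.1f%%\n", (measured - opp) / opp * 100 }'
}

# The "After" case above: 2400 MHz advertised, 2250 MHz measured.
opp_deviation 2400 2250
```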

willy · October 31, 2022, 1:38pm

I continue to think that we’re missing something here. My impression too is that (at least for stability) the voltage has to be increased. I don’t think the MCU is very smart, at least because if it’s too smart it becomes bogus and can cause serious trouble resulting in unfixable chips. It’s possible that the MCU has access to the PVTM values itself and has the equivalent of a copy of the OPP tables, but I strongly doubt it as it would be a pain to maintain. Maybe it just applies a well-defined operation between the configured (not measured) voltage, the requested frequency and the PVTM values, and enforces a limit to the configured frequency. In this case maybe increasing the voltage a bit would solve it by skewing the operation. That would explain why from the beginning we’ve measured different frequencies than configured for the topmost opps. But it would not be difficult to add 2 lines about this in the TRM indicating that configured frequencies might be trimmed by the internal MCU based on PVTM and voltage…

stuartiannaylor · October 31, 2022, 1:46pm

./mhz 10 10000
count=169252 us50=3605 us250=18032 diff=14427 cpu_MHz=2346.323
count=169252 us50=3606 us250=18034 diff=14428 cpu_MHz=2346.160
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3607 us250=18037 diff=14430 cpu_MHz=2345.835
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18037 diff=14431 cpu_MHz=2345.673
count=169252 us50=3606 us250=18036 diff=14430 cpu_MHz=2345.835
count=169252 us50=3606 us250=18039 diff=14433 cpu_MHz=2345.347

Which isn’t that far away, but the top opp is now in operation. Before, weren’t you changing the top opp value, though until Radxa made changes that opp was never being used?
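For anyone reading these mhz lines: judging purely from the printed numbers (not from the tool’s source, so treat this as an inference), cpu_MHz matches count × 200 / diff, where diff = us250 − us50 in microseconds. A small sketch that recomputes it from one output line, with a hypothetical function name:

```shell
# Hypothetical helper: recomputes cpu_MHz from the count/us50/us250 fields of
# mhz output lines read on stdin, assuming cpu_MHz = count * 200 / (us250 - us50).
mhz_from_line() {
    awk '/count=/ {
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            f[kv[1]] = kv[2]
        }
        printf "%.3f\n", f["count"] * 200 / (f["us250"] - f["us50"])
    }'
}

echo "count=169252 us50=3605 us250=18032 diff=14427 cpu_MHz=2346.323" | mhz_from_line   # 2346.323
```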

solaris3308 · October 31, 2022, 2:15pm

Thanks tkaiser!
But I have known that for a while. I suspect it’s maybe throttled by the cpufreq driver, which writes something to some register.
“TRM part 2, chapters 17 and 18” only describes the hardware logic, but I need the software mechanism.
There is no manual to read! But I want to collect some data to work it out.
So, if anyone has a board, please show me the result of “dmesg | grep cpu.cpu”.
By the way, “cosmetics BTW”

stuartiannaylor · October 31, 2022, 2:16pm

dmesg | grep cpu.cpu
[    6.107499] cpu cpu0: leakage=20
[    6.107522] cpu cpu0: Looking up cpu-supply from device tree
[    6.108989] cpu cpu0: pvtm=1538
[    6.109078] cpu cpu0: pvtm-volt-sel=6
[    6.109105] cpu cpu0: Looking up cpu-supply from device tree
[    6.109224] cpu cpu0: Looking up mem-supply from device tree
[    6.109559] cpu cpu4: leakage=16
[    6.109575] cpu cpu4: Looking up cpu-supply from device tree
[    6.116052] cpu cpu4: pvtm=1770
[    6.119982] cpu cpu4: pvtm-volt-sel=6
[    6.120007] cpu cpu4: Looking up cpu-supply from device tree
[    6.120500] cpu cpu4: Looking up mem-supply from device tree
[    6.121221] cpu cpu6: leakage=16
[    6.121236] cpu cpu6: Looking up cpu-supply from device tree
[    6.127726] cpu cpu6: pvtm=1760
[    6.131676] cpu cpu6: pvtm-volt-sel=6
[    6.131700] cpu cpu6: Looking up cpu-supply from device tree
[    6.132191] cpu cpu6: Looking up mem-supply from device tree
[    6.133280] cpu cpu0: avs=0
[    6.133961] cpu cpu4: avs=0
[    6.134633] cpu cpu6: avs=0
[    6.134782] cpu cpu0: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    6.134859] cpu cpu0: EM: created perf domain
[    6.134908] cpu cpu0: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=1608000000 h_table=0
[    6.135323] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[    6.135328] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    6.135333] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 5 >= em_perf_state4
[    6.135442] cpu cpu4: EM: created perf domain
[    6.135481] cpu cpu4: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0
[    6.143627] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[    6.143633] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    6.143639] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 5 >= em_perf_state4
[    6.143769] cpu cpu6: EM: created perf domain
[    6.143860] cpu cpu6: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0

tkaiser · October 31, 2022, 3:05pm

This actually happens, so far limited to RK3588S thingies. Here is another Khadas Edge2 where PVTM went wrong and the A76 were limited to below 400 MHz: http://ix.io/4esV

It’s a weak SoC of course:

       cpu cpu0: pvtm=1419
       cpu cpu0: pvtm-volt-sel=1
       cpu cpu4: pvtm=1656
       cpu cpu4: pvtm-volt-sel=3
       cpu cpu6: pvtm=1638
       cpu cpu6: pvtm-volt-sel=2

Benchmark execution started at 40.7°C SoC temperature, it then looked like this:

cpu4-cpu5 (Cortex-A76): Cpufreq OPP: 2352    Measured: 2209 (-6.1%)
cpu6-cpu7 (Cortex-A76): Cpufreq OPP: 2304    Measured: 2186 (-5.1%)

After benchmark execution the SoC temperature is at 46.2°C (so Khadas active fansink must be in use and the idle temperature is already insanely high) and mhz reports this:

cpu4-cpu5 (Cortex-A76): Cpufreq OPP: 2352    Measured:  394 (-83.2%)
cpu6-cpu7 (Cortex-A76): Cpufreq OPP: 2304    Measured: 2177 (-5.5%)

So the 2nd A76 cluster has just recovered while the 1st is still at the lowest clockspeed possible. But the 7-zip scores prove that the A76 were below 400 MHz almost all the time: 5468,6170,5766 (RK3588 with performance DMC governor and the A76 at 2.4 GHz scores above 16500, with dmc_ondemand governor w/o appropriate up_treshold it’s below 14500).

@solaris3308 dmesg | grep cpu.cpu | ix --> http://ix.io/4eBN

solaris3308 · October 31, 2022, 3:10pm

I have found a very simple way to modify the working frequency!
Look at this:
firefly@firefly2:~$ ./mhz 20 50000
count=516515 us50=11032 us250=55182 diff=44150 cpu_MHz=2339.819
count=516515 us50=11038 us250=55186 diff=44148 cpu_MHz=2339.925
count=516515 us50=11034 us250=55184 diff=44150 cpu_MHz=2339.819
count=516515 us50=11035 us250=55184 diff=44149 cpu_MHz=2339.872
count=516515 us50=11045 us250=55192 diff=44147 cpu_MHz=2339.978
count=516515 us50=11036 us250=55190 diff=44154 cpu_MHz=2339.607
count=516515 us50=11035 us250=55191 diff=44156 cpu_MHz=2339.501
count=516515 us50=11038 us250=55187 diff=44149 cpu_MHz=2339.872
^C
firefly@firefly2:~$ ./mhz 20 50000
count=516515 us50=11449 us250=57273 diff=45824 cpu_MHz=2254.343
count=516515 us50=11452 us250=57280 diff=45828 cpu_MHz=2254.146
count=516515 us50=11453 us250=57282 diff=45829 cpu_MHz=2254.097
count=516515 us50=11454 us250=57286 diff=45832 cpu_MHz=2253.949
count=516515 us50=11456 us250=57285 diff=45829 cpu_MHz=2254.097
count=516515 us50=11457 us250=57290 diff=45833 cpu_MHz=2253.900
count=516515 us50=11458 us250=57279 diff=45821 cpu_MHz=2254.490
count=516515 us50=11455 us250=57285 diff=45830 cpu_MHz=2254.048

I only modified big0’s opp, so only big0 reaches 2339 MHz, where it only reached 2254 MHz before.
But I still need much more time to test.
By the way, a larger pvtm is NOT necessarily better. Look at this:
[ 6.273919] cpu cpu4: Looking up cpu-supply from device tree
[ 6.275608] cpu cpu4: bin=0
[ 6.275782] cpu cpu4: leakage=9
[ 6.275803] cpu cpu4: Looking up cpu-supply from device tree
[ 6.282314] cpu cpu4: pvtm=1636
[ 6.286295] cpu cpu4: pvtm-volt-sel=2
[ 6.286340] cpu cpu4: Looking up cpu-supply from device tree
[ 6.286838] cpu cpu4: Looking up mem-supply from device tree
[ 6.287385] cpu cpu6: Looking up cpu-supply from device tree
[ 6.289046] cpu cpu6: bin=0
[ 6.289222] cpu cpu6: leakage=9
[ 6.289242] cpu cpu6: Looking up cpu-supply from device tree
[ 6.295718] cpu cpu6: pvtm=1696
[ 6.299666] cpu cpu6: pvtm-volt-sel=4
[ 6.299698] cpu cpu6: Looking up cpu-supply from device tree
[ 6.300202] cpu cpu6: Looking up mem-supply from device tree

Stat_headcrabed · October 31, 2022, 4:39pm

check this: https://gitlab.com/rk3588_linux/linux/bsp/docs/-/blob/linux-5.10/Common/DVFS/Rockchip_Developer_Guide_CPUFreq_EN.pdf

willy · October 31, 2022, 4:54pm

I’ve just done something ugly: I’ve generated 169 opps that cover frequencies from 2208 to 2784 MHz in 48 MHz steps, and voltages from 900 mV to 1050 mV in 12.5 mV steps. The opps are called <MHz> * 1000000 + <µV> / 10, so for example 2208 MHz at 912.5 mV is called “2208091250” and is reported as “2208091” in the frequency table (which truncates to kHz). This way all values are different and I can manually select the combination of voltage and frequency by switching to the userspace governor. I restricted all of them to turbo mode so that they’re not selected by default, which lets me first switch to userspace before setting 1 into boost_mode.

The script I used to produce them is the following:

for ((o=2208; o<=2800; o+=48)); do
  for ((v=900000; v<=1050000; v+=12500)); do
    # OPP name and opp-hz encode the voltage: <MHz> * 1000000 + <uV> / 10
    hz=$((o*1000000+v/10))
    printf "opp-$hz {\n"
    printf "\topp-supported-hw = <0xff 0xffff>;\n"
    printf "\topp-hz = <0 $hz>;\n"
    printf "\topp-microvolt = <$v $v $v $v $v $v>;\n"
    printf "\tclock-latency-ns = <40000>;\n"
    printf "\tturbo-mode;\n"
    printf "};\n"
  done
done
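The naming arithmetic from the script can be checked on its own. The two helper names below are hypothetical; they only mirror the `o*1000000+v/10` expression and the truncation to kHz that cpufreq applies. The second result is the value written to scaling_setspeed in the 912 mV run.

```shell
# Hypothetical helpers mirroring the generator script.
# $1 = frequency in MHz, $2 = voltage in microvolts.
opp_hz() {
    echo $(( $1 * 1000000 + $2 / 10 ))
}

# cpufreq lists frequencies in kHz, so the last three digits are dropped.
opp_khz() {
    echo $(( $(opp_hz "$1" "$2") / 1000 ))
}

opp_hz  2208 912500   # 2208091250
opp_khz 2208 912500   # 2208091
```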

What’s particularly interesting is that I now have proof that the configured voltage affects the frequency. In addition, some voltages are rejected (it’s possible that I did something wrong: I noticed in the DTS that there are 6 values in the microvolt entry and have set them all to the same value, but some seem to still have an upper bound of 1V (3 and 6)). Hmm, often the first change produces an error and doing it again makes it accepted.

### 900 mV
# echo 2208090 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed
# taskset -c 4 ~rock/mhz/mhz 100
count=516515 us50=12364 us250=61832 diff=49468 cpu_MHz=2088.279
count=516515 us50=12366 us250=61830 diff=49464 cpu_MHz=2088.448
count=516515 us50=12366 us250=61833 diff=49467 cpu_MHz=2088.322

### 912 mV
# echo 2208091 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=12216 us250=61089 diff=48873 cpu_MHz=2113.703
count=516515 us50=12218 us250=61092 diff=48874 cpu_MHz=2113.660
count=516515 us50=12217 us250=61092 diff=48875 cpu_MHz=2113.616

### 925mV
# echo 2208092 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
root@rock-5b:~# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=12072 us250=60367 diff=48295 cpu_MHz=2139.000
count=516515 us50=12072 us250=60368 diff=48296 cpu_MHz=2138.956
count=516515 us50=12073 us250=60367 diff=48294 cpu_MHz=2139.044

### 937 mV
# echo 2208093 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11936 us250=59683 diff=47747 cpu_MHz=2163.550
count=516515 us50=11937 us250=59691 diff=47754 cpu_MHz=2163.232
count=516515 us50=11938 us250=59694 diff=47756 cpu_MHz=2163.142

### 950mV
# echo 2208095 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11805 us250=59036 diff=47231 cpu_MHz=2187.186
count=516515 us50=11806 us250=59036 diff=47230 cpu_MHz=2187.233
count=516515 us50=11808 us250=59039 diff=47231 cpu_MHz=2187.186

### 962mV
# echo 2208096 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11683 us250=58424 diff=46741 cpu_MHz=2210.115
count=516515 us50=11684 us250=58426 diff=46742 cpu_MHz=2210.068
count=516515 us50=11686 us250=58432 diff=46746 cpu_MHz=2209.879

### 975mV
# echo 2208097 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11568 us250=57846 diff=46278 cpu_MHz=2232.227
count=516515 us50=11569 us250=57849 diff=46280 cpu_MHz=2232.131
count=516515 us50=11569 us250=57857 diff=46288 cpu_MHz=2231.745

### 987mV
# echo 2208098 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11457 us250=57291 diff=45834 cpu_MHz=2253.851
count=516515 us50=11459 us250=57298 diff=45839 cpu_MHz=2253.605
count=516515 us50=11459 us250=57300 diff=45841 cpu_MHz=2253.507

### 1000mV
# echo 2208100 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11350 us250=56751 diff=45401 cpu_MHz=2275.346
count=516515 us50=11350 us250=56754 diff=45404 cpu_MHz=2275.196
count=516515 us50=11351 us250=56759 diff=45408 cpu_MHz=2274.996

Any voltages above 1000mV are rejected:

[ 1334.294298] vdd_cpu_big0_s0: Restricting voltage, 1012500-1000000uV
[ 1334.294429] vdd_cpu_big0_s0: Restricting voltage, 1012500-1000000uV
[ 1334.294447] cpu cpu4: rockchip_cpufreq_set_volt: failed to set voltage (1012500 1012500 1012500 uV): -22
[ 1334.294465] cpufreq: __target_index: Failed to change cpu frequency: -22

Trying different frequencies seems to give me the same frequency as the ones provided by 2208 MHz at the same voltage:

### 2352 at 1000mV:
# echo 2352100 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed 
# taskset -c 4 ~rock/mhz/mhz 100 
count=516515 us50=11348 us250=56752 diff=45404 cpu_MHz=2275.196
count=516515 us50=11351 us250=56761 diff=45410 cpu_MHz=2274.895
count=516515 us50=11351 us250=56761 diff=45410 cpu_MHz=2274.895

Also, trying to set higher frequencies than accepted is rejected as well:

# echo 2400100 > /sys/devices/system/cpu/cpufreq/policy4/scaling_setspeed
[ 1659.362498] cpu cpu4: cpu_opp_helper: failed to set clk rate: -22
[ 1659.369293] cpufreq: __target_index: Failed to change cpu frequency: -22

Maybe the whole thing is only controlled by the voltage and the PVTM values, which would explain why we’ve been seeing higher and lower frequencies than requested, and different values on different boards.

I’m possibly missing something but now it proves that the configured voltage does have an impact on the effective frequency.
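To put a number on it, the runs above can be turned into a rough MHz-per-mV figure. The sketch below uses the first cpu_MHz line of each 2208 MHz run and the nominal voltages from the generator script (900 to 1000 mV in 12.5 mV steps); the function name is made up for illustration.

```shell
# Nominal voltage (mV) and first measured cpu_MHz of each 2208 MHz run above.
runs='900.0 2088.279
912.5 2113.703
925.0 2139.000
937.5 2163.550
950.0 2187.186
962.5 2210.115
975.0 2232.227
987.5 2253.851
1000.0 2275.346'

# Hypothetical helper: prints the MHz gained per voltage step, then the
# average slope between the first and the last data point.
voltage_slope() {
    awk 'NR == 1 { v0 = $1; f0 = $2 }
         NR > 1  { printf "%.1f -> %.1f mV: %.1f MHz\n", pv, $1, $2 - pf }
                 { pv = $1; pf = $2 }
         END     { printf "average: %.2f MHz/mV\n", (pf - f0) / (pv - v0) }'
}

printf '%s\n' "$runs" | voltage_slope
```

The last line of the output is the average over the whole 900 to 1000 mV range, about 1.87 MHz per mV for this board; the per-step lines show whether the gain shrinks as the voltage rises.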