USB3’s operating frequency is close to 2.4 GHz, the band used by WiFi and Bluetooth, so without shielding they will obviously interfere with each other.
ROCK 5B Debug Party Invitation
Just tested my production unit on the Radxa Debian 20221024 build downloaded from GitHub, and noticed that the renderer is llvmpipe. Is there any hardware-accelerated graphics and video playback in any of the official images yet? Or am I missing something?
The rendering stack is hard to explain, but I will try to keep it simple.
The libmali we ship with Radxa Debian only implements OpenGL ES, not the desktop OpenGL that most desktop environments require, so the X11 desktop is rendered with llvmpipe. Wayland supports an EGL backend, so you can try Weston and you will notice a significantly smoother desktop.
Hardware-accelerated video playback is provided by a different driver (not part of the GPU). Although on a regular PC the video codec accelerators come with the GPU, they are actually two separate subsystems. It works on Radxa Debian with the pre-installed, patched Chromium and with the pre-installed mpv.
Oh right… I’ll give it a try when I have time.
However, are there any flags I need to set in Chromium in order to get it working? It doesn’t seem to work “out of the box” for me, as I see high CPU usage when I play YT videos.
It should work out of the box for H.264, VP8 and VP9 videos, since that version includes the V4L2 plugin. Check whether that YT video is AV1, which Google prefers nowadays but for which hardware support is still lacking.
Just wondered if you guys know the state of play with the Radxa image and the NPU, as I haven’t tried it yet because of this dmesg output:
[ 6.438392] RKNPU fdab0000.npu: Adding to iommu group 0
[ 6.438641] RKNPU fdab0000.npu: RKNPU: rknpu iommu is enabled, using iommu mode
[ 6.438856] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[ 6.439767] RKNPU fdab0000.npu: Looking up mem-supply from device tree
[ 6.440383] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdab0000-0xfdabffff]
[ 6.440429] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdac0000-0xfdacffff]
[ 6.440456] RKNPU fdab0000.npu: can't request region for resource [mem 0xfdad0000-0xfdadffff]
[ 6.442727] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[ 6.443276] RKNPU fdab0000.npu: Looking up mem-supply from device tree
[ 6.444370] RKNPU fdab0000.npu: leakage=16
[ 6.444412] RKNPU fdab0000.npu: Looking up rknpu-supply from device tree
[ 6.451294] RKNPU fdab0000.npu: pvtm=911
[ 6.455335] RKNPU fdab0000.npu: pvtm-volt-sel=5
[ 6.456469] RKNPU fdab0000.npu: avs=0
[ 6.456672] RKNPU fdab0000.npu: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=800000000 h_table=0
[ 6.466163] RKNPU fdab0000.npu: failed to find power_model node
[ 6.466174] RKNPU fdab0000.npu: RKNPU: failed to initialize power model
[ 6.466183] RKNPU fdab0000.npu: RKNPU: failed to get dynamic-coefficient
I also forgot what the Mali blob is called. Do any of you think this might work? I would also love to benchmark something on the G610 doing some ML.
Is it just libmali-valhall-g610-g6p0-x11 (Mali GPU User-Space Binary Drivers)?
I need to install it on Ubuntu, as my preorder will likely arrive soon and I am thinking of some ML projects to play with.
I really wish RKNN-Toolkit had a TensorFlow delegate. I will probably give it a whirl anyway, but I would love to be able to partition a model across CPU/GPU/NPU, purely as a benchmark.
I have been checking out OpenAI’s Whisper, which is a monster of a model, but on the CPU it’s 5x faster than a Pi 4B.
@jack please take a look at this issue. My board also hangs when running at a DRAM frequency of 2112 MHz. This is the output of the command dmesg | grep volt-sel:
[ 4.613369] cpu cpu0: pvtm-volt-sel=4
[ 4.624359] cpu cpu4: pvtm-volt-sel=4
[ 4.636125] cpu cpu6: pvtm-volt-sel=4
[ 4.896723] rockchip-dmc dmc: leakage-volt-sel=0
[ 4.897287] mali fb000000.gpu: pvtm-volt-sel=3
[ 5.104413] RKNPU fdab0000.npu: pvtm-volt-sel=4
My dmc has a very low volt sel. Now I have to decrease my DRAM freq to 1560 MHz for daily use.
I have leakage-volt-sel=1 for DMC.
Note that lower values mean that a higher voltage is used… maybe your SoC does not even meet the requirements for L0 and so would need even higher voltages. But then why do I also have problems, when my SoC is good enough to use the lower L1 voltages? So perhaps that isn’t the problem.
One thing I find odd is that the voltage for 2112 MHz actually comes from the 2750 MHz OPP, though the difference in voltage between 1560 MHz and “2750” MHz seems roughly equal to that between 1068 MHz and 1560 MHz.
For an idea of the performance gained by moving from 1560 MHz to 2112 MHz: it makes compiling Mesa about 5% faster. Other tasks may see a different impact; 10% or even more could be realistic.
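To make the “lower value means higher voltage” point concrete: as far as I understand the vendor kernel, the volt-sel value becomes an OPP property-name suffix “L<n>”, so each OPP’s voltage is read from opp-microvolt-L<n>, and the OPP core falls back to the plain opp-microvolt when that level is missing. A minimal Python sketch of that lookup; the table and voltages are invented, not the real RK3588 values:

```python
# Hypothetical OPP entry for one frequency; voltages are invented for
# illustration and are NOT the real RK3588 values.
OPP_TABLE = {
    2208: {  # MHz
        "opp-microvolt": 950_000,
        "opp-microvolt-L0": 950_000,
        "opp-microvolt-L1": 937_500,
        "opp-microvolt-L2": 925_000,
    },
}


def opp_voltage(opp: dict, volt_sel: int) -> int:
    """Return the voltage (uV) used for one OPP at a given volt-sel level.

    Prefer "opp-microvolt-L<n>" and fall back to the plain
    "opp-microvolt" when that level is not defined for this OPP.
    """
    return opp.get(f"opp-microvolt-L{volt_sel}", opp["opp-microvolt"])


for sel in (0, 1, 2, 5):
    print(f"2208 MHz at L{sel}: {opp_voltage(OPP_TABLE[2208], sel)} uV")
```

In this made-up table L0 gets the highest voltage and a level that is not listed (L5 here) silently uses the default, which is why a weak SoC with a low volt-sel ends up at the top of the voltage range.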
The 2112 MHz clock should come from SCMI. I did try increasing the voltage to 1.0 V, but the board still hangs. We both have v1.3 boards; let’s wait and see whether v1.42 boards have this issue.
I just built a new U-Boot with BL31 v1.28 and DDR v1.08, and now my system doesn’t hang at a DDR frequency of 2112 MHz. Thanks for pointing this out! I will create a pull request to Armbian to update these binaries. @icecream95 you can try building U-Boot with the new binaries: https://github.com/radxa/rkbin/tree/master/bin/rk35.
It would be interesting to get sbc-bench outputs from before and after, with otherwise identical settings (weren’t you a bit into ‘overclocking’ the A76?), to check how the numbers differ (especially ramlat / tinymembench).
I have seen a lot about PVTM, but I still can’t understand the relationship between pvtm, leakage and “/sys/kernel/debug/pvtm/*/value”, and how they work together. The TRM contains no description of the mechanism. Can you give me some info, such as the output of the following commands:
dmesg | grep cpu.cpu
dmesg | grep dmc
TRM part 2, chapters 17 and 18.
This change Radxa made recently is just cosmetics BTW. Clueless users will be happy that cpufreq scaling now lists the 2400 MHz OPP, while an MCU inside the SoC still rejects higher clockspeeds. All that changes is the difference between reported and real clockspeeds:
Before:
Cpufreq OPP: 2256 Measured: 2250 (2250.953/2250.855/2250.806)
After:
Cpufreq OPP: 2400 Measured: 2250 (2250.512/2250.316/2249.826) (-6.2%)
To get higher clockspeeds higher supply voltage is needed…
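The percentage next to “Measured” appears to be the relative difference between the measured clock and the cpufreq OPP. A small sketch assuming that formula; it reproduces the percentages quoted in this thread, but I have not checked it against the sbc-bench source:

```python
def deviation_percent(opp_mhz: float, measured_mhz: float) -> float:
    """Relative difference between measured and requested clockspeed, in %.

    Assumption: this is how the '(-6.2%)' style figures are derived.
    """
    return (measured_mhz - opp_mhz) / opp_mhz * 100.0


# (cpufreq OPP, measured) pairs quoted in this thread.
for opp, measured in [(2400, 2250), (2352, 2209), (2352, 394), (2304, 2177)]:
    print(f"Cpufreq OPP: {opp} Measured: {measured} "
          f"({deviation_percent(opp, measured):.1f}%)")
```

A requested 2400 MHz that the MCU caps at about 2250 MHz shows up as -6.2%, which is exactly the “cosmetic” gap between reported and real clockspeed.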
I continue to think that we’re missing something here. My impression too is that (at least for stability) the voltage has to be increased. I don’t think the MCU is very smart, if only because if it’s too smart it becomes buggy and can cause serious trouble, resulting in unfixable chips. It’s possible that the MCU has access to the PVTM values itself and holds the equivalent of a copy of the OPP tables, but I strongly doubt it, as that would be a pain to maintain. Maybe it just applies a well-defined operation on the configured (not measured) voltage, the requested frequency and the PVTM values, and enforces a limit on the configured frequency. In that case, increasing the voltage a bit might solve it by skewing the operation. That would explain why, from the beginning, we’ve measured different frequencies than configured for the topmost OPPs. But it would not be difficult to add two lines to the TRM indicating that configured frequencies might be trimmed by the internal MCU based on PVTM and voltage…
./mhz 10 10000
count=169252 us50=3605 us250=18032 diff=14427 cpu_MHz=2346.323
count=169252 us50=3606 us250=18034 diff=14428 cpu_MHz=2346.160
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3607 us250=18037 diff=14430 cpu_MHz=2345.835
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18035 diff=14429 cpu_MHz=2345.998
count=169252 us50=3606 us250=18037 diff=14431 cpu_MHz=2345.673
count=169252 us50=3606 us250=18036 diff=14430 cpu_MHz=2345.835
count=169252 us50=3606 us250=18039 diff=14433 cpu_MHz=2345.347
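The numbers in each mhz line are consistent with cpu_MHz = 200 × count / diff, i.e. diff being the extra time (in µs) the “250” run needs over the “50” run, covering 200 × count loop iterations. A quick check of that reading; the interpretation is my assumption from the printed values, not taken from the tool’s documentation:

```python
def cpu_mhz(count: int, diff_us: int) -> float:
    """Clock estimate matching the mhz output: 200 * count / diff.

    Assumption: 'diff' (us250 - us50, in microseconds) is the time taken by
    200 * count loop iterations of roughly one cycle each, so cycles per
    microsecond equals MHz.
    """
    return 200 * count / diff_us


# First line of the output above: count=169252 diff=14427 cpu_MHz=2346.323
print(f"{cpu_mhz(169252, 14427):.3f}")
```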
That isn’t far off. But is the top OPP now actually in operation, whereas before Radxa made their changes it was never used, even though you weren’t changing the top OPP value?
Thanks tkaiser!
But I have known that for a while. I suspect it may be throttled by the cpufreq driver, which writes something to some register.
“TRM part 2, chapters 17 and 18” only describes the hardware logic, but I need the software mechanism.
There is no manual to read! But I want to collect some data to work it out.
So, if anyone has a board, please show me the output of “dmesg | grep cpu.cpu”.
By the way, “cosmetics BTW”
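For collecting this kind of data from several boards, a small parser that groups the single key=value lines (leakage, pvtm, pvtm-volt-sel, leakage-volt-sel, avs, bin) by device makes the outputs easy to compare. A minimal sketch; the regex is written against the log format pasted in this thread and is an assumption about what other kernels print:

```python
import re
from collections import defaultdict

# Matches single "key=value" dmesg lines such as
# "[ 6.108989] cpu cpu0: pvtm=1538" or "[ 4.896723] rockchip-dmc dmc: leakage-volt-sel=0".
# Lines with several pairs (e.g. "l=10000 h=85000 ...") are deliberately skipped.
LINE_RE = re.compile(r"^\[\s*[\d.]+\]\s+(.+?): ([a-z-]+)=(-?\d+)$")


def parse_volt_info(dmesg_text: str) -> dict[str, dict[str, int]]:
    """Group leakage / pvtm / *-volt-sel / avs values by device name."""
    result: dict[str, dict[str, int]] = defaultdict(dict)
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            device, key, value = m.groups()
            result[device][key] = int(value)
    return dict(result)


SAMPLE = """\
[ 6.107499] cpu cpu0: leakage=20
[ 6.108989] cpu cpu0: pvtm=1538
[ 6.109078] cpu cpu0: pvtm-volt-sel=6
[ 6.109224] cpu cpu0: Looking up mem-supply from device tree
[ 6.134908] cpu cpu0: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=1608000000 h_table=0
[ 4.896723] rockchip-dmc dmc: leakage-volt-sel=0
"""

print(parse_volt_info(SAMPLE))
```

Saved dmesg output from each board can be passed to parse_volt_info as a string, giving one dict per device that can be lined up side by side.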
dmesg | grep cpu.cpu
[ 6.107499] cpu cpu0: leakage=20
[ 6.107522] cpu cpu0: Looking up cpu-supply from device tree
[ 6.108989] cpu cpu0: pvtm=1538
[ 6.109078] cpu cpu0: pvtm-volt-sel=6
[ 6.109105] cpu cpu0: Looking up cpu-supply from device tree
[ 6.109224] cpu cpu0: Looking up mem-supply from device tree
[ 6.109559] cpu cpu4: leakage=16
[ 6.109575] cpu cpu4: Looking up cpu-supply from device tree
[ 6.116052] cpu cpu4: pvtm=1770
[ 6.119982] cpu cpu4: pvtm-volt-sel=6
[ 6.120007] cpu cpu4: Looking up cpu-supply from device tree
[ 6.120500] cpu cpu4: Looking up mem-supply from device tree
[ 6.121221] cpu cpu6: leakage=16
[ 6.121236] cpu cpu6: Looking up cpu-supply from device tree
[ 6.127726] cpu cpu6: pvtm=1760
[ 6.131676] cpu cpu6: pvtm-volt-sel=6
[ 6.131700] cpu cpu6: Looking up cpu-supply from device tree
[ 6.132191] cpu cpu6: Looking up mem-supply from device tree
[ 6.133280] cpu cpu0: avs=0
[ 6.133961] cpu cpu4: avs=0
[ 6.134633] cpu cpu6: avs=0
[ 6.134782] cpu cpu0: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[ 6.134859] cpu cpu0: EM: created perf domain
[ 6.134908] cpu cpu0: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=1608000000 h_table=0
[ 6.135323] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[ 6.135328] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[ 6.135333] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 5 >= em_perf_state4
[ 6.135442] cpu cpu4: EM: created perf domain
[ 6.135481] cpu cpu4: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0
[ 6.143627] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[ 6.143633] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[ 6.143639] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 5 >= em_perf_state4
[ 6.143769] cpu cpu6: EM: created perf domain
[ 6.143860] cpu cpu6: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0
This actually happens, so far limited to RK3588S thingies. Here is another Khadas Edge2 where PVTM went wrong and the A76 cores were limited to below 400 MHz: http://ix.io/4esV
It’s a weak SoC of course:
cpu cpu0: pvtm=1419
cpu cpu0: pvtm-volt-sel=1
cpu cpu4: pvtm=1656
cpu cpu4: pvtm-volt-sel=3
cpu cpu6: pvtm=1638
cpu cpu6: pvtm-volt-sel=2
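As far as I can tell, the pvtm → volt-sel step is a range lookup: the vendor device trees list (min, max, sel) triplets (the rockchip,pvtm-voltage-sel property) and the driver picks the range the measured pvtm falls into. The logs in this thread fit that picture, with higher pvtm readings going with higher volt-sel levels (cpu4: 1770 → 6 on the good board, 1656 → 3 on this Edge2). A minimal sketch with invented thresholds; the separate leakage-based selector (the leakage-volt-sel lines) and anything else the real driver does are ignored:

```python
from typing import Optional

# Hypothetical (min_pvtm, max_pvtm, volt_sel) triplets. The thresholds are
# invented for illustration; real tables differ per SoC and per CPU cluster.
PVTM_VOLTAGE_SEL = [
    (0, 1500, 0),
    (1501, 1580, 1),
    (1581, 1640, 2),
    (1641, 1700, 3),
    (1701, 1760, 4),
    (1761, 1800, 5),
    (1801, 9999, 6),
]


def pvtm_to_volt_sel(pvtm: int, table=PVTM_VOLTAGE_SEL) -> Optional[int]:
    """Return the volt-sel level whose [min, max] range contains pvtm."""
    for low, high, sel in table:
        if low <= pvtm <= high:
            return sel
    return None  # reading outside every range: no selection


for reading in (1450, 1600, 1720, 1900):
    print(reading, "->", pvtm_to_volt_sel(reading))
```

A weak chip measures a low pvtm, lands in a low range and therefore gets a low volt-sel, i.e. the higher L0/L1 voltages.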
Benchmark execution started at 40.7°C SoC temperature, it then looked like this:
cpu4-cpu5 (Cortex-A76): Cpufreq OPP: 2352 Measured: 2209 (-6.1%)
cpu6-cpu7 (Cortex-A76): Cpufreq OPP: 2304 Measured: 2186 (-5.1%)
After benchmark execution the SoC temperature is 46.2°C (so the Khadas active fansink must be in use, and the idle temperature is already insanely high), and mhz reports this:
cpu4-cpu5 (Cortex-A76): Cpufreq OPP: 2352 Measured: 394 (-83.2%)
cpu6-cpu7 (Cortex-A76): Cpufreq OPP: 2304 Measured: 2177 (-5.5%)
So the 2nd A76 cluster has just recovered, while the 1st is still at the lowest possible clockspeed. But the 7-zip scores prove that the A76 cores were below 400 MHz almost all the time: 5468, 6170, 5766
(RK3588 with the performance DMC governor and the A76 at 2.4 GHz scores above 16500; with the dmc_ondemand governor without an appropriate up_threshold it’s below 14500).
@solaris3308 dmesg | grep cpu.cpu | ix
--> http://ix.io/4eBN
I have found a very simple way to modify the working frequency!
Look at this:
firefly@firefly2:~$ ./mhz 20 50000
count=516515 us50=11032 us250=55182 diff=44150 cpu_MHz=2339.819
count=516515 us50=11038 us250=55186 diff=44148 cpu_MHz=2339.925
count=516515 us50=11034 us250=55184 diff=44150 cpu_MHz=2339.819
count=516515 us50=11035 us250=55184 diff=44149 cpu_MHz=2339.872
count=516515 us50=11045 us250=55192 diff=44147 cpu_MHz=2339.978
count=516515 us50=11036 us250=55190 diff=44154 cpu_MHz=2339.607
count=516515 us50=11035 us250=55191 diff=44156 cpu_MHz=2339.501
count=516515 us50=11038 us250=55187 diff=44149 cpu_MHz=2339.872
^C
firefly@firefly2:~$ ./mhz 20 50000
count=516515 us50=11449 us250=57273 diff=45824 cpu_MHz=2254.343
count=516515 us50=11452 us250=57280 diff=45828 cpu_MHz=2254.146
count=516515 us50=11453 us250=57282 diff=45829 cpu_MHz=2254.097
count=516515 us50=11454 us250=57286 diff=45832 cpu_MHz=2253.949
count=516515 us50=11456 us250=57285 diff=45829 cpu_MHz=2254.097
count=516515 us50=11457 us250=57290 diff=45833 cpu_MHz=2253.900
count=516515 us50=11458 us250=57279 diff=45821 cpu_MHz=2254.490
count=516515 us50=11455 us250=57285 diff=45830 cpu_MHz=2254.048
I only modified big0’s OPP, so only big0 reaches 2339 MHz; it only reached 2254 MHz before.
But I still need a lot more time to test.
By the way, a larger pvtm is NOT always better. Look at this:
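To put a number on the two mhz runs above, averaging all cpu_MHz= values per run and comparing them gives the gain. A small sketch, treating the 2339 MHz run as the modified big0 and the 2254 MHz run as the unmodified reference:

```python
import re
from statistics import mean

MHZ_RE = re.compile(r"cpu_MHz=([\d.]+)")


def average_mhz(output: str) -> float:
    """Average all cpu_MHz= values found in a block of mhz output."""
    values = [float(v) for v in MHZ_RE.findall(output)]
    if not values:
        raise ValueError("no cpu_MHz values found")
    return mean(values)


MODIFIED = """\
count=516515 us50=11032 us250=55182 diff=44150 cpu_MHz=2339.819
count=516515 us50=11038 us250=55186 diff=44148 cpu_MHz=2339.925
count=516515 us50=11034 us250=55184 diff=44150 cpu_MHz=2339.819
count=516515 us50=11035 us250=55184 diff=44149 cpu_MHz=2339.872
count=516515 us50=11045 us250=55192 diff=44147 cpu_MHz=2339.978
count=516515 us50=11036 us250=55190 diff=44154 cpu_MHz=2339.607
count=516515 us50=11035 us250=55191 diff=44156 cpu_MHz=2339.501
count=516515 us50=11038 us250=55187 diff=44149 cpu_MHz=2339.872
"""

STOCK = """\
count=516515 us50=11449 us250=57273 diff=45824 cpu_MHz=2254.343
count=516515 us50=11452 us250=57280 diff=45828 cpu_MHz=2254.146
count=516515 us50=11453 us250=57282 diff=45829 cpu_MHz=2254.097
count=516515 us50=11454 us250=57286 diff=45832 cpu_MHz=2253.949
count=516515 us50=11456 us250=57285 diff=45829 cpu_MHz=2254.097
count=516515 us50=11457 us250=57290 diff=45833 cpu_MHz=2253.900
count=516515 us50=11458 us250=57279 diff=45821 cpu_MHz=2254.490
count=516515 us50=11455 us250=57285 diff=45830 cpu_MHz=2254.048
"""

gain = (average_mhz(MODIFIED) / average_mhz(STOCK) - 1) * 100
print(f"{average_mhz(MODIFIED):.1f} MHz vs {average_mhz(STOCK):.1f} MHz: {gain:+.1f}%")
```

That is a gain of roughly 3.8% in real clockspeed, assuming the second run really represents big0 before the change.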
[ 6.273919] cpu cpu4: Looking up cpu-supply from device tree
[ 6.275608] cpu cpu4: bin=0
[ 6.275782] cpu cpu4: leakage=9
[ 6.275803] cpu cpu4: Looking up cpu-supply from device tree
[ 6.282314] cpu cpu4: pvtm=1636
[ 6.286295] cpu cpu4: pvtm-volt-sel=2
[ 6.286340] cpu cpu4: Looking up cpu-supply from device tree
[ 6.286838] cpu cpu4: Looking up mem-supply from device tree
[ 6.287385] cpu cpu6: Looking up cpu-supply from device tree
[ 6.289046] cpu cpu6: bin=0
[ 6.289222] cpu cpu6: leakage=9
[ 6.289242] cpu cpu6: Looking up cpu-supply from device tree
[ 6.295718] cpu cpu6: pvtm=1696
[ 6.299666] cpu cpu6: pvtm-volt-sel=4
[ 6.299698] cpu cpu6: Looking up cpu-supply from device tree
[ 6.300202] cpu cpu6: Looking up mem-supply from device tree