ROCK 5B Debug Party Invitation

Ok, I still adjusted the 2208 OPP (1000 instead of 987.5 mV) and adopted the DT overlay (the Armbian way):

mkdir -m755 /boot/overlay-user
dtc -I dts -O dtb rk3588-increase-opp-microvolt.dts -o /boot/overlay-user/rk3588-increase-opp-microvolt.dtbo
echo "user_overlays=rk3588-increase-opp-microvolt" >>/boot/armbianEnv.txt

One reboot later it looks good:

tk@rock-5b:~$ source sbc-bench.sh ; ParseOPPTables | grep -A31 cluster1-opp-table
   cluster1-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1012.5 mV
      2304 MHz  1025.0 mV
      2352 MHz  1037.5 mV
      2400 MHz  1050.0 mV

   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1012.5 mV
      2304 MHz  1025.0 mV
      2352 MHz  1037.5 mV
      2400 MHz  1050.0 mV

Now let's measure the difference these 50mV make… (Netio=192.168.83.72/2 sbc-bench.sh -g 4-7 already fired up)

And the increased voltage does affect the full upper spectrum of DVFS OPP:

Cpufreq OPP: 2400    Measured: 2433 (2433.407/2433.235/2433.178)     (+1.4%)
Cpufreq OPP: 2352    Measured: 2414 (2414.298/2414.186/2414.129)     (+2.6%)
Cpufreq OPP: 2304    Measured: 2395 (2395.265/2395.210/2394.821)     (+3.9%)
Cpufreq OPP: 2256    Measured: 2375 (2375.109/2375.000/2374.945)     (+5.3%)
Cpufreq OPP: 2208    Measured: 2190 (2190.572/2190.572/2190.293)
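The percentages in parentheses are the deviation of the measured clock from the nominal OPP. A minimal Python sketch of that arithmetic, using the rounded values from the output above (the helper name is made up for illustration, this is not sbc-bench code):

```python
# Nominal cpufreq OPP (MHz) -> measured clock (MHz), copied from the output above.
measured = {2400: 2433, 2352: 2414, 2304: 2395, 2256: 2375, 2208: 2190}

def deviation_percent(opp_mhz, measured_mhz):
    """Relative deviation of the measured clock from the nominal OPP, in percent."""
    return (measured_mhz / opp_mhz - 1.0) * 100.0

deviations = {opp: deviation_percent(opp, mhz) for opp, mhz in measured.items()}

for opp, dev in deviations.items():
    print(f"OPP {opp} MHz: measured {measured[opp]} MHz ({dev:+.2f}%)")
```

The 2208 OPP is the only one that comes out slightly below its nominal clock, which is why it carries no percentage in the output above.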

Measured with original DVFS OPP table:

Now with increased voltages for the upper DVFS OPP (same setup, same PSU, same port on the powermeter, same externally powered fan):

 OPP (MHz)  Measured (MHz)  7-ZIP MIPS  Temp (°C)  Consumption (mW)
       408             400        2320       25.9              2696
       600             600        3428       25.9              2930
       816             850        4833       27.1              3176
      1008            1050        5997       27.8              3423
      1200            1260        7078       28.1              3563
      1416            1440        8048       28.7              3790
      1608            1630        9020       29.6              4040
      1800            1820        9967       31.2              4466
      2016            2020       10955       33.3              5186
      2208            2190       11757       36.4              5960
      2256            2370       12508       39.8              7300
      2304            2390       12602       40.4              7490
      2352            2400       12641       41.6              7760
      2400            2420       12732       42.2              7856

This is only the two A76 clusters working together, no A55 involved. On my board, with an increase of up to 50mV, PVTM generously allows the affected OPPs to clock (significantly) higher. The 2256 OPP now already ends up close to 2.4GHz (2370 MHz), and the top OPP gets a 70 MHz increase in clockspeed, a marginal performance increase (12410 -> 12730 7-ZIP MIPS) and a whopping consumption increase of 750mW.

The difference between idle and ‘full load’ consumption is 5.08W with original settings (7096-2020) and 5.81W with increased voltages (7856-2050). The performance ‘boost’ is a laughable 320 7-ZIP MIPS (12730-12410).

So to get a ~2.5% ‘boost’ in performance we tried a 5% voltage increase and ended up with almost 15% higher consumption.

The higher the Vcore supply voltage, the less efficient the whole thing gets :slight_smile:
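To put numbers on this, here is a short Python sketch that recomputes the ratios from the figures quoted above (idle-corrected consumption and 7-ZIP MIPS). The last part rests on assumptions that are not from the measurements: the rule of thumb that dynamic CMOS power scales roughly with V²·f, and a ~2350 MHz baseline clock inferred from the "70 MHz increase" mentioned above.

```python
# Figures quoted in the posts above.
mips_orig, mips_new = 12410, 12730          # 7-ZIP MIPS, A76 clusters only
load_orig_mw = 7096 - 2020                  # full load minus idle, original OPP table
load_new_mw = 7856 - 2050                   # full load minus idle, increased voltages
volt_orig_mv, volt_new_mv = 1000.0, 1050.0  # top OPP supply voltage

perf_gain = (mips_new / mips_orig - 1) * 100
volt_gain = (volt_new_mv / volt_orig_mv - 1) * 100
power_gain = (load_new_mw / load_orig_mw - 1) * 100

# Efficiency as 7-ZIP MIPS per watt of load-induced consumption.
eff_orig = mips_orig / (load_orig_mw / 1000)
eff_new = mips_new / (load_new_mw / 1000)

# ASSUMPTION (rule of thumb, not measured): dynamic power ~ V^2 * f.
# The 2350 MHz baseline is inferred from "the top OPP gets a 70 MHz increase".
freq_orig_mhz, freq_new_mhz = 2350.0, 2420.0
model_gain = ((volt_new_mv / volt_orig_mv) ** 2 * (freq_new_mhz / freq_orig_mhz) - 1) * 100

print(f"performance: {perf_gain:+.1f}%  voltage: {volt_gain:+.1f}%  consumption: {power_gain:+.1f}%")
print(f"MIPS per watt: {eff_orig:.0f} -> {eff_new:.0f}")
print(f"V^2*f estimate for the consumption increase: {model_gain:+.1f}%")
```

With these inputs the V²·f estimate lands at roughly 13.5%, in the same region as the measured ~14.4%, while MIPS per watt drops from roughly 2440 to roughly 2190.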

PDFs with all data uploaded:


First ‘overclocking’ attempt: https://gist.github.com/ThomasKaiser/68cb5f8c50e0600d5fc15930371df261

The OPP tables are expanded:

tk@rock-5b:~$ source sbc-bench.sh ; ParseOPPTables | grep -A37 cluster1-opp-table
   cluster1-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1007.5 mV
      2304 MHz  1015.0 mV
      2352 MHz  1025.0 mV
      2400 MHz  1040.0 mV
      2448 MHz  1055.0 mV
      2496 MHz  1075.0 mV
      2544 MHz  1100.0 mV

   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1007.5 mV
      2304 MHz  1015.0 mV
      2352 MHz  1025.0 mV
      2400 MHz  1040.0 mV
      2448 MHz  1055.0 mV
      2496 MHz  1075.0 mV
      2544 MHz  1100.0 mV

But to no avail:

tk@rock-5b:/sys/devices/system/cpu/cpufreq/policy4$ cat cpuinfo_max_freq 
2400000

Most probably I’m missing something simple (or still suffer from not fully understanding PVTM :slight_smile: )

The cpu supply of rock5b is limited to 1.05V: https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L267
So you have to increase the supply to 1.1V.
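A quick way to see which entries of the expanded table are affected is to compare each OPP's voltage with the supply ceiling. The table values below are copied from the output above; the 1,050,000 µV limit reflects the 1.05V stated here (an assumption about what the linked DTS line sets, since the file itself is not quoted in this thread).

```python
# Upper part of the expanded OPP table (MHz -> microvolts), from the output above.
opp_table_uv = {
    2208: 1_000_000,
    2256: 1_007_500,
    2304: 1_015_000,
    2352: 1_025_000,
    2400: 1_040_000,
    2448: 1_055_000,
    2496: 1_075_000,
    2544: 1_100_000,
}

# ASSUMPTION: supply ceiling of 1.05 V as stated in the post above.
REGULATOR_MAX_UV = 1_050_000

def opps_exceeding(table, limit_uv):
    """Return the OPP frequencies (MHz) whose voltage exceeds the supply limit."""
    return sorted(mhz for mhz, uv in table.items() if uv > limit_uv)

too_high = opps_exceeding(opp_table_uv, REGULATOR_MAX_UV)
print("OPPs asking for more than the supply allows:", too_high)
```

All three added OPPs ask for more than 1.05V, which would fit cpuinfo_max_freq staying at 2400000.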


Alright then let’s stay within these 1050 mV limits. I ended up with this OPP table to test:

   408 MHz   675.0 mV
   600 MHz   675.0 mV
   816 MHz   675.0 mV
  1008 MHz   675.0 mV
  1200 MHz   675.0 mV
  1416 MHz   725.0 mV
  1608 MHz   762.5 mV
  1800 MHz   850.0 mV
  2016 MHz   925.0 mV
  2208 MHz  1000.0 mV
  2256 MHz  1002.5 mV
  2304 MHz  1005.0 mV
  2352 MHz  1010.0 mV
  2400 MHz  1020.0 mV
  2448 MHz  1030.0 mV
  2496 MHz  1040.0 mV
  2544 MHz  1050.0 mV

But PVTM at work again:

Cpufreq OPP: 2544    Measured:  394    (395.013/394.986/394.968)    (-84.5%)
Cpufreq OPP: 2496    Measured:  394    (395.004/394.977/394.968)    (-84.2%)
Cpufreq OPP: 2448    Measured:  395    (395.013/395.004/394.986)    (-83.9%)
Cpufreq OPP: 2400    Measured: 2400 (2400.218/2400.107/2399.995)
Cpufreq OPP: 2352    Measured: 2379 (2379.870/2379.760/2379.650)     (+1.1%)
Cpufreq OPP: 2304    Measured: 2369 (2369.661/2369.335/2369.335)     (+2.8%)
Cpufreq OPP: 2256    Measured: 2369 (2369.335/2369.226/2369.063)     (+5.0%)
Cpufreq OPP: 2208    Measured: 2193 (2193.270/2193.223/2193.223)
Cpufreq OPP: 2016    Measured: 2022 (2022.377/2022.179/2022.030)

I already collected such a weird result with highest OPP resulting in ~400 MHz.

Well I guess next step would be to adjust this

	rockchip,pvtm-voltage-sel = <
		0	1595	0
		1596	1615	1
		1616	1640	2
		1641	1675	3
		1676	1710	4
		1711	1743	5
		1744	1776	6
		1777	9999	7
	>;

(or find some way to further tweak the opp-supported-hw values?). Since all of this is of limited use due to excessive consumption increases when leaving ‘default PVTM’ land, I’ll give up for now.

And another one, this time RK3588s (Khadas Edge2): http://sprunge.us/TenXhp

It’s a weak RK3588s getting 2256/2304 MHz max:

      cpu cpu0: pvtm=1475
      cpu cpu0: pvtm-volt-sel=3
      cpu cpu4: pvtm=1700
      cpu cpu4: pvtm-volt-sel=4
      cpu cpu6: pvtm=1711
      cpu cpu6: pvtm-volt-sel=5
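For reference, a small Python sketch of how such a pvtm value would map to a volt-sel through the rockchip,pvtm-voltage-sel ranges quoted earlier in the thread. It assumes that table also applies to this board's A76 clusters; the A55 cluster (cpu0) evidently uses different ranges, so it is left out. The function name is made up for illustration.

```python
# (min_pvtm, max_pvtm, volt_sel) rows as quoted from the device tree earlier.
PVTM_VOLTAGE_SEL = [
    (0, 1595, 0),
    (1596, 1615, 1),
    (1616, 1640, 2),
    (1641, 1675, 3),
    (1676, 1710, 4),
    (1711, 1743, 5),
    (1744, 1776, 6),
    (1777, 9999, 7),
]

def pvtm_to_volt_sel(pvtm):
    """Return the volt-sel whose [min, max] range contains the pvtm value."""
    for low, high, sel in PVTM_VOLTAGE_SEL:
        if low <= pvtm <= high:
            return sel
    raise ValueError(f"pvtm value {pvtm} outside all ranges")

for cpu, pvtm in (("cpu4", 1700), ("cpu6", 1711)):
    print(f"{cpu}: pvtm={pvtm} -> pvtm-volt-sel={pvtm_to_volt_sel(pvtm)}")
```

Both A76 values reproduce the volt-sel reported in the log above, with cpu6 sitting right at the lower edge of its range.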

Idle temp when starting sbc-bench is at 49.0°C and this matters since with more demanding tasks the SoC (or the MCU inside) switches to less than 400 MHz at the highest OPP. Measured prior to benchmark execution:

Cpufreq OPP: 2256    Measured: 2235 (2235.657/2235.512/2235.415)
Cpufreq OPP: 2304    Measured: 2249 (2249.777/2249.679/2249.630)     (-2.4%)

And directly afterwards:

Cpufreq OPP: 2256    Measured:  394    (394.716/394.707/394.689)    (-82.5%)
Cpufreq OPP: 2304    Measured: 2252 (2252.278/2252.180/2252.180)     (-2.3%)

And the benchmark scores themselves prove that both A76 clusters remained almost all the time on less than 400 MHz.

When executing ramlat it’s already obvious that something’s weird. Khadas guys have not yet discovered the dmc/dfi device-tree nodes so they’re running RAM with highest clockspeeds. But cluster1 is obviously at below 400 MHz:

  size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
    4k: 10.14 10.14 10.14 10.14 10.14 10.14 10.14 19.28 
    8k: 10.14 10.14 10.16 10.14 10.14 10.14 10.14 19.75 
   16k: 10.14 10.14 10.14 10.14 10.14 10.14 10.15 19.75 
   32k: 10.14 10.14 10.14 10.14 10.14 10.14 10.14 19.77 
   64k: 10.17 10.16 10.20 10.16 10.18 10.17 10.17 19.81 
  128k: 31.54 31.53 31.52 31.54 31.52 34.96 43.05 76.76 
  256k: 36.60 36.27 36.47 36.31 36.42 36.18 44.75 76.95 
  512k: 49.43 49.09 49.28 49.11 49.22 53.76 69.54 115.5 
 1024k: 60.31 59.44 59.35 59.47 59.14 68.30 90.86 143.2 
 2048k: 68.94 67.80 70.63 67.41 67.62 78.11 106.5 158.5 
 4096k: 115.9 99.16 104.0 92.17 108.0 100.7 119.8 165.2 
 8192k: 175.6 151.1 179.1 156.3 158.8 150.9 155.5 184.7 
16384k: 192.6 191.1 192.2 192.7 189.0 186.3 190.1 209.3 

…while cluster2 can run at this time with ~ 2250 MHz:

  size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
    4k: 1.773 1.773 1.773 1.774 1.773 1.773 1.773 3.377 
    8k: 1.773 1.773 1.774 1.773 1.773 1.773 1.774 3.456 
   16k: 1.773 1.773 1.773 1.773 1.773 1.773 1.774 3.456 
   32k: 1.773 1.773 1.773 1.774 1.775 1.773 1.775 3.458 
   64k: 1.774 1.774 1.775 1.775 1.774 1.774 1.775 3.459 
  128k: 5.322 5.322 5.320 5.322 5.320 5.915 7.438 13.43 
  256k: 6.216 6.377 6.207 6.393 6.228 6.232 7.799 13.42 
  512k: 12.47 11.64 12.24 11.66 12.18 12.31 14.08 20.92 
 1024k: 18.19 17.77 17.97 17.76 17.63 17.94 19.94 29.82 
 2048k: 20.02 19.75 19.19 19.77 19.21 20.04 22.25 32.31 
 4096k: 54.32 43.67 52.58 45.32 57.91 45.11 46.73 60.64 
 8192k: 104.2 88.18 102.7 87.68 106.0 86.30 84.14 99.10 
16384k: 122.4 119.1 123.4 120.9 123.2 118.1 112.4 113.7
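The two ramlat runs can be turned into a rough clock estimate. This Python sketch assumes a 4-cycle L1 data cache load-to-use latency on the Cortex-A76 (an assumption, not stated in the thread) and uses the latencies from the 4k to 32k rows of the tables above.

```python
# ASSUMPTION: Cortex-A76 L1D load-to-use latency of 4 core clock cycles.
L1_LATENCY_CYCLES = 4

def clock_mhz_from_l1_latency(latency_ns, cycles=L1_LATENCY_CYCLES):
    """Estimate the core clock in MHz from a measured L1 latency in nanoseconds."""
    return cycles / latency_ns * 1000.0

cluster1_mhz = clock_mhz_from_l1_latency(10.14)   # cluster1, 4k..32k rows
cluster2_mhz = clock_mhz_from_l1_latency(1.773)   # cluster2, 4k..32k rows

print(f"cluster1: ~{cluster1_mhz:.0f} MHz")
print(f"cluster2: ~{cluster2_mhz:.0f} MHz")
```

Under that assumption cluster1 comes out at about 394 MHz and cluster2 at about 2256 MHz, in line with the cpufreq measurements above.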

Pretty interesting. The fact that the lower frequencies got a boost when increasing their voltage setting makes me think that not all bits are used for voltage alone but that maybe 1 or 2 of the lowest bits of the voltage setting are in fact sent to the clock generator as a way to artificially lower the consumption by instead slightly lowering the frequency.

I’ve been surprised at the beginning that they had such a precise Vreg taking steps of 12.5mV. But if you think that instead it takes 50mV steps and that the two remaining bits only configure how many 24 MHz bins to add/remove, it can start to make a lot of sense.

I changed the cpu supply to 1.5V, and now I can use voltages higher than 1.05V. Here is the sbc-bench result with 1.15V at the 2.4GHz OPP: http://ix.io/4bL8
I failed to change the cpu supply using a device tree overlay, so I just edited the device tree in the kernel source code and compiled a new dtb package.

@tkaiser I think clocks over 2.4GHz are locked by the SCMI firmware, like on rk356x. If you look at the dmesg output you can see a lot of errors saying “set clk failed”. So at the moment the only way to overclock is to increase the microvolt value of the 2.4GHz OPP. If Rockchip releases the ATF source code we may find another way to unlock clocks over 2.4GHz.

Well, frying the SoC at 1500mV (150% of the designed 1000mV supply voltage at ‘full CPU speed’) results in this:

Cpufreq OPP: 2400    Measured: 2530 (2530.882/2530.882/2530.572)     (+5.4%)
Cpufreq OPP: 2400    Measured: 2547 (2548.049/2547.923/2547.860)     (+6.1%)

That’s close to nothing :frowning:

With only 50mV more instead of 500mV I was able to measure this:

That was just a 5% voltage increase and not 50%! At 1500mV consumption must be really ruined while performance only slightly benefits. Your 7-zip MIPS score today is lower than the one you had months ago with original DVFS settings even if now your CPU clockspeeds are 9% higher:

cpufreq     dmc settings                     7-zip MIPS  openssl  memcpy  memset
~2540 MHz   dmc_ondemand (upthreshold: 40)        15090  1448890   10160   28770
~2420 MHz   dmc_ondemand (upthreshold: 20)        16720  1387470    9710   29340
~2310 MHz   performance                           16290  1322410   10200   28610

The reason is simple: adjusting the dmc governor (dmc_ondemand with upthreshold=20, my board in the middle) is a better choice than overvolting/overclocking since it gives you lower idle consumption and better performance at the same time, especially compared to ‘overclocking’, which is horrible from an energy efficiency point of view. The higher the supply voltage, the less efficient the CPU cores.
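One way to see this in the table above is to normalise the 7-zip score by the CPU clock. A small Python sketch using the three rows as given (the MIPS-per-MHz metric is only an illustration, not something sbc-bench reports):

```python
rows = [
    # (approx. cpufreq in MHz, dmc settings, 7-zip MIPS)
    (2540, "dmc_ondemand (upthreshold: 40)", 15090),
    (2420, "dmc_ondemand (upthreshold: 20)", 16720),
    (2310, "performance", 16290),
]

# 7-zip MIPS delivered per MHz of CPU clock for each dmc setting.
mips_per_mhz = {dmc: mips / mhz for mhz, dmc, mips in rows}

for mhz, dmc, mips in rows:
    print(f"~{mhz} MHz, {dmc}: {mips_per_mhz[dmc]:.2f} MIPS/MHz")

worst = min(mips_per_mhz, key=mips_per_mhz.get)
print("lowest MIPS per MHz:", worst)
```

The overvolted run at ~2540 MHz delivers the fewest MIPS per MHz of the three, so the extra clock is more than eaten up by the less favourable memory settings.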

I changed the dmc governor to performance and got a higher result: http://ix.io/4bMe compared to the default dmc_ondemand: http://ix.io/4bLQ, but also a higher temperature. BTW this is with microvolt set to 1.25V for the 2.4GHz OPP.

Comparing 4 results: two without overclocking (your board and mine at the bottom) and two with overvolting/overclocking:

cpufreq     dmc settings        7-zip MIPS  openssl  memcpy  memset  idle     full load
~2640 MHz   performance              17350  1505860    9910   28750  +600mW   unknown
~2540 MHz   dmc_ondemand (40)        15090  1448890   10160   28770  -        unknown
~2350 MHz   dmc_ondemand (20)        16300  1327430    9550   29140  -        -
~2310 MHz   performance              16290  1322410   10200   28610  +600mW   ~7000mW

performance dmc governor has the huge disadvantage of increased idle consumption for no other reason than unfortunate settings.

Overclocking requires overvolting, which ends up with huge consumption increases at full CPU utilization. What I’ve measured above with just 1050 mV instead of 1000 mV was a whopping 15% consumption increase for a laughable 2.5% performance ‘boost’. Have you been able to measure peak consumption when benchmarking? It’s not possible with a simple USB powermeter anyway, at least not with 7-zip, since there is too much fluctuation.

As such IMO key to better overall performance while keeping consumption low is better settings instead of ‘overclocking’. :slight_smile:

I have to keep the fan running at full speed to do the overclocking, which is very noisy. I don’t have a powermeter so I don’t know about the power consumption, but it can be inferred from the temperature: 60°C with full speed running fan is too high. But for my low pvtm value, microvolt 1.05v can let my board reach 2.4GHz, which is valuable to me.

A little concern after having played around with DT overlays and overvolting: my board now consumes significantly more than before even without any DT overlay loaded or other DT manipulations.

@willy: have you already done some tests here? Since you’re able to measure maybe it’s a good idea to check consumption prior to any such tests and then compare.

Since I made only reboots all the time I disconnected it now from power for an hour but still same symptom: significantly higher idle consumption. Also the upthreshold=20 trick doesn’t work any more since now DRAM is all the time at 2112 MHz and not 528 MHz as before (having this already taken into account and measured with powersave dmc governor as well).

Maybe I’m just overlooking something, but I thought I’d write this as a little warning so others can measure before/after conducting any overvolting tests.

I think your extra consumption simply comes from the DMC running at full speed (for whatever reason), that’s the same difference of ~600mW you measured last month.

It does indeed happen to fry chips with voltage increases, but not by such small values. You’re definitely safe below the “absolute maximum ratings”, which usually are much higher because they don’t depend on temperature but the process node and technology used that impose hard limits on the voltage across a transistor, even in idle. I don’t know the value here but it might well be 1.5V or so. Regardless I’ve sometimes operated machines above the abs max ratings… But the gains on modern technologies are fairly limited, we’re not dealing with 486s anymore.

With that said, no I haven’t been running frequency tests yet (by pure lack of time, not curiosity). And yes, that’s definitely something for which I’ll watch the wattmeter. I’m even thinking that it could be nice to observe the total energy used by the test and compare it with the test duration. Usually the ratios are totally discouraging :slight_smile:

Last week I measured 1280mW in idle, now it’s 1460mW. DMC not involved since switched to powersave before:

root@rock-5b:/tmp# echo powersave >/sys/devices/platform/dmc/devfreq/dmc/governor
root@rock-5b:/tmp# monit-rk3588.sh 
 CPU0-3  CPU4-5  CPU6-7     DDR     DSU     GPU     NPU
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
    408     408     408     528     396     200     200
^C

And as reported the upthreshold behaviour now differs since with 20 DRAM is at 2112 MHz all the time while it was at 528 MHz last week. This then still adds another ~600mW…

Did you maybe happen to change the kernel since the previous measurements?

It was just a change in settings I had forgotten and the TL;DR version is as simple as this:

  • consumption difference wrt ASPM with DRAM at lowest clock: 230 mW
  • consumption difference wrt ASPM with DRAM at highest clock: 160 mW

Setting /sys/module/pcie_aspm/parameters/policy to either default or performance makes no significant difference. The consumption delta is always compared to powersave (5.10 BSP kernel default).

Long story: I was still on ‘Linux 5.10.69-rockchip-rk3588’ yesterday. Now decided to start over with Armbian_22.08.0-trunk_Rock-5b_focal_legacy_5.10.69.img I built 4 weeks ago:

idle consumption: 1250mW

One apt install linux-image-legacy-rockchip-rk3588 linux-dtb-legacy-rockchip-rk3588 linux-u-boot-rock-5b-legacy and a reboot later I’m at 5.10.72 (Armbian’s version string cosmetics):

idle consumption: 1230mW

Seems neither kernel nor bootloader related. I had a bunch of userland packages also updated but then remembered that silly me recently adjusted relevant settings. Since I started over with a freshly built Ubuntu 20.04 Armbian image (to be able to directly compare with Radxa’s) I had tweaks missing for some time:

/sys/devices/platform/dmc/devfreq/dmc/upthreshold = 25
/sys/module/pcie_aspm/parameters/policy = default

And yesterday I applied these settings again and then the numbers differed.
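Since forgotten settings caused the confusion here, it may help to log the relevant sysfs values next to each consumption reading. A minimal Python sketch, assuming the dmc paths quoted in this thread and /sys/module/pcie_aspm/parameters/policy for ASPM; the function names and the optional root argument (only there to make testing possible) are made up.

```python
from pathlib import Path

# sysfs files relevant to idle consumption in this thread (relative to root).
SETTINGS = {
    "dmc_governor": "sys/devices/platform/dmc/devfreq/dmc/governor",
    "dmc_upthreshold": "sys/devices/platform/dmc/devfreq/dmc/upthreshold",
    "aspm_policy": "sys/module/pcie_aspm/parameters/policy",
}

def active_choice(text):
    """The ASPM policy file lists all options and marks the active one in [brackets]."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()

def snapshot(root="/"):
    """Read each setting below root; missing files are reported as None."""
    result = {}
    for name, rel in SETTINGS.items():
        path = Path(root) / rel
        result[name] = active_choice(path.read_text()) if path.exists() else None
    return result

if __name__ == "__main__":
    for name, value in snapshot().items():
        print(f"{name}: {value}")
```

Running it before each measurement and keeping the output together with the powermeter reading makes it easier to spot a changed setting later.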

With ASPM set to powersave (the kernel default) I’m measuring the 3rd time: idle consumption: 1220mW (which hints at 1220-1250mW being a range of expected results variation. For the lower numbers: I’ve three RPi USB-C power bricks lying around and am using most probably now a different one than before).

Now switching to /sys/module/pcie_aspm/parameters/policy = default again I’m measuring three times: 1530mW, 1480mW and 1480mW.

Then retesting with /sys/devices/platform/dmc/devfreq/dmc/governor = powersave to ensure DRAM keeps clocked at 528 MHz: 1470mW, 1460mW and 1470mW.

One more test with ASPM and dmc governor set to powersave: idle consumption: 1250mW.

Well, then switching ASPM between powersave and default makes up for a consumption difference of ~230mW (1470-1240). That’s a lot more than I measured weeks ago. But now I’m at 5.10.72 and back then most probably on 5.10.66. Also back then there was no dmc governor enabled and as such the DRAM was at 2112 MHz all the time (now I’m comparing the difference at 528 MHz).

So retesting with the older image that is at 5.10.69:

DRAM at 528 MHz:

  • both ASPM and dmc governor set to powersave: 1250mW
  • ASPM default and dmc governor powersave: 1510mW
  • ASPM performance and dmc governor powersave: 1470mW

DRAM at 2112 MHz:

  • ASPM powersave and dmc governor performance: 1930mW
  • ASPM default and dmc governor performance: 2090mW
  • both ASPM and dmc governor set to performance: 2100mW
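The ASPM deltas can be recomputed from these two lists. A short Python sketch with the values copied from above, taking ASPM powersave as the baseline in both DRAM states (the function name is made up):

```python
# Idle consumption in mW on the 5.10.69 image, keyed by ASPM policy.
dram_528 = {"powersave": 1250, "default": 1510, "performance": 1470}
dram_2112 = {"powersave": 1930, "default": 2090, "performance": 2100}

def aspm_deltas(readings):
    """Extra consumption in mW of each ASPM policy relative to powersave."""
    base = readings["powersave"]
    return {policy: mw - base for policy, mw in readings.items() if policy != "powersave"}

deltas_528 = aspm_deltas(dram_528)
deltas_2112 = aspm_deltas(dram_2112)

# Cost of running DRAM at 2112 MHz instead of 528 MHz with ASPM at powersave.
dram_cost_mw = dram_2112["powersave"] - dram_528["powersave"]

print("ASPM delta, DRAM at 528 MHz:", deltas_528)
print("ASPM delta, DRAM at 2112 MHz:", deltas_2112)
print("DRAM 2112 vs 528 MHz:", dram_cost_mw, "mW")
```

On this image the ASPM delta is 220 to 260mW with DRAM at 528 MHz and 160 to 170mW with DRAM at 2112 MHz, while the DRAM clock alone accounts for 680mW.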

BTW: when all CPU cores are at 408 MHz the scmi_clk_dsu clock jumps from 0 to 396 MHz (DSU). No idea how to interpret this…

Setting upthreshold = 20 ends up way too often with DRAM at 2112 MHz instead of 528 MHz as such I slightly increased upthreshold in my setup. Though need to get a quality M.2 SSD I can use for further tests. Right now every SSD that is not crappy is either in use somewhere else or a few hundred km away.


Cool that you found it! I remember that we’ve noted quite a number of times already that ASPM made a significant difference in power usage. This one is particularly important, maybe in part because the board features several lanes of PCIe 3.0, making it more visible than older boards with fewer, slower lanes.

Yeah, but until now I’ve not connected any PCIe device to any of the M.2 slots. So I really need to get a good SSD to test with since judging by reports from ODROID forum idle consumption once an NVMe SSD has been slapped into the M.2 slot rises to insane levels with RK’s BSP kernel (should be 4.19 over there, unfortunately users only sometimes post the relevant details).

Could be an indication that there are driver/settings issues with NVMe’s APST (Autonomous Power State Transition) aside from ASPM.