ROCK 5B Debug Party Invitation

tkaiser · September 27, 2022, 7:11pm

Well, this is the memory OPP table as defined. It conflicts with /sys/devices/platform/dmc/devfreq/dmc/available_frequencies, which has no 2750 MHz entry and ends at 2112 MHz: 528000000 1068000000 1560000000 2112000000

@willy: is there a way to determine DRAM clockspeeds from ramlat measurements? With every cpufreq governor switched to performance, I walked through dmc’s available_frequencies:

root@rock-5b:/sys/devices/platform/dmc/devfreq/dmc# for i in $(<available_frequencies) ; do echo -e "\n$i\n"; echo $i >min_freq ; echo $i >max_freq ; taskset -c 5 /usr/local/src/ramspeed/ramlat -s -n 200 ; done

528000000

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.706 1.706 1.706 1.706 1.706 1.706 1.706 3.234 
     8k: 1.706 1.706 1.706 1.706 1.706 1.706 1.706 3.325 
    16k: 1.706 1.706 1.706 1.706 1.706 1.706 1.706 3.325 
    32k: 1.706 1.706 1.706 1.706 1.706 1.706 1.706 3.329 
    64k: 1.707 1.707 1.707 1.707 1.707 1.707 1.707 3.328 
   128k: 5.149 5.146 5.144 5.146 5.144 5.853 7.185 12.92 
   256k: 5.982 6.163 5.970 6.165 5.965 5.991 7.497 12.90 
   512k: 8.734 8.199 8.603 8.201 8.626 8.408 9.332 14.92 
  1024k: 20.14 18.54 18.49 18.54 18.53 18.88 20.95 30.60 
  2048k: 25.87 20.83 21.94 20.82 21.96 22.34 25.41 38.82 
  4096k: 99.21 72.74 84.82 72.67 86.27 72.74 78.43 107.8 
  8192k: 178.6 158.7 173.2 158.9 172.7 153.4 159.3 182.4 
 16384k: 217.0 203.5 211.8 202.3 211.5 194.6 208.8 217.0 

1068000000

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.707 1.707 1.707 1.707 1.707 1.707 1.707 3.233 
     8k: 1.707 1.707 1.707 1.707 1.707 1.707 1.707 3.327 
    16k: 1.707 1.707 1.707 1.707 1.707 1.707 1.707 3.327 
    32k: 1.707 1.707 1.707 1.707 1.707 1.707 1.707 3.330 
    64k: 1.708 1.707 1.708 1.707 1.708 1.708 1.708 3.330 
   128k: 5.123 5.121 5.120 5.121 5.120 5.815 7.166 12.92 
   256k: 5.976 6.163 5.972 6.165 5.975 5.994 7.500 12.91 
   512k: 8.832 8.530 8.778 8.536 8.514 8.859 9.705 15.37 
  1024k: 18.62 18.56 18.51 18.56 18.68 18.90 20.98 30.61 
  2048k: 18.92 19.39 18.64 19.39 18.69 19.76 21.65 30.88 
  4096k: 71.60 52.61 62.12 52.40 63.17 54.66 58.07 77.36 
  8192k: 131.4 113.6 127.8 112.5 127.5 110.4 113.5 128.4 
 16384k: 160.9 147.5 157.4 147.3 157.4 142.9 147.7 152.7 

1560000000

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.707 1.708 1.707 1.707 1.707 1.707 1.708 3.237 
     8k: 1.707 1.708 1.707 1.708 1.707 1.707 1.708 3.328 
    16k: 1.707 1.708 1.707 1.708 1.707 1.708 1.708 3.329 
    32k: 1.708 1.708 1.707 1.708 1.707 1.708 1.708 3.332 
    64k: 1.709 1.708 1.708 1.708 1.708 1.708 1.709 3.331 
   128k: 5.152 5.150 5.149 5.150 5.149 5.824 7.197 12.93 
   256k: 6.011 5.996 6.004 5.996 6.004 6.008 7.518 12.92 
   512k: 8.634 8.240 8.814 8.265 8.105 8.374 9.551 15.25 
  1024k: 18.76 18.69 18.38 18.68 18.11 19.03 21.14 30.59 
  2048k: 18.73 19.26 18.50 19.26 18.55 19.55 21.59 30.85 
  4096k: 60.18 46.27 54.10 46.18 54.60 47.84 50.54 65.17 
  8192k: 108.3 92.29 102.9 91.68 103.6 91.60 94.70 105.9 
 16384k: 135.8 124.9 133.7 125.0 134.6 122.7 124.7 126.9 

2112000000

   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.709 1.709 1.708 1.709 1.708 1.708 1.709 3.240 
     8k: 1.708 1.709 1.708 1.709 1.708 1.709 1.709 3.330 
    16k: 1.708 1.709 1.708 1.709 1.708 1.708 1.709 3.330 
    32k: 1.708 1.709 1.708 1.709 1.708 1.709 1.709 3.334 
    64k: 1.710 1.709 1.710 1.709 1.710 1.710 1.710 3.334 
   128k: 5.128 5.127 5.126 5.127 5.126 5.744 7.175 12.94 
   256k: 5.988 6.170 5.981 6.170 5.982 6.000 7.509 12.92 
   512k: 8.030 7.508 8.089 7.557 7.882 7.799 8.715 14.56 
  1024k: 18.44 18.58 18.36 18.57 18.96 18.89 20.95 30.58 
  2048k: 18.96 19.30 18.75 19.30 18.75 19.62 21.66 31.06 
  4096k: 54.45 42.41 49.72 42.30 49.29 43.93 46.23 59.35 
  8192k: 97.40 83.49 94.01 83.06 93.91 83.65 85.51 94.37 
 16384k: 123.9 115.1 122.0 115.1 122.1 112.8 113.8 115.5

willy · September 27, 2022, 8:25pm

Yep, so it corresponds to what I observed and decoded, indicating that using 0xff would allow all pvtm values for the associated opp.

Pvtm indicates the silicon quality but that can be compensated using higher voltages. The next step is to create all combinations of opp in terms of combinations of (frequency, voltage), and select them based on pvtm values. That’s what I intend to do on my board to enable 2.4 GHz (and possibly try higher ones).

willy · September 27, 2022, 8:34pm

Not really. I tried a lot, but the measurements are influenced by the DRAM bus width, the memory controller, the L3 cache’s speed, etc. For example I hoped that measuring the time it took for one line and two adjacent lines would reflect the DRAM frequency, but it does not. I even tried to play with prefetch instructions just in case but that didn’t give me any interesting result. In the end memchr() remains one of the most effective one because it’s supposed to be almost insensitive to RAM latency but highly dependent on bandwidth (frequency * bus size).
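As a rough illustration of why the answer is "not really", the sketch below takes the 16384k / 1x32 figures from the four ramlat runs above and compares how much the dmc clock grows with how much the latency shrinks. The helper name `scaling` is made up for this example, and treating that single row as representative is an assumption.

```python
# 16384k / 1x32 ramlat figures per dmc frequency in MHz, copied from the runs above.
latency = {528: 217.0, 1068: 160.9, 1560: 135.8, 2112: 123.9}


def scaling(latency_by_mhz):
    """Return {mhz: (clock ratio, latency improvement)} relative to the slowest clock."""
    base_mhz = min(latency_by_mhz)
    base_latency = latency_by_mhz[base_mhz]
    return {
        mhz: (mhz / base_mhz, base_latency / value)
        for mhz, value in latency_by_mhz.items()
    }


for mhz, (clock_x, latency_x) in sorted(scaling(latency).items()):
    print(f"{mhz:5d} MHz  clock x{clock_x:.2f}  latency improvement x{latency_x:.2f}")
```

With these numbers a 4x higher dmc clock only yields about 1.75x lower latency at that size, so the latency figures do not map back to a DRAM clock in any simple way.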

willy · September 27, 2022, 8:39pm

I don’t believe it for a single second. That would be a design error on their side, because measuring that voltage etc. is exactly what the PVTM is for, i.e. to gauge what the silicon is capable of. No, if there’s an MCU, I guess that instead it’s just dampening the curve at the highest frequencies to defeat our measurements without being caught too easily. I would suspect that past a certain multiplier, they’re just cutting excess multipliers in half so that past the highest trusted point (2208 MHz), every 48 MHz only adds 24. But that’s pure guess, of course, even though it perfectly matches what you’ve measured.
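To make that guess concrete, here is a minimal sketch of the suspected halving rule. It is only a model of the speculation above, not anything known about the firmware; the function name `guessed_real_mhz` is hypothetical.

```python
TRUSTED_MHZ = 2208  # "highest trusted point" named in the post above


def guessed_real_mhz(opp_mhz):
    """Speculative model: above TRUSTED_MHZ only half of each extra MHz is applied."""
    if opp_mhz <= TRUSTED_MHZ:
        return opp_mhz
    return TRUSTED_MHZ + (opp_mhz - TRUSTED_MHZ) / 2


for opp in (2208, 2256, 2304, 2352, 2400):
    print(opp, guessed_real_mhz(opp))
```

Under this model each 48 MHz OPP step above 2208 MHz adds only 24 MHz, so a nominal 2400 MHz OPP would really run at 2304 MHz.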

jack · September 27, 2022, 10:57pm

So it’s better that @jack @hipboi can ask rockchip why the cpu clk is set to the current status.

@Stephen Please follow up this issue.

amazingfate · September 28, 2022, 8:06am

@tkaiser @willy I increased the cpu voltage to 1.05V and now the 4 big cores can reach 2.4GHz: http://sprunge.us/k7rnJj
Here is the device tree overlay: https://gist.github.com/amazingfate/883baffc614f49c8089dafd4152e99f3

tkaiser · September 28, 2022, 8:22am

Great! I’m going to apply this on my board later trying to measure the consumption/temp difference this makes.

Also I think only adding some more mV to the last 2400 OPP isn’t enough, since the intermediate steps need a little more voltage too. Contrary to my initial belief, PVTM is sensitive to temperature (I need to test a ‘cold boot’ in a really hot state, with the Rock 5B lying on a radiator or something like that, to see whether I then end up with 2352 as highest OPP or even less). And without a good heatsink or a fan the board might throttle under heavy load, decreasing clockspeeds and then probably becoming unstable due to undervoltage at a frequency somewhere between 2208 and 2400.

tkaiser · September 28, 2022, 11:11am

To elaborate on that… this is what Rockchip’s defaults look like:

tk@rock-5b:~$ source sbc-bench.sh ; ParseOPPTables | grep -A15 cluster2-opp-table
   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz   987.5 mV
      2256 MHz  1000.0 mV
      2304 MHz  1000.0 mV
      2352 MHz  1000.0 mV
      2400 MHz  1000.0 mV

Or better as graph:

We clearly see that the curve goes flat above 2208 MHz since PVTM does the job limiting weak silicon to the respective maximum clockspeed while staying on 1000 mV supply voltage.

@amazingfate’s adaptation only cares about the highest DVFS OPP, which then looks like this:

This obviously does not care about the intermediate OPP between 2208 and 2352 MHz that are currently ‘addressed’ by PVTM (simply rejecting higher clockspeeds on weak silicon ends up with the same voltage but lower clocks, which avoids undervoltage).

But if we want to overcome PVTM here and use the intermediate clockspeeds, we need to address the clockspeed/voltage ratio. As such I would propose a slight increase of every voltage here (and lowering the 1800 MHz OPP by 10mV, since I doubt the kink there is intentional):

Or in numbers:

tk@rock-5b:~$ ParseOPPTables | grep -A15 cluster2-opp-table
   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   840.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1012.5 mV
      2304 MHz  1025.0 mV
      2352 MHz  1037.5 mV
      2400 MHz  1050.0 mV
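For reference, a small sketch that diffs the proposed table against Rockchip’s defaults for the OPP from 1800 MHz upwards (the entries below 1800 MHz are identical in both listings). The dictionaries are copied from the two ParseOPPTables outputs above; `voltage_deltas` is just an illustrative helper.

```python
# cluster2 OPP (MHz -> mV), copied from the default and the proposed listing above.
default_mv = {1800: 850.0, 2016: 925.0, 2208: 987.5, 2256: 1000.0,
              2304: 1000.0, 2352: 1000.0, 2400: 1000.0}
proposed_mv = {1800: 840.0, 2016: 925.0, 2208: 1000.0, 2256: 1012.5,
               2304: 1025.0, 2352: 1037.5, 2400: 1050.0}


def voltage_deltas(old, new):
    """Return {MHz: change in mV} for every OPP present in the old table."""
    return {mhz: new[mhz] - old[mhz] for mhz in sorted(old)}


for mhz, delta in voltage_deltas(default_mv, proposed_mv).items():
    print(f"{mhz} MHz: {delta:+.1f} mV")
```

The proposed curve keeps rising by 12.5 mV per 48 MHz step above 2208 MHz instead of going flat at 1000 mV.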

tkaiser · September 28, 2022, 11:28am

As a quick comparison, this is what such DVFS curves usually look like:

The A15 cores in Exynos 5422 (ODROID-XU4) with Hardkernel’s 5.4 kernel:

Usually the highest DVFS OPP are a bit steeper and for whatever reason two OPP have the same voltage (1200 and 1300 MHz). Whether this is by accident or there is a specific weakness at 1.3 GHz I don’t know…

opp_table0:
   200 MHz   900.0 mV
   300 MHz   900.0 mV
   400 MHz   900.0 mV
   500 MHz   900.0 mV
   600 MHz   900.0 mV
   700 MHz   900.0 mV
   800 MHz   925.0 mV
   900 MHz   950.0 mV
  1000 MHz   975.0 mV
  1100 MHz  1000.0 mV
  1200 MHz  1050.0 mV
  1300 MHz  1050.0 mV
  1400 MHz  1062.5 mV
  1500 MHz  1087.5 mV
  1600 MHz  1125.0 mV
  1700 MHz  1162.5 mV
  1800 MHz  1200.0 mV
  1900 MHz  1250.0 mV
  2000 MHz  1312.5 mV

amazingfate · September 28, 2022, 11:33am

My pvtm value is too low to enable 2256 and 2352, so I just keep them disabled by pvtm without changing their voltages. I’ve updated the device tree overlay; you can try the new version.

tkaiser · September 28, 2022, 12:06pm

Ok, I still adjusted the 2208 OPP (1000 instead of 987.5 mV) and adopted the DT overlay (the Armbian way):

mkdir -m755 /boot/overlay-user
dtc -I dts -O dtb rk3588-increase-opp-microvolt.dts -o /boot/overlay-user/rk3588-increase-opp-microvolt.dtbo
echo "user_overlays=rk3588-increase-opp-microvolt" >>/boot/armbianEnv.txt

One reboot later it looks good:

tk@rock-5b:~$ source sbc-bench.sh ; ParseOPPTables | grep -A31 cluster1-opp-table
   cluster1-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1012.5 mV
      2304 MHz  1025.0 mV
      2352 MHz  1037.5 mV
      2400 MHz  1050.0 mV

   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1012.5 mV
      2304 MHz  1025.0 mV
      2352 MHz  1037.5 mV
      2400 MHz  1050.0 mV

Now measuring the difference these 50mV make… (Netio=192.168.83.72/2 sbc-bench.sh -g 4-7 already fired up)

And the increased voltage does affect the full upper spectrum of DVFS OPP:

Cpufreq OPP: 2400    Measured: 2433 (2433.407/2433.235/2433.178)     (+1.4%)
Cpufreq OPP: 2352    Measured: 2414 (2414.298/2414.186/2414.129)     (+2.6%)
Cpufreq OPP: 2304    Measured: 2395 (2395.265/2395.210/2394.821)     (+3.9%)
Cpufreq OPP: 2256    Measured: 2375 (2375.109/2375.000/2374.945)     (+5.3%)
Cpufreq OPP: 2208    Measured: 2190 (2190.572/2190.572/2190.293)
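The percentages in that output are simply the measured clock relative to the nominal OPP. A minimal sketch of the arithmetic, using the rounded "Measured" values from the lines above (`deviation_percent` is a name made up here, not an sbc-bench function):

```python
def deviation_percent(opp_mhz, measured_mhz):
    """Deviation of the measured clock from the nominal OPP, in percent."""
    return (measured_mhz / opp_mhz - 1) * 100


# (nominal OPP, measured MHz) pairs copied from the output above.
pairs = [(2400, 2433), (2352, 2414), (2304, 2395), (2256, 2375), (2208, 2190)]

for opp, measured in pairs:
    print(f"OPP {opp}: measured {measured} ({deviation_percent(opp, measured):+.1f}%)")
```

The 2208 OPP comes out at about -0.8%; the output above shows no percentage for that line, presumably because the deviation is too small to be flagged.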

tkaiser · September 28, 2022, 1:25pm

Measured with original DVFS OPP table:

tkaiser:

MHz OPP / measured	7-ZIP MIPS	Temp	consumption
408 / 400	2320	25.9°C	2673mW
600 / 600	3423	26.8°C	2913mW
816 / 860	4843	27.5°C	3200mW
1008 / 1050	5995	28.1°C	3336mW
1200 / 1260	7083	28.7°C	3586mW
1416 / 1430	8062	29.0°C	3780mW
1608 / 1630	9030	30.2°C	4036mW
1800 / 1820	9960	31.2°C	4450mW
2016 / 2020	10943	33.9°C	5183mW
2208 / 2190	11733	36.7°C	6106mW
2400 / 2350	12412	39.8°C	7096mW

Now with increased voltages for the upper DVFS OPP (same setup, same PSU, same port on the powermeter, same externally powered fan):

MHz OPP / measured	7-ZIP MIPS	Temp	consumption
408 / 400	2320	25.9°C	2696mW
600 / 600	3428	25.9°C	2930mW
816 / 850	4833	27.1°C	3176mW
1008 / 1050	5997	27.8°C	3423mW
1200 / 1260	7078	28.1°C	3563mW
1416 / 1440	8048	28.7°C	3790mW
1608 / 1630	9020	29.6°C	4040mW
1800 / 1820	9967	31.2°C	4466mW
2016 / 2020	10955	33.3°C	5186mW
2208 / 2190	11757	36.4°C	5960mW
2256 / 2370	12508	39.8°C	7300mW
2304 / 2390	12602	40.4°C	7490mW
2352 / 2400	12641	41.6°C	7760mW
2400 / 2420	12732	42.2°C	7856mW

This is only the A76 clusters working together, no A55 involved. On my board, an increase of up to 50mV makes PVTM generously allow the affected OPP to clock (significantly) higher. The 2256 OPP now already ends up close to 2.4GHz (2370 MHz), and the top OPP gets a 70 MHz increase in clockspeed, a marginal performance increase (12410 -> 12730 7-ZIP MIPS) and a whopping consumption increase of 760mW (7096 -> 7856).

The difference between idle and ‘full load’ consumption is 5.08W with original settings (7096-2020) and 5.81W with increased settings (7856-2050). The performance ‘boost’ is a laughable 320 7-ZIP MIPS (12730-12410).

So to get a ~2.5% ‘boost’ in performance we tried a 5% voltage increase and end up with almost 15% higher consumption.

The higher the Vcore supply voltage, the less efficient the whole thing gets.
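The arithmetic behind those percentages, as a small sketch. All inputs come from the two tables and the idle figures (2020 and 2050 mW) quoted above; `percent_increase` is an illustrative helper.

```python
def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return (after / before - 1) * 100


mips_before, mips_after = 12412, 12732            # 7-ZIP MIPS at the top OPP
load_mw_before = 7096 - 2020                      # full load minus idle, original table
load_mw_after = 7856 - 2050                       # full load minus idle, increased voltages
mv_before, mv_after = 1000.0, 1050.0              # supply voltage of the top OPP

print(f"performance: {percent_increase(mips_before, mips_after):+.1f}%")
print(f"voltage:     {percent_increase(mv_before, mv_after):+.1f}%")
print(f"consumption: {percent_increase(load_mw_before, load_mw_after):+.1f}%")
print(f"MIPS per W:  {mips_before / (load_mw_before / 1000):.0f} -> "
      f"{mips_after / (load_mw_after / 1000):.0f}")
```

Computed from the unrounded table values this gives roughly +2.6% performance for +5% voltage and +14.4% consumption, in line with the figures above.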

PDFs with all data uploaded:

tkaiser · September 28, 2022, 2:12pm

First ‘overclocking’ attempt: https://gist.github.com/ThomasKaiser/68cb5f8c50e0600d5fc15930371df261

The OPP tables are expanded:

tk@rock-5b:~$ source sbc-bench.sh ; ParseOPPTables | grep -A37 cluster1-opp-table
   cluster1-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1007.5 mV
      2304 MHz  1015.0 mV
      2352 MHz  1025.0 mV
      2400 MHz  1040.0 mV
      2448 MHz  1055.0 mV
      2496 MHz  1075.0 mV
      2544 MHz  1100.0 mV

   cluster2-opp-table:
       408 MHz   675.0 mV
       600 MHz   675.0 mV
       816 MHz   675.0 mV
      1008 MHz   675.0 mV
      1200 MHz   675.0 mV
      1416 MHz   725.0 mV
      1608 MHz   762.5 mV
      1800 MHz   850.0 mV
      2016 MHz   925.0 mV
      2208 MHz  1000.0 mV
      2256 MHz  1007.5 mV
      2304 MHz  1015.0 mV
      2352 MHz  1025.0 mV
      2400 MHz  1040.0 mV
      2448 MHz  1055.0 mV
      2496 MHz  1075.0 mV
      2544 MHz  1100.0 mV

But to no avail:

tk@rock-5b:/sys/devices/system/cpu/cpufreq/policy4$ cat cpuinfo_max_freq 
2400000

Most probably I’m missing something simple (or still suffer from not fully understanding PVTM).

amazingfate · September 28, 2022, 3:54pm

The cpu supply of rock5b is limited to 1.05V: https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L267
So you have to increase the supply to 1.1V.

tkaiser · September 28, 2022, 4:48pm

Alright then let’s stay within these 1050 mV limits. I ended up with this OPP table to test:

   408 MHz   675.0 mV
   600 MHz   675.0 mV
   816 MHz   675.0 mV
  1008 MHz   675.0 mV
  1200 MHz   675.0 mV
  1416 MHz   725.0 mV
  1608 MHz   762.5 mV
  1800 MHz   850.0 mV
  2016 MHz   925.0 mV
  2208 MHz  1000.0 mV
  2256 MHz  1002.5 mV
  2304 MHz  1005.0 mV
  2352 MHz  1010.0 mV
  2400 MHz  1020.0 mV
  2448 MHz  1030.0 mV
  2496 MHz  1040.0 mV
  2544 MHz  1050.0 mV

But PVTM at work again:

Cpufreq OPP: 2544    Measured:  394    (395.013/394.986/394.968)    (-84.5%)
Cpufreq OPP: 2496    Measured:  394    (395.004/394.977/394.968)    (-84.2%)
Cpufreq OPP: 2448    Measured:  395    (395.013/395.004/394.986)    (-83.9%)
Cpufreq OPP: 2400    Measured: 2400 (2400.218/2400.107/2399.995)
Cpufreq OPP: 2352    Measured: 2379 (2379.870/2379.760/2379.650)     (+1.1%)
Cpufreq OPP: 2304    Measured: 2369 (2369.661/2369.335/2369.335)     (+2.8%)
Cpufreq OPP: 2256    Measured: 2369 (2369.335/2369.226/2369.063)     (+5.0%)
Cpufreq OPP: 2208    Measured: 2193 (2193.270/2193.223/2193.223)
Cpufreq OPP: 2016    Measured: 2022 (2022.377/2022.179/2022.030)

I already collected such a weird result with highest OPP resulting in ~400 MHz.

Well I guess next step would be to adjust this

	rockchip,pvtm-voltage-sel = <
		0	1595	0
		1596	1615	1
		1616	1640	2
		1641	1675	3
		1676	1710	4
		1711	1743	5
		1744	1776	6
		1777	9999	7
	>;

(or find some way to further tweak the opp-supported-hw values?). Since all of this is of limited use due to excessive consumption increases when leaving ‘default PVTM’ land, I’ll give up for now.

tkaiser · September 28, 2022, 6:12pm

tkaiser:

But PVTM at work again:

Cpufreq OPP: 2544    Measured:  394    (395.013/394.986/394.968)    (-84.5%)
Cpufreq OPP: 2496    Measured:  394    (395.004/394.977/394.968)    (-84.2%)
Cpufreq OPP: 2448    Measured:  395    (395.013/395.004/394.986)    (-83.9%)
Cpufreq OPP: 2400    Measured: 2400 (2400.218/2400.107/2399.995)

I already collected such a weird result with highest OPP resulting in ~400 MHz.

And another one, this time RK3588s (Khadas Edge2): http://sprunge.us/TenXhp

It’s a weak RK3588s getting 2256/2304 MHz max:

      cpu cpu0: pvtm=1475
      cpu cpu0: pvtm-volt-sel=3
      cpu cpu4: pvtm=1700
      cpu cpu4: pvtm-volt-sel=4
      cpu cpu6: pvtm=1711
      cpu cpu6: pvtm-volt-sel=5
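These values can be checked against the rockchip,pvtm-voltage-sel table quoted earlier in the thread. The sketch below assumes each triple means (lowest pvtm, highest pvtm, resulting sel) and that a plain range lookup is all that happens; the real kernel code may do more, and `pvtm_volt_sel` is a hypothetical helper.

```python
# (min pvtm, max pvtm, volt-sel) triples copied from the rockchip,pvtm-voltage-sel
# property quoted earlier in the thread.
PVTM_VOLTAGE_SEL = [
    (0, 1595, 0),
    (1596, 1615, 1),
    (1616, 1640, 2),
    (1641, 1675, 3),
    (1676, 1710, 4),
    (1711, 1743, 5),
    (1744, 1776, 6),
    (1777, 9999, 7),
]


def pvtm_volt_sel(pvtm, table=PVTM_VOLTAGE_SEL):
    """Return the volt-sel whose range contains pvtm, or None if no range matches."""
    for low, high, sel in table:
        if low <= pvtm <= high:
            return sel
    return None


for cpu, pvtm in (("cpu4", 1700), ("cpu6", 1711)):
    print(cpu, pvtm, pvtm_volt_sel(pvtm))
```

The lookup reproduces sel 4 for cpu4 and sel 5 for cpu6. For cpu0 (pvtm=1475) it would return 0 rather than the logged 3, so the A55 cluster presumably uses a different table that is not shown in this thread.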

Idle temp when starting sbc-bench is 49.0°C, and this matters since with more demanding tasks the SoC (or the MCU inside) switches to less than 400 MHz at the highest OPP. Measured prior to benchmark execution:

Cpufreq OPP: 2256    Measured: 2235 (2235.657/2235.512/2235.415)
Cpufreq OPP: 2304    Measured: 2249 (2249.777/2249.679/2249.630)     (-2.4%)

And directly afterwards:

Cpufreq OPP: 2256    Measured:  394    (394.716/394.707/394.689)    (-82.5%)
Cpufreq OPP: 2304    Measured: 2252 (2252.278/2252.180/2252.180)     (-2.3%)

And the benchmark scores themselves prove that both A76 clusters remained at less than 400 MHz almost all the time.

When executing ramlat it’s already obvious that something’s weird. Khadas guys have not yet discovered the dmc/dfi device-tree nodes so they’re running RAM with highest clockspeeds. But cluster1 is obviously at below 400 MHz:

  size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
    4k: 10.14 10.14 10.14 10.14 10.14 10.14 10.14 19.28 
    8k: 10.14 10.14 10.16 10.14 10.14 10.14 10.14 19.75 
   16k: 10.14 10.14 10.14 10.14 10.14 10.14 10.15 19.75 
   32k: 10.14 10.14 10.14 10.14 10.14 10.14 10.14 19.77 
   64k: 10.17 10.16 10.20 10.16 10.18 10.17 10.17 19.81 
  128k: 31.54 31.53 31.52 31.54 31.52 34.96 43.05 76.76 
  256k: 36.60 36.27 36.47 36.31 36.42 36.18 44.75 76.95 
  512k: 49.43 49.09 49.28 49.11 49.22 53.76 69.54 115.5 
 1024k: 60.31 59.44 59.35 59.47 59.14 68.30 90.86 143.2 
 2048k: 68.94 67.80 70.63 67.41 67.62 78.11 106.5 158.5 
 4096k: 115.9 99.16 104.0 92.17 108.0 100.7 119.8 165.2 
 8192k: 175.6 151.1 179.1 156.3 158.8 150.9 155.5 184.7 
16384k: 192.6 191.1 192.2 192.7 189.0 186.3 190.1 209.3

…while cluster2 can run at this time with ~ 2250 MHz:

  size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
    4k: 1.773 1.773 1.773 1.774 1.773 1.773 1.773 3.377 
    8k: 1.773 1.773 1.774 1.773 1.773 1.773 1.774 3.456 
   16k: 1.773 1.773 1.773 1.773 1.773 1.773 1.774 3.456 
   32k: 1.773 1.773 1.773 1.774 1.775 1.773 1.775 3.458 
   64k: 1.774 1.774 1.775 1.775 1.774 1.774 1.775 3.459 
  128k: 5.322 5.322 5.320 5.322 5.320 5.915 7.438 13.43 
  256k: 6.216 6.377 6.207 6.393 6.228 6.232 7.799 13.42 
  512k: 12.47 11.64 12.24 11.66 12.18 12.31 14.08 20.92 
 1024k: 18.19 17.77 17.97 17.76 17.63 17.94 19.94 29.82 
 2048k: 20.02 19.75 19.19 19.77 19.21 20.04 22.25 32.31 
 4096k: 54.32 43.67 52.58 45.32 57.91 45.11 46.73 60.64 
 8192k: 104.2 88.18 102.7 87.68 106.0 86.30 84.14 99.10 
16384k: 122.4 119.1 123.4 120.9 123.2 118.1 112.4 113.7

willy · September 28, 2022, 8:03pm

Pretty interesting. The fact that the lower frequencies got a boost when increasing their voltage setting makes me think that not all bits are used for voltage alone but that maybe 1 or 2 of the lowest bits of the voltage setting are in fact sent to the clock generator as a way to artificially lower the consumption by instead slightly lowering the frequency.

I’ve been surprised at the beginning that they had such a precise Vreg taking steps of 12.5mV. But if you think that instead it takes 50mV steps and that the two remaining bits only configure how many 24 MHz bins to add/remove, it can start to make a lot of sense.
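Purely to illustrate that idea, here is how the voltages from the OPP tables in this thread would split into a 50mV step plus two leftover bits. Nothing here is confirmed behaviour; the constants and `split_voltage` are assumptions made up for the sketch.

```python
COARSE_UV = 50_000   # assumed real regulator step (50 mV)
FINE_UV = 12_500     # step size seen in the OPP tables (12.5 mV)


def split_voltage(microvolts):
    """Split a voltage into (coarse voltage in uV, leftover 12.5 mV steps, 0..3)."""
    coarse_steps, rest = divmod(microvolts, COARSE_UV)
    return coarse_steps * COARSE_UV, rest // FINE_UV


for uv in (987_500, 1_000_000, 1_012_500, 1_025_000, 1_037_500, 1_050_000):
    coarse, fine = split_voltage(uv)
    print(f"{uv / 1000:.1f} mV -> {coarse / 1000:.0f} mV + {fine} fine step(s)")
```

If the guess were right, 1012.5, 1025 and 1037.5 mV would all request the same 1000 mV and differ only in the two low bits.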

amazingfate · September 29, 2022, 2:54am

I changed the cpu supply to 1.5V, and now I can use voltages higher than 1.05V. Here is the sbc-bench result with 1.15V at the 2.4GHz OPP: http://ix.io/4bL8
I failed to change the cpu supply using a device tree overlay, so I just edited the device tree in the kernel source code and compiled a new dtb package.

amazingfate · September 29, 2022, 6:40am

@tkaiser I think clocks above 2.4GHz are locked by the SCMI firmware, as on rk356x. If you look into the dmesg output you can see a lot of errors saying “set clk failed”. So at this moment we can only increase the microvolt value of the 2.4GHz OPP if we want to overclock. If Rockchip releases the source code of ATF we may find another way to unlock clocks above 2.4GHz.

tkaiser · September 29, 2022, 7:05am

Well, frying the SoC at 1500mV (150% of the designed 1000mV supply voltage at ‘full CPU speed’) results in this:

Cpufreq OPP: 2400    Measured: 2530 (2530.882/2530.882/2530.572)     (+5.4%)
Cpufreq OPP: 2400    Measured: 2547 (2548.049/2547.923/2547.860)     (+6.1%)

That’s close to nothing

With only 50mV more instead of 500mV I was able to measure this:

That was just a 5% voltage increase and not 50%! At 1500mV consumption must be really ruined while performance only slightly benefits. Your 7-zip MIPS score today is lower than the one you had months ago with original DVFS settings even if now your CPU clockspeeds are 9% higher:

cpufreq	dmc settings	7-zip MIPS	openssl	memcpy	memset
~2540 MHz	dmc_ondemand (upthreshold: 40)	15090	1448890	10160	28770
~2420 MHz	dmc_ondemand (upthreshold: 20)	16720	1387470	9710	29340
~2310 MHz	performance	16290	1322410	10200	28610

The reason is simple: adjusting the dmc governor (dmc_ondemand with upthreshold=20, my board in the middle row) is a better choice than overvolting/overclocking since it gives you lower idle consumption and better performance at the same time, especially compared to ‘overclocking’, which is horrible from an energy efficiency point of view. The higher the supply voltages, the less efficient the CPU cores.
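One crude way to read the table above is 7-zip MIPS per MHz of A76 clock. The sketch below only divides the numbers from the table; it ignores that 7-zip also runs on the A55 cores, so treat the ratio as a rough indicator. `mips_per_mhz` is an illustrative helper.

```python
# (approximate A76 clock in MHz, dmc settings, 7-zip MIPS), copied from the table above.
rows = [
    (2540, "dmc_ondemand (upthreshold: 40)", 15090),
    (2420, "dmc_ondemand (upthreshold: 20)", 16720),
    (2310, "performance", 16290),
]


def mips_per_mhz(mhz, mips):
    """7-zip MIPS divided by the A76 clock in MHz."""
    return mips / mhz


for mhz, dmc, mips in rows:
    print(f"~{mhz} MHz  {dmc:32s} {mips_per_mhz(mhz, mips):.2f} MIPS/MHz")
```

The overvolted run at ~2540 MHz ends up with the lowest score per MHz of the three, which fits the point that memory settings matter more here than the last few percent of CPU clock.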