ROCK 5B Debug Party Invitation

How and why!
I tried to set a higher voltage, but got this:
[ 1070.744651] vdd_cpu_big0_s0: Restricting voltage, 1025000-1000000uV
[ 1070.744701] vdd_cpu_big0_s0: Restricting voltage, 1025000-1000000uV
[ 1070.744726] cpu cpu4: rockchip_cpufreq_set_volt: failed to set voltage (1025000 1025000 1025000 uV): -22
[ 1070.744746] cpufreq: __target_index: Failed to change cpu frequency: -22

Six times the same value. Now compare that to the opp-microvolt values @amazingfate and I were posting.

Not entirely sure what this Glmark2 testing is about (as @icecream95 already explained, it’s not testing what people expect).

In general I would always test with every governor set to powersave and then again with performance to get the worst and best case. Afterwards, with a somewhat ‘smart’ powermeter, an optimization process could happen to find a good compromise between performance (when needed) and consumption. Only then would the other governors and further settings like ondemand/up_threshold (DMC) or ondemand/io_is_busy (CPU) be explored to find a good balance (as we’ve seen, ondemand/up_threshold=40 isn’t a good choice but nobody cares; same with io_is_busy).
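A minimal sketch of that worst/best-case bracketing, assuming the usual cpufreq sysfs layout (policy numbering differs between SoCs; the dmesg output further down shows three policies on RK3588, for cpu0/cpu4/cpu6):

# worst case: all clusters at the lowest clock
for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do
  echo powersave > "$p"
done
# ...run the benchmark plus powermeter here...
# best case: all clusters at the highest clock
for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do
  echo performance > "$p"
done
# ...run the same benchmark again...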

But without power measurements that generate graphs, this is a process most probably not worth the time and effort.

BTW there’s also /sys/devices/platform/fb000000.gpu/devfreq/fb000000.gpu/ defaulting to simple_ondemand and as such dynamically clocking the GPU between 200 and 1000 MHz. Is it worth measuring ‘GL performance’ without taking this into account?
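For anyone who does want to take it into account, pinning the GPU’s devfreq governor before measuring would be a start; a sketch using the standard devfreq nodes under the path above:

GPU=/sys/devices/platform/fb000000.gpu/devfreq/fb000000.gpu
cat "$GPU/available_governors"      # list what the kernel offers
echo performance > "$GPU/governor"  # pin the GPU to its max clock for the run
cat "$GPU/cur_freq"                 # verify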

You mean I only need to set the values at positions 3 and 6? Like this:
opp-2400000000 { // org 2400000000
	opp-supported-hw = <0xff 0xffff>;
	opp-hz = <0x00 0x8f0d1800>; // org <0x00 0x8f0d1800>
	opp-microvolt = <0xf4240 0xf4240 0xFA3E8 0xf4240 0xf4240 0xFA3E8>;
	clock-latency-ns = <0x9c40>;
};
When I was done, no warning appeared anymore, but the voltage didn’t change, still 1000 mV:
root@firefly2:~# cat /sys/kernel/debug/regulator/regulator_summary | grep big
vdd_cpu_big0_s0 1 3 0 normal 1000mV 0mA 550mV 1050mV
vdd_cpu_big0_s0 0 0mA 0mV 0mV
vdd_cpu_big1_s0 1 3 0 normal 1000mV 0mA 550mV 1050mV
vdd_cpu_big1_s0 0 0mA 0mV 0mV

You’re searching for this, right?

It has far more relevance than the synthetic benchmarks you keep using, which are just walls of constant load.
It was just the first thing I noticed that has a more ‘normal’ load and really shows off how bad the frequency bouncing of the ondemand governor can be.
It could be any of a huge range of normal applications with varying load. The Glmark2 tests also provide bonus info on how the graphics subsystem is working; the score doesn’t really matter, but it is a secondary metric for how the scheduler is doing.

Choose from the range of real applications that many Phoronix or Geekbench suites base their tests on, but don’t use a very synthetic wall of load that doesn’t trigger ondemand’s frequency-bounce inefficiency.
Pick something with a more normal application load profile that does ‘bounce’ around with load.
The choice is yours: Glmark2, TensorFlow Lite ASR or image detection, or some browser tests that do bounce the load.

I have no idea really, as the symmetrical load-based governors don’t seem able to be particularly performant and efficient at the same time on the Arm asymmetric core layouts we have, which is why I was asking if anyone knew anything about sched-capacity & sched-energy.
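For what it’s worth, one can at least inspect what sched-capacity and the energy model know about this SoC; a sketch using mainline sysfs/debugfs nodes (the BSP kernel may not expose all of them):

# per-core capacity as used by the capacity-aware scheduler (1024 = fastest core)
grep . /sys/devices/system/cpu/cpu*/cpu_capacity
# EAS energy model perf domains, if CONFIG_ENERGY_MODEL is enabled
ls /sys/kernel/debug/energy_model/ 2>/dev/null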

Benchmarks don’t mean anything. You have to test your use case and adjust your CPU governor accordingly. A simple DNS and DHCP server doesn’t need the same policy as a media streaming server or a desktop system.

Yep, and that is why we should be benching real apps, which Glmark2 is a closer form of than just some massive block of 7zip load.
That is what I am asking for: common, more desktop-like applications running the way the schedulers will likely see them in practice. Even more server-like applications than just simple DNS & DHCP, as I doubt many will use a Rock 5B solely for that, even if they could.
SuperTuxKart, or maybe the demo level of Quake as one example, or anything bouncy with load that involves multiple subsystems.
I am not taking a swipe at benchmarks, but I am questioning whether the schedulers we commonly use are any good.
The performance scheduler as above didn’t add that many watts to idle, so maybe the answer is just to use performance when not mobile. It’s a question I am asking more than anything else: are there better asymmetric schedulers, and methods to test them?

I had strange results yesterday when increasing the voltage. I got some combinations that worked for some frequencies, but not every time; sometimes after a reboot they were not accepted anymore. I got 2.35 GHz working with 1.037 V at positions 1, 2, 4, 5 and 1.05 V at positions 3 and 6, but couldn’t reproduce it later. I’ve seen that modifying then reverting the DTS resulted in some values no longer working, but I really think it was caused by the reboot instead. I still haven’t figured out the complete extent of this mess yet. After reading the document shared above, I think the issue might be that we’re using too narrow ranges between min and max, and that maybe it’s not possible to configure an exact voltage value that matches. I’ll need to do more tests on this.

For anything non-vsynced or at a moderately high resolution, it’s pretty easy to push the GPU to “1000” (990 for me) MHz. I don’t think that’s a big concern, at least not compared to the CPU clocking itself too low.
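Easy enough to verify during a run, since cur_freq and trans_stat are standard devfreq nodes:

GPU=/sys/devices/platform/fb000000.gpu/devfreq/fb000000.gpu
watch -n 1 cat "$GPU/cur_freq"   # live GPU clock while glmark2 runs
cat "$GPU/trans_stat"            # time spent per OPP afterwards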

If you want both maximum performance and minimum consumption in this silly SBC world (ARM and not x86), you need benchmarks to be able to optimize stuff, since the defaults suck. Benchmarks that represent at least one use case you’re interested in (e.g. ‘server workloads in general’) are the prerequisite to

  • test out tunables/settings (something we as users need to do since all we get from SoC vendors is crap suited for Android use cases)
  • take decisions (wrt scheduling, for example, on hybrid systems: which tasks or interrupts should be pinned to ‘little’ and which to ‘big’ cores, and when do I need a big core because a little one becomes a real-world bottleneck)

As if it would be that easy, and as if there were only the cpufreq governor. It’s about more than this, and settings matter regardless of which governor has been chosen (repeating myself again and again in this thread).

Example 1: clocking LPDDR with the dmc_ondemand memory governor and the upthreshold=40 default --> lower idle consumption at the cost of almost 15% performance loss. upthreshold=25 is better: it keeps idle consumption low but ramps up DRAM clockspeed immediately when needed, and as such retains max performance at min idle consumption.
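A sketch of where that knob lives (path and name as on the RK3588 BSP kernel; both may differ elsewhere):

DMC=/sys/devices/platform/dmc/devfreq/dmc
cat "$DMC/governor"           # dmc_ondemand
echo 25 > "$DMC/upthreshold"  # ramp the DRAM clock up earlier than the default 40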

Example 2: I/O activity: ondemand with io_is_busy wins over schedutil if it’s about maximum I/O performance while keeping consumption at the minimum.
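The CPU-side counterparts are the global cpufreq ondemand tunables, which appear once ondemand is the active governor:

echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy  # count iowait as load
cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold     # 95 by default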

I’ve been testing with schedutil since IIRC kernel 4.6 or something like that, but on ARM I never had any success compared to ondemand with io_is_busy. And again, my use cases are very limited (server workloads) and my main goal is minimum consumption and maximum performance combined. Which is possible, just not with SoC vendors’ defaults, since we’re dealing here with the ‘Android e-waste world’ needing a lot of adjustments for ‘Linux use cases’.

I’m not interested in any of these use cases, so why should I care about this crap? Other than in your perception (‘walls of constant load’), the chosen benchmarks are of actual value in developing settings that combine min consumption and max performance for the use cases I’m interested in, since this is my only goal (if it weren’t, why would I deal with this shitty ‘Android e-waste world’ and the horrible software support situation in the first place?).

I wouldn’t be too surprised if all of this stuff works amazingly well on expensive Android smartphones featuring triple CPU core clusters with Samsung or HiSilicon BSP kernels + userlands (tons of vulnerabilities included, since it’s the SoC vendor’s BSP), while with a mainline kernel you get a different experience.

Though I’ve no clue, since I’m not using anything Android and am still a fan of manually adjusting SMP/IRQ affinity, as for the use cases I’m interested in it makes more sense.

@icecream95 thank you for your ‘security advisory’ wrt this 5.10 BSP kernel. Of course some issues exist with RK BSP kernels for a longer time already: https://github.com/armbian/build/search?q=CONFIG_DRM_IGNORE_IOTCL_PERMIT

There is a problem with your benchmarks when it comes to schedulers such as ondemand.
Ondemand suffers with bouncing load, as the frequency and core assignment get bounced around.
Your benchmark of 100% load stays above ondemand’s 95% threshold, so it sits at max frequency on the first big core and never changes.
It doesn’t actually test how ondemand copes with normal desktop loads, which often have threads changing and lesser, ‘bouncy’ loads, as opposed to your ‘wall of load’.
So the ondemand test you posted as an example, where you did a taskset on cpu 7 and then ran without it, is pretty useless, as ondemand would provide similar results with that type of wall of load.
But hey, provide pointless scheduler results if you wish; maybe someone else will provide a better benchmark of the more natural multi-subsystem loads that common apps produce.
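To illustrate, a crude ‘bouncy’ load one could use instead of a wall of load, assuming GNU timeout and sampling the first big cluster (policy4 on the RK3588):

# ~50% duty cycle: 100 ms of spinning, 100 ms of sleep
( while :; do timeout 0.1 sh -c 'while :; do :; done'; sleep 0.1; done ) &
LOAD=$!
# watch how the governor's frequency decisions bounce around
for i in $(seq 1 20); do
  cat /sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq
  sleep 0.2
done
kill $LOAD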

PS: the RK3588 is a triple CPU core cluster design.

Really, you would think it would maybe create 4x virtual CPUs of big.LITTLE clusters. All I was doing was asking if you had any info, so thanks for the wonderful information :slight_smile:

No idea what you’re talking about. sbc-bench switches to the performance cpufreq governor for a reason in every mode (even in Geekbench and Phoronix modes: -G and -P) to create ‘best case’ numbers. Optimizing ondemand settings afterwards is another manual step I have talked about multiple times here, but obviously to no avail.

If you’re talking about this… then it’s not ‘my benchmark’ but some Phoronix stuff showing how important io_is_busy can be.

BTW: you should team up with @NicoD if you’re obsessed with ‘100% load’. ‘My benchmarks’ (whatever that is supposed to be) aren’t about ‘100% load’, and average load on Linux is misunderstood by most people anyway.

No, I am talking about the useless example you posted the day before.

Also, don’t bring others into some petty mindset of yours; just keep it to what is being discussed.
The answer would seem to be to just use the performance scheduler, but I keep thinking there must be something better than that. The SoC does such a great job that it only adds 0.5 W at idle with the perf sched, so I guess it’s no big deal.

Discussed? The ‘useless example’ was about answering your question whether the scheduler does its job. Comparing the result of 7zr b -mmt=1 with taskset -c 7 7zr b -mmt=1 and getting the same numbers confirms the scheduler does its job correctly and prefers the big cores. Cpufreq scaling is not involved here at all.
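For reference, the comparison in question (7zr from the p7zip package; cpu 7 is one of the big cores):

7zr b -mmt=1                 # scheduler free to choose the core
taskset -c 7 7zr b -mmt=1    # same single-threaded bench pinned to a big core
# near-identical scores in both runs mean the scheduler already
# placed the busy thread on a big core by itself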

You confuse profiles with policies and schedulers with governors. As such, note to myself: in the future only answer Stuart when you want to make fun of him.

What an insane BS conclusion. A really great example of ignorance, especially after having ‘discussed’ plenty of different governors at work here. :slight_smile:

Look, I am not going to entertain forum tennis while you spam your absolutely awful attitude at everybody else. We all get the ‘You’ about you.
The SoC is so efficient that I am not going to bother about it running on the performance governor; in the example where I had the watt meter running it was only 0.5 W more at idle, and there was no BS about that.

Your example of 7zip thrashing a core obviously stays above the threshold, and it stays on a big core just as it would under ondemand, which is why the benchmark score showed little to no difference from the run where you pinned it to a big core with taskset.
So concluding from that example that the ondemand governor is performant and efficient with a wide range of loads is the very definition of BS.

But I will, for the sake of others, call it quits and kindly show them the same respect.

v1.42

rock@rock-5b:~$ dmesg|grep cpu.cpu
[    5.614349] cpu cpu0: leakage=11
[    5.614370] cpu cpu0: Looking up cpu-supply from device tree
[    5.615837] cpu cpu0: pvtm=1440
[    5.615927] cpu cpu0: pvtm-volt-sel=2
[    5.615955] cpu cpu0: Looking up cpu-supply from device tree
[    5.616074] cpu cpu0: Looking up mem-supply from device tree
[    5.616413] cpu cpu4: leakage=9
[    5.616430] cpu cpu4: Looking up cpu-supply from device tree
[    5.622969] cpu cpu4: pvtm=1701
[    5.626926] cpu cpu4: pvtm-volt-sel=4
[    5.626958] cpu cpu4: Looking up cpu-supply from device tree
[    5.627456] cpu cpu4: Looking up mem-supply from device tree
[    5.628172] cpu cpu6: leakage=9
[    5.628188] cpu cpu6: Looking up cpu-supply from device tree
[    5.634664] cpu cpu6: pvtm=1683
[    5.638598] cpu cpu6: pvtm-volt-sel=4
[    5.638621] cpu cpu6: Looking up cpu-supply from device tree
[    5.639116] cpu cpu6: Looking up mem-supply from device tree
[    5.640219] cpu cpu0: avs=0
[    5.640954] cpu cpu4: avs=0
[    5.641666] cpu cpu6: avs=0
[    5.641819] cpu cpu0: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    5.641902] cpu cpu0: EM: created perf domain
[    5.641953] cpu cpu0: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=1608000000 h_table=0
[    5.642377] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[    5.642383] cpu cpu4: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    5.642499] cpu cpu4: EM: created perf domain
[    5.642542] cpu cpu4: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0
[    5.650656] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 2 >= em_perf_state1
[    5.650661] cpu cpu6: EM: hertz/watts ratio non-monotonically decreasing: em_perf_state 3 >= em_perf_state2
[    5.650778] cpu cpu6: EM: created perf domain
[    5.651117] cpu cpu6: l=10000 h=85000 hyst=5000 l_limit=0 h_limit=2208000000 h_table=0

Note that in this case it’s different from an SBC benchmark, since it only tests how the scheduler reacts to a varying load based on its settings. It’s very interesting and important, but totally independent from the hardware, and more related to academic work leading to best practices on how to tune a scheduler for fast response on heterogeneous CPU cores while optimizing energy.

In an ideal world, attempts to study the scheduler’s behavior when facing various workloads would result in good settings allowing ondemand to always be used instead of performance. But we’re far from this situation, and FWIW I never use ondemand on machines whose responsiveness matters to me; I only use performance, starting with my laptop, where intel_pstate is in charge of the scaling and you don’t want extra steps before deciding to boost the frequency.

I’m among the many users who only need two speeds: low for energy savings while idle or typing at the keyboard, and high when doing work (compiling, starting programs, loading a web page, etc.). My goal definitely is race-to-idle: do something fast and stay idle as long as possible. Some people also need intermediate steps for one specific situation, video playback: there’s no point in doing it faster than needed, so better to save more energy by finding the right spot. Maybe on non-x86 systems ondemand may provide something there, I don’t know. But since for me such machines run on very little resources and consume less at full load than any PC at idle, I don’t really care and they can stay with performance :slight_smile:


Yeah, I know, and I also was not having a pop at sbc-bench; it’s just that that one test creates >95% load, so ondemand simply acts like performance there.
I was asking more than making statements really, as I was just watching ondemand in htop and thinking it ain’t that great, and after a brief google I wondered if anyone had figured out the newer schedulers such as EAS.
I get a tad more wattage with race-to-idle and the perf sched/governor or whatever you want to call ’em, but not enough to care about, so no bothers.

Anyway, did you see my v1.42 board that arrived? It is clocking much lower than the initial v1.3 board I got: cpu_MHz=2271.744.
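A quick way to compare the two revisions, assuming the BSP behaviour shown in the dmesg above where the pvtm readout selects the voltage/OPP table:

dmesg | grep pvtm   # per-cluster pvtm value and selected volt table
cat /sys/devices/system/cpu/cpufreq/policy4/cpuinfo_max_freq   # big-cluster ceiling in kHz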