ROCK 3B ideas and discussion

It's more feedback than anything, as I usually give whatever Radxa is offering a go, but I've always felt the RK356x SoCs were extremely application-focused whilst the big dog RK3588 is the general-purpose desktop.

I could probably use it if it's B-key (I thought it was A+E), but I'm still undecided, as that port would be SATA II as opposed to the others being SATA III; I guess all would drop to the lower port speed, or at least be queuing and waiting for it.
I just don't like the idea of the Pi-like format.

That port will be SATA3 even if you use muxed SATA via a simple adapter, let alone one of the PCIe to SATA3 controllers linked above.

you're right, I'd forgotten that it has 8GB max, better to have onboard RAM :slight_smile:

I just did a quick search on Geekbench; the best scores I can find are approximately these:
Pi3+ r1.3 @1.4GHz = 120/350 (80%/78%)
RK3568 @2GHz = 150/450 (100%/100%)
Pi4 r1.1 @1.5GHz = 230/680 (153%/151%) <-- have it running 1.8GHz passively cooled
Pi4 r1.4 @1.8GHz = 270/740 (180%/164%)
RK3399 @1.4GHz = 270/750 (180%/167%)
RK3588 @1.8GHz = 500/2000 (333%/444%)

Seems like Pi is quite a tad faster, though it has no crypto engine; currently Pi 4 4GB is around US$100, and I think Rock 3 with 4GB RAM only should be quite a bit cheaper…

Not sure that QSGMII can be used for 2x2.5GbE at all… normally it is used for 4xGbE, but I doubt Rock 3B would have that.

Sure? Why not take 183/589 instead?

Which use case does Geekbench represent? And what do these numbers mean? Those combined scores?

How is this huge result variation possible given that Geekbench claims to benchmark hardware?

Why do you list RK3588 with 1.8 GHz?

I am not very sure if you really know how benchmarking works…

Anyway, variation happens when one uses a different OS, hardware revision, config/DTS, and sometimes even different connected devices. Typical benchmarking requires a CONTROLLED environment, and I don't think that is easy to come by, like a temperature/humidity-controlled room.

Back to the "comment": the results I used are based on Android 12, while the one you stated is based on Ubuntu 20.04 LTS, that's why. And why 1.8GHz for RK3588? You may go check it out yourself.

https://browser.geekbench.com/v5/cpu/search?utf8=✓&q=rk3588

And perhaps I should just focus on INT performance too…

LOL! Too funny! :slight_smile:

I did exactly that. Of course not by relying on this Geekbench garbage but by examining SoC and software behaviour.

And on RK3588 (as well as RK3566/RK3568) we’re dealing with PVTM which usually results in the rather uninteresting little cores clocking in at 1800 MHz but the more important A76 ones being clocked up to 2400 MHz: https://github.com/ThomasKaiser/Knowledge/blob/master/articles/Quick_Preview_of_ROCK_5B.md#pvtm

We’ve recently seen some comical PVTM failure resulting in the A76 cores being clocked at only 400 MHz.

Meanwhile Geekbench is stupid enough to rely on sysfs entries that merely claim certain clockspeeds. And the 1800 MHz you were fooled by is due to Geekbench presenting the clockspeed of cpu0, which on ARM is usually a little core. Geekbench just generates some funny numbers, and its target audience usually doesn't give a sh*t but blindly trusts these numbers.

Then maybe you should better look at my sbc-bench than this Geekbench garbage…

sbc-bench tries to generate insights and not just numbers. See these two Rock 3A results:

http://ix.io/40TX: 7-zip multi-threaded score of 5110
http://ix.io/48eg: 7-zip multi-threaded score of just 2690

How is that possible? Fortunately sbc-bench results also contain the answers. It's not just silicon variation (the RK3568 in the 2nd test clocks severely lower than 2000 MHz) but also broken thermal trip points AKA bad settings.

As for results variations with RK3568 we have

  • PVTM
  • boot BLOBs that do DRAM initialization (I have a couple of older sbc-bench results in my collection with really bad memory performance)
  • relevant kernel config like CONFIG_HZ or thermal trip points resulting in moderate up to absurd throttling like above
  • different environmental conditions like temperature
  • background activity that ruins benchmark numbers

Geekbench doesn't care about any of these; it just generates and uploads random numbers.


the result I used are based on Android 12, while the one you stated is based on Ubuntu 20.04 LTS, that’s why.

LOL! Too funny! :slight_smile:

:smiley:

Sure, if you have the time to go into all of this, good for you; I used to play with benchmarking when I was in college (yep, they have a class for that too), but that was many years ago and I don't have time to "study all these" anymore.

And of course Geekbench does not care… how they would support a SoC is beyond me, but there are so many SoCs and SBCs in the world that I don't think it is economical for them to "support everything". Perhaps the best way to benchmark is to make it work like memtest86, where the system only runs what it's intended to (benchmarking), and no more.

Obviously Geekbench is targeting a much wider audience than SBCs, so can you tell how a Rock 5 using a cheap PCIe x4 SSD fares against an M2 MacBook Air? Or just some old tablet running an Intel Z8350 dual-booting Windows 10 and Android? With your benchmark? Different benchmarks serve different purposes, so I won't just call them crap unless they deliberately lie.

There is no need to ‘support’ a SoC. It’s due diligence when generating these numbers. And Geekbench sucks here big time.

They don’t give a sh*t about a sane benchmarking environment and deliberately upload garbage numbers to their website even if those are just numbers without meaning.

It is not hard to

  • measure CPU clockspeeds instead of trusting some (often faked) sysfs entry. An 'amateur' like me uses @willy's mhz utility for this in sbc-bench. Geekbench clockspeed reporting is pure BS. They report 2700MHz or 4000MHz for the same Core i7 in my old MacBook Pro depending on whether the benchmark runs on Windows or Linux. ARM's big.LITTLE has been around for a decade now but Geekbench doesn't care. They (try to) measure single-threaded performance on a big core but report the little core's sysfs clockspeed. Stupid, isn't it?
  • check background activity. It’s easy to spot this and to refuse to upload crappy numbers if this happened
  • check throttling or thermals in general. It’s easy to spot this and to refuse to upload crappy numbers if this happened
  • get the CPU's cluster details, report them correctly, and also test cores of all clusters individually. If 'amateurs' like me are able to do this in sbc-bench, those Geekbench guys should be able to do it as well. After all, benchmarking is their only job :wink:
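None of these checks is hard to sketch. Below is a minimal, hypothetical illustration (function names are invented, not from sbc-bench or any real tool) of the "refuse to publish garbage" idea: time the workload several times and discard the result if the run-to-run spread suggests throttling or background activity:

```python
import statistics
import time

def measure(workload, runs=5):
    """Time a workload several times, returning all durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations

def trustworthy(durations, max_cv=0.05):
    """Accept results only if the coefficient of variation stays small.

    A spread above ~5% between identical runs usually means throttling
    or background activity ruined the measurement, so refuse to report it.
    """
    cv = statistics.stdev(durations) / statistics.mean(durations)
    return cv <= max_cv
```

A real tool would additionally sample clockspeeds and thermals while the workload runs, as sbc-bench does; this only shows how cheap the basic sanity check is.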

Use case first! Always!

Which use case does Geekbench represent other than… none? I have asked various people this many times and never got an answer. Which f*cking use case do these Geekbench combined scores represent?

With a ‘use case first’ approach it’s easy: I need macOS for my main computer so regardless of any performance considerations it’s the MacBook.

Then if the use case is ‘use this thing as desktop computer’ the important stuff is

  • high random I/O performance of the boot drive (the drive the OS is installed on)
  • as much graphics acceleration as possible (both GPU and VPU)
  • sufficient amount of RAM since today's software is a sh*tload of complexity (crappy software stacking layers of complexity one over another) needing tons of RAM. Once the machine starts to page out/swap, especially on storage with low random I/O performance, it's game over

What will Geekbench tell you about this? Nothing, since it only focuses on (multi-threaded) CPU performance, which matters for totally different use cases. The target audience is consumers not willing/able to think about what's important, just interested in numbers/graphs and the 'less is better' or 'more is better' label so they can compare $whatever without using their brain.

When talking about an M2 MacBook… why don’t you talk about an M1 MacBook Air?

You can get them less expensive (secondhand/refurbished) compared to the recent M2 model. Geekbench will tell you that there’s only a minor difference in ‘performance’ but if your use case is video editing and you’re stuck to the Apple ecosystem then there’s a massive performance difference since M2 has an Apple ProRes encoder/decoder ‘in hardware’.

It’s not my benchmark. sbc-bench is just a primitive tool that executes a small number of well established benchmarks in a controlled environment to be able to trash all those results that are garbage. It’s designed to get some real insights and not just numbers/graphs. IMHO developing a new benchmark is almost as stupid as developing cryptography if you’re not an absolute expert like @NicoD

Benchmarking is part of my day job and honestly 95%+ of all my benchmarking results need to be trashed since something went wrong (some detail you overlooked, some strange background activity, something you need to take into account since you learned something important along the way, and so on). Compare with Geekbench or the Phoronix crap: both generate a cemetery of broken numbers. But this is part of their business model, so why should this change?

As long as users are happy swallowing these BS numbers it works for them…


I have to agree with @tkaiser here. Geekbench is widely known for reporting random numbers that depend on the OS, its version, Geekbench's version, plenty of other variables nobody knows about, and probably the user's mood and expectations, all without accurately reporting the execution conditions. Over the years we've read so many garbage values that its metric is not even a reliable one to validate how Geekbench itself would perform on this or that machine, in the unlikely case that your target use case would only be to run Geekbench on the machine. As to how this translates to real-world use cases… honestly, dickbench is only useful to compare one's to others and claim "mine is bigger".

And indeed, benchmark serves to provide a figure. The most important part of a benchmark is that it is reproducible. There’s no problem if reproducing it requires 10 pages of setup, provided that at the end you get the same results again, as it proves you’ve got the model right. That definitely requires eliminating all background noise or at least limiting it to a known level. Your indications seem extremely far from this at the moment, with only random elements being extracted to characterize the environment, with the most important one (CPU frequency) being reported wrong on a machine when it can vary 6-fold.

As a hint, a benchmark that only returns one value is garbage and totally useless. You need to find multiple dimensions and figure which one best matches your use case (i.e. mostly I/O, mostly CPU, mostly RAM, mostly GPU etc), and make sure about the condition they were executed in.

You don’t need to create a new benchmark, running a specific application that matches your use case can be a perfectly valid benchmark for your use case. You just need to limit undesired variations. For example I’m interested in compilation times and for this I’m keeping an old tar.gz of some software relevant to me, and build it with a binary build of the compiler I used on all benchmarks. That doesn’t prevent me from testing other combinations but at least I can compare the machines’ performance on this specific test, knowing that there are limits (e.g. number of cores etc).
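That fixed-tarball compile test can be wrapped in a few lines; everything below (archive name, compiler path, make flags) is a placeholder to replace with your own pinned toolchain and source:

```python
import subprocess
import time

def timed_build(commands, workdir="."):
    """Run a fixed sequence of build commands, returning wall-clock seconds.

    Comparability comes from keeping the inputs constant: the same source
    tarball and the same compiler binary on every machine under test.
    """
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, cwd=workdir, check=True)
    return time.perf_counter() - start

# Placeholder build steps -- substitute your own tarball and compiler:
# elapsed = timed_build([
#     ["tar", "xzf", "myproject-1.0.tar.gz"],
#     ["make", "-C", "myproject-1.0", "CC=/opt/pinned-gcc/bin/gcc", "-j4"],
# ])
```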


@willy @tkaiser - everything you complain about regarding GB is true, and it's easy to find out that the produced results are not super reliable. It's easy to get much lower scores, and there are sometimes manipulated scores uploaded to the GB browser (like 80k results from cellphones, much better than a big & hungry AMD EPYC).
Right now the best scores are flooded with some AMD engineering samples with 192 cores. Is that something real, or yet again some fun from developers or bored users? We also know nothing about any other result: was the CPU overclocked, was it cooled and how? All of those are unknown (as well as many other very important things), and this can alter the results.
We all know that GB is also outdated and not precise about "cores". Right now we have big.LITTLE cores, including in the newest Intel CPUs. Software (system and kernel) can also limit some numbers, and it's not clear whether you lack some feature or it's just turned off / misconfigured. It's probably easier to get scores that are too low than too high.
So what do you expect from this benchmark? Should it just split "single/multi core" into something like "single small, single big, all cores"? Some results depend on storage, so maybe it should benchmark that too? And what about the kernel and system? I think you will never have a precise number that represents those resources. Also, if it's worth changing the method now, sooner or later there should be something like Geekbench 6.

I saw the discussion at sbc-bench about a results web interface. Even when most tests won't use the full CPU, or have some bottlenecks, they still represent some state. For the same reason GB scores are somewhat useful, plus they're easy to view, save, compare, etc. It's probably easier to get a score that's too low than too high. One number alone means almost nothing if you can't compare it to something else, like another SBC, another OS, other components. The real advantage is when you find that your setup performs much slower than most others: you get some idea that something is limiting your score and can maybe fix it (or not, if you find out it doesn't affect your specific usage). The more recorded results, the higher the probability that someone has the same setup as you.

So the GB score is not perfect and I don't expect it to be. It gives some overview, something to compare wisely, like pi-benchmarks for storage gives some idea of what you can get with certain SD cards or M.2 drives via adapters. The original topic was about the Rock 3A's 4 small cores vs. the older big cores of the Pi 4, and that can be checked there via some results; some day maybe small cores like the Cortex A59 will be better and much more energy efficient than those of the Pi 4.
For now I'd say the RK3568 is rather slower than the Pi 4, and that is visible in the results. You could get both boards, perform any task you want (like @willy's tar.gz), measure the time and be sure. You can also ask for those files and the .sh script and find out yourself, but I think there is something similar inside GB's subtasks, and you can compare that task alone. It's just easier to browse results than to look for someone who has that board/computer, etc.

In a 'cemetery of bogus numbers', which is what the Geekbench browser relies on (and which also applies to the PTS stuff over at openbenchmarking.org).

How do you explain @enoch coming up with these GB single/multi scores:

While realistic scores for RK3568 are much higher, e.g. 183/589 instead. He spent/wasted some time to create a comparison table based on BS numbers.

The main problem is Geekbench allowing to upload garbage. But that’s part of their business model so nothing will change.

Want to search for RK3588? A whole 2 results today! RK3588S: a whopping 10 hits! So 12 combined!

While in reality it were already 205 a month ago: https://github.com/ThomasKaiser/sbc-bench/blob/master/results/geekbench-rk3588/results.md (this was an attempt to do some data mining in the garbage bin – I was interested in RK3588 PVTM variation).

They even manage to hide the garbage they collect…

One last time about this Geekbench garbage.

The ‘natural’ clockspeed of RPi 4 is 1800 MHz (the 1500 MHz limitation applied only to the early BCM2711 Rev. B0 SoCs). Let’s assume BCM2711 C0 with natural clockspeed gets a GB 280/730 score.

Ok, we shouldn’t assume but browse the results: https://browser.geekbench.com/search?q=bcm2711

First surprise: RPi 4 when running in Linux has only 1 CPU core while being equipped with 4 when running Android!

Then looking at 1st results page only we see these scores for 1800 MHz:

  • 184/280
  • 194/429
  • 210/529
  • 212/637
  • 193/451
  • 206/450

Impressive, since these are far, far away from the assumed 280/730 score. And the result variation is so high that we already know we're not dealing with a benchmark but with a random number generator.

Anyway: back to the assumed 280/730 score. If we compare with RK3568 and use 180/590 instead, the A55 are at 65% of the A72 with single-threaded tasks but 80% with tasks utilizing all 4 cores.

  • BCM2711 achieves a 2.6 ratio when comparing single-threaded with multi-threaded (with silly synthetic benchmarks we would see a 4.0 ratio when comparing one core with 4)
  • RK3568 achieves a 3.3 ratio when comparing single-threaded with multi-threaded

So RK3568 scales 1.26 times better when all cores are busy compared to BCM2711, which hints at the latter being a SoC suffering from internal bottlenecks. And if we take this 1.26 factor and look at the multi-threaded efficiency (80% / 1.26 = 64%) we see that the GB results are at least internally consistent.
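Spelled out (using the assumed 280/730 and 180/590 scores from above, not fresh measurements):

```python
# Assumed (single, multi) Geekbench-style scores from the discussion above.
bcm2711 = (280, 730)   # RPi 4, BCM2711, 4 x Cortex-A72
rk3568  = (180, 590)   # Rock 3A, RK3568, 4 x Cortex-A55

def mt_scaling(single, multi):
    """Multi-threaded score divided by single-threaded score.

    Ideal scaling on four cores would be 4.0; internal bottlenecks
    (memory, interconnect) push the ratio down.
    """
    return multi / single

bcm_ratio = mt_scaling(*bcm2711)       # ~2.6
rk_ratio  = mt_scaling(*rk3568)        # ~3.3
advantage = rk_ratio / bcm_ratio       # ~1.26 in favour of RK3568
```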

But which use case do Geekbench combined scores represent? Exactly: none.

When we look at some real-world task like compression/decompression with a specific algorithm (e.g. 7-zip's internal benchmark, whose multi-threaded score is a rough representation of 'server workloads in general'), the A55 are at 83% of the A72 with single-threaded tasks but 91% with tasks utilizing all 4 cores (see below). I'm comparing RPi 4 at 1.8 GHz with Rock 3A at 2.0 GHz ('natural' clockspeeds, no 'overclocking' or the like).

While @enoch got fooled by the Geekbench result browser into believing the A55 are at 56% of the A72 with single-threaded tasks but 61% with tasks utilizing all 4 cores.

56% vs. 83% single-threaded and 61% vs. 91% multi-threaded. By relying on numbers without meaning.
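The 83%/91% figures fall straight out of the 'Tot' MIPS lines of the 7-zip outputs below; a quick cross-check:

```python
# 'Tot' MIPS values copied from the p7zip runs below.
a72_single = 1752   # 1 x Cortex-A72 @ 1800 MHz (BCM2711)
a55_single = 1450   # 1 x Cortex-A55 @ 2000 MHz (RK3568)
a72_multi  = 5668   # 4 x Cortex-A72 @ 1800 MHz
a55_multi  = 5163   # 4 x Cortex-A55 @ 2000 MHz

def pct(a55, a72):
    """A55 throughput as a whole-number percentage of A72 throughput."""
    return round(100 * a55 / a72)

single_pct = pct(a55_single, a72_single)  # 83
multi_pct  = pct(a55_multi, a72_multi)    # 91
```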

Single-threaded 7-zip

1 x Cortex-A72 @ 2000 MHz (BCM2711)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:    3794 MB,  # CPU hardware threads:   4
RAM usage:    435 MB,  # Benchmark threads:      1

                      Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
        KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1675   100   1631   1630  |      24872   100   2125   2124
23:       1605   100   1637   1636  |      24365   100   2111   2109
24:       1527   100   1642   1642  |      23919   100   2100   2100
25:       1420   100   1622   1622  |      23271   100   2072   2071
----------------------------------  | ------------------------------
Avr:             100   1633   1632  |              100   2102   2101
Tot:             100   1868   1867

1 x Cortex-A72 @ 1800 MHz (BCM2711)

Executing benchmark single-threaded on cpu0 (Cortex-A72)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - 64000000 - - - - - - -

RAM size:     958 MB,  # CPU hardware threads:   4
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1688   100   1643   1643  |      22726   100   1941   1940
23:       1531   100   1561   1561  |      22395   100   1939   1939
24:       1442   100   1551   1551  |      21999   100   1932   1931
25:       1351   100   1543   1543  |      21476   100   1912   1912
----------------------------------  | ------------------------------
Avr:             100   1575   1574  |              100   1931   1930
Tot:             100   1753   1752

1 x Cortex-A55 @ 2000 MHz (RK3568)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:    3924 MB,  # CPU hardware threads:   4
RAM usage:    435 MB,  # Benchmark threads:      1

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1039   100   1011   1011  |      22872   100   1953   1953
23:        955   100    974    974  |      22343   100   1934   1934
24:        906   100    974    974  |      21790   100   1913   1913
25:        844   100    965    965  |      21048   100   1874   1873
----------------------------------  | ------------------------------
Avr:             100    981    981  |              100   1919   1918
Tot:             100   1450   1450

Multi-threaded 7-zip

4 x Cortex-A72 @ 2000 MHz (BCM2711)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:    3794 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       4257   348   1188   4141  |      94921   396   2043   8098
23:       3910   367   1086   3984  |      92508   395   2024   8004
24:       3892   374   1120   4185  |      90618   396   2009   7955
25:       3824   370   1179   4366  |      88417   397   1983   7869
----------------------------------  | ------------------------------
Avr:             365   1143   4169  |              396   2015   7982
Tot:             381   1579   6075

4 x Cortex-A72 @ 1800 MHz (BCM2711)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - - - - - - - -

RAM size:     958 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       4232   350   1177   4117  |      88569   399   1894   7556
23:       4030   359   1145   4107  |      87096   399   1887   7536
24:       3978   372   1151   4278  |      85017   398   1877   7463
25:       2601   307    968   2970  |      82219   399   1834   7317
----------------------------------  | ------------------------------
Avr:             347   1110   3868  |              399   1873   7468
Tot:             373   1491   5668

4 x Cortex-A55 @ 2000 MHz (RK3568)

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)

LE
CPU Freq: - - 64000000 64000000 - - - - -

RAM size:    3924 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       3115   359    845   3031  |      86765   399   1857   7402
23:       2966   367    824   3022  |      84797   399   1838   7337
24:       2903   378    826   3122  |      82388   399   1811   7233
25:       2759   383    823   3151  |      78741   395   1772   7008
----------------------------------  | ------------------------------
Avr:             372    829   3081  |              398   1820   7245
Tot:             385   1325   5163

What do you think "realistic" is?
You easily found out that the scores are not consistent and easy to reproduce. You can get some picture if there are many results at some level (still not all of them). Of course they are not perfect, but they test a few things and give some vision of the whole set (software and hardware). It's easy to get BS, but they also give some insights, let's say on pi number calculation :slight_smile:

Yes, I totally agree that results for Android are usually higher than those for Ubuntu. Maybe something is optimized there so it gives a better score. Of course the information about the cores is not correct.
Overclocking makes all results much harder to compare, but it's still some information, because not every board is able to do that. I try to avoid such scores because they usually require additional (or just bigger) cooling and can cause stability issues, but some people can just overclock and be happy.

This may be something related to the SoC or memory (or any resource). In the real world the Pi would probably get stuck on slower disk I/O when doing zips. Sure, the newer RK3568 can make better use of its multithreading :slight_smile:

This is true, those are just numbers, and you can't directly compare them and judge that one is better by those percentages. But you can probably find a result for the same OS and configuration, compare it with yours, and get some idea of how it can perform.

The main problem is precisely that such tools tend to provide a single dimension so that anyone can proudly compare with their friends, but that's the root of the problem. What if one test got significantly slowed down by eMMC accesses, or even by unattended_upgrade running in the background, ruining the overall score?

One needs to understand what their expected target application will look like and test something similar. That will always be better. And that also goes for the thread count, by the way. Relying on a multi-threaded result for a single-threaded application makes no sense, and conversely.

The example above with 7z decompression shows something we've known for a while now, which is that the ARMv8.2-and-above memory controller found in the A55/A76 etc. is way ahead of the older one found in the A53/A72, to the point that the A55 can sometimes beat the A72. If your workload consists of looking up or converting raw data (or even encryption) then you must not trust GB's output at all and should use the RK356x over the BCM2711 (particularly for crypto, by the way).

As long as some people will be looking for a single metric it will remain pointless.


Remember what @enoch said:

‘His’ Android scores were way lower than ‘my’ Linux result.

I tried out the Geekbench garbage now (first time in my life) on my Rock 5B. Initially without any modifications to the system: https://browser.geekbench.com/v5/cpu/16865661

The cpufreq governor was ondemand; for the reported clockspeeds see below [1]. These tests are so short, and GB doesn't give a sh*t about switching to performance first, that this alone already explains result variation. Cpufreq settings that adjust clockspeeds slowly will end up with crappy GB scores by design.

With performance governor on RK3588 results are (obviously) slightly better: https://browser.geekbench.com/v5/cpu/compare/16865735?baseline=16865661

But with slower SoCs and more unfortunate settings, this behaviour of not switching the cpufreq governor to performance prior to 'benchmarking' already explains a lot of this results garbage.
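A pre-flight check for exactly this failure mode costs a handful of lines. This sketch walks the standard Linux cpufreq sysfs tree (paths as documented in the kernel's cpufreq sysfs ABI; the helper names are my own invention) and only gives a green light when every policy runs the performance governor:

```python
import glob
import os

def read_governors(base="/sys/devices/system/cpu/cpufreq"):
    """Map each cpufreq policy (policy0, policy4, ...) to its governor."""
    governors = {}
    for policy in sorted(glob.glob(os.path.join(base, "policy*"))):
        with open(os.path.join(policy, "scaling_governor")) as f:
            governors[os.path.basename(policy)] = f.read().strip()
    return governors

def ready_to_benchmark(governors):
    """True only when every cluster is pinned to the performance governor."""
    return bool(governors) and all(
        g == "performance" for g in governors.values()
    )
```

Run `read_governors()` before starting a benchmark and abort (or echo `performance` into each `scaling_governor` as root) when `ready_to_benchmark()` returns False.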

Let’s compare my results with prior scores collected by Geekbench browser… not possible since searching there doesn’t reveal anything more recent than 9th of July this year. :slight_smile:

So let’s look at my redacted results collection: https://github.com/ThomasKaiser/sbc-bench/blob/master/results/geekbench-rk3588/results-redacted.md

My score is 587/2462. The multi score is higher than anything there (all results were made with Android since, Geekbench being that sort of crap, you can't search for Linux results [2]). But it can also clearly be seen that the first two results in my list (made internally by Rockchip) show better single-threaded results. Most probably the result of Rockchip engineers back then playing with DRAM timings and optimizing the sh*t out of them: https://browser.geekbench.com/v5/cpu/compare/11560856?baseline=16865735

Of course later they went with conservative memory settings when their customers started to play with RK3588 and their own designs.

Maybe that's the reason I always immediately get angry at Geekbench: the cluelessness of its fan base. A decade ago, when Apple still relied on Intel CPUs, they phased out the quad-core i7 in their Mac Minis and continued only with dual-core i5s, which got lower multi-core scores.

And 'Mac admins' got mad and ordered old quad-core i7 ('server') Mac Minis for their average users, which were way worse for the use case where single-threaded performance mattered more (much better on the new i5 model compared to the older i7), and especially GPGPU performance. Back then even parts of the window manager already ran on the (i)GPU, but those stupid Geekbench numbers fooled clueless admins into ordering older (and slower) machines for their unfortunate users.

[1] sbc-bench -m output while running Geekbench:

Rockchip RK3588 (35880000), Kernel: aarch64, Userland: arm64
CPU sysfs topology (clusters, cpufreq members, clockspeeds)
                 cpufreq   min    max
 CPU    cluster  policy   speed  speed   core type
  0        0        0      408    1800   Cortex-A55 / r2p0
  1        0        0      408    1800   Cortex-A55 / r2p0
  2        0        0      408    1800   Cortex-A55 / r2p0
  3        0        0      408    1800   Cortex-A55 / r2p0
  4        1        4      408    2400   Cortex-A76 / r4p0
  5        1        4      408    2400   Cortex-A76 / r4p0
  6        2        6      408    2400   Cortex-A76 / r4p0
  7        2        6      408    2400   Cortex-A76 / r4p0

Thermal source: /sys/devices/virtual/thermal/thermal_zone0/ (soc-thermal)

Time       big.LITTLE   load %cpu %sys %usr %nice %io %irq   Temp
19:07:11:  408/1800MHz  0.09   1%   0%   0%   0%   0%   0%  29.6°C
19:07:16:  408/ 408MHz  0.08   0%   0%   0%   0%   0%   0%  28.7°C
19:07:21:  408/ 408MHz  0.08   0%   0%   0%   0%   0%   0%  28.7°C
19:07:26:  408/1800MHz  0.07   1%   1%   0%   0%   0%   0%  28.7°C
19:07:31:  408/ 816MHz  0.06   0%   0%   0%   0%   0%   0%  28.7°C
19:07:37:  408/ 408MHz  0.06   0%   0%   0%   0%   0%   0%  27.8°C
19:07:42:  408/ 408MHz  0.05   0%   0%   0%   0%   0%   0%  28.7°C
19:07:47:  408/ 408MHz  0.05   0%   0%   0%   0%   0%   0%  28.7°C
19:07:52:  408/1800MHz  0.13   9%   1%   8%   0%   0%   0%  29.6°C
19:07:57:  816/1800MHz  0.20  12%   0%  12%   0%   0%   0%  30.5°C
19:08:02:  408/1800MHz  0.26  10%   0%   9%   0%   0%   0%  30.5°C
19:08:07:  408/1800MHz  0.32  10%   0%   9%   0%   0%   0%  30.5°C
19:08:12:  408/1800MHz  0.37  12%   0%  12%   0%   0%   0%  31.5°C
19:08:17:  408/ 600MHz  0.43   5%   0%   5%   0%   0%   0%  29.6°C
19:08:22:  408/1800MHz  0.47   7%   0%   7%   0%   0%   0%  30.5°C
19:08:28:  408/ 600MHz  0.51   8%   0%   8%   0%   0%   0%  29.6°C
19:08:33:  408/1800MHz  0.55  11%   0%  11%   0%   0%   0%  31.5°C
19:08:38:  408/1800MHz  0.59  12%   0%  12%   0%   0%   0%  31.5°C
19:08:43:  408/1800MHz  0.54   7%   0%   7%   0%   0%   0%  32.4°C
19:08:48:  408/1800MHz  0.58  10%   0%   9%   0%   0%   0%  31.5°C
19:08:53:  408/1800MHz  0.61  10%   0%   9%   0%   0%   0%  31.5°C
19:08:58:  408/1800MHz  0.56   7%   0%   7%   0%   0%   0%  31.5°C
19:09:03:  408/1800MHz  0.60  10%   0%  10%   0%   0%   0%  31.5°C
19:09:08:  408/1800MHz  0.63  10%   0%   9%   0%   0%   0%  32.4°C
19:09:13:  408/1800MHz  0.66  12%   0%  12%   0%   0%   0%  32.4°C
19:09:18:  408/1800MHz  0.61  10%   0%  10%   0%   0%   0%  32.4°C
19:09:23: 2400/1800MHz  0.64   9%   0%   8%   0%   0%   0%  35.2°C
19:09:28: 2400/1800MHz  0.59  71%   0%  70%   0%   0%   0%  35.2°C
19:09:33: 2400/1800MHz  0.86  80%   0%  80%   0%   0%   0%  36.1°C
19:09:38: 2400/1800MHz  1.43  80%   0%  80%   0%   0%   0%  37.0°C
19:09:43: 2400/1800MHz  1.96  85%   0%  85%   0%   0%   0%  37.9°C
19:09:48: 2400/1800MHz  2.44  65%   0%  65%   0%   0%   0%  34.2°C
19:09:54:  408/1800MHz  2.89  59%   0%  58%   0%   0%   0%  35.2°C
19:09:59: 2400/1800MHz  2.74  55%   0%  55%   0%   0%   0%  40.7°C
19:10:04:  408/ 408MHz  3.16  63%   0%  63%   0%   0%   0%  36.1°C
19:10:09:  408/ 600MHz  2.67  41%   0%  41%   0%   0%   0%  36.1°C
19:10:14: 1416/1800MHz  2.46  52%   1%  50%   0%   0%   0%  37.9°C
19:10:19: 2400/1800MHz  2.90  62%   0%  62%   0%   0%   0%  41.6°C
19:10:24: 2400/1800MHz  2.67  42%   1%  40%   0%   0%   0%  38.8°C
19:10:29: 2400/1800MHz  2.78  62%   0%  61%   0%   0%   0%  41.6°C
19:10:34:  408/ 408MHz  2.55  32%   0%  32%   0%   0%   0%  36.1°C
19:10:39: 2400/1800MHz  2.99  58%   1%  56%   0%   0%   0%  39.8°C
19:10:44: 2400/1800MHz  3.47  60%   0%  60%   0%   0%   0%  41.6°C
19:10:49: 2400/1800MHz  3.51  61%   0%  61%   0%   0%   0%  42.5°C
19:10:55: 2400/1800MHz  3.55  79%   0%  78%   0%   0%   0%  45.3°C
19:11:00: 2400/1800MHz  3.27  44%   0%  44%   0%   0%   0%  44.4°C
19:11:05: 2400/1800MHz  3.65  73%   0%  72%   0%   0%   0%  44.4°C
19:11:10: 2400/1800MHz  4.00  84%   0%  83%   0%   0%   0%  46.2°C
19:11:15:  408/ 600MHz  3.92  55%   0%  54%   0%   0%   0%  39.8°C
19:11:20:  408/1800MHz  3.68  11%   0%  11%   0%   0%   0%  37.9°C
19:11:25: 2400/1800MHz  3.71  77%   0%  77%   0%   0%   0%  43.5°C
19:11:30:  408/ 408MHz  4.05  70%   0%  70%   0%   0%   0%  39.8°C
19:11:35:  408/1800MHz  3.73  16%   0%  15%   0%   0%   0%  36.1°C

[2] You can’t search for RK3588 in the Geekbench browser since their internal search queries are that crappy: a search string is only applied to some database fields and always matched with whitespace before and whitespace or end-of-line after. To get RK3588 results you need to search for rk30sdk, since for whatever stupid reason that is the ‘Motherboard’ name various Rockchip SoCs report when Geekbench runs on Android (software reporting the SDK name as hardware). The next step is then to eliminate all the other Rockchip SoCs that also show up with this search.

With Linux this is impossible since the ‘Motherboard’ is ‘N/A’ and the CPU is ‘ARM ARMv8’, so there is no sane way to search for the device’s name.


sbc-bench learned the -G switch to provide a sane Geekbench environment:

root@rock-5b:/home/tk# sbc-bench -G

Average load and/or CPU utilization too high (too much background activity). Waiting...

Too busy for benchmarking: 16:07:41 up 36 min,  2 users,  load average: 0.15, 0.94, 1.01,  cpu: 15%
Too busy for benchmarking: 16:07:46 up 37 min,  2 users,  load average: 0.14, 0.92, 1.00,  cpu: 0%
Too busy for benchmarking: 16:07:51 up 37 min,  2 users,  load average: 0.13, 0.91, 0.99,  cpu: 0%
Too busy for benchmarking: 16:07:56 up 37 min,  2 users,  load average: 0.12, 0.89, 0.99,  cpu: 0%
Too busy for benchmarking: 16:08:01 up 37 min,  2 users,  load average: 0.11, 0.88, 0.98,  cpu: 0%
Too busy for benchmarking: 16:08:06 up 37 min,  2 users,  load average: 0.10, 0.86, 0.98,  cpu: 0%
Too busy for benchmarking: 16:08:11 up 37 min,  2 users,  load average: 0.09, 0.85, 0.97,  cpu: 0%

sbc-bench v0.9.8

Installing needed tools: Done.
Checking cpufreq OPP. Done.
Executing RAM latency tester. Done.
Executing Geekbench. Done.
Checking cpufreq OPP. Done (7 minutes elapsed).

   Single-Core Score     581                 
   Crypto Score          817                 
   Integer Score         564                 
   Floating Point Score  578                 
   
   Multi-Core Score      2440               
   Crypto Score          3276               
   Integer Score         2319               
   Floating Point Score  2562               

   https://browser.geekbench.com/v5/cpu/16883114

Full results uploaded to http://ix.io/48HU.

Interesting for later comparisons with results submitted without any adjusting/monitoring. However, don’t you think you should run the single-core tests on each CPU cluster individually? Otherwise you’re facing the same problem as everyone else: a random core gets used for the single-core tests, and the associated numbers are not trustworthy.

Yes, that was the idea. I slightly improved the output in the meantime, also collecting the individual test results…

Yes, but the problem is that the Geekbench binary only allows ‘all or nothing’, so when pinning execution to a single core the multi-core test also runs its n threads in parallel on that one core, which takes ages :frowning:
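For reference, a per-cluster pinning loop could look roughly like the sketch below. This is a hypothetical illustration, not sbc-bench’s actual code; the core numbers and the geekbench5 path are assumptions (on RK3588, cpu0-3 are the A55 cluster and cpu4-5 / cpu6-7 the two A76 clusters, hence cores 0, 4 and 6):

```shell
#!/bin/sh
# Hypothetical sketch: run a single-core benchmark once per CPU cluster.
# Core numbers assume the usual RK3588 layout (cpu0-3 = A55, cpu4-5 and
# cpu6-7 = A76); adjust for the actual SoC.
for core in 0 4 6; do
    # skip cores that don't exist on this machine
    [ -d "/sys/devices/system/cpu/cpu$core" ] || continue
    # confirm the pinning actually took effect before trusting any numbers
    taskset -c "$core" sh -c 'grep Cpus_allowed_list /proc/self/status'
    # taskset -c "$core" ./geekbench5   # path is an assumption
done
```

The multi-core pass would still have to run unpinned (or its per-core results discarded), exactly because of the ‘all or nothing’ limitation described above.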

Tested on Rock 5B:

Last two runs in compare mode: https://browser.geekbench.com/v5/cpu/compare/16884179?baseline=16883114

I’m firing up my NanoPi R5S later to compare the A55 scores there with the 1st score above to see whether/how RK3588’s way better memory performance has an influence. Ah, then I also need to retest on Rock 5B with `taskset -c 0-3`.

Well, if the scheduler is properly set up then even with different clusters the run should end up on a big core (see the Rock 5B result comparison above).

The annoying thing with Geekbench is that while a big core is used for the single-threaded tests, the clockspeed information is derived from a little core (it grabs the sysfs nodes for cpu0).
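To see what Geekbench misses, one can read the current frequency of every cpufreq policy instead of just cpu0. A minimal sketch (assuming the standard Linux cpufreq sysfs layout; on systems without cpufreq the loop simply prints nothing):

```shell
#!/bin/sh
# Sketch: per-cluster clockspeed readout. Geekbench only samples cpu0,
# so on big.LITTLE SoCs the reported frequency belongs to the little
# cluster even when the benchmark actually ran on a big core.
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
    [ -r "$policy/scaling_cur_freq" ] || continue
    cpus=$(cat "$policy/affected_cpus")
    khz=$(cat "$policy/scaling_cur_freq")
    echo "cpus $cpus: $((khz / 1000)) MHz"
done
```

On RK3588 this shows three policies (one per cluster), which is exactly the information missing from the cpu0-only approach.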

But you’re right, I should execute the benchmark on a core from each cluster and then filter out the irrelevant multi-core results. On Rock 5B, for example, the whole test will then take more likely an hour than the few minutes it takes now.