Introduce ROCK 5B - ARM Desktop level SBC

Jeff_McWilliams · July 6, 2022, 5:22pm

For the sbc-bench numbers, do they give us any sort of ballpark performance estimate of how this SBC compares to RPi4, a Jetson Xavier NX, etc?

Thanks

stuartiannaylor · July 6, 2022, 5:51pm

github.com

ThomasKaiser/sbc-bench/blob/master/Results.md

# Results

Below some results collected. Please keep in mind that these are **NOT** hardware performance numbers but depend on software/settings (see the differences kernel version makes for RockPro64 for example). The purpose of `sbc-bench` is to generate insights and not colorful graphs representing numbers without meaning. It's perfectly fine for the same hardware appearing multiple times with different numbers since those differ for a reason (software/settings).

Especially *openssl* numbers should be taken with a huge grain of salt since the benchmark numbers depend on kernel features and performance with other use cases (e.g. disk/filesystem encryption) [might look different](https://forum.armbian.com/topic/7763-benchmarking-cpus/?do=findComment&comment=59235).

So do **not** rely on collected numbers unless you carefully read through all the explanations and insights below and be prepared to conduct your own benchmarks if you really want to choose appropriate hardware for **your** use case.

## Some numbers

*ODROID-M1, Quartz64, ROCK 3A, ROCK 5B, RK3568-ROC-PC and Khadas VIM4 numbers are preliminary since software support situation for RK3566/RK3568/RK3588 and A311D2 is still in a very early stage. Please also note that with RK35xx SBC so far [real measured clockspeeds are somewhat unstable](https://forum.odroid.com/viewtopic.php?p=350782#p350782).*

| Device / details | Clockspeed | Kernel | Distro | 7-zip | AES-128 (16 byte) | AES-256 (16 KB) | memcpy | memset | kH/s |
| ----- | :--------: | :----: | :----: | ----: | ------: | ------: | -----: | -----: | ---: |
| [Akaso M8S](http://ix.io/3R3N) | 1200 MHz | 5.10 | Buster armhf | 3050 | 32050 | 32120 | 1160 | 3330 | - |
| [Amazon a1.xlarge](http://ix.io/2iFY) | 2300 MHz | 4.15 | Bionic arm64 | 8610 | 458500 | 1297960 | 4280 | 14220 | - |
| [AMedia X96 Max+](http://ix.io/3QOj) | 2100 MHz | 5.15 | Focal arm64 | 5270 | 197690 | 981830 | 2630 | 5150 | - |
| [Apple M1 power core](http://ix.io/2Gjg) | 3200 MHz | 5.8 | Groovy arm64 | 5100 | 638810 | 1119670 | 27960 | 63950 | 5.30 |
| [Apple M1 efficiency](http://ix.io/2Gtf) | *600* MHz | 5.8 | Groovy arm64 | 650 | 150340 | 224520 | 5130 | 7640 | 0.76 |
| [BPi M2U](http://ix.io/3TKh) | 1010 MHz | 5.16 | Buster armhf | 2230 | 15550 | 19540 | 790 | 2540 | - |


tkaiser · July 6, 2022, 6:00pm

Of course there’s a results list.

But you need to keep in mind that synthetic benchmarks are something different than real-world tasks and that ‘use case’ always matters more than some numbers or graphs that could be compared without using a brain. If the benchmark numbers do not represent your use case then they’re worthless for you.

And since sbc-bench focuses on ‘server workloads in general’ (the 7-zip-MIPS score) I highly doubt these numbers are that useful for average Rock 5B buyers.

With this focus on integer and memory performance, all the other stuff that makes up RK3588 is not even remotely considered:

- GPU (2D/3D acceleration)
- VPU (accelerated video encoding/decoding)
- NPU
- in general the media capabilities like display and camera support, ‘picture in picture’ and so on (maybe only ever working in Android and not Linux)
- IO: RK3588 has serious IO capabilities especially compared to toys like an RPi 4

Oneofmany888 · July 6, 2022, 8:39pm

Raspberry Pi V 4xC A72 4xc A52 + G78 MP8 with RISCV ISA with LPDDR4x 4266MHz from 5 to 10v overclockable is perfect for people.
2x faster than previous one that can run every OS from Ubuntu to Manjaro, Linux mint up to custom Raspbian OS (mix between RaspOS and Twister funny and original than others).

What I want is for the team to add PCI-E 3.0 for an eGPU that works in parallel with the processor's iGPU, in three tiers: G78 MP14, MP20 and the most powerful MP24.

They will be millionaires.

We would be happy because it saves energy.

enoch · July 6, 2022, 8:43pm

Thanks for the link. It seems like even the 3A is almost as fast as the Pi 4B, so perhaps it is time for me to sell my Pi 4B (I just don’t want to waste it) and consider a 3A as a replacement, perhaps overclocking it a bit to match the Pi 4B.

And the 5B is about 3x the Pi 4B on that “server workload in general”. I think that number is still useful, as it should reflect integer ALU performance and likely branching performance as well. GPU, VPU, NPU and the rest depend on the maturity of the drivers, so I won’t be considering them for the time being.

tkaiser · July 6, 2022, 9:20pm

Do you have a recipe for this with any recent Rockchip SoC like RK3568 or RK3588?

We’re currently struggling to understand what determines clockspeeds since all those devices show ‘random’ behaviour, see here and there.

The mechanism at work seems to be called Process-Voltage-Temperature Monitor (PVTM).

Jeff_McWilliams · July 6, 2022, 10:21pm

@tkaiser

Thanks. I’m probably not an average Arm SBC user.
I’m a developer for Altair Accelerator, and we have some limited support in our product for Armv8. Whichever customers are using that are probably doing so on Arm server systems, maybe even proprietary ones that aren’t available to the general public.

But because we do support the platform, I’m interested in tinkering at home on an Arm system, but more as a C/C++/Python development box rather than a Desktop, NAS, or a game emulation platform. Sure, I have access to some Arm hosts in the cloud, but they’re slow and inconvenient for me to use, and I don’t want to ask for $$ to spin up an AWS instance just so that I can play. The Honeycomb LX2, NVidia Jetson SBC, or rk3588 are probably closest to what I’m looking for, but there’s a lot in the NVidia hardware that I probably wouldn’t ever make use of. For me, it would be a host I put on my LAN and ssh into.

Thanks for the numbers.

stuartiannaylor · July 6, 2022, 11:49pm

I presume it just tweaks the timings so everything is optimal. I dunno, PVTM is new to me, but I've been wondering whether the OPP table does anything or whether it's really like Amlogic, where everything seems hardcoded in the MCU blob.

Have you tried using DTC to edit the OPP table, either under- or overclocking, to see whether it's totally ignored?
Preferably OC, to see if it takes effect.
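A rough way to check whether an edited OPP table is picked up at all is to dump what the cpufreq driver exposes before and after the change. The sketch below assumes the standard Linux cpufreq sysfs layout (`policy*/scaling_available_frequencies` in kHz and `policy*/related_cpus`); the function name `read_opp_tables` is made up for illustration, and not every driver exposes a frequency table.

```python
from pathlib import Path


def read_opp_tables(root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy name: (list of CPU ids, sorted frequencies in MHz)}."""
    tables = {}
    for policy in sorted(Path(root).glob("policy*")):
        freq_file = policy / "scaling_available_frequencies"
        if not freq_file.is_file():
            continue  # this driver does not expose a frequency table
        khz = [int(value) for value in freq_file.read_text().split()]
        cpus_file = policy / "related_cpus"
        cpus = cpus_file.read_text().split() if cpus_file.is_file() else []
        tables[policy.name] = (cpus, sorted(k // 1000 for k in khz))
    return tables


if __name__ == "__main__":
    # Print one line per cpufreq policy (cluster) found on this machine.
    for name, (cpus, mhz) in read_opp_tables().items():
        print(f"{name} (cpus {' '.join(cpus)}): {mhz}")
```

This only shows the OPP the driver advertises. Whether the cores really run at those speeds still has to be measured, for example with sbc-bench.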

linuxlion · July 7, 2022, 12:37am

Hi @Mecca, read this yesterday and gained even more appreciation for the work that goes into finalizing a SBC for release:

linuxlion · July 7, 2022, 12:42am

Hi @NGBRO, this is what I did with my R0. Thinking I’ll do something similar with the 5B, likely adding an inlet and outlet for the fan until something I like becomes available to purchase or 3D print

tkaiser · July 7, 2022, 5:33am

Check sbc-bench output in detail. Those are two different phenomena:

- the cpufreq driver is hiding certain cpufreq OPP (on @willy’s board also the top ones)
- real clockspeeds differ from cpufreq OPP in different directions

As an example ‘my’ board:

Checking cpufreq OPP for cpu0-cpu3 (Cortex-A55):

Cpufreq OPP: 1800    Measured: 1828 (1828.663/1828.622/1828.125)     (+1.6%)
Cpufreq OPP: 1608    Measured: 1645 (1645.519/1645.452/1644.982)     (+2.3%)
Cpufreq OPP: 1416    Measured: 1422 (1422.748/1422.654/1422.544)
Cpufreq OPP: 1200    Measured: 1230 (1231.014/1230.882/1230.354)     (+2.5%)
Cpufreq OPP: 1008    Measured: 1062 (1062.635/1062.504/1061.903)     (+5.4%)
Cpufreq OPP:  816    Measured:  845    (845.559/845.516/844.695)     (+3.6%)
Cpufreq OPP:  600    Measured:  587    (590.172/589.922/583.196)     (-2.2%)
Cpufreq OPP:  408    Measured:  391    (391.348/391.180/390.991)     (-4.2%)

Checking cpufreq OPP for cpu4-cpu5 (Cortex-A76):

Cpufreq OPP: 2400    Measured: 2348 (2348.432/2348.405/2348.268)     (-2.2%)
Cpufreq OPP: 2208    Measured: 2185 (2185.642/2185.619/2185.571)
Cpufreq OPP: 2016    Measured: 2016 (2017.078/2016.977/2016.750)
Cpufreq OPP: 1800    Measured: 1817 (1817.134/1817.114/1816.991)
Cpufreq OPP: 1608    Measured: 1625 (1625.664/1625.632/1625.255)     (+1.1%)
Cpufreq OPP: 1416    Measured: 1437 (1437.125/1437.110/1436.982)     (+1.5%)
Cpufreq OPP: 1200    Measured: 1259 (1259.240/1259.132/1258.933)     (+4.9%)
Cpufreq OPP: 1008    Measured: 1056 (1056.646/1056.527/1056.387)     (+4.8%)
Cpufreq OPP:  816    Measured:  849    (850.073/850.012/849.793)     (+4.0%)
Cpufreq OPP:  600    Measured:  592    (592.260/592.234/592.154)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.444/394.302/394.288)     (-3.4%)

Checking cpufreq OPP for cpu6-cpu7 (Cortex-A76):

Cpufreq OPP: 2400    Measured: 2348 (2348.350/2348.268/2348.159)     (-2.2%)
Cpufreq OPP: 2208    Measured: 2185 (2185.548/2185.453/2185.311)
Cpufreq OPP: 2016    Measured: 2015 (2015.390/2015.315/2015.114)
Cpufreq OPP: 1800    Measured: 1813 (1813.888/1813.888/1813.847)
Cpufreq OPP: 1608    Measured: 1620 (1620.361/1620.263/1620.149)
Cpufreq OPP: 1416    Measured: 1429 (1429.315/1429.204/1429.204)
Cpufreq OPP: 1200    Measured: 1246 (1246.176/1246.056/1246.026)     (+3.8%)
Cpufreq OPP: 1008    Measured: 1048 (1048.325/1048.240/1048.229)     (+4.0%)
Cpufreq OPP:  816    Measured:  842    (842.671/842.611/842.422)     (+3.2%)
Cpufreq OPP:  600    Measured:  592    (592.273/592.234/592.161)     (-1.3%)
Cpufreq OPP:  408    Measured:  394    (394.366/394.288/394.233)     (-3.4%)

The cpufreq driver decides to hide 2256, 2304 and 2352 from the OPP table (which might be a reasonable choice since the roughly 200 MHz step between 2208 and 2400 is ok) but in @willy’s case cpu4-5 get 2304 MHz as highest OPP and cpu6-7 get 2352 MHz. The driver hides the 2400 OPP regardless of what the MCU inside the SoC decides to use as real clockspeeds for each cpufreq OPP.

And you should be aware that Amlogic’s clockspeed cheating stopped with most recent SoCs. The last ones where we observed a hardcoded difference between cpufreq OPP and real clockspeeds were GXL/GXM (S905X and S912, where cpufreq OPP says 1.5 GHz while in reality it’s 1.4 GHz).
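The deviations quoted in this post are plain relative differences between the measured clockspeed and the cpufreq OPP. A minimal sketch that recomputes them from lines in the format shown above (the regex and the helper names are illustrative, not taken from sbc-bench, and how sbc-bench decides when to print a percentage is not covered here):

```python
import re

# Matches the advertised OPP and the rounded measured value on each line.
LINE = re.compile(r"Cpufreq OPP:\s*(\d+)\s+Measured:\s*(\d+)")


def deviation_percent(opp_mhz, measured_mhz):
    """Relative difference between measured and advertised clockspeed, in percent."""
    return (measured_mhz - opp_mhz) / opp_mhz * 100


def parse_deviations(text):
    """Yield (opp, measured, deviation in percent) for each 'Cpufreq OPP' line."""
    for match in LINE.finditer(text):
        opp, measured = int(match.group(1)), int(match.group(2))
        yield opp, measured, deviation_percent(opp, measured)


sample = """\
Cpufreq OPP: 1800    Measured: 1828 (1828.663/1828.622/1828.125)     (+1.6%)
Cpufreq OPP:  408    Measured:  391    (391.348/391.180/390.991)     (-4.2%)
Cpufreq OPP: 2400    Measured: 2348 (2348.432/2348.405/2348.268)     (-2.2%)
"""

for opp, measured, dev in parse_deviations(sample):
    print(f"{opp:5d} MHz -> {measured:5d} MHz ({dev:+.1f}%)")

# The Amlogic GXL/GXM case: 1.5 GHz advertised, 1.4 GHz real.
print(f"Amlogic GXL/GXM: {deviation_percent(1500, 1400):+.1f}%")  # -6.7%
```

For the three sample lines this gives +1.6%, -4.2% and -2.2%, matching the listing.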

tkaiser · July 7, 2022, 6:50am

For such a use case I would ssh into a locally running Linux VM (having to admit that I’m typing these lines on an ARMv8.5-A laptop)

Allen.Smithee · July 7, 2022, 9:50am

Anyway, I don’t understand the 8nm process
I put this down here,

stuartiannaylor · July 7, 2022, 10:31am

Allen, 8nm is just the minimum resolution of the lithographic process on the silicon wafer and how dense the resultant SoC can be (in rough terms).
None of us understand that process as it's massively valuable IP, but we don’t need to understand the manufacturing process, just the operation of the SoC, and as @tkaiser posted we already have some absolutely whopping technical reference datasheets for the RK3588.

@tkaiser, as a test it would still be interesting to see whether the SoC takes any notice at all if you OC the OPP.
The Amlogic case was not a cheat, as 10% is likely within tolerance of the quoted specs; what they were doing is locking the speed with a licensed blob.
The quoted specs were just the standard sales speak; many things are cheats wherever they legally can be, and I presume it was.
The GPU on the S905Y2 seemed to be locked, whilst on the A311D the clockspeed is locked via an MCU blob.
I guess it's the prerogative of the SoC manufacturer, and I'm wondering: even though we cannot test the GPU, does the CPU OC?

Allen.Smithee · July 7, 2022, 11:05am

I completely accept that.
Compared to others, I will certainly have the most recreational and non-critical use of this SoC.

Frankly, I’m waiting for a functional RK3588 Pico/Nano-ITX board with a full-width heatsink, sold to me at a low price including postage, and then it’s all good!

Jeff_McWilliams · July 7, 2022, 11:14am

I do use KVM on my Ubuntu 20 Linux desktop running a Ryzen 9 5950 with 64GB RAM, and I’ve even managed to set up an Arm (aarch64) VM via Qemu as well, but a) It was VERY difficult to get working and b) It feels slow.

It also wouldn’t be my first Arm SBC. I have an RPi 3B+ running Raspbian. It’s a DNS server for my local LAN and it drives my old Epson flatbed scanner, which no longer has Windows support.

tkaiser · July 7, 2022, 11:29am

There is no ‘8nm process’ since these numbers today are just marketing BS. At least according to TSMC’s vice president of corporate research (TSMC is the foundry having the most advanced process nodes today so maybe he knows a little bit what he’s talking about):

tkaiser · July 7, 2022, 11:34am

Yeah, but I was talking about running an arm64 Linux distribution in a VM on one of these cores, which is really easy and super fast since both cores and OS support virtualization. But running macOS isn’t everybody’s first choice, so never mind.

Jeff_McWilliams · July 7, 2022, 11:50am

Oh, on an M1/M2 Mac? Yeah, since I don’t already own one, it would be a really expensive way to go.

tkaiser · July 7, 2022, 10:59pm

Collection of insights so far