Video : My first CPU benchmarks on the Rock5 in Linux

NicoD · July 20, 2022, 11:49am

Hi all. I’ve gotten access to a Rock5b test board. I’ve done some quick benchmarks and made a video about it.
Interesting to see the great A76 performance.
Here it is, greetings. NicoD

tkaiser · July 20, 2022, 1:01pm

OMG, you still spread this insane BS at 5:50

Although I have already explained this to you multiple times, here is another attempt:

What you’re seeing with 7-zip not utilizing all cores at 100% has nothing to do with ‘uneven performance’ or whatever other sick explanation you pulled from somewhere. It is the task being bottlenecked by memory access.

The four A55 cores of the RK3588, tested by sbc-bench at 1830 MHz:

RAM size:   15723 MB,  # CPU hardware threads:   8
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       4384   346   1233   4265  |      86337   396   1859   7366
23:       4254   353   1229   4335  |      85544   399   1854   7402
24:       4267   364   1262   4589  |      83051   397   1838   7291
25:       4156   369   1284   4746  |      81616   399   1822   7264
----------------------------------  | ------------------------------
Avr:             358   1252   4484  |              398   1843   7331
Tot:             378   1548   5907

RK3568, same cores, same kernel, same settings, same OS, same benchmark, but A55s clocked ~100 MHz higher:

RAM size:    1958 MB,  # CPU hardware threads:   4
RAM usage:    882 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2897   354    797   2818  |      82936   394   1797   7076
23:       2853   370    786   2907  |      80935   394   1777   7003
24:       2728   376    780   2933  |      78674   393   1756   6906
25:       2627   381    788   3000  |      76137   393   1723   6776
----------------------------------  | ------------------------------
Avr:             370    788   2915  |              394   1763   6940
Tot:             382   1275   4928

What can we see from these numbers? Memory access matters!

Although the A55 cores in the little RK3568 are clocked higher, its 7-zip scores are lower, and so is per-core CPU utilization when decompressing: 99.5% on RK3588 (398 / 4) and only 98.5% on RK3568 (394 / 4).
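
The per-core percentages quoted here are simply 7-zip’s total Usage figure divided by the number of benchmark threads. A minimal sketch of that arithmetic, where the function name per_core_usage is made up for illustration:

```shell
# Per-core utilization in percent: 7-zip's "Usage" column divided by the
# number of benchmark threads. per_core_usage is a hypothetical helper name.
per_core_usage() {
    awk -v usage="$1" -v threads="$2" 'BEGIN { printf "%.1f\n", usage / threads }'
}

per_core_usage 398 4   # RK3588 A55 cluster, decompression average
per_core_usage 394 4   # RK3568, decompression average
```

With the decompression averages from the two runs above, this prints 99.5 and 98.5.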

According to your theory about ‘uneven performance’ (or whatever you called it in the past. You know why YouTube tech videos are crap? Because there is no text you can quickly search through! One has to sit through all the annoying babbling every time!) you would expect 100%, right?

A quick check of sbc-bench’s results list [1] reveals the following CPU utilization (in %, summed over all cores) for different SoCs when executing 7z b on all cores:

SoC	compression	decompression
BCM2835	99	98
BCM2836/BCM2709	309	397
Allwinner A64 or https://tinyurl.com/yyf3d7fg	357	398
Allwinner H3/H2+	333	398
Amlogic S905X2/S905Y2/S905D2/T962X2	331	394
SigmaStar SSD201/SSD202D	160	199
Allwinner H3/H2+	332	395
2 x ThunderX CN8890	8692	8937
Amlogic S905	321	394
Allwinner H5	360	397
BCM2711B0	366	399
Amlogic S905X3	326	391
Allwinner A20	162	189
BCM2835	99	98
Amlogic S922X	567	508
Rockchip RK3399	545	530
Allwinner A20	164	189
BCM2711B0	349	397
Nvidia Jetson Nano	303	395
2 x ThunderX CN8890	7947	8663
2 x ThunderX CN8890	8344	8439
Allwinner A20	164	190
Amlogic S905X2/S905Y2/S905D2/T962X2	327	382
Rockchip RK3399	573	517
Amlogic Meson GXL (S905X) Revision 21:c (84:2)	301	394
Amlogic Meson SM1 (S905X3) Revision 2b:c (10:2)	370	392
Amlogic Meson GXBB (S905) Revision 1f:c (13:1)	299	377
Amlogic Meson G12B (S922X) Revision 29:c (40:2)	566	507
Amlogic Meson G12B (A311D) Revision 29:b (10:2)	568	508
Amlogic Meson8m2 (S812) RevA (1d - 0:74E) detected	330	395
Rockchip RK3568 (35681000)	367	394
Amlogic Meson8m2 (S812) RevA (1d - 0:74E) detected	339	387
Allwinner R40/V40	322	385
Amlogic Meson SM1 (S905X3) Revision 2b:c (10:2)	325	396
Rockchip RK3566 or RK3568	372	396
Nvidia Jetson Nano	301	367
Rockchip RK3568 (35682000)	365	395
BCM2711B0	370	395
Amlogic Meson SM1 (Unknown) Revision 2b:b (40:2)	359	394
Amlogic Meson G12B (A311D) Revision 29:b (10:2)	564	509
Amlogic A311D2	732	715
Rockchip RK3288	353	397
Rockchip RK3588 (35880000)	743	663
Rockchip RK3588 (35880000)	742	692
Nvidia AGX Xavier	509	589
Rockchip RK3328	367	398
Samsung/Nexell S5P6818	670	796
Rockchip RK3188	325	377
Allwinner D1	93	97
Rockchip RK3568	369	393
Amlogic Meson GXM (S912) Revision 22:a (82:2)	701	712
NXP i.MX6 Quad	320	388
Rockchip RK3588 (35880000)	742	679
Amlogic Meson8 (S802) RevC (19 - 0:27ED)	303	395
Kendryte K510	141	199
Phytium D2000	755	782
Phytium D2000	743	782

Even single-core SoCs with a crappy memory controller are far from reaching 100%, see Allwinner D1 for example. It has nothing to do with the type of cores or ‘something uneven’, as you have been claiming for years, but with cores fighting over memory access!

So please stop spreading this BS! The same goes for the wrong info about clockspeeds in this video: all you were reporting was the cpufreq OPP, not actual clockspeeds (those need to be measured, as sbc-bench does).
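
For reference, ‘reporting the cpufreq OPP’ amounts to reading back what the cpufreq driver claims. The sketch below assumes the usual Linux sysfs cpufreq layout, and khz_to_mhz is a made-up helper name. The values it prints are the driver’s claim, not a measurement, which is exactly the distinction being made here.

```shell
# Print the clockspeed the cpufreq driver *claims* for each core (the OPP).
# This is not a measurement; real clocks can differ from what is reported.
# Assumes the usual Linux sysfs layout; khz_to_mhz is a hypothetical helper.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] || continue            # skip systems without cpufreq support
    cpu="${f#/sys/devices/system/cpu/}"
    cpu="${cpu%%/*}"
    echo "${cpu}: $(khz_to_mhz "$(cat "$f")") MHz (reported by cpufreq)"
done
```

On a system without cpufreq support the loop prints nothing.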

BTW: if all you’re checking for is 100% CPU utilization then some lightweight joke like while true ; do yes >/dev/null; done on all cores is all that’s needed!
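
Spelled out for all cores, that joke could look like the sketch below. It assumes getconf and yes are available, and busy_all_cores is a made-up name. All it demonstrates is that 100% utilization by itself says nothing about performance.

```shell
# Keep every core 100% busy for a few seconds without doing any useful work.
# busy_all_cores is a hypothetical helper: $1 = seconds, $2 = process count
# (defaults to the number of online CPUs as reported by getconf).
busy_all_cores() {
    seconds="$1"
    cores="${2:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"
    pids=""
    i=0
    while [ "$i" -lt "$cores" ]; do
        yes >/dev/null &               # one busy process per core; the scheduler spreads them
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$seconds"
    kill $pids
    wait $pids 2>/dev/null || true     # reap the killed processes quietly
    echo "kept $cores cores busy for ${seconds}s"
}

busy_all_cores 2
```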

[1] This just parses the info everybody has at hand, since it all happens in the open:

tk@mac-tk results % grep "SoC guess" *.txt | while read -r ; do
    SoCName="$(awk -F": " '{print $2}' <<<"${REPLY}")"
    ResultsFile="$(cut -f1 -d':' <<<"${REPLY}")"
    echo "|  [${SoCName}](http://ix.io/$(basename "${ResultsFile}" .txt)) | $(awk -F" " '/^Avr/ {print $2" | "$6}' "${ResultsFile}" | tail -n1) |"
done

tkaiser · July 20, 2022, 1:38pm

Utilization divided by core count (so % per core), this time also including older sbc-bench results that lack the SoC guess:

Device	cores	comp single	decomp single	comp multi	decomp multi
v0.6.3 ODROID-N2	6	99	99	91	88
v0.6.2 ODROID-N2	6	99	99	93	90
v0.6.6 Realtek_Lion_Skin_1GB	4	-	-	-	-
v0.6.6 SolidRun LX2160A COM express type 7 module	16	100	100	85	97
v0.6.7 jetson-nano	4	99	100	75	98
v0.6.7 Khadas VIM3	6	100	100	93	84
v0.6.7 RPi 4B Rev 1.1	4	99	100	89	98
v0.6.8 icosa	4	100	100	80	98
v0.6.9 Radxa ROCK Pi S	4	100	100	83	99
v0.4.3 Odroid XU4	8	99	99	64	78
v0.4.6 nexell soc	8	100	100	78	99
v0.4.6 Libre Technology CC	4	100	100	75	99
v0.4.6 Rockchip RK3288 Tinker Board	4	100	100	89	95
v0.4.6 ODROID-C2	4	100	100	77	96
v0.4.6	4	100	100	82	96
v0.5 Odroid XU4	8	100	100	64	78
v0.5 FriendlyElec NanoPC-T4	6	100	100	65	88
v0.5 Orange Pi Plus / Plus 2	4	100	100	82	97
v0.5 Pine64 Rock64	4	100	100	84	98
v0.5 Pine64 Rock64	4	100	100	84	98
v0.5 FriendlyElec NanoPC-T4	6	100	100	77	88
v0.5 Olimex A10-OLinuXino-LIME	1	93	94	-	-
v0.5 Orange Pi PC Plus	4	100	100	81	98
v0.5.3 Helios4	2	100	100	79	99
v0.5.4 Pine H64	4	100	100	88	99
v0.5.3 nexell soc	8	100	100	78	99
v0.5.3 nexell soc	8	100	100	78	99
v0.5.4 Globalscale Marvell ESPRESSOBin Board	2	100	100	74	99
v0.6.1 Pine64 RockPro64	6	100	100	85	88
v0.6.1 Globalscale Marvell ESPRESSOBin Board	2	100	100	77	97
v0.5.4 FriendlyElec NanoPC-T4	6	100	100	88	86
v0.5.6 FriendlyElec NanoPi M4	6	100	100	70	86
v0.6.1 RPi Zero W Rev 1.1	1	98	99	-	-
v0.6.1 FriendlyElec NanoPi M4	6	100	100	85	87
v0.6.1 FriendlyElec NanoPi NEO4	6	100	100	82	86
v0.6.1 FriendlyElec NanoPi M4	6	100	100	85	87
v0.6.1 FriendlyElec NanoPi NEO4	6	100	100	80	88
v0.6.1 ROCK PI 4B	6	100	100	86	90
v0.6.2 Khadas Captain	6	100	100	88	86
v0.6.2 ROCK PI 4B	6	100	100	87	86
v0.6.2 Olimex A64 Teres-I	4	100	100	83	97
v0.6.2 Cubietech Cubietruck	2	100	100	81	93
v0.6.2	4	100	100	93	99
v0.6.2 Khadas Captain	6	100	100	87	87
v0.6.2 Pine64 RockPro64	6	100	100	82	88
v0.6.9 Radxa ROCK Pi 4	6	100	100	89	86
v0.6.9 Pine H64	4	100	100	89	99
v0.6.9 Khadas VIM3L	4	100	100	88	98
v0.6.9 SolidRun i.MX8MQ HummingBoard Pulse	4	100	100	81	99
v0.7.4 RPi 400 Rev 1.0	4	100	100	92	98
v0.7.4	32	100	100	89	99
v0.7.5	1	100	100	-	-
v0.7.5	1	100	100	-	-
v0.7.5 Hugsun X99 TV BOX	6	99	100	90	89
v0.7.1	4	100	100	85	99
v0.7.2 Orange Pi Prime	4	100	100	88	98
v0.7.1 ODROID-C4	4	100	100	88	98
v0.7.2 Pine64 RockPro64	6	100	100	91	86
v0.7.2 Pine64 RockPro64 v2.1	6	100	100	94	86
v0.7.9 RPi Zero 2 Rev 1.0	4	100	100	81	99
v0.7.9 RPi Zero 2 Rev 1.0	4	100	100	75	99
v0.8.1 Nintendo Switch	4	97	98	76	97
v0.8.1 ODROID-N2Plus	6	99	100	93	84
v0.8.3 RPiB Rev 2	1	96	97	-	-
v0.8.3 RPi Zero 2 Rev 1.0	4	100	100	76	99
v0.8.4 Olimex A20-OLinuXino-LIME2-eMMC	2	92	92	81	94
v0.8.4 RPi 4B Rev 1.1	4	100	100	84	99
v0.8.4 RPi 4B Rev 1.1	4	91	100	74	99
v0.8.4 RPi 4B Rev 1.1	4	100	100	88	99
v0.8.4 RPi 4B Rev 1.1	4	100	100	88	99
v0.8.5 RPi 4B Rev 1.1	4	100	100	88	99
v0.8.6 FriendlyARM NanoPi NEO4	6	100	100	92	86
v0.8.6 Odroid XU4	8	99	99	94	84
v0.8.8 RPi 4B Rev 1.1	4	100	100	86	99
v0.8.8 Radxa Zero	4	99	100	83	98
v0.9.0 RPi 4B Rev 1.4	4	100	100	91	97
v0.9.0 RPi 4B Rev 1.4	4	100	100	88	97
v0.9.1 ASUS Tinker Board	4	100	100	90	95
v0.8.7 RPi 4B Rev 1.4	4	100	100	90	98
v0.9.1 ODROID-N2Plus	6	99	100	92	84
v0.9.1 Generic RK322x TV Box board	4	100	100	85	96
v0.9.1 FriendlyElec NanoPi M4 Ver2.0	6	100	100	93	86
v0.9.1 BCM2835	1	99	98	-	-
v0.9.1 BCM2836/BCM2709	4	100	100	77	99
v0.9.1 Allwinner A64 or https://tinyurl.com/yyf3d7fg	4	100	100	89	99
v0.9.1 Allwinner H3/H2+	4	100	100	83	99
v0.9.1 Amlogic S905X2/S905Y2/S905D2/T962X2	4	99	100	82	98
v0.9.1 ODROID-HC4	4	99	100	84	98
v0.9.1 TRONFY MXQ S805	4	100	100	80	99
v0.9.1 ODROID-N2	6	99	100	90	91
v0.9.1 Khadas VIM	4	99	99	77	97
v0.9.2 SigmaStar SSD201/SSD202D	2	100	100	79	99
v0.9.2 Allwinner H3/H2+	4	100	100	83	97
v0.9.2 2 x ThunderX CN8890	96	100	100	90	91
v0.9.2 Amlogic S905	4	99	100	80	98
v0.9.2 Allwinner H5	4	100	100	89	99
v0.9.2 BCM2711B0	4	100	100	91	99
v0.9.2 Amlogic S905X3	4	99	99	81	98
v0.9.2 Allwinner A20	2	92	92	81	94
v0.9.2 BCM2835	1	99	98	-	-
v0.9.2 Amlogic S922X	6	99	100	94	84
v0.9.2 Rockchip RK3399	6	100	100	92	88
v0.9.2 Allwinner A20	2	100	100	81	94
v0.9.2 BCM2711B0	4	100	100	87	99
v0.9.3 Nvidia Jetson Nano	4	100	100	75	99
v0.9.2 2 x ThunderX CN8890	96	100	100	84	89
v0.9.2 2 x ThunderX CN8890	96	100	100	85	89
v0.9.3 Allwinner A20	2	92	92	81	95
v0.9.3 Amlogic S905X2/S905Y2/S905D2/T962X2	4	99	100	81	95
v0.9.3 Rockchip RK3399	6	100	100	93	86
v0.9.3 Amlogic Meson GXL (S905X) Revision 21:c (84:2)	4	99	100	75	98
v0.9.3 Amlogic Meson SM1 (S905X3) Revision 2b:c (10:2)	4	100	100	92	98
v0.9.3 Amlogic Meson GXBB (S905) Revision 1f:c (13:1)	4	96	96	74	94
v0.9.3 Amlogic Meson G12B (S922X) Revision 29:c (40:2)	6	99	100	94	84
v0.9.3 Amlogic Meson G12B (A311D) Revision 29:b (10:2)	6	99	99	94	84
v0.9.3 Amlogic Meson8m2 (S812) RevA (1d - 0:74E) detected	4	100	100	81	98
v0.9.3 Rockchip RK3568 (35681000)	4	100	100	91	98
v0.9.3 Amlogic Meson8m2 (S812) RevA (1d - 0:74E) detected	4	100	100	83	96
v0.9.3 Allwinner R40/V40	4	100	100	80	95
v0.9.3 Amlogic Meson SM1 (S905X3) Revision 2b:c (10:2)	4	99	100	81	98
v0.9.3 Rockchip RK3566 or RK3568	4	100	100	93	98
v0.9.3 Nvidia Jetson Nano	4	99	99	76	91
v0.9.3 Rockchip RK3568 (35682000)	4	100	100	91	98
v0.9.3 BCM2711B0	4	100	100	92	99
v0.9.3 Amlogic Meson SM1 (Unknown) Revision 2b:b (40:2)	4	99	100	89	98
v0.9.3 Amlogic Meson G12B (A311D) Revision 29:b (10:2)	6	99	99	93	84
v0.9.3 Khadas VIM4	8	99	99	93	90
v0.9.4 Amlogic A311D2	8	99	99	92	89
v0.9.4 Rockchip RK3288	4	100	100	87	99
v0.9.6 Rockchip RK3588 (35880000)	8	100	100	92	83
v0.9.6 Rockchip RK3588 (35880000)	8	100	100	92	86
v0.9.6 keeper.lan	16	100	100	92	98
v0.9.7 Nvidia AGX Xavier	6	98	99	84	98
v0.9.8 Rockchip RK3328	4	100	100	92	99
v0.7.7 Pine64 RK3566 Quartz64-A Board	4	99	99	83	99
v0.7.7 Radxa Zero	4	99	100	83	98
v0.9.8 Rockchip RK3188	4	95	97	81	94
v0.9.8 Allwinner D1	1	93	97	-	-
v0.9.8 Rockchip RK3568	4	100	100	92	98
v0.9.8 Amlogic Meson GXM (S912) Revision 22:a (82:2)	8	100	100	87	88
v0.9.8 NXP i.MX6 Quad	4	97	98	79	97
v0.9.8 T-HEAD c910 ice	2	100	100	81	99
v0.9.8 / Celeron J1900 @ 1.99GHz	4	100	100	85	98
v0.9.8 Amlogic Meson8 (S802) RevC (19 - 0:27ED)	4	100	100	79	98
v0.9.8 Kendryte K510	2	100	100	71	99
v0.9.8 Apple MacBook Pro	10	100	100	90	77
v0.9.8 Phytium D2000	8	100	100	93	98
v0.9.8 Phytium D2000	8	100	100	93	98
v0.9.8 Silicom Minnowboard Turbot D0/D1 PLATFORM D0/D1 / Atom	2	99	99	85	98

Look at those SoCs/CPUs that are built for server tasks, with 16 or even 2 x 48 cores. They can keep up with memory access since they’re designed for the task, unlike cheap SoCs from the Android e-waste category!

NicoD · July 20, 2022, 3:59pm

4 x the same cores

400% usage in decompression, which is the only value I use.
Small difference in core clocks: 2.25 vs 2.3 GHz for clusters 2 and 3.

You see a small difference in usage: 397%.
When the difference is larger, you see a lower percentage.

All 8 cores, and only 676% of the 800% is used.
So my point is that running 7z b on all cores isn’t a reliable measure. Better to test the clusters separately, or only single cores. That’s just my point.
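
Testing clusters separately, as suggested here, could be done by pinning 7-zip to a set of cores. The sketch below only prints the commands instead of running them. It assumes taskset from util-linux, 7-zip’s -mmt thread switch, and an RK3588 CPU numbering of cpu0-3 for the A55 cores and cpu4-5 / cpu6-7 for the two A76 clusters. That numbering is an assumption and should be checked on the actual board (for example with lscpu). cluster_bench_cmd is a made-up helper name.

```shell
# Build (but do not run) a pinned 7-zip benchmark command for one CPU cluster.
# cluster_bench_cmd is a hypothetical helper:
#   $1 = CPU list for taskset, $2 = number of benchmark threads
cluster_bench_cmd() {
    printf 'taskset -c %s 7z b -mmt=%s\n' "$1" "$2"
}

# Assumed RK3588 layout: cpu0-3 = A55, cpu4-5 and cpu6-7 = the two A76 clusters.
cluster_bench_cmd 0-3 4
cluster_bench_cmd 4-5 2
cluster_bench_cmd 6-7 2
```

Piping the output into sh would actually run the three benchmarks one after another.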

tkaiser · July 20, 2022, 8:42pm

Nice example of ignorance. Care to understand that there are tons of examples above with ‘4 x the same cores’ that do not reach 400% decompression utilization? The simple reason is that these SoCs originate from the Android e-waste world, and chip internals massively bottleneck fully parallel operation of all cores.

Check ‘v0.9.8 Rockchip RK3188’, ‘v0.9.1 ASUS Tinker Board’ or the Amlogic S905 and RPi 4B results.

As such, it is the obvious consequence of adding more cores that CPU utilization decreases further, since the workload is too demanding. And CPU utilization actually is information: see the notes about ODROID XU4.

And why do you use only the decompression value? Which real-world use case is represented by this task?

Why the hell? Because you’re obsessed with 100% CPU utilization, or is there another reason why you throw away useful information?

tkaiser · July 20, 2022, 9:08pm

SoC/device	compression	decompression
RK3188	325	377
RK3288	354	399
RK3288	332	387
RK3288	310	385
RK3288	284	386
RK3288	302	386
RK3288	353	397
RK3288	323	395
RK3288	335	396
RK3288	359	381
RK3288	358	381
RK3288	364	381
S905	320	397
S905	310	393
S905	310	393
S905	321	394
S905	312	377
RPi 4B	354	398
RPi 4B	351	396
RPi 4B	368	393
RPi 4B	353	392
RPi 4B	318	354
RPi 4B	339	387
RPi 4B	350	385
RPi 4B	315	375
RPi 4B	284	381
RPi 4B	364	395
RPi 4B	354	399
RPi 4B	362	388
RPi 4B	363	395
RPi 4B	357	397
RPi 4B	347	398
RPi 4B	365	393
RPi 4B	348	395
RPi 4B	357	397
RPi 4B	340	372

Don’t you agree that CPU utilization is actual information?

And keep in mind that these are all quad-core SoCs with a single core type, no ‘different core types’ or ‘uneven performance’ BS.

tkaiser · July 20, 2022, 9:22pm

When looking at RK3288 alone and also checking the kernel version (again, check the ODROID XU4 notes in the sbc-bench documentation), it should be obvious why CPU utilization is information:

SoC/device	compression	decompression	kernel
RK3288	354	399	5.10
RK3288	332	387	5.15
RK3288	310	385	5.15
RK3288	284	386	5.15
RK3288	302	386	5.15
RK3288	353	397	5.10
RK3288	323	395	5.15
RK3288	335	396	5.15
RK3288	359	381	5.15
RK3288	358	381	5.15
RK3288	364	381	5.15

7-zip’s internal benchmark is such a cheap and effective way to do regression testing, but people (like the Armbian folks) still just ignore it.
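
As a sketch of how such a regression check could be automated, the rows of the table above can be averaged per kernel version with awk. The input format (tab-separated device, compression, decompression, kernel) and the helper name mean_by_kernel are assumptions for illustration, not part of sbc-bench.

```shell
# Mean compression/decompression utilization per kernel version.
# Input on stdin: "device<TAB>compression<TAB>decompression<TAB>kernel".
# mean_by_kernel is a hypothetical helper name.
mean_by_kernel() {
    awk -F'\t' '
        { n[$4]++; comp[$4] += $2; decomp[$4] += $3 }
        END {
            for (k in n)
                printf "%s\t%.1f\t%.1f\n", k, comp[k] / n[k], decomp[k] / n[k]
        }' | sort
}

# The eleven RK3288 results from the table above (printf reuses the format).
printf 'RK3288\t%s\t%s\t%s\n' \
    354 399 5.10  332 387 5.15  310 385 5.15  284 386 5.15 \
    302 386 5.15  353 397 5.10  323 395 5.15  335 396 5.15 \
    359 381 5.15  358 381 5.15  364 381 5.15 | mean_by_kernel
```

For these rows the mean compression utilization comes out at 353.5% on 5.10 versus 329.7% on 5.15, and decompression at 398.0% versus 386.4%. With only two 5.10 samples this is a hint worth investigating rather than proof of a regression.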

NicoD · July 25, 2022, 4:43pm

You are correct, memory bandwidth seems important here.
I don’t throw the info away; I often show it, just not this time (this is not a hardware review video but a CPU benchmark).

I prefer to use Blender as the multi-core CPU benchmark, and 7-zip for single core. But I still run the multi-core test and keep the data.

My theory was that 7z b always hands out equal tasks: with 8 cores, 8 equal tasks, one per core. When running them, the big cores finish before the small cores.
So I might be wrong, and I am not afraid to admit this.

No idea what that has to do with this discussion. It is a discussion between you and me; Armbian is not involved in me making this video or in my benchmarks.

Thank you for sharing the information. Have a nice day.

tkaiser · July 26, 2022, 7:42am

Bandwidth? You know the difference between bandwidth and latency?

Why? Also why do you use only the 7-zip decompression score (already asked 5 days ago)?

tkaiser · September 7, 2022, 9:09pm

Hey @NicoD

A few weeks later… do you now have a clue about the basics? Or are you still just generating numbers without meaning?