Unofficial DietPi image build for Radxa Cubie A5E

Hello, I’ve recently got a Cubie A5E on hand and of course, I got DietPi running on it! :slight_smile:

So far, HDMI output is not working, but other than that, Ethernet, WiFi and NVMe seem to be working smoothly for me.

Regarding performance, it scores around ~6 seconds in the DietPi benchmark, which resembles the performance of the RK3399 and is quite decent, I think.


Generic Device (aarch64) | IP: 192.168.1.184
┌──────────────────────────────┤ DietPi-Config ├──────────────────────────────┐
│ DietPi-Benchmark | https://dietpi.com/survey#benchmark :                    │
│ - CPU Performance : Duration = 5.79 seconds (lower is faster)               │
│ - CPU Temp : Idle = 42'c | Full load = 56'c                                 │
│ - RootFS : Write = 108 MB/s | Read = 236 MB/s                               │
│ - RAM : Write = 669 MB/s | Read = 1498 MB/s                                 │
│                                                                             │
│ Additional benchmarks:                                                      │
│ - Custom Filesystem : Write = Not tested MB/s | Read = Not tested MB/s      │
│ - Network LAN : Transfer rate = Not tested MB/s                             │
│                                                                             │
│ ●─ DietPi-Benchmark ------------------------------------                    │
│ DietPi-Benchmark : Starts CPU, RAM and IO benchmark suite. Scores can be    │
│ ●─ Additional benchmarks -------------------------------                    │
│ Custom Filesystem : Benchmark IO performance from a selection of mounted d  │
│ Network LAN : Benchmark LAN performance using 2 DietPi systems.             │
│                                                                             │
│                                                                             │
│                                                                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

As for the temperature, the CPU governor is not working either. Hopefully it will be resolved together with HDMI output in a new release.
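
For reference, this is how I checked it, via the standard Linux CPUFreq sysfs paths (nothing board-specific, so it should work on any kernel that ships a cpufreq driver):

    # Check whether a cpufreq driver is loaded at all
    ls /sys/devices/system/cpu/cpu0/cpufreq/ 2>/dev/null || echo 'no cpufreq support'
    # If it is, show current governor, available frequencies and current clock
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq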

Can you give me the link to this image file? I also want the image for the Cubie.

You can grab it from the same source as DietPi:

I need the DietPi version, not Armbian.

Nope. Care to run sbc-bench and share the results?

Glad to see you here! Yes, I agree that Armbian is even better for most usage!
But for this board, the Armbian kernel seems to be much slower than the 5.15.147 vendor kernel.
Hope Allwinner can provide a better kernel for the community to play with the A527.
I’ve also ordered a Radxa A7A and A7Z, both using the A733 SoC.
Hope they get Armbian support soon! :star_struck:

Sure boss,

here is the result:

root@cubie-a5e:~# sudo /bin/bash ./sbc-bench.sh -c

Average load and/or CPU utilization too high (too much background activity). Waiting...

Too busy for benchmarking: 08:44:24 up 1 min,  2 users,  load average: 0.92, 0.30, 0.11,  cpu: 4%
Too busy for benchmarking: 08:44:29 up 1 min,  2 users,  load average: 0.84, 0.30, 0.11,  cpu: 0%
Too busy for benchmarking: 08:44:34 up 1 min,  2 users,  load average: 0.78, 0.29, 0.10,  cpu: 0%
Too busy for benchmarking: 08:44:40 up 1 min,  2 users,  load average: 0.71, 0.29, 0.10,  cpu: 0%
Too busy for benchmarking: 08:44:45 up 2 min,  2 users,  load average: 0.66, 0.28, 0.10,  cpu: 0%
Too busy for benchmarking: 08:44:50 up 2 min,  2 users,  load average: 0.60, 0.28, 0.10,  cpu: 0%

Status of performance related governors found below /sys (w/o cpufreq):
1800000.gpu: simple_ondemand / 696 MHz (userspace performance simple_ondemand / 150 200 300 400 600 696)

sbc-bench v0.9.72

Installing needed tools: apt-get -f -qq -y install lm-sensors dmidecode sysstat links p7zip. Something went wrong:

debconf: delaying package configuration, since apt-utils is not installed

Trying to continue, tinymembench, ramlat, mhz, cpufetch, cpuminer. Done.
Checking cpufreq OPP. Done (results will be available in 19-28 minutes).
Executing tinymembench. Done.
Executing RAM latency tester. Done.
Executing OpenSSL benchmark. Done.
Executing 7-zip benchmark. Done.
Executing cpuminer. 5 more minutes to wait. Done.
Checking cpufreq OPP again. Done (22 minutes elapsed).

Results validation:

  * Measured clockspeed not lower than advertised max CPU clockspeed
  * Background activity (%system) OK
  * Throttling occurred -> https://tinyurl.com/4ky59sys
  * schedutil cpufreq governor configured: 8 cores available vs. only 2 dynamic-power-coefficient DT nodes

Memory performance (all 2 CPU clusters measured individually):
memcpy: 2714.6 MB/s (Cortex-A55)
memset: 5571.7 MB/s (Cortex-A55)
memcpy: 2624.2 MB/s (Cortex-A55)
memset: 5568.6 MB/s (Cortex-A55)

Cpuminer total scores (5 minutes execution): 12.33,12.32,12.30,12.29,12.26,12.25,12.22,12.21,12.20,12.19,12.18,12.17,12.16,12.15,12.14,12.13,12.09,12.05 kH/s

7-zip total scores (3 consecutive runs): 8803,8844,8848, single-threaded: 1517

OpenSSL results (all 2 CPU clusters measured individually):
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     123421.49k   371068.99k   725983.40k   960046.42k  1059840.00k  1067248.30k (Cortex-A55)
aes-128-cbc     159775.27k   472835.71k   926473.13k  1215812.61k  1347578.54k  1357048.49k (Cortex-A55)
aes-192-cbc     117761.51k   327616.87k   589142.61k   739194.20k   797797.03k   802548.39k (Cortex-A55)
aes-192-cbc     152201.55k   420431.70k   751985.24k   940701.01k  1015100.76k  1021165.57k (Cortex-A55)
aes-256-cbc     114189.55k   301409.69k   509662.29k   617648.81k   658464.77k   661662.38k (Cortex-A55)
aes-256-cbc     147518.60k   387053.25k   651335.00k   786112.51k   837733.03k   841875.46k (Cortex-A55)

Full results uploaded to https://0x0.st/Kojb.bin
./sbc-bench.sh: line 1: kill: (17156) - No such process

I do agree that the DietPi test is not as well-rounded as sbc-bench, but it can give a quick comparison at https://dietpi.com/survey/#benchmark

I’ve also run it on my workstation for comparison :rofl:

sbc-bench v0.9.72

Installing needed tools: apt-get -f -qq -y install lm-sensors powercap-utils links. Something went wrong:

debconf: delaying package configuration, since apt-utils is not installed

Trying to continue, tinymembench, ramlat, mhz, cpufetch, cpuminer. Done.
Checking cpufreq OPP. Done (results will be available in 13-19 minutes).
Executing tinymembench. Done.
Executing RAM latency tester. Done.
Executing OpenSSL benchmark. Done.
Executing 7-zip benchmark. Done.
Executing cpuminer. 5 more minutes to wait. Done.
Checking cpufreq OPP again. Done (13 minutes elapsed).

Results validation:

  * Measured clockspeed not lower than advertised max CPU clockspeed
  * Background activity (%system) OK
  * No throttling

Memory performance
memcpy: 25157.4 MB/s
memset: 84072.6 MB/s

Cpuminer total scores (5 minutes execution): 1306,1305,1304,1302,1301,1300,1299,1298,1297,1296,1295,1294,1293,1292 kH/s

7-zip total scores (3 consecutive runs): 310800,312401,311663, single-threaded: 5101

OpenSSL results:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    1001516.44k  1153073.43k  1211499.86k  1303633.92k  1308797.61k  1308923.22k
aes-128-cbc    1080923.71k  1247434.79k  1293421.06k  1305712.30k  1309496.66k  1309436.59k
aes-192-cbc     938158.07k  1059389.87k  1092944.55k  1101423.96k  1104265.22k  1104188.76k
aes-192-cbc     936691.81k  1060514.54k  1093498.45k  1101708.63k  1103994.88k  1104352.60k
aes-256-cbc     826479.88k   921132.33k   945824.00k   952585.56k   954357.08k   954422.61k
aes-256-cbc     827482.89k   921122.56k   945899.52k   952635.05k   954310.66k   954444.46k

Full results uploaded to https://0x0.st/KojP.bin

Great distro you use :wink:

Thank you, just added Allwinner A527 scores to official results. This octa-core A55 thingy is slightly faster than an RK3399 multithreaded (~125%) but of course slightly slower single-threaded (80%-85%).

Nope, it can’t. The ‘benchmark’ itself is so horribly stupid that I stopped commenting to the former DietPi guy (at least I was able to explain to him that the benchmark he initially chose, to compare armhf with aarch64, was even more crap).

This benchmark is a stupid bash loop that represents which real-world task? Exactly: NONE. Benchmarks are always about use cases, so if your use case is executing silly bash loops for no reason, then this benchmark is for you. It does not represent anything other than that!
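
To illustrate (a made-up minimal example, not DietPi’s actual code), such a ‘benchmark’ essentially boils down to timing how fast the shell interpreter itself iterates:

    # Hypothetical minimal 'CPU benchmark' of this kind: time a pure bash
    # integer loop -- it measures the bash interpreter, not a real workload
    time bash -c 'i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); done'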

Next flaw should be pretty obvious: a benchmark always also measures software. And since software evolves, scores may change (the 7-zip benchmark sbc-bench relies on is one of the few exceptions when remaining on the same version).

With this stupid bash loop I measured on the very same CPU a performance difference of 100% (bash v4.4.20) vs. 60% (bash v5.0.3). Exact same hardware, and the whole root cause for this drastic ‘performance difference’ is the version of the software environment. With this DietPi ‘benchmark’, scores automagically improve on newer distros since packages get updated.

Next flaw: the ‘benchmark’ is used in fire-and-forget mode w/o taking into account background activity, throttling or other stuff that instantly ruins benchmark scores. It generates garbage numbers by design, also for the simple reason that execution happens just once and is way too short (a quick burst of background activity can lead to twice or three times the execution time). Benchmarking basics: you need to measure multiple times, record the standard deviation and, if this is higher than let’s say 5%, trash the results. Nope, they get uploaded instead.
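
For illustration, a rough sketch of those basics (my own made-up example, not sbc-bench code): repeat the measurement, compute the relative standard deviation and discard the whole series if it exceeds 5%:

    # Run the workload several times and collect wall-clock durations in ms
    times=()
    for run in 1 2 3 4 5; do
        start=$(date +%s%N)
        bash -c 'i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); done'
        times+=( $(( ( $(date +%s%N) - start ) / 1000000 )) )
    done
    # Mean and relative standard deviation; trash the series above 5% deviation
    printf '%s\n' "${times[@]}" | awk '
        { sum += $1; sq += $1 * $1; n++ }
        END { mean = sum / n; sd = sqrt(sq / n - mean * mean)
              if (sd / mean > 0.05) print "deviation > 5%, discarding results"
              else printf "mean %.1f ms, sd %.1f ms\n", mean, sd }'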

And if you would’ve looked with brain switched on at the link you shared, you would’ve noticed that the whole collection of ‘scores’ in reality is just a collection of garbage. Just look at the RPi numbers:

  • RPi 5B: benchmark execution between 1.7 and 7.54 seconds. That’s a results variation of close to 450%, which of course is impossible in a benchmarking environment. You would expect fastest and slowest runs to differ by 5-10% max, but never by 50% or even +400%.
  • RPi 4B: between 4.9 and 55 seconds, a +1100% difference, LMAO
  • RPi 1/Zero 512M: between 3.2 and 600 sec, a ~18750% difference, LMAO even more.

And then average values are generated instead of discarding this crap. I mean: how on earth should it be possible that an original RPi with its dog slow BCM2835 outperforms RK3399 boards (3.2 vs. 6/7 seconds) or even more modern ones? Correct: that’s impossible since the numbers collected there are BS and nothing else.

But for a project like DietPi, which today is mostly based on ignorance, it’s rather typical to do ‘performance measurements’ in such a comically flawed way.

6 seconds is actually much better than the bulk of RK3399 results we get, in fact better than every single uploaded RK3399 result. Those are more in the 7-9 seconds range: https://dietpi.com/survey/#benchmark

The Cortex-A55 is expected to be ~18% faster than the A53, and the A72 about double the speed of the A53? So taking the A53 as reference and utilizing all cores (which is what our benchmark does/allows):

  • RK3399: 4x1 + 2x2 = 8
  • T527 (or A527?): 8x1.18 = 9.44 ~ +18%

So the dietpi-benchmark results seem to reflect this quite well.

Since we do not provide images for the Cubie A5E yet, where did you get it from, or which base image did you use in that case? The CPU governor is a kernel thing, so if it was a kernel build/sources from Radxa, I would expect CPUFreq to work, but I don’t know about mainline support yet. I tried to create an Orange Pi 4A (same SoC) image a while ago, but failed hard with their 32-bit U-Boot sources, not allowing any “normal” 64-bit Linux loading without heavily adjusted and legacy build methods :smile:. After focusing two months on software and Debian Trixie support, I will have another look.

You can run the dietpi-installer on other Debian-based images, but we do not support this SBC yet. Once we do, our images will be built from scratch.
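
IIRC the installer can be invoked on a running Debian(-based) system along these lines (written from memory, please check our docs for the exact current script path before running it):

    # Turn a running Debian(-based) system into DietPi (destructive, run as root!)
    bash -c "$(curl -sSfL 'https://raw.githubusercontent.com/MichaIng/DietPi/master/.build/images/dietpi-installer')"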

Most likely due to outstanding CPUFreq support and probably missing higher frequencies in the OPP table? Note that if there is any chance to use mainline Linux, we will most likely go with it as well. I don’t know how Armbian implemented support so far, but the vendor sources I saw from Orange Pi (same 5.15.147, same primary source) were a pain to deal with, which I won’t repeat.

Do you know what this debconf APT config does exactly? :wink:

Nice, I never knew about this thread. When I once checked the currently used method, respectively pure shell/bash CPU benchmark methods in general, I just found it in two random 3rd party articles. It seemed to be an okay-ish approach, good enough for our needs.

Indeed, between Debian Bullseye and Bookworm (or maybe it was between Buster and Bullseye, I don’t remember exactly), scores dropped significantly, though usually more in a range of 5%-10%. So at that time, newer boards had relatively worse results compared to older boards. But on our survey page, which drops surveys older than 6 months, this systematic error faded out of the averages.

It stops most services we know to be safe to stop (and which users did not choose to keep running in any case), including cron, and it switches to the performance governor. But sure, it is pretty short-running, and we cannot prevent unknown processes or the users themselves from causing resource usage. Thermal or undervoltage throttling can falsify results, indeed. Though since it is so short-running, usually temperatures do not rise anywhere near thermal throttling. How does sbc-bench deal with these things?
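
Roughly, that preparation amounts to something like this (a simplified illustration, not the actual dietpi-benchmark code):

    # Simplified illustration of the benchmark preparation described above,
    # not the actual dietpi-benchmark code (needs root)
    systemctl stop cron                    # stop known-safe services, e.g. cron
    for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do
        echo performance > "$p"            # pin all CPU clusters to max clock
    done
    # ... run the benchmark, then restore governors and services afterwards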

A single run does not represent much if not done on a fresh, clean system. The averaging in our case happens on the uploaded results. This means that one sees outliers on the survey page, especially individual very slow examples among those SBCs which have a lot of samples. We could filter this, but in the averages it does not seem to cause a significant error, and it should be obvious that a more than doubled CPU time compared to the average is most likely caused by other/background CPU utilization.

There are also some fake results, or falsely assigned/detected hardware models. In these cases, if the result is much faster than the average and the number of samples is sufficiently large, it would actually make sense to drop them completely and not count them into the averages :thinking:. But it remains true that the averages are not significantly falsified by this.
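
Such a filter could be as simple as this (just a sketch of the idea, nothing implemented yet; durations.txt is a hypothetical file with one uploaded duration per line): drop samples that deviate by more than a factor of 2 from the median before averaging:

    # Sketch of the outlier filter idea: drop samples more than 2x away
    # from the median, then average the rest
    sort -n durations.txt | awk '
        { a[NR] = $1 }
        END { median = a[int((NR + 1) / 2)]
              for (i = 1; i <= NR; i++)
                  if (a[i] >= median / 2 && a[i] <= median * 2) { sum += a[i]; n++ }
              printf "kept %d of %d samples, average %.2f\n", n, NR, sum / n }'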

Your points are not wrong, I am well aware of them, and admitted them wherever they were the topic. If precise and/or differentiated benchmarks are required, there are certainly better tools out there. The question really is what you aim for, and what exactly you want to measure. For the aim of giving a rough overview and a qualitative performance comparison between SBCs, it works pretty well IMO. The advertised or elsewhere measured performance gains between SBC/SoC/CPU models are represented quite well, also for the T/A/whatever527 vs RK3399 in this case. The tool aims to be quick & simple without requiring any other external software, naturally implying limits and lacking precision. And that results are published without filtering stretches the ranges, at least.

But: You said “Absolutely not.” when I mentioned that “our benchmark should run through, and its result should be qualitatively correct”, but from this topic and your comments, regarding the averages, I do not see where this is “absolutely not” true? Do average sbc-bench results show significantly different relative differences between SBCs? I see it is hard to compare, since you have very small sample sizes, no per-SBC averages, and no combined CPU score or such, with AES being affected by hardware support, not only raw CPU performance.

For completeness, there are other flaws in our benchmark:

  • We measure RAM speeds in a tmpfs, hence there is a filesystem layer involved, which does not really make sense. The sample size is based on the free physical memory, which can be, even after stopping known services, much too small to be representative. So variations there are much higher than for CPU results when repeating them on the same system (unless you really have some heavy or suddenly CPU-consuming process popping up elsewhere). Addressing this by repeating RAM I/O until e.g. a fixed overall size was handled, or running it for a fixed time instead (see the sketch after this list), is actually the higher-priority idea I had about it. But there are a lot of more important tasks, so I won’t find that time anytime soon, I guess.
  • Same for disk speeds: measured with a filesystem layer, though there it sort of makes sense, so one can theoretically compare the filesystem impact as well. But the sample size is much too small. Originally, it was chosen small to not unnecessarily burn the SD card. But nowadays this is not such a problem anymore, SD cards have become better and much larger on average, so we could raise the sample size without needing to worry much. Admins who run it will know the amount of data written. And if one needs to worry whether a few hundred MiB of writes are killing the card or not, then that system has no serious task, I guess :smile:.
  • And “small sample size” also applies to the CPU benchmark. By now, the time is so small for the top SBCs that even just invoking the loop etc. might have a significant effect. Same as for RAM I/O, I had the idea to run it for a fixed time rather than a fixed number of iterations. But again, a lot of ideas, and not the time to do everything. Priorities were and are different so far.
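
To sketch that fixed-time idea from the first and last point (just the concept, nothing implemented yet): iterate for a fixed wall-clock window and score iterations per second, so fast and slow SBCs both get a comparable sample size:

    # Fixed-time CPU benchmark sketch: count loop iterations completed within
    # a 10-second window (bash's builtin SECONDS avoids forking `date`)
    SECONDS=0; i=0
    while (( SECONDS < 10 )); do
        i=$((i+1))
    done
    echo "score: $((i / 10)) iterations/second (higher is faster)"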