Glad to see you here! Yes, I agree that Armbian is even better for most use cases!
But for this board, the Armbian kernel seems to be much slower than the 5.15.147 vendor kernel.
I hope Allwinner will provide a better kernel for the community to play with the A527.
I've also ordered a Radxa A7A and A7Z, both using the A733 SoC.
I hope they get Armbian support soon!
Thank you, I just added Allwinner A527 scores to the official results. This octa-core A55 thingy is slightly faster than an RK3399 multithreaded (~125%) but of course slightly slower single-threaded (80%-85%).
Nope, it can't. The "benchmark" itself is so horribly stupid that I stopped commenting on it to the former DietPi guy (at least I was able to explain to him that the benchmark he initially chose was even more crap (comparing armhf with aarch64)).
This benchmark is a stupid bash loop that represents which real-world task? Exactly: NONE. Benchmarks are always about use cases, so only if your use case is executing silly bash loops for no reason is this benchmark for you. It does not represent anything other than that!
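To illustrate what I mean by "silly bash loop", here is a hypothetical sketch of that kind of benchmark (my own single-threaded illustration, not the actual DietPi code):

```bash
#!/bin/bash
# Hypothetical sketch of a bash-loop "benchmark": time a fixed number of
# shell arithmetic iterations. This measures the bash interpreter and its
# version at least as much as it measures the CPU.
iterations=1000000
start=$(date +%s.%N)
i=0
while (( i < iterations )); do
    (( i++ ))
done
end=$(date +%s.%N)
echo "Elapsed: $(echo "$end - $start" | bc) seconds"
```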
The next flaw should be pretty obvious: a benchmark always also measures software. And since software evolves, scores may change (the 7-zip benchmark sbc-bench relies on is one of the few exceptions, as long as it stays on the same version).
Next flaw: the "benchmark" is used in fire-and-forget mode w/o taking into account background activity, throttling or other stuff that instantly ruins benchmark scores. It generates garbage numbers by design, also for the simple reason that it executes just once and way too short (a quick burst of background activity can lead to twice or three times the execution time). Benchmarking basics: you need to measure multiple times, record the standard deviation and, if it is higher than let's say 5%, trash the results. Nope, they get uploaded instead.
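Those basics as a sketch (my own illustration; "./my_benchmark" is a placeholder for whatever workload is being measured):

```bash
#!/bin/bash
# Sketch: run the workload several times, compute mean and standard
# deviation, and refuse to report a score if the relative deviation
# exceeds 5%. "./my_benchmark" is a placeholder.
runs=5
times=()
for (( n = 0; n < runs; n++ )); do
    start=$(date +%s.%N)
    ./my_benchmark >/dev/null
    end=$(date +%s.%N)
    times+=( "$(echo "$end - $start" | bc)" )
done
printf '%s\n' "${times[@]}" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
        mean = sum / n
        sd = sqrt(sumsq / n - mean * mean)
        if (sd / mean > 0.05) print "Deviation > 5%, discarding results"
        else printf "Mean: %.2f s (sd %.2f s)\n", mean, sd
    }'
```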
And if you had looked at the link you shared with your brain switched on, you would have noticed that the whole collection of "scores" is in reality just a collection of garbage. Just look at the RPi numbers:
RPi 5B: benchmark execution between 1.7 and 7.54 seconds. That's a results variation of close to 450%, which of course is impossible in a proper benchmarking environment. You would expect the fastest and slowest runs to differ by 5-10% at most, never by 50% or even +400%.
RPi 4B: between 4.9 and 55 seconds, a +1100% difference, LMAO
RPi 1/Zero 512M: between 3.2 and 600 sec, a ~18750% difference, LMAO even more.
And then average values are generated instead of discarding this crap. I mean: how on earth should it be possible that an original RPi with its dog-slow BCM2835 outperforms RK3399 boards (3.2 vs. 6/7 seconds) or even more modern ones? Correct: that's impossible, since the numbers collected there are BS and nothing else.
But for a project like DietPi, which today is mostly based on ignorance, it's rather typical to do "performance measurements" in such a comically flawed way.
6 seconds is actually much better than the bulk of RK3399 results we get, in fact better than every single uploaded RK3399 result. Those are more in the range of 7-9 seconds: https://dietpi.com/survey/#benchmark
Cortex-A55 is expected to be ~18% faster than the A53, and the A72 about double the speed of the A53? So taking the A53 as reference and utilizing all cores (which is what our benchmark does/allows):
RK3399: 4x1 + 2x2 = 8
T527 (or A527?): 8x1.18 = 9.44 ~ +18%
So the dietpi-benchmark results seem to reflect this quite well.
Since we do not provide images for the Cubie A5E yet, where did you get the OS from, or which base image did you use? The CPU governor is a kernel thing, so if it was a kernel build/sources from Radxa, I would expect CPUFreq to work, but I don't know about mainline support yet. I tried to create an Orange Pi 4A (same SoC) image a while ago, but failed hard with their 32-bit U-Boot sources, which do not allow any "normal" 64-bit Linux loading without heavily adjusted and legacy build methods. After focusing on software and Debian Trixie support for two months, I will have another look.
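For reference, you can check whether CPUFreq works at all via the generic kernel sysfs interface (nothing DietPi-specific):

```bash
# If this directory is missing, the kernel has no working CPUFreq driver
# for the SoC; otherwise it shows the active and available governors.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
```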
You can run the dietpi-installer on other Debian-based images, but we do not support this SBC yet. If we did, our images would be built from scratch.
Most likely due to outstanding CPUFreq support and probably missing higher frequencies in the OPP table? Note that if there is any chance to use mainline Linux, we will most likely go with it as well. I don't know how Armbian implemented support so far, but the vendor sources I saw from Orange Pi (same 5.15.147, same primary source) were a pain to deal with, which I won't repeat.
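Whether higher frequencies are missing can usually be seen from what the kernel exposes (again the generic sysfs interface; the frequency list only exists if the cpufreq driver provides a table):

```bash
# Frequencies exposed by the kernel's OPP table (in kHz). A low
# cpuinfo_max_freq hints at missing higher OPPs in the device tree.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
```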
Do you know what this debconf APT config does exactly?
Nice, I never knew about this thread. When I once researched the currently used method, or rather pure shell/bash CPU benchmark methods in general, I just found it in two random 3rd-party articles. It seemed to be an okay-ish approach, good enough for our needs.
Indeed, between Debian Bullseye and Bookworm (or maybe it was between Buster and Bullseye, I don't remember exactly), scores dropped significantly, though usually more in the range of 5%-10%. So at that time, newer boards had relatively worse results compared to older boards. But on our survey page, which drops surveys older than 6 months, this systematic error has faded out of the averages.
It stops most services we know are safe to stop (and which users did not choose to keep running in any case), including cron, and it switches to the performance governor. But sure, it runs pretty short, and we cannot prevent unknown processes or the users themselves from causing resource usage. Thermal or undervoltage throttling can falsify results, indeed. Though since it runs so short, temperatures usually do not get anywhere near thermal throttling. How does sbc-bench deal with these things?
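For illustration, the preparation described above boils down to something like this (a simplified sketch of the idea, not the actual dietpi-benchmark code; the service list is only an example):

```bash
#!/bin/bash
# Sketch: stop known-noisy services and force the performance governor
# before benchmarking, restore the services afterwards.
services=(cron rsyslog)
systemctl stop "${services[@]}"
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
    echo performance > "$policy/scaling_governor"
done
# ... run the benchmark here ...
systemctl start "${services[@]}"
```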
A single run does not represent much if it is not done on a fresh, clean system. The averaging in our case happens on the uploaded results. This means one does see outliers on the survey page, especially individual very slow samples among those SBCs which have a lot of samples. We could filter these, but in the averages it does not seem to cause a significant error, and it should be obvious that a more than doubled CPU time compared to the average is most likely caused by other/background CPU utilization.
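Filtering could be as simple as dropping everything above, say, twice the median before averaging; a rough sketch (my illustration, not the actual survey backend; "results.txt" is a placeholder):

```bash
# Sketch: drop samples taking more than twice the median runtime before
# averaging. Input: one runtime in seconds per line.
sort -n results.txt | awk '
    { a[NR] = $1 }
    END {
        median = a[int((NR + 1) / 2)]
        for (i = 1; i <= NR; i++)
            if (a[i] <= 2 * median) { sum += a[i]; kept++ }
        printf "Kept %d of %d samples, average %.2f s\n", kept, NR, sum / kept
    }'
```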
There are also some fake results, or falsely assigned/detected hardware models. In these cases, if a result is much faster than the average and the number of samples is sufficiently large, it would actually make sense to drop them completely and not count them into the averages. But it remains true that the averages are not significantly falsified by this.
Your points are not wrong, I am well aware of them, and have admitted this wherever it was the topic. If precise and/or differentiated benchmarks are required, there are certainly better tools out there. The question really is what you aim for and what exactly you want to measure. For the aim of giving a rough overview and a qualitative performance comparison between SBCs, it works pretty well IMO. The advertised or elsewhere measured performance gains between SBC/SoC/CPU models are represented quite well, also for the T/A/whatever527 vs. RK3399 in this case. The tool aims to be quick & simple without requiring any external software, which naturally implies limits and a lack of precision. And the fact that results are published without filtering stretches the ranges, at least.

But: you said "Absolutely not." when I mentioned that "our benchmark should run through, and its result should be qualitatively correct", but from this topic and your comments regarding the averages, I do not see where this is "absolutely not" true? Do average sbc-bench results show significantly different relative differences between SBCs? I see it is hard to compare, since you have very small sample sizes, no per-SBC averages, and no combined CPU score or such, with AES being affected by hardware support, not only raw CPU performance.
For completeness, there are other flaws in our benchmark:
We measure RAM speeds in a tmpfs, hence there is a filesystem layer involved, which does not really make sense. The sample size is based on the free physical memory, which, even after stopping known services, can be much too small to be representative. So variations there are much higher than for the CPU results when repeating them on the same system (unless you really have some heavy or sudden CPU-consuming process popping up elsewhere). To address this, repeating the RAM I/O until e.g. a fixed overall size has been handled, or running it for a fixed time instead, is actually the higher-priority idea I had for it. But there are a lot of more important tasks, so I won't find the time for that any time soon, I guess.
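The fixed-size idea would look roughly like this (just a sketch, not existing code; mount point and size are placeholders):

```bash
#!/bin/bash
# Sketch: write a fixed amount of data into a tmpfs and derive MiB/s from
# the elapsed time, instead of sizing the test by free memory.
mnt=/tmp/rambench
size_mib=512
mkdir -p "$mnt"
mount -t tmpfs -o size=$(( size_mib + 64 ))M tmpfs "$mnt"
start=$(date +%s.%N)
dd if=/dev/zero of="$mnt/testfile" bs=1M count=$size_mib 2>/dev/null
end=$(date +%s.%N)
echo "Write: $(echo "scale=1; $size_mib / ($end - $start)" | bc) MiB/s"
umount "$mnt"
```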
Same for disk speeds: measured with the filesystem layer involved, though there it sort of makes sense, so one can theoretically compare the filesystem impact as well. But the sample size is much too small. Originally it was chosen small to not unnecessarily wear the SD card. But nowadays this is not such a problem anymore: SD cards have become better and much larger on average, so we could raise the sample size without needing to worry much. Admins who run it will know the amount of data written. And if one needs to worry whether a few hundred MiB of writes are killing the card or not, then that system has no serious task anyway, I guess.
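Raising the sample size would just mean something like this (illustration only; path and size are placeholders; dd prints the throughput at the end):

```bash
# Sketch: sequential write through the filesystem with a larger sample
# size, bypassing the page cache for more stable numbers.
testfile=/mnt/benchmark.tmp
dd if=/dev/zero of="$testfile" bs=4M count=256 oflag=direct status=progress
rm "$testfile"
```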
And "small sample size" also applies to the CPU benchmark. The runtime has become so short on the top SBCs in the meantime that even just invoking the loop etc. might have a significant effect. Same as for RAM I/O, I had the idea to run it for a fixed time rather than a fixed number of iterations. But again: a lot of ideas, but not the time to do everything. Priorities were and are different so far.
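The fixed-time idea for the CPU part could be sketched like this (again just an illustration, not actual code; iterations per second would become the score):

```bash
#!/bin/bash
# Sketch: run the arithmetic loop for a fixed wall-clock duration and
# report iterations per second, so fast and slow SoCs get a comparable
# measurement window. Time is only checked every 100000 iterations to
# keep the date call out of the hot loop.
duration=10
deadline=$(( $(date +%s) + duration ))
total=0
while (( $(date +%s) < deadline )); do
    i=0
    while (( i < 100000 )); do
        (( i++ ))
    done
    (( total += i ))
done
echo "~$(( total / duration )) iterations per second"
```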