Upcoming ARM v9 product with 45 TOPS NPU

https://github.com/radxa-pkg/linux-sky1/releases/tag/6.1.44-1

Ah, another binaries only release…

Everything below https://github.com/radxa-pkg/ is ‘binaries only’ for the simple reason that various build pipelines at Radxa target this Github repo for releasing packages which are mostly ‘binary only’ :slight_smile:

So apart from all of us eagerly waiting for a repo with source code, packages/binaries appearing at the aforementioned repo is the only thing possible for now…

At least according to the decompiled device-tree that CIX’ 6.1 kernel is shipping, all the A720 cores share the same value of 1024 for the capacity-dmips-mhz property, so only the clockspeed should differ (the little A520 cores get a score of 403 here). But in the past many vendors have shown that these values can be chosen carelessly or outright wrongly.
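For anyone who wants to cross-check this on a running board: the kernel exposes the capacity it ended up with per core in sysfs. A minimal sketch, assuming the kernel provides the cpu_capacity files (arm64 kernels normally do); the function name and the optional directory argument are my own, the latter only there so it can be tried against a fake tree. As far as I understand, this value is normalized to 1024 for the strongest core and already takes the maximum frequency into account, so it will not match capacity-dmips-mhz one to one.

```shell
# Print the capacity the running kernel assigned to each CPU.
# cpu_capacity is the scheduler's normalized value (strongest core = 1024).
print_cpu_capacity() {
    cpu_root=${1:-/sys/devices/system/cpu}   # hypothetical override, for testing only
    for f in "$cpu_root"/cpu[0-9]*/cpu_capacity; do
        [ -r "$f" ] || continue              # file is missing on some kernels
        cpu=${f%/cpu_capacity}               # strip the file name
        cpu=${cpu##*/}                       # keep only "cpuN"
        printf '%s %s\n' "$cpu" "$(cat "$f")"
    done
}
```

Calling print_cpu_capacity without arguments reads the live system; identical values across all A720 cores would only be expected if they also share the same maximum clockspeed.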

We will know tomorrow, since @RadxaNaoki will then run sbc-bench, which reports cache sizes.

Hi Naoki,

first, you should run defconfig separately from “all”, because “make -j12 defconfig all” says “run the jobs from both defconfig and all in parallel, with up to 12 at a time”. “all” only works properly once defconfig has finished, and defconfig is 100% serialized. So you can get random effects by running the two at once, and the time of defconfig alone (which doesn’t contribute to measuring real performance) adds significant noise to the measurement.
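In script form, that separation could look like the sketch below. It is only an illustration: the function name build_split and the MAKE override (there so the steps can be dry-run with a stand-in command) are my own assumptions, and it uses date +%s instead of time, so it only reports whole seconds.

```shell
# Run defconfig on its own, then time only the parallel "all" step.
build_split() {
    njobs=${1:-$(nproc)}          # default: one job per CPU
    make_cmd=${MAKE:-make}        # hypothetical override, e.g. for a dry run

    "$make_cmd" defconfig || return 1      # serialized, deliberately not timed

    start=$(date +%s)
    "$make_cmd" -j"$njobs" all || return 1
    end=$(date +%s)

    echo "all: $((end - start)) s with -j$njobs"
}
```

Run from the top of a kernel tree, build_split 12 would correspond to the -j12 case above, with the defconfig time kept out of the reported figure.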

Second, when running this, it’s important to monitor usage (vmstat 1 or top -d 1 are fine). You may well observe a lot of I/O wait due to writes to eMMC/NVMe etc., which doesn’t measure CPU performance at all. There are also significant parts of the kernel build that are single-threaded (linking at the end can be super long, compressing the kernel, etc.). This is clearly visible in “top”, as you’ll see long periods where you’re above 0% idle. Worse, some of these single-threaded tasks may randomly end up on the little cores and take even longer at the end. That’s something I’ve witnessed a lot when running build benchmarks on big-little CPUs (e.g. rk3399). While you’re measuring the gain this machine brings to your use case, it doesn’t do the machine itself any favors.
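If you prefer numbers over watching the screen, the vmstat output can be logged during the build and summarized afterwards. A rough sketch, assuming the usual procps vmstat layout with a header line that starts with “r” and contains “id” and “wa” columns; the function name is made up, and the first vmstat sample (averages since boot) is not filtered out.

```shell
# Summarize a "vmstat 1" log: average idle and I/O-wait percentages.
# Column positions are taken from the header line instead of being hard-coded.
vmstat_summary() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header line
        ("id" in col) && $1 ~ /^[0-9]+$/ {                          # data line
            idle += $(col["id"]); iowait += $(col["wa"]); n++
        }
        END {
            if (n) printf "samples=%d avg_idle=%.1f avg_iowait=%.1f\n", n, idle / n, iowait / n
        }
    ' "$1"
}

# Possible use around a build:
#   vmstat 1 > vmstat.log & vmstat_pid=$!
#   make -j"$(nproc)" all
#   kill "$vmstat_pid"
#   vmstat_summary vmstat.log
```

A high average idle or I/O-wait figure suggests the run measured storage or single-threaded phases rather than the CPU cluster as a whole.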

One thing that scales fairly well in the kernel is the build of modules as there are a lot to run in parallel with little to no serializing operations. You can then run:

     make allmodconfig
     make -j$(nproc) prepare
     time make -j$(nproc) modules

Usually this will keep all cores fully loaded from beginning to end and should provide quite trustworthy timings.

And as Thomas mentioned, it’s likely that these numbers will improve over time (e.g. RAM timings, CPU operating points etc), so it’s important to make sure potential buyers will not be disappointed by comparisons with other machines and keep an unsatisfying impression of your product if it’s not yet 100% up to speed. I agree with Thomas that sbc-bench does show some such anomalies and allows one to take numbers with a grain of salt (e.g. too low CPU frequency, low DRAM frequency etc).

But in any case for me something twice as fast as a rock5b on a kernel build sounds quite good already :wink:

Yeah, and let’s not harass the Radxa team and early testers to get random numbers before everyone else. They’ll need time to verify the conditions of the tests and deal with various early issues. There’s no point in making tens of series of completely different numbers pop up here out of nowhere for the same hardware just due to pressure.


Interesting! Here’s hoping the CD8180 does indeed have eight full fat A720 cores, with the only difference between them being the clock speed, rather than the “Mediums” being an area-optimized uarch implementation :crossed_fingers:

It’s not about harassment or ‘numbers’, it’s about information such as the type of ‘middle’ A720 cores, the clockspeeds at which the cores are currently (in the bring-up stage) running, anomalies and so on.

Benchmark scores generated now should be thrown away, but they’re already in the wild (like the kernel compile time shared on X, or Geekbench 6 results most probably not made with final clockspeeds).

My point is that upon the very first result publication, everyone obviously wants to compare and have their own variation of a test run on a platform whose test conditions got zero validation. I agree that the mistake is to publish raw numbers, but we all did it in excitement.

I just think that the thread should not degenerate into “and please test for my use case as well” nor “and what if you change this or that”, but rather “Let’s first focus on making sure the cores run at the expected frequencies and that DRAM is operating at acceptable speed before testing”, then “now that the platform looks somewhat OK and trustworthy, let’s run a few tests relevant to the end user (kernel builds and GB are OK there) and clearly mention the conditions (and the date), since it’s clear they will change in a few weeks, generally for the better”.
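As a first sanity check of the “expected frequencies” part, the cpufreq policies the kernel knows about can be listed from sysfs. This is only a sketch: the function name and the optional directory argument are assumptions of mine, and cpuinfo_max_freq is what the driver or firmware claims, not a measured clockspeed, so it does not replace a tool that actually measures.

```shell
# List each cpufreq policy with the CPUs it covers and its claimed maximum clock.
list_cpufreq_policies() {
    policy_root=${1:-/sys/devices/system/cpu/cpufreq}   # hypothetical override, for testing only
    for p in "$policy_root"/policy*; do
        [ -r "$p/cpuinfo_max_freq" ] || continue        # no cpufreq support exposed
        cpus=$(cat "$p/related_cpus")
        max_khz=$(cat "$p/cpuinfo_max_freq")            # value is reported in kHz
        printf '%s: cpus %s, max %d MHz\n' "${p##*/}" "$cpus" "$((max_khz / 1000))"
    done
}
```

Noting this output together with the date next to any published result would already document the conditions the numbers were taken under.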


Understood.

I’ve just implemented a NotReadyYet=yes mode in sbc-bench that will skip generating ‘random numbers’ but still generates valuable insights.

Sample output generated on Rock-5B: https://0x0.st/8oLI.bin

@RadxaNaoki can you please execute NotReadyYet=yes sbc-bench.sh -r after downloading the latest v0.9.69 sbc-bench release?


Oh that’s a great idea! It’s more of a board discovery than a bench and can help many engineering teams quickly spot specific points to dig deeper. Well done! I think you should pass the message to Jean-Luc & Jeff for next time they review early boards as well.

$ time make defconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'defconfig'
#
# configuration written to .config
#

real	0m6.560s
user	0m4.680s
sys	0m1.398s

I see. I think I can do that after answering a bunch of random questions and convincing everyone.
(Personally, I think the priorities are reversed.)

I saw a grep error while running sbc-bench. I don’t think it affects the results, but I just wanted to let you know.

++ grep '^aes-256-cbc' ''
grep: : No such file or directory
+ OpenSSLResults=
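The trace suggests grep was handed an empty file name, presumably because the variable holding the path to the OpenSSL log was not set in this mode. I have not looked at how sbc-bench handles this internally, so the following is just a generic sketch of the kind of guard that avoids the message; the function and variable names are made up.

```shell
# Only grep the results file if a readable file name was actually provided.
aes_results() {
    results_file=$1
    if [ -z "$results_file" ] || [ ! -r "$results_file" ]; then
        return 0                 # nothing to parse, leave the result empty
    fi
    grep '^aes-256-cbc' "$results_file"
}
```

With an empty argument this prints nothing and stays quiet, which matches the empty OpenSSLResults= seen above without the error line.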

They all show 512K L2, so they are the same cores: Orion O6 Debug Party Invitation

Also interesting:

  • memory timings (at least for the A720) are already excellent (four 32-bit channels total)
  • clockspeeds right now do not differ in a way we could talk about ‘big’ and ‘middle’ (1 x 2.5 GHz, 3 x 2.4 GHz, 2 x 2.3 GHz, 2 x 2.2 GHz – this might change though)
  • it looks like cpufreq is controlled by an MCU inside the SoC and not Linux’ cpufreq driver (maybe a good choice given the state of ‘energy aware scheduling’ on Linux on ARM though this would result in task affinity controlled by the kernel vs. clockspeeds controlled by firmware)

I’m already curious how things will look with Cix’ BSP kernel which likely will show better ‘real-world’ performance (majority of CPU benchmarks are always somewhat stupid since they (should) ramp up cpufreq to the max but in reality it’s a lot about the balance between consumption and performance where settings matter).

Thanks, will be fixed with next version.

Excellent! That is great news. Thanks for the heads up.

It’ll be interesting to see what the frequencies look like when running the BSP kernel. Hopefully all the A720s hit their advertised frequencies. We don’t want any “up to …GHz” here. #sideways glance at the RK3588# :wink:

The VPU should be using the V4L2 stateful API, and my Chromium build for the mainline kernel, which has been tested on the Qualcomm Snapdragon 865 platform, should work.


I found /usr/share/cix/include/mvx-v4l2-controls.h, which provides custom APIs for the mvx platform, so I guess Chromium will need patches to make it work.

That is correct. The VPU IP comes from ARM China, codename is Linlon.

Some RISC-V boards use similar but lower-end VPU models. For example




I wonder about one thing. License…

The Linux kernel is GPLv2, so sources should be provided with the binaries.

Last week Radxa showed binary kernel images. OK, I do not have a board yet, so you may say “license does not apply”, but what if I order a board?

Will the board come with storage media containing the sources? Or a URL to download them? Or will they already be provided online?

You proudly wrote “World’s first Open Source ARM V9 Motherboard” in the presentation so I ask.

Considering I didn’t get a shipping confirmation from ALLNET, does that mean it will ship at the end of Feb at the earliest?