Upcoming ARM v9 product with 45 TOPS NPU

tkaiser · January 15, 2025, 11:49am

Where can this file be found?

meco · January 15, 2025, 11:51am

https://github.com/radxa-pkg/linux-sky1/releases/tag/6.1.44-1
linux-image-6.1.44-1-sky1_6.1.44-1_arm64.deb

Found here: /data/usr/lib/linux-image-6.1.44-1-sky1/cix/sky1-orion-o6.dtb

RadxaNaoki · January 15, 2025, 12:00pm

I build kernels a lot, and kernel build times are important to me, so I don’t understand why they would be rejected, but I’m open to constructive suggestions.

RadxaNaoki · January 15, 2025, 12:01pm

I haven’t seen many people mistake a kernel build for a pure CPU benchmark.

tkaiser · January 15, 2025, 12:10pm

Can you please run sbc-bench -r and post the results here?

RadxaNaoki · January 15, 2025, 12:40pm

In which environment?
I’m not planning on running it in a work environment because it uses sudo and I don’t understand what it does.

EDIT: I will post the results on Orion O6 Debug Party Invitation

tkaiser · January 15, 2025, 12:42pm

You’re currently running mainline? Anyway, it doesn’t really matter, since sbc-bench simply executes a few benchmarks and collects information about the HW setup (which CPU clusters exist, at which CPU frequency they run, and it benchmarks the clusters individually too).

It generates a report like this, including not only scores but also the reasons why the numbers are as they are.

Also Marcin (@hrw) might be happy about it since it collects /proc/cpuinfo for his ARM SoC tables: https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html

Reasonable, but then the attempt to install the needed packages might fail, as might configuring OS settings for maximum performance (cpufreq governors and such, though most probably this won’t work with mainline yet anyway).

RadxaNaoki · January 15, 2025, 12:49pm

So I’ll try running it in the official Debian (not Radxa/CIX, but Debian) bookworm environment. It’s too late today, so I’ll do it tomorrow.

meco · January 15, 2025, 2:20pm

Codec	Decoder	Encoder
AV1	x	na
AVS2	x	na
AVS	x	na
H.264	x	x
HEVC	x	x
JPEG	x	x
MPEG2	x	na
MPEG4	x	na
VC1	x	na
VP8	x	x
VP9	x	x

EDIT:
The codecs are exposed via V4L2

Source: extracted VPU driver firmware binaries

meco · January 15, 2025, 1:02pm

Apparently this is the driver for that series (a PDF mentioned that name for multiple tiers of SoCs): https://gitee.com/Arm-China/Compass_NPU_Driver (at least it is open source, and the last update in git was 19 days ago)

hrw · January 15, 2025, 2:13pm

https://github.com/radxa-pkg/linux-sky1/releases/tag/6.1.44-1

Ah, another binaries only release…

tkaiser · January 15, 2025, 2:23pm

Everything below https://github.com/radxa-pkg/ is ‘binaries only’ for the simple reason that various build pipelines at Radxa target this GitHub repo for releasing packages, which are mostly ‘binary only’.

So, aside from all of us eagerly waiting for a repo with source code, packages/binaries appearing at the aforementioned repo is the only thing possible…

tkaiser · January 15, 2025, 3:37pm

At least according to the decompiled device tree that CIX’s 6.1 kernel ships, all the A720 cores share the same value of 1024 for the capacity-dmips-mhz property, so only the clock speed should differ between them (the little A520 cores get 403 here). But in the past, many vendors have shown that such values can be chosen carelessly or be outright wrong.
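For anyone who wants to repeat this check on a decompiled device tree, here is a minimal sketch that groups cpu nodes by their capacity-dmips-mhz value. The SAMPLE_DTS text is a made-up fragment for illustration only (the node names and layout are assumptions, not copied from the real sky1-orion-o6.dtb), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "cpu@100 {" or "cpu1: cpu@100 {" in decompiled DTS text.
CPU_NODE = re.compile(r"^\s*(?:\w+:\s*)?(cpu@[0-9a-fA-F]+)\s*\{")
# Decompiled device trees usually print cell values in hex, e.g. <0x400>.
CAPACITY = re.compile(r"capacity-dmips-mhz\s*=\s*<\s*(0[xX][0-9a-fA-F]+|\d+)\s*>")


def cpu_capacities(dts_text):
    """Map each cpu@N node name to its capacity-dmips-mhz value."""
    result = {}
    current = None
    for line in dts_text.splitlines():
        node = CPU_NODE.match(line)
        if node:
            current = node.group(1)
            continue
        cap = CAPACITY.search(line)
        if cap and current is not None:
            raw = cap.group(1)
            result[current] = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
    return result


def group_by_capacity(capacities):
    """Group cpu node names by capacity value, so clusters become visible."""
    groups = defaultdict(list)
    for cpu, value in capacities.items():
        groups[value].append(cpu)
    return dict(groups)


# Hypothetical fragment, NOT the real board file.
SAMPLE_DTS = """
cpus {
    cpu@0 {
        device_type = "cpu";
        capacity-dmips-mhz = <0x193>;
    };
    cpu@100 {
        device_type = "cpu";
        capacity-dmips-mhz = <0x400>;
    };
    cpu@200 {
        device_type = "cpu";
        capacity-dmips-mhz = <0x400>;
    };
};
"""
```

Feeding it the text of a real decompiled .dtb instead of SAMPLE_DTS would show whether all big and medium cores really land in the same group. It only reports what the device tree claims, not what the silicon does.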

Tomorrow we will know, since @RadxaNaoki will then run sbc-bench, which reports cache sizes.

willy · January 15, 2025, 5:32pm

Hi Naoki,

first, you should run defconfig separately from “all”, because “make -j12 defconfig all” says “run the jobs from both defconfig and all in parallel, with up to 12 at a time”. “all” only works properly once defconfig has finished, and defconfig is 100% serialized. So you can get random effects from having the two at once, and the time of defconfig alone (which doesn’t contribute to measuring real performance) adds significant noise to the measurement.
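A minimal sketch of that separation, assuming it is run from the top of a kernel source tree: the configuration step runs untimed, and only the parallel build is measured. The helper names build_plan and timed_build are made up for this illustration.

```python
import os
import subprocess
import time


def build_plan(jobs):
    """Return (setup, measured): setup steps run untimed, measured is timed."""
    setup = [["make", "defconfig"]]          # serialized, not part of the benchmark
    measured = ["make", f"-j{jobs}", "all"]  # the parallel part worth timing
    return setup, measured


def timed_build(jobs=None, cwd="."):
    """Run the setup untimed, then return wall-clock seconds of the build."""
    jobs = jobs or os.cpu_count() or 1
    setup, measured = build_plan(jobs)
    for argv in setup:
        subprocess.run(argv, cwd=cwd, check=True)
    start = time.monotonic()
    subprocess.run(measured, cwd=cwd, check=True)
    return time.monotonic() - start
```

Calling timed_build(12) would then report a number that excludes defconfig. It still includes the single-threaded link and compression stages, so it is not a pure CPU score either.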

Second, when running it like this, it’s important to monitor usage (vmstat 1 or top -d 1 are fine). You may well observe a lot of I/O wait due to writing to eMMC/NVMe etc., which doesn’t measure the CPU at all. There are also significant parts of the kernel build that are single-threaded (linking at the end can be super long, compressing the kernel etc.). This is clearly visible in “top”, as you’ll see long periods where you’re above 0% idle. Worse, some of these single-threaded tasks may randomly end up on the little cores and take even longer at the end. That’s one of the things I’ve witnessed a lot when running build benchmarks on big-little CPUs (e.g. rk3399). While you’re measuring the gain this machine brings to your use case, it doesn’t do the machine itself any favors.
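If vmstat or top are not at hand, roughly the same idle and I/O-wait figures can be derived from the aggregate "cpu" line of /proc/stat. This is only a sketch of that idea, with made-up function names; it is not how vmstat or top are implemented, and it ignores per-core detail.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters.

    Field order: user nice system idle iowait irq softirq steal.
    The trailing guest fields are skipped because they are already counted in user/nice.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(v) for v in fields[1:9]]


def idle_iowait_percent(before, after):
    """Return (idle %, iowait %) for the interval between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0, 0.0
    return 100.0 * deltas[3] / total, 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(samples=10, interval=1.0):
    """Print idle and iowait once per interval, for a limited number of samples."""
    prev = read_cpu_line()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_line()
        idle, iowait = idle_iowait_percent(prev, cur)
        print(f"idle {idle:5.1f}%  iowait {iowait:5.1f}%")
        prev = cur
```

Running watch() in a second terminal during the build should make the long single-threaded stretches and any storage stalls stand out as high idle or iowait values.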

One thing that scales fairly well in the kernel is the build of modules as there are a lot to run in parallel with little to no serializing operations. You can then run:

     make allmodconfig
     make -j$(nproc) prepare
     time make -j$(nproc) modules

Usually this will keep all cores fully loaded from beginning to end and should provide quite trustworthy timings.

And as Thomas mentioned, it’s likely that these numbers will improve over time (e.g. RAM timings, CPU operating points etc.), so it’s important to make sure potential buyers will not be disappointed by comparing to other machines and be left with an unfavorable impression of your product while it’s not yet 100% up to speed. I agree with Thomas that sbc-bench does show some such anomalies and lets you take the numbers with a grain of salt (e.g. too low CPU frequency, low DRAM frequency etc.).

But in any case for me something twice as fast as a rock5b on a kernel build sounds quite good already

willy · January 15, 2025, 5:39pm

Yeah, and let’s not harass the Radxa team and early testers to get random numbers before everyone else. They’ll need time to verify the conditions of the tests and deal with various early issues. There’s no point in making tens of series of completely different numbers pop up here out of nowhere for the same hardware just due to pressure.

loophole · January 15, 2025, 6:55pm

Interesting! Here’s hoping the CD8180 does indeed have eight full fat A720 cores, with the only difference between them being the clock speed, rather than the “Mediums” being an area-optimized uarch implementation

tkaiser · January 15, 2025, 7:07pm

It’s not about harassment or ‘numbers’, it’s about information such as the type of the ‘middle’ A720 cores, the clockspeeds the cores are currently (in the bring-up stage) running at, anomalies and so on.

Benchmark scores generated now should be thrown away, but they’re already in the wild (like the kernel compile time shared on X, or the Geekbench 6 results most probably not made with final clockspeeds).

willy · January 15, 2025, 7:22pm

My point is that upon the very first result publication, everyone obviously wants to compare and have their variation around a test on a platform which got zero validation of the conditions. I agree that the mistake is to publish raw numbers but we all did it in excitement.

I just think that the thread should not degenerate into “and please test for my use case as well” nor “and what if you change this or that”, but rather “Let’s first focus on making sure the cores run at the expected frequencies and that DRAM is operating at acceptable speed before testing”, then “now that the platform looks somewhat OK and trustworthy, let’s run a few tests relevant to the end user (kernel builds and GB are OK there) and clearly mention the conditions (and the date), since it’s clear they will change in a few weeks, generally for the better”.
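As a rough illustration of the “check the frequencies first” step, the standard Linux cpufreq sysfs files can be read per policy (cluster). This is a sketch under the assumption that the kernel exposes cpufreq at all, which may not be the case on an early bring-up or mainline image. The function names are made up, and the values are what the driver advertises, not measured clockspeeds.

```python
from pathlib import Path


def _read(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def _to_int(text):
    return int(text) if text and text.isdigit() else None


def read_policies(base="/sys/devices/system/cpu/cpufreq"):
    """Collect CPUs, current/max frequency (kHz) and governor for each policy."""
    base = Path(base)
    if not base.is_dir():
        return []  # no cpufreq support exposed by this kernel
    rows = []
    # Sorting by (length, name) orders policy4 before policy10.
    for policy in sorted(base.glob("policy[0-9]*"), key=lambda p: (len(p.name), p.name)):
        rows.append({
            "policy": policy.name,
            "cpus": _read(policy / "related_cpus"),
            "cur_khz": _to_int(_read(policy / "scaling_cur_freq")),
            "max_khz": _to_int(_read(policy / "cpuinfo_max_freq")),
            "governor": _read(policy / "scaling_governor"),
        })
    return rows


def below_max(rows, tolerance=0.05):
    """Return policies whose current frequency is more than `tolerance` below max."""
    return [r["policy"] for r in rows
            if r["cur_khz"] and r["max_khz"]
            and r["cur_khz"] < r["max_khz"] * (1 - tolerance)]
```

below_max(read_policies()) is only meaningful while the cores are under load, since an idle core with a dynamic governor will legitimately sit below its maximum. A tool that actually measures clockspeeds, as sbc-bench does, remains the more reliable check.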

tkaiser · January 15, 2025, 9:48pm

Understood.

I’ve just implemented a NotReadyYet=yes mode in sbc-bench that will skip generating ‘random numbers’ but still generates valuable insights.

Sample output generated on Rock-5B: https://0x0.st/8oLI.bin

@RadxaNaoki can you please execute NotReadyYet=yes sbc-bench.sh -r after downloading the latest v0.9.69 sbc-bench release?

willy · January 15, 2025, 10:02pm

Oh that’s a great idea! It’s more of a board discovery than a bench and can help many engineering teams quickly spot specific points to dig deeper. Well done! I think you should pass the message to Jean-Luc & Jeff for next time they review early boards as well.