Orion O6 Debug Party Invitation

willy · March 22, 2025, 4:42am

Discord still stands as an easier/casual messaging and discussion place

Well, that’s just like a corridor full of people telling their lives at the same time as if they were at a party. Nothing good can emerge from this, and even if someone was caught saying something smart, it wouldn’t be indexed nor archived so nobody could find it when searching for it. Discord is just a closed social network full of morons.

rev · March 22, 2025, 2:02pm

I got mine, via undervolting, down to 17W idle power usage for the COMPLETE SYSTEM. And that is x86 from AMD. I expected / hoped for better from the CIX ARM.

And I agree very much that Discord is hell when it comes to their TOS rules, privacy rules and general searchability of content that could be important. And it is impossible to make anything Discord does “privacy friendly” with 3rd-party tools. It’s as closed as can be, and I concur that it is a mistake for Radxa to discuss (market) an “open source” product on a closed-source platform where information just is not accessible to search engines and most privacy-conscious users. Might as well use X (ex-Twitter) for that in the recent political climate. Bad idea!

cutterjohn · March 22, 2025, 6:56pm

More modern Zen parts, i.e. those WITH an IODIE, have an IDLE power floor of c. 20W. Apparently not all of it can be power gated, which is why Intel generally slaughters them in IDLE power draw.

The APUs, i.e. generally the parts w/iGPU, are monolithic designs, and so do not have this ‘problem’.

Hopefully AMD will address the IODIE in the next design.

Bits of this are probably not accurate, or not as accurately represented as they could be, since it has been some time since I looked into this and I did not check the original research again, other than that I was disappointed that zen5 still had high IDLE power draw.

willy · March 23, 2025, 4:23am

Interesting. My suspicion till now was that since they’re essentially focusing on what matters for their marketing, i.e. maximum gaming performance, they absolutely don’t care about idle consumption. I’ll see what they have to offer once the 395+ are out, because for such edge devices idle should count quite a bit.

Mario · March 23, 2025, 7:24pm

Turns out that CPU clocks can actually be increased by modifying the OPP tables in firmware - they are not hardcoded as I initially thought. Here’s a rough guide:

1. Clone the edk2, edk2-platforms and edk2-non-osi repositories from https://github.com/radxa and follow the build instructions in edk2-non-osi/Platform/CIX/Sky1/Readme.md
2. Copy edk2-non-osi/Platform/CIX/Sky1/PackageTool/pm_config/ to edk2-platforms/Platform/Radxa/Orion/O6/pm_config/
3. Open opp_config_custom.h inside the copied pm_config directory
4. Set #define PM_OPP_TABLE_CONFIG 1
5. Tweak the frequency and voltage levels in the defined OPP tables (at your own risk!)
   - dxs_lit - little cluster
   - dxs_gb0 - big cluster 0
   - dxs_gb1 - big cluster 1
   - dxs_gm0 - medium cluster 0
   - dxs_gm1 - medium cluster 1
6. Rebuild and flash the generated firmware
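
Steps 2 to 4 can be scripted. The following is only a minimal sketch: the two paths and the macro name come from the guide above, while `workspace` (a directory holding the three cloned repositories side by side) and the function name are assumptions made for illustration. It does not touch the OPP tables themselves.

```python
import re
import shutil
from pathlib import Path

# Paths as given in the guide, relative to a hypothetical workspace directory.
SRC = Path("edk2-non-osi/Platform/CIX/Sky1/PackageTool/pm_config")
DST = Path("edk2-platforms/Platform/Radxa/Orion/O6/pm_config")


def enable_custom_opp(workspace: Path) -> Path:
    """Copy pm_config into the O6 platform tree and set PM_OPP_TABLE_CONFIG to 1."""
    src, dst = workspace / SRC, workspace / DST
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+

    header = dst / "opp_config_custom.h"
    text = header.read_text()
    # Rewrite only the value of the existing #define; the tables stay untouched.
    new_text, count = re.subn(
        r"^(\s*#\s*define\s+PM_OPP_TABLE_CONFIG\s+)\d+",
        r"\g<1>1",
        text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise RuntimeError("expected exactly one PM_OPP_TABLE_CONFIG define")
    header.write_text(new_text)
    return header
```

Calling `enable_custom_opp(Path("."))` from the directory containing the three checkouts returns the path of the edited header, which you then still have to tweak by hand.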

Here’s a GB6 run with the big clusters set to 2.8, medium to 2.4 and little to 1.8 GHz:

Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6 - Geekbench

Mario · March 23, 2025, 8:07pm

While playing with the firmware, I also noticed that there’s support for the fastboot protocol:

1. You will need to have an NVMe drive plugged in (we won’t be writing anything to it), otherwise the fastboot app will refuse to load.
2. Open your build_and_package.sh:
   - add local FASTBOOT_LOAD=nvme
   - append -D TOKEN_SETUP_SUPPORT=TRUE to the EDK2 build command
3. Build and flash the generated firmware as usual.
4. A new setup menu will appear: CIX System Manager. Open it, then go to Soc Configuration->USB Configuration and switch USBC DRD Controller Role to Device. Save the changes.
5. While rebooting/powering the board, short the “BOOT” pins near the UART debug headers. This will cause the firmware to enter fastboot mode after the boot countdown:
   ```
   Fastboot: Initializing...
   Fastboot: Initializing done
   ```
6. Once you see the initialization message, connect the board to another computer via the port labelled “USBC0”.
7. Download and extract the Android SDK Platform Tools.
8. From now on, you can update the firmware by simply running fastboot flash bootloader PATH_TO_FW.bin in a terminal / command prompt.
9. Reboot the board (fastboot reboot).
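
For repeated test cycles, the last two steps can be wrapped in a small helper. This is a sketch only: it assumes the `fastboot` binary from the Platform Tools is on your PATH, and the function names and the dry-run default are made up for illustration. The two command lines are exactly the ones quoted above.

```python
import subprocess
from pathlib import Path


def fastboot_commands(firmware: Path, partition: str = "bootloader") -> list[list[str]]:
    """Return the two fastboot invocations from the guide, in order."""
    return [
        ["fastboot", "flash", partition, str(firmware)],
        ["fastboot", "reboot"],
    ]


def flash(firmware: Path, dry_run: bool = True) -> None:
    """Print the commands, and run them only when dry_run is False."""
    if not firmware.is_file():
        raise FileNotFoundError(firmware)
    for cmd in fastboot_commands(firmware):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops on the first failure
```

Keeping dry-run as the default means a mistyped path only prints a command instead of flashing anything.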

To further speed up testing, it’s possible to flash individual parts of the firmware image.

In the case above where we modified the OPP tables, it was only necessary to update csu_pm_config.bin.

edk2-non-osi/Platform/CIX/Sky1/PackageTool/spi_flash_config_ota.json contains a single entry for the EDK2 BL33 image (bootloader3.img). Replace it with the definition for csu_pm_config.bin taken from spi_flash_config_all.json.

Build the firmware, then flash cix_flash_ota.bin instead of cix_flash_all.bin.
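
The JSON edit could be automated along these lines. Note that the layout of the two JSON files is not shown in this thread, so the sketch deliberately makes no schema assumption beyond this one: an entry is a JSON object with some string value ending in the image file name. If that does not hold for the real files, the helper raises instead of guessing. All function names are hypothetical.

```python
import json
from pathlib import Path


def find_entry(node, filename):
    """Depth-first search for the first object with a string value naming `filename`."""
    if isinstance(node, dict):
        if any(isinstance(v, str) and v.endswith(filename) for v in node.values()):
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_entry(child, filename)
        if found is not None:
            return found
    return None


def swap_ota_entry(all_json: Path, ota_json: Path,
                   old: str = "bootloader3.img",
                   new: str = "csu_pm_config.bin") -> None:
    """Replace the `old` entry in the OTA config with the `new` entry from the full config."""
    all_cfg = json.loads(all_json.read_text())
    ota_cfg = json.loads(ota_json.read_text())
    source = find_entry(all_cfg, new)
    target = find_entry(ota_cfg, old)
    if source is None or target is None:
        raise LookupError("entry not found; the schema assumption may not hold")
    target.clear()
    target.update(source)  # overwrite in place so the surrounding structure is kept
    ota_json.write_text(json.dumps(ota_cfg, indent=2) + "\n")
```

Work on a copy of spi_flash_config_ota.json first, since the file is rewritten in place.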

washley · March 23, 2025, 8:10pm

Does anyone know why the o6’s cores currently get set to different max frequencies for each cluster? There’s not actually any differences between the two big and two medium core clusters, right?

There’s mentions in user_config.h for different fan RPM modes. Has anyone come across how to choose between them?

Mario · March 23, 2025, 8:40pm

Does anyone know why the o6’s cores currently get set to different max frequencies for each cluster? There’s not actually any differences between the two big and two medium core clusters, right?

I guess the medium cores just can’t (efficiently/reliably) clock as high as the big ones. Looking at the OPP tables, the medium cores are given higher voltages than the big cores for the same frequencies. E.g. for 2200 MHz:

dxs_gb0/1 = 790 mV
dxs_gm0 = 850 mV
dxs_gm1 = 890 mV
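
To put those voltage gaps in perspective, here is a rough calculation using the usual CMOS rule of thumb that dynamic power scales with V² at a fixed frequency. That rule, and the assumption that switched capacitance is the same across clusters, are mine and not from the firmware; only the three voltages are from the OPP tables.

```python
BIG_MV = 790  # dxs_gb0/1 at 2200 MHz
MEDIUM_MV = {"dxs_gm0": 850, "dxs_gm1": 890}  # same frequency


def relative_dynamic_power(mv: float, ref_mv: float) -> float:
    """Ratio of dynamic power at `mv` vs `ref_mv`, assuming P ~ V^2 at equal frequency."""
    return (mv / ref_mv) ** 2


for name, mv in MEDIUM_MV.items():
    ratio = relative_dynamic_power(mv, BIG_MV)
    print(f"{name}: +{mv - BIG_MV} mV, roughly {(ratio - 1) * 100:.0f}% more dynamic power")
```

Under those assumptions the medium clusters would burn roughly 16% and 27% more dynamic power than a big cluster at the same 2200 MHz.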

They weren’t supposed to be identical. From the O6 product page:

4x Cortex®-A720 (Big cores) up to 2.8GHz
4x Cortex®‑A720 (Medium cores) up to 2.4GHz
4x Cortex®‑A520 (LITTLE cores) 1.8GHz

washley · March 23, 2025, 8:57pm

I’m not asking about medium vs big cores. Earlier someone had data suggesting the mediums might not even have an L2 cache, so there’s definitely differences there (even though they’re also a720s).

I am talking about how there are two complexes (I should have used that word instead of cluster) of big cores and two complexes of medium cores. The two medium complexes, by default, run at 2.2GHz and 2.3GHz. The two big complexes at 2.4GHz and 2.5GHz. Is there actually a difference between the two medium complexes, or the two big complexes, that would explain this? Radxa’s website says all the mediums run at the same speed and all the bigs run at the same speed, and at higher speeds than we get out-of-the-box, so it’s doubly odd.

Was this just done to help Cix or someone similar tell the complexes apart?

dxs_gm0 = 850 mV

dxs_gm1 = 890 mV

Same question about voltages. I hope this was all just experimentation/debugging values that simply need to be updated and indeed all of the mediums are identical and all of the bigs are identical. The alternative, where there’s physically something different, would be sad (and false advertising, at least on the clock speed front).

I am working on a tool to help dump out and decode literally all system registers to get a better view into other possible oddities in how these things are being configured.

washley · March 23, 2025, 9:02pm

One other quirk I’ve run into is that the a720 cores have 21 PMU counters, but the a520s only have 7. At least with the version of the kernel currently released by radxa, the perf subsystem is not aware of this (it only does detection on core 0, which is a big, and identifies there being 21), and you will panic the kernel if you try to use more than 7 on the little cores.
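
Until the kernel handles this, one workaround is to never request more events at once than the target core type has counters. A minimal sketch: the two counter figures are the ones quoted above, everything else (names, the idea of batching) is an assumption for illustration, and it says nothing about whether the cycle counter is included in those figures.

```python
COUNTERS = {"a720": 21, "a520": 7}  # per-core counter counts quoted in the post


def batches(events: list, core_type: str) -> list:
    """Split `events` into groups no larger than the counter budget of `core_type`."""
    limit = COUNTERS[core_type]
    return [events[i:i + limit] for i in range(0, len(events), limit)]
```

Each batch would then be measured in a separate run pinned to cores of that type.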

willy · March 23, 2025, 9:56pm

[ clusters topology etc… ]

I did some core-to-core latency measurements a month ago above: Orion O6 Debug Party Invitation

It’s pretty clear that there’s a unified L3 cache between all of them and that they’re seeing the same topology. Regarding L2, RAM latency tests show that it seems to be the same size, at least 512kB. However I’m finding it slower on the medium cores for concurrent accesses, just as if there was a single port to L2 vs 2 L2 ports for the big cores. That will be easier to validate when running at the same frequencies.

willy · March 23, 2025, 9:59pm

You know what? I’m really disgusted because this file is among those I had modified a few times (even when building the Merak BIOS, which is mostly compatible). I didn’t know we could just copy the directory like this (I don’t know about edk2’s file arrangement). And I’m pretty sure I didn’t notice the big #if in the file depending on a variable set to zero… Now I’ll have to rebuild and test again

willy · March 23, 2025, 11:49pm

OK so for me it works fine even at 3.0 + 2.4 (both tested separately and together). The little cores don’t seem to want to go over 1.8 however, anything set above makes them drop to 800 MHz instead. I was pretty sure reading about 2.0 somewhere as the frequency for the CP8180 but that might possibly be one difference with CD8180. And no, I have not yet measured how much it sucks, but when I build I hear the fan.
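
A quick way to see what the kernel actually ended up with after flashing is to read the standard Linux cpufreq sysfs files. This is a sketch assuming a kernel with cpufreq enabled; the `policy*` directories and the `related_cpus` / `cpuinfo_max_freq` files are regular cpufreq sysfs entries, while the function name is made up.

```python
from pathlib import Path


def max_freqs_mhz(sysfs_root: Path = Path("/sys/devices/system/cpu")) -> dict:
    """Map each cpufreq policy to (list of CPU ids, maximum frequency in MHz)."""
    result = {}
    for policy in sorted(sysfs_root.glob("cpufreq/policy*")):
        cpus = (policy / "related_cpus").read_text().split()
        khz = int((policy / "cpuinfo_max_freq").read_text())  # sysfs reports kHz
        result[policy.name] = (cpus, khz // 1000)
    return result


if __name__ == "__main__":
    for name, (cpus, mhz) in max_freqs_mhz().items():
        print(f"{name}: cpus {','.join(cpus)} max {mhz} MHz")
```

A cluster that fell back to a lower frequency, like the little cores described above, should show up here without having to run a benchmark.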

RadxaYuntian · March 24, 2025, 6:19am

I imagine this means the BIOS CPU clock settings have no effect?

Mario · March 24, 2025, 11:28am

Yeah I can boot with 3 GHz as well but I haven’t been able to complete a full GB6 test. The app crashes shortly after starting the multicore test.

The clocks appear tied to the Arm DSU block (which contains the L3 cache among other things):

With DSU sustained levels being set to 1300 MHz and 790 mV, you are limited to 2600 MHz and 990 mV for the CPU blocks.
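
Before rebuilding, a planned table can be checked against those two limits. The sketch below uses only the numbers quoted above, and they apply to that DSU setting only. How the firmware derives the limits from the DSU levels is not stated here, so no formula is assumed, and the example table in the usage note is hypothetical.

```python
# Limits quoted in the post for DSU sustained levels of 1300 MHz / 790 mV.
MAX_CPU_MHZ = 2600
MAX_CPU_MV = 990


def violations(opps, max_mhz: int = MAX_CPU_MHZ, max_mv: int = MAX_CPU_MV) -> list:
    """Return the (MHz, mV) pairs that exceed either limit."""
    return [(mhz, mv) for mhz, mv in opps if mhz > max_mhz or mv > max_mv]
```

For example, `violations([(2200, 790), (2600, 990), (2800, 1000)])` flags only the last entry.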

I have just realized the sanity check for frequency is wrong. It will always return TRUE for all CPU OPP tables because they only set the level field to indicate frequency.

I had tried bypassing the voltage check and increasing big cluster voltages up to 1.2 V, but it was still crashing. I’ll try to overclock the DSU block too and see how it goes.

@RadxaYuntian it would be nice if we could get some input from CIX on this stuff. I’m also curious about the absolute maximum ratings for core voltages.

@washley also made some good points above regarding differences between the big/medium “sub-clusters” (B0/B1, G0/G1).

willy · March 24, 2025, 1:01pm

Indeed, out of habit I’ve set it to “2.6” but it didn’t affect the frequency.

Mario · March 24, 2025, 5:07pm

Good catch.

The upper limits seem to be:

Little: 1800 MHz
Medium: 2600 MHz
Big: 3200 MHz

Mario · March 24, 2025, 7:49pm

Alright, this is the best I’ve been able to achieve:

B0: 3.2 GHz
B1: 3.1 GHz
M0/1: 2.6 GHz
LIT: 1.8 GHz

Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6 - Geekbench

Power draw was ~13 W at idle and occasionally went up to 50 W during the benchmark.

Unfortunately the B1 cluster heats up too much at 3.2 GHz and trips the thermal sensor (around 100 C?), shutting the board off. Not sure about the factory thermal paste but perhaps a Honeywell pad would do a better job.

willy · March 24, 2025, 8:15pm

That’s pretty good, it doesn’t have to be ashamed in front of some x86. I’ll probably try to stick to 3.0+2.5 or something like this and start to undervolt the cores to figure out what margin we have there and how much we can reduce the power draw. I also haven’t played with the “sustained” description; I suspect it will split off the frequencies that are only accessible once the cpufreq “boost” is set, but that’s pure speculation. If that was the case, we could add more OPPs that are not used by default, to ease testing.

Mario · March 25, 2025, 12:19am

Got the latest insider build of Windows 11 (27818.1000) to boot with all cores.

It required a small kernel patch, though disabling the little cores would also work. These recent builds assume that all CPU cores support SPE, but it turns out that the A520s don’t, which leads to a fault when trying to access the missing CPU registers.
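
For reference, SPE presence is advertised in the PMSVer field of ID_AA64DFR0_EL1, bits [35:32], where zero means not implemented. A small decoder sketch follows. It assumes you already have the raw 64-bit register value per core from some kernel-level dump; how to obtain it is out of scope, and the values used in the usage note are hypothetical, not readings from an O6.

```python
def pmsver(id_aa64dfr0_el1: int) -> int:
    """Extract the PMSVer field (bits [35:32]) from an ID_AA64DFR0_EL1 value."""
    return (id_aa64dfr0_el1 >> 32) & 0xF


def has_spe(id_aa64dfr0_el1: int) -> bool:
    """True when the core reports any version of the Statistical Profiling Extension."""
    return pmsver(id_aa64dfr0_el1) != 0
```

With made-up inputs, `has_spe(0x0000_0001_0000_0000)` is true and `has_spe(0)` is false. An OS that checks this per core type, rather than once for the boot CPU, would avoid touching SPE registers on cores that lack them.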

WSL2 (Hyper-V) also works (IORT needs to be disabled in ACPI for now):