Orion O6 Debug Party Invitation

Yes, that’s exactly it. But it can still be caused by some software init code. Some chips for example support configuring priorities for each core, maybe there’s something like this. Or maybe it’s related to the way the CPUs are reordered.

I also checked OpenSSL performance in RSA, which is both relevant to my use cases, and a good indication of how advanced the CPU design is. First, here’s what I got on rock5b:

  • A76:

    $ taskset -c 4 openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.003765s 0.000103s    265.6   9713.8
    
  • A55:

    $ taskset -c 0 openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.005938s 0.000158s    168.4   6320.9
    

Now on Orion O6, default openssl (3.0.1):

  • A720 big (core 0 or 11):

    $ taskset -c 0 openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.003951s 0.000074s    253.1  13424.6
    
  • A520 (core 1):

    $ taskset -c 1 openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.012525s 0.000323s     79.8   3098.4
    

All that is quite shocking, it’s slower than rock5b by default. That might be a bug in the version, as it’s totally outdated (3.0.1) and openssl-3.0 is known for suffering from major performance issues. So I rebuilt openssl-3.0.15 which is the up-to-date and must-use version for the 3.0 branch, and now it’s way better:

  • A720 (big)

    $ LD_LIBRARY_PATH=$PWD taskset -c 11 ./apps/openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.001136s 0.000029s    880.0  34618.0
    ## 3.3x faster than rock5's A76
    
  • A520:

    $ LD_LIBRARY_PATH=$PWD taskset -c 1 ./apps/openssl speed rsa2048
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.005395s 0.000144s    185.3   6935.7
    ## 10% faster than rock5's A55
    

OK now it’s much better, it changed from quite poor to excellent!
One important difference I’m seeing is this:

  • stock openssl:
    [EDIT: as reported by @tkaiser, that version is not in /usr/bin but in an alternate dir that is in the path; the one in /usr/bin is up to date and works fine]

    $ openssl version -b -v -p -c
    OpenSSL 3.0.1 14 Dec 2021 (Library: OpenSSL 3.0.1 14 Dec 2021)
    built on: Thu Jan  9 10:44:37 2025 UTC
    platform: linux-aarch64
    CPUINFO: N/A
    
  • rebuild openssl:

    $ LD_LIBRARY_PATH=$PWD ./apps/openssl version -b -v -p -c
    OpenSSL 3.0.15 3 Sep 2024 (Library: OpenSSL 3.0.15 3 Sep 2024)
    built on: Fri Jan 31 04:43:22 2025 UTC
    platform: linux-aarch64
    CPUINFO: OPENSSL_armcap=0xfd
    

The build options seem to be the same, so that was likely a bug of 3.0.1 to fail to detect the CPU correctly, causing all operations to fall back to generic unoptimized ones. I think it’s important to fix it in the final product because openssl speed definitely is part of the metrics used to compare boards, and it would be too bad if it would compare unfavorably to other ones.

880 sign/s is very good at this frequency. My skylake at 4.4 GHz gives me 2160, thus 491 keys/s/GHz. Here we’re at 338/GHz, that’s impressive for an Arm platform.

2 Likes

I’m using the Radxa supplied FrankenDebian 12 image and here it looks quite different:

sh-5.2# /usr/bin/openssl version -b -v -p -c
OpenSSL 3.0.15 3 Sep 2024 (Library: OpenSSL 3.0.15 3 Sep 2024)
built on: Sun Oct 27 14:16:28 2024 UTC
platform: debian-arm64
CPUINFO: OPENSSL_armcap=0xfd

But the openssl binary lying around below /usr/share/cix/bin/ is the outdated 3.0.1 version though using 3.0.15 libs:

sh-5.2# /usr/share/cix/bin/openssl version -b -v -p -c
OpenSSL 3.0.1 14 Dec 2021 (Library: OpenSSL 3.0.15 3 Sep 2024)
built on: Sun Oct 27 14:16:28 2024 UTC
platform: debian-arm64
CPUINFO: OPENSSL_armcap=0xfd
1 Like

Interesting, because I also used their debian 12 image. I picked the USB image from https://dl.radxa.com/orion/o6/images/debian/.

Oh now I’m starting to understand. Indeed, the correct version is in /usr/bin/ but the path is bad and the LD_LIBRARY_PATH as well:

willy@orion-o6:~$ /usr/bin/openssl version
/usr/bin/openssl: /usr/share/cix/lib/libcrypto.so.3: version `OPENSSL_3.0.9' not found (required by  /usr/bin/openssl)
/usr/bin/openssl: /usr/share/cix/lib/libcrypto.so.3: version `OPENSSL_3.0.3' not found (required by /usr/bin/openssl)
willy@orion-o6:~$ echo $LD_LIBRARY_PATH 
/usr/share/cix/lib

Now quickly fixed this way:

$ sudo mv /usr/share/cix/bin/{,cix.}openssl 
$ sudo mv /usr/share/cix/lib/{,cix.}libssl.so.3
$ sudo mv /usr/share/cix/lib/{,cix.}libcrypto.so.3
$ openssl version -b -p -v -c
OpenSSL 3.0.15 3 Sep 2024 (Library: OpenSSL 3.0.15 3 Sep 2024)
built on: Sun Oct 27 14:16:28 2024 UTC
platform: debian-arm64
CPUINFO: OPENSSL_armcap=0xfd

Now fixed, thank you!

If your looking for a seamless out of the box experience or using the Orion as your daily driver then it is isn’t ready for that. The CD8180 is new so there is lot to discover about it capabilities/limitations. So you would be on that journey if you commit to purchasing one at this point and live with the consequences. Unfortunately even the debug party may not reveal all the limitations. As a example for the RK3588 only once the TRM was released could we determine limitations on some of the IP blocks, for me it was PCIE and the NPU. CD8180 TRM is planned to be released Q2 2025. I guess its easy to get caught up in the fever of wanting something new to play with.

2 Likes

While @hrw being an experienced developer maybe even willing to join this journey I wonder how many people ordered the O6, waiting eagerly to be shipped in February and thinking this would be a final product. Based on state of software side of things I would suspect it’s a loooong journey till O6 is ready for consumers while I see the board being advertised as ‘immediately ready’.

I guess we’ll see a lot of angry people and lots of complaints within the next months here…

4 Likes

Regarding memory speed, it’s quite good (basically twice Rock5 which is not surprising since it’s twice as wide):

  • 10 GB/s per A520 core, for a limit of 40 GB/s total for the 4.
  • 25-28GB/s per A720 core (depending on frequency), 40 GB/s for 2, 45 GB/s for 3, 43 GB/s for 4 and decreases a bit above to converge to 40GB/s.

Raw results:

  • A520 alone:

    $ for c in 1 1,2 1,2,3 1,2,3,4; do echo $c: $(taskset -c $c ./rambw 200 3);done
    1: 10871 10868 10891
    1,2: 21050 21489 21491
    1,2,3: 31179 31612 31655
    1,2,3,4: 40232 40445 40415
    
  • A720 alone:

    $ for c in 0 0,11 0,10,11 0,9,10,11; do echo $c: $(taskset -c $c ./rambw 200 3);done
    0: 26912 27550 27523
    0,11: 40291 40605 40698
    0,10,11: 45128 45385 45307
    0,9,10,11: 43622 43587 43592
1 Like

I couldn’t have put it better myself … hopefully, chance to snag one on eBay.

Currently digging a bit below /sys/kernel/debug I had to find that I was completely wrong since the CPU cores have those properties set (and sbc-bench needs an adoption since reporting nonsense).

Even the ‘weird’ setup with cpu0 and cpu11 being members of the same cluster is reflected wrt energy aware scheduling:

sh-5.2# cat /sys/kernel/debug/energy_model/cpu?/cpus
0,11
1-4
5-6
7-8
9-10

sh-5.2# grep . /sys/kernel/debug/energy_model/cpu0/ps\:799858/*
/sys/kernel/debug/energy_model/cpu0/ps:799858/cost:1690412
/sys/kernel/debug/energy_model/cpu0/ps:799858/frequency:799858
/sys/kernel/debug/energy_model/cpu0/ps:799858/inefficient:0
/sys/kernel/debug/energy_model/cpu0/ps:799858/power:520000

sh-5.2# grep . /sys/kernel/debug/energy_model/cpu0/ps\:2600173/*
/sys/kernel/debug/energy_model/cpu0/ps:2600173/cost:7640000
/sys/kernel/debug/energy_model/cpu0/ps:2600173/frequency:2600173
/sys/kernel/debug/energy_model/cpu0/ps:2600173/inefficient:0
/sys/kernel/debug/energy_model/cpu0/ps:2600173/power:7640000

According to these properties the fastest A720 consume almost 15 times more power at 2.6 GHz compared to 800 MHz :slight_smile:

Edit 1: Looking at what we get regarding USB-C and USB power delivery:
grep . /sys/class/typec/port*/* 2>/dev/null --> https://0x0.st/88Yj.txt

Even with BSP kernel that’s not much, just host vs. device and sink vs. source (my O6 is powered via port1 from an USB PD capable power brick and on port0 an USB SSD is connected from which the board booted). /sys/class/usb_power_delivery is empty and /sys/class/usb_role/ only shows host vs. device.

There’s only one udc entry for port 1 (cdnsp-gadget confirming that not only PCIe but also the USB stack is licensed from Cadence):
grep -r . /sys/class/udc/90f0000.usb-controller/* 2>/dev/null --> https://0x0.st/88Y2.txt

2 Likes

Oh great catch! Now at least we’re certain they’re from the same cluster, and that very likely a rotation is applied on the core numbers as enumerated.

1 Like

no I haven’t seen any such settings.

Though there’s setpci to renegotiate PCIe speed. I use this script in production to reduce consumption of a bunch of DC SSDs when performance is not needed (which is true for approx 16 hours of the day :slight_smile: )

Edit 1: Since my O6 is equipped with a Samsung SSD after a slight modification adjusting the bus address (no 0000: prefix) this works flawlessly on O6:

root@orion-o6:~# lspci -vv -s '0000:91:00.0' | grep LnkSta:
		LnkSta:	Speed 16GT/s, Width x4

root@orion-o6:~# set-samsung-speed.sh 3

root@orion-o6:~# lspci -vv -s '0000:91:00.0' | grep LnkSta:
		LnkSta:	Speed 8GT/s (downgraded), Width x4

root@orion-o6:~# set-samsung-speed.sh 1

root@orion-o6:~# lspci -vv -s '0000:91:00.0' | grep LnkSta:
		LnkSta:	Speed 2.5GT/s (downgraded), Width x4

Script is here: https://gist.github.com/ThomasKaiser/2c2bd04539a64a906f5520432a651d1d and ofc awk search pattern needs to be adapted for a graphics card or whatever else.

Edit 2: For anyone interested in adjusting PCIe speeds on O6 just use the generic set-pcie-speed script. It gets called with bus address and PCIe Gen as two parameters.

For example pcie_set_speed.sh 0001:c1:00.0 1 to set the PCIe device in the x16 slot to Gen1.

To test I added another RTL8126 NIC via M.2-PCIe-adapter:

lspci will show the bus address (I know it’s on bus 0003 since I have compared with beforeenumeration changes dynamically so the new NIC is now on 0001):

radxa@orion-o6:~$ lspci
0000:90:00.0 PCI bridge: Device 1f6c:0001
0000:91:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller S4LV008[Pascal]
0001:c0:00.0 PCI bridge: Device 1f6c:0001
0001:c1:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8126 (rev 01)
0002:00:00.0 PCI bridge: Device 1f6c:0001
0002:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8126 (rev 01)
0003:30:00.0 PCI bridge: Device 1f6c:0001
0003:31:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8126 (rev 01)

Next we’re checking the actual speed (can only be Gen3 since RTL8126 is not Gen4 capable), then set it to Gen1 and re-check (requires superuser privileges):

radxa@orion-o6:~$ sudo lspci -vv -s '0001:c1:00.0' 2>/dev/null | grep LnkSta:
		LnkSta:	Speed 8GT/s, Width x1

radxa@orion-o6:~$ sudo pcie_set_speed.sh 0001:c1:00.0 1

radxa@orion-o6:~$ sudo lspci -vv -s '0001:c1:00.0' 2>/dev/null | grep LnkSta:
		LnkSta:	Speed 2.5GT/s (downgraded), Width x1
2 Likes

Nice, thanks for sharing!

I’m realizing from your captures (and a local lspci -nnvv) that the LnkCap for the root bridge advertises x4, not x8, while the PCIe connector is supposed to be x8 (and that’s even what’s indicated on the schematic). Thus I don’t know if it’s a device tree limitation or anything else, but I suspect that the network tests I planned will be seriously limited if only 4 lanes are active for now. I’ll see this week-end. Maybe there’s some form of bifurcation and the BIOS changes the configuration when it detects a connected device.

There could also situation when more lanes are used when there is a need.

I had system where gfx card started low speed, low lanes and went up to maximum when heavy used.

Maybe my mistake having forgotten that bus enumeration changes dynamically. I’ve now put an el cheapo SSD in the M.2 adapter to really identify the right ‘slot’ and lspci -nnvv shows x8 as LnkCap: https://0x0.st/88gw.txt

1 Like

I ordered one orion board. I know from my rock 5b that it lasts some time to get all developments done. Bot hopefully someone is is able to resolv the heatsink issue with the next software or acpi update. At least the board should work as a standard PC. Thanks to all making a standard user happy.
Regards

Just for my knowledge, what is the “heatsink issue” ? On mine, the fan speed can be adjusted using pwm, I’m setting it to silent enough mode this way:

$ echo 17 > /sys/class/hwmon/hwmon1/pwm1
1 Like

The noise.
Thanks I will test if I get my board.

Ah OK I understand then :wink: On this one the fan can be completely stopped using pwm (though it wouldn’t be wise, as it gets warm). Also when it runs at low speed, the speed is quite regular so there’s no humming noise like we can have on some other devices. For example my LX2 sends a small boost every second, that’s unbearable, I had to dampen it using resistor+capacitor. No, really once booted, the O6 is quite bearable even for someone like me who can’t stand fans at all at less than 3m.