Orion O6 Debug Party Invitation

willy · March 29, 2025, 7:28pm

Slightly only. I remained within the limits:

--- edk2-non-osi/Platform/CIX/Sky1/PackageTool/pm_config/opp_config_custom.h    2025-03-29 20:05:37.752470267 +0100
+++ edk2-platforms/Platform/Radxa/Orion/O6/pm_config/opp_config_custom.h        2025-03-29 17:43:37.838172574 +0100
@@ -7,7 +7,7 @@
 #include "pm_export_config.h"
 #include "opp_config.h"
 
-#define PM_OPP_TABLE_CONFIG   0
+#define PM_OPP_TABLE_CONFIG   1
 
 #if PM_OPP_TABLE_CONFIG
 /* V1.1, DFS */
@@ -48,22 +48,24 @@
 };
 
 static domain_opp_config_t dxs_gb0 = {
-    .size = 7,
-    .sustained_idx = 6,
+    .size = 9,
+    .sustained_idx = 8,
     .opp_table = {
         { .level =  800UL, .voltage = 730 },
         { .level = 1200UL, .voltage = 750 },
         { .level = 1500UL, .voltage = 750 },
         { .level = 1800UL, .voltage = 790 },
         { .level = 2200UL, .voltage = 790 },
-        { .level = 2300UL, .voltage = 850 },
-        { .level = 2400UL, .voltage = 920 },   /* sustained */
+        { .level = 2400UL, .voltage = 850 },
+        { .level = 2800UL, .voltage = 950 },   /* sustained */
+        { .level = 3000UL, .voltage = 950 },   /* sustained */
+        { .level = 3100UL, .voltage = 990 },   /* sustained */
     },
 };
 
 static domain_opp_config_t dxs_gb1 = {
-    .size = 7,
-    .sustained_idx = 6,
+    .size = 9,
+    .sustained_idx = 8,
     .opp_table = {
         { .level =  800UL, .voltage = 730 },
         { .level = 1200UL, .voltage = 750 },
@@ -71,13 +73,15 @@
         { .level = 1800UL, .voltage = 790 },
         { .level = 2200UL, .voltage = 790 },
         { .level = 2400UL, .voltage = 850 },
-        { .level = 2500UL, .voltage = 920 },   /* sustained */
+        { .level = 2800UL, .voltage = 950 },   /* sustained */
+        { .level = 3000UL, .voltage = 950 },   /* sustained */
+        { .level = 3100UL, .voltage = 990 },   /* sustained */
     },
 };
 
 static domain_opp_config_t dxs_gm0 = {
-    .size = 7,
-    .sustained_idx = 6,
+    .size = 8,
+    .sustained_idx = 7,
     .opp_table = {
         { .level =  800UL, .voltage = 730 },
         { .level = 1200UL, .voltage = 750 },
@@ -85,21 +89,23 @@
         { .level = 1800UL, .voltage = 790 },
         { .level = 2000UL, .voltage = 790 },
         { .level = 2200UL, .voltage = 850 },
-        { .level = 2300UL, .voltage = 890 },   /* sustained */
+        { .level = 2400UL, .voltage = 920 },   /* sustained */
+        { .level = 2600UL, .voltage = 950 },   /* sustained */
     },
 };
 
 static domain_opp_config_t dxs_gm1 = {
-    .size = 7,
-    .sustained_idx = 6,
+    .size = 8,
+    .sustained_idx = 7,
     .opp_table = {
         { .level =  800UL, .voltage = 730 },
         { .level = 1200UL, .voltage = 750 },
         { .level = 1500UL, .voltage = 750 },
         { .level = 1800UL, .voltage = 790 },
         { .level = 2000UL, .voltage = 790 },
-        { .level = 2100UL, .voltage = 850 },
-        { .level = 2200UL, .voltage = 890 },   /* sustained */
+        { .level = 2200UL, .voltage = 850 },
+        { .level = 2400UL, .voltage = 920 },   /* sustained */
+        { .level = 2600UL, .voltage = 950 },   /* sustained */
     },
 };
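
When hand-editing tables like the ones in the diff above, it is easy to add an entry and forget to bump `.size` or `.sustained_idx`. The following is a minimal sketch of a consistency check, assuming the invariants that the diff appears to follow (`.size` equals the number of entries, `.sustained_idx` is a valid index, levels strictly increase, voltages never decrease). These rules are inferred from the diff and not from any Cix documentation; `check_opp` is a hypothetical helper, and the units (MHz, mV) are an assumption.

```python
from typing import List, Tuple

Opp = Tuple[int, int]  # (level, voltage); assumed to be MHz and mV


def check_opp(size: int, sustained_idx: int, table: List[Opp]) -> List[str]:
    """Return a list of problems found; an empty list means the table looks consistent."""
    problems = []
    if size != len(table):
        problems.append(f"size={size} but table has {len(table)} entries")
    if not 0 <= sustained_idx < len(table):
        problems.append(f"sustained_idx={sustained_idx} is out of range")
    # Compare each entry with its successor.
    for (lvl_a, v_a), (lvl_b, v_b) in zip(table, table[1:]):
        if lvl_b <= lvl_a:
            problems.append(f"level {lvl_b} does not increase after {lvl_a}")
        if v_b < v_a:
            problems.append(f"voltage drops from {v_a} to {v_b} at level {lvl_b}")
    return problems


# dxs_gb0 as modified in the diff above
dxs_gb0 = [
    (800, 730), (1200, 750), (1500, 750), (1800, 790), (2200, 790),
    (2400, 850), (2800, 950), (3000, 950), (3100, 990),
]
print(check_opp(9, 8, dxs_gb0))
```

Passing the old values (`size=7`, `sustained_idx=6`) with the new nine-entry table would report the size mismatch.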

I’ve read that CPPC supports a “boost” mode on recent kernels (>= 5.8), but I don’t know how it needs to be declared in the ACPI tables. All I found is that it’s detected when there’s a difference between a nominal_freq and a maximum_freq, though I found neither of these here. This could be useful to reserve some OPPs for manual OC and experimentation.
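
One way to see what the firmware currently advertises is to read the per-CPU `acpi_cppc` attributes that Linux exposes in sysfs when a `_CPC` object is present. The sketch below assumes that the kernel's boost rule amounts to `highest_perf` being greater than `nominal_perf`. That is my reading of the cppc_cpufreq driver and should be treated as an assumption, not as a statement about how this board's tables must be written. `read_cppc` and `looks_boostable` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict


def read_cppc(cpu_dir: Path) -> Dict[str, int]:
    """Read the integer CPPC attributes exposed under <cpu_dir>/acpi_cppc, if any."""
    values = {}
    for name in ("highest_perf", "nominal_perf", "lowest_perf"):
        path = cpu_dir / "acpi_cppc" / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def looks_boostable(cppc: Dict[str, int]) -> bool:
    """Assumed rule: boost is possible when highest_perf exceeds nominal_perf."""
    return cppc.get("highest_perf", 0) > cppc.get("nominal_perf", 0)


if __name__ == "__main__":
    root = Path("/sys/devices/system/cpu")
    cpu_dirs = sorted(root.glob("cpu[0-9]*"), key=lambda p: int(p.name[3:]))
    for cpu_dir in cpu_dirs:
        cppc = read_cppc(cpu_dir)
        state = "boost range present" if looks_boostable(cppc) else "no boost range"
        print(cpu_dir.name, cppc, state)
```

If every CPU reports equal `highest_perf` and `nominal_perf`, that would be consistent with no boost range being declared.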

Also, I have not run GB, only some local builds, openssl speed, and CPU/DRAM perf tests.

Jeremy_Linton · April 1, 2025, 9:27pm

“Take RK3588 for instance. The display controller (VOP2) depends on pixel clocks exposed by the global clock & reset unit (CRU). VOP2 needs to be able to enable this particular clock and set its frequency based on whatever video mode is used for the corresponding output port.”

But as I know you’re aware, this isn’t a problem specific to Arm SoCs. It’s not at all uncommon for devices to have firmware/device managed mailbox drivers which send messages to microcontrollers embedded in the device or somewhere else on the platform to change clocks like this. ACPI provides _DSM as a method to achieve this in a device specific manner, or the platform/device can provide direct HW mailboxes to do the operation. After all, this is largely how many ‘DT’ devices behave (e.g. the rpi, which uses the GPU to control voltage/clocks for some things). Then, as long as the mailbox interface remains consistent, the SoC/board provider can change the underlying “change_video_mclock()” function without updating the OS.
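
The decoupling described above can be sketched in a few lines. This is a toy model only: the message ID, the `Message` layout, and both firmware handlers are hypothetical and do not correspond to any real firmware, ACPI method, or kernel driver. It shows the OS-side code knowing only the message format while the firmware behind the mailbox is swapped out.

```python
from dataclasses import dataclass
from typing import Callable

SET_CLOCK_RATE = 0x01  # hypothetical message ID, not taken from any real firmware


@dataclass(frozen=True)
class Message:
    command: int
    clock_id: int
    rate_hz: int


class Mailbox:
    """Stable OS-facing interface; the handler behind it belongs to the firmware."""

    def __init__(self, firmware_handler: Callable[[Message], int]):
        self._handler = firmware_handler

    def send(self, msg: Message) -> int:
        return self._handler(msg)


def set_pixel_clock(mbox: Mailbox, clock_id: int, rate_hz: int) -> int:
    """OS-side driver code: knows the message format, not the clock tree."""
    return mbox.send(Message(SET_CLOCK_RATE, clock_id, rate_hz))


def firmware_rev_a(msg: Message) -> int:
    # Pretend this revision can only produce multiples of 1 MHz.
    return msg.rate_hz - msg.rate_hz % 1_000_000


def firmware_rev_b(msg: Message) -> int:
    # A later revision with a finer step; the OS code above is unchanged.
    return msg.rate_hz - msg.rate_hz % 1_000


for fw in (firmware_rev_a, firmware_rev_b):
    achieved = set_pixel_clock(Mailbox(fw), clock_id=3, rate_hz=148_500_500)
    print(fw.__name__, achieved)
```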

Jeremy_Linton · April 1, 2025, 9:38pm

Hi,

So I’m running the F42 beta on this board in ACPI mode, and while it tends to work far, far, far better than I expected, and better than I’ve seen from most !server grade products, a few things popped out:

First is the EFI/RTC service seems to be having problems (possibly due to the 48bit vamap?)
[ 0.860750] CPU: 7 UID: 0 PID: 1 Comm: swapper/0 Tainted: G S I ------- --- 6.14.0-63.fc42.aarch64 #1
[ 0.860755] Tainted: [S]=CPU_OUT_OF_SPEC, [I]=FIRMWARE_WORKAROUND
[ 0.860757] Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan 1 1980
[ 0.860760] pstate: 61000009 (nZCv daif -PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 0.860763] pc : __efi_queue_work+0xe4/0x120
[ 0.860766] lr : __efi_queue_work+0xd0/0x120
[ 0.860769] sp : ffff80008007ba10
[ 0.860770] x29: ffff80008007ba10 x28: 0000000000000000 x27: 0000000000000007
[ 0.860775] x26: ffff8000829ef03c x25: ffff800081f5ab40 x24: ffff0000833385a8
[ 0.860779] x23: ffff80008186e328 x22: 0000000000000000 x21: ffff80008007babc
[ 0.860783] x20: ffff80008007bac8 x19: ffff8000843e04f8 x18: 0000000000000000
[ 0.860787] x17: 00000000a7e70532 x16: 00000000edd0963a x15: ffff0000808fcbd0
[ 0.860791] x14: ffff80008349fab0 x13: 0000000000000001 x12: 00000000334d65a6
[ 0.860794] x11: 0000000000000180 x10: 05d9de8a67bf9830 x9 : ffff80008156b08c
[ 0.860798] x8 : ffff0000808ff028 x7 : ffff0000808fcb00 x6 : 0000000000000010
[ 0.860802] x5 : 0000000000000000 x4 : ffff807f54ef8000 x3 : 0000000000000000
[ 0.860805] x2 : 0000000000000000 x1 : 8000000000000015 x0 : 8000000000000015
[ 0.860810] Call trace:
[ 0.860812] __efi_queue_work+0xe4/0x120 (P)
[ 0.860815] virt_efi_get_time+0x60/0xc0
[ 0.860819] efi_rtc_probe+0x58/0x180
[ 0.860826] platform_probe+0x70/0xe8
[ 0.860831] really_probe+0xc8/0x3a0
[ 0.860835] __driver_probe_device+0x84/0x160
[ 0.860839] driver_probe_device+0x40/0x128
[ 0.860843] __driver_attach+0xd0/0x1f0
[ 0.860847] bus_for_each_dev+0x84/0x100
[ 0.860851] driver_attach+0x2c/0x40
[ 0.860854] bus_add_driver+0x158/0x280
[ 0.860858] driver_register+0x70/0x140
[ 0.860862] __platform_driver_probe+0x54/0xe0
[ 0.860865] efi_rtc_driver_init+0x2c/0x40
[ 0.860868] do_one_initcall+0x64/0x320
[ 0.860873] do_initcalls+0x194/0x1d8
[ 0.860879] kernel_init_freeable+0x1b8/0x218
[ 0.860884] kernel_init+0x28/0x158
[ 0.860890] ret_from_fork+0x10/0x20

Second, the DMI BIOS data is incomplete around build dates and versions, and that makes it difficult to validate if the latest FW is installed.

The CPPC seems to be having ‘boost’ issues around the mismatched big/medium cores. I might suggest adding CPPC ‘turbo’ states to the mid/little cores just to keep the kernel from being unhappy about the asymmetry.

Running TMON, acpitz1 seems to glitch the thermal state to 1200C once in a while; this seems to correspond to the clocks being cranked to minimum. I’m guessing this and the boost states above are a large part of the perf issues on the board.
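
To catch such glitches without watching TMON, the thermal zones can also be polled from sysfs, where each `thermal_zone*/temp` file reports millidegrees Celsius. The sketch below is one way to flag implausible readings; the 150 C cutoff and the helper names are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import List, Tuple

PLAUSIBLE_MAX_C = 150.0  # assumed cutoff; anything above is treated as a glitch


def read_zones(root: Path) -> List[Tuple[str, float]]:
    """Return (zone type, temperature in deg C) for each thermal zone under root."""
    zones = []
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            milli_c = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone not readable right now; skip it
        zones.append((kind, milli_c / 1000.0))
    return zones


def glitches(zones: List[Tuple[str, float]],
             limit: float = PLAUSIBLE_MAX_C) -> List[Tuple[str, float]]:
    """Keep only the readings above the plausibility limit."""
    return [(kind, temp) for kind, temp in zones if temp > limit]


if __name__ == "__main__":
    for kind, temp_c in glitches(read_zones(Path("/sys/class/thermal"))):
        print(f"implausible reading: {kind} reports {temp_c:.0f} C")
```

Running this in a loop alongside a frequency monitor would help confirm whether the spikes line up with the clock drops.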

The machine seems to be able to go into some suspend state, but then it can’t resume (not tried tracking this down at all).

The DT has coresight information, but it’s missing from the ACPI DSDT/SSDT tables. I might have taken a crack at fixing this, if it were obvious where to post fixes.

tkaiser · April 2, 2025, 5:20pm

@RadxaYuntian is there some working way to address such issues? I second that UEFI info like this is not really helpful:

Vendor: Radxa Computer (Shenzhen) Co., Ltd.
Version: 1.0
Release Date: Jan  1 1980
BIOS Revision: 1.0

Also, wrt the coresight information Jeremy references… have you and/or Cix enabled any way to let the community (now also ARM Holdings Ltd. included) jump in and fix things?

RadxaYuntian · April 2, 2025, 11:31am

tkaiser:

@RadxaYuntian is there some working way to address such issues? I second that UEFI info like this is not really helpful:
Vendor: Radxa Computer (Shenzhen) Co., Ltd.
Version: 1.0
Release Date: Jan  1 1980
BIOS Revision: 1.0

This will be fixed in next release.

Feel free to create issues under edk2-cix repo.

washley · April 3, 2025, 3:38pm

Is uart4 (“Power management, voltage, and frequency monitoring”) planned to be used? I’ve never seen any output there. Is it actually active but interactive?

Also, what is the use for the BOOT_STRAP pin (next to the cluster of uart pins)?

Mario · April 3, 2025, 4:43pm

Regarding BOOT_STRAP:

meco · April 5, 2025, 5:32pm

https://browser.geekbench.com/v6/compute/3962520
10249 - Vulkan Score
(compared to ~3500 on RK3588)

ShivanSpS · April 5, 2025, 7:57pm

Is the Debian 12 DT image supposed to be unstable?

Right now, after flashing the image to an NVMe drive and booting from it, it works for a while until it crashes and eventually stops working.

Sometimes it freezes with visual artifacts, like when you have a memory problem on an unstable computer.

ShivanSpS · April 6, 2025, 12:08am

After reinstalling the image I tried to run memtester; the screen glitched out and lost signal halfway through it. I rebooted and it froze on the desktop.

God damn it, $300 for the board, $90 for shipping and an additional surprise $110 that FedEx charged me at my door.

At this point I have tried a PSU, then a 90W USB-C supply, and two different M.2 drives.

Just opened chromium on a fresh image

Is there a stock distro that will give video out in ACPI mode, so I could test with something else?

willy · April 6, 2025, 6:56am

That doesn’t smell good.

Regarding memtester, you can try to pin it to one CPU core to see if that changes anything. The big cores are 0,9,10,11; the medium cores are 5,6,7,8. The little cores are 1,2,3,4. You can run for example: taskset -c 1 ./memtester to run it on a little core and see if it crashes or works. Normally memory tests are agnostic to the CPU’s performance so a little core should suffice and will draw less juice. If it passes, it would indicate that the RAM is not at fault. Then you can retry on other cores, then multiple in parallel. If it only fails when using multiple cores, it could be a power or cooling issue (e.g. heatsink improperly installed). But I tend to think more about a power issue based on your description (maybe the USB cable itself if you’re using the same with various PSUs).
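
The escalation described above (one little core, then other core types, then several cores at once) can be written down as a small test plan. This sketch only builds the command lines and does not run anything; the core numbering is taken from the post, while the `1G` size, the single loop, and the helper names are placeholders to adapt.

```python
from typing import Dict, List

# Core numbering as given in the post above for the Orion O6
CORE_CLASSES: Dict[str, List[int]] = {
    "little": [1, 2, 3, 4],
    "medium": [5, 6, 7, 8],
    "big": [0, 9, 10, 11],
}


def memtester_cmd(core: int, size: str = "1G", loops: int = 1) -> List[str]:
    """Build a memtester command line pinned to a single core with taskset."""
    return ["taskset", "-c", str(core), "./memtester", size, str(loops)]


def build_plan() -> List[List[List[str]]]:
    """Stages run in order; commands within one stage are meant to run in parallel."""
    # First one core of each class on its own, starting with a little core.
    stages = [[memtester_cmd(cores[0])] for cores in CORE_CLASSES.values()]
    # Then one instance per big core at the same time, to load the power supply.
    stages.append([memtester_cmd(core) for core in CORE_CLASSES["big"]])
    return stages


for number, stage in enumerate(build_plan(), start=1):
    for cmd in stage:
        print(f"stage {number}:", " ".join(cmd))
```

Since memtester is single-threaded, the parallel stage uses one pinned instance per core instead of one instance pinned to a core list. The total size across instances has to fit in free RAM.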

ShivanSpS · April 6, 2025, 4:07pm

OK, doing memtester runs only over SSH, without anything else attached.

Without taskset it freezes and stops responding about 2 minutes in.

Did a run on core 1 for 16386mb and it completed OK (this takes a lot of time).

Then tried 28700mb and it got “killed” when it tried to mlock; it’s running now for 28000mb.

Btw, I already checked the cooler and it’s fine.

EDIT: OK, the 28000mb memtest completed on core 1 OK… well, I’ll test one of the middle cores, then core 0.

Also, the two M.2 screws came loose inside where the motherboard was; not sure if that’s enough to do damage during shipping.

willy · April 6, 2025, 4:52pm

OK so at least now you know that the RAM is fine. I don’t see why a screw would cause any damage. It’s possible that you’re just facing power issues. Mine only starts at 20V, maybe yours works with a lower voltage supply and sucks more amps from the cable. I’m now powering my board from Radxa’s provided PSU which includes its own cable. For now it’s working fine. It also used to work fine with a GaN 65W PSU and good quality cables previously.

ShivanSpS · April 6, 2025, 7:10pm

My first attempt was with an ATX PSU that I know works fine, mainly because I wanted to see if I could get an RTX 3060 working somehow.

Well, right now it’s not crashing or freezing at all via SSH like it did yesterday. I realised memtest was not crashing anymore, and right now it has been running stress-ng --cpu 12 for 30 minutes just fine.

But there is no screen or keyboard attached. I think I’ll leave it like this for a while, then use stress-ng to test the storage, and if that’s fine I’ll try a screen again.

Edit: OK, the image is broken again and I can’t boot into the desktop anymore; it’s just stuck on a broken login.

Does any of this dmesg seem off?

angel · April 7, 2025, 1:34am

i wanted to see if i could get a RTX 3060 working somehow.

Did you end up getting it to work? I haven’t been able to get either of my RTX 40 series cards to be detected, I posted more about it in this thread: Problems with PCIe Gen4 on the x8 slot!

re: debian image, for me it seems to be just completely broken/unreliable in many ways. For example, just a simple apt install nvme-cli smartmontools would completely freeze the system and the heartbeat led would quit blinking. I’ve given up and moved on to using mainline 6.14 NixOS with ACPI and things have been a lot more stable.

willy · April 7, 2025, 3:15am

I’m not spotting anything obviously wrong there. But it could possibly be related to the display if, as you say, it’s working fine when nothing is connected.

ShivanSpS · April 7, 2025, 11:33am

OK, so I’m not the only one who had freezes with the Debian DT image then. Good to know. Do you also have the 32GB version? Most people have the 16GB version because I believe those were shipped first.

As for the RTX 3060, I was thinking of bridging the presence pin on the PCIe slot and trying with that; if that doesn’t work I’ll try a mining riser, which should drop the link to x1 2.0 or 3.0.

Do we have a list of OSes that work in ACPI mode?

ShivanSpS · April 7, 2025, 7:57pm

I had freezes over SSH too on the first day. But yesterday, without changing anything, it started working properly. Then I re-flashed the image (that’s the other thing: these freezes end up breaking the desktop environment, so I am no longer able to log in and it just returns to the login screen again, which means there is some serious file corruption happening). And it did work fine on my small screen: I ran some browser benchmarks, glxgears, and a YouTube video for hours without problems. At least on my small screen; I haven’t tried to connect my 4K monitor again.

So I don’t know what to think, really. My worry is that there is actually some issue with it and the heat from using it fixed it, which is not going to be permanent. I’ll try to use it again later and see what happens.

The other possibility is that the iGPU is broken. I’ll need to test it somehow; the OpenGL game I hoped to use to benchmark it crashes via zink on it, but I know this is likely caused by zink.

EDIT: Well, when I booted into the image today the login was again broken for no reason.

I’ll try with ACPI and a normal install.

ShivanSpS · April 8, 2025, 5:49pm

I wonder if there is any reason for the Mesa version of the Debian image to be 23.0.x, which is really old. I’ll try to compile the current version.

Argh, unmet deps.
builddeps:mesa : Depends: directx-headers-dev (>= 1.613.0) but 1.606.4-1 is to be installed
Depends: meson (>= 1.4.0) but 1.0.1-5 is to be installed
Depends: libdrm-dev (>= 2.4.121) but 2.4.114-1+b1 is to be installed
Depends: wayland-protocols (>= 1.34) but 1.31-1 is to be installed
E: Unable to correct problems, you have held broken packages.

Update:
Managed to build Mesa 24.0.9, but it did nothing for me really; there seem to be some missing features in the Vulkan driver needed to properly support all zink features.

meco · April 10, 2025, 7:29am

RTX 3090 (PCIe Gen4 card) works out of the box in Fedora Rawhide with Linux 6.15 and ACPI boot.
No bugs: Vulkan/OpenGL work in applications via the Nouveau driver (the nvidia driver doesn’t work with 6.15 for me yet).

Side note: the PCIe connection is downgraded, though
(Edit: This might be a power saving measure so I need to do further testing)

  LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
  	ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
  LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
  	ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
  LnkSta:	Speed 2.5GT/s (downgraded), Width x8 (downgraded)
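
To put a number on the downgrade, the `LnkCap` and `LnkSta` lines can be parsed and compared. The sketch below is a rough illustration that works on the two lines quoted above; it compares raw signalling rates only and ignores line-encoding overhead, so the ratio is approximate. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Speed 16GT/s, Width x16" or "Speed 2.5GT/s (downgraded), Width x8"
LINK_RE = re.compile(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link(line: str) -> Optional[Tuple[float, int]]:
    """Extract (speed in GT/s, lane count) from an lspci LnkCap or LnkSta line."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def raw_ratio(cap: Tuple[float, int], sta: Tuple[float, int]) -> float:
    """Negotiated raw signalling rate as a fraction of the advertised maximum."""
    return (sta[0] * sta[1]) / (cap[0] * cap[1])


lnkcap = "LnkCap:\tPort #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us"
lnksta = "LnkSta:\tSpeed 2.5GT/s (downgraded), Width x8 (downgraded)"

cap = parse_link(lnkcap)
sta = parse_link(lnksta)
print("capable:", cap, "negotiated:", sta)
print(f"raw rate ratio: {raw_ratio(cap, sta):.1%}")
```

Note that `LnkCap` here is the card's own capability (x16). Since the thread refers to the board's slot as x8, a width of x8 would be expected regardless, and checking `LnkSta` again while the GPU is under load is one way to test the power-saving theory for the speed.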