Issues with modern graphics cards

Hello, I’ve decided that my RX 580 doesn’t cut it, so I ordered an RX 7800 to try with my Orion.
Fast forward to when it arrived: I plugged it in and nothing happened, just a black screen.
lspci on Linux doesn’t show the card either.

I’ve seen this post by Bryan_F where he managed to get an RX 7600 working, which got my hopes up.

Is there any hope of getting such cards working in the future, or is this a hardware limitation?

Here are some logs just in case:
logs.zip (10.7 KB)


Do you use a power supply, or is the card powered only by PCIe?

I use an ATX power supply.

Have you checked that the graphics card actually works on another machine? If it is not listed by lspci, then it probably implies a hardware issue with the card itself or the power supply, or maybe you were unfortunate enough to get a defective Orion board.
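
For anyone who wants to repeat the lspci check without lspci installed, here is a minimal sketch that walks the PCI devices the kernel exposes in sysfs and keeps the ones whose class code marks them as display controllers (top byte 0x03). The function name and the `sysfs_root` parameter are my own, made up for illustration. If the card does not appear here, the kernel never enumerated it, which points at power, seating, the card itself, or the link rather than at a driver.

```python
from pathlib import Path


def list_display_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return (address, vendor, device) for each PCI function whose
    class code has 0x03 (display controller) as its top byte.

    sysfs_root is a hypothetical parameter added so the function can be
    pointed at a test directory; the default is the usual Linux path.
    """
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            pci_class = int((dev / "class").read_text().strip(), 16)
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except (OSError, ValueError):
            # Entry vanished or has unreadable attributes; skip it.
            continue
        # The class file looks like "0x030000": base class, subclass, prog-if.
        if pci_class >> 16 == 0x03:
            found.append((dev.name, vendor, device))
    return found


if __name__ == "__main__":
    devices = list_display_devices()
    if not devices:
        print("No display controllers enumerated on the PCI bus")
    for address, vendor, device in devices:
        print(address, vendor, device)
```

An empty result with a discrete card installed matches the "not listed by lspci" situation described above.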

Have you checked that the graphics card actually works on another machine?

I don’t actually have another machine to test with.

If it is not listed by lspci , then it probably implies a hardware issue with the card itself or the power supply

Likely. I returned the card a day later though, so it’s fine.

or maybe you were unfortunate enough to get a defective Orion board.

What? Why? Other graphics cards work on my board; it’s currently driving an RX 580.

@arolath & @Bryan_F, both of you have an RX 7600, correct?
I’m planning on getting one myself to validate which ones work, since I’ve seen conflicting information.
What version/memory capacity do your cards have?
And can you test with UEFI versions 0.3.1 and 0.9.0 or 1.0.0-1?
Additionally, could you test with the latest kernel as of now, 6.17.0-rc1 (or the latest that your distro offers)?

Wild guess, but I’m assuming 8 GB cards work while anything larger doesn’t?
Another wild guess: the OC versions have a lower chance of working?

I’ve got a 7600 XT with 16 GB of VRAM, and it has actually worked just fine since around kernel 6.15. It just won’t show the BIOS; you need to plug an HDMI cable into the Orion to see what’s going on until you get to the OS. Oh, and I was using Armbian and openSUSE.

Ah, that’s perfectly expected behavior then. In your last post you didn’t mention that you got it working, so I assumed the RX 7600 had some issues. I guess not! Great news!!
And even better news to hear that the 16 GB version works.

I’ll see if I can get hold of one too.
I should also update my wiki…

Many thanks!!

Can confirm, the RX 7600 XT works!
Doesn’t work in UEFI on 0.3.1-1 but works in Linux
Works in UEFI on 1.0.0-1 but not in Linux

I’m gonna stick to 0.3.1-1 for now.
Not sure if this is a me issue or an nvtop issue, but data rate readings are not available on this card…
That was… sorta helpful when I was doing game dev on this…
Oh well…

Where can I find BIOS 0.3.1-1?
regards uli

Right here
Though be warned, it’s a pre-release, so expect bugs.

Do you mean the RX and TX values that are visible at the top of the screenshot here? I just gave it a try on my x64 system with an RX 7900 XTX and got the same result; my guess would be that the functionality is supported only on NVIDIA GPUs (in other words, probably not an AArch64 problem). Also, I usually run amdgpu_top, but it doesn’t seem to support such statistics :frowning:.

Yes, the RX/TX values. And no, that was working fine on my RX 580 with the Orion.
I guess this is normal then. Got it.
So this card is fully functional on the Orion (UEFI is not visible through it due to missing x86 emulation in firmware older than 1.0.0-1).

I highly recommend that anyone who can afford it get one and pair it with the Orion, though maybe not right now, as it doesn’t really give an edge over the RX 580 in games; the CPU is still the bottleneck…
Hoping we get the 2.8 GHz firmware sometime reaaaal soon, or at least an option to overclock from UEFI (if possible).

Well, it is probably an RDNA problem then. If it is any consolation, with AMD GPUs the RX and TX numbers seem to be an upper bound at best and not an actual estimate of the data transfer rates, judging by the implementation.

Otherwise my expectation is that all recent AMD GPUs work after UEFI hands off the reins to Linux, as long as the kernel version is at least 6.15 and the firmware build is the latest release, i.e. 0.3.0-1 (and possibly others such as 0.9.0); ACPI mode would be necessary, of course. That is certainly the case for me: I even managed to get Steam and Cyberpunk 2077 running, albeit with an unplayable framerate. In fact, even UEFI is visible for me; FYI, my card is an MSI Radeon RX 6700 XT MECH 2X 12GB OC. Also, if I block the amdgpu kernel module from loading, I still get output from the card, with the same reduced functionality as from the integrated GPU (i.e. a simple framebuffer device and software rendering).
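
To make the "kernel at least 6.15" condition easy to check, here is a rough sketch that compares the running kernel's major.minor against that threshold. The helper name `kernel_at_least` is made up for illustration, and the 6.15 cutoff is only what has been reported in this thread, not an official requirement.

```python
import platform
import re


def kernel_at_least(release, minimum=(6, 15)):
    """Return True if a release string such as '6.17.0-rc1' is at least
    the given (major, minor) version. Only major and minor are compared."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison handles cases like 6.9 < 6.15 correctly.
    return version >= minimum


if __name__ == "__main__":
    release = platform.release()
    if kernel_at_least(release):
        print(f"{release}: meets the 6.15 threshold mentioned in this thread")
    else:
        print(f"{release}: older than 6.15, consider a newer kernel")
```

Comparing integers rather than strings matters here, because a plain string comparison would rank 6.9 above 6.15.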

Similar to you, everything works fine except that I discovered one annoying issue: my display has only 3 inputs, but I juggle 4 devices with it. The Orion O6 board (well, the external GPU on it) is connected via an HDMI cable, but I have to detach it to use one of my other computers. However, shortly after I disconnect the cable the machine locks up and needs a reset. The only symptom that I noticed was the following set of messages in the kernel log (I was connected via SSH from another computer):

[ 4710.788374] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x10 received:
[ 4710.788398] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000c30100000010
[ 4710.788414] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000020000000000
[ 4710.788419] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
[ 4710.788422] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
[ 4710.788426] arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:c3:00.1 sid: 0xc301 ssid: 0x0 iova: 0x0 ipa: 0x0
[ 4710.788433] arm-smmu-v3 arm-smmu-v3.0.auto: unpriv data write s1 "Input address caused fault" stag: 0x0
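
As an aside on reading these messages, the sketch below pulls the client and sid fields out of the F_TRANSLATION line and splits the stream ID into bus, device and function. It assumes the low 16 bits of the SID are the PCI requester ID (8 bits bus, 5 bits device, 3 bits function), which agrees with the client field in this log but is not guaranteed on every platform. The function names are made up for illustration.

```python
import re

# Matches the "client: <addr> sid: 0x...." part of an arm-smmu-v3 event line.
EVENT_RE = re.compile(r"client:\s*(\S+)\s+sid:\s*(0x[0-9a-fA-F]+)")


def sid_to_bdf(sid):
    """Split a 16-bit PCI requester ID into (bus, device, function).
    Assumption: the SMMU stream ID equals the requester ID."""
    bus = (sid >> 8) & 0xFF
    device = (sid >> 3) & 0x1F
    function = sid & 0x07
    return bus, device, function


def decode_smmu_event(line):
    """Return (client, 'bb:dd.f') for an event line, or None if the
    line does not carry client/sid fields."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    client, sid_text = match.groups()
    bus, device, function = sid_to_bdf(int(sid_text, 16))
    return client, f"{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    sample = ("arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION "
              "client: 0000:c3:00.1 sid: 0xc301 ssid: 0x0 iova: 0x0 ipa: 0x0")
    print(decode_smmu_event(sample))
```

Under that assumption, sid 0xc301 decodes to c3:00.1, i.e. function 1 of the card rather than the GPU at c3:00.0 seen elsewhere in the logs. On many AMD cards function 1 is the HDMI/DisplayPort audio device, but lspci would have to confirm that for this particular card.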

I am currently on kernel version 6.17-rc2, but it was the same on some versions of the 6.15 and 6.16 releases. If I keep the cable connected and leave the machine alone, even for more than a day, it still remains responsive afterwards. It is probably another dark corner of the amdgpu code that needs a fix…

Yeah… I’ve been experiencing a lot of lock-ups since the 6.17 and RX 7600 XT upgrades.
Not sure what causes this…
Most of the lock-ups seem to come from something removing something that was already killed?
That’s what dmesg says (an amdgpu driver issue). I’ll see if I can get the logs the next time it happens and update this.

Edit:
[ 8251.589995] [drm:drm_sched_entity_push_job] *ERROR* Trying to push to a killed entity

The only lock-ups that I have experienced (except for the one I already mentioned) have been thermal-related, I believe, because they happened while something was hammering on all the CPU cores simultaneously for an extended period of time, e.g. a kernel build or the HPL benchmark. The sensors tool reports the temperature jumping north of 90 °C; I suppose that is the effect of the combination of summer and keeping everything in a case instead of having the bare board lying on my desk…
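
If you want to log temperatures during a long build instead of watching sensors by hand, something like the following sketch could work. It reads the generic Linux thermal zone files in sysfs (values are in millidegrees Celsius) rather than going through lm-sensors, and whether the Orion exposes its sensors there in ACPI mode is an assumption I have not verified. The function names and the `root` parameter are made up for illustration; the 90 °C limit is just the figure mentioned above.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_name: temperature_in_celsius} for every readable
    thermal_zone*/temp file under root (hypothetical parameter, defaults
    to the usual Linux location)."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones refuse reads or report garbage; skip them.
            continue
        readings[zone.name] = millidegrees / 1000.0
    return readings


def hot_zones(readings, limit_c=90.0):
    """Return only the zones at or above limit_c degrees Celsius."""
    return {name: temp for name, temp in readings.items() if temp >= limit_c}


if __name__ == "__main__":
    current = read_thermal_zones()
    for name, temp in current.items():
        print(f"{name}: {temp:.1f} C")
    if hot_zones(current):
        print("Warning: at least one zone is at or above 90 C")
```

Running this in a loop alongside a kernel build would show whether the lock-ups line up with the temperature spikes.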

I don’t think that I have seen your error message, but I have seen another one - not sure how bad it is, given that things seem to be working fine otherwise:

[   10.228203] amdgpu 0000:c3:00.0: amdgpu: [drm] Unknown EDID CEA parser results