RockPi 4 kernel crash PCI-PCI bridge

Hello,

The RockPi 4 is the shoddiest thing I have ever bought. Mounds of problems and design flaws. Now, to top it off, if you attach any form of PCI-PCI bridge to the M.2 slot, the kernel panics on boot, as shown over the serial debug cable. The only devices that work are endpoints like the Intel X550-T2.

Devices which crash the kernel include:

  • PLX / Broadcom PCIe switches
  • Gigabyte JHL7540 GC-TITAN RIDGE Thunderbolt 3 bridge
  • MSI JHL6540 Thunderbolt M3 card with pcie2tbt mailbox hack to make it permanently visible to non-Intel platforms (set enumeration mode to 0x1 from 0x0).
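
For anyone trying to reproduce this, one way to tell bridges from endpoints on a system that does boot is to read the class code Linux exposes in sysfs. The following is only a minimal sketch, not something taken from the posts: it assumes the usual /sys/bus/pci/devices layout, and the function names are made up for illustration. A PCI-PCI bridge has base class 0x06 and subclass 0x04.

```python
from pathlib import Path

# Base class 0x06 (bridge) + subclass 0x04 (PCI-to-PCI bridge).
PCI_CLASS_BRIDGE_PCI = 0x0604


def is_pci_pci_bridge(class_code: str) -> bool:
    """class_code is the text of a sysfs 'class' file, e.g. '0x060400'."""
    value = int(class_code.strip(), 16)
    # The lowest byte is the programming interface; drop it.
    return (value >> 8) == PCI_CLASS_BRIDGE_PCI


def list_devices(root="/sys/bus/pci/devices"):
    """Return (address, kind) pairs; empty list if sysfs is not available."""
    result = []
    base = Path(root)
    if not base.is_dir():
        return result
    for dev in sorted(base.iterdir()):
        class_file = dev / "class"
        if class_file.is_file():
            kind = "bridge" if is_pci_pci_bridge(class_file.read_text()) else "other"
            result.append((dev.name, kind))
    return result


if __name__ == "__main__":
    for address, kind in list_devices():
        print(address, kind)
```

Note that the root port itself normally reports as a PCI-PCI bridge too, so one "bridge" entry is expected even with only an endpoint attached.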

It is likely a problem with the drivers/pci/controller/pcie-rockchip* files in the kernel. If I can solve the issue then I will post patches to Bjorn Helgaas and try to get them accepted in mainline.

To the kernel developers and hardware designers who did the work: is it too much to ask that you thoroughly test your work? Just because you had NVMe endpoints in mind for the M.2 slot does not change the fact that it is a (really badly bottlenecked) PCIe root port to which users can attach anything. Get a $5 M.2 - PCIe x4 adapter off eBay, a Molex connector from a power supply, and try adding some GPU mining PCIe splitters - see how fast the kernel crashes.

I am concerned that the mainline kernel instructions require compiling a release candidate kernel - more shoddy work that did not make it to the final release without being reverted, I assume?

Thank you!

There is one BSP kernel maintained by Rockchip, https://github.com/rockchip-linux/kernel, and three main forks (FriendlyArm, Pine64, Radxa) with changes that are small but big enough to prevent a clean merge. Quality? It works somehow :) … but lower your expectations.

A few weeks ago I tested a 10Gb card with success … I don’t have much other hardware around. It was running kernel 5.3.y, but on another RK3399 board. The PCI stack is identical, so it should (in theory) work here as well, but I don’t have an M.2-to-PCIe adapter to re-test.

Great!

Mainline support is “in development”: not very far along, and buggy in general. At Armbian we use the Ayufan fork, https://github.com/ayufan-rock64/linux-mainline-kernel, for all RK3399 boards at this stage; it is where the RockPro/Pinebook Pro changes get applied, plus other general changes and some other patches. Pure mainline is less usable.

The number of competent people who have both the know-how and the time to solve harder problems is low. Very low. That includes the chip and board makers, which is where you might address this wish.

If you wait on random people to fix things, it will take a while …

Just to clarify, the 10GbE card was one of the things that did work. It is PCI-PCI bridges that do not.

See https://hackaday.com/2019/09/05/pcie-multiplier-expands-raspberry-pi-4-possibilities/ which would seem to be the same issue - a kernel crash with a PCI-PCI bridge. Their solution was to edit the DTB to allow more than one bus. But the DTB for the Rock Pi 4 already has 0x1f bus numbers, so that is not the issue.
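
To put numbers on the bus-range argument, here is a rough sketch of the arithmetic. It assumes the DTB property reads bus-range = <0x0 0x1f> (inclusive, so buses 0 through 31), and it models only a single switch behind the root port; the function names are illustrative. Thunderbolt controllers may want extra bus numbers reserved for hotplug, which this ignores.

```python
def bus_count(bus_range):
    """Number of bus numbers covered by an inclusive (first, last) bus-range."""
    first, last = bus_range
    return last - first + 1


def buses_for_one_switch(downstream_ports):
    """Bus numbers consumed by a root port plus one PCIe switch.

    bus 0: root bus holding the root port
    bus 1: link from the root port to the switch's upstream port
    bus 2: the switch's internal bus holding its downstream ports
    plus one further bus per downstream port
    """
    return 3 + downstream_ports


# Assumption: the Rock Pi 4 DTB has bus-range = <0x0 0x1f>.
ROCK_PI_4_BUS_RANGE = (0x00, 0x1F)

if __name__ == "__main__":
    available = bus_count(ROCK_PI_4_BUS_RANGE)
    needed = buses_for_one_switch(downstream_ports=4)
    print(f"available={available} needed={needed} fits={needed <= available}")
```

Under those assumptions a four-port switch needs 7 bus numbers out of 32, which supports the point that the bus range alone should not be what breaks enumeration here.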

Yes, the quantity of skilled people is limited, but those capable individuals end up working for companies like this. They are clearly good at what they do… but why did they not fully test the board? Clearly they just checked that the PCIe worked with NVMe storage and left it at that.

Why are we waiting on random people to fix things? If you mean people like me, sure… but that does not change the fact that the people at the Rock Pi company who are paid to do this for a living should have done it. I will be doing it for free.

Can I assume that the Raspberry Pi is also the shoddiest thing, and that its developers are not testing either? I guess everything can be improved; there is no perfection.

The Pi has issues, but the increased scope of the Rock Pi 4 means more flaws. Technically the fact that Rock Pi 4 has M.2 PCIe makes it better - but that means there are more things wrong with it including:

  • Using PCIe 2.0, putting them 6-8 years behind.
  • PCIe bottlenecked, even for PCIe 2.0 x4
  • M.2 slot facing out of board instead of in. NanoPC-T4 has it done correctly.
  • Why is the SoC on the bottom of the board? That is probably why the M.2 slot cannot be done correctly.
  • And so on.
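
For reference when judging the "bottlenecked" claim, the theoretical ceiling of a link can be worked out from the transfer rate and the line encoding. This is a back-of-the-envelope sketch using the published PCIe figures (5 GT/s with 8b/10b encoding for 2.0, 8 GT/s with 128b/130b for 3.0); the function names are illustrative, and the result is per direction, before packet overhead.

```python
# (transfer rate in GT/s, payload bits, encoded bits) per PCIe generation.
GENERATIONS = {
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
}


def lane_throughput_mb_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Usable throughput of one lane, one direction, in MB/s."""
    raw_bits_per_s = gigatransfers_per_s * 1e9
    usable_bits_per_s = raw_bits_per_s * payload_bits / encoded_bits
    return usable_bits_per_s / 8 / 1e6


def link_throughput_mb_s(gen, lanes):
    """Usable throughput of a whole link, one direction, in MB/s."""
    rate, payload, encoded = GENERATIONS[gen]
    return lanes * lane_throughput_mb_s(rate, payload, encoded)


if __name__ == "__main__":
    for gen in ("2.0", "3.0"):
        print(f"PCIe {gen} x4: {link_throughput_mb_s(gen, 4):.0f} MB/s")
```

A measured figure well below the PCIe 2.0 x4 ceiling would be what "bottlenecked" means in the list above; the posts do not give a measurement, so none is assumed here.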

Expectations need to grow, and this is a step up from the Raspberry Pi, which means higher expectations.

Also, it is a mini PC. It might not cost the same, but I bet that if I buy an x86 SBC for the same price, I will not be having kernel crashes because of PCI-PCI bridges.

It comes down to:

  • I was sold a device that advertises PCIe, but it turns out it only works for endpoint devices. It would be like buying something advertising USB support, only to find that USB hubs crash the kernel - a pretty big problem, not just a minor one.
  • I bought it under the guise that it has mainline kernel support, and from this thread, it only kinda, not really, has mainline support

If you sell something, it has to work as advertised. I can accept small bugs, but not large issues like this.

Why did you buy your Rock Pi? What is the purpose? Is it GPU mining?

There was a dev reporting that GPUs and mining work, but he is not here anymore. I did not test it at all. The Rock Pi 4 is a BIG difference; even with PCIe 2, it can still perform very well for a specific purpose, and as a PoS/MN host it seems to be the perfect device.

What exactly was advertised and told to you? Do you refer to the Radxa homepage?

Board makers promise (marketing) to deliver full-blown “Linux Ubuntu” and “Android”, and they do, up to a point … you can boot it, play video, run a 3D test, perhaps some game, and the job for the HW designers is done. Software support at board makers is terrible in general. They usually rely on (too few) skilled people and on upstream (Rockchip) contracts.

If we talk about SoC makers, Rockchip in this case, then yes. They should invest way more, and better, in development - at least in the past, when we were working with the 3288, their know-how / support was poor.

If we look at board makers solely as business entities, we quickly arrive at a (price/quality) race to the bottom. Why would they support the competition?

Mainlining? This exceeds a board maker's operation by an order of magnitude, and it is still not the final destination. It's just a start. Some companies do not live long enough to see their hardware get there. This should be Rockchip's job, but I am not sure the decision makers (at Rockchip) understand the scale of such an operation. Or they just don’t want to invest more: even if you don’t put anything into it, (some) support will eventually show up in mainline. We will do it … but when, and what will work? With afternoon hacking (which is not exactly development), things progress slowly, and a lot of time is additionally wasted for lack of proper planning and communication.

“I will be doing it for free.”

We also run the Armbian project mainly out of our own pockets (a few board makers do help). It’s an everyday activity: some R&D, some bug hunting, support. We do receive donations from the general public, which are shared out, spent on infrastructure, etc. While their share is below 0.5% of total costs, it’s enough to treat people to a dinner if they do something useful, something we have been breaking our teeth on.

These are great little experimental boards that have a strong following and are a work in progress.
These are for hobbyists and people who like to tinker / hack / experiment. That's what they sold and that's what you bought. If you live in the U.S., I will gladly buy that piece of junk from you, and then you can go away. Really, your tirade is quite rude and your expectations are near out of line.

@ TheDude

Sorry, but my expectations do not have to be the same as yours. I did buy it to tinker with PCIe - but that requires PCIe to be working. I am not trying to be rude, but you may realise I am feeling very frustrated here.

Edit: also I will not sell this because I am a team player and I might be able to get the problem fixed and merged into mainline for everybody. I might sound negative when I am down, but my outlook is always positive in the long term.

@ igorp

Thanks for your comments. They do help me understand, to some degree, why things are the way they are. Sorry for coming across as unforgiving. Please understand my frustration. But as always, I will try to contribute what I can, if I can. I am just struggling right now - I am trying to get mainline compiled as per the instructions on the website but cannot get it to run successfully. Once I get that running, I can start to try to find out what the problem is.

Some things about trying to get mainline running

  • There does not appear to be an initrd / initramfs, nor the need to install modules in the guide for mainline. Are the modules packed in vmlinuz image? If not, then this might be part of the problem. The guide does not say anything about modules.
  • It boots the kernel, as seen on the serial debug console, but it hangs halfway.
  • Is it possible to see extlinux menu without serial debug cable?
  • My workflow is slow because every time I brick the device, I have to remove the heatsink to get to the eMMC, put it into the reader, reboot my laptop to Linux, restore the changes, go back to Windows for PuTTY (cannot get Minicom to work on Linux without characters being lost), reassemble eMMC and heatsink, try again. If I could find a way to use the extlinux boot menu to specify an alternate backup conf file then it would get things moving.
  • Often, when I select a boot option, it ends up booting the kernel from another “label” entry in extlinux.conf. I cannot tell what is going on. Either it does not like the image specified and loads another one instead of failing gracefully, or it is overrunning the label section into the next entry.
  • Previously I did get actual mainline running on the board without using the source and toolchain specified in the guide, but PCIe would not work (I mean not initialised at all).
  • Following the guide closely has not produced a kernel that fully boots.
  • Do I have to use the link https://github.com/ayufan-rock64/linux-mainline-kernel perhaps?
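
One way to rule out the "overrunning the label section" theory is to parse extlinux.conf offline and check that every label has its own kernel line. The sketch below is a simplified reader of the syslinux-style keyword/value layout, not U-Boot's actual parser; the sample file contents, paths and function names are hypothetical.

```python
def parse_extlinux(text):
    """Split an extlinux.conf into one dict per 'label' entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are treated case-insensitively
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "label":
            current = {"label": value}
            entries.append(current)
        elif current is not None:
            # Global keywords before the first label are ignored here.
            current[key] = value
    return entries


def entries_missing_kernel(entries):
    """Labels that have neither a 'kernel' nor a 'linux' line."""
    return [e["label"] for e in entries if "kernel" not in e and "linux" not in e]


# Hypothetical file contents, for illustration only.
SAMPLE = """\
default mainline
timeout 30

label mainline
    kernel /vmlinuz-mainline
    fdt /example-board.dtb
    append root=/dev/example rw

label backup
    fdt /example-board.dtb
"""

if __name__ == "__main__":
    parsed = parse_extlinux(SAMPLE)
    print([e["label"] for e in parsed])
    print("missing kernel:", entries_missing_kernel(parsed))
```

If an entry turns out to be missing its kernel line, that would be one plausible reason for the bootloader to fall through to a different entry, though the posts do not confirm this is what happens.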

I will keep you up to date on progress. Cheers

Edit: The 5.2-rc6 kernel ends with this and just halts.

[ 11.500798] evm: Initialising EVM extended attributes:
[ 11.503357] evm: security.selinux
[ 11.504511] evm: security.SMACK64
[ 11.505625] evm: security.SMACK64EXEC
[ 11.507258] evm: security.SMACK64TRANSMUTE
[ 11.508579] evm: security.SMACK64MMAP
[ 11.509755] evm: security.apparmor
[ 11.511288] evm: security.ima
[ 11.512294] evm: security.capability
[ 11.513488] evm: HMAC attrs: 0x1
[ 12.842924] rockchip-usb2phy ff770000.syscon:usb2-phy@e450: failed to create phy
[ 12.891135] rockchip-usb2phy ff770000.syscon:usb2-phy@e460: failed to create phy
[ 12.994947] rockchip-usb2phy ff770000.syscon:usb2-phy@e450: failed to create phy
[ 13.041023] rockchip-usb2phy ff770000.syscon:usb2-phy@e460: failed to create phy
[ 13.156476] hctosys: unable to open rtc device (rtc0)
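
When comparing captures like the one above across kernels, it can help to summarise them mechanically. This is a small sketch, with illustrative function names, that parses the timestamped lines, counts repeated messages, and finds the longest pause between two consecutive lines; the sample is copied from the tail of the log above.

```python
import re
from collections import Counter

# Matches lines such as "[ 12.842924] some message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_log(text):
    """Return (timestamp, message) tuples for every kernel log line."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append((float(match.group(1)), match.group(2)))
    return records


def repeated_messages(records):
    """Messages that occur more than once, with their counts."""
    counts = Counter(message for _, message in records)
    return {message: n for message, n in counts.items() if n > 1}


def largest_gap(records):
    """(seconds, message before the gap) for the longest pause, or None."""
    gaps = [(b[0] - a[0], a[1]) for a, b in zip(records, records[1:])]
    return max(gaps) if gaps else None


SAMPLE = """\
[ 11.513488] evm: HMAC attrs: 0x1
[ 12.842924] rockchip-usb2phy ff770000.syscon:usb2-phy@e450: failed to create phy
[ 12.891135] rockchip-usb2phy ff770000.syscon:usb2-phy@e460: failed to create phy
[ 12.994947] rockchip-usb2phy ff770000.syscon:usb2-phy@e450: failed to create phy
[ 13.041023] rockchip-usb2phy ff770000.syscon:usb2-phy@e460: failed to create phy
[ 13.156476] hctosys: unable to open rtc device (rtc0)
"""

if __name__ == "__main__":
    records = parse_log(SAMPLE)
    print(repeated_messages(records))
    print(largest_gap(records))
```

On this excerpt the two usb2-phy failures each repeat, and the longest pause comes right after the EVM lines. Whether either is related to the hang is not established by the log itself.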

@ 7261647861

I bought this to show that Thunderbolt 3 can work with ARM. Hence PCI-PCI bridges. But even non-Thunderbolt PCI bridges crash it.

The Raspberry Pi thing was interesting because if it turned out to be the same problem, it would have an easy fix. The Raspberry Pi is different, though, because its PCIe is only for the USB controller - it is not meant to be exposed to the user. The person de-soldered the USB controller and bridged the PCIe lane to a USB port and used those GPU risers that route PCIe through USB cables. This Rock Pi gives mixed feelings - positive that it exposes PCIe in a slot, but negative that it is finicky. I would not give up the PCIe feature, but people who do not need it are probably enjoying this a lot more than I.

The first issue was that the Rock Pi cannot drive the signal down one of those cables. I was slightly frustrated, but not upset. Desktops tend to handle them (perhaps they have re-timers behind the ports). I solved that issue with some new adapters to eliminate the long runs. That helped, and gets things like the 10GbE NIC working (cool).

Now I need to get a kernel I compiled running so that I can start to figure out the kernel crash.

The end result is hopefully a proof of concept that Thunderbolt 3 is not limited to x86, to spread enthusiasm for the future of the connector.

Edit: About the advertising? I was checking off the important things and saw the mainline page and took that as them saying mainline is available. “ROCK Pi 4 is officially supported in mainline kernel since v5.1” https://wiki.radxa.com/Rockpi4/dev/kernel-mainline

Initial mainline support, at the date when the chip arrives in mainline, can and usually does look like this:

  • serial console output only
  • one cpu core up running at 50%
    … which is totally useless for 99% of people.

Textbook example: the Cubieboard 4 has been “supported” by mainline for at least 3 years, but you still can’t boot it just like that, and most of the on-board hardware does not work. This board is exotic, but it tells you something about the term “support”.

With the RK3399 it's of course much better, since it's not an exotic SoC and it is present on many (popular) boards.

Armbian nightly builds with kernel 5.2.y have a serial console on pins 6-8-10, but there is one problem: the DT name is wrong, which prevents a successful boot (I thought the related patch would be accepted faster). If you make a link to, or a copy of, the DT under the expected name, the image will boot.

Stick to Armbian/ayufan if you want the mainline experience. Everything else is IMO a (bigger) waste of time.

My initial test with a Thunderbolt 3 10Gb NIC was unsuccessful - not detected - while it goes straight up to 10 Gb/s on a Dell XPS13 :(

“My initial test with a Thunderbolt 3 10Gb NIC was unsuccessful - not detected - while it goes straight up to 10 Gb/s on a Dell XPS13 :(”

I cannot figure out this quote thing.

How did you add a Thunderbolt 3 port to your Rock Pi 4?

I didn’t. Sorry, I am mixing those boards up … it was the RockPro64.

“I didn’t. Sorry, I am mixing those boards up … it was the RockPro64.”

How did you add a Thunderbolt 3 port to your RockPro64?

If you shoved a GC-TITAN RIDGE in there, it would have crashed the kernel, presuming it has the same issue that the RockPi 4 does.

It was one quick test weeks ago with the Armbian development 5.3 kernel. I do not recall whether it was a kernel crash or just not detected. I am only sure that it was not working, and I didn’t investigate further.