Random crash with the vendor kernel when booting from LVM on NVMe


This problem happens pretty often.
Initially I built the vendor kernel myself (the gcc_wrap commit has to be reverted for it to compile with the latest native toolchain on aarch64), and it has a very high chance of crashing at boot time, without any kernel call trace, at or just after PCIE and LVM initialization.

At first I thought this was some random crap where the vendor kernel doesn’t handle the newer toolchain.

But today I also tried the prebuilt Debian image, and well, it’s even worse.
It not only has the same crash (and reset) behavior, but also sometimes fails to initialize the PCIE2.0 lanes for the r8125 network card.

The r8125 init failure is new with the prebuilt Debian image.

I’m not expecting the problem to be solved in the RK kernel, just letting you guys know that the RK vendor kernel is really crap.

What power adapter are you using? What’s the output of the sensors command?

20V 2.25A PD power supply.

Adapter: rk3x-i2c
in0:          20.00 V  (min = +20.00 V, max = +20.00 V)
curr1:         2.25 A  (max =  +2.25 A)

So nope, definitely not the power supply.
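
For reference, a quick sketch of the arithmetic behind that, using only the sensors lines pasted above. It assumes in0 and curr1 report the negotiated PD contract (the ceiling the adapter agreed to) rather than the live draw, which is my reading of the output and not something I verified.

```python
import re

# The sensors output as pasted above.
SENSORS_OUTPUT = """\
Adapter: rk3x-i2c
in0:          20.00 V  (min = +20.00 V, max = +20.00 V)
curr1:         2.25 A  (max =  +2.25 A)
"""

def negotiated_watts(text: str) -> float:
    """Multiply the in0 voltage by the curr1 current from sensors-style text."""
    volts = float(re.search(r"^in0:\s+([\d.]+) V", text, re.M).group(1))
    amps = float(re.search(r"^curr1:\s+([\d.]+) A", text, re.M).group(1))
    return volts * amps

print(negotiated_watts(SENSORS_OUTPUT))  # 20.00 V * 2.25 A
```

That is a 45 W contract, so the board is not stuck on a 5 V fallback.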

My guess is the PCIE initialization code, as the reset always happens around PCIE initialization time.
And my NVMe + LVM setup seems to make the problem much easier to trigger.

I guess you guys already know how crappy the Rockchip kernel is.
They even have an extra sleep in the device mapper, without ever really investigating the root cause.

One thing to mention: I don’t use the rootwait kernel option with either the prebuilt Debian image or my self-compiled kernel with ArchlinuxARM user space.

If rootwait is mandatory, that means the PCIE initialization/probe has a race.
Though I don’t expect the Rockchip guys would even bother to investigate.
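
If anyone wants to compare setups, here is a minimal sketch for checking whether a given kernel command line carries rootwait or rootdelay=. The function name and the sample command line are made up for illustration; on a running system you would feed it the contents of /proc/cmdline instead.

```python
def boot_wait_options(cmdline: str) -> dict:
    """Report whether a bare rootwait or a rootdelay=<value> is present.

    Only the plain `rootwait` flag and `rootdelay=<value>` are recognized;
    any other spelling is ignored by this sketch.
    """
    params = cmdline.split()
    rootdelay = None
    for param in params:
        if param.startswith("rootdelay="):
            rootdelay = param.split("=", 1)[1]
    return {"rootwait": "rootwait" in params, "rootdelay": rootdelay}

# Hypothetical command line, roughly like my setup (no rootwait).
example = "console=ttyS2,1500000 root=/dev/mapper/vg-root rw"
print(boot_wait_options(example))
```

If the crash goes away only when rootwait or a rootdelay is present, that points at the probe ordering rather than at LVM itself.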

I’ll try the upstream code and see if it’s possible to use rk356x PCIE driver/dts for RK3588.

That’s why you will see sleep in the BSP source.