Rock 3A NVMe SSD Filesystem Corruption

I’m facing an issue very similar to the one users describe in this thread. I’m using the Rock 4 M.2 extension board to connect an NVMe SSD to my Rock 3A V1.31, which runs the stable-4.19-rock3 branch of the Radxa BSP kernel on Armbian Bullseye. Whenever I write a large number of files, for example when git cloning the Radxa kernel repo or installing kernel headers, the root partition gets remounted read-only, and dmesg shows print_req_error: I/O error and EXT4-fs warning (device nvme0n1): ext4_end_bio:309: I/O error 10 writing to inode errors.
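For anyone who wants to check whether they are hitting the same errors, here is a rough Python sketch that filters kernel log text for the two messages quoted above. The function names and the exact regular expressions are my own assumptions based on the messages on my board; other kernel versions may word them differently.

```python
import re
import subprocess

# Patterns based on the messages quoted in this post (an assumption:
# other kernel versions may format these lines differently).
ERROR_PATTERNS = [
    re.compile(r"print_req_error: I/O error"),
    re.compile(r"EXT4-fs warning \(device nvme\w+\): ext4_end_bio:\d+: I/O error"),
]

def find_io_errors(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]

def read_kernel_log():
    """Read the kernel ring buffer with dmesg (may require root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling find_io_errors(read_kernel_log()) right after a large write should list any matching lines. An empty list only means these specific messages were not logged, not that the drive is healthy.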

After the root partition has been remounted read-only, rebooting leaves the system stuck on the boot animation screen forever. I’ve tried forcing fsck on boot by adding fsck.mode=force to the kernel command line in Armbian before rebooting, but the system still gets stuck on the boot animation, so the root filesystem appears to be corrupted beyond repair at that point.
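One thing worth ruling out is that the parameter never reached the kernel. On a system that still boots, a small sketch like the following checks /proc/cmdline for it. The helper name has_kernel_param is made up for illustration.

```python
def has_kernel_param(cmdline, param):
    """Return True if param appears as a whole whitespace-separated token."""
    return param in cmdline.split()

def fsck_force_enabled(path="/proc/cmdline"):
    """Check the running kernel's command line for fsck.mode=force."""
    with open(path) as f:
        return has_kernel_param(f.read(), "fsck.mode=force")
```

If fsck_force_enabled() returns False after a reboot, the bootloader configuration change did not take effect, and the forced check was never requested.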

As pointed out in the thread linked above, though, Armbian’s Linux 5.10 legacy kernel doesn’t have this issue at all, at least not in my experience with the test cases that trigger the SSD issue on the Linux 4.19 Radxa kernel. I have no idea what exactly fixed the issue in Linux 5.10, but I’m hoping the fix can be backported to the stable 4.19 branch, or that the 5.10 Radxa kernel will be ready for the Rock 3A soon (MIPI DSI doesn’t work on Armbian Linux 5.10 right now, so I can’t use Linux 5.10).

Out of curiosity, since I’m really at a loss about what is causing this, I tested the Armbian Bookworm current (mainline Linux 6.1) image and managed to trigger the read-only filesystem error there as well (apps can’t even be launched at that point). The difference is that on mainline Linux 6.1, a reboot repairs whatever went wrong and the system is fully usable again. Even more confusingly, after that reboot I can git clone the mainline Linux kernel repo, install kernel headers, and run sudo apt full-upgrade without triggering the read-only issue at all, so it doesn’t seem to be the same problem. That said, some kind of SSD issue still seems to exist on mainline Linux for the Rock 3A, so this isn’t simply something that was fixed by non-Rockchip changes in newer kernel releases.
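To make the comparison between kernels more repeatable than "git clone and see what happens", something like the sketch below could be used: it writes many small files with an fsync after each one, then checks /proc/mounts to see whether the filesystem was remounted read-only. The function names, file count, and file size are arbitrary choices of mine, not a known reproducer.

```python
import os

def is_read_only(mounts_text, mount_point="/"):
    """Return True if the last /proc/mounts entry for mount_point has 'ro'."""
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            read_only = "ro" in fields[3].split(",")
    return read_only

def write_many_files(directory, count=10000, size=4096):
    """Write count files of size random bytes, syncing each one to disk."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        path = os.path.join(directory, f"file_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # force the data out to the SSD
    return count
```

Run write_many_files on a scratch directory on the NVMe drive, then pass the contents of /proc/mounts to is_read_only. If the filesystem is remounted read-only mid-run, the write loop will most likely raise an OSError before it finishes.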

Any suggestions and help would be greatly appreciated, and thanks to all Radxa developers for this amazing board!