M.2 SSD randomly fails to write on rock3a

Hey there, I have been running a rock3a for a while with a PoE HAT and an NVMe SSD attached. At random, writes to the disk start failing and the kernel then remounts the filesystem read-only.
You can see the dmesg log here: gist:dfb76e9401ebb932bbfdb90734a4ebc8 (github.com)
The kernel commit I am running is: arm64: dts: radxa cm3 io: add HD101BOE9365 Display support · radxa/kernel@12d0b2b (github.com)
with this as the only change to the config:

diff --git a/arch/arm64/configs/rockchip_linux_defconfig b/arch/arm64/configs/rockchip_linux_defconfig
index bcfd115935e4..5af59cabd46e 100644
--- a/arch/arm64/configs/rockchip_linux_defconfig
+++ b/arch/arm64/configs/rockchip_linux_defconfig
@@ -1166,3 +1166,8 @@ CONFIG_FUNCTION_TRACER=y
 CONFIG_BLK_DEV_IO_TRACE=y
 CONFIG_LKDTM=y
 CONFIG_CGROUP_NET_PRIO=y
+
+CONFIG_SCSI_ISCSI_ATTRS=y
+CONFIG_ISCSI_TCP=y
+
+CONFIG_ARM64_VA_BITS_48=y

Any idea what could be causing it? Could it be power? The PoE switch provides up to 30W per port.

Hi, please upload the dmesg.

Hi, the dmesg is at the link above. There is a body size limit here and .txt files can't be uploaded.

From the log, the NVMe SSD has gone down. What power adapter are you using, and do you have any other USB devices connected?

Hey @jack, I am using a PoE switch that delivers up to 30W per port. No other USB devices are connected, just the NVMe and the Ethernet cable.

What PoE HAT are you using? I suspect the 5V rail is dropping, so the NVMe is running under-voltage.

Hey @jack, it is a RockPi_PoE_F4L Rock Pi 4. If you point me to where I should measure the voltage, I can double-check it.

@jack
I have the same problem.
It has happened on 4 of my SBCs.
The power supply is a CoolMaster GX550.

I'm running the following from cron as a workaround:

  1. check that the NVMe filesystem is mounted and is rw
  2. check that the NVMe partition exists under /dev/; if not, reboot
  3. stop all services that use the NVMe mount point
  4. umount all NVMe partitions
  5. run fsck
  6. remount all NVMe partitions
  7. restart all the services that were stopped before
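
For anyone who wants to try something similar, the checks in steps 1 and 2 could be sketched roughly like this. It is not the poster's actual script, just a minimal sketch that assumes the mount state is read from /proc/mounts. The function names `mount_state` and `decide_action`, and any mount point or device path passed to them, are made up for illustration.

```shell
#!/bin/sh
# Sketch of the checks in steps 1 and 2. Function names and the example
# paths in the usage note are hypothetical, not taken from this thread.

# mount_state MOUNTPOINT [MOUNTS_FILE]
# Prints "rw", "ro", or "missing" for MOUNTPOINT, based on the options
# column (4th field) of /proc/mounts. Options are compared one by one,
# so something like "errors=remount-ro" is not mistaken for "ro".
mount_state() {
    awk -v mp="$1" '
        $2 == mp {
            state = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") state = "ro"
        }
        END { print (state == "" ? "missing" : state) }
    ' "${2:-/proc/mounts}"
}

# decide_action MOUNTPOINT DEVICE [MOUNTS_FILE]
# Prints "reboot" if the block device node is gone (step 2), "ok" if the
# mount is present and rw (step 1), and "repair" otherwise, meaning the
# stop/umount/fsck/remount/restart sequence of steps 3 to 7 is needed.
decide_action() {
    if [ ! -b "$2" ]; then
        echo reboot
        return
    fi
    case "$(mount_state "$1" "$3")" in
        rw) echo ok ;;
        *)  echo repair ;;
    esac
}
```

A cron wrapper could then call something like `decide_action /mnt/nvme /dev/nvme0n1p1` (both paths are placeholders) and branch on the output. The functions only report state, so the destructive parts (reboot, umount, fsck) stay in the wrapper where they can be reviewed separately.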

Thanks for this, but it is still not a long-term solution.
I wonder if the same thing would happen with eMMC.

Hey jack, the same thing happens even when I plug in a QC charger.
To reproduce it faster you can do apt install stress && cd /nvme-path && { journalctl -kf & stress --cpu 8 --vm 8 --hdd 8; }