One disk disconnects and reconnects as a new device

Hi all,

I have a Raspberry Pi 4B with four 1 TB SSDs plugged into the SATA HAT and arranged as two hardware RAID 0 arrays (following "Setting up HARDWARE RAID with QUAD SATA-HAT (jms561 controller)"). The HAT is powered directly by the DC adapter.

This is what the block devices look like:

$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  1.8T  0 disk 
└─sda1        8:1    0  1.8T  0 part /media/raida
sdb           8:64   0  1.8T  0 disk 
└─sdb1        8:65   0  1.8T  0 part /media/raidb
mmcblk0     179:0    0 14.8G  0 disk 
├─mmcblk0p1 179:1    0  256M  0 part /boot
└─mmcblk0p2 179:2    0 14.6G  0 part /

However, every day or so, the second disk fails and immediately reconnects as a new device (first /dev/sdc, then /dev/sdd, and currently /dev/sde). The UUID is listed in fstab, so it mounts back to /media/raidb when I run sudo mount -a.
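
To confirm which device node currently backs the mount point after each reconnect, something like the following could be used. This is only a minimal sketch: the function names are made up for illustration, and it assumes the usual whitespace-separated /proc/mounts layout (device first, mount point second).

```python
def device_for_mountpoint(mounts_text, mountpoint):
    """Return the device mounted at `mountpoint`, or None if nothing is mounted there.

    `mounts_text` is expected to look like /proc/mounts: one mount per line,
    whitespace-separated, device in field 0 and mount point in field 1.
    """
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            # A later entry for the same mount point shadows an earlier one.
            device = fields[0]
    return device


def current_device(mountpoint="/media/raidb"):
    """Hypothetical helper: look the mount point up in the live /proc/mounts."""
    with open("/proc/mounts") as f:
        return device_for_mountpoint(f.read(), mountpoint)
```

Calling current_device() before and after a drop-out should show the device name changing (for example from /dev/sdd1 to /dev/sde1), or None if the array has not been remounted yet.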

Note that the disk in question is under constant heavy use (roughly 60-200 MB/s read and 1-10 MB/s write). The other disk (/dev/sda) is under moderate use and has no problems at all.

I have read the seemingly related post "[solved] SATA HAT disconnects disk on heavy use", but I don’t get the USB errors they were getting, so I am not sure the problem is the same.

It happened again overnight (/dev/sdd died and came back as /dev/sde), and I tracked the event in /var/log/messages:

Mar 28 05:36:57 mediaserver kernel: [117370.023034] sd 3:0:0:0: [sdd] tag#12 uas_eh_abort_handler 0 uas-tag 9 inflight: IN 
Mar 28 05:36:57 mediaserver kernel: [117370.023051] sd 3:0:0:0: [sdd] tag#12 CDB: opcode=0x28 28 00 b4 19 cd d0 00 02 50 00
Mar 28 05:36:57 mediaserver kernel: [117370.043049] scsi host3: uas_eh_device_reset_handler start
Mar 28 05:36:57 mediaserver kernel: [117370.172229] usb 2-1: reset SuperSpeed Gen 1 USB device number 5 using xhci_hcd
Mar 28 05:36:57 mediaserver kernel: [117370.192386] usb 2-1: device firmware changed
Mar 28 05:36:57 mediaserver kernel: [117370.199471] scsi host3: uas_eh_device_reset_handler FAILED err -19
Mar 28 05:36:57 mediaserver kernel: [117370.199491] sd 3:0:0:0: Device offlined - not ready after error recovery
Mar 28 05:36:57 mediaserver kernel: [117370.199523] sd 3:0:0:0: [sdd] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06 cmd_age=30s
Mar 28 05:36:57 mediaserver kernel: [117370.199532] sd 3:0:0:0: [sdd] tag#12 CDB: opcode=0x28 28 00 b4 19 cd d0 00 02 50 00
Mar 28 05:36:57 mediaserver kernel: [117370.199538] print_req_error: 57 callbacks suppressed
Mar 28 05:36:57 mediaserver kernel: [117370.199712] usb 2-1: USB disconnect, device number 5
Mar 28 05:36:57 mediaserver kernel: [117370.201023] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 51015585 starting block 378373427)
Mar 28 05:36:57 mediaserver kernel: [117370.201032] buffer_io_error: 257 callbacks suppressed
Mar 28 05:36:57 mediaserver kernel: [117370.201049] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 51015585 starting block 378373481)
Mar 28 05:36:57 mediaserver kernel: [117370.201190] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 50987748 starting block 221300)
Mar 28 05:36:57 mediaserver kernel: [117370.201209] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 50995913 starting block 374333392)
Mar 28 05:36:57 mediaserver kernel: [117370.217817] buffer_io_error: 25 callbacks suppressed
Mar 28 05:36:58 mediaserver kernel: [117371.096007] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
Mar 28 05:36:59 mediaserver kernel: [117371.351107] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00
Mar 28 05:36:59 mediaserver kernel: [117371.579331] usb 2-1: new SuperSpeed Gen 1 USB device number 6 using xhci_hcd
Mar 28 05:36:59 mediaserver kernel: [117371.600633] usb 2-1: New USB device found, idVendor=152d, idProduct=0561, bcdDevice=81.36
Mar 28 05:36:59 mediaserver kernel: [117371.600645] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Mar 28 05:36:59 mediaserver kernel: [117371.600651] usb 2-1: Product: External Disk 3.0
Mar 28 05:36:59 mediaserver kernel: [117371.600656] usb 2-1: Manufacturer: JMicron
Mar 28 05:36:59 mediaserver kernel: [117371.600660] usb 2-1: SerialNumber: RANDOM__1ACDEBB6A084
Mar 28 05:36:59 mediaserver kernel: [117371.610446] scsi host4: uas
Mar 28 05:36:59 mediaserver kernel: [117371.613307] scsi 4:0:0:0: Direct-Access     JMicron  Tech             8136 PQ: 0 ANSI: 6
Mar 28 05:36:59 mediaserver kernel: [117371.615421] sd 4:0:0:0: Attached scsi generic sg1 type 0
Mar 28 05:36:59 mediaserver kernel: [117371.616574] sd 4:0:0:0: [sde] 3906863104 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 28 05:36:59 mediaserver kernel: [117371.616845] sd 4:0:0:0: [sde] Write Protect is off
Mar 28 05:36:59 mediaserver kernel: [117371.617426] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
Mar 28 05:36:59 mediaserver kernel: [117371.618255] sd 4:0:0:0: [sde] Optimal transfer size 33553920 bytes
Mar 28 05:36:59 mediaserver kernel: [117371.651469]  sde: sde1
Mar 28 05:36:59 mediaserver kernel: [117371.655225] sd 4:0:0:0: [sde] Attached SCSI disk
Mar 28 05:36:59 mediaserver mtp-probe: checking bus 2, device 6: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb2/2-1"
Mar 28 05:36:59 mediaserver mtp-probe: bus: 2, device: 6 was not an MTP device
Mar 28 05:36:59 mediaserver mtp-probe: checking bus 2, device 6: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb2/2-1"
Mar 28 05:36:59 mediaserver mtp-probe: bus: 2, device: 6 was not an MTP device
Mar 28 05:37:01 mediaserver udisksd[426]: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sde': Unexpected sense data returned:#0120000: 70 00 01 00  00 00 00 0a  00 00 00 00  00 1d 00 00    p...............#0120010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................#012 (g-io-error-quark, 0)
Mar 28 05:37:04 mediaserver kernel: [117376.804579] EXT4-fs error: 5 callbacks suppressed
Mar 28 05:37:10 mediaserver kernel: [117382.572194] EXT4-fs error: 7 callbacks suppressed
Mar 28 05:37:20 mediaserver kernel: [117392.972809] EXT4-fs error: 1 callbacks suppressed
Mar 28 05:38:19 mediaserver kernel: [117451.668046] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:40:19 mediaserver kernel: [117571.860951] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:41:19 mediaserver kernel: [117631.956933] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:42:19 mediaserver kernel: [117691.734965] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:43:19 mediaserver kernel: [117752.152666] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:44:19 mediaserver kernel: [117812.248268] EXT4-fs error: 3 callbacks suppressed
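
To see how often this happens without reading the whole log by hand, a small filter along these lines might help. It is a rough sketch rather than a proper diagnostic tool: the four patterns are simply taken from the messages in the excerpt above, and the function name and stage labels are my own.

```python
import re

# Kernel messages that mark each stage of a drop-out in the excerpt above.
PATTERNS = {
    "abort": re.compile(r"uas_eh_abort_handler"),
    "reset_failed": re.compile(r"uas_eh_device_reset_handler FAILED"),
    "disconnect": re.compile(r"USB disconnect"),
    "reconnect": re.compile(r"new SuperSpeed .* USB device number"),
}


def summarize(lines):
    """Count how many log lines match each drop-out stage."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It can be run over the full log with something like summarize(open("/var/log/messages")); comparing the disconnect count against the number of days of uptime gives a rough failure rate.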

I don’t know how to read this. Is it a SATA HAT failure, a disk failure, or a USB failure? What can I do about it?

Please let me know what other logs would be relevant and I’ll update!