Hi all,
I have a Raspberry Pi 4B with four 1TB SSDs plugged into the SATA HAT and arranged as two hardware RAID 0 arrays (following "Setting up HARDWARE RAID with QUAD SATA-HAT (jms561 controller)"). The HAT is powered directly from the DC adapter.
This is what it looks like.
$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0  1.8T  0 disk
└─sda1        8:1    0  1.8T  0 part /media/raida
sdb           8:64   0  1.8T  0 disk
└─sdb1        8:65   0  1.8T  0 part /media/raidb
mmcblk0     179:0    0 14.8G  0 disk
├─mmcblk0p1 179:1    0  256M  0 part /boot
└─mmcblk0p2 179:2    0 14.6G  0 part /
However, every day or so, the second disk fails and immediately reconnects as a new device (first /dev/sdc, then /dev/sdd, and currently /dev/sde). The UUID is listed in fstab, so it mounts back to /media/raidb when I run sudo mount -a.
Note that the disk in question is under constant heavy use (60-200 MB/s read and 1-10 MB/s write). The other disk (/dev/sda) is under moderate use and experiences no problems whatsoever.
I have read the seemingly related post "[solved] SATA HAT disconnects disk on heavy use", but I don’t get the USB errors they were getting, so I am not sure the problem is the same.
It happened again overnight (/dev/sdd dying and coming back as /dev/sde) and I tracked the event in /var/log/messages:
Mar 28 05:36:57 mediaserver kernel: [117370.023034] sd 3:0:0:0: [sdd] tag#12 uas_eh_abort_handler 0 uas-tag 9 inflight: IN
Mar 28 05:36:57 mediaserver kernel: [117370.023051] sd 3:0:0:0: [sdd] tag#12 CDB: opcode=0x28 28 00 b4 19 cd d0 00 02 50 00
Mar 28 05:36:57 mediaserver kernel: [117370.043049] scsi host3: uas_eh_device_reset_handler start
Mar 28 05:36:57 mediaserver kernel: [117370.172229] usb 2-1: reset SuperSpeed Gen 1 USB device number 5 using xhci_hcd
Mar 28 05:36:57 mediaserver kernel: [117370.192386] usb 2-1: device firmware changed
Mar 28 05:36:57 mediaserver kernel: [117370.199471] scsi host3: uas_eh_device_reset_handler FAILED err -19
Mar 28 05:36:57 mediaserver kernel: [117370.199491] sd 3:0:0:0: Device offlined - not ready after error recovery
Mar 28 05:36:57 mediaserver kernel: [117370.199523] sd 3:0:0:0: [sdd] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06 cmd_age=30s
Mar 28 05:36:57 mediaserver kernel: [117370.199532] sd 3:0:0:0: [sdd] tag#12 CDB: opcode=0x28 28 00 b4 19 cd d0 00 02 50 00
Mar 28 05:36:57 mediaserver kernel: [117370.199538] print_req_error: 57 callbacks suppressed
Mar 28 05:36:57 mediaserver kernel: [117370.199712] usb 2-1: USB disconnect, device number 5
Mar 28 05:36:57 mediaserver kernel: [117370.201023] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 51015585 starting block 378373427)
Mar 28 05:36:57 mediaserver kernel: [117370.201032] buffer_io_error: 257 callbacks suppressed
Mar 28 05:36:57 mediaserver kernel: [117370.201049] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 51015585 starting block 378373481)
Mar 28 05:36:57 mediaserver kernel: [117370.201190] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 50987748 starting block 221300)
Mar 28 05:36:57 mediaserver kernel: [117370.201209] EXT4-fs warning (device sdd1): ext4_end_bio:349: I/O error 10 writing to inode 50995913 starting block 374333392)
Mar 28 05:36:57 mediaserver kernel: [117370.217817] buffer_io_error: 25 callbacks suppressed
Mar 28 05:36:58 mediaserver kernel: [117371.096007] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
Mar 28 05:36:59 mediaserver kernel: [117371.351107] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00
Mar 28 05:36:59 mediaserver kernel: [117371.579331] usb 2-1: new SuperSpeed Gen 1 USB device number 6 using xhci_hcd
Mar 28 05:36:59 mediaserver kernel: [117371.600633] usb 2-1: New USB device found, idVendor=152d, idProduct=0561, bcdDevice=81.36
Mar 28 05:36:59 mediaserver kernel: [117371.600645] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Mar 28 05:36:59 mediaserver kernel: [117371.600651] usb 2-1: Product: External Disk 3.0
Mar 28 05:36:59 mediaserver kernel: [117371.600656] usb 2-1: Manufacturer: JMicron
Mar 28 05:36:59 mediaserver kernel: [117371.600660] usb 2-1: SerialNumber: RANDOM__1ACDEBB6A084
Mar 28 05:36:59 mediaserver kernel: [117371.610446] scsi host4: uas
Mar 28 05:36:59 mediaserver kernel: [117371.613307] scsi 4:0:0:0: Direct-Access JMicron Tech 8136 PQ: 0 ANSI: 6
Mar 28 05:36:59 mediaserver kernel: [117371.615421] sd 4:0:0:0: Attached scsi generic sg1 type 0
Mar 28 05:36:59 mediaserver kernel: [117371.616574] sd 4:0:0:0: [sde] 3906863104 512-byte logical blocks: (2.00 TB/1.82 TiB)
Mar 28 05:36:59 mediaserver kernel: [117371.616845] sd 4:0:0:0: [sde] Write Protect is off
Mar 28 05:36:59 mediaserver kernel: [117371.617426] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
Mar 28 05:36:59 mediaserver kernel: [117371.618255] sd 4:0:0:0: [sde] Optimal transfer size 33553920 bytes
Mar 28 05:36:59 mediaserver kernel: [117371.651469] sde: sde1
Mar 28 05:36:59 mediaserver kernel: [117371.655225] sd 4:0:0:0: [sde] Attached SCSI disk
Mar 28 05:36:59 mediaserver mtp-probe: checking bus 2, device 6: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb2/2-1"
Mar 28 05:36:59 mediaserver mtp-probe: bus: 2, device: 6 was not an MTP device
Mar 28 05:36:59 mediaserver mtp-probe: checking bus 2, device 6: "/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb2/2-1"
Mar 28 05:36:59 mediaserver mtp-probe: bus: 2, device: 6 was not an MTP device
Mar 28 05:37:01 mediaserver udisksd[426]: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sde': Unexpected sense data returned:#0120000: 70 00 01 00 00 00 00 0a 00 00 00 00 00 1d 00 00 p...............#0120010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................#012 (g-io-error-quark, 0)
Mar 28 05:37:04 mediaserver kernel: [117376.804579] EXT4-fs error: 5 callbacks suppressed
Mar 28 05:37:10 mediaserver kernel: [117382.572194] EXT4-fs error: 7 callbacks suppressed
Mar 28 05:37:20 mediaserver kernel: [117392.972809] EXT4-fs error: 1 callbacks suppressed
Mar 28 05:38:19 mediaserver kernel: [117451.668046] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:40:19 mediaserver kernel: [117571.860951] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:41:19 mediaserver kernel: [117631.956933] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:42:19 mediaserver kernel: [117691.734965] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:43:19 mediaserver kernel: [117752.152666] EXT4-fs error: 2 callbacks suppressed
Mar 28 05:44:19 mediaserver kernel: [117812.248268] EXT4-fs error: 3 callbacks suppressed
I don’t know how to read this. Is it a SATA HAT failure, a disk failure, or a USB failure? What can I do about it?
Please let me know what other logs would be relevant and I’ll update!
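In case it helps with comparing future occurrences, here is a rough sketch of how the USB/UAS events could be pulled out of a log like the one above. It is only an illustration: the function name usb_events and the pattern list are my own guesses at which kernel messages matter, not an exhaustive or authoritative set, and it assumes the syslog line layout shown in the excerpt.

```python
import re

# Kernel messages assumed to be relevant (hypothetical selection, based only
# on the lines that appear in the excerpt above).
PATTERNS = [
    r"uas_eh_\w+",                                  # UAS error-handler activity
    r"usb \d+-\d+: reset ",                         # USB device reset
    r"usb \d+-\d+: USB disconnect",                 # device dropped off the bus
    r"usb \d+-\d+: new .* USB device number",       # device re-enumerated
    r"Device offlined",                             # SCSI layer gave up
]
EVENT_RE = re.compile("|".join(PATTERNS))

# Assumed layout: "<Mon> <day> <time> <host> kernel: [<uptime>] <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ [\d:]+) \S+ kernel: \[\s*([\d.]+)\] (.*)$")


def usb_events(log_text):
    """Return (timestamp, uptime_seconds, message) for matching kernel lines."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        # Skip non-kernel lines (e.g. mtp-probe, udisksd) and unrelated messages.
        if m and EVENT_RE.search(m.group(3)):
            events.append((m.group(1), float(m.group(2)), m.group(3)))
    return events
```

Passing the contents of /var/log/messages to usb_events should then give a short list of resets, disconnects and re-enumerations that is easier to line up against the times the disk dropped.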