Quad SATA HAT failure: sudden disk disconnection

Hi everyone,

My RPi 4 and its Quad SATA HAT worked perfectly for one year.
The problem started suddenly 2 weeks ago, while I was using rsync to back up files from the NAS to a remote share.
The RAID 5 array does not last: I regularly lose one drive (most of the time sdd). I replaced that disk after a red light appeared on the HAT (the first on the right), and then a red light appeared for the left disk.
Nothing changed… After reading about similar issues, I updated the JMS drivers and installed a fresh, clean Raspberry Pi OS (with OMV5).
Everything worked fine at first… I loaded my data… and boom, it started again.
(I tried 2 USB 3 cables instead of the bridge, but then the HAT is not detected.)

I unmounted the RAID 5 array and force-assembled it; it works for maybe one hour and then boom…

Does anyone have any idea what’s going on? Is the HAT broken? (That would be sad, because it’s sold out…)

Thank you folks

Here is the dmesg log showing the moment the problem suddenly appears.

[ 218.783329] md/raid:md127: device sda operational as raid disk 0

[ 218.783339] md/raid:md127: device sdd operational as raid disk 3

[ 218.783345] md/raid:md127: device sdc operational as raid disk 2

[ 218.783352] md/raid:md127: device sdb operational as raid disk 1

[ 218.785827] md/raid:md127: raid level 5 active with 4 out of 4 devices, algorithm 2

[ 218.786160] md127 : bitmap file is out of date (24618 < 24758) -- forcing full recovery

[ 218.786231] md127 : bitmap file is out of date, doing full recovery

[ 219.288414] md127: detected capacity change from 0 to 3000208195584

[ 351.129780] EXT4-fs (md127): recovery complete

[ 351.130227] EXT4-fs (md127): mounted filesystem with ordered data mode. Opts: user_xattr,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0,acl

[ 663.898256] sd 0:0:0:0: [sda] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=19s

[ 663.898298] sd 0:0:0:0: [sda] tag#9 Sense Key : 0xb [current]

[ 663.898321] sd 0:0:0:0: [sda] tag#9 ASC=0x0 ASCQ=0x0

[ 663.898347] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x28 28 00 4b 04 31 00 00 03 00 00

[ 663.898374] blk_update_request: I/O error, dev sda, sector 1258565888 op 0x0:(READ) flags 0x0 phys_seg 96 prio class 0

[ 716.658046] sd 0:0:0:0: [sda] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=52s

[ 716.658096] sd 0:0:0:0: [sda] tag#10 Sense Key : 0xb [current]

[ 716.658120] sd 0:0:0:0: [sda] tag#10 ASC=0x0 ASCQ=0x0

[ 716.658145] sd 0:0:0:0: [sda] tag#10 CDB: opcode=0x28 28 00 4b 04 31 08 00 00 08 00

[ 716.658173] blk_update_request: I/O error, dev sda, sector 1258565896 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 0

[ 719.958357] sd 0:0:0:0: [sda] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=56s

[ 719.958403] sd 0:0:0:0: [sda] tag#8 Sense Key : 0xb [current]

[ 719.958427] sd 0:0:0:0: [sda] tag#8 ASC=0x0 ASCQ=0x0

[ 719.958452] sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x28 28 00 4b 04 31 60 00 00 08 00

[ 719.958479] blk_update_request: I/O error, dev sda, sector 1258565984 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 0

[ 723.256643] sd 0:0:0:0: [sda] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=59s

[ 723.256687] sd 0:0:0:0: [sda] tag#6 Sense Key : 0xb [current]

[ 723.256711] sd 0:0:0:0: [sda] tag#6 ASC=0x0 ASCQ=0x0

[ 723.256736] sd 0:0:0:0: [sda] tag#6 CDB: opcode=0x28 28 00 4b 04 31 68 00 00 08 00

[ 723.256764] blk_update_request: I/O error, dev sda, sector 1258565992 op 0x0:(READ) flags 0x4000 phys_seg 1 prio class 0

[ 727.602626] md/raid:md127: read error corrected (8 sectors at 1258565896 on sda)

[ 727.602789] md/raid:md127: read error corrected (8 sectors at 1258565984 on sda)

[ 727.602813] md/raid:md127: read error corrected (8 sectors at 1258565992 on sda)

[ 1505.160392] sd 1:0:0:0: [sdc] tag#6 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD OUT

[ 1505.160411] sd 1:0:0:0: [sdc] tag#6 CDB: opcode=0x2a 2a 08 00 00 00 10 00 00 05 00

[ 1505.160699] sd 1:0:0:1: [sdd] tag#5 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD OUT

[ 1505.160713] sd 1:0:0:1: [sdd] tag#5 CDB: opcode=0x2a 2a 00 4b ee f1 00 00 03 00 00

[ 1505.161086] sd 1:0:0:1: [sdd] tag#4 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT

[ 1505.161100] sd 1:0:0:1: [sdd] tag#4 CDB: opcode=0x2a 2a 08 00 00 00 10 00 00 05 00

[ 1505.220455] scsi host1: uas_eh_device_reset_handler start

[ 1505.371451] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd

[ 1505.405076] scsi host1: uas_eh_device_reset_handler success

[ 1506.651278] scsi host1: uas_eh_device_reset_handler start

[ 1506.801447] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd

[ 1506.835099] scsi host1: uas_eh_device_reset_handler success

[ 1514.881561] sd 1:0:0:1: [sdd] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=54s

[ 1514.881597] sd 1:0:0:1: [sdd] tag#2 Sense Key : 0x5 [current]

[ 1514.881626] sd 1:0:0:1: [sdd] tag#2 ASC=0x21 ASCQ=0x0

[ 1514.881656] sd 1:0:0:1: [sdd] tag#2 CDB: opcode=0x2a 2a 00 4b ee f1 00 00 03 00 00

[ 1514.881694] blk_update_request: critical target error, dev sdd, sector 1273950464 op 0x1:(WRITE) flags 0x800 phys_seg 96 prio class 0

[ 1526.280734] sd 0:0:0:0: [sda] tag#10 uas_eh_abort_handler 0 uas-tag 9 inflight: CMD

[ 1526.280753] sd 0:0:0:0: [sda] tag#10 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1526.280780] sd 0:0:0:1: [sdb] tag#11 uas_eh_abort_handler 0 uas-tag 10 inflight: CMD

[ 1526.280794] sd 0:0:0:1: [sdb] tag#11 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1526.320779] scsi host0: uas_eh_device_reset_handler start

[ 1526.471806] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

[ 1526.505606] scsi host0: uas_eh_device_reset_handler success

[ 1531.296799] scsi host0: uas_eh_device_reset_handler start

[ 1531.441870] usb 2-2: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

[ 1531.475717] scsi host0: uas_eh_device_reset_handler success

[ 1562.121323] sd 0:0:0:0: [sda] tag#8 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN

[ 1562.121342] sd 0:0:0:0: [sda] tag#8 CDB: opcode=0x28 28 00 3a 30 0b d8 00 00 08 00

[ 1562.121574] xhci_hcd 0000:01:00.0 : WARNING: Host System Error

[ 1567.141403] xhci_hcd 0000:01:00.0 : xHCI host not responding to stop endpoint command.

[ 1567.141416] xhci_hcd 0000:01:00.0 : USBSTS: HCHalted HSE EINT

[ 1567.141459] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead

[ 1567.141507] xhci_hcd 0000:01:00.0: HC died; cleaning up

[ 1567.141579] usb 1-1: USB disconnect, device number 2

[ 1567.145958] sd 0:0:0:0: [sda] tag#7 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD

[ 1567.145987] sd 0:0:0:0: [sda] tag#7 CDB: opcode=0x2a 2a 00 4b ee f0 00 00 01 00 00

[ 1567.146771] usb 2-1: USB disconnect, device number 3

[ 1567.147266] sd 1:0:0:1: [sdd] tag#3 uas_zap_pending 0 uas-tag 1 inflight: CMD

[ 1567.147293] sd 1:0:0:1: [sdd] tag#3 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1567.147319] sd 1:0:0:0: [sdc] tag#2 uas_zap_pending 0 uas-tag 2 inflight: CMD

[ 1567.147333] sd 1:0:0:0: [sdc] tag#2 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1567.147602] sd 1:0:0:1: [sdd] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=52s

[ 1567.147621] sd 1:0:0:1: [sdd] tag#3 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1567.147651] blk_update_request: I/O error, dev sdd, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1567.147663] md: super_written gets error=-5

[ 1567.147677] md/raid:md127 : Disk failure on sdd, disabling device.

md/raid:md127: Operation continuing on 3 devices.

[ 1567.147741] sd 1:0:0:0: [sdc] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=52s

[ 1567.147760] sd 1:0:0:0: [sdc] tag#2 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1567.147783] blk_update_request: I/O error, dev sdc, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1567.147793] md: super_written gets error=-5

[ 1567.151149] sd 1:0:0:0: [sdc] Synchronizing SCSI cache

[ 1567.741460] sd 1:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00

[ 1567.823603] sd 1:0:0:1: [sdd] Synchronizing SCSI cache

[ 1568.421466] sd 1:0:0:1: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00

[ 1568.462580] xhci_hcd 0000:01:00.0 : WARN Can't disable streams for endpoint 0x82, streams are being disabled already

[ 1568.464759] usb 2-2: USB disconnect, device number 2

[ 1568.465252] sd 0:0:0:0: [sda] tag#9 uas_zap_pending 0 uas-tag 4 inflight: CMD

[ 1568.465268] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.465296] sd 0:0:0:1: [sdb] tag#10 uas_zap_pending 0 uas-tag 5 inflight: CMD

[ 1568.465309] sd 0:0:0:1: [sdb] tag#10 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.465361] sd 0:0:0:0: [sda] tag#9 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=102s

[ 1568.465378] sd 0:0:0:0: [sda] tag#9 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.465409] blk_update_request: I/O error, dev sda, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1568.465420] md: super_written gets error=-5

[ 1568.465483] sd 0:0:0:1: [sdb] tag#10 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=102s

[ 1568.465501] sd 0:0:0:1: [sdb] tag#10 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.465524] blk_update_request: I/O error, dev sdb, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1568.465534] md: super_written gets error=-5

[ 1568.465651] sd 0:0:0:0: Device offlined - not ready after error recovery

[ 1568.465669] sd 0:0:0:0: Device offlined - not ready after error recovery

[ 1568.467526] sd 0:0:0:0: [sda] Synchronizing SCSI cache

[ 1568.481492] blk_update_request: I/O error, dev sda, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1568.481507] md: super_written gets error=-5

[ 1568.481551] blk_update_request: I/O error, dev sda, sector 1273949440 op 0x1:(WRITE) flags 0x0 phys_seg 96 prio class 0

[ 1568.481707] blk_update_request: I/O error, dev sda, sector 976227288 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

[ 1568.481727] md/raid:md127 : read error not correctable (sector 976227288 on sda).

[ 1568.481750] blk_update_request: I/O error, dev sda, sector 1273950208 op 0x1:(WRITE) flags 0x800 phys_seg 32 prio class 0

[ 1568.671477] sd 0:0:0:1: [sdb] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s

[ 1568.671500] sd 0:0:0:1: [sdb] tag#2 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.671530] blk_update_request: I/O error, dev sdb, sector 16 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1568.671542] md: super_written gets error=-5

[ 1568.671630] md: super_written gets error=-5

[ 1568.671694] blk_update_request: I/O error, dev sda, sector 8 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0

[ 1568.671704] md: super_written gets error=-5

[ 1568.871467] sd 0:0:0:1: [sdb] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s

[ 1568.871490] sd 0:0:0:1: [sdb] tag#2 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1568.871517] md: super_written gets error=-5

[ 1568.871645] md: super_written gets error=-5

[ 1568.871751] md: super_written gets error=-5

[ 1569.071520] sd 0:0:0:1: [sdb] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s

[ 1569.071544] sd 0:0:0:1: [sdb] tag#3 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1569.071569] md: super_written gets error=-5

[ 1569.071626] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x07 driverbyte=0x00

[ 1569.071796] md: super_written gets error=-5

[ 1569.071833] md: super_written gets error=-5

[ 1569.271471] sd 0:0:0:1: [sdb] tag#3 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00 cmd_age=0s

[ 1569.271494] sd 0:0:0:1: [sdb] tag#3 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

[ 1569.271522] md: super_written gets error=-5

[ 1569.273709] sd 0:0:0:1: [sdb] Synchronizing SCSI cache

[ 1569.321706] md: super_written gets error=-5

[ 1569.321740] md: super_written gets error=-5

[ 1569.321813] md: super_written gets error=-5

[ 1569.322453] md: super_written gets error=-5
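
For anyone reading the log above: the CDB bytes on each error line can be decoded by hand to check which SCSI command failed and at which sector. Below is a minimal Python sketch, assuming the usual 10-byte CDB layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte transfer length, control). The function name decode_cdb10 is made up for illustration.

```python
def decode_cdb10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI CDB given as space-separated hex bytes."""
    raw = bytes.fromhex(cdb_hex)
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Only the three opcodes that appear in this log are named here.
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x35: "SYNCHRONIZE CACHE(10)"}
    return {
        "command": names.get(raw[0], hex(raw[0])),
        "lba": int.from_bytes(raw[2:6], "big"),     # bytes 2..5: logical block address
        "blocks": int.from_bytes(raw[7:9], "big"),  # bytes 7..8: transfer length
    }

# CDBs copied from the sda read error and the sdd write error in the log.
print(decode_cdb10("28 00 4b 04 31 00 00 03 00 00"))
print(decode_cdb10("2a 00 4b ee f1 00 00 03 00 00"))
```

The two CDBs decode to a READ(10) at LBA 1258565888 and a WRITE(10) at LBA 1273950464, the same sectors that blk_update_request reports for sda and sdd. The many opcode=0x35 lines are SYNCHRONIZE CACHE(10) commands, i.e. cache flushes rather than data transfers.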

I replaced the RPi 4B board… nothing changed.

I am going to stop using RAID 5 with the Radxa board and try two RAID 1 arrays instead…

I’m using a Rock Pi with Debian Buster and OMV5, with four Seagate 1 TB 2.5-inch hard drives in RAID 5. So far no problems, but nobody at home is a heavy NAS user.

Mauricio

Here are the hdparm results:

pi@nas:~ $ sudo hdparm -Tt /dev/sda /dev/sda
/dev/sda:
Timing cached reads: 1562 MB in 2.00 seconds = 781.32 MB/sec
Timing buffered disk reads: 252 MB in 3.01 seconds = 83.76 MB/sec
/dev/sda:
Timing cached reads: 1484 MB in 2.00 seconds = 742.32 MB/sec
Timing buffered disk reads: 252 MB in 3.51 seconds = 71.87 MB/sec

pi@nas:~ $ sudo hdparm -Tt /dev/sdb /dev/sdb
/dev/sdb:
Timing cached reads: 1478 MB in 2.00 seconds = 739.55 MB/sec
Timing buffered disk reads: 272 MB in 3.41 seconds = 79.79 MB/sec
/dev/sdb:
Timing cached reads: 1488 MB in 2.00 seconds = 743.97 MB/sec
Timing buffered disk reads: 252 MB in 3.36 seconds = 74.98 MB/sec

pi@nas:~ $ sudo hdparm -Tt /dev/sdc /dev/sdc
/dev/sdc:
Timing cached reads: 1482 MB in 2.00 seconds = 740.93 MB/sec
Timing buffered disk reads: 272 MB in 3.13 seconds = 86.82 MB/sec
/dev/sdc:
Timing cached reads: 1500 MB in 2.00 seconds = 750.39 MB/sec
Timing buffered disk reads: 252 MB in 3.42 seconds = 73.72 MB/sec

pi@nas:~ $ sudo hdparm -Tt /dev/sdd /dev/sdd
/dev/sdd:
Timing cached reads: 1446 MB in 2.00 seconds = 722.77 MB/sec
Timing buffered disk reads: 252 MB in 3.26 seconds = 77.19 MB/sec
/dev/sdd:
Timing cached reads: 1470 MB in 2.00 seconds = 734.73 MB/sec
Timing buffered disk reads: 252 MB in 3.27 seconds = 76.98 MB/sec

It’s a little slow, but nothing serious came up.
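
To compare the four disks at a glance, the buffered-read figures can be averaged per device. This is a small Python sketch that parses hdparm -Tt text pasted into a string. The helper name buffered_read_averages is illustrative, and the sample is the output above with the cached-read lines left out for brevity.

```python
import re
from collections import defaultdict
from statistics import mean

def buffered_read_averages(text: str) -> dict:
    """Return the mean 'buffered disk reads' rate (MB/sec) per device."""
    rates = defaultdict(list)
    device = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(/dev/\w+):", line)
        if header:
            device = header.group(1)  # remember which disk the next lines belong to
            continue
        m = re.search(r"buffered disk reads:.*=\s*([\d.]+)\s*MB/sec", line)
        if m and device:
            rates[device].append(float(m.group(1)))
    return {dev: round(mean(vals), 1) for dev, vals in rates.items()}

SAMPLE = """\
/dev/sda:
Timing buffered disk reads: 252 MB in 3.01 seconds = 83.76 MB/sec
/dev/sda:
Timing buffered disk reads: 252 MB in 3.51 seconds = 71.87 MB/sec
/dev/sdb:
Timing buffered disk reads: 272 MB in 3.41 seconds = 79.79 MB/sec
/dev/sdb:
Timing buffered disk reads: 252 MB in 3.36 seconds = 74.98 MB/sec
/dev/sdc:
Timing buffered disk reads: 272 MB in 3.13 seconds = 86.82 MB/sec
/dev/sdc:
Timing buffered disk reads: 252 MB in 3.42 seconds = 73.72 MB/sec
/dev/sdd:
Timing buffered disk reads: 252 MB in 3.26 seconds = 77.19 MB/sec
/dev/sdd:
Timing buffered disk reads: 252 MB in 3.27 seconds = 76.98 MB/sec
"""

for dev, rate in buffered_read_averages(SAMPLE).items():
    print(dev, rate, "MB/sec")
```

For these figures, all four disks average roughly 77 to 80 MB/sec, so no single drive stands out as slower in this short sequential read test. The test says nothing about behaviour under sustained RAID load.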

I tried RAID 5 one more time:

[ 2928.428552] md: super_written gets error=-5
[ 2928.462772] md: super_written gets error=-5
[ 2928.462834] md: super_written gets error=-5
[ 2928.546634] md/raid:md0 : read error not correctable (sector 160829240 on sdb).
[ 2928.546696] md/raid:md0 : read error not correctable (sector 160829240 on sda).
[ 2928.546920] md: super_written gets error=-5
[ 2928.546979] md: super_written gets error=-5
[ 2928.589803] md: super_written gets error=-5
[ 2928.589865] md: super_written gets error=-5
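
One rough way to tell a failing disk from a failing USB path is to look at which layer complains first. The sketch below is a hypothetical Python triage helper: the name triage, the RULES table and the layer labels are made up for illustration, and it is not an official tool. It only does substring matching on dmesg text, using message fragments that appear in the first log in this thread.

```python
import re

# Message fragments from the first log in this thread, mapped to an
# illustrative label for the layer that emits them.
RULES = [
    ("HC died", "usb-controller"),
    ("xHCI host controller not responding", "usb-controller"),
    ("uas_eh_device_reset_handler start", "usb-bridge-reset"),
    ("USB disconnect", "usb-disconnect"),
    ("Disk failure on", "md-disk-failed"),
]

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

def triage(dmesg_text: str) -> list:
    """Return (seconds, layer) tuples for lines matching RULES, in log order."""
    events = []
    for line in dmesg_text.splitlines():
        stamp = TIMESTAMP.match(line.strip())
        if not stamp:
            continue  # skip blank lines and continuation lines without a timestamp
        for needle, layer in RULES:
            if needle in line:
                events.append((float(stamp.group(1)), layer))
                break
    return events

SAMPLE = """\
[ 1505.220455] scsi host1: uas_eh_device_reset_handler start
[ 1526.320779] scsi host0: uas_eh_device_reset_handler start
[ 1567.141507] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 1567.146771] usb 2-1: USB disconnect, device number 3
[ 1567.147677] md/raid:md127 : Disk failure on sdd, disabling device.
[ 1568.464759] usb 2-2: USB disconnect, device number 2
"""

for seconds, layer in triage(SAMPLE):
    print(f"{seconds:10.3f}  {layer}")
```

In this sample the bridge resets on both SCSI hosts and the "HC died" message come before md marks sdd as failed. That ordering would be consistent with the disks dropping out because the USB side went away, rather than one drive failing on its own. It is only a reading of the event order, not proof of which component is at fault.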

I have lost all hope!
The HDDs are brand new :(

I think it’s the Quad SATA HAT.