I’ve been having persistent problems attempting to rebuild the mdraid set on this HAT. For a while I had poor performance, then updated the firmware before the warning message appeared, and recently had a disk failure. Sure, no problem, just replace the disk, right?
[ 485.246042] md: recovery of RAID array md127
[ 609.259857] sd 1:0:0:0: [sdc] tag#13 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
[ 609.259874] sd 1:0:0:0: [sdc] tag#13 CDB: Read(10) 28 00 01 21 5b 00 00 04 00 00
[ 609.260441] sd 1:0:0:1: [sdd] tag#5 uas_eh_abort_handler 0 uas-tag 12 inflight: CMD OUT
[ 609.260451] sd 1:0:0:1: [sdd] tag#5 CDB: Write(10) 2a 00 01 21 23 00 00 04 00 00
[ 609.260756] sd 1:0:0:0: [sdc] tag#11 uas_eh_abort_handler 0 uas-tag 8 inflight: CMD IN
[ 609.260766] sd 1:0:0:0: [sdc] tag#11 CDB: Read(10) 28 00 01 21 53 00 00 04 00 00
[ 609.261368] sd 1:0:0:1: [sdd] tag#4 uas_eh_abort_handler 0 uas-tag 11 inflight: CMD OUT
[ 609.261377] sd 1:0:0:1: [sdd] tag#4 CDB: Write(10) 2a 00 01 21 1f 00 00 04 00 00
[ 609.261682] sd 1:0:0:0: [sdc] tag#10 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
[ 609.261690] sd 1:0:0:0: [sdc] tag#10 CDB: Read(10) 28 00 01 21 57 00 00 04 00 00
[ 609.262250] sd 1:0:0:1: [sdd] tag#3 uas_eh_abort_handler 0 uas-tag 10 inflight: CMD OUT
[ 609.262259] sd 1:0:0:1: [sdd] tag#3 CDB: Write(10) 2a 00 01 21 1b 00 00 04 00 00
[ 609.262567] sd 1:0:0:0: [sdc] tag#7 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[ 609.262576] sd 1:0:0:0: [sdc] tag#7 CDB: Read(10) 28 00 01 21 63 00 00 04 00 00
[ 609.263237] sd 1:0:0:1: [sdd] tag#2 uas_eh_abort_handler 0 uas-tag 9 inflight: CMD OUT
[ 609.263246] sd 1:0:0:1: [sdd] tag#2 CDB: Write(10) 2a 00 01 21 17 00 00 04 00 00
[ 609.263575] sd 1:0:0:0: [sdc] tag#6 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
[ 609.263584] sd 1:0:0:0: [sdc] tag#6 CDB: Read(10) 28 00 01 21 4f 00 00 04 00 00
[ 609.264169] sd 1:0:0:0: [sdc] tag#1 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
[ 609.264186] sd 1:0:0:0: [sdc] tag#1 CDB: Read(10) 28 00 01 21 5f 00 00 04 00 00
[ 609.264783] sd 1:0:0:0: [sdc] tag#0 uas_eh_abort_handler 0 uas-tag 7 inflight: CMD IN
[ 609.264795] sd 1:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 01 21 4b 00 00 04 00 00
[ 639.980186] sd 1:0:0:0: [sdc] tag#12 uas_eh_abort_handler 0 uas-tag 14 inflight: CMD IN
[ 639.980204] sd 1:0:0:0: [sdc] tag#12 CDB: Read(10) 28 00 01 21 6f 00 00 04 00 00
[ 639.980726] sd 1:0:0:0: [sdc] tag#9 uas_eh_abort_handler 0 uas-tag 13 inflight: CMD IN
[ 639.980736] sd 1:0:0:0: [sdc] tag#9 CDB: Read(10) 28 00 01 21 6b 00 00 04 00 00
[ 639.981292] sd 1:0:0:0: [sdc] tag#8 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD IN
[ 639.981302] sd 1:0:0:0: [sdc] tag#8 CDB: Read(10) 28 00 01 21 67 00 00 04 00 00
[ 639.996191] scsi host1: uas_eh_device_reset_handler start
[ 640.125091] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 640.148124] scsi host1: uas_eh_device_reset_handler success
[ 641.089442] scsi host1: uas_eh_device_reset_handler start
[ 641.217054] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 641.240003] scsi host1: uas_eh_device_reset_handler success
[ 657.507907] md: md127: recovery interrupted.
[ 658.867623] md127: detected capacity change from 4000228311040 to 0
[ 658.867655] md: md127 stopped.
[ 887.999523] usb 2-2: USB disconnect, device number 2
[ 888.001702] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 888.239236] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 888.306000] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[ 888.543240] sd 0:0:0:1: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 888.743457] usb 2-1: USB disconnect, device number 3
[ 888.745564] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[ 888.983240] sd 1:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 889.049994] sd 1:0:0:1: [sdd] Synchronizing SCSI cache
[ 889.287236] sd 1:0:0:1: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 1088.328481] usb 2-1: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1088.349453] usb 2-1: New USB device found, idVendor=1058, idProduct=0a10, bcdDevice=81.36
[ 1088.349467] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=5
[ 1088.349477] usb 2-1: Product: JMS56x Series
[ 1088.349487] usb 2-1: Manufacturer: JMicron
[ 1088.349496] usb 2-1: SerialNumber: 1234567890123
[ 1088.357572] scsi host0: uas
[ 1088.358683] scsi 0:0:0:0: Direct-Access ST2000LM 015-2E8174 8136 PQ: 0 ANSI: 6
[ 1088.360025] scsi 0:0:0:1: Direct-Access ST2000LM 015-2E8174 8136 PQ: 0 ANSI: 6
[ 1088.361495] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1088.362114] sd 0:0:0:1: Attached scsi generic sg1 type 0
[ 1088.363301] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 1088.363533] sd 0:0:0:0: [sda] Write Protect is off
[ 1088.363541] sd 0:0:0:0: [sda] Mode Sense: 67 00 10 08
[ 1088.364022] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 1088.364776] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
[ 1088.366531] sd 0:0:0:1: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 1088.366787] sd 0:0:0:1: [sdb] Write Protect is off
[ 1088.366794] sd 0:0:0:1: [sdb] Mode Sense: 67 00 10 08
[ 1088.367252] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 1088.367959] sd 0:0:0:1: [sdb] Optimal transfer size 33553920 bytes
[ 1088.415724] sda: sda1
[ 1088.445528] sdb: sdb1
[ 1088.447727] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1088.514565] sd 0:0:0:1: [sdb] Attached SCSI disk
[ 1088.892535] usb 2-2: new SuperSpeed Gen 1 USB device number 5 using xhci_hcd
[ 1088.913667] usb 2-2: New USB device found, idVendor=1058, idProduct=0a10, bcdDevice=81.36
[ 1088.913681] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=5
[ 1088.913691] usb 2-2: Product: JMS56x Series
[ 1088.913700] usb 2-2: Manufacturer: JMicron
[ 1088.913709] usb 2-2: SerialNumber: 1234567890123
[ 1088.921976] scsi host1: uas
[ 1088.923116] scsi 1:0:0:0: Direct-Access WDC WD20 SPZX-22UA7T0 8136 PQ: 0 ANSI: 6
[ 1088.926079] scsi 1:0:0:1: Direct-Access WDC WD20 SPZX-22UA7T0 8136 PQ: 0 ANSI: 6
[ 1088.927348] sd 1:0:0:0: Attached scsi generic sg2 type 0
[ 1088.927989] scsi 1:0:0:1: Attached scsi generic sg3 type 0
[ 1088.928969] sd 1:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 1088.929258] sd 1:0:0:0: [sdc] Write Protect is off
[ 1088.929268] sd 1:0:0:0: [sdc] Mode Sense: 67 00 10 08
[ 1088.929839] sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 1088.930617] sd 1:0:0:0: [sdc] Optimal transfer size 33553920 bytes
[ 1088.930729] sd 1:0:0:1: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 1088.931006] sd 1:0:0:1: [sdd] Write Protect is off
[ 1088.931015] sd 1:0:0:1: [sdd] Mode Sense: 67 00 10 08
[ 1088.931434] sd 1:0:0:1: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 1088.932185] sd 1:0:0:1: [sdd] Optimal transfer size 33553920 bytes
[ 1089.050634] sdc: sdc1
[ 1089.155696] sd 1:0:0:0: [sdc] Attached SCSI disk
[ 1089.155727] sdd: sdd1
[ 1089.180400] sd 1:0:0:1: [sdd] Attached SCSI disk
[ 1089.479860] md/raid10:md127: active with 3 out of 4 devices
[ 1089.502718] md127: detected capacity change from 0 to 4000228311040
[ 1089.529323] md: recovery of RAID array md127
[ 1131.504441] sd 0:0:0:1: [sdb] tag#11 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD OUT
[ 1131.504461] sd 0:0:0:1: [sdb] tag#11 CDB: Write(10) 2a 00 01 81 ae 80 00 04 00 00
[ 1131.504760] sd 0:0:0:0: [sda] tag#9 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD IN
[ 1131.504772] sd 0:0:0:0: [sda] tag#9 CDB: Read(10) 28 00 01 82 a7 80 00 02 80 00
[ 1131.505266] sd 0:0:0:0: [sda] tag#8 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
[ 1131.505277] sd 0:0:0:0: [sda] tag#8 CDB: Read(10) 28 00 01 82 a6 00 00 01 80 00
[ 1131.505603] sd 0:0:0:0: [sda] tag#7 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
[ 1131.505613] sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 01 82 a2 00 00 04 00 00
[ 1131.506135] sd 0:0:0:0: [sda] tag#6 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
[ 1131.506145] sd 0:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 01 82 9e 00 00 04 00 00
[ 1131.520445] scsi host0: uas_eh_device_reset_handler start
[ 1131.649281] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1131.672379] scsi host0: uas_eh_device_reset_handler success
[ 1133.525099] scsi host0: uas_eh_device_reset_handler start
[ 1133.653288] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1133.676329] scsi host0: uas_eh_device_reset_handler success
[ 1213.425242] sd 0:0:0:0: [sda] tag#7 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD IN
[ 1213.425263] sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 02 23 13 00 00 04 00 00
[ 1213.425959] sd 0:0:0:0: [sda] tag#6 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
[ 1213.425972] sd 0:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 02 23 0f 00 00 04 00 00
[ 1213.426654] sd 0:0:0:0: [sda] tag#5 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
[ 1213.426665] sd 0:0:0:0: [sda] tag#5 CDB: Read(10) 28 00 02 23 17 00 00 04 00 00
[ 1213.427187] sd 0:0:0:0: [sda] tag#4 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[ 1213.427197] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 02 23 0b 00 00 04 00 00
[ 1217.521276] sd 0:0:0:1: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD OUT
[ 1217.521296] sd 0:0:0:1: [sdb] tag#0 CDB: Write(10) 2a 00 02 22 1b 80 00 04 00 00
[ 1217.541268] scsi host0: uas_eh_device_reset_handler start
[ 1217.670138] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1217.693099] scsi host0: uas_eh_device_reset_handler success
[ 1233.317693] usb 2-1: USB disconnect, device number 4
[ 1233.318325] sd 0:0:0:0: [sda] tag#7 uas_zap_pending 0 uas-tag 1 inflight: CMD
[ 1233.318341] sd 0:0:0:0: [sda] tag#7 CDB: Test Unit Ready 00 00 00 00 00 00
[ 1233.318395] scsi host0: uas_eh_device_reset_handler FAILED to get lock err -19
[ 1233.325812] sd 0:0:0:1: Device offlined - not ready after error recovery
[ 1233.325826] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1233.325838] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1233.325848] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1233.325858] sd 0:0:0:0: Device offlined - not ready after error recovery
[ 1233.325916] sd 0:0:0:1: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 1233.325936] sd 0:0:0:1: [sdb] tag#0 CDB: Write(10) 2a 00 02 22 1b 80 00 04 00 00
[ 1233.325956] blk_update_request: I/O error, dev sdb, sector 35789696 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.337614] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.337994] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1233.342979] blk_update_request: I/O error, dev sdb, sector 35790720 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.343090] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.359520] blk_update_request: I/O error, dev sdb, sector 35791744 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.361409] blk_update_request: I/O error, dev sda, sector 35851008 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
[ 1233.371190] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.386742] blk_update_request: I/O error, dev sdb, sector 35792768 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.389400] blk_update_request: I/O error, dev sda, sector 35854080 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
[ 1233.398153] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.408738] blk_update_request: I/O error, dev sda, sector 35852032 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
[ 1233.413971] blk_update_request: I/O error, dev sdb, sector 35793792 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.414890] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.424834] blk_update_request: I/O error, dev sda, sector 35853056 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
[ 1233.435891] blk_update_request: I/O error, dev sdb, sector 35794816 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
[ 1233.435983] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.468484] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.473859] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.479257] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.484650] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.490042] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.495465] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.500855] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.506232] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.511623] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.517019] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.522413] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.527821] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.533210] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.538599] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.544001] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.549409] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.554785] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.560170] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.565569] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.570960] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.576359] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.581740] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.587121] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.592511] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.597907] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.603311] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.608709] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.614088] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.619470] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.624859] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.630265] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.635665] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.641058] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.646451] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.651842] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.657232] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.662622] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.665462] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 1233.668006] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.673413] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.678793] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.684157] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.689548] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.694919] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.700327] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.705744] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.711145] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.725541] sd 0:0:0:1: rejecting I/O to offline device
[ 1233.731665] md: md127: recovery interrupted.
[ 1233.731783] md: super_written gets error=10
[ 1233.731953] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[ 1233.736154] md/raid10:md127: Disk failure on sdb1, disabling device.
md/raid10:md127: Operation continuing on 3 devices.
[ 1233.736203] md: super_written gets error=10
[ 1233.989499] sd 0:0:0:1: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 1234.404079] md: super_written gets error=10
[ 1234.656909] md: super_written gets error=10
[ 1234.822479] md: super_written gets error=10
[ 1234.869288] md: super_written gets error=10
[ 1234.916176] md: super_written gets error=10
[ 1234.963126] md: super_written gets error=10
[ 1235.010003] md: super_written gets error=10
[ 1235.056971] md: super_written gets error=10
[ 1235.103813] md: super_written gets error=10
[ 1235.150834] md: super_written gets error=10
[ 1235.197646] md: super_written gets error=10
[ 1235.244608] md: super_written gets error=10
[ 1235.291457] md: super_written gets error=10
[ 1235.338284] md: super_written gets error=10
[ 1235.385156] md: super_written gets error=10
[ 1235.432102] md: super_written gets error=10
[ 1235.478967] md: super_written gets error=10
[ 1235.525876] md: super_written gets error=10
[ 1235.572746] md: super_written gets error=10
[ 1235.619691] md: super_written gets error=10
[ 1235.666506] md: super_written gets error=10
[ 1235.713454] md: super_written gets error=10
[ 1235.782725] md: super_written gets error=10
[ 1235.829576] md: super_written gets error=10
[ 1235.876401] md: super_written gets error=10
Needless to say, it gets worse from there on out.
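As an aside for anyone reading the log: the CDB bytes on the Read(10)/Write(10) lines encode the starting LBA (bytes 2 to 5, big-endian) and the transfer length in blocks (bytes 7 to 8). Below is a minimal sketch of decoding them, assuming the 10-byte space-separated hex layout that dmesg prints above. `decode_cdb10` is a hypothetical helper name for illustration, not part of any existing tool.

```python
def decode_cdb10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI Read(10)/Write(10) CDB given as space-separated hex."""
    raw = bytes.fromhex(cdb_hex)  # ASCII whitespace between bytes is ignored
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "op": opcodes.get(raw[0], hex(raw[0])),
        "lba": int.from_bytes(raw[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(raw[7:9], "big"),  # transfer length in blocks
    }


# The write on sdb that was reported as FAILED in the log above
info = decode_cdb10("2a 00 02 22 1b 80 00 04 00 00")
print(info)
```

For that CDB the decoded LBA is 35789696, which matches the sector in the first `blk_update_request: I/O error, dev sdb` line, and the length is 1024 blocks (512 KiB at 512 bytes per block).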
It should only be rebuilding one of the sets… and no matter what I do, the rebuild fails. I’ve tried lowering the maximum recovery sync speed to 100000 from the default 200000 via sysctl, and I’ve tried new disks (I’ve been through about 6 now, 4 of which were bought brand new). It’s getting annoying that the JMicron controller appears to be rebooting!
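To put numbers on the "controller appears to be rebooting" part, here is a rough sketch that tallies the USB reset and disconnect lines per port. It assumes the dmesg line format shown above; `SAMPLE` is just the relevant lines copied from that log, and `tally_usb_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Reset/disconnect lines copied from the log excerpt above
SAMPLE = """\
[ 640.125091] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 641.217054] usb 2-1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 887.999523] usb 2-2: USB disconnect, device number 2
[ 888.743457] usb 2-1: USB disconnect, device number 3
[ 1131.649281] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1133.653288] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1217.670138] usb 2-1: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[ 1233.317693] usb 2-1: USB disconnect, device number 4
"""

# Captures the port (e.g. "2-1") and the event keyword
EVENT = re.compile(r"usb (\d+-[\d.]+): (reset|USB disconnect)")


def tally_usb_events(text: str) -> Counter:
    """Count reset and disconnect events per USB port in dmesg text."""
    counts = Counter()
    for line in text.splitlines():
        m = EVENT.search(line)
        if m:
            kind = "reset" if m.group(2) == "reset" else "disconnect"
            counts[(m.group(1), kind)] += 1
    return counts


counts = tally_usb_events(SAMPLE)
for (port, kind), n in sorted(counts.items()):
    print(port, kind, n)
```

On this excerpt every reset is on usb 2-1, both before and after the re-plug, even though the disks behind that port show up as sdc/sdd the first time and sda/sdb the second. To run it against a full log, pass the text of a saved `dmesg` dump instead of `SAMPLE`.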
I’m on firmware 8.1.3.6.
➜ jms561-fw-update sudo ./JMS561FwUpdate -d /dev/sda -v
Bridge Firmware Version: v8.1.3.6
➜ jms561-fw-update sudo ./JMS561FwUpdate -d /dev/sdc -v
Bridge Firmware Version: v8.1.3.6
I’m at a loss for what to do to try and recover this situation.