Rock 5 B SSD Corruption

Hi… Please Help…

HARDWARE
16GB Rock 5 B
WD_Black 2TB SN850X NVME M.2 SSD

PSU
When I first got the Rock, I was unable to reliably boot with the OKdo 36W PSU, so I bought a “dumb” 3V to 24V 5A adjustable PSU and currently have it set to ~15V

PROBLEM

  1. When the SSD is installed into the Rock 5 B board and I am running the Nethermind Ethereum software, the database on the SSD consistently becomes corrupt, and I believe it’s related to the Rock’s ability to correctly write to the SSD
  2. Ideally I would run the SSD on board but out of desperation I bought an NVME M.2 SSD USB enclosure to see if it would work that way, but I can’t even mount it and I get an error in the dmesg logs. It works fine when connected to a Raspberry Pi 4

OTHER DETAILS
I have reason to believe that the SSD is fine as I have two of the same model and I have the same issues with both of them

I also have reason to believe that the Rock 5 B board itself is not defective as I have returned and replaced it once

Updating the firmware on the SSD did not help

This is extremely frustrating… Any ideas?

Thank you

md5sum /dev/mtdblock0

$ sudo md5sum /dev/mtdblock0
cf53d06b3bfaaf51bbb6f25896da4b3a /dev/mtdblock0

PROBLEM 1 - Nethermind logs snippet

16 Jul 07:38:28 | Corrupted DB detected on path /ethclient/nethermind/nethermind_db/holesky/state/0. Please restart Nethermind to attempt repair.
16 Jul 07:38:28 | Error when handling response RocksDbSharp.RocksDbException: Corruption: block checksum mismatch: stored = 1863351859, computed = 1623628796, type = 4 in /ethclient/nethermind/nethermind_db/holesky/state/0/000174.sst offset 486728 size 32295
at Nethermind.Db.Rocks.DbOnTheRocks.GetWithColumnFamily(ReadOnlySpan1 key, ColumnFamilyHandle cf, IteratorManager iteratorManager, ReadFlags flags) in /src/Nethermind/Nethermind.Db.Rocks/DbOnTheRocks.cs:line 778
at Nethermind.Trie.NodeStorage.Get(Hash256 address, TreePath& path, ValueHash256& keccak, ReadFlags readFlags) in /src/Nethermind/Nethermind.Trie/NodeStorage.cs:line 115
at Nethermind.Trie.Pruning.TrieStore.IsPersisted(Hash256 address, TreePath& path, ValueHash256& keccak) in /src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs:line 601
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.IsChildPersisted(TrieNode node, TreePath& nodePath, Int32 childIndex, IScopedTrieStore store) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 317
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.StitchBoundaries(List1 sortedBoundaryList, IScopedTrieStore store) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 302
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.AddAccountRange(StateTree tree, Int64 blockNumber, ValueHash256& expectedRootHash, ValueHash256& startingHash, ValueHash256& limitHash, IReadOnlyList1 accounts, IReadOnlyList1 proofs) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 77
at Nethermind.Synchronization.SnapSync.SnapProvider.AddAccountRange(Int64 blockNumber, ValueHash256& expectedRootHash, ValueHash256& startingHash, IReadOnlyList1 accounts, IReadOnlyList1 proofs, Nullable1& hashLimit) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProvider.cs:line 99
at Nethermind.Synchronization.SnapSync.SnapProvider.AddAccountRange(AccountRange request, AccountsAndProofs response) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProvider.cs:line 62
at Nethermind.Synchronization.SnapSync.SnapSyncFeed.HandleResponse(SnapSyncBatch batch, PeerInfo peer) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapSyncFeed.cs:line 63
at Nethermind.Synchronization.ParallelSync.SyncDispatcher1.DoHandleResponse(T request, PeerInfo allocatedPeer) in /src/Nethermind/Nethermind.Synchronization/ParallelSync/SyncDispatcher.cs:line 186
16 Jul 07:38:30 | Corrupted DB detected on path /ethclient/nethermind/nethermind_db/holesky/state/0. Please restart Nethermind to attempt repair.
16 Jul 07:38:30 | Error when handling response RocksDbSharp.RocksDbException: Corruption: block checksum mismatch: stored = 2224436863, computed = 2896197617, type = 4 in /ethclient/nethermind/nethermind_db/holesky/state/0/000193.sst offset 227020 size 32755
at Nethermind.Db.Rocks.DbOnTheRocks.GetWithColumnFamily(ReadOnlySpan1 key, ColumnFamilyHandle cf, IteratorManager iteratorManager, ReadFlags flags) in /src/Nethermind/Nethermind.Db.Rocks/DbOnTheRocks.cs:line 778
at Nethermind.Trie.NodeStorage.Get(Hash256 address, TreePath& path, ValueHash256& keccak, ReadFlags readFlags) in /src/Nethermind/Nethermind.Trie/NodeStorage.cs:line 115
at Nethermind.Trie.Pruning.TrieStore.IsPersisted(Hash256 address, TreePath& path, ValueHash256& keccak) in /src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs:line 601
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.IsChildPersisted(TrieNode node, TreePath& nodePath, Int32 childIndex, IScopedTrieStore store) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 317
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.StitchBoundaries(List1 sortedBoundaryList, IScopedTrieStore store) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 302
at Nethermind.Synchronization.SnapSync.SnapProviderHelper.AddAccountRange(StateTree tree, Int64 blockNumber, ValueHash256& expectedRootHash, ValueHash256& startingHash, ValueHash256& limitHash, IReadOnlyList1 accounts, IReadOnlyList1 proofs) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProviderHelper.cs:line 77
at Nethermind.Synchronization.SnapSync.SnapProvider.AddAccountRange(Int64 blockNumber, ValueHash256& expectedRootHash, ValueHash256& startingHash, IReadOnlyList1 accounts, IReadOnlyList1 proofs, Nullable1& hashLimit) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProvider.cs:line 99
at Nethermind.Synchronization.SnapSync.SnapProvider.AddAccountRange(AccountRange request, AccountsAndProofs response) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapProvider.cs:line 62
at Nethermind.Synchronization.SnapSync.SnapSyncFeed.HandleResponse(SnapSyncBatch batch, PeerInfo peer) in /src/Nethermind/Nethermind.Synchronization/SnapSync/SnapSyncFeed.cs:line 63
at Nethermind.Synchronization.ParallelSync.SyncDispatcher1.DoHandleResponse(T request, PeerInfo allocatedPeer) in /src/Nethermind/Nethermind.Synchronization/ParallelSync/SyncDispatcher.cs:line 186

PROBLEM 2 - dmesg snippet

[ 35.433743] usb 2-1: new high-speed USB device number 2 using ehci-platform
[ 35.590898] usb 2-1: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=20.01
[ 35.590928] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 35.590949] usb 2-1: Product: Ugreen Storage Device
[ 35.590968] usb 2-1: Manufacturer: Ugreen
[ 35.590987] usb 2-1: SerialNumber: 0129380513BA
[ 35.684351] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 35.689237] scsi host0: usb-storage 2-1:1.0
[ 35.689576] usbcore: registered new interface driver usb-storage
[ 35.693643] usbcore: registered new interface driver uas
[ 36.705892] scsi 0:0:0:0: Direct-Access Realtek RTL9210 1.00 PQ: 0 ANSI: 6
[ 36.715602] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=0x08
[ 36.715615] sd 0:0:0:0: [sda] Sense Key : 0x5 [current]
[ 36.715621] sd 0:0:0:0: [sda] ASC=0x24 ASCQ=0x0
[ 36.715633] sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
[ 36.715638] sd 0:0:0:0: [sda] 0-byte physical blocks
[ 36.716340] sd 0:0:0:0: [sda] Write Protect is off
[ 36.716347] sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
[ 36.720608] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 36.752121] sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=0x08
[ 36.752137] sd 0:0:0:0: [sda] Sense Key : 0x5 [current]
[ 36.752143] sd 0:0:0:0: [sda] ASC=0x24 ASCQ=0x0
[ 36.759654] sd 0:0:0:0: [sda] Attached SCSI disk

One important piece of info is missing: the kernel version (uname -ra).
My board has 16GB of RAM and a 1TB SSD, and I do a lot of builds on it; there has never been any file corruption.

A few notes that might help:

  • Disable WiFi completely, if you have any.
  • I don’t recommend an adjustable PSU. I use a fixed-voltage 5V/4A supply, which may be on the edge but always works.
  • Reading the forum, you will notice that some SSDs draw more power than others. Mine is a cheap, slow 1TB Goldenfir; if you can, buy one of those and try it, or get a 512GB one.

That seems strange, I run an SSD on the 5B without issue. I use Radxa’s Debian OS image.

You need to remember that the OS is also writing to the SSD (e.g. logs), so if those are not getting corrupted, that rules out a general problem with the SSD/OS.

There may be some specific scenario with RocksDB that triggers the corruption. You could try running the setup from an SD card and see what result you get. You could also try a different SSD (I use a Samsung 980 Pro and a 970 EVO).

I have mainly been using the Radxa rock-5b_debian_bullseye_cli_b39 image, but I have also tried an Armbian image and I get the same issues

I currently have the Radxa image installed and uname -r gives 5.10.110-38-rockchip

I’ve not installed WiFi

Yeah… This is my best guess at the moment… The SN850X just uses too much power :thinking:

You could try 3djelly’s tip (and remove the SSD), but if I recall correctly there was a patch floating around to increase SD card speed which had some issues, including file corruption (I’m not really sure about this).

You could further try isolating the issue by running a standalone RocksDB stress test.


You can then compare that to a large file copy (many GB) on the OS, running md5sum on the source and destination files to check for corruption.
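A minimal sketch of that copy-and-compare check, assuming Python 3 is available on the board. The function names (`write_random_file`, `file_md5`, `copy_and_verify`) are made up for this illustration, and the paths you pass in would be locations on the SSD under test.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Create a test file of size_bytes filled with random data and flush it to disk."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def copy_and_verify(src, dst):
    """Copy src to dst, force the copy to disk, and compare MD5 digests.

    Returns (match, src_digest, dst_digest).
    """
    shutil.copyfile(src, dst)
    # Open read/write without truncating so fsync pushes the copy out of the page cache.
    with open(dst, "r+b") as f:
        os.fsync(f.fileno())
    src_digest = file_md5(src)
    dst_digest = file_md5(dst)
    return src_digest == dst_digest, src_digest, dst_digest
```

For example, `write_random_file("/mnt/ssd/src.bin", 8 * 1024**3)` followed by `copy_and_verify("/mnt/ssd/src.bin", "/mnt/ssd/dst.bin")` would exercise roughly 8 GiB of writes (the `/mnt/ssd` mount point is hypothetical). One caveat: on a 16GB board, reads of a freshly written file may be served from the page cache rather than the SSD, so use a file larger than RAM, or drop caches or reboot before hashing, to be sure the data really comes back from the drive.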

You posted the dmesg output from connecting the USB device, and there is nothing unusual in it. Do you have the dmesg output from when the data gets corrupted on the NVMe drive? There should be some traces of bad I/O.
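A rough sketch of how one might pick such traces out of a saved dmesg log, in Python. The pattern list is an assumption about what bad-I/O messages tend to contain (NVMe driver lines, block-layer "I/O error" lines, PCIe AER reports, ext4 errors); it is not an exhaustive or authoritative list, and `suspicious_lines` is a name invented for this example.

```python
import re

# Assumed indicators of storage trouble in kernel logs; extend as needed.
SUSPECT_PATTERNS = [
    re.compile(r"nvme", re.IGNORECASE),       # anything logged by or about the NVMe driver
    re.compile(r"I/O error", re.IGNORECASE),  # block-layer read/write failures
    re.compile(r"\bAER\b"),                   # PCIe Advanced Error Reporting messages
    re.compile(r"EXT4-fs error"),             # filesystem-level errors on ext4
]


def suspicious_lines(lines):
    """Return the log lines that match at least one suspect pattern, in original order."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in SUSPECT_PATTERNS):
            hits.append(text)
    return hits
```

Saving the log with `sudo dmesg > dmesg.txt` right after Nethermind reports corruption and passing `open("dmesg.txt")` to `suspicious_lines` would give a shortlist to post here.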

This always seems really strange to me. How do users reach the conclusion that a particular board is faulty? Software bugs happen all the time, while hardware faults are usually rather obvious. Imagine all the users from the recent bluescreen incident replacing their hardware because they thought it was faulty :slight_smile:

I would try a different kernel, distro, and filesystem: does the problem still reproduce?
Maybe try limiting the NVMe link speed, and pay attention to the drive’s power requirements.
For this particular NVMe drive there are many reports on the web of various problems on all platforms. Some may already be fixed in firmware, but I would test a different drive.
Over USB the speed is limited, and a different protocol is used.

BTW:
“Works on a Raspberry Pi 4” is not the same test: the Pi 4 does not expose a PCIe slot, and its USB is much slower than the USB standard’s nominal speed. So it is a different protocol at a different, much more limited speed. You could connect your NVMe drive to the top M.2 E-key slot via an adapter, and it should work (about 8x slower, because it is PCIe 2.0 x1 instead of PCIe 3.0 x4), or via a USB enclosure, but that is certainly not a real solution.
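The "8x" figure can be checked with a little arithmetic. The sketch below uses the published per-lane signalling rates and line encodings (PCIe 2.0: 5 GT/s with 8b/10b; PCIe 3.0: 8 GT/s with 128b/130b) and ignores protocol overhead, so the results are theoretical ceilings rather than what a drive will actually deliver. The function name is made up for this example.

```python
def lane_throughput_mb_s(gigatransfers_per_s, payload_bits, total_bits):
    """Theoretical one-direction throughput of a single PCIe lane in MB/s.

    Each transfer carries one bit on the wire; the line encoding means only
    payload_bits out of every total_bits are usable data.
    """
    bits_per_s = gigatransfers_per_s * 1e9 * payload_bits / total_bits
    return bits_per_s / 8 / 1e6


# PCIe 2.0 x1 (the M.2 E-key slot): 5 GT/s, 8b/10b encoding, one lane.
pcie2_x1 = lane_throughput_mb_s(5.0, 8, 10) * 1

# PCIe 3.0 x4 (the M.2 M-key slot): 8 GT/s, 128b/130b encoding, four lanes.
pcie3_x4 = lane_throughput_mb_s(8.0, 128, 130) * 4

ratio = pcie3_x4 / pcie2_x1

print(f"PCIe 2.0 x1: {pcie2_x1:.0f} MB/s")
print(f"PCIe 3.0 x4: {pcie3_x4:.0f} MB/s")
print(f"ratio: {ratio:.2f}x")
```

This works out to about 500 MB/s against roughly 3940 MB/s, a ratio just under 8.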