After a while the NVMe just started to die

The NVMe (Transcend TS2TMTE220S 2TB) worked great and provided excellent read and write speeds. I set up MySQL replication and Redis for the live server to have a data backup and a simple environment for web application development. However, after a while the NVMe started to die. I installed the OS twice; each time it worked fine at first, but eventually it went bad.

I once made a case to fit an NVMe and a cooling heatsink. In the metal case, however, the NVMe did not fit with a separate heatsink at all, and without the heatsink the NVMe ran hotter than it should.

Now I have to install the Ubuntu image again through balenaEtcher, so it is unlikely that I will be able to access any logs…

Any idea why this might be happening?

PS: my CPU cooler is connected correctly, but it still does not work. Does anyone have the same problem?
UPD: fan works great with https://github.com/pymumu/fan-control-rock5b

Not an answer to your question, but this kind of issue is one thing that worries me. I am doing a massive build on my Rock 5B.

Maybe you have been hit by some kind of undervoltage or a PD issue corrupting the fs.
Can you please give more info about your hardware and software?

  • Board version?
  • Kernel version?
  • OS version (and distro)?
  • How do you power the board, and what is the PD specification?
  • Was your 2TB NVMe a new one?

If you can still boot (or ssh to the board), what is the output of:
sudo df -lh

Just an idea: you could clear the SPI and boot from an SD card so you can analyze the NVMe fs. Maybe it is just corrupt and there is no need for a full re-install.
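
A rough sketch of what that check could look like once booted from the SD card. The partition name /dev/nvme0n1p2 and the ext4 filesystem are both assumptions (the thread does not state the filesystem type), and inspect_nvme_fs is just a made-up helper name. The -n flag keeps e2fsck read-only, so nothing is repaired or changed.

```shell
# Sketch: read-only look at an NVMe partition after booting from an SD card.
# Assumptions: the partition is /dev/nvme0n1p2 and uses ext4; adjust both.
inspect_nvme_fs() {
    part="${1:-/dev/nvme0n1p2}"
    lsblk -f "$part"              # show filesystem type, label, and UUID
    sudo e2fsck -n "$part"        # -n: open read-only, answer "no" to every prompt
    sudo dmesg | grep -i nvme     # kernel messages about the drive (I/O errors, resets)
}
# Usage: inspect_nvme_fs /dev/nvme0n1p2
```

The partition should not be mounted while e2fsck runs, which is the point of booting from the SD card first.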

Honestly, are you sure it’s the Rock and not the Transcend SSD? The only SSDs that have prematurely failed on me in the last 15 years were Transcends. Just last month a 5-month-old Transcend 1TB drive started throwing read errors at only 7 TB written.

Other than that, it might be a power issue: the SSD might not get enough juice during writes.

Sure. I was also using a 1 Gbps internet connection and had MySQL and Redis production replication running continuously.

  • v 1.42 2022.08.29
  • 5.10.110-37-rockchip
  • Ubuntu 22.04.2 LTS
  • Type-C PD 30W
  • Transcend TS2TMTE220S 2TB

It was like the following:

root@rock-5b:/tmp# df -lh
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.6G  904K  1.6G   1% /run
/dev/nvme0n1p2  1.9T   34G  1.8T   2% /
tmpfs           7.7G     0  7.7G   0% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.6G  4.0K  1.6G   1% /run/user/1001
/dev/nvme0n1p1  511M  141M  371M  28% /boot
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns

You’re probably right.
Well, then I will try it with SEDC1000BM8/960G
Perhaps you can suggest reliable alternatives for 2TB?

I’ve been using a Crucial P3+ 4TB on my Rock 5. Granted, the SSD is mostly read (it’s the media library for Jellyfin; the Rock boots off an eMMC module), but it has done a good job, is among the cheapest brand-name SSDs on the market, and the speed is fantastic (when I first inserted it into the Rock I dd’d 480 GB from /dev/zero unbuffered, and it came out at 3.5 GB/s).

Just as a drive-by observation: why is your nvme not affixed with a screw? If anything, that’s kinda risky…

Alas, the 5B board itself is not designed to take an NVMe with a heat sink; it just does not fit tightly. However, the first time this failure occurred, the heat sink was not used, and the NVMe was screwed to the board as it should be.

Ever since SSDs, and then NVMe drives, became a thing, I’ve been adding noatime,nodiratime to the relevant partitions in /etc/fstab.

This prevents access times from being written for every file or directory access.
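
As a rough illustration of the fstab change, the sketch below adds noatime to the options column of a single fstab line. add_noatime is a made-up helper name and the sample entry is a placeholder, not a line from anyone's real system. On Linux, noatime already implies nodiratime, so the sketch only adds the one option.

```shell
# Hypothetical helper: add noatime to the options field (4th column)
# of fstab lines read from stdin. Comments and short lines pass through.
add_noatime() {
    awk '{
        if ($0 ~ /^[ \t]*#/ || NF < 4) { print; next }
        if ($4 !~ /(^|,)noatime(,|$)/) $4 = $4 ",noatime"
        print
    }'
}

# Example with a placeholder entry (the UUID is not real):
echo "UUID=xxxx / ext4 defaults 0 1" | add_noatime
```

To try the option without editing fstab or rebooting, `sudo mount -o remount,noatime /` applies it to the running system until the next boot.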

Also as physical RAM has increased, I no longer create a swap partition.

For something like mysql replication, you may want to exclude logging tables or other tables that are continually updating.

Speaking of logging, you may wish to have logs on a ramdisk rather than writing to the NVMe.

Alternatively, the highly transactional data could be written to an SD card, whereas the NVMe drive houses the OS and less mutable data.

You may also find the smartctl or nvme commands offer some deeper insights into the status of your drive.
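
A minimal sketch of those checks, assuming the smartmontools and nvme-cli packages are installed and the drive shows up as /dev/nvme0 (both are assumptions; nvme_health is a made-up helper name). The lines I would look at first are the critical warning, percentage used, and media error counters.

```shell
# Sketch: print health data for an NVMe drive with whichever tool is installed.
# The default device path is an assumption; pass your own as the first argument.
nvme_health() {
    dev="${1:-/dev/nvme0}"
    if command -v smartctl >/dev/null 2>&1; then
        sudo smartctl -a "$dev"        # full SMART/health report
    fi
    if command -v nvme >/dev/null 2>&1; then
        sudo nvme smart-log "$dev"     # NVMe SMART log: temperature, wear, error counts
    fi
}
# Usage: nvme_health /dev/nvme0
```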

I add fstrim to /etc/cron.weekly for drives that support/need it.

As for the CPU fan, I initially plugged into 3V/gnd but after putting the rock5 in a case, I moved to 5V/gnd. For my use case, the fan is quiet enough to run all the time and I’d rather have the cooling than use a fraction of a W less power.

You may wish to monitor and compare temps with the fan sucking vs blowing to see which is better for your environment.
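
For the comparison, a small sketch that reads the kernel's thermal zones from sysfs. Values under /sys/class/thermal are reported in millidegrees Celsius; which zones exist and what they are called depends on the board and kernel, so the names printed will vary. print_temps is a hypothetical helper name, and this covers the SoC sensors only, not the NVMe itself.

```shell
# Sketch: print every readable thermal zone as "name: N C".
# The optional argument overrides the sysfs directory (handy for testing).
print_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue                     # skip unmatched globs and unreadable zones
        milli=$(cat "$zone/temp")                           # millidegrees Celsius
        name=$(cat "$zone/type" 2>/dev/null || echo unknown)
        echo "$name: $((milli / 1000)) C"
    done
}
# Usage: run print_temps under the same load with each fan orientation
# and compare the readings.
```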

Hope this helps!

Using Gen4 or Gen3?