NVMe speed doesn't change on PCIe Gen 2

Hi,

I followed the instructions below to increase the NVMe speed:
https://wiki.radxa.com/Rockpi4/install/NVME#Step_4_Enable_PCIe_Gen2_mode_to_get_max_speed
https://wiki.radxa.com/Rockpi4/hardware/devtree_overlays
https://wiki.radxa.com/Rockpi4/dev/common-interface-with-kernel-5.10

But in the end the changes had no effect at all. I am using Ubuntu Server, and the system runs from the NVMe drive (no SD/eMMC).

Please see the screenshots below. As you can see, the benchmark speed is the same before and after the changes.

Initial benchmark with PCIe Gen 1 and the steps to change to Gen 2

Modification of /boot/hw_intfc.conf

Modification of /boot/extlinux/extlinux.conf

Benchmark after changes & reboot
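
(The screenshots don't come through here. As a rough sketch of the kind of change those steps involve; the exact overlay names below are assumptions taken from the wiki pages linked above, not copied from my screenshots:)

# /boot/hw_intfc.conf (4.4 kernel): enable the PCIe Gen 2 device tree overlay
intfc:dtoverlay=pcie-gen2

# /boot/extlinux/extlinux.conf (5.10 kernel): reference the matching .dtbo via an
# fdtoverlays line in the boot entry, taking the exact file name from the overlay
# directory shipped with the image, e.g.
# fdtoverlays /dtbs/<kernel-version>/rockchip/overlay/<pcie-gen2-overlay>.dtbo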

I also tested it on Debian a couple of months ago, and as far as I remember there was no effect there either.

Is it normal?

To check which PCIe link width/speed has been negotiated you need to run lspci -vv.
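
For example, something like this filters out just the relevant lines (LnkCap is what the device supports, LnkSta is what was actually negotiated):

sudo lspci -vv | grep -E 'LnkCap|LnkSta'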

Thank you very much. I have run the sudo lspci -vv command and saved the outputs with PCIe Gen 2 enabled and disabled in hw_intfc.conf. I used the diff command to compare them and nothing changed. As far as I understood, they both report a speed of 5 GT/s. See below:

When I checked Wikipedia for PCIe speeds, it looks like 5 GT/s corresponds to PCIe Gen 2. Does that mean the Rock Pi 4 uses Gen 2 by default? That would be strange given that they provide instructions/settings to enable it.

Furthermore, the speeds reported by hdparm are ~450 MB/s. In the following blog post it is said: “After following his instructions of turning on PCIe gen 2 mode, I re-ran the benchmark. I got a whopping 1.2GB/s read speeds and a staggering 1.4GB/s write speeds”

I get less than half of that.

That confirms everything is fine at the PCIe layer, since the RK3399 is limited to Gen 2 x4 anyway. As for your low hdparm numbers, that's an indication that the OS image you're using ships with crappy settings (something pretty common in the SBC world).

hdparm is the wrong tool anyway (it uses a laughably small block size from last century). And most probably there's something wrong with cpufreq scaling (the ARM cores are kept at their lowest clock, which results in poor I/O performance regardless of PCIe link speed). What does this output:

systemctl status ondemand

I am using the official Ubuntu Server image from the Radxa downloads page.

This is the result of systemctl status ondemand:

Well, Ubuntu on a random ARM SBC is pretty much broken performance-wise; at the very least it ends up with crappy I/O performance by design :slight_smile:

Radxa people know but don’t care… see for example the comments here. And see these observations that are now 6 years old.

As already said, the issue is most likely the CPU cores remaining at their lowest clockspeeds when doing I/O. The recipe to improve this on ARM has been known for a long time but gets ignored.

If you follow these simple steps then I bet your numbers will be at least twice as high, though at the price of higher idle consumption. The only way to deal with this correctly is using ondemand with io_is_busy and friends, as sketched below.
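
A minimal sketch of that ondemand tuning, assuming the governor's tunables sit in the usual global /sys/devices/system/cpu/cpufreq/ondemand location; the threshold values are the ones commonly recommended for ARM SBCs, not anything Radxa ships:

# switch all cores to ondemand, then make the governor count iowait as load
echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
# ramp up earlier and stay at higher clocks longer under load
echo 25 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor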

Thank you very much for the detailed information. Unfortunately I don't have much knowledge in this domain (about “ondemand”).

I followed the steps in the link, but there was no impact at all (tested with hdparm -Tt /dev/nvme0n1).

Currently I am on a 5.x kernel; I previously tested with a 4.x kernel on the same OS, and also on Debian with different SSDs, and the speed doesn't change.

Your /etc/default/cpufrequtils should look like this:

ENABLE=true
GOVERNOR=performance

Without a reboot, a systemctl restart cpufrequtils is needed afterwards to get the clockspeeds up.
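
To verify that the cores actually clocked up afterwards, the current frequencies can be read straight from sysfs (values are in kHz):

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq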

OK, I think I found the root cause of the problem: the methodology of testing the speed. As you said, hdparm is not suitable.

I tested it with the “Disks” utility, and there is a 2x speed increase between the two hw_intfc.conf settings.

I will check with and without the “governor-performance” scripts and report what difference that makes as well.

But at this moment everything looks OK. The speeds are like in the blog post.

This tool is known for generating random BS numbers, e.g. 840 MB/s on a VIM4 that can't physically exceed 500 MB/s. No idea what this benchmark is actually doing, but it's not testing storage performance. And BTW: this is pretty common with this type of benchmark…

With RK3399, a good SSD and an appropriate benchmark methodology that really tests direct I/O at Gen 2 speeds, you should get numbers exceeding 1400 MiB/s both reading and writing. See the RockPro64 example and Theobroma Systems’ RK3399-Q7 SOM example.

With fio and a bit of parallelism even exceeding 1500 MiB/s is possible.
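
As a sketch of what such a fio run could look like (the target path /mnt/nvme is just a placeholder for wherever the SSD is mounted, and the job size, block size and queue depth are arbitrary example values):

fio --name=seqread --filename=/mnt/nvme/fio.test --rw=read --bs=1M --size=1G --direct=1 --ioengine=libaio --iodepth=16 --numjobs=4 --group_reporting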

Moral of the story: lspci confirmed that you're now using a higher PCIe link speed and also have no problems with link width (dusty/dirty contacts can result in just an x2 or even an x1 connection). And aside from what all these benchmarks tell you, I guess in reality you're after better real-world storage performance? And that is sabotaged by ignorance on Radxa's side, who ship with crappy settings :slight_smile:

You can compare what happens when setting the cpufreq governor to performance and to powersave. The former gives optimal storage performance, while the latter is close to what you'll get with real-world tasks, since Radxa's ondemand settings leave the CPU cores at their lowest cpufreq most of the time during I/O. So even if a benchmark tells you ‘everything is fine’, real-world storage performance is not :slight_smile:


OK, I got it. Is there any specific command/tool you would recommend for the benchmark (since “Disks” and “hdparm” are not suitable)?

This is the content of /etc/default/cpufrequtils:
GOVERNOR="performance"

As you can see, I don’t have a line starting with “ENABLE”, and the performance keyword is in double quotes.

Is that normal?

The contents should be fine. But as already said, you need to either reboot or do a systemctl restart cpufrequtils. It's important that the following reads performance:

cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor

(or if you want to see what's happening at the lowest clockspeeds, replace performance with powersave).
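
To flip between the two on the fly without touching the config file (assuming the standard cpufreq sysfs layout), something like this works:

echo powersave | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor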

For testing, do a sudo apt install iozone3, then change to the mountpoint of your SSD and simply run:

iozone -e -I -a -s 1000M -r 16384k -i 0 -i 1

This will only test sequential read/write performance, and the -I (direct I/O) ensures that the storage itself is tested rather than filesystem buffers or caches.

Thank you very much. I tested the powersave vs. performance options; the results in MB/s are below:

SETTINGS              WRITE   READ
PCIe 1, powersave       510    481
PCIe 1, performance     682    607
PCIe 2, powersave       835    722
PCIe 2, performance    1055    974

Also, when switched to powersave, I noticed that the system becomes pretty slow (in terms of response time).

One final thing. I checked the default speed, where cat /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor reports ondemand, versus after modifying it to performance.

According to the results, ondemand seems pretty good (it doesn't differ from performance).
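
(If the earlier tuning also set io_is_busy, that would explain it; a quick check, assuming the ondemand tunables live in the usual global sysfs location:)

cat /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy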

Perhaps there is no need to deal with performance at all?