Penta SATA HAT limiting speed?

Hello.
Rock 3a
Penta SATA HAT
SATA SSD Crucial MX500 [CT1000MX500SSD1] 1000 GB
Only one drive connected.

I noticed that the write speed to the SSD through the Penta SATA HAT does not exceed 200 MB/s. On my desktop the write speed is much higher, about 450 MB/s.
Is the SATA controller limiting the speed?
This is not a problem for me, I just want to understand why the speed is lower.

What about the read speed? I don't think PCIe on the 3A is the bottleneck.

Is the file system for this disk NTFS?

I apologize for not replying right away.

Read speed via ‘hdparm’:

# sudo hdparm -Tt /dev/sda1
/dev/sda1:
 Timing cached reads:   1416 MB in  2.00 seconds = 707.67 MB/sec
 Timing buffered disk reads: 972 MB in  3.01 seconds = 323.44 MB/sec


# sudo hdparm -Tt /dev/sda1
/dev/sda1:
 Timing cached reads:   1732 MB in  2.00 seconds = 866.20 MB/sec
 Timing buffered disk reads: 994 MB in  3.00 seconds = 330.79 MB/sec


# sudo hdparm -Tt /dev/sda1
/dev/sda1:
 Timing cached reads:   1296 MB in  2.00 seconds = 647.73 MB/sec
 Timing buffered disk reads: 934 MB in  3.00 seconds = 311.22 MB/sec

=========

Read speed via ‘dd’:

# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.1 of=/dev/null bs=4k
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.92929 s, 218 MB/s


# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.1 of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.98249 s, 216 MB/s


# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.1 of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.18422 s, 337 MB/s


# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.1 of=/dev/null bs=512K
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.35237 s, 320 MB/s

==========

Write speed via ‘dd’:

# sync; dd if=/dev/zero of=tempfile.1 bs=1M count=1024; sync 
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.39575 s, 168 MB/s


# sync; dd if=/dev/zero of=tempfile.2 bs=1M count=1024; sync 
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.20996 s, 173 MB/s



The file system is ext4.

How about setting the governor to performance mode and testing again:

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance

# sudo hdparm -Tt /dev/sda1
/dev/sda1:
 Timing cached reads:   2020 MB in  2.00 seconds = 1009.91 MB/sec
 Timing buffered disk reads: 1088 MB in  3.00 seconds = 362.60 MB/sec



# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.1 of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.08253 s, 348 MB/s

# sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
# dd if=./tempfile.2 of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.95699 s, 363 MB/s



# sync; dd if=/dev/zero of=tempfile.4 bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.5536 s, 164 MB/s

# sync; dd if=/dev/zero of=tempfile.5 bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.14195 s, 175 MB/s

# sync; dd if=/dev/zero of=tempfile.6 bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.05292 s, 177 MB/s

If the Rock 3A's kernel is also Rockchip's BSP defaulting to ondemand, then IMO the better advice is to properly tune the ondemand governor.

350 MB/s with 1 MB block size is OK. What about doing this (as root)?

echo 1 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/io_is_busy
echo 25 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_down_factor
echo 200000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
apt install iozone3

Then change to the mountpoint and run:

iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1

With a 16 MB block size you should see close to 400 MB/s.

Thx.

# echo 1 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/io_is_busy
zsh: no such file or directory: /sys/devices/system/cpu/cpufreq/policy0/ondemand/io_is_busy
# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance

# iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1
        Iozone: Performance Test of File I/O
                Version $Revision: 3.489 $
                Compiled for 64 bit mode.
                Build: linux 

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                     Alexey Skidanov, Sudhir Kumar.

        Run began: Mon Jul 11 21:31:58 2022

        Include fsync in write timing
        O_DIRECT feature enabled
        Auto Mode
        File size set to 1024000 kB
        Record Size 1024 kB
        Record Size 16384 kB
        Command line used: iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
         1024000    1024   311079   333960   290405   293185                                                                                  
         1024000   16384   484045   486148   458419   461688                                                                                  

iozone test complete.

Which parameter should I look at? I'm confused.
311079 and 484045?

This might only work when the ondemand governor is chosen:

echo ondemand | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

As for the numbers: the sequential transfer speed displayed always depends on the block size tested. With a 1 MB block size ~310 MB/s was reported, while with 16 MB it's 485/460 MB/s write/read, which is a bit too good to be true for a SATA-attached MX500.
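This block-size dependence is easy to reproduce with dd itself. A small sketch (file name and sizes are arbitrary; for meaningful numbers you would drop caches as root between runs):

```shell
# Create a 64 MiB test file, then read it back with two different block sizes;
# dd's reported MB/s will differ noticeably (per-syscall overhead dominates at 4k)
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
for bs in 4k 1M; do
    line=$(dd if="$f" of=/dev/null bs="$bs" 2>&1 | tail -n 1)
    echo "bs=$bs -> $line"
done
rm -f "$f"
```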

# echo ondemand | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
ondemand

# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
ondemand
ondemand
ondemand
ondemand

# ls -la /sys/devices/system/cpu/cpufreq/policy0
total 0
drwxr-xr-x 3 root root    0 Jun 28 13:17 .
drwxr-xr-x 4 root root    0 Jun 28 13:17 ..
-r--r--r-- 1 root root 4096 Jul 11 21:33 affected_cpus
-r--r--r-- 1 root root 4096 Jun 28 13:17 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Jun 28 13:17 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Jun 28 13:17 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Jul 11 21:33 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 Jul 11 21:33 related_cpus
-r--r--r-- 1 root root 4096 Jul 11 21:33 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Jun 28 13:17 scaling_available_governors
-r--r--r-- 1 root root 4096 Jul 11 21:33 scaling_cur_freq
-r--r--r-- 1 root root 4096 Jul 11 21:33 scaling_driver
-rw-r--r-- 1 root root 4096 Jul 11 23:04 scaling_governor
-rw-r--r-- 1 root root 4096 Jun 28 13:17 scaling_max_freq
-rw-r--r-- 1 root root 4096 Jun 28 13:17 scaling_min_freq
-rw-r--r-- 1 root root 4096 Jul 11 21:33 scaling_setspeed
drwxr-xr-x 2 root root    0 Jun 28 13:17 stats

Well, no idea which kernel version is in use. Maybe the respective ondemand parameters live elsewhere; find /sys -name io_is_busy will at least tell.

Anyway, my point was more in @jack's direction: instead of recommending to switch to performance all the time, the images should be better tuned for normal operation, just like explained here with a fresh Rock 5B. The performance governor results in higher consumption and temperatures, while all that's needed is tweaking ondemand for I/O loads.

That's what I did in Armbian years ago (combined with some other tweaks to prevent storage hassles, though nobody maintains this any more), and that's the reason Armbian is more fun for everything related to server tasks.


Armbian 22.08.0-trunk.0011 Jammy with bleeding edge Linux 5.18.3-rk35xx

# uname -a
Linux rock-3a 5.18.3-rk35xx #trunk.0011 SMP PREEMPT Thu Jun 9 20:34:08 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
# ls -la /sys/devices/system/cpu/cpufreq/ondemand
total 0
drwxr-xr-x 2 root root    0 Jul 12 15:18 .
drwxr-xr-x 4 root root    0 Jun 28 13:17 ..
-rw-r--r-- 1 root root 4096 Jul 12 15:18 ignore_nice_load
-rw-r--r-- 1 root root 4096 Jul 12 15:18 io_is_busy
-rw-r--r-- 1 root root 4096 Jul 12 15:18 powersave_bias
-rw-r--r-- 1 root root 4096 Jul 12 15:18 sampling_down_factor
-rw-r--r-- 1 root root 4096 Jul 12 15:18 sampling_rate
-rw-r--r-- 1 root root 4096 Jul 12 15:18 up_threshold
# iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1

        Iozone: Performance Test of File I/O
                Version $Revision: 3.489 $
                Compiled for 64 bit mode.
                Build: linux 

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                     Alexey Skidanov, Sudhir Kumar.

        Run began: Tue Jul 12 15:23:33 2022

        Include fsync in write timing
        O_DIRECT feature enabled
        Auto Mode
        File size set to 1024000 kB
        Record Size 1024 kB
        Record Size 16384 kB
        Command line used: iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride                                    
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
         1024000    1024   402056   410952   371914   372100                                                                                  
         1024000   16384   465111   493754   482965   483448                                                                                  

iozone test complete.

Alright, it doesn't depend on the kernel version but on the existence of more than one CPU cluster or differing cpufreq policies. So for RK3568 it should look like this:

echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 200000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate

@lanefu: this is obviously wrong since S922X/A311D don't have policies 0 and 4 but 0 and 2. RK3588 is 0, 4 and 6, while by accident it still fits for the A311D2.
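Since the policy numbering differs per SoC, a hedged generic alternative (a sketch, run as root) is to simply glob over whatever ondemand directories the kernel actually exposes, whether shared or per policy:

```shell
# Apply the I/O-friendly ondemand settings to every ondemand directory present,
# regardless of how this particular kernel/SoC lays out its cpufreq policies
for d in /sys/devices/system/cpu/cpufreq/ondemand \
         /sys/devices/system/cpu/cpufreq/policy*/ondemand; do
    [ -d "$d" ] || continue
    echo 1      > "$d/io_is_busy"
    echo 25     > "$d/up_threshold"
    echo 10     > "$d/sampling_down_factor"
    echo 200000 > "$d/sampling_rate"
done
```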

Thank you. Yes, I did. The previous iozone -e -I -a -s 1000M -r 1024k -r 16384k -i 0 -i 1 run was already done with the new parameters.
Did I understand correctly that the write speed is 465111 and 493754 kB/s?

Take averaged values. It's ~480 MB/s for both write and read, which is IMO a bit too high to be true. On the other hand, the JMS585 is a modern controller attached via two PCIe lanes, so the numbers could be valid given that SATA is 6 Gbps :slight_smile:

Anyway, these are benchmarks. For real-world performance these ondemand tweaks need to survive reboots. In Armbian those tweaks happen to be set on this board, so you should be fine. With Radxa's images it depends on Radxa wanting to add some ‘user friendly’ services, or on the user in question having to fiddle around in /etc/rc.local himself to get the desired performance…
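For instance, a minimal /etc/rc.local fragment could look like this (a sketch assuming the shared ondemand directory seen on this Rock 3A kernel; adjust the path if yours exposes per-policy directories):

```shell
# Re-apply the ondemand I/O tweaks at every boot (sketch; path is an assumption)
d=/sys/devices/system/cpu/cpufreq/ondemand
if [ -d "$d" ]; then
    echo 1      > "$d/io_is_busy"
    echo 25     > "$d/up_threshold"
    echo 10     > "$d/sampling_down_factor"
    echo 200000 > "$d/sampling_rate"
fi
```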


Rock 3A M-key slot: PCIe 3.0 x2

PCI Express 1.0
	250 MB/s on 1 lane
	0.5 GB/s on 2 lanes

PCI Express 2.0
	Increased throughput: 500 MB/s per-lane bandwidth, or 5 GT/s (gigatransfers/s).
	500 MB/s on 1 lane
	1.0 GB/s on 2 lanes

PCI Express 2.1
	Physically (speed, connector) identical to 2.0

PCI Express 3.0
	984.6 MB/s on 1 lane
	1.969 GB/s on 2 lanes

SATA Revision 1.0 (up to 1.5 Gb/s) - 150 MB/s
SATA Revision 2.0 (up to 3 Gb/s) - 300 MB/s
SATA Revision 3.0 (up to 6 Gb/s) - 600 MB/s
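As a rough sanity check on the PCIe 3.0 x2 figure (a sketch that only accounts for 128b/130b line encoding, not packet overhead):

```shell
# 8 GT/s per lane x 2 lanes x 128/130 encoding, converted to MB/s
bw=$(awk 'BEGIN { printf "%.0f", 8e9 * 2 * 128 / 130 / 8 / 1e6 }')
echo "PCIe 3.0 x2 payload bandwidth: ${bw} MB/s"
# prints: PCIe 3.0 x2 payload bandwidth: 1969 MB/s
```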

SSD: https://www.crucial.com/products/ssd/crucial-mx500-ssd
Sequential reads/writes up to 560/510 MB/s

So everything fits. 480 MB/s is quite a realistic speed.

And just as important: the SATA HBA is also Gen3 x2! The JMS585, as well as the ASM1166 (6 SATA ports), belong to a new generation of controllers that do not suffer from the internal bottlenecks of older PCIe SATA controllers.

In the SBC world we were used to seeing SATA performance around or below 400 MB/s:

EVO840 connected via an ASM1061 on a RK3399 board:

                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       1     7350     8238     8921     8925     5627     8167
      102400       4    26169    30599    33183    33313    22879    30418
      102400      16    85579    96564   102667   100994    76254    95562
      102400     512   312950   312802   309188   311725   303605   314411
      102400    1024   325669   324499   319510   321793   316649   324817
      102400   16384   373322   372417   385662   390987   390181   372922

Clearfog Pro with EVO840 connected to a native SATA port of the ARMADA 385:

                                                          random    random
          kB  reclen    write  rewrite    read    reread    read     write
      102400       4    69959   104711   113108   113920    40591    76737
      102400      16   166789   174407   172029   215341   123020   159731
      102400     512   286833   344871   353944   304479   263423   269149
      102400    1024   267743   269565   286443   361535   353766   351175
      102400   16384   347347   327456   353394   389994   425475   379687

EDIT: Just found it. The above is partially BS, since I personally measured beyond 500 MB/s with native SATA on the Clearfog Pro: https://forum.armbian.com/topic/1925-some-storage-benchmarks-on-sbcs/#comment-15265