ROCK 5 in ITX form factor

Review update: Settings matter for performance

TL;DR: how the Armbian guys managed to halve performance in one specific use case with the snap of a finger, of course w/o even noticing and w/o doing any evaluation before or after.

@tkaiser I thought that DDR5 pretty much always had worse (or equivalent) latency to DDR4. I get it that you expected an improvement, but how does this compare to the same test in other systems, for example x86?

There is a clear benefit of DDR5 in your memcpy numbers.

Not just me :wink:

No idea, I’m not a hardware guy, just started to read about the latency stuff recently…

That doesn’t seem to end up with ‘software in general’ getting faster. Synthetic benchmarks are always a problem: unless it’s known which use case they relate to, they just generate random numbers. For example 7-zip (sbc-bench's main metric for some reason), which depends more on low latency than on high bandwidth, will generate slightly lower scores on RK3588 with LPDDR5 at just 5472 MT/s, while my naive assumption would be that most other software will benefit more from the higher memory bandwidth.

But if we look at Geekbench6 then LPDDR4X and LPDDR5 generate more or less the same scores (don’t look at the total scores but the individual ones):

But Geekbench in itself is a problem as it uses memory access patterns that are not that typical (at least according to the RPi guys who try to explain the low multi-core GB6 scores their RPi 5 achieves), and we don’t know which tests are sensitive to memory latency and which to bandwidth. I did a test weeks ago with Rock 5B comparing LPDDR4X clocked at 528 MHz and 2112 MHz, so we know at least which individual tests are not affected by memory clock at all (on RK3588 – on other CPUs with different cache sizes this may differ): https://github.com/raspberrypi/firmware/issues/1876#issuecomment-2021505017

But this also is not sufficient to understand GB6 scores. It would need a system where CAS latency and memory clock can be adjusted in a wide range and even then the question remains: how do the generated scores translate to real world workloads.

I think at the moment we can conclude that the faster LPDDR5 clock does not result in a significantly faster system while in some areas where memory bandwidth is everything (video stuff for example) measurable improvements are possible.

Thanks for the detailed answer. My point was just that before we treat the lack of improvement in latency as an issue, we should ask whether it was supposed to be this way. Memory latency has not changed in a meaningful way since the DDR3 days because the higher clocks are offset by higher delays.

And whether this translates to a more responsive system is another issue altogether. I believe that higher memory throughput is likely to result in a bit higher GPU performance. But it depends on other factors too.

Currently trying to wrap my head around idle consumption, which is clearly too high (I measured 4W with everything set to powersave; once an NVMe SSD is mounted, it breaks my heart to look at the SmartPower device).
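For context, a minimal sketch of what ‘everything set to powersave’ means here; the sysfs paths are the usual cpufreq/devfreq locations and may differ per image:

# switch all CPU clusters (on RK3588: policy0 = A55, policy4/policy6 = A76) to powersave
for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do
    echo powersave > "$p"
done
# GPU/NPU devfreq nodes can be switched the same way where a powersave governor exists
for g in /sys/class/devfreq/*/governor; do
    echo powersave > "$g" 2>/dev/null
done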

Turns out ASPM is completely disabled (checked on latest b1 build and Armbian legacy):

root@rock-5-itx:/home/radxa# lspci -vvPPDq | awk '/ASPM/{print $0}' RS= | grep --color -P '(^[a-z0-9:./]+|:\sASPM (\w+)?( \w+)? ?((En|Dis)abled)?)';
0001:10:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
pcilib: sysfs_read_vpd: read failed: Input/output error
0001:10:00.0/11:00.0 SATA controller: ASMedia Technology Inc. ASM1164 Serial ATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0003:30:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0003:30:00.0/31:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
pcilib: sysfs_read_vpd: read failed: Input/output error
0004:40:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0004:40:00.0/41:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+

root@rock-5-itx:/home/radxa# zgrep ASPM /proc/config.gz 
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_POWER_SUPERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
# CONFIG_PCIEASPM_EXT is not set

Checked on Rock 5B with the 5.10 BSP kernel: it’s the same.

Anyone an idea why? @RadxaYuntian @jack

Do we need to fiddle around with setpci to bring idle consumption into a sane range?
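If it really came down to setpci, a rough sketch could look like the following; the ASPM control field is bits [1:0] of the PCIe Link Control register (offset 0x10 into the PCIe capability), value 0x2 means ‘L1 only’, and both ends of each link need the bit set – the addresses below are just examples taken from the lspci output above:

# read current Link Control of root port and endpoint (example addresses, adjust to your system)
val=$(setpci -s 0003:31:00.0 CAP_EXP+10.w)
# set the L1 bit without touching the other Link Control bits
setpci -s 0003:31:00.0 CAP_EXP+10.w=$(printf '%04x' $((0x$val | 0x2)))
val=$(setpci -s 0003:30:00.0 CAP_EXP+10.w)
setpci -s 0003:30:00.0 CAP_EXP+10.w=$(printf '%04x' $((0x$val | 0x2)))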


Any idea when this board might be available to buy from Okdo in the UK?

Fairly sure this was the reason we changed. The post above showed a way to adjust this setting though.

Are you talking about power[super]save? You changed that in 2022 directly after I reported my findings back then. And adjusting the policy doesn’t change anything:

root@rock-5-itx:/home/radxa# echo performance > /sys/module/pcie_aspm/parameters/policy
root@rock-5-itx:/home/radxa# lspci -vvPPDq | awk '/ASPM/{print $0}' RS= | grep --color -P '(^[a-z0-9:./]+|:\sASPM (\w+)?( \w+)? ?((En|Dis)abled)?)';
0001:10:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
pcilib: sysfs_read_vpd: read failed: Input/output error
pcilib: sysfs_read_vpd: read failed: Input/output error
0001:10:00.0/11:00.0 SATA controller: ASMedia Technology Inc. ASM1164 Serial ATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0003:30:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0003:30:00.0/31:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0004:40:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
0004:40:00.0/41:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+

In the thread mentioned, the ASPM status of all PCIe devices reads ‘ASPM L1 Enabled’.

Edit: adding pcie_aspm=force to kernel cmdline doesn’t change anything.
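For reference, this is how such a cmdline addition is typically done on an Armbian-style image (path and variable name are assumptions; other images may use an extlinux.conf instead):

# append pcie_aspm=force to the extra kernel arguments read by the boot script
echo 'extraargs=pcie_aspm=force' >> /boot/armbianEnv.txt
# reboot, then verify the parameter made it into the cmdline
grep -o 'pcie_aspm=force' /proc/cmdline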

Yeah, I vaguely remember we got that info from you but I just couldn’t find the link. However, the link I posted claims only a 100 mW power saving with different configs, so I’m not sure if we can reduce the idle power much.

The 100 mW was also measured by me back then, but that was the result of switching the ASPM policy between default and powersave on a system with a single RTL8125BG attached and no other PCIe lane occupied. ASPM was active; only the policy changed.

Now we’re talking about ASPM being completely disabled and Rock 5 ITX having 4 PCIe lanes occupied by default (Gen3 x2 to the ASM1164 and 2 x Gen2 x1 to the RTL8125BG). And when adding an NVMe SSD and/or a Wi-Fi/BT card we’re talking about all 7 PCIe lanes being occupied. ASPM or not in such a situation might make a difference of several watts. And what I’m trying to do now is measure exactly that :slight_smile:

But I can’t figure out where ASPM got disabled if even pcie_aspm=force doesn’t bring it back. That’s all I’m asking/searching for; the actual numbers for the difference this makes are to be measured afterwards.
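A few checks that usually help to narrow down where ASPM gets lost (just the standard places to look, nothing Rockchip-specific):

# what the devices advertise vs. what got enabled
lspci -vv 2>/dev/null | grep -E 'LnkCap:|LnkCtl:'
# active kernel policy
cat /sys/module/pcie_aspm/parameters/policy
# any kernel message about ASPM being disabled or overridden
dmesg | grep -i aspm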

I’m seeing in your test results that the latency is indeed slightly worse than with LPDDR4X but that the read bandwidth is overall 30-40% higher. I think that for most use cases the bandwidth gains will offset the small losses in latency. It’s only workloads made of lots of small dependent reads that will be affected (e.g. linked lists and/or hash tables in memory), and even then the effect will be really limited. The smallest unit one can read from memory is a cache line (64 bytes), and tinymembench shows that it could be fetched in 140 ns previously and that it’s 150 ns now. It’s not exceptional but it’s not bad either. For example, my ODROID-H3 gives me 130 ns where my Rock 5B shows 139 ns. And at 4 parallel accesses I’m seeing 165 ns on both, thanks to the 4 channels on RK3588. So we’re on par with low-end x86 here, definitely nothing to be ashamed of.
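Just as a back-of-envelope illustration of how limited the pointer-chasing impact is: a fully dependent chain of cache-line misses tops out at roughly 1/150 ns ≈ 6.7 million loads per second now vs. 1/140 ns ≈ 7.1 million before, i.e. about 7% slower in the worst case, while streaming reads gain 30-40%.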

Sure, but expectations should be adjusted accordingly and advertising ‘10% faster memory’ should be stopped or at least put into perspective (I can imagine various use cases where the higher bandwidth really matters, e.g. video processing, GPU and maybe NPU).

Talking about ‘on par with low-end x86’: right now idle consumption is concerning, since who wants to deal with all the ARM hassles when you can get an x86_64 board like an ODROID-H3 or H4+ with lower idle consumption?

And yes, I’m fully aware of the (way) higher consumption in load scenarios but with certain use cases this doesn’t matter much or at all.

Whatever happened, the ‘ASPM Disabled’ problem was clearly on my side since it works:

root@rock-5-itx:/home/radxa# . /usr/local/bin/sbc-bench.sh

root@rock-5-itx:/home/radxa# echo performance > /sys/module/pcie_aspm/parameters/policy

root@rock-5-itx:/home/radxa# CheckPCIe
  * ASMedia ASM1164 Serial ATA AHCI: Speed 8GT/s (ok), Width x2 (ok), driver in use: ahci, ASPM Disabled
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM Disabled
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM Disabled

root@rock-5-itx:/home/radxa# echo powersave > /sys/module/pcie_aspm/parameters/policy

root@rock-5-itx:/home/radxa# CheckPCIe
  * ASMedia ASM1164 Serial ATA AHCI: Speed 8GT/s (ok), Width x2 (ok), driver in use: ahci, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+ PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=150us PortTPowerOnTime=150us  PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns  T_PwrOn=10us 
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+ PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=150us PortTPowerOnTime=150us  PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns  T_PwrOn=10us 

root@rock-5-itx:/home/radxa# echo powersupersave > /sys/module/pcie_aspm/parameters/policy

root@rock-5-itx:/home/radxa# CheckPCIe
  * ASMedia ASM1164 Serial ATA AHCI: Speed 8GT/s (ok), Width x2 (ok), driver in use: ahci, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+ PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=150us PortTPowerOnTime=150us  PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns  T_PwrOn=10us 
  * Realtek RTL8125 2.5GbE: Speed 5GT/s (ok), Width x1 (ok), driver in use: r8125, ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+ PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=150us PortTPowerOnTime=150us  PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns  T_PwrOn=10us 

default and performance ASPM policy show no difference in consumption (ASPM disabled in both cases), and the same is true for powersave vs. powersupersave. Without even having a look at L1 substates, we’re talking about a consumption difference of only 700 mW with the onboard PCIe peripherals.

That’s exactly why I suggested a few months ago that high-end ARM-based SBCs face a real challenge: they can still be inferior in performance to some low-to-mid-end x86 devices that come with more I/O capability, a very roughly similar thermal envelope and equivalent prices. I really wanted to use an RK3588 1.5 years ago for my server, but the mainline status of this beast compared to a boring x86 chip was just apples to oranges.

I anticipate that in the coming years we’ll see more ARM in the datacenter and more Intel in low-power devices, because in the end the ones willing to make an effort are those saving money from it at large scale, while end users just want something that runs out of the box; and when performance, consumption and price are comparable, what remains is ease of use.

Prices are now known and are competitive. I was about to publish an article about my Rock 5B-based NAS, but this new board costs less as it can use the 20-pin ATX plug and includes the SATA controller… I guess I will buy it and compare both to make sure the ITX flavor confirms its rank :wink:

So, the onboard SATA controller seems to be an ASM1164. I guess 2 of the PCIe 3.0 lanes are used for that purpose, which explains why the M.2 M-key slot only offers 2 lanes… now I could still compare to my ASM1166 and JMB585 modules, but since only software RAID is possible, performance on regular HDDs should be very close.

Any advice on a cooler? Finding a simple low-profile passive cooler is not so easy; most are offered with a fan and are not suitable for transversal airflow once the fan is removed.
Maybe a 1U model? Then it would be nice to cool both the SoC and the memory chips, so a flat base would be better.

I also received a sample not long ago. IMO any common LGA 115x 1U low-profile passive cooler should be overkill for the low power consumption of the RK3588. Also, since the SoC and RAM chips have different thicknesses, you may need to use thermal paste and thermal pads for them respectively. But considering that the RAM chip is LPDDR5, it doesn’t really matter much even if it doesn’t touch the heat sink.

edit: This is the model I’m using. It will not conflict with the onboard RTC battery holder.


@nyanmisaka Thank you for your feedback.
Indeed, SoC, eMMC and memory chips only represent 10~15 W of heat to dissipate.
I think cutting a large and light LED aluminium heatsink like this one is a good and cheaper alternative. Then, using adhesive thermal pads of different heights might allow sufficient sticky contact. Otherwise, I will have to find a refurbished LGA 115x 1U heatsink or simply wait for delivery of this heatsink.

Final review update now that the device is ready to be bought and it’s confirmed that eMMC won’t be user-accessible by default: https://github.com/ThomasKaiser/Knowledge/commit/2489492d03db2961d6ac249cc6eefca4687ffa32


Thanks for the details.
Hopefully sooner or later we will get more juice out of DDR5, as we all expect. For now it’s just a number, and some chance that higher capacities will become more affordable :slight_smile: