Yes, SMB Multichannel would be extremely useful, especially when you are using 10GbE for a NAS.
I would also like to test SMB Direct over RoCE / InfiniBand. Have you tested that on the Rock5B ITX? Could it reduce the CPU usage?
News about the ROCK 5B Plus! ;)
Just like @tkaiser said, no pro network equipment considers 5G network speed,
but for sure there are some consumer devices, like those from ASUS, and these require such hardware.
The jump to 2.5G is relatively cheap and easy. I completely agree that 5G is a bad idea compared to 10G at the same price, with many more 10G devices available. I don’t think this will change anytime in the future.
Looks like the RPi 5 handles the 5GbE NIC at ~3.3 Gbit/s over PCIe 2.0. That doesn’t seem worth the step up from 2.5GbE unless some PCIe 3 lanes are left over.
https://fixupx.com/will_whang/status/1797053374199959813
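A rough sanity check on that ~3.3 Gbit/s figure, assuming the NIC hangs off a single PCIe lane (my assumption, the post only says PCIe 2.0). PCIe 2.0 signals at 5 GT/s with 8b/10b encoding and PCIe 3.0 at 8 GT/s with 128b/130b encoding. The helper name pcie_gbps is made up for illustration:

```shell
# pcie_gbps GT_PER_S LANES ENC_NUM ENC_DEN
# Line rate in Gbit/s after encoding, before packet/protocol overhead.
pcie_gbps() {
    awk -v gt="$1" -v lanes="$2" -v num="$3" -v den="$4" \
        'BEGIN { printf "%.2f\n", gt * lanes * num / den }'
}

pcie_gbps 5 1 8 10      # PCIe 2.0 x1
pcie_gbps 8 1 128 130   # PCIe 3.0 x1
```

The first call prints 4.00 and the second 7.88, so ~3.3 Gbit/s on a Gen2 lane looks plausible once packet overhead is taken off, and a Gen3 lane would have headroom for 5GbE.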
Thanks for that comparison. I still think that this particular speed will be skipped by most of us: it’s just way too expensive compared to 10G and problematic in hardware (transceivers, switches, routers). 2.5G was a cheap bonus over gigabit and an easy upgrade.
Any news on 5B+?
It’s been a while.
Given that there are still strange/bad performance measurements on the DDR5 with the latest DDR SPL code on Rock5 ITX (see DRAM speed on ROCK 5 ITX), I think it’s prudent to let the engineers try to sort everything out before deciding what to do/act on. If in the end they have to choose a different DRAM chip or slightly adjust routing to use the optimal timings, you’ll probably be very glad to have waited a little bit more for an optimal product.
From the source I know, there was a plan to release the Rock 5B Plus in July, so stay tuned!
Completely agree,
We all hope that DDR5 can increase RAM speed compared to DDR4. For now, all other boards released with this pair (RK3588 and DDR5) show no speed improvement. If the final result really depends on the RAM chip, then a simple software update will not help them.
With most recent v1.16 DRAM initialization BLOBs it’s even worse compared to LPDDR4X, just check Willy’s aforementioned tests with different BLOBs.
Confirmed, I even ran a quick synthetic test that took twice as long to execute on the ITX as on the rock5b. I need to see if I still have the code, in order to share it.
Edit: here it is:
rock@rock-5b:~$ time taskset -c 4 ./a.out 10
malloc...
fill...
scan...
real 6m2.604s
user 6m1.861s
sys 0m0.648s
willy@rock5-itx:~$ time taskset -c 4 ./a.out 10
malloc...
fill...
scan...
real 11m15.793s
user 11m14.447s
sys 0m1.101s
Both run at the same CPU frequency, and rock5-itx is almost twice as slow due to v1.16’s DDR5 timings.
I’m going to put the tool on my github for easier testing, will update the URL shortly.
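To put a number on "twice as slow", the two `real` times above can be converted to seconds and divided. This is only a small sketch; the function names to_seconds and slowdown are made up here and are not part of the tool:

```shell
# Convert a `time`-style duration such as "6m2.604s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# slowdown FIRST SECOND: how many times longer SECOND took than FIRST.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", b / a }'
}

slowdown 6m2.604s 11m15.793s   # rock-5b vs rock5-itx, from the runs above
```

This prints 1.86, i.e. the ITX run took about 86% longer than the rock5b run.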
Sure. Sure. I don’t want anyone to rush the board to market, I was just curious because everything has gone pretty much silent for a longer while and I’ve been waiting for any news at all.
I’ve now added a reproducer to the ramspeed repo and updated the procedure on the rock5-itx thread above.
For a real-world RAM use case, try llama.cpp from the Phoronix Test Suite or GitHub.
Here in my tests it caps at 20 GB/s or so, i.e. 5 t/s on 7B Q4 models.
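The link between those two numbers can be sketched as follows, assuming token generation is purely bandwidth-bound and streams all weights once per token, and assuming a 7B Q4 model weighs roughly 4 GB (both are assumptions, not measurements from this thread). The helper name is made up:

```shell
# tokens_per_second BANDWIDTH_GB_PER_S MODEL_SIZE_GB
# Upper bound on tokens/s if every token has to read all weights once.
tokens_per_second() {
    awk -v bw="$1" -v size="$2" 'BEGIN { printf "%.1f\n", bw / size }'
}

tokens_per_second 20 4   # 20 GB/s, ~4 GB of weights (assumed size)
```

This prints 5.0, which matches the 5 t/s figure, but it is only an upper bound under those assumptions and says nothing about whether the CPU is the real limit.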
Has the SPI been cleared? I saw that mismatching DDR and SPL blobs can disable the DMC leading to a performance impact.
The thing is that llama.cpp also depends on calculations, so you never know if you’re CPU-bound or RAM-bound until you can test on another machine with a different CPU and the same RAM, or different RAM and the same CPU. I happen to have access to an 80-core Ampere Altra made of Neoverse-N1 cores that are exactly the same as Cortex-A76. Llama.cpp is quite fast there, and there are 6 DDR4 channels. But once limited to 4 cores, it’s basically the same speed as on the rk3588, showing that the 4 A76 cores there are delivering what they can and are a more important limiting factor than the DRAM speed.
@willy might answer this (I personally haven’t conducted any tests on RK3588 for quite some time)
That’s an interesting bit of information! With the BSP kernel running, looking below /sys/class/devfreq/dmc (or checking that this directory exists) or at /sys/kernel/debug/clk/clk_summary might already be sufficient?
What should I look for in this directory?
I looked at max_freq on the Rock 5B and Rock 5B+ and they are 2112000000 and 2736000000 respectively…
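For anyone else wondering what to look at there, a minimal sketch that checks whether the dmc devfreq node exists and dumps the usual devfreq attributes (governor, cur_freq, min_freq, max_freq). The function name dmc_status and the optional directory argument are made up here, the latter only so it can be tried on a copy of the files:

```shell
# dmc_status [DIR]: report whether the DMC devfreq node exists and show
# its governor and frequency limits. Returns 1 if the node is missing.
dmc_status() {
    dir="${1:-/sys/class/devfreq/dmc}"
    if [ ! -d "$dir" ]; then
        echo "dmc: not present"
        return 1
    fi
    for f in governor cur_freq min_freq max_freq; do
        if [ -r "$dir/$f" ]; then
            echo "$f: $(cat "$dir/$f")"
        fi
    done
    return 0
}

# Usage on the board: dmc_status
```

A missing directory would hint at the DMC being disabled, while a cur_freq that sits far below max_freq during a memory-heavy run would point at the governor instead.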
Now please run sbc-bench on the 5B+ and post the results link here so we can check whether the DRAM latency issue is present on your system. If it is not, Joshua might have nailed it with ‘DMC disabled’, and @willy may have tested with the DRAM clock set to the absolute minimum instead of 2400 MHz.
Amazing, you’re totally right, that was the problem! I’m seeing that the dmc runs at 2.4 GHz during the filling of the memory and goes down to 534 during the scan. Apparently both dmc_ondemand and simple_ondemand are dumb enough to ignore memory reads!!! And I can’t find a configurable one to fix that. Thus I did this:
# cat /sys/class/devfreq/dmc/max_freq > /sys/class/devfreq/dmc/min_freq
and now my ramlat numbers are wayyyyyy better:
willy@rock5:/tmp$ taskset -c 4 ./ramlat -s -n
size: 1x32 2x32 1x64 2x64 1xPTR 2xPTR 4xPTR 8xPTR
4k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.327
8k: 1.752 1.753 1.752 1.753 1.752 1.752 1.753 3.417
16k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.417
32k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.420
64k: 1.754 1.753 1.754 1.753 1.754 1.754 1.755 3.420
128k: 5.596 5.453 5.564 5.450 5.526 6.081 7.460 13.29
256k: 8.115 8.074 8.125 8.084 8.095 8.254 9.772 15.81
512k: 13.82 13.36 13.75 13.36 13.74 13.93 15.79 22.96
1024k: 41.87 39.68 41.90 39.56 41.99 40.28 40.86 46.22
2048k: 52.99 47.90 52.01 47.76 52.38 48.69 48.04 54.20
4096k: 93.04 89.75 93.46 89.62 93.83 88.62 88.12 90.95
8192k: 118.5 108.0 114.8 106.9 113.0 107.3 106.7 110.9
16384k: 139.5 131.8 136.8 130.8 136.2 130.6 129.6 133.0
I’m going to re-run the ramwalk test now.
Edit: ramwalk on rock5-itx is now 8% slower than on rock5b instead of 80% slower! Much better! I’ll need to retest with the 2736 MHz DDR init code, but for this I need to force it to boot from the SD card, which means making a contact using a resistor, so I’ll do it once I’m home.
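The min_freq workaround above can be wrapped so that the previous value is remembered and restored after a benchmark. This is just a sketch: pin_dmc and unpin_dmc are made-up names, the optional directory argument is only there for dry runs, and on a real board the writes need root:

```shell
# pin_dmc [DIR]: raise min_freq to max_freq and print the previous min_freq.
pin_dmc() {
    dir="${1:-/sys/class/devfreq/dmc}"
    old_min="$(cat "$dir/min_freq")" || return 1
    cat "$dir/max_freq" > "$dir/min_freq" || return 1
    echo "$old_min"
}

# unpin_dmc OLD_MIN [DIR]: put the saved min_freq back.
unpin_dmc() {
    echo "$1" > "${2:-/sys/class/devfreq/dmc}/min_freq"
}

# Usage on the board (as root):
#   old="$(pin_dmc)"; taskset -c 4 ./ramlat -s -n; unpin_dmc "$old"
```

Keeping the DRAM clock pinned costs some idle power, which is why restoring the old value after the test seems sensible.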
From my understanding, the DMC is disabled to maintain stability as the memory cannot train/initialize properly when the DDR and SPL blobs do not match.
This can be observed in U-Boot when attaching a serial console:
ERROR: loader&trust unmatch!!! Please update loader if need enable dmc
ERROR: current trust bl31 need match with loader ddr bin V1.13 or newer
ERROR: current loader need match with trust bl31 V1.38-V1.40
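If the serial console output has been captured to a file, a quick check for that message could look like the sketch below. The function name and the log file name are hypothetical; the only thing taken from the output above is the error string itself:

```shell
# has_blob_mismatch LOGFILE: succeed if the captured U-Boot log contains
# the loader/trust mismatch error quoted above.
has_blob_mismatch() {
    grep -qF 'loader&trust unmatch' "$1"
}

# Usage (uboot.log is a placeholder for your own serial capture):
#   has_blob_mismatch uboot.log && echo "DDR/SPL blobs do not match"
```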