News about the ROCK 5B Plus! ;)

Yes, SMB Multichannel would be extremely useful, especially when you are using 10GbE for a NAS.
I would also like to test SMB Direct over RoCE / InfiniBand. Have you tested that on the Rock5B ITX? Could that reduce the CPU usage?

Just like @tkaiser said, no professional network equipment bothers with 5G network speed,
but for sure there are some home consumer devices, like those from Asus, and these require such hardware.
The jump to 2.5G is relatively cheap and easy. I completely agree that 5G is a bad idea compared to 10G, which costs the same and has many more devices available. I don’t think this will change any time in the future.

Looks like the RPI-5 handles the 5GbE NIC at ~3.3 Gbit/s over PCIe 2.0. It doesn’t seem worth upgrading from 2.5GbE unless some PCIe 3 lanes are left over.
https://fixupx.com/will_whang/status/1797053374199959813
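
For what it's worth, ~3.3 Gbit/s is roughly what the arithmetic predicts for PCIe 2.0. A rough sketch: the single lane, the 128-byte max payload and the ~24 bytes of per-packet overhead are assumptions here, not values taken from the linked test.

```shell
# Rough payload throughput of a PCIe link, in Gbit/s.
# Arguments: transfer rate (GT/s), lanes, max payload (bytes), per-packet overhead (bytes).
# The payload and overhead values used in the call below are assumptions.
pcie_payload_gbps() {
    awk -v gts="$1" -v lanes="$2" -v mps="$3" -v ovh="$4" 'BEGIN {
        line = gts * lanes * 8 / 10    # 8b/10b encoding: 8 data bits per 10 line bits
        printf "%.2f\n", line * mps / (mps + ovh)
    }'
}

pcie_payload_gbps 5 1 128 24
```

With these inputs the estimate lands a bit under 3.4 Gbit/s, so a 5GbE NIC behind one Gen2 lane cannot reach line rate whatever the driver does.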

Thanks for that comparison. I still think that this particular speed will be skipped by most of us: it’s just way too expensive compared to 10G and problematic in hardware (transceivers, switches, routers). 2.5G was a cheap bonus over gigabit and an easy upgrade :slight_smile:

Any news on 5B+?
It’s been a while.

Given that there are still strange/bad performance measurements on the DDR5 with the latest DDR SPL code on Rock5 ITX (see DRAM speed on ROCK 5 ITX), I think it’s prudent to let the engineers try to sort everything out before deciding what to do. If in the end they have to choose a different DRAM chip or slightly adjust routing to use the optimal timings, you’ll probably be very glad to have waited a little bit more for an optimal product.

From the source I know, there was a plan to release the Rock 5B Plus in July, so stay tuned! :wink:

Completely agree.
We all hope that DDR5 can increase RAM speed compared to DDR4. For now, all other boards released with this pairing (RK3588 and DDR5) show no speed improvement. If the final result really depends on the RAM chip, then a simple software update will not help those.

With most recent v1.16 DRAM initialization BLOBs it’s even worse compared to LPDDR4X, just check Willy’s aforementioned tests with different BLOBs.

Confirmed, I even ran a quick synthetic test that took twice as long to execute on the ITX as on the rock5b. I need to see if I still have the code in order to share it.

Edit: here it is:

rock@rock-5b:~$ time taskset -c 4 ./a.out 10
malloc...
fill...
scan...

real    6m2.604s
user    6m1.861s
sys     0m0.648s

willy@rock5-itx:~$ time taskset -c 4 ./a.out 10
malloc...
fill...
scan...

real    11m15.793s
user    11m14.447s
sys     0m1.101s

Both run at the same CPU frequency, and rock5-itx is almost twice as slow due to v1.16’s DDR5 timings.

I’m going to put the tool on my github for easier testing, will update the URL shortly.
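
To put a number on "twice as slow", the two `real` times above can be compared directly. A small sketch; the helper names `to_seconds` and `slowdown` are made up for illustration:

```shell
# Convert a `time`-style duration such as 6m2.604s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the second duration to the first.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", b / a }'
}

slowdown 6m2.604s 11m15.793s
```

That works out to roughly 1.86x, i.e. the ITX run takes about 86% longer.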

Sure. Sure. I don’t want anyone to rush the board to market, I was just curious because everything has gone pretty much silent for quite a while and I’ve been waiting for any news at all.

I’ve now added a reproducer to the ramspeed repo and updated the procedure on the rock5-itx thread above.

For a real-world RAM use case, try llama.cpp from the Phoronix Test Suite or GitHub.

In my tests it caps at 20 GB/s or so, i.e. 5 t/s on 7B Q4 models.
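
The 20 GB/s and 5 t/s figures are consistent with each other if token generation is purely bandwidth-bound, since every generated token has to stream all the weights from RAM once. A back-of-the-envelope sketch, assuming about 4.5 bits per weight for a Q4 model (an assumption; the exact figure depends on the quantization variant):

```shell
# Upper bound on tokens/s when each token reads every weight from RAM once.
# Arguments: bandwidth (GB/s), parameters (billions), bits per weight.
# 4.5 bits per weight for Q4 is an assumption.
max_tokens_per_s() {
    awk -v bw="$1" -v params="$2" -v bits="$3" 'BEGIN {
        model_gb = params * bits / 8    # size of the weights in GB
        printf "%.1f\n", bw / model_gb
    }'
}

max_tokens_per_s 20 7 4.5
```

This is only a ceiling. It says nothing about whether the CPU can keep up with it.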

Has the SPI been cleared? I saw that mismatching DDR and SPL blobs can disable the DMC leading to a performance impact.

The thing is that llama.cpp also depends on calculations, so you never know if you’re CPU-bound or RAM-bound until you can test on another machine with a different CPU and the same RAM, or different RAM and the same CPU. I happen to have access to an 80-core Ampere Altra made of Neoverse-N1 cores, which are exactly the same as Cortex-A76. Llama.cpp is quite fast there, and there are 6 DDR4 channels. But once limited to 4 cores, it’s basically the same speed as on the rk3588, showing that the 4 A76 there are delivering what they can and are a more important limiting factor than the DRAM speed.

@willy might answer this (I personally haven’t conducted any tests on RK3588 for quite some time)

That’s an interesting bit of information! With the BSP kernel running, looking below /sys/class/devfreq/dmc (or checking whether this directory exists) or at /sys/kernel/debug/clk/clk_summary might already be sufficient?

What should I look for in this directory?
I looked at max_freq on the Rock 5B and Rock 5B+ and they are 2112000000 and 2736000000 respectively… :thinking:
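
A small sketch of what could be read from that directory, using the standard devfreq sysfs attributes. Values are in Hz. The function name `dmc_status` is made up, and the directory is a parameter only so it can be pointed at a copy for testing:

```shell
# Print the DMC devfreq state, or report that the node is missing.
# A missing directory suggests the DMC driver is not active.
dmc_status() {
    dir="${1:-/sys/class/devfreq/dmc}"
    if [ ! -d "$dir" ]; then
        echo "dmc: not present"
        return 1
    fi
    for f in governor cur_freq min_freq max_freq available_frequencies; do
        [ -r "$dir/$f" ] && echo "$f: $(cat "$dir/$f")"
    done
    return 0
}
```

Calling `dmc_status` with no argument reads the real node. Comparing `cur_freq` while idle and under a memory-heavy load shows whether the governor actually scales the clock.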

Now please run sbc-bench on the 5B+ and post the results link here so we can check whether the DRAM latency issue is present on your system. If not, Joshua might have nailed it with ‘DMC disabled’, and @willy may have tested with the DRAM clock set to the absolute minimum instead of 2400 MHz.

Amazing, you’re totally right, that was the problem! I’m seeing that the dmc runs at 2.4 GHz while the memory is being filled and goes down to 534 MHz during the scan. Apparently both dmc_ondemand and simple_ondemand are dumb enough to ignore memory reads!!! And I can’t find a configurable one to fix that. Thus I did this:

# cat /sys/class/devfreq/dmc/max_freq > /sys/class/devfreq/dmc/min_freq

and now my ramlat numbers are wayyyyyy better:

willy@rock5:/tmp$ taskset -c 4 ./ramlat -s -n
   size:  1x32  2x32  1x64  2x64 1xPTR 2xPTR 4xPTR 8xPTR
     4k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.327 
     8k: 1.752 1.753 1.752 1.753 1.752 1.752 1.753 3.417 
    16k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.417 
    32k: 1.752 1.753 1.752 1.753 1.752 1.753 1.753 3.420 
    64k: 1.754 1.753 1.754 1.753 1.754 1.754 1.755 3.420 
   128k: 5.596 5.453 5.564 5.450 5.526 6.081 7.460 13.29 
   256k: 8.115 8.074 8.125 8.084 8.095 8.254 9.772 15.81 
   512k: 13.82 13.36 13.75 13.36 13.74 13.93 15.79 22.96 
  1024k: 41.87 39.68 41.90 39.56 41.99 40.28 40.86 46.22 
  2048k: 52.99 47.90 52.01 47.76 52.38 48.69 48.04 54.20 
  4096k: 93.04 89.75 93.46 89.62 93.83 88.62 88.12 90.95 
  8192k: 118.5 108.0 114.8 106.9 113.0 107.3 106.7 110.9 
 16384k: 139.5 131.8 136.8 130.8 136.2 130.6 129.6 133.0 

I’m going to re-run the ramwalk test now.

Edit: ramwalk on rock5-itx is now 8% slower than on rock5b instead of 80% slower! Much better! I’ll need to retest with the 2736 MHz DDR init code, but for this I need to force to boot from the SD, thus make a contact using a resistor, so I’ll do it once I’m home :slight_smile:
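
The min_freq workaround from this post can be wrapped so that the previous value is kept for restoring later. A minimal sketch: the function name `pin_devfreq_max` and the directory parameter are made up for illustration, and writing to the real sysfs node needs root.

```shell
# Pin a devfreq device at its highest frequency by raising min_freq to max_freq.
# Prints the previous min_freq so it can be written back afterwards.
pin_devfreq_max() {
    dir="${1:-/sys/class/devfreq/dmc}"
    [ -w "$dir/min_freq" ] || { echo "cannot write $dir/min_freq" >&2; return 1; }
    old=$(cat "$dir/min_freq")
    cat "$dir/max_freq" > "$dir/min_freq" || return 1
    echo "$old"
}
```

Something like `old=$(pin_devfreq_max)` before a benchmark and `echo "$old" > /sys/class/devfreq/dmc/min_freq` afterwards would undo the change. The setting does not survive a reboot.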

From my understanding, the DMC is disabled to maintain stability as the memory cannot train/initialize properly when the DDR and SPL blobs do not match.

This can be observed in U-Boot when attaching a serial console:

ERROR:   loader&trust unmatch!!! Please update loader if need enable dmc
ERROR:   current trust bl31 need match with loader ddr bin V1.13 or newer
ERROR:   current loader need match with trust bl31 V1.38-V1.40
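
If the serial console output has been captured to a file, the mismatch can be checked for without reading the whole log. A sketch; the function name `has_dmc_mismatch` is made up, and the match string is copied from the console lines quoted above:

```shell
# Exit 0 if the given boot log contains the loader/trust mismatch error.
has_dmc_mismatch() {
    grep -q 'loader&trust unmatch' "$1"
}
```

For example, `has_dmc_mismatch boot.log && echo "DMC likely disabled"`, where `boot.log` is whatever file the console capture was saved to.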