Hello,
I have been wondering if DRAM speed could explain the ROCK 5 ITX compiling slightly slower than the ROCK 5B. Compile times are interesting because they're extremely sensitive to DRAM speed (somewhat less so on this CPU, which has 3MB of shared L3 cache, but still).
My tests are always the same: I compile the exact same code (haproxy-3.0.0) with a Canadian-cross gcc-4.7 producing code for i586, and the same compiler binary is used on both platforms. The difference is not huge but noticeable. My board boots with DDR init code v1.16 at 2400 MHz. I wanted to test 2736 MHz but didn't want to risk bricking the board by flashing something that wouldn't boot. However, I found on the board's schematic that it's possible to force booting from SD by connecting the maskrom button's output to a 20k resistor to ground. So I did this, booting Joshua Riek's Ubuntu image version 2.1.0, which still ships DDR init v1.11 at 2736 MHz, while 2.2.0 adopted v1.16. It worked fine, and to rule out any kernel/userland differences (although I know from experience that this test really doesn't depend on them), I booted the same image and used the SD card only to load U-Boot.
The results are the following. I measured the build time using both a 32- and a 64-bit compiler, since 32-bit compilers are always faster than 64-bit ones. These runs use either all 8 cores or only the 4 big ones. The values are the time in seconds to compile the code (averaged over 3-4 runs due to a +/- 0.15s variation between tests); lower is better:
Test | ROCK 5B LPDDR4X-4224 | ROCK-5-ITX LPDDR5-4800 | ROCK-5-ITX LPDDR5-5472 |
---|---|---|---|
4x32b | 24.4 | 25.6 | 24.2 |
4x64b | 25.4 | 26.5 | 25.2 |
8x32b | 19.3 | 19.8 | 19.1 |
8x64b | 20.9 | 21.5 | 20.9 |
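To put a number on "noticeable", the per-test slowdown of the stock ROCK 5 ITX (LPDDR5-4800) versus the ROCK 5B can be derived directly from the table above, e.g. with an awk one-liner:

```shell
# Per-test slowdown of ROCK 5 ITX LPDDR5-4800 vs ROCK 5B LPDDR4X-4224,
# computed from the build times (seconds) in the table above:
awk 'BEGIN {
    split("24.4 25.4 19.3 20.9", r5b);   # ROCK 5B column
    split("25.6 26.5 19.8 21.5", itx);   # ROCK 5 ITX LPDDR5-4800 column
    n = split("4x32b 4x64b 8x32b 8x64b", test);
    for (i = 1; i <= n; i++)
        printf "%s: +%.1f%%\n", test[i], (itx[i]/r5b[i] - 1) * 100;
}'
```

This gives roughly +4.9%, +4.3%, +2.6% and +2.9% respectively, i.e. a consistent few-percent gap, largest on the 4-big-cores runs.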
Thus it started to become obvious that LPDDR5 is far from on par with LPDDR4X, since it takes LPDDR5 at 2736 MHz to catch up with LPDDR4X at 2112 MHz. I compared the ramlat tests on each setup and found something quite intriguing: looking at the stable window sizes from 32M to 128M, we see this:
- ROCK 5B - LPDDR4X-4224
size: 1x32 2x32 1x64 2x64 1xPTR 2xPTR 4xPTR 8xPTR
32768k: 132.4 133.8 132.0 133.1 135.7 129.3 125.9 125.1
65536k: 138.8 139.3 138.7 139.2 138.6 131.8 128.8 131.3
131072k: 142.6 142.5 142.6 142.4 143.1 136.1 133.7 136.1
- ROCK 5 ITX - LPDDR5-4800
size: 1x32 2x32 1x64 2x64 1xPTR 2xPTR 4xPTR 8xPTR
32768k: 237.8 233.3 237.5 233.3 237.4 231.3 231.6 233.7
65536k: 248.4 243.9 246.8 242.8 246.2 241.5 242.0 243.5
131072k: 251.7 249.1 251.1 248.8 251.1 247.1 246.9 248.5
- ROCK 5 ITX - LPDDR5-5472
size: 1x32 2x32 1x64 2x64 1xPTR 2xPTR 4xPTR 8xPTR
32768k: 140.8 140.0 140.8 139.8 140.8 137.4 138.8 132.4
65536k: 145.5 143.5 145.3 143.2 145.3 142.4 142.8 143.0
131072k: 147.1 145.5 146.6 145.4 146.8 144.2 144.6 148.4
So roughly speaking, the ROCK 5B shows a latency of around 138ns, while the ROCK 5 ITX at 2736 MHz (the initial goal) is around 145ns, hence slightly slower; though there's no reason to expect it to be faster, since absolute latency is normally unrelated to the 30% frequency increase. But the ROCK 5 ITX at 2400 MHz (the new default) is around 245ns, i.e. 70% slower than when configured at 2736 MHz.
The measured bandwidth, on the other hand, only varies by ~7%, which might explain why this went unnoticed until now.
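For the record, the ~70% figure follows directly from the two averaged latencies quoted above:

```shell
# Latency penalty of the 2400 MHz default vs the 2736 MHz config,
# using the ~245ns and ~145ns averages from the ramlat tables above:
awk 'BEGIN { printf "+%.0f%%\n", (245/145 - 1) * 100 }'   # prints +69%
```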
Thus I’m wondering a few things related to this:
- in a few excessively laconic git commits for rkbin, I've only seen "improve stability" to justify the rollback from 2736 to 2400 MHz, without any mention of a reported issue or a precise concern. There's not even a mention of attempts at intermediate speeds (e.g. 2666, which might match some existing effective operating points). How did the issues manifest? Were other frequencies tested?
- is the issue related to the signal routing on the ROCK 5 ITX board, to the DRAM chips, or to the SoC itself? Only the first two would leave hope that a future board revision could bring the higher frequency back.
- the +70% latency increase at 2400 MHz compared to 2736 MHz looks totally abnormal and might result from a bad timing somewhere (an improperly programmed RAS or CAS counter maybe?). It would be nice if someone involved in this change could have a look to figure out what's really happening there.
For the record I performed the measurement this way:
git clone https://github.com/wtarreau/ramspeed
cd ramspeed
make -j
taskset -c 7 ./ramlat -n -s 100 524288    # pinned to a big (Cortex-A76) core