DRAM speed capped to 4266 MT/s on O6?

Hi,

I noticed that the latest BIOS update apparently changed the DRAM frequency from 2750 to 3000 MHz (thus 6000 MT/s). I tested it and saw no change at all in my measurements. I found it strange that the MEM_CFG_MEMFREQ variable appears nowhere in the whole project and subprojects, but maybe I’m missing something.

Since I remembered that my first attempt at rebuilding the BIOS gave me a BIOS for the Merak model instead of the O6, and that this image has a lot of settings available in the BIOS including the DRAM speed, I decided to give it a go again and perform measurements with rambw from ramspeed for each and every value. The results are stunning:

[Image: ramspeed2]

So the measured performance is perfectly linear from 1600 MT/s (16.5 GB/s) to 4266 MT/s (40.2 GB/s), then it reaches a plateau and all higher settings provide the exact same data rate. Earlier I found 40.2 GB/s a bit weak for 5500 MT/s, since I’m used to seeing about 60% efficiency, though it was not dramatic either, especially for a product in beta stage. But now I figure that we are indeed observing this 60% for all values up to 4266, and that all higher ones have no effect, even the 5500 that’s shipped by default.

The SoC appears to claim support for 6400 MT/s, and the DRAM chips are H58G56AK6BX069, which are rated for 6400 MT/s. So if everything worked as intended, we should get 50% more DRAM bandwidth than at 4266 MT/s (6400 / 4266 ≈ 1.5).
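For anyone who wants to redo the arithmetic, here is a rough sketch of how the efficiency figures can be derived. It assumes a 128-bit LPDDR5 interface (the width is an assumption in this sketch, not something measured here), and the function names are only illustrative.

```python
# Theoretical peak DRAM bandwidth and the efficiency implied by a measurement.
# Assumption: 128-bit wide LPDDR5 interface; adjust BUS_WIDTH_BITS if wrong.

BUS_WIDTH_BITS = 128


def peak_gbps(mt_per_s, bus_width_bits=BUS_WIDTH_BITS):
    """Peak bandwidth in GB/s (10^9 bytes/s) for a transfer rate in MT/s."""
    return mt_per_s * 1e6 * (bus_width_bits / 8) / 1e9


def efficiency(measured_gbps, mt_per_s, bus_width_bits=BUS_WIDTH_BITS):
    """Fraction of the theoretical peak reached by a measured throughput."""
    return measured_gbps / peak_gbps(mt_per_s, bus_width_bits)


if __name__ == "__main__":
    measured = 40.2  # GB/s, the plateau value reported above
    for rate in (4266, 5500, 6000, 6400):
        print(f"{rate} MT/s: peak {peak_gbps(rate):6.1f} GB/s, "
              f"{measured} GB/s is {efficiency(measured, rate):.0%} of peak")
```

Under that assumption, 4266 MT/s gives a peak of about 68.3 GB/s, so 40.2 GB/s is roughly 59% of it, while the same 40.2 GB/s would be only about 46% of the 88 GB/s peak at 5500 MT/s.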

I think someone from Radxa and/or CIX who has access to the setup code should have a look into this. Maybe a power saving register somewhere caps the DRAM controller frequency for example. Or maybe a DDR4 vs DDR5 setting is incorrect and prevents the controller from going beyond 4266. In any case it’s a bit sad to lose 33% DRAM performance on this board that promises to do great things with AI, and likely outperform quite a number of x86 PCs in this domain!

So it’s 40.2 GB/s instead of 100 GB/s?

I wonder how many things will not be in sync with the marketing information on the product page…

No, it’s 40 measured instead of ~60 expected. Usually with DRAM you observe ~60% efficiency. Please keep in mind that the product is still being troubleshot; it’s normal to discover such problems at an early stage, and that’s the purpose of the debug party. In fact I was reassured when I saw the LPDDR5-6400 RAM chips, because that does match the marketing info ;)

Well, website says:

128bit LPDDR5

Frequency 5500MT/s
Up to 64GB
100GB/s Memory Bandwidth

And sorry, but the debug party was in January. It’s March now, and many people have already bought that SBC (rather than receiving it as debug party members did).

Given that no update has been provided since the early tests and the various reports, I consider that it’s still being debugged and that everything is late. I mean, the DRAM frequency is not nominal yet, CPU frequencies are still lower, the CPU enumeration order is a real mess, we still don’t have the up-to-date kernel sources nor even all the edk2 sources, and I’m not sure that all distros now boot on it out of the box. I’m still waiting for these issues to be addressed before ordering a larger one. Yes, I know the devices are already for sale, and I also find this too early. But that doesn’t change the fact that I consider the debug phase still ongoing, since the reports have not yet been converted into fixes.

It’s in our own tree. However, this tree is not final yet (will be rewritten to submit back to CIX), so edk2-cix is still pointing to some old commit.

It uses different memory config than O6 so it is working as intended.

so edk2-cix is still pointing to some old commit.

Indeed you’re right, I can find your commit 4434e41 there in branch “cix_beta2_radxa_dev_250110_patch”. By default, however, it builds on commit 89ca6898, which is referenced in the build output of edk2-cix. I don’t know why, since it’s not mentioned in the tree itself. I’ll try to change the version in the build scripts to see if it changes anything, thanks for the info!

It uses different memory config than O6 so it is working as intended.

Maybe, but I’m getting the exact same performance with this BIOS (i.e. the O6 at 5500 has the exact same performance as Merak at 4266). I switched to Merak only so that I have access to the options in the BIOS and can compare without reflashing each time.

edk2-cix is more for release management, so the submodules will likely only be updated before we release a new version. Users are expected to either use a tagged commit to reproduce a specific release, or manage the submodules themselves for development.

OK, good to know, thank you!

Maybe check whether you are running the benchmark on a small core. You can use taskset to pin it to the big cores.

Need to tweak here. But since this BIOS does not officially support ACPI, we will keep changes to this area on hold.

Sadly no, rest assured that I’m always using taskset to pin to big cores only. And after rebuilding with the latest “3000 MHz” version, I’m still getting the exact same performance as with the 2750 MHz one, which is the exact same performance as with Merak at 4266 MT/s (2133 MHz):

$ taskset -c 0,5-11 ~/ramspeed/rambw 1000 30 4194304
40200
40227
40143
40211
40192

Thus I’m fairly confident that, despite the latest changes in your branch, the RAM still runs at around 4266 MT/s.
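To rule out measurement noise as an explanation, a quick sketch like the one below can compare the run-to-run spread of the rambw samples with the gain a real frequency step should produce. It assumes the rambw figures are in MB/s (which matches the 40.2 GB/s quoted earlier) and that bandwidth scales linearly with the transfer rate, as observed below 4266 MT/s. The helper names are made up for illustration.

```python
from statistics import mean


def summarize(samples_mbps):
    """Return (mean in GB/s, relative spread) for a list of MB/s samples."""
    avg = mean(samples_mbps)
    spread = (max(samples_mbps) - min(samples_mbps)) / avg
    return avg / 1000.0, spread


def expected_gain(old_mt, new_mt):
    """Relative bandwidth gain expected if throughput scales with MT/s."""
    return new_mt / old_mt - 1.0


if __name__ == "__main__":
    # Samples copied from the rambw run above (assumed to be MB/s).
    samples = [40200, 40227, 40143, 40211, 40192]
    avg_gb, spread = summarize(samples)
    gain = expected_gain(5500, 6000)
    print(f"mean: {avg_gb:.2f} GB/s, run-to-run spread: {spread:.2%}")
    print(f"expected gain from 5500 to 6000 MT/s: {gain:.1%}")
```

The spread between runs is around 0.2%, while a genuine step from 5500 to 6000 MT/s should be worth about 9% under the linear-scaling assumption, so an unchanged result is unlikely to be hidden by noise.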

If you are booting with ACPI, then maybe give DT a try.
Also, does dmidecode report the “current” memory speed as configured in the BIOS?

Hmmm interesting. However it does not have any effect, sadly:

# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 2600.0000 800.0000 1625.0000
  1    0      0    0 1:1:0            yes 1800.0000 800.0000  800.0000
  2    0      0    1 2:2:0            yes 1800.0000 800.0000  800.0000
  3    0      0    2 3:3:0            yes 1800.0000 800.0000  800.0000
  4    0      0    3 4:4:0            yes 1800.0000 800.0000  800.0000
  5    0      0    0 5:5:0:0          yes 2300.0000 800.0000  800.0000
  6    0      0    1 6:6:0:0          yes 2300.0000 800.0000  800.0000
  7    0      0    2 7:7:0:0          yes 2200.0000 800.0000  800.0000
  8    0      0    3 8:8:0:0          yes 2200.0000 800.0000  800.0000
  9    0      0    4 9:9:0:0          yes 2500.0000 800.0000  800.0000
 10    0      0    5 10:10:0:0        yes 2500.0000 800.0000  800.0000
 11    0      0    6 11:11:0:0        yes 2600.0000 800.0000 1625.0000

I think the issue is that since we’re booting on a big core, it gets assigned CPU0 in Linux. At best we could imagine trying to change the order so that all 4 big cores appear first. But it was suggested elsewhere that on Arm it’s more common to find small cores before large ones, so here the fix would be to make the board boot from a small core, but I’m not sure if this is possible at all. Maybe the SoC decides to boot on a big one?

I’ll try to change that file (thank you for the pointer) to make all big cores appear there. At least if it works, it will ease IRQ configuration and taskset usage.
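In the meantime, the scattered numbering can be worked around by deriving the taskset CPU list from the `lscpu -e` output instead of typing it by hand. The sketch below is only an illustration: the helper names are hypothetical, the 1800 MHz threshold is an assumption used to separate the small cores, and the sample text is the listing above.

```python
SAMPLE = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 2600.0000 800.0000 1625.0000
  1    0      0    0 1:1:0            yes 1800.0000 800.0000  800.0000
  2    0      0    1 2:2:0            yes 1800.0000 800.0000  800.0000
  3    0      0    2 3:3:0            yes 1800.0000 800.0000  800.0000
  4    0      0    3 4:4:0            yes 1800.0000 800.0000  800.0000
  5    0      0    0 5:5:0:0          yes 2300.0000 800.0000  800.0000
  6    0      0    1 6:6:0:0          yes 2300.0000 800.0000  800.0000
  7    0      0    2 7:7:0:0          yes 2200.0000 800.0000  800.0000
  8    0      0    3 8:8:0:0          yes 2200.0000 800.0000  800.0000
  9    0      0    4 9:9:0:0          yes 2500.0000 800.0000  800.0000
 10    0      0    5 10:10:0:0        yes 2500.0000 800.0000  800.0000
 11    0      0    6 11:11:0:0        yes 2600.0000 800.0000 1625.0000
"""


def parse_max_mhz(lscpu_text):
    """Map CPU number -> MAXMHZ from `lscpu -e` style output."""
    lines = [ln for ln in lscpu_text.splitlines() if ln.strip()]
    header = lines[0].split()
    cpu_col = header.index("CPU")
    max_col = header.index("MAXMHZ")
    result = {}
    for ln in lines[1:]:
        fields = ln.split()
        result[int(fields[cpu_col])] = float(fields[max_col])
    return result


def cpus_faster_than(max_mhz, threshold_mhz):
    """CPUs whose maximum frequency is strictly above the threshold."""
    return sorted(cpu for cpu, mhz in max_mhz.items() if mhz > threshold_mhz)


def to_cpu_list(cpus):
    """Format CPU numbers as a taskset-style list, e.g. [0, 5, 6, 7] -> '0,5-7'."""
    parts = []
    start = prev = None
    for c in sorted(cpus):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = c
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


if __name__ == "__main__":
    # Assumption: anything above 1800 MHz is not a small core.
    print(to_cpu_list(cpus_faster_than(parse_max_mhz(SAMPLE), 1800)))
```

With the listing above and that threshold, the result is the same `0,5-11` list passed to taskset in the rambw run.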

Edit: I changed the file to try to place big cores first, then medium ones, then little ones, and it didn’t change anything either. As such, I’m not really sure this file is used by Linux during CPU enumeration. Or maybe we need to change one of the CPU numbers there? I’m seeing the same number present twice on each line; maybe one is an index and the other a hardware CPU ID?

Edit: no change either.

The board boots by default in device tree mode after flashing, so I was in DT. I tried as well with ACPI, but both show the exact same performance.

Oh, good idea, it’s indeed reported as 6000 MT/s:

# dmidecode -t 17 | grep MT
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s

which means that the change applied in the makefile was properly considered there.
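For repeated checks after each reflash, the configured speed can be pulled out of the dmidecode text with a small script. This is just a sketch with a hypothetical helper name, and it only parses text that has already been captured. Keep in mind that dmidecode reads the SMBIOS tables provided by the firmware, so these values show what the BIOS declares rather than a measured clock.

```python
import re

SAMPLE = """\
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
        Speed: 6000 MT/s
        Configured Memory Speed: 6000 MT/s
"""

# Matches only the "Configured Memory Speed" lines, not the plain "Speed" ones.
_CONFIGURED = re.compile(
    r"^\s*Configured Memory Speed:\s*(\d+)\s*MT/s", re.MULTILINE
)


def configured_speeds(dmidecode_text):
    """Return every 'Configured Memory Speed' value (in MT/s) found in the text."""
    return [int(value) for value in _CONFIGURED.findall(dmidecode_text)]


if __name__ == "__main__":
    speeds = configured_speeds(SAMPLE)
    print(f"{len(speeds)} memory devices, distinct speeds: {sorted(set(speeds))}")
```

If the set of distinct speeds ever contains more than one value, or differs from what was built into the BIOS, that would point at the firmware tables rather than at the DRAM controller itself.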

Do you have a way to measure the frequency on a test point with a frequency meter, or to look at the signals with a scope? I know this can require pretty serious equipment at such frequencies, which is why I’m asking out of curiosity. I strongly suspect that something in the SoC itself is capping the frequency to LPDDR4X speeds, and that once found it’ll unlock +50% performance.

We will try reproducing your finding first, and then discuss with CIX about potential causes.
