Rock 5A LPDDR4X memory bandwidth?

I’m trying to figure out what memory bandwidth the Rock 5A has.

From the RK3588 and RK3588S datasheets it looks like both processors have the same memory bandwidth: four channels of 16-bit LPDDR4/4x or LPDDR5.

According to the specs, the Rock 5A has LPDDR4X memory. When I compare the schematics for the Rock 5A and 5B, though, the DDR memory seems to connect to the CPU differently. Does the 5B use four channels and the 5A only two? I don’t understand the schematics.

In any case, according to https://en.wikipedia.org/wiki/LPDDR, the transfer rate of LPDDR4X is 4267 MT/s. With 16 data pins (2 bytes) per channel, that gives 4267 × 2 ≈ 8.53 GB/s per channel. Four channels would then give a theoretical upper limit of about 34 GB/s. Right?
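Here is a quick sketch of that arithmetic in C. It is a helper of my own, not something taken from a datasheet. The formula is peak = transfer rate × bytes per transfer × channels, with the result in decimal MB/s (10^6 bytes/s).

```c
#include <assert.h>

/* Theoretical peak bandwidth in MB/s (10^6 bytes/s): every transfer moves
 * bus_bits / 8 bytes on each channel, and all channels run in parallel. */
static double peak_mb_per_s(double mt_per_s, int bus_bits, int channels)
{
    assert(bus_bits % 8 == 0 && channels > 0);
    return mt_per_s * (bus_bits / 8) * channels;
}
```

For example, peak_mb_per_s(4267, 16, 4) gives 34136 MB/s. With 4224 MT/s, the rate mentioned later in the thread, it gives 33792 MB/s. These are decimal megabytes, while mbw reports MiB/s.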

But are the memory channels interleaved, so that a plain sequential read benefits from all four channels? And is every channel equally close to every core?
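To illustrate what interleaving would mean, here is a hypothetical mapping in C. The 256-byte granule and the simple modulo scheme are assumptions for illustration only. The actual RK3588 address-to-channel mapping isn't stated anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL low-order interleaving: consecutive granule-sized chunks of
 * the physical address space rotate across the channels. The granule size
 * and this scheme are assumptions, not the documented RK3588 mapping. */
static unsigned channel_of(uint64_t phys_addr, uint64_t granule, unsigned channels)
{
    return (unsigned)((phys_addr / granule) % channels);
}

/* Count how many granules of a sequential access [start, start + len) land
 * on each channel. counts must have room for `channels` entries. */
static void count_channels(uint64_t start, uint64_t len, uint64_t granule,
                           unsigned channels, size_t *counts)
{
    assert(granule > 0 && channels > 0);
    for (unsigned c = 0; c < channels; c++)
        counts[c] = 0;
    for (uint64_t a = start - start % granule; a < start + len; a += granule)
        counts[channel_of(a, granule, channels)]++;
}
```

Under these assumptions, a 4 KiB sequential read starting at address 0 touches each of the four channels exactly 4 times. That even spread is what would let a single sequential stream use all channels.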

I ran some experiments with the memory bandwidth benchmark mbw on a Rock 5A with 16 GB of RAM. It runs a very minimal Armbian image: only 328 MB of RAM is in use, the rest is free, and the CPU load is 0.00. The results differ a lot between runs (see below), ranging from about 3,300 MiB/s to 22,600 MiB/s. This number is the copy rate, and every copied byte is both read and written, so 22 GB/s × 2 = 44 GB/s of memory traffic. But isn't that more than the theoretical upper limit of 34 GB/s I just calculated above?
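One complication when comparing against the peak: mbw's Copy column is in MiB/s (2^20 bytes), not GB/s. The small C helper below converts a copy rate into total DRAM traffic. It assumes that every copied byte is read once and written once and that nothing is served from cache.

```c
#include <assert.h>

/* Convert an mbw "Copy" rate (MiB/s of data copied) into DRAM traffic in
 * GB/s (10^9 bytes/s). Assumes one read plus one write per copied byte and
 * no cache hits, so the bus traffic is twice the copy rate. */
static double dram_traffic_gb_per_s(double copy_mib_per_s)
{
    assert(copy_mib_per_s >= 0.0);
    return 2.0 * copy_mib_per_s * 1048576.0 / 1e9;
}
```

For the 22642.569 MiB/s MCBLOCK average, this gives about 47.5 GB/s. That is well above the roughly 34 GB/s peak, so that figure can't be real DRAM traffic.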

Can someone explain what’s going on?

    root@rocky5-2:/home/enok# mbw 1024 -t2 -b $((1024*1024*16))
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 16777216 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MCBLOCK Elapsed: 0.30637        MiB: 1024.00000 Copy: 3342.353 MiB/s
    1       Method: MCBLOCK Elapsed: 0.30460        MiB: 1024.00000 Copy: 3361.775 MiB/s
    2       Method: MCBLOCK Elapsed: 0.30420        MiB: 1024.00000 Copy: 3366.251 MiB/s
    3       Method: MCBLOCK Elapsed: 0.30426        MiB: 1024.00000 Copy: 3365.532 MiB/s
    4       Method: MCBLOCK Elapsed: 0.30397        MiB: 1024.00000 Copy: 3368.709 MiB/s
    5       Method: MCBLOCK Elapsed: 0.30193        MiB: 1024.00000 Copy: 3391.548 MiB/s
    6       Method: MCBLOCK Elapsed: 0.30176        MiB: 1024.00000 Copy: 3393.380 MiB/s
    7       Method: MCBLOCK Elapsed: 0.29961        MiB: 1024.00000 Copy: 3417.833 MiB/s
    8       Method: MCBLOCK Elapsed: 0.29930        MiB: 1024.00000 Copy: 3421.362 MiB/s
    9       Method: MCBLOCK Elapsed: 0.29951        MiB: 1024.00000 Copy: 3418.918 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.30255        MiB: 1024.00000 Copy: 3384.559 MiB/s
    root@rocky5-2:/home/enok# mbw 1024 -t2 -b $((1024*1024*16))
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 16777216 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MCBLOCK Elapsed: 0.30750        MiB: 1024.00000 Copy: 3330.038 MiB/s
    1       Method: MCBLOCK Elapsed: 0.30648        MiB: 1024.00000 Copy: 3341.186 MiB/s
    2       Method: MCBLOCK Elapsed: 0.30652        MiB: 1024.00000 Copy: 3340.761 MiB/s
    3       Method: MCBLOCK Elapsed: 0.30570        MiB: 1024.00000 Copy: 3349.700 MiB/s
    4       Method: MCBLOCK Elapsed: 0.30438        MiB: 1024.00000 Copy: 3364.271 MiB/s
    5       Method: MCBLOCK Elapsed: 0.30430        MiB: 1024.00000 Copy: 3365.078 MiB/s
    6       Method: MCBLOCK Elapsed: 0.30422        MiB: 1024.00000 Copy: 3366.018 MiB/s
    7       Method: MCBLOCK Elapsed: 0.30442        MiB: 1024.00000 Copy: 3363.730 MiB/s
    8       Method: MCBLOCK Elapsed: 0.30434        MiB: 1024.00000 Copy: 3364.625 MiB/s
    9       Method: MCBLOCK Elapsed: 0.30416        MiB: 1024.00000 Copy: 3366.660 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.30520        MiB: 1024.00000 Copy: 3355.157 MiB/s
    root@rocky5-2:/home/enok# mbw 1024 -t2 -b $((1024*1024))    
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 1048576 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MCBLOCK Elapsed: 0.27831        MiB: 1024.00000 Copy: 3679.350 MiB/s
    1       Method: MCBLOCK Elapsed: 0.27823        MiB: 1024.00000 Copy: 3680.382 MiB/s
    2       Method: MCBLOCK Elapsed: 0.27792        MiB: 1024.00000 Copy: 3684.540 MiB/s
    3       Method: MCBLOCK Elapsed: 0.27777        MiB: 1024.00000 Copy: 3686.450 MiB/s
    4       Method: MCBLOCK Elapsed: 0.27767        MiB: 1024.00000 Copy: 3687.804 MiB/s
    5       Method: MCBLOCK Elapsed: 0.27765        MiB: 1024.00000 Copy: 3688.070 MiB/s
    6       Method: MCBLOCK Elapsed: 0.27773        MiB: 1024.00000 Copy: 3687.074 MiB/s
    7       Method: MCBLOCK Elapsed: 0.27759        MiB: 1024.00000 Copy: 3688.920 MiB/s
    8       Method: MCBLOCK Elapsed: 0.27754        MiB: 1024.00000 Copy: 3689.492 MiB/s
    9       Method: MCBLOCK Elapsed: 0.27789        MiB: 1024.00000 Copy: 3684.872 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.27783        MiB: 1024.00000 Copy: 3685.692 MiB/s
    root@rocky5-2:/home/enok# mbw 1024 -t2
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 262144 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MCBLOCK Elapsed: 0.15515        MiB: 1024.00000 Copy: 6599.979 MiB/s
    1       Method: MCBLOCK Elapsed: 0.15248        MiB: 1024.00000 Copy: 6715.723 MiB/s
    2       Method: MCBLOCK Elapsed: 0.15353        MiB: 1024.00000 Copy: 6669.532 MiB/s
    3       Method: MCBLOCK Elapsed: 0.15216        MiB: 1024.00000 Copy: 6729.581 MiB/s
    4       Method: MCBLOCK Elapsed: 0.15102        MiB: 1024.00000 Copy: 6780.694 MiB/s
    5       Method: MCBLOCK Elapsed: 0.15252        MiB: 1024.00000 Copy: 6714.050 MiB/s
    6       Method: MCBLOCK Elapsed: 0.15204        MiB: 1024.00000 Copy: 6734.893 MiB/s
    7       Method: MCBLOCK Elapsed: 0.15100        MiB: 1024.00000 Copy: 6781.457 MiB/s
    8       Method: MCBLOCK Elapsed: 0.15196        MiB: 1024.00000 Copy: 6738.748 MiB/s
    9       Method: MCBLOCK Elapsed: 0.15209        MiB: 1024.00000 Copy: 6732.900 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.15240        MiB: 1024.00000 Copy: 6719.376 MiB/s
    root@rocky5-2:/home/enok# mbw 1024 -t2
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 262144 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MCBLOCK Elapsed: 0.15048        MiB: 1024.00000 Copy: 6804.755 MiB/s
    1       Method: MCBLOCK Elapsed: 0.14979        MiB: 1024.00000 Copy: 6836.146 MiB/s
    2       Method: MCBLOCK Elapsed: 0.14968        MiB: 1024.00000 Copy: 6841.490 MiB/s
    3       Method: MCBLOCK Elapsed: 0.14930        MiB: 1024.00000 Copy: 6858.490 MiB/s
    4       Method: MCBLOCK Elapsed: 0.14919        MiB: 1024.00000 Copy: 6863.777 MiB/s
    5       Method: MCBLOCK Elapsed: 0.14912        MiB: 1024.00000 Copy: 6866.815 MiB/s
    6       Method: MCBLOCK Elapsed: 0.14767        MiB: 1024.00000 Copy: 6934.475 MiB/s
    7       Method: MCBLOCK Elapsed: 0.14780        MiB: 1024.00000 Copy: 6928.141 MiB/s
    8       Method: MCBLOCK Elapsed: 0.14778        MiB: 1024.00000 Copy: 6929.125 MiB/s
    9       Method: MCBLOCK Elapsed: 0.14797        MiB: 1024.00000 Copy: 6920.322 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.14888        MiB: 1024.00000 Copy: 6878.073 MiB/s
    root@rocky5-2:/home/enok# mbw 1024
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 262144 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MEMCPY  Elapsed: 0.31557        MiB: 1024.00000 Copy: 3244.942 MiB/s
    1       Method: MEMCPY  Elapsed: 0.31837        MiB: 1024.00000 Copy: 3216.424 MiB/s
    2       Method: MEMCPY  Elapsed: 0.31786        MiB: 1024.00000 Copy: 3221.534 MiB/s
    3       Method: MEMCPY  Elapsed: 0.31565        MiB: 1024.00000 Copy: 3244.058 MiB/s
    4       Method: MEMCPY  Elapsed: 0.31536        MiB: 1024.00000 Copy: 3247.093 MiB/s
    5       Method: MEMCPY  Elapsed: 0.31545        MiB: 1024.00000 Copy: 3246.187 MiB/s
    6       Method: MEMCPY  Elapsed: 0.31552        MiB: 1024.00000 Copy: 3245.436 MiB/s
    7       Method: MEMCPY  Elapsed: 0.31535        MiB: 1024.00000 Copy: 3247.186 MiB/s
    8       Method: MEMCPY  Elapsed: 0.31566        MiB: 1024.00000 Copy: 3244.048 MiB/s
    9       Method: MEMCPY  Elapsed: 0.31542        MiB: 1024.00000 Copy: 3246.455 MiB/s
    AVG     Method: MEMCPY  Elapsed: 0.31602        MiB: 1024.00000 Copy: 3240.300 MiB/s
    0       Method: DUMB    Elapsed: 0.31287        MiB: 1024.00000 Copy: 3272.904 MiB/s
    1       Method: DUMB    Elapsed: 0.31269        MiB: 1024.00000 Copy: 3274.777 MiB/s
    2       Method: DUMB    Elapsed: 0.31289        MiB: 1024.00000 Copy: 3272.684 MiB/s
    3       Method: DUMB    Elapsed: 0.31298        MiB: 1024.00000 Copy: 3271.795 MiB/s
    4       Method: DUMB    Elapsed: 0.31280        MiB: 1024.00000 Copy: 3273.626 MiB/s
    5       Method: DUMB    Elapsed: 0.31286        MiB: 1024.00000 Copy: 3273.029 MiB/s
    6       Method: DUMB    Elapsed: 0.31268        MiB: 1024.00000 Copy: 3274.924 MiB/s
    7       Method: DUMB    Elapsed: 0.31267        MiB: 1024.00000 Copy: 3274.966 MiB/s
    8       Method: DUMB    Elapsed: 0.31289        MiB: 1024.00000 Copy: 3272.716 MiB/s
    9       Method: DUMB    Elapsed: 0.31281        MiB: 1024.00000 Copy: 3273.532 MiB/s
    AVG     Method: DUMB    Elapsed: 0.31282        MiB: 1024.00000 Copy: 3273.495 MiB/s
    0       Method: MCBLOCK Elapsed: 0.16242        MiB: 1024.00000 Copy: 6304.836 MiB/s
    1       Method: MCBLOCK Elapsed: 0.16153        MiB: 1024.00000 Copy: 6339.380 MiB/s
    2       Method: MCBLOCK Elapsed: 0.16297        MiB: 1024.00000 Copy: 6283.519 MiB/s
    3       Method: MCBLOCK Elapsed: 0.16278        MiB: 1024.00000 Copy: 6290.776 MiB/s
    4       Method: MCBLOCK Elapsed: 0.16181        MiB: 1024.00000 Copy: 6328.371 MiB/s
    5       Method: MCBLOCK Elapsed: 0.16159        MiB: 1024.00000 Copy: 6336.869 MiB/s
    6       Method: MCBLOCK Elapsed: 0.16329        MiB: 1024.00000 Copy: 6271.167 MiB/s
    7       Method: MCBLOCK Elapsed: 0.16224        MiB: 1024.00000 Copy: 6311.793 MiB/s
    8       Method: MCBLOCK Elapsed: 0.16306        MiB: 1024.00000 Copy: 6279.743 MiB/s
    9       Method: MCBLOCK Elapsed: 0.16157        MiB: 1024.00000 Copy: 6337.810 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.16233        MiB: 1024.00000 Copy: 6308.328 MiB/s
    root@rocky5-2:/home/enok# mbw 4096
    Long uses 8 bytes. Allocating 2*536870912 elements = 8589934592 bytes of memory.
    Using 262144 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MEMCPY  Elapsed: 1.26159        MiB: 4096.00000 Copy: 3246.707 MiB/s
    1       Method: MEMCPY  Elapsed: 1.20823        MiB: 4096.00000 Copy: 3390.094 MiB/s
    2       Method: MEMCPY  Elapsed: 0.50646        MiB: 4096.00000 Copy: 8087.477 MiB/s
    3       Method: MEMCPY  Elapsed: 0.50663        MiB: 4096.00000 Copy: 8084.859 MiB/s
    4       Method: MEMCPY  Elapsed: 0.50672        MiB: 4096.00000 Copy: 8083.423 MiB/s
    5       Method: MEMCPY  Elapsed: 0.50682        MiB: 4096.00000 Copy: 8081.733 MiB/s
    6       Method: MEMCPY  Elapsed: 0.50679        MiB: 4096.00000 Copy: 8082.291 MiB/s
    7       Method: MEMCPY  Elapsed: 0.50687        MiB: 4096.00000 Copy: 8080.952 MiB/s
    8       Method: MEMCPY  Elapsed: 0.50697        MiB: 4096.00000 Copy: 8079.294 MiB/s
    9       Method: MEMCPY  Elapsed: 0.50704        MiB: 4096.00000 Copy: 8078.226 MiB/s
    AVG     Method: MEMCPY  Elapsed: 0.65241        MiB: 4096.00000 Copy: 6278.248 MiB/s
    0       Method: DUMB    Elapsed: 0.47094        MiB: 4096.00000 Copy: 8697.462 MiB/s
    1       Method: DUMB    Elapsed: 0.47085        MiB: 4096.00000 Copy: 8699.217 MiB/s
    2       Method: DUMB    Elapsed: 0.47095        MiB: 4096.00000 Copy: 8697.295 MiB/s
    3       Method: DUMB    Elapsed: 0.47084        MiB: 4096.00000 Copy: 8699.309 MiB/s
    4       Method: DUMB    Elapsed: 0.47097        MiB: 4096.00000 Copy: 8696.945 MiB/s
    5       Method: DUMB    Elapsed: 0.47097        MiB: 4096.00000 Copy: 8697.000 MiB/s
    6       Method: DUMB    Elapsed: 0.47100        MiB: 4096.00000 Copy: 8696.335 MiB/s
    7       Method: DUMB    Elapsed: 0.47090        MiB: 4096.00000 Copy: 8698.219 MiB/s
    8       Method: DUMB    Elapsed: 0.47110        MiB: 4096.00000 Copy: 8694.526 MiB/s
    9       Method: DUMB    Elapsed: 0.47099        MiB: 4096.00000 Copy: 8696.649 MiB/s
    AVG     Method: DUMB    Elapsed: 0.47095        MiB: 4096.00000 Copy: 8697.295 MiB/s
    0       Method: MCBLOCK Elapsed: 0.18193        MiB: 4096.00000 Copy: 22514.525 MiB/s
    1       Method: MCBLOCK Elapsed: 0.18064        MiB: 4096.00000 Copy: 22674.431 MiB/s
    2       Method: MCBLOCK Elapsed: 0.18076        MiB: 4096.00000 Copy: 22659.755 MiB/s
    3       Method: MCBLOCK Elapsed: 0.18101        MiB: 4096.00000 Copy: 22628.334 MiB/s
    4       Method: MCBLOCK Elapsed: 0.18088        MiB: 4096.00000 Copy: 22645.473 MiB/s
    5       Method: MCBLOCK Elapsed: 0.18073        MiB: 4096.00000 Copy: 22664.144 MiB/s
    6       Method: MCBLOCK Elapsed: 0.18074        MiB: 4096.00000 Copy: 22663.015 MiB/s
    7       Method: MCBLOCK Elapsed: 0.18090        MiB: 4096.00000 Copy: 22642.469 MiB/s
    8       Method: MCBLOCK Elapsed: 0.18069        MiB: 4096.00000 Copy: 22669.161 MiB/s
    9       Method: MCBLOCK Elapsed: 0.18072        MiB: 4096.00000 Copy: 22665.272 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.18090        MiB: 4096.00000 Copy: 22642.569 MiB/s
    root@rocky5-2:/home/enok# mbw 1024
    Long uses 8 bytes. Allocating 2*134217728 elements = 2147483648 bytes of memory.
    Using 262144 bytes as blocks for memcpy block copy test.
    Getting down to business... Doing 10 runs per test.
    0       Method: MEMCPY  Elapsed: 0.12354        MiB: 1024.00000 Copy: 8289.149 MiB/s
    1       Method: MEMCPY  Elapsed: 0.12347        MiB: 1024.00000 Copy: 8293.445 MiB/s
    2       Method: MEMCPY  Elapsed: 0.12344        MiB: 1024.00000 Copy: 8295.327 MiB/s
    3       Method: MEMCPY  Elapsed: 0.12349        MiB: 1024.00000 Copy: 8292.169 MiB/s
    4       Method: MEMCPY  Elapsed: 0.12347        MiB: 1024.00000 Copy: 8293.445 MiB/s
    5       Method: MEMCPY  Elapsed: 0.12348        MiB: 1024.00000 Copy: 8292.505 MiB/s
    6       Method: MEMCPY  Elapsed: 0.12348        MiB: 1024.00000 Copy: 8292.639 MiB/s
    7       Method: MEMCPY  Elapsed: 0.12349        MiB: 1024.00000 Copy: 8291.901 MiB/s
    8       Method: MEMCPY  Elapsed: 0.12351        MiB: 1024.00000 Copy: 8290.760 MiB/s
    9       Method: MEMCPY  Elapsed: 0.12351        MiB: 1024.00000 Copy: 8290.961 MiB/s
    AVG     Method: MEMCPY  Elapsed: 0.12349        MiB: 1024.00000 Copy: 8292.230 MiB/s
    0       Method: DUMB    Elapsed: 0.11363        MiB: 1024.00000 Copy: 9012.101 MiB/s
    1       Method: DUMB    Elapsed: 0.11368        MiB: 1024.00000 Copy: 9007.820 MiB/s
    2       Method: DUMB    Elapsed: 0.11363        MiB: 1024.00000 Copy: 9011.387 MiB/s
    3       Method: DUMB    Elapsed: 0.11359        MiB: 1024.00000 Copy: 9014.799 MiB/s
    4       Method: DUMB    Elapsed: 0.11362        MiB: 1024.00000 Copy: 9012.418 MiB/s
    5       Method: DUMB    Elapsed: 0.11471        MiB: 1024.00000 Copy: 8926.703 MiB/s
    6       Method: DUMB    Elapsed: 0.11366        MiB: 1024.00000 Copy: 9009.564 MiB/s
    7       Method: DUMB    Elapsed: 0.11361        MiB: 1024.00000 Copy: 9013.053 MiB/s
    8       Method: DUMB    Elapsed: 0.11363        MiB: 1024.00000 Copy: 9011.625 MiB/s
    9       Method: DUMB    Elapsed: 0.11363        MiB: 1024.00000 Copy: 9012.022 MiB/s
    AVG     Method: DUMB    Elapsed: 0.11374        MiB: 1024.00000 Copy: 9003.076 MiB/s
    0       Method: MCBLOCK Elapsed: 0.05301        MiB: 1024.00000 Copy: 19315.652 MiB/s
    1       Method: MCBLOCK Elapsed: 0.05176        MiB: 1024.00000 Copy: 19781.706 MiB/s
    2       Method: MCBLOCK Elapsed: 0.05226        MiB: 1024.00000 Copy: 19594.711 MiB/s
    3       Method: MCBLOCK Elapsed: 0.05186        MiB: 1024.00000 Copy: 19746.992 MiB/s
    4       Method: MCBLOCK Elapsed: 0.05178        MiB: 1024.00000 Copy: 19776.739 MiB/s
    5       Method: MCBLOCK Elapsed: 0.05155        MiB: 1024.00000 Copy: 19862.668 MiB/s
    6       Method: MCBLOCK Elapsed: 0.05161        MiB: 1024.00000 Copy: 19841.501 MiB/s
    7       Method: MCBLOCK Elapsed: 0.05122        MiB: 1024.00000 Copy: 19993.752 MiB/s
    8       Method: MCBLOCK Elapsed: 0.05140        MiB: 1024.00000 Copy: 19920.241 MiB/s
    9       Method: MCBLOCK Elapsed: 0.05117        MiB: 1024.00000 Copy: 20013.290 MiB/s
    AVG     Method: MCBLOCK Elapsed: 0.05176        MiB: 1024.00000 Copy: 19782.776 MiB/s

The results suggest a bug in the benchmark program. It's either copying from the same place or copying to the same place, so most accesses hit the cache and very little actually goes to main memory.

Maybe check the source code of mbw.

As for the run-to-run variance: if you’re using the BSP kernel, it has dynamic memory frequency scaling and downclocks the memory to around 1000 MT/s at idle to save power. Its responsiveness is, uh, abysmal, to put it lightly. The memory controller has to be under heavy load for a long time before the frequency actually ramps up to where it should be.
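If dynamic memory frequency is the cause, one workaround is to run untimed warm-up copies before measuring. That fits the 4096 MiB run above, where the first two MEMCPY runs were much slower than the rest. Below is a minimal C sketch. The buffer size and the number of warm-up passes are arbitrary assumptions, and I haven't verified how long a warm-up the BSP kernel actually needs to ramp the clock.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static volatile char sink; /* read-backs stop the compiler dropping copies */

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Copy `bytes` from one buffer to another: first warmup_passes times
 * untimed (giving a dynamic memory clock time to ramp up), then
 * timed_passes times under the clock. Returns MiB/s, or -1 on allocation
 * failure. */
static double copy_rate_after_warmup(size_t bytes, int warmup_passes, int timed_passes)
{
    char *src, *dst;
    double start, elapsed;

    assert(bytes > 0 && warmup_passes >= 0 && timed_passes > 0);
    src = malloc(bytes);
    dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, bytes); /* fault the pages in before any timing */
    memset(dst, 0, bytes);

    for (int i = 0; i < warmup_passes; i++) { /* untimed warm-up */
        memcpy(dst, src, bytes);
        sink = dst[(size_t)i % bytes];
    }

    start = now_seconds();
    for (int i = 0; i < timed_passes; i++) {
        memcpy(dst, src, bytes);
        sink = dst[(size_t)i % bytes];
    }
    elapsed = now_seconds() - start;

    free(src);
    free(dst);
    return (double)bytes * timed_passes / (1024.0 * 1024.0) / elapsed;
}
```

For example, copy_rate_after_warmup((size_t)256 << 20, 10, 10) copies 256 MiB ten times untimed, then ten times timed, and returns MiB/s the way mbw reports it.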

I don’t have a 5A but on 5B it’s 4224 MT/s.

mbw allocates two memory blocks of the given length (e.g. 2 × 4 GiB) and then repeatedly copies data from one block to the other. It tries three methods: memcpy, a naive loop, and finally “mcblock”, which loops memcpy over smaller chunks, e.g. 256 KiB. The last one, “mcblock”, is what gives the huge values for me. I don’t see how the cache could cause problems as long as the total block sizes are much bigger than the cache. Could it perhaps be the kernel merging virtual memory pages with identical content?
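For reference, the three methods as described can be sketched in C. These are my own illustrative reimplementations, not mbw's actual source. In them, n counts longs and block_size is in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* MEMCPY: a single memcpy over the whole array. */
static void copy_memcpy(long *dst, const long *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* DUMB: a naive element-by-element loop. */
static void copy_dumb(long *dst, const long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* MCBLOCK: memcpy in block_size-byte chunks; BOTH pointers advance, so
 * the whole source is read exactly once. */
static void copy_mcblock(long *dst, const long *src, size_t n, size_t block_size)
{
    char *d = (char *)dst;
    const char *s = (const char *)src;
    size_t left = n * sizeof *src;

    assert(block_size > 0);
    for (; left >= block_size; left -= block_size, s += block_size, d += block_size)
        memcpy(d, s, block_size);
    if (left)
        memcpy(d, s, left); /* tail shorter than one block */
}
```

All three produce the same destination contents. Only the access pattern differs: one large copy, a naive loop, or a sequence of block-sized copies.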

Here is the source code for “mcblock”.

    } else if(type==TEST_MCBLOCK) { /* memcpy block test */
        char* src = (char*)a;
        char* dst = (char*)b;
        gettimeofday(&starttime, NULL);
        for (t=array_bytes; t >= block_size; t-=block_size, src+=block_size){
            dst=(char *) memcpy(dst, src, block_size) + block_size;
        }
        if(t) {
            dst=(char *) memcpy(dst, src, t) + t;
        }
        gettimeofday(&endtime, NULL);

Ah!

And with four channels at 16 bits each, that means 4 × 2 × 4224 = 33792 MB/s, right? So I should see something like 16GB/s with mbw?

Versions of mbw before 1.3 have a bug where the MCBLOCK test doesn’t advance its source pointer. Its default block size of 256 KiB isn’t that big either; it fits in the L2 cache. If you installed mbw from the Debian repository, it has that bug.
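Here is a sketch of the buggy pattern as described: dst advances but src doesn't. I wrote it from the description above rather than copying it from any mbw release. Every memcpy re-reads the same first block_size bytes. With the default 256 KiB block, those reads stay in cache, and essentially only the writes reach DRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration of the described bug: the destination pointer advances,
 * the source pointer never does, so the same first block_size bytes of
 * src are copied over and over into successive parts of dst. */
static void mcblock_src_not_advanced(char *dst, const char *src,
                                     size_t bytes, size_t block_size)
{
    size_t left = bytes;

    assert(block_size > 0);
    for (; left >= block_size; left -= block_size, dst += block_size)
        memcpy(dst, src, block_size); /* src is never incremented */
    if (left)
        memcpy(dst, src, left);
}
```

After this runs, the second block of dst holds a repeat of the first block of src instead of the second block. The read working set is only block_size bytes, however large the arrays are.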

And with four channels at 16 bits each, that means 4 × 2 × 4224 = 33792 MB/s, right?

That is the theoretical peak, yes.

So I should see something like 16GB/s with mbw?

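It depends on the memory timings (probably not adjustable). With JEDEC timings, you usually get about 2/3 of the theoretical bandwidth, roughly 22.5 GB/s of bus traffic. Since every copied byte is both read and written, mbw’s copy rate should be about half of that, somewhere around 11000 MB/s.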
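Here is the estimate above as a C helper. The 2/3 efficiency is the rule of thumb from this post, not a measured figure. The division by two assumes that each copied byte costs one read plus one write on the bus.

```c
#include <assert.h>

/* Rough copy rate a memcpy benchmark should report, in the same units as
 * peak_mb_per_s. efficiency is the fraction of theoretical peak that is
 * actually sustained (about 2/3 with JEDEC timings, per this thread). A
 * copy moves each byte across the bus twice, hence the division by 2. */
static double expected_copy_mb_per_s(double peak_mb_per_s, double efficiency)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return peak_mb_per_s * efficiency / 2.0;
}
```

expected_copy_mb_per_s(33792, 2.0 / 3.0) gives about 11264 MB/s. That is roughly 10742 MiB/s, the unit mbw prints.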

Excellent, everything makes sense then. 22 GB/s is roughly 2/3 of the peak value (2/3 × 33792 ≈ 22528 MB/s), given that only writes are actually reaching memory (because of that bug, the reads come from cache).

And yes, I used the Debian mbw. I’ve now tried the latest mbw from GitHub instead and got lower, more stable values (around 8.8 GB/s). Apparently the bug is fixed.

And yes, the 5A also has four 16-bit LPDDR4X channels, just like the 5B. The schematics make sense to me now too.

Thanks, Linaea, for helping me understand!