NUMA emulation on RK3588

Absolutely!

It’s also important to keep in mind that the RPi5 still has a prehistoric single-channel 32-bit memory bus. 32 bits!!! For 4 cores, when most workloads nowadays are memory-bound! Other SoCs featuring reasonably modern cores have at least 64 bits, sometimes more, and split them into multiple channels (the Altra I have at work has six 64-bit channels, the LX2160A has two 64-bit ones). The RK3588 splits its 64 bits into 4x16-bit channels, which means that the 4 big cores are relatively independent of each other and get low latency when fetching from or writing to multiple areas in parallel. On an RPi5 you need to wait an eternity for the operation in progress to complete: that’s a minimum of 16 memory bus cycles for a single cache line to finish before the 3 other cores can fight again over which one will be granted access to the RAM.
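To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes 64-byte cache lines (which is where the 16 comes from: 64 bytes at 4 bytes per transfer on a 32-bit bus) and that concurrent requests spread evenly across channels. It ignores DRAM timings, refresh, command overhead and controller scheduling entirely, and the function names are made up for the illustration, so treat it as arithmetic rather than a model of either memory controller.

```python
import math

CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def transfers_per_line(bus_width_bits, line_bytes=CACHE_LINE_BYTES):
    """Data transfers needed to move one cache line over one channel."""
    return line_bytes // (bus_width_bits // 8)


def transfers_until_all_served(requests, channels, bus_width_bits):
    """Transfers elapsed until the last of `requests` concurrent cache-line
    fetches completes, assuming an ideal even spread across channels."""
    rounds = math.ceil(requests / channels)
    return rounds * transfers_per_line(bus_width_bits)


# RPi5: one 32-bit channel shared by 4 cores (per the post above).
rpi5_one_line = transfers_per_line(32)
rpi5_four_cores = transfers_until_all_served(4, channels=1, bus_width_bits=32)

# RK3588: four independent 16-bit channels (per the post above).
rk3588_one_line = transfers_per_line(16)
rk3588_four_cores = transfers_until_all_served(4, channels=4, bus_width_bits=16)

print("RPi5   one line:", rpi5_one_line, "four cores:", rpi5_four_cores)
print("RK3588 one line:", rk3588_one_line, "four cores:", rk3588_four_cores)
```

In this idealized picture a single line takes longer on one narrow RK3588 channel, but four cores hitting four different channels are all served in parallel, while on a single shared channel the last core waits behind the three others.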

While I was attracted by the decent cores in the RPi5 (at last!), I never bought one because of its ridiculous memory controller, which makes it unfit for most tasks, as even benchmarks like the ones above show, along with the patches that try hard to work around that misdesign.
