I recently tested IPFire on the RPi 5 (yeah, not supported, but if you’re able to combine bootloader + kernel with whatever userland, then it works ofc; former Armbian/OMV dev here). When testing NAT throughput with the A76 cores allowed to clock below ~2000 MHz, cpu0 was always the bottleneck: all the IRQs ended up there and throughput started to suffer.
Have you had a look at this (/proc/interrupts and watching atop/htop output)? And do you maybe know a way to use IRQ affinity with RPi kernels to get rid of this stupid cpu0 bottleneck by moving the IRQs in question to cpu1 through cpu3?
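To make the question concrete, here is a rough Python sketch of the check I mean: parse the text of /proc/interrupts, list the numbered IRQs whose counts sit almost entirely on cpu0, and compute the hex mask one would write to /proc/irq/<n>/smp_affinity to allow only cpu1 to cpu3. The function names and the `min_total`/`share` thresholds are made up for illustration, and whether the RPi kernel actually honours the new mask for a given interrupt is exactly the open question.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Whatever follows the counters is controller / device info.
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, desc)
    return table


def cpu0_heavy(table, min_total=1000, share=0.9):
    """Return (irq, description) for numbered IRQs handled mostly on cpu0.

    min_total and share are arbitrary illustration thresholds.
    """
    hot = []
    for label, (counts, desc) in table.items():
        total = sum(counts)
        if not label.isdigit() or total < max(min_total, 1):
            continue
        if counts[0] / total >= share:
            hot.append((int(label), desc))
    return sorted(hot)


def affinity_mask(cpus):
    """Hex CPU bitmask in the format smp_affinity expects (bit n = cpu n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    with open("/proc/interrupts") as fh:
        table = parse_interrupts(fh.read())
    for irq, desc in cpu0_heavy(table):
        # Only prints the suggestion; the actual write needs root.
        print(f"IRQ {irq} ({desc}): echo {affinity_mask([1, 2, 3])} "
              f"> /proc/irq/{irq}/smp_affinity")
```

This only prints suggestions. The write itself needs root, the kernel may reject it for some interrupts, and a running irqbalance may change the mask again afterwards.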
Or more generally speaking… did you have a look at what the bottleneck is in these situations?
BTW: As mentioned in some GitHub issue where we met: the RPi folks don’t set io_is_busy=1 while relying on the ondemand cpufreq governor, and in certain situations (though usually not benchmarks) this will result in the CPU cores not ramping up clockspeeds when needed.
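If you want to check the current setting on your image, a small sketch like the following should do. It assumes the ondemand tunables show up somewhere below /sys/devices/system/cpu/cpufreq (either in a global `ondemand` directory or per policy, depending on the cpufreq driver); the helper name `find_io_is_busy` is made up.

```python
from pathlib import Path


def find_io_is_busy(root="/sys/devices/system/cpu/cpufreq"):
    """Return {path: value} for every io_is_busy tunable found below root.

    An empty dict means no such file exists, e.g. because ondemand is
    not the active governor.
    """
    found = {}
    base = Path(root)
    if not base.is_dir():
        return found
    for path in sorted(base.glob("**/io_is_busy")):
        try:
            found[str(path)] = path.read_text().strip()
        except OSError:
            found[str(path)] = None  # present but not readable
    return found


if __name__ == "__main__":
    settings = find_io_is_busy()
    if not settings:
        print("no io_is_busy tunable found (ondemand not active?)")
    for path, value in settings.items():
        print(f"{path} = {value}")
```

A value of 0 means time spent waiting on I/O is not counted as load by ondemand; writing 1 to the same file (as root) changes that until the next reboot.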