Intermittent performance degradation from time to time

Hi, all

I have a weird performance issue on a ROCK 5B running the official “Ubuntu 22.04.4” image with Linux kernel LTS 5.10.160-rockchip #37 SMP Fri Apr 26 05:16:30.

From time to time something happens and I see a performance degradation of about 10-25%.

To catch this I’ve prepared some monitoring tools.
I use dgemm to generate a high CPU load.

I’ve been:

  • running dgemm back to back, about 100 iterations in sequence. I can see the degradation from time to time.
  • capturing, in parallel, the top CPU-consuming processes (with Linux top). The only process generating a high load is my dgemm; nothing else shows up.
  • capturing, in parallel, the SBC temperature. It stayed at about 29-41 °C, with no correlation to the moments when I saw the degradation.
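For reference, the temperature capture from the last bullet can be sketched roughly like this. It assumes the standard Linux thermal sysfs layout (/sys/class/thermal/thermal_zone*/temp, values in millidegrees Celsius); the function name sample_temps, the THERMAL_ROOT override and the 1 s interval are made up for illustration and are not necessarily what was actually used.

```shell
#!/usr/bin/env bash
# Sketch only: print one CSV line per call with the epoch time followed by
# the temperature of every thermal zone, in degrees Celsius.
# THERMAL_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; by default it points at the real sysfs path.
THERMAL_ROOT="${THERMAL_ROOT:-/sys/class/thermal}"

sample_temps() {
    local line zone raw
    line="$(date +%s)"
    for zone in "$THERMAL_ROOT"/thermal_zone*/temp; do
        [ -r "$zone" ] || continue                 # glob matched nothing, or unreadable
        raw="$(cat "$zone" 2>/dev/null)" || continue
        # The kernel reports millidegrees; keep one decimal place
        # (assumes non-negative temperatures).
        line="$line,$((raw / 1000)).$(( (raw % 1000) / 100 ))"
    done
    echo "$line"
}

# Usage (assumed 1 s sampling interval, running next to the benchmark):
#   while sleep 1; do sample_temps; done >> temps.csv &
```

Putting the epoch timestamp in the first column makes it possible to line the samples up with the benchmark iterations afterwards.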

This is a chart of performance by time/iteration:

I’ve checked dmesg and it was clean. There were no weird problems related to DRAM, power or anything else.

It looks like I have a problem with throttling, power limiting or something similar. But I haven’t found any way to check this theory:

  • how can I check the current throttle state or throttle events?
  • how can I check the current power consumption?
  • are there any counters for checking DRAM errors?

Maybe someone has had a similar problem?
Any ideas are welcome.

Are you certain that your workload always runs on the big cores? Maybe you have fewer than 8 threads and they migrate from CPU to CPU, so performance oscillates between that of the A55 and that of the A76? Regarding performance stability over time (throttling etc.), sbc-bench has some tests which monitor the frequency and temperature over a series of runs. If you run it and observe variations, that will definitely indicate some throttling. But I suspect that you’re essentially affected by thread migrations. You could try running your test tool only on the big clusters, by prefixing it with “taskset -c 4-7”, to verify whether this is the case.
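Independently of sbc-bench, a quick way to watch for frequency drops is to sample the cpufreq policies directly. The following is only a sketch, assuming the usual /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq files (values in kHz) are exposed on this kernel; sample_freqs and CPUFREQ_ROOT are hypothetical names chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch only: print one line per call with the epoch time and the current
# frequency of each cpufreq policy (one policy per CPU cluster) in MHz.
# CPUFREQ_ROOT is a hypothetical override for trying this on a fake tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

sample_freqs() {
    local line policy khz
    line="$(date +%s)"
    for policy in "$CPUFREQ_ROOT"/policy*; do
        [ -r "$policy/scaling_cur_freq" ] || continue
        khz="$(cat "$policy/scaling_cur_freq" 2>/dev/null)" || continue
        line="$line,${policy##*/}=$((khz / 1000))MHz"   # kHz -> MHz
    done
    echo "$line"
}

# Usage (assumed 1 s interval), running next to the benchmark:
#   while sleep 1; do sample_freqs; done >> freqs.log &
#
# To see which CPU each thread of a running dgemm last executed on
# (psr column), something like this should work with procps:
#   ps -L -o tid,psr,comm -p "$(pgrep -n dgemm)"
```

If the big-cluster frequencies stay flat during a slow iteration, throttling becomes less likely and migration or something else is the better suspect.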

Unfortunately, I’m sure.
I use taskset -c 4-7 for this:

for i in {1..1000}; do
        time OPENBLAS_NUM_THREADS=4 OPENBLAS_LOOPS=10 \
            taskset -c 4-7 ./dgemm 1200 2400 600
done
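To make the slow iterations easier to line up with temperature or frequency logs, the same loop can record a timestamp and a duration per run instead of relying on the output of time. This is just a sketch: elapsed_ms is a hypothetical helper, it assumes GNU date (for %N nanoseconds), and runs.csv is an arbitrary file name.

```shell
#!/usr/bin/env bash
# Sketch only: run a command and print its wall-clock duration in
# milliseconds. The command's own stdout is discarded so that only the
# duration is printed.
elapsed_ms() {
    local start end
    start="$(date +%s%N)"      # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null
    end="$(date +%s%N)"
    echo $(( (end - start) / 1000000 ))
}

# Usage, mirroring the loop above (columns: iteration, epoch time, ms):
#   export OPENBLAS_NUM_THREADS=4 OPENBLAS_LOOPS=10
#   for i in {1..1000}; do
#       echo "$i,$(date +%s),$(elapsed_ms taskset -c 4-7 ./dgemm 1200 2400 600)"
#   done >> runs.csv
```

The epoch column is what allows a degraded iteration to be matched against whatever else is being sampled at the same moment.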

And one more thing: this debugging work was prompted by other users who consistently see some degradation from time to time.

Thanks for your advice about sbc-bench.
I’ll try to look into it.
