Rock5B RMA procedure?

Hi,

I got two Rock5B boards from AllNet China, but one of the two appears to have a hardware problem.
It randomly crashes with kernel stack traces after a few minutes of running any stress test suite such as stress-ng (on both the Radxa-produced images and Armbian).

The other works fine.

Has anyone tried to RMA their device?

Paul

The most common cause of an unstable system like this has been memory reads occasionally returning corrupted data.

It is apparently fixed by newer DDR initialization code, but lowering the maximum frequency might work around the problem, or can at least be used to test whether you are affected by it:

$ echo 1560000000 | sudo tee /sys/class/devfreq/dmc/max_freq
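
To confirm that the cap actually took effect, it can help to read the devfreq attributes back. This is only a rough sketch: it assumes the memory controller is exposed at /sys/class/devfreq/dmc as in the command above and that the kernel provides the usual devfreq sysfs attributes. show_devfreq is just an illustrative helper name.

```shell
# Print the devfreq attributes of a device (default: the dmc node used above).
# Assumption: the node exposes the standard devfreq sysfs files; any that are
# missing are reported as "(not available)" instead of causing an error.
show_devfreq() {
    dir="${1:-/sys/class/devfreq/dmc}"
    for attr in available_frequencies cur_freq min_freq max_freq; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
        else
            printf '%s: (not available)\n' "$attr"
        fi
    done
}

show_devfreq
```

If available_frequencies is listed, the values written to max_freq should normally be taken from that list.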

Please update to the latest image, which includes a newer rkbin blob. Also, check the power supply voltage with the sensors command.
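
To pull just the supply readings out of the sensors output, something like the following might do. It is a sketch that assumes the supply shows up with in0 and curr1 fields, which may be named differently on other boards or kernels. psu_reading is a made-up helper name.

```shell
# Extract the supply voltage (in0) and current (curr1) from `sensors` output
# read on stdin. Assumption: the field names are in0 and curr1.
psu_reading() {
    awk '$1 == "in0:"   { printf "voltage=%s V\n", $2 }
         $1 == "curr1:" { printf "current=%s A\n", $2 }'
}

# Only call sensors if it is installed (it comes from the lm-sensors package).
if command -v sensors >/dev/null 2>&1; then
    sensors | psu_reading
fi
```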

Where can I find the latest image? Is it this one?
https://github.com/radxa/debos-radxa/releases/download/20221031-1045/rock-5b-debian-bullseye-xfce4-arm64-20221031-1558-gpt.img.xz

I’ll report the sensors output if I manage to boot it up without crashing.
This is the sensors output on the other board, same power supply:

admin@rock-5b:~$ sensors
gpu_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

littlecore_thermal-virtual-0
Adapter: Virtual device
temp1: +31.5°C

bigcore0_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

tcpm_source_psy_4_0022-i2c-4-22
Adapter: rk3x-i2c
in0: 20.00 V (min = +20.00 V, max = +20.00 V)
curr1: 3.00 A (max = +3.00 A)

npu_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

center_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

bigcore1_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

soc_thermal-virtual-0
Adapter: Virtual device
temp1: +31.5°C (crit = +115.0°C)

admin@rock-5b:~$

I loaded the latest Debian image onto the eMMC and booted it.
This is the output of sensors:

root@rock-5b:~# sensors
gpu_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

littlecore_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

bigcore0_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

tcpm_source_psy_4_0022-i2c-4-22
Adapter: rk3x-i2c
in0: 20.00 V (min = +20.00 V, max = +20.00 V)
curr1: 2.25 A (max = +2.25 A)

npu_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

center_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

bigcore1_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

soc_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C (crit = +115.0°C)

root@rock-5b:~#

It crashed when running:

stress-ng --class=memory --all=0 --timeout=120s --metrics
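
Since the board can freeze without printing anything, one option is to wrap the test so that a start and end marker are flushed to disk, which at least shows when a run died. This is a sketch only: run_logged and the LOG path are hypothetical names, not part of stress-ng or any Radxa tooling.

```shell
# Run a command and record its start and end in a log file that is synced to
# disk, so the last entry survives a hard freeze or reboot.
# LOG is a hypothetical path; pick any location on persistent storage.
LOG="${LOG:-./stress-run.log}"

run_logged() {
    printf '%s START %s\n' "$(date '+%F %T')" "$*" >> "$LOG"
    sync
    status=0
    "$@" || status=$?
    printf '%s END status=%s %s\n' "$(date '+%F %T')" "$status" "$*" >> "$LOG"
    sync
    return "$status"
}

# Example, left commented out so the snippet does not start a stress run:
# run_logged stress-ng --class=memory --all=0 --timeout=120s --metrics
```

A START line with no matching END line after a reboot means the board went down during that run.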

No interesting output on serial port.

I’ve lowered the maximum frequency to 528 MHz and it’s still crashing.
There is no kernel stack trace with the latest image.
It just freezes or reboots.

With the latest Ubuntu server image (rock-5b-ubuntu-focal-server-arm64-20221105-1012-gpt.img), it exhibits the same type of crashes.
See the attached minicom capture file for the serial output.
minicom-20221106133900.zip (10.4 KB)

@jack Any ideas? I’m willing to try to debug, but the system is completely unusable at the moment.
Most of the time, it does not even get to the Linux login prompt.

Got a new board thanks to @NBA and this one seems to be working fine after a quick stress-ng test…