Rock5B RMA procedure?

PloPli · November 4, 2022, 12:55pm

Hi,

I got two Rock5B boards from AllNet China, but one of the two seems to have a hardware problem.
It randomly crashes with kernel stack traces after a few minutes of running any stress test suite such as stress-ng (on both the Radxa-produced images and Armbian).

The other works fine.

Has anyone tried to RMA their device?

Paul

icecream95 · November 5, 2022, 4:55am

The most common cause of an unstable system like this has been memory reads occasionally returning corrupted data.

It is apparently fixed by newer DDR initialization code, but lowering the maximum frequency might work around the problem, or at least can be used to test whether you are affected by it:

$ echo 1560000000 | sudo tee /sys/class/devfreq/dmc/max_freq
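
Before capping the frequency, it may help to check which values the driver actually accepts. The sketch below assumes the usual devfreq sysfs files (cur_freq, min_freq, max_freq, available_frequencies) exist under the path used above. The helper names dmc_show, dmc_is_available and dmc_cap, and the DMC_DIR variable, are made up for illustration. Writing max_freq needs root.

```shell
#!/bin/sh
# Hypothetical helpers; DMC_DIR defaults to the sysfs path from the post above.
DMC_DIR="${DMC_DIR:-/sys/class/devfreq/dmc}"

# Print the current frequency, the limits, and the advertised frequencies.
dmc_show() {
    for f in cur_freq min_freq max_freq available_frequencies; do
        if [ -r "$DMC_DIR/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$DMC_DIR/$f")"
        fi
    done
}

# Succeed only if the frequency in $1 (in Hz) is listed by the driver.
dmc_is_available() {
    for hz in $(cat "$DMC_DIR/available_frequencies"); do
        [ "$hz" = "$1" ] && return 0
    done
    return 1
}

# Cap the DMC frequency, refusing values the driver does not list.
dmc_cap() {
    if dmc_is_available "$1"; then
        echo "$1" > "$DMC_DIR/max_freq"
    else
        echo "frequency $1 is not in available_frequencies" >&2
        return 1
    fi
}
```

In a root shell, after sourcing the snippet, `dmc_show` prints the current state and `dmc_cap 1560000000` applies the same cap as the command above, but only if that value is listed.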

jack · November 5, 2022, 9:35am

Please update to the latest image with the newer rkbin blob. Also, check the power supply voltage with the sensors command.

PloPli · November 6, 2022, 11:11am

Where can I find the latest image? Is it this one?
https://github.com/radxa/debos-radxa/releases/download/20221031-1045/rock-5b-debian-bullseye-xfce4-arm64-20221031-1558-gpt.img.xz

I’ll report the sensors output if I manage to boot it up without crashing.
This is the sensors output on the other board, same power supply:

admin@rock-5b:~$ sensors
gpu_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

littlecore_thermal-virtual-0
Adapter: Virtual device
temp1: +31.5°C

bigcore0_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

tcpm_source_psy_4_0022-i2c-4-22
Adapter: rk3x-i2c
in0: 20.00 V (min = +20.00 V, max = +20.00 V)
curr1: 3.00 A (max = +3.00 A)

npu_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

center_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

bigcore1_thermal-virtual-0
Adapter: Virtual device
temp1: +30.5°C

soc_thermal-virtual-0
Adapter: Virtual device
temp1: +31.5°C (crit = +115.0°C)

admin@rock-5b:~$
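
For comparing power supplies, the in0 and curr1 lines can be reduced to a single wattage figure. This is only a sketch: the function name pd_power is made up, and it assumes the tcpm_source_psy entry reports the negotiated USB PD voltage and maximum current rather than a live measurement. If several such entries appear, the last one wins.

```shell
#!/bin/sh
# Hypothetical helper: read `sensors` output on stdin and print
# volts, amps and their product, taken from the in0: and curr1: lines.
pd_power() {
    awk '
        $1 == "in0:"   { v = $2 }   # e.g. "in0: 20.00 V (min = ...)"
        $1 == "curr1:" { a = $2 }   # e.g. "curr1: 3.00 A (max = ...)"
        END {
            if (v != "" && a != "")
                printf "%.2f V x %.2f A = %.1f W\n", v, a, v * a
        }
    '
}
```

Running `sensors | pd_power` on the output above would print `20.00 V x 3.00 A = 60.0 W`.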

PloPli · November 6, 2022, 11:12am

I loaded the latest Debian image onto the eMMC and booted it.
This is the output of sensors:

root@rock-5b:~# sensors
gpu_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

littlecore_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

bigcore0_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

tcpm_source_psy_4_0022-i2c-4-22
Adapter: rk3x-i2c
in0: 20.00 V (min = +20.00 V, max = +20.00 V)
curr1: 2.25 A (max = +2.25 A)

npu_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

center_thermal-virtual-0
Adapter: Virtual device
temp1: +33.3°C

bigcore1_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C

soc_thermal-virtual-0
Adapter: Virtual device
temp1: +34.2°C (crit = +115.0°C)

root@rock-5b:~#

It crashed when running:

stress-ng --class=memory --all=0 --timeout=120s --metrics

No interesting output on serial port.
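
When a board freezes without printing anything, a timestamped log on disk can at least show how long it survived. The wrapper below is a rough sketch: run_logged is a made-up name, and it simply writes a line before each pass and calls sync so the entry is likely to survive a hard reset.

```shell
#!/bin/sh
# Hypothetical wrapper: run_logged LOGFILE PASSES COMMAND [ARGS...]
# Runs COMMAND up to PASSES times, logging a timestamp before each pass.
run_logged() {
    log="$1"
    passes="$2"
    shift 2
    i=1
    while [ "$i" -le "$passes" ]; do
        printf '%s pass %s start\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$i" >> "$log"
        sync    # flush the log line to storage before the stress run
        if ! "$@"; then
            printf '%s pass %s failed\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$i" >> "$log"
            sync
            return 1
        fi
        i=$((i + 1))
    done
    printf '%s all %s passes finished\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$passes" >> "$log"
    sync
}
```

For example, `run_logged ~/stress.log 10 stress-ng --class=memory --all=0 --timeout=120s --metrics` repeats the test above ten times. After a freeze or reboot, the last "start" line in the log shows when the failing pass began.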

PloPli · November 6, 2022, 11:09am

I’ve lowered the maximum frequency to 528 MHz and it’s still crashing.
There is no kernel stack trace with the latest image.
It just freezes or reboots.

PloPli · November 6, 2022, 1:45pm

With the latest Ubuntu server image (rock-5b-ubuntu-focal-server-arm64-20221105-1012-gpt.img), it exhibits the same type of crashes.
See the attached minicom capture file for details.
minicom-20221106133900.zip (10.4 KB)

PloPli · November 12, 2022, 10:54am

@jack Any ideas? I’m willing to try to debug, but the system is completely unusable at the moment.
Most of the time, it does not even get to the Linux login prompt.

PloPli · November 24, 2022, 3:12pm

Got a new board thanks to @NBA and this one seems to be working fine after a quick stress-ng test…

Thanador_Lx · December 8, 2022, 12:47am

It seems some boards have issues… Mine worked for three weeks, with some random reboots. After that it stopped booting at all…
Got a response from Radxa support with a suggestion to try maskrom boot, but the board is not detected by my Windows OS… Now I am looking for a way to get a replacement… Where did you file for the RMA? The AllNet China site has no support section…

PloPli · December 8, 2022, 3:48pm

I was contacted by someone from allnet.cn after complaining loudly on this site.
I first contacted them at their email address without success.
I also emailed allnet.de, but that was no more successful.

Emmanuel_Nyachoke · January 21, 2023, 5:21pm

I also bought 3 of them, and one of them is very problematic: it crashes randomly. I suspected the power supply, as I am using PD power supplies. I got some stability with the Raspberry Pi power supply, but even then it crashes after an hour or two. I am hoping to get hold of a 12 V 3 A power supply with a Type-C connector, but it’s proving difficult to get one locally.

eysteinh · January 22, 2023, 3:25pm

Out of 12 boards ordered, I have one that I can’t get to boot (the other 11 work). 2 months so far with no success. Ordered from https://shop.allnetchina.cn/.

Emmanuel_Nyachoke · January 23, 2023, 4:12pm

Thanks, @icecream95, for pointing in this direction. By setting echo 1068000000 | sudo tee /sys/class/devfreq/dmc/max_freq on one of the boards, I have managed to get it running without crashing for 5 hours, which I had not managed on any of the boards before. I have done the same on the other 2 and will report on how it goes.

Btw, I had to give up a bit of performance, but if it works I can live with that: https://browser.geekbench.com/v5/cpu/compare/20044916?baseline=20043344. @jack, hopefully this is something that can be fixed in the future.