Rock Pi 4 B Randomly shutting off

Azerai · February 9, 2020, 5:18pm

Hello,

I have a Rock Pi 4 B with a PoE Hat and an M.2 adapter, functioning as a sort of NAS for the attached 2TB Intel 660p. It is currently running the Debian image from the RockPi downloads page, and I have updated the bootloader as well.

After a number of hours of being online, it will randomly go offline and stop functioning (I am unable to access my samba share). I have observed this same behavior when running Ubuntu server (also from the RockPi downloads page). The LEDs on the device are still lit when it is “offline”: the network lights and the green power light are on, but the blue light is not flashing. Last night I created a simple Python script to write the system time to a log file every 10 seconds in order to try and observe when this device may be going offline. Here is the output: https://pastebin.com/HFLE8dfu
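
A minimal sketch of a heartbeat logger of this kind follows. The original script is only visible through its pastebin output, so the function names, the log file name, and the timestamp format here are assumptions. Flushing and syncing after every write makes it more likely that the last line on disk reflects the last moment the system was alive.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(log_path: str) -> str:
    """Append one UTC timestamp line to log_path and force it to disk."""
    # Timestamp format is an assumption, not the format of the original log.
    line = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to commit the data to storage
    return line


def run(log_path: str = "heartbeat.log", interval_s: float = 10.0) -> None:
    """Write a heartbeat every interval_s seconds until interrupted."""
    while True:
        write_heartbeat(log_path)
        time.sleep(interval_s)
```

Calling `run()` starts the loop; stop it with Ctrl-C.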

You can ignore line 5582 and onward, as those lines were written when I turned the device back on after it went “offline”. It appears to have stopped functioning at line 5581, where for whatever reason it logged a run of NUL characters and then wrote a heartbeat with the wrong time (13:27:11 2020 UTC, when the next log write should have been at 14:24:25).
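
A run of NUL characters like this is easy to miss by eye in a long log. The sketch below reports which lines contain NUL bytes; it is only an illustration, the function names are hypothetical, and it assumes the log is small enough to read into memory in one go.

```python
from typing import List


def find_nul_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines containing at least one NUL byte."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\x00" in line
    ]


def find_nul_lines_in_file(path: str) -> List[int]:
    """Read path in binary mode so NUL bytes are preserved, then scan it."""
    with open(path, "rb") as f:
        return find_nul_lines(f.read())
```

For example, `find_nul_lines_in_file("heartbeat.log")` (the file name is an assumption) returns the line numbers to inspect.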

What could this mean? Is there something wrong with the clock or the board? How else can I troubleshoot why it is randomly turning off and requiring me to pull the PoE Ethernet cable out and put it back in to get it to boot back up?

igorp · February 9, 2020, 6:14pm

BTW, the two images are identical in hardware support. An alternative with different support is Armbian: https://www.armbian.com/rock-pi-4/

Overheating? The board has a safety switch that powers it off if a certain temperature is reached. Can you check the temperatures?

Azerai · February 9, 2020, 6:21pm

What is the best way to check CPU temps and write them to a log? lm-sensors does not detect any sensors: https://pastebin.com/kJy44Rww

igorp · February 9, 2020, 7:13pm

There is no such luxury yet … but on Armbian you have it this way:

armbianmonitor -m

Stop monitoring using [ctrl]-[c]
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU
12:00:13:  912MHz  0.00   0%   0%   0%   0%   0%   0%   66°C
12:00:18:  240MHz  0.00   0%   0%   0%   0%   0%   0%   66°C
12:00:23:  240MHz  0.00   0%   0%   0%   0%   0%   0%   67°C
12:00:28:  240MHz  0.00   0%   0%   0%   0%   0%   0%   66°C
12:00:34:  240MHz  0.00   0%   0%   0%   0%   0%   0%   67°C
12:00:39:  240MHz  0.00   0%   0%   0%   0%   0%   0%   67°C
12:00:44:  240MHz  0.00   0%   0%   0%   0%   0%   0%   66°C
12:00:49:  240MHz  0.00   0%   0%   0%   0%   0%   0%   67°C

Azerai · February 9, 2020, 7:16pm

Found them by checking the contents of /sys/class/thermal/thermal_zone0/temp and /sys/class/thermal/thermal_zone1/temp on Debian. Temps look okay for now, but I will see what happens when it next becomes nonfunctional: https://pastebin.com/GTaaS2Ph
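
Reading those files can be scripted so the values land in the same log as the heartbeat. The sketch below is one way to do it, not the script used in this thread. On Linux the `temp` files report millidegrees Celsius, so the value is divided by 1000. The function names and the output format are assumptions.

```python
import glob
import os
from typing import Dict


def read_zone_temps(base: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {zone_name: degrees_celsius} for each thermal_zone* under base."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp"), encoding="ascii") as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone is unreadable or not reporting; skip it
        temps[os.path.basename(zone)] = millidegrees / 1000.0
    return temps


def format_temps(temps: Dict[str, float]) -> str:
    """Render the readings as one log-friendly line (format is an assumption)."""
    return " ".join(
        "{}={:.1f}C".format(name, value) for name, value in temps.items()
    )
```

Appending `format_temps(read_zone_temps())` to each heartbeat line would show how hot the board was just before it stopped responding.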

foobar · February 10, 2020, 8:36am

Mine has kernel panics from time to time. Maybe that is the cause? Do you have an option to attach a serial cable and monitor debug output?

Dante4 · February 10, 2020, 8:11pm

Btw, which hardware revision are you using?

martin · February 11, 2020, 1:57am

It could also be stopping for another reason, such as static electricity or a short between the board and a heatsink.

lbdroidman · February 12, 2020, 3:26pm

Do what @foobar said; hook up a UART and see if the kernel says anything when it dies. Since it spits out a weird time, something is definitely happening besides simply stopping.

Denyska · April 4, 2020, 8:37pm

I had the same issue in the past. The problem was in the WiFi module, and the solution for me was to blacklist bcmdhd:

https://forum.radxa.com/t/ubuntu-server-becomes-unresponsive-every-few-days-rockpi4b-4gb-bcmsdh-sdmmc/981/8

Azerai · April 4, 2020, 8:37pm

Thank you to everyone who posted solutions.

My Rock Pi has now been up and running for just over 6 days, so although I fear I’m jinxing myself, the issue appears to have been fixed by disabling the WiFi module using blacklist bcmdhd.
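
A `blacklist bcmdhd` line normally goes in a `.conf` file under /etc/modprobe.d/; the thread does not say which file was used. After a reboot, whether the module stayed out of the kernel can be checked by looking for it in /proc/modules. The sketch below is one way to do that check, with hypothetical function names.

```python
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Parse the text of /proc/modules into a set of module names.

    The module name is the first whitespace-separated field on each line.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name: str, path: str = "/proc/modules") -> bool:
    """Return True if the named kernel module is currently loaded."""
    with open(path, encoding="ascii") as f:
        return name in loaded_modules(f.read())
```

If `is_module_loaded("bcmdhd")` returns `False` after a reboot, the blacklist entry has taken effect.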