Radxa Rock 5b Hangs multiple times a week

Hi all, I have a new problem with my Rock 5B. A few months ago it started hanging randomly a few times a week, with a kernel oops (or similar messages) displayed on the screen (I can take a photo the next time it occurs). Initially I thought it was caused by overheating, but even after installing a large 120 mm fan and removing most of the workload (I was using it as an NVR, and now it only does object detection with an M.2 Coral TPU), the problem persists. I would appreciate it if anyone could help me troubleshoot the issue. I suspect it might be a hardware problem. Here's an overview of the last time it happened.

This seems to be kernel-related: many builds appear stable but hit a kernel panic on long runs. What version do you have now?
Check dmesg from just before the system died. Was there anything suspicious? Anything related to a failing disk?
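
A minimal sketch of how the log from before a hang could be inspected, assuming systemd-journald with persistent storage is in use (otherwise the previous boot's kernel log is not kept). The function name scan_kernel_log and its pattern list are illustrative assumptions, not a complete list of failure signatures.

```shell
# scan_kernel_log FILE
# Print lines from a saved kernel log that often accompany crashes or
# storage faults. The pattern list is an assumption for illustration.
scan_kernel_log() {
    grep -Ei 'oops|panic|I/O error|EXT4-fs error|blk_update_request|watchdog' "$1"
}

# Typical use (needs a persistent journal):
#   journalctl -k -b -1 > prev-boot.log
#   scan_kernel_log prev-boot.log
```

If nothing matches, grep exits with a non-zero status and prints nothing, which is itself a useful hint that the hang left no trace in the saved log.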

Thanks for your reply, Dominik. I haven't seen anything related to storage in dmesg, but there are a few rockchip-vop2 errors (see below). The system has been running for 6 days since the last reboot, and the kernel version is:
# uname -r
5.10.110-37-rockchip-g74457be0716d

dmesg excerpt:
[596859.841774] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx phy lane locked!
[596899.087413] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596899.114522] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596912.420690] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596912.426671] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596992.887408] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597335.204028] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597335.209893] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597503.170689] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597503.176416] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597662.070720] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597662.082772] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597799.191309] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_disable] Crtc atomic disable vp0
[597799.797959] dwhdmi-rockchip fde80000.hdmi: use tmds mode
[597799.798308] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] Update mode to 3840x2160p60, type: 11(if:800) for vp0 dclk: 594000000
[597799.798923] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx_ropll_cmn_config bus_width:2d5190 rate:2970000
[597799.799226] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx phy pll locked!
[597799.807413] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] dclk_out0 div: 0 dclk_core0 div: 1
[597799.807551] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] set dclk_vop0 to 297000000, get 297000000
[597799.807611] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597799.815312] dwhdmi-rockchip fde80000.hdmi: final tmdsclk = 297000000
[597799.817453] dwhdmi-rockchip fde80000.hdmi: i2c read err!
[597799.823459] dwhdmi-rockchip fde80000.hdmi: don't use dsc mode
[597799.823466] dwhdmi-rockchip fde80000.hdmi: dw hdmi qp use tmds mode
[597799.823481] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: bus_width:0x2d5190,bit_rate:2970000
[597799.823698] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx phy lane locked!
[598287.757357] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_disable] Crtc atomic disable vp0
[598288.554690] dwhdmi-rockchip fde80000.hdmi: use tmds mode
[598288.554931] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] Update mode to 3840x2160p60, type: 11(if:800) for vp0 dclk: 594000000
[598288.555923] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx_ropll_cmn_config bus_width:2d5190 rate:2970000
[598288.556337] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx phy pll locked!
[598288.582444] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] dclk_out0 div: 0 dclk_core0 div: 1
[598288.583932] rockchip-vop2 fdd90000.vop: [drm:vop2_crtc_atomic_enable] set dclk_vop0 to 297000000, get 297000000
[598288.584068] dwhdmi-rockchip fde80000.hdmi: final tmdsclk = 297000000
[598288.587931] dwhdmi-rockchip fde80000.hdmi: don't use dsc mode
[598288.587937] dwhdmi-rockchip fde80000.hdmi: dw hdmi qp use tmds mode
[598288.587951] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: bus_width:0x2d5190,bit_rate:2970000
[598288.588167] rockchip-hdptx-phy-hdmi fed60000.hdmiphy: hdptx phy lane locked!

dmesg | egrep -i 'error|fail'

[    2.757347] OF: fdt: Reserved memory: failed to reserve memory for node 'drm-logo@00000000': base 0x0000000000000000, size 0 MiB
[    2.757364] OF: fdt: Reserved memory: failed to reserve memory for node 'drm-cubic-lut@00000000': base 0x0000000000000000, size 0 MiB
[    3.703831] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up vdpu-supply property in node /power-management@fd8d8000/power-controller failed
[    3.704085] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up rga30-supply property in node /power-management@fd8d8000/power-controller failed
[    3.704258] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up rga31-supply property in node /power-management@fd8d8000/power-controller failed
[    3.705453] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up venc0-supply property in node /power-management@fd8d8000/power-controller failed
[    3.705605] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up vcodec-supply property in node /power-management@fd8d8000/power-controller failed
[    3.705709] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up venc1-supply property in node /power-management@fd8d8000/power-controller failed
[    3.705978] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up rkvdec0-supply property in node /power-management@fd8d8000/power-controller failed
[    3.706251] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up rkvdec1-supply property in node /power-management@fd8d8000/power-controller failed
[    4.189678] phy phy-fd5d0000.syscon:usb2-phy@0.0: Looking up phy-supply property in node /syscon@fd5d0000/usb2-phy@0/otg-port failed
[    4.189727] phy phy-fd5d0000.syscon:usb2-phy@0.0: Looking up vbus-supply property in node /syscon@fd5d0000/usb2-phy@0/otg-port failed
[    4.190960] phy phy-fd5d8000.syscon:usb2-phy@8000.1: Looking up phy-supply property in node /syscon@fd5d8000/usb2-phy@8000/host-port failed
[    4.192022] phy phy-fd5dc000.syscon:usb2-phy@c000.2: Looking up phy-supply property in node /syscon@fd5dc000/usb2-phy@c000/host-port failed
[    4.193213] phy phy-fd5d4000.syscon:usb2-phy@4000.3: Looking up phy-supply property in node /syscon@fd5d4000/usb2-phy@4000/otg-port failed
[    4.193252] phy phy-fd5d4000.syscon:usb2-phy@4000.3: Looking up vbus-supply property in node /syscon@fd5d4000/usb2-phy@4000/otg-port failed
[    4.194899] phy phy-fee00000.phy.4: Looking up phy-supply property in node /phy@fee00000 failed
[    4.195023] phy phy-fee20000.phy.5: Looking up phy-supply property in node /phy@fee20000 failed
[    4.195194] phy phy-fee10000.phy.6: Looking up phy-supply property in node /phy@fee10000 failed
[    4.195786] phy phy-fed60000.hdmiphy.7: Looking up phy-supply property in node /hdmiphy@fed60000 failed
[    4.196659] phy phy-fed70000.hdmiphy.8: Looking up phy-supply property in node /hdmiphy@fed70000 failed
[    4.197588] phy phy-fee80000.phy.9: Looking up phy-supply property in node /phy@fee80000 failed
[    4.198215] phy phy-fed80000.phy.10: Looking up phy-supply property in node /phy@fed80000/dp-port failed
[    4.198248] phy phy-fed80000.phy.11: Looking up phy-supply property in node /phy@fed80000/u3-port failed
[    4.198484] phy phy-fed90000.phy.12: Looking up phy-supply property in node /phy@fed90000/u3-port failed
[    4.204454] mpp-iep2 fdbb0000.iep: allocate roi buffer failed
[    4.206007] mpp_rkvdec2 fdc38100.rkvdec-core: Looking up vdec-supply property in node /rkvdec-core@fdc38000 failed
[    4.206657] mpp_rkvdec2 fdc48100.rkvdec-core: Looking up vdec-supply property in node /rkvdec-core@fdc48000 failed
[    4.208375] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up av1-supply property in node /power-management@fd8d8000/power-controller failed
[    4.230101] [drm] failed to init overlay plane Cluster0-win1
[    4.230104] [drm] failed to init overlay plane Cluster1-win1
[    4.230106] [drm] failed to init overlay plane Cluster2-win1
[    4.230108] [drm] failed to init overlay plane Cluster3-win1
[    4.243485] rockchip-drm display-subsystem: failed to parse loader memory
[    4.546684] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x40000000]
[    4.546715] pci 0000:00:00.0: BAR 1: failed to assign [mem size 0x40000000]
[    4.863997] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up usb-supply property in node /power-management@fd8d8000/power-controller failed
[    5.148541] dwmmc_rockchip fe2d0000.mmc: Looking up vmmc-supply property in node /mmc@fe2d0000 failed
[    5.149344] sdhci-dwcmshc fe2e0000.mmc: Looking up vmmc-supply property in node /mmc@fe2e0000 failed
[    5.149918] arm-scmi firmware:scmi: Failed. SCMI protocol 17 not active.
[    5.149960] SMCCC: SOC_ID: ARCH_FEATURES(ARCH_SOC_ID) returned error: fffffffffffffffd
[    5.150571] dwmmc_rockchip fe2d0000.mmc: Looking up vqmmc-supply property in node /mmc@fe2d0000 failed
[    5.153545] sdhci-dwcmshc fe2e0000.mmc: Looking up vqmmc-supply property in node /mmc@fe2e0000 failed
[    5.157241] optee: probe of firmware:optee failed with error -22
[    5.163631] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up audio-supply property in node /power-management@fd8d8000/power-controller failed
[    5.204282] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up gpu-supply property in node /power-management@fd8d8000/power-controller failed
[    5.204937] vdd_gpu_s0: Failed to create debugfs directory
[    5.204977] vdd_gpu_s0: Failed to create debugfs directory
[    5.205180] vdd_gpu_s0: Failed to create debugfs directory
[    5.206926] vdd_ddr_s0: Failed to create debugfs directory
[    5.207105] vdd_log_s0: Failed to create debugfs directory
[    5.208172] rockchip-dmc dmc: failed to get vop bandwidth to dmc rate
[    5.208176] rockchip-dmc dmc: failed to get vop pn to msch rl
[    5.211851] mali fb000000.gpu: Error -22, no DT entry: mali-simple-power-model.static-coefficient = 1*[0]
[    5.212048] mali fb000000.gpu: Error -22, no DT entry: mali-simple-power-model.dynamic-coefficient = 1*[0]
[    5.212236] mali fb000000.gpu: Error -22, no DT entry: mali-simple-power-model.ts = 4*[0]
[    5.212425] mali fb000000.gpu: Error -22, no DT entry: mali-simple-power-model.thermal-zone = ''
[    5.362198] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up gmac-supply property in node /power-management@fd8d8000/power-controller failed
[    5.362353] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up npu-supply property in node /power-management@fd8d8000/power-controller failed
[    5.373305] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up nputop-supply property in node /power-management@fd8d8000/power-controller failed
[    5.373430] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up npu1-supply property in node /power-management@fd8d8000/power-controller failed
[    5.373523] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up npu2-supply property in node /power-management@fd8d8000/power-controller failed
[    5.373702] vdd_npu_s0: Failed to create debugfs directory
[    5.374192] vdd_npu_s0: Failed to create debugfs directory
[    5.374817] vdd_npu_s0: Failed to create debugfs directory
[    5.400118] RKNPU fdab0000.npu: failed to find power_model node
[    5.400129] RKNPU fdab0000.npu: RKNPU: failed to initialize power model
[    5.400138] RKNPU fdab0000.npu: RKNPU: failed to get dynamic-coefficient
[    5.426007] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    5.426018] cfg80211: failed to load regulatory.db
[    5.426856] rockchip-pm-domain fd8d8000.power-management:power-controller: Looking up sdio-supply property in node /power-management@fd8d8000/power-controller failed
[394376.097012] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394470.263683] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394470.264147] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394470.264452] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394470.264752] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394470.265067] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394525.713627] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394528.797051] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394528.797937] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394528.798625] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[394871.861768] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.229233] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.229508] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.229682] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.229828] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.229992] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.230153] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.230308] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.230468] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[506845.230606] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.745824] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.746165] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.746370] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.746574] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.746784] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.746983] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[518533.747180] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[534497.635209] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[534497.635931] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[542792.351843] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[542792.352668] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[547840.707891] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596899.087413] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596899.114522] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596912.420690] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596912.426671] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[596992.887408] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597335.204028] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597335.209893] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597503.170689] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597503.176416] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597662.070720] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597662.082772] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0
[597799.807611] rockchip-vop2 fdd90000.vop: [drm:vop2_isr] *ERROR* POST_BUF_EMPTY irq err at vp0

I've also updated to 5.10.110-37-rockchip-g74457be0716d after running the above; I just noticed there was an update available.

This is functionality connected with the HDMI/display features. I'm not sure if this is the cause, but you can always try changing the monitor or cable, or go headless, to see if the problem occurs again.

Hi Dominik, the issues were occurring long before I connected the HDMI output. The only reason I connected it is to see the display output when the issue occurs. The board is offline again now; however, I'm out of the country... I will take a photo of the screen output and post it here once I get home.

Maybe it is some corruption? Did you try 'sudo touch /forcefsck' and a reboot? You could also try lowering the RAM frequency: 'echo 1560000000 | sudo tee /sys/class/devfreq/dmc/max_freq'
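
A hedged sketch of the frequency cap, assuming the devfreq node exposes the usual available_frequencies and max_freq files. It only writes the cap if the requested value is one the driver actually lists. The function name set_devfreq_max is hypothetical, and the real sysfs write needs root.

```shell
# set_devfreq_max DIR FREQ_HZ
# Cap a devfreq device at FREQ_HZ, but only if the driver lists that
# frequency in DIR/available_frequencies.
set_devfreq_max() {
    dir=$1
    freq=$2
    for f in $(cat "$dir/available_frequencies"); do
        if [ "$f" = "$freq" ]; then
            printf '%s\n' "$freq" > "$dir/max_freq"
            return 0
        fi
    done
    echo "frequency $freq is not offered by $dir" >&2
    return 1
}

# Typical use, as root:
#   set_devfreq_max /sys/class/devfreq/dmc 1560000000
```

A value written to sysfs this way does not survive a reboot, so it has to be reapplied after each boot for the duration of the test.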

Thanks for the tips. I'm running an fsck now; once it's done I will lower the frequency. Why do you think it could be either of the two? Here's the trace that I see on the screen. Probably not that useful...

Hi Vadim, good luck! Idk, experience, intuition and fortune telling ;D I think those two cover many scenarios...

Thanks, mate. I'll give it a try.

Unfortunately, this morning the system crashed with the same symptoms again :frowning:
There is one more thing that I haven't tried: I'm using a Google Coral TPU on the board for object detection, and maybe it has something to do with the issues. I'll try removing it tonight and running the inference on the built-in NPU...

Update: after moving the workloads to the built-in NPU, the overall temps decreased; however, the issues persisted. I am giving it a last try with another power supply. I moved it to a dedicated 65W GaN2 power supply, let's see if it helps.

Was the Coral that hot? Have you used the basic package or the max version?

Not sure what you mean by max version or basic package. I'll try to give more detail below:

The card is a dual-TPU Coral M.2 card; however, only one of the TPUs is detected by the OS, probably because only one lane is available. I tried running it in this metal case, and the card would throttle as its temperature exceeded 80-something degrees Celsius after a few seconds of operation. I added a heat sink to the card and removed the plastic front and back panels to allow for air circulation. The card would then operate at around 72-75C. Better, but not ideal. I bought a 120 mm fan and added it to the setup, blowing at an angle at the whole case (cooling the RK3588 SoC) but also reaching inside to cool the TPU's smaller heat sink. The fan would turn on when the SoC reported 60C or above, and turn off when the temperature dropped to 40C.
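
The on-at-60C, off-at-40C behaviour described above is a simple hysteresis rule. A minimal sketch of just the decision logic, using those two thresholds; the function name fan_next_state is hypothetical, and how the fan is actually switched (GPIO, PWM, etc.) is deliberately left out because it is not described here.

```shell
# fan_next_state CURRENT_STATE TEMP_C
# CURRENT_STATE is "on" or "off"; TEMP_C is a whole number of degrees.
# Prints the state the fan should be in next:
#   60C or above    -> on
#   40C or below    -> off
#   anything between -> unchanged (this gap is the hysteresis band)
fan_next_state() {
    state=$1
    temp=$2
    if [ "$temp" -ge 60 ]; then
        echo on
    elif [ "$temp" -le 40 ]; then
        echo off
    else
        echo "$state"
    fi
}
```

The wide band between the two thresholds keeps the fan from rapidly toggling when the temperature hovers around a single switching point. On Linux the SoC temperature is typically readable in millidegrees from a file such as /sys/class/thermal/thermal_zone0/temp, though which zone maps to the SoC on this board is an assumption.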

After removing the TPU, and using the built-in NPU (much better results on CodeOwners.AI by the way), the SoC temperature never went above 59C (mostly running at around 55-56C), even though I'm continuously running object detection (multiple times a second) from multiple cameras around my home.

Also, FYI, after changing the power supply I haven't seen it hang, but it might be just a coincidence. I'll keep monitoring it.

From our tests, the Radxa Metal Case is sufficient for heat dissipation at full load at room temperature. The surface of the metal case gets hot (~60C), but the ROCK 5B will not hang or reboot.

And it should throttle if needed to stay stable.
I think this problem was related to the Coral somehow. It gets quite hot under heavy workloads, especially with the max package; as far as I remember it will throttle at about 115°C. According to the specs it needs cooling on both sides to move heat away from the IC as well as its NPU.

Basically there are two libs for it, std and max; the second one uses higher frequencies and throttles at a higher temperature. Also, if you have the dual Edge TPU, you don't have that small heatsink.
If you used that in a passive case then it's clear why it added heat, especially under high workloads when all resources are used. If you also added an NVMe drive on the bottom then it's another source of heat; this depends on the particular M.2 board.

I learned something about RAM on one of my (8th gen) NUCs: it was stable with 32GB RAM, but adding another 32GB caused instability. When I took it out of the rack and tested it on a table, it was OK. So I needed to find out what was wrong. A quick memtest always passed, but when I left it running overnight it showed some errors. Faulty RAM? I swapped the modules and got the same result. Then I placed a big fan on top of it, and the overnight test passed. So it was stable when cold, but under load the lower RAM module would flip several bits after some time. Proper cooling solved the issue and the same unit has worked for about 6 months with no issues.
If the board is warm, retest everything with a big fan blowing directly at everything. If this solves the issues then maybe there is something temperature-related on long workloads. This is especially true for passive cases, many of which are designed for mostly idle usage. Of course software and the kernel are something to keep in mind too. Good luck :slight_smile:
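
To check whether hangs line up with temperature, one option is to log readings to disk so the last value before a hang survives the reboot. A rough sketch, assuming a thermal zone file that reports millidegrees Celsius (the standard Linux thermal sysfs convention); the function name log_temp and the paths in the usage comment are hypothetical.

```shell
# log_temp ZONE_FILE LOG_FILE
# Append "<unix-time> <whole degrees C>" to LOG_FILE, reading the
# temperature in millidegrees from ZONE_FILE, then flush to disk so
# the entry is not lost if the system hangs shortly afterwards.
log_temp() {
    zone_file=$1
    log_file=$2
    milli=$(cat "$zone_file") || return 1
    printf '%s %s\n' "$(date +%s)" "$((milli / 1000))" >> "$log_file"
    sync
}

# Typical use, once a minute:
#   while true; do
#       log_temp /sys/class/thermal/thermal_zone0/temp /var/log/soc-temp.log
#       sleep 60
#   done
```

After a hang, the last few lines of the log show how hot the SoC was running just before the system stopped responding.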

Hello Mr. Vadim! I just had a different stability issue with my 32GB Rock. I had to patch new rkbins into my boot image to get it running stably:
1: https://github.com/rockchip-linux/rkbin/blob/master/bin/rk35/rk3588_bl31_v1.45.elf
2: https://github.com/rockchip-linux/rkbin/blob/master/bin/rk35/rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.16.bin

With the help of inindev's great u-boot repo:

(just wget the files into the rkbin dir and change the Makefile accordingly)
I also use inindev's image. Idk if it's compatible or not, but maybe you can adapt it smartly to your image :smiley:

Before, I couldn't allocate any larger amount of RAM, but now everything runs like butter. (I changed my main server to the 32 GB one and everything went nuts.)
Happy Easter, maybe it helps

Edit: and if you try it, I guess it's best to disconnect power once on reboot. Maybe it's running slower without your Coral and therefore more stable. Who knows :smiley:

Thanks for your comments and suggestions, everyone. I wanted to give you an update: I haven't seen stability issues after changing the power supply, so for now I'll refrain from using compiled binaries from a random third-party repo. If the issue returns, I'll look into inindev's work to try to understand whether it's safe to use his binaries.

As an experiment, I installed the Coral TPU back on the board, but am not using it for inference, just to test whether the accelerator card was drawing too much current. After a few weeks I'll start using it for inference too, to continue testing. Frankly, I prefer the Rock NPU: it runs much cooler and the YOLOv5 model gives much better results than any of the coral.ai models, so once I'm done testing I'll keep using the NPU... Will keep you posted.

Maybe you could provide a link on how to compile U-Boot using this repo?
I'm not that deep into the topic of compiling U-Boot on my own.

Hi There,

you just have to download the non-random-non-third-party-rockchip binaries to your computer and use the script from inindev to download and compile uboot to make your life easier. So something like this:

apt install git
git clone --depth 1 $REPOURL
cd uboot-rockchip/rkbin
wget file1
wget file2
nano ../Makefile    # change the paths for the two files
cd ..               # back to the directory that holds the Makefile
make

The script will tell you exactly what to do to flash the images and if you need to install more packages.

Oh, and if it asks for a git username, it's because it's patching locally; just enter the example text 1:1 ("Your Username" etc.). It won't upload anything. But check the script first before using it, like everything, if possible :smiley:

Edit: here are some snippets:
In Makefile

RK3588_ATF := ../rkbin/rk3588_bl31_v1.45.elf
RK3588_TPL := ../rkbin/rk3588_ddr_lp4_2112MHz_lp5_2400MHz_v1.16.bin
...
TARGETS := target_rock-5b

I had to install these:

apt -y install screen bc libssl-dev python3-pyelftools python3-setuptools swig

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

mv rock-5b_idbloader.img idbloader.img
mv rock-5b_u-boot.itb u-boot.itb

Good luck!