Bad Memory modules on RockPi 4?

Hi All,

I have a RockPi 4B (4G of RAM) with the Penta SATA kit. I got everything set up with no problems and was excited about this neat little NAS. Soon, however, it began locking up at seemingly random times. The freezes were hard to catch because I’m running it headless, but a few times I was ssh’d in and saw messages like “Internal error: Oops: 96000004 [#1] PREEMPT SMP” dumped right before the lockup.

To make a long story short, I ended up suspecting faulty hardware (most likely the memory). This suspicion came from the following tests:

  • Identical lockups occurred with 4 different operating systems: Armbian Buster, Armbian Focal, and the official Radxa images for Debian and Ubuntu.
  • The lockups occurred with several different physical eMMC modules.
  • Stress testing the CPU did not seem to increase the frequency of lockups.
  • The Linux utility memtester reported many faults across a wide range of test sizes, from 2G down to 16M.
  • The memtester faults occurred regardless of whether I was testing on the bare board or with it attached to the SATA HAT.
  • I have a much older RockPi 4B that has been in nonstop use for another project (no SATA HAT) for many months. I ran memtester on it as a comparison and found no faults after hours of running.
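
For anyone who wants to repeat the memtester sweep, here is a minimal Python sketch of the idea. The intermediate sizes, the helper names (build_command, describe_exit, sweep) and the one-loop default are assumptions made for illustration; only the "memtester <size> <loops>" invocation and the exit-status bits (as I understand the memtester man page) come from the tool itself.

```python
import subprocess

# Sizes mirror the range mentioned above (2G down to 16M); the steps in
# between are an assumption made for this sketch.
SIZES = ["2G", "1G", "512M", "256M", "128M", "64M", "32M", "16M"]

# Exit-status bits as described in the memtester man page.
EXIT_BITS = {
    0x01: "could not allocate/lock memory, or bad invocation",
    0x02: "stuck address test failed",
    0x04: "another test failed",
}


def build_command(size, loops=1):
    """Return the argv for one run: memtester <size> <loops>."""
    return ["memtester", size, str(loops)]


def describe_exit(code):
    """Translate memtester's OR-ed exit status into readable strings."""
    if code == 0:
        return ["ok"]
    if code < 0:
        # subprocess reports death by signal as a negative return code.
        return [f"killed by signal {-code}"]
    return [text for bit, text in EXIT_BITS.items() if code & bit]


def sweep(sizes=SIZES, loops=1):
    """Run memtester once per size and collect (size, exit code) pairs."""
    results = []
    for size in sizes:
        proc = subprocess.run(build_command(size, loops),
                              stdout=subprocess.DEVNULL)
        results.append((size, proc.returncode))
    return results


# Example use (memtester typically needs root to lock the memory):
#     for size, code in sweep():
#         print(size, code, ", ".join(describe_exit(code)))
```

Running each size as a separate process keeps one failed allocation (for example 2G on a busy 4G board) from hiding the results for the smaller sizes.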

Really it was the huge number of memtester faults that led to my next test: I bought a copy of basically every part of the Penta SATA kit except for the case. Instead of a RockPi 4B I got a 4A this time, but still with 4G of memory. Because I was paranoid, the first thing I did was install a new OS (Armbian Focal with kernel 5.10) onto a newly purchased eMMC and run memtester. This was before I had attached any part of the SATA HAT. It ran for a very long time with no faults (so better than the first board!), but it still had one fault on a 1G memory test. In any case I forged ahead.

The whole setup has been running much more smoothly than the first board, but I do get a very similar lockup every few days or so (even if the computer is just sitting there doing nothing). For whatever reason, when the system locks up now, I first notice it because the display on top of the HAT freezes. At this point I can still ssh in and grab some logs with journalctl, but if I attempt to do basically anything else (including a restart) the whole system locks up. This gives way more information than the first board did, though, and may contain some clues. Here’s an example:

Jul 05 06:27:30 HOST kernel: Unable to handle kernel paging request at virtual address 000000002f00d980
Jul 05 06:27:30 HOST kernel: Mem abort info:
Jul 05 06:27:30 HOST kernel:   ESR = 0x96000004
Jul 05 06:27:30 HOST kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 05 06:27:30 HOST kernel:   SET = 0, FnV = 0
Jul 05 06:27:30 HOST kernel:   EA = 0, S1PTW = 0
Jul 05 06:27:30 HOST kernel: Data abort info:
Jul 05 06:27:30 HOST kernel:   ISV = 0, ISS = 0x00000004
Jul 05 06:27:30 HOST kernel:   CM = 0, WnR = 0
Jul 05 06:27:30 HOST kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=000000002b65f000
Jul 05 06:27:30 HOST kernel: [000000002f00d980] pgd=0000000000000000, p4d=0000000000000000
Jul 05 06:27:30 HOST kernel: Internal error: Oops: 96000004 [#1] PREEMPT SMP
Jul 05 06:27:30 HOST kernel: Modules linked in: zram xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge aufs rfkill governor_performance zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec snd_soc_audio_graph_card panfrost snd_soc_simple_card snd_soc_simple_card_utils gpu_sched dw_hdmi_cec snd_soc_rockchip_i2s dw_hdmi_i2s_audio hantro_vpu(C) rockchip_vdec(C) v4l2_h264 snd_soc_es8316 rockchip_rga videobuf2_dma_contig v4l2_mem2mem videobuf2_dma_sg snd_soc_core videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common snd_pcm snd_timer videodev sg snd mc soundcore cpufreq_dt sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek
Jul 05 06:27:30 HOST kernel:  rockchipdrm analogix_dp dw_hdmi dw_mipi_dsi dwmac_rk stmmac_platform drm_kms_helper stmmac pcs_xpcs cec rc_core drm drm_panel_orientation_quirks
Jul 05 06:27:30 HOST kernel: CPU: 3 PID: 3809147 Comm: python3 Tainted: P         C OE     5.10.43-rockchip64 #21.05.4
Jul 05 06:27:30 HOST kernel: Hardware name: Radxa ROCK Pi 4A (DT)
Jul 05 06:27:30 HOST kernel: pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
Jul 05 06:27:30 HOST kernel: pc : lock_page_memcg+0x34/0xc0
Jul 05 06:27:30 HOST kernel: lr : lock_page_memcg+0x28/0xc0
Jul 05 06:27:30 HOST kernel: sp : ffff800029e238c0
Jul 05 06:27:30 HOST kernel: x29: ffff800029e238c0 x28: 00e000003b16dbc3 
Jul 05 06:27:30 HOST kernel: x27: 0000000000000000 x26: ffff800029e23b08 
Jul 05 06:27:30 HOST kernel: x25: 0000ffff927ae000 x24: ffff0000344a43e8 
Jul 05 06:27:30 HOST kernel: x23: fffffe0000cc5b40 x22: 0000ffff927ad000 
Jul 05 06:27:30 HOST kernel: x21: ffff800029e239f8 x20: fffffe0000cc5b40 
Jul 05 06:27:30 HOST kernel: x19: ff0000002f00d000 x18: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x17: 0000000000000000 x16: 0000000000000000
Jul 05 06:27:30 HOST kernel: x15: 0000000000000001 x14: 0000000000000002 
Jul 05 06:27:30 HOST kernel: x13: 000000000004111a x12: 0000000000000018 
Jul 05 06:27:30 HOST kernel: x11: 0101010101010101 x10: ffff8000e6211000 
Jul 05 06:27:30 HOST kernel: x9 : ffff0000f77966e0 x8 : 00000000000001ff 
Jul 05 06:27:30 HOST kernel: x7 : ffff800029e238c0 x6 : ffff0000f77966f0 
Jul 05 06:27:30 HOST kernel: x5 : 0000ffff927ad000 x4 : ffff0000344a43e8 
Jul 05 06:27:30 HOST kernel: x3 : 00000000fffffecb x2 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: x1 : ffff0000010b4880 x0 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: Call trace:
Jul 05 06:27:30 HOST kernel:  lock_page_memcg+0x34/0xc0
Jul 05 06:27:30 HOST kernel:  page_remove_rmap+0x1c/0x568
Jul 05 06:27:30 HOST kernel:  unmap_page_range+0x56c/0x848
Jul 05 06:27:30 HOST kernel:  unmap_single_vma+0x88/0x100
Jul 05 06:27:30 HOST kernel:  unmap_vmas+0xdc/0x100
Jul 05 06:27:30 HOST kernel:  exit_mmap+0xd4/0x188
Jul 05 06:27:30 HOST kernel:  mmput+0x7c/0x160
Jul 05 06:27:30 HOST kernel:  begin_new_exec+0x2d4/0xa60
Jul 05 06:27:30 HOST kernel:  load_elf_binary+0x73c/0x1800
Jul 05 06:27:30 HOST kernel:  bprm_execve+0x28c/0x638
Jul 05 06:27:30 HOST kernel:  do_execveat_common.isra.48+0x1a8/0x1c8
Jul 05 06:27:30 HOST kernel:  __arm64_sys_execve+0x40/0x58
Jul 05 06:27:30 HOST kernel:  el0_svc_common.constprop.2+0x8c/0x190
Jul 05 06:27:30 HOST kernel:  do_el0_svc+0x24/0x90
Jul 05 06:27:30 HOST kernel:  el0_svc+0x14/0x20
Jul 05 06:27:30 HOST kernel:  el0_sync_handler+0x90/0xb8
Jul 05 06:27:30 HOST kernel:  el0_sync+0x160/0x180
Jul 05 06:27:30 HOST kernel: Code: 97f92203 d503201f f9401e93 b40002d3 (b9498260) 
Jul 05 06:27:30 HOST kernel: ---[ end trace 3474353eefa9fd6b ]---
Jul 05 06:27:30 HOST kernel: note: python3[3809147] exited with preempt_count 1
Jul 05 06:27:30 HOST python3[5056]: Process Process-3:
Jul 05 06:27:30 HOST python3[5056]: Traceback (most recent call last):
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
Jul 05 06:27:30 HOST python3[5056]:     self.run()
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
Jul 05 06:27:30 HOST python3[5056]:     self._target(*self._args, **self._kwargs)
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/bin/rockpi-penta/oled.py", line 108, in auto_slider
Jul 05 06:27:30 HOST python3[5056]:     slider(lock)
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/bin/rockpi-penta/oled.py", line 101, in slider
Jul 05 06:27:30 HOST python3[5056]:     for item in misc.slider_next(gen_pages()):
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/bin/rockpi-penta/oled.py", line 87, in gen_pages
Jul 05 06:27:30 HOST python3[5056]:     {'xy': (0, 21), 'text': misc.get_info('ip'), 'fill': 255, 'font': font['11']},
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/bin/rockpi-penta/misc.py", line 48, in get_info
Jul 05 06:27:30 HOST python3[5056]:     return check_output(cmds[s])
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/bin/rockpi-penta/misc.py", line 36, in check_output
Jul 05 06:27:30 HOST python3[5056]:     return subprocess.check_output(cmd, shell=True).decode().strip()
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
Jul 05 06:27:30 HOST python3[5056]:     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
Jul 05 06:27:30 HOST python3[5056]:   File "/usr/lib/python3.8/subprocess.py", line 512, in run
Jul 05 06:27:30 HOST python3[5056]:     raise CalledProcessError(retcode, process.args,
Jul 05 06:27:30 HOST python3[5056]: subprocess.CalledProcessError: Command 'hostname -I | awk '{printf "IP %s", $1}'' died with <Signals.SIGSEGV: 11>.
Jul 05 06:27:30 HOST kernel: Unable to handle kernel paging request at virtual address 000000002f00d980
Jul 05 06:27:30 HOST kernel: Mem abort info:
Jul 05 06:27:30 HOST kernel:   ESR = 0x96000006
Jul 05 06:27:30 HOST kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 05 06:27:30 HOST kernel:   SET = 0, FnV = 0
Jul 05 06:27:30 HOST kernel:   EA = 0, S1PTW = 0
Jul 05 06:27:30 HOST kernel: Data abort info:
Jul 05 06:27:30 HOST kernel:   ISV = 0, ISS = 0x00000006
Jul 05 06:27:30 HOST kernel:   CM = 0, WnR = 0
Jul 05 06:27:30 HOST kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=0000000010aa1000
Jul 05 06:27:30 HOST kernel: [000000002f00d980] pgd=0000000010aa2003, p4d=0000000010aa2003, pud=0000000010aa3003, pmd=0000000000000000
Jul 05 06:27:30 HOST kernel: Internal error: Oops: 96000006 [#2] PREEMPT SMP
Jul 05 06:27:30 HOST kernel: Modules linked in: zram xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge aufs rfkill governor_performance zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec snd_soc_audio_graph_card panfrost snd_soc_simple_card snd_soc_simple_card_utils gpu_sched dw_hdmi_cec snd_soc_rockchip_i2s dw_hdmi_i2s_audio hantro_vpu(C) rockchip_vdec(C) v4l2_h264 snd_soc_es8316 rockchip_rga videobuf2_dma_contig v4l2_mem2mem videobuf2_dma_sg snd_soc_core videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common snd_pcm snd_timer videodev sg snd mc soundcore cpufreq_dt sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek
Jul 05 06:27:30 HOST kernel:  rockchipdrm analogix_dp dw_hdmi dw_mipi_dsi dwmac_rk stmmac_platform drm_kms_helper stmmac pcs_xpcs cec rc_core drm drm_panel_orientation_quirks
Jul 05 06:27:30 HOST kernel: CPU: 1 PID: 5056 Comm: python3 Tainted: P      D  C OE     5.10.43-rockchip64 #21.05.4
Jul 05 06:27:30 HOST kernel: Hardware name: Radxa ROCK Pi 4A (DT)
Jul 05 06:27:30 HOST kernel: pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
Jul 05 06:27:30 HOST kernel: pc : lock_page_memcg+0x34/0xc0
Jul 05 06:27:30 HOST kernel: lr : lock_page_memcg+0x28/0xc0
Jul 05 06:27:30 HOST kernel: sp : ffff8000157b3a80
Jul 05 06:27:30 HOST kernel: x29: ffff8000157b3a80 x28: 00e000003b16dbc3 
Jul 05 06:27:30 HOST kernel: x27: 0000000000000000 x26: ffff8000157b3cc8 
Jul 05 06:27:30 HOST kernel: x25: 0000ffff927ae000 x24: ffff000021587bb8 
Jul 05 06:27:30 HOST kernel: x23: fffffe0000cc5b40 x22: 0000ffff927ad000 
Jul 05 06:27:30 HOST kernel: x21: ffff8000157b3bb8 x20: fffffe0000cc5b40 
Jul 05 06:27:30 HOST kernel: x19: ff0000002f00d000 x18: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x17: 0000000000000000 x16: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x15: 0000000000000001 x14: 0000000000000002 
Jul 05 06:27:30 HOST kernel: x13: 0000000000040fba x12: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x11: 0000000000000000 x10: ffff8000e61d1000 
Jul 05 06:27:30 HOST kernel: x9 : ffff0000f77566e0 x8 : 0000000000000000 
Jul 05 06:27:30 HOST kernel: x7 : ffff8000157b3a80 x6 : ffff0000f77566f0 
Jul 05 06:27:30 HOST kernel: x5 : 0000ffff927ad000 x4 : ffff000021587bb8 
Jul 05 06:27:30 HOST kernel: x3 : 00000000fffffecb x2 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: x1 : ffff0000053d4880 x0 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: Call trace:
Jul 05 06:27:30 HOST kernel:  lock_page_memcg+0x34/0xc0
Jul 05 06:27:30 HOST kernel:  page_remove_rmap+0x1c/0x568
Jul 05 06:27:30 HOST kernel:  unmap_page_range+0x56c/0x848
Jul 05 06:27:30 HOST kernel:  unmap_single_vma+0x88/0x100
Jul 05 06:27:30 HOST kernel:  unmap_vmas+0xdc/0x100
Jul 05 06:27:30 HOST kernel:  exit_mmap+0xd4/0x188
Jul 05 06:27:30 HOST kernel:  mmput+0x7c/0x160
Jul 05 06:27:30 HOST kernel:  do_exit+0x31c/0xab8
Jul 05 06:27:30 HOST kernel:  do_group_exit+0x44/0xa0
Jul 05 06:27:30 HOST kernel:  __wake_up_parent+0x0/0x30
Jul 05 06:27:30 HOST kernel:  el0_svc_common.constprop.2+0x8c/0x190
Jul 05 06:27:30 HOST kernel:  do_el0_svc+0x24/0x90
Jul 05 06:27:30 HOST kernel:  el0_svc+0x14/0x20
Jul 05 06:27:30 HOST kernel:  el0_sync_handler+0x90/0xb8
Jul 05 06:27:30 HOST kernel:  el0_sync+0x160/0x180
Jul 05 06:27:30 HOST kernel: Code: 97f92203 d503201f f9401e93 b40002d3 (b9498260) 
Jul 05 06:27:30 HOST kernel: ---[ end trace 3474353eefa9fd6c ]---
Jul 05 06:27:30 HOST kernel: note: python3[5056] exited with preempt_count 1
Jul 05 06:27:30 HOST kernel: Fixing recursive fault but reboot is needed!
Jul 05 06:27:30 HOST kernel: ------------[ cut here ]------------
Jul 05 06:27:30 HOST kernel: WARNING: CPU: 1 PID: 5056 at kernel/rcu/tree_plugin.h:297 rcu_note_context_switch+0x5c/0x400
Jul 05 06:27:30 HOST kernel: Modules linked in: zram xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge aufs rfkill governor_performance zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) snd_soc_hdmi_codec snd_soc_audio_graph_card panfrost snd_soc_simple_card snd_soc_simple_card_utils gpu_sched dw_hdmi_cec snd_soc_rockchip_i2s dw_hdmi_i2s_audio hantro_vpu(C) rockchip_vdec(C) v4l2_h264 snd_soc_es8316 rockchip_rga videobuf2_dma_contig v4l2_mem2mem videobuf2_dma_sg snd_soc_core videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common snd_pcm snd_timer videodev sg snd mc soundcore cpufreq_dt sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod realtek
Jul 05 06:27:30 HOST kernel:  rockchipdrm analogix_dp dw_hdmi dw_mipi_dsi dwmac_rk stmmac_platform drm_kms_helper stmmac pcs_xpcs cec rc_core drm drm_panel_orientation_quirks
Jul 05 06:27:30 HOST kernel: CPU: 1 PID: 5056 Comm: python3 Tainted: P      D  C OE     5.10.43-rockchip64 #21.05.4
Jul 05 06:27:30 HOST kernel: Hardware name: Radxa ROCK Pi 4A (DT)
Jul 05 06:27:30 HOST kernel: pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
Jul 05 06:27:30 HOST kernel: pc : rcu_note_context_switch+0x5c/0x400
Jul 05 06:27:30 HOST kernel: lr : rcu_note_context_switch+0x4c/0x400
Jul 05 06:27:30 HOST kernel: sp : ffff8000157b35e0
Jul 05 06:27:30 HOST kernel: x29: ffff8000157b35e0 x28: ffff0000053d4880 
Jul 05 06:27:30 HOST kernel: x27: 0000000000000000 x26: ffff800011b41000 
Jul 05 06:27:30 HOST kernel: x25: ffff8000102dca7c x24: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x23: 0000000000000000 x22: ffff0000053d4880 
Jul 05 06:27:30 HOST kernel: x21: ffff800011b27858 x20: ffff0000053d4880 
Jul 05 06:27:30 HOST kernel: x19: ffff0000f7751980 x18: 0000000000000010 
Jul 05 06:27:30 HOST kernel: x17: 0000000000000000 x16: 0000000000000000 
Jul 05 06:27:30 HOST kernel: x15: 0000000000000329 x14: ffff8000157b33c0 
Jul 05 06:27:30 HOST kernel: x13: 00000000ffffffea x12: ffff80001194edc8 
Jul 05 06:27:30 HOST kernel: x11: 0000000000000003 x10: ffff800011936d88 
Jul 05 06:27:30 HOST kernel: x9 : ffff800011936de0 x8 : 0000000000017fe8 
Jul 05 06:27:30 HOST kernel: x7 : c0000000ffffefff x6 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: x5 : 0000000000000001 x4 : ffff8000e61d1000 
Jul 05 06:27:30 HOST kernel: x3 : 0000000000000001 x2 : ffff80001156a000 
Jul 05 06:27:30 HOST kernel: x1 : ffff8000e61d1000 x0 : 0000000000000001 
Jul 05 06:27:30 HOST kernel: Call trace:
Jul 05 06:27:30 HOST kernel:  rcu_note_context_switch+0x5c/0x400
Jul 05 06:27:30 HOST kernel:  __schedule+0xac/0x758
Jul 05 06:27:30 HOST kernel:  schedule+0x40/0xf8
Jul 05 06:27:30 HOST kernel:  do_exit+0xf4/0xab8
Jul 05 06:27:30 HOST kernel:  die+0x208/0x248
Jul 05 06:27:30 HOST kernel:  die_kernel_fault+0x64/0x78
Jul 05 06:27:30 HOST kernel:  __do_kernel_fault+0x74/0x148
Jul 05 06:27:30 HOST kernel:  do_page_fault+0x1c8/0x3a8
Jul 05 06:27:30 HOST kernel:  do_translation_fault+0x50/0x60
Jul 05 06:27:30 HOST kernel:  do_mem_abort+0x40/0xa0
Jul 05 06:27:30 HOST kernel:  el1_abort+0x48/0x70
Jul 05 06:27:30 HOST kernel:  el1_sync_handler+0x64/0xe8
Jul 05 06:27:30 HOST kernel:  el1_sync+0x84/0x140
Jul 05 06:27:30 HOST kernel:  lock_page_memcg+0x34/0xc0
Jul 05 06:27:30 HOST kernel:  page_remove_rmap+0x1c/0x568
Jul 05 06:27:30 HOST kernel:  unmap_page_range+0x56c/0x848
Jul 05 06:27:30 HOST kernel:  unmap_single_vma+0x88/0x100
Jul 05 06:27:30 HOST kernel:  unmap_vmas+0xdc/0x100
Jul 05 06:27:30 HOST kernel:  exit_mmap+0xd4/0x188
Jul 05 06:27:30 HOST kernel:  mmput+0x7c/0x160
Jul 05 06:27:30 HOST kernel:  do_exit+0x31c/0xab8
Jul 05 06:27:30 HOST kernel:  do_group_exit+0x44/0xa0
Jul 05 06:27:30 HOST kernel:  __wake_up_parent+0x0/0x30
Jul 05 06:27:30 HOST kernel:  el0_svc_common.constprop.2+0x8c/0x190
Jul 05 06:27:30 HOST kernel:  do_el0_svc+0x24/0x90
Jul 05 06:27:30 HOST kernel:  el0_svc+0x14/0x20
Jul 05 06:27:30 HOST kernel:  el0_sync_handler+0x90/0xb8
Jul 05 06:27:30 HOST kernel:  el0_sync+0x160/0x180
Jul 05 06:27:30 HOST kernel: ---[ end trace 3474353eefa9fd6d ]---
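
The two ESR values in the dumps above (0x96000004 and 0x96000006) can be split into their fields by hand. Below is a minimal Python sketch, assuming the standard ARMv8 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0). The function name decode_esr and the small lookup table are my own and only cover translation faults.

```python
# Data Fault Status Codes (DFSC, ISS bits 5:0) for translation faults.
# Other codes are deliberately left undecoded in this sketch.
DFSC_NAMES = {
    0b000100: "translation fault, level 0",
    0b000101: "translation fault, level 1",
    0b000110: "translation fault, level 2",
    0b000111: "translation fault, level 3",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into the fields the kernel prints."""
    ec = (esr >> 26) & 0x3F          # exception class
    il = (esr >> 25) & 0x1           # instruction length bit (1 = 32-bit)
    iss = esr & 0x1FFFFFF            # instruction-specific syndrome
    return {
        "EC": ec,
        "IL": il,
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,     # 0 = read, 1 = write (data aborts)
        "DFSC": DFSC_NAMES.get(iss & 0x3F, "not decoded in this sketch"),
    }


for value in (0x96000004, 0x96000006):
    fields = decode_esr(value)
    print(f"{value:#010x}: EC={fields['EC']:#x} IL={fields['IL']} "
          f"ISS={fields['ISS']:#010x} WnR={fields['WnR']} {fields['DFSC']}")
```

Decoded this way, the first oops is a read that hit a level 0 translation fault and the second is a read that hit a level 2 translation fault. That matches the page table walks printed in the dumps: pgd is zero in the first, and pmd is the first zero entry in the second.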

After that dump, the following messages appear every three minutes; they did not show up before it:

Jul 05 06:28:30 HOST kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Jul 05 06:28:30 HOST kernel: rcu:         Tasks blocked on level-0 rcu_node (CPUs 0-5):
Jul 05 06:28:30 HOST kernel:         (detected by 5, t=15002 jiffies, g=8841773, q=66)
Jul 05 06:28:30 HOST kernel: rcu: All QSes seen, last rcu_preempt kthread activity 1 (4352681183-4352681182), jiffies_till_next_fqs=1, root ->qsmask 0x0
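
One more thing that can be read out of the first oops is the faulting instruction itself, which is the parenthesised word on the "Code:" line. Here is a minimal Python sketch that decodes it, assuming the word is an A64 "LDR (immediate, unsigned offset)" instruction. The function name and constants are mine; the instruction words, the x19 value and the reported address are copied from the dump.

```python
def decode_ldr_unsigned_offset(word):
    """Decode an A64 'LDR (immediate, unsigned offset)' instruction word.

    Returns (size_bytes, base_register, dest_register, byte_offset),
    or None if the word does not match that encoding.
    """
    # Integer load/store, unsigned offset: bits 29:24 = 111001,
    # and opc (bits 23:22) = 01 selects LDR.
    if (word >> 24) & 0x3F != 0b111001 or (word >> 22) & 0x3 != 0b01:
        return None
    size_log2 = (word >> 30) & 0x3       # 0..3 -> 1, 2, 4 or 8 bytes
    imm12 = (word >> 10) & 0xFFF         # offset, scaled by the access size
    rn = (word >> 5) & 0x1F              # base register number
    rt = word & 0x1F                     # destination register number
    return (1 << size_log2, rn, rt, imm12 << size_log2)


# Values copied from the first oops above.
LOAD_WORD = 0xF9401E93        # third word on the Code: line
FAULT_WORD = 0xB9498260       # parenthesised (faulting) word
X19 = 0xFF0000002F00D000      # x19 from the register dump
REPORTED = 0x000000002F00D980  # address in the "Unable to handle" line

for word in (LOAD_WORD, FAULT_WORD):
    size, rn, rt, offset = decode_ldr_unsigned_offset(word)
    reg = "w" if size == 4 else "x"   # only 4- and 8-byte loads occur here
    print(f"{word:08x}: ldr {reg}{rt}, [x{rn}, #{offset:#x}]")

_, _, _, fault_offset = decode_ldr_unsigned_offset(FAULT_WORD)
effective = X19 + fault_offset
LOW56 = (1 << 56) - 1
print(f"x19 + {fault_offset:#x} = {effective:#018x}")
print("low 56 bits equal the reported address:",
      (effective & LOW56) == (REPORTED & LOW56))
```

Under that assumption, x19 is loaded from memory at x20 + 0x38 and then dereferenced at offset 0x980, which is where the fault happens. The value in x19 starts with ff00, while other kernel pointers in the same dump (x24, for example) start with ffff. That would be consistent with a pointer that was damaged in memory, but this is only my reading of the dump, not proof.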

So I’m starting to think that I’ve gotten unlucky and managed to get 2 different boards with bad memory modules, but I would love for the problem to be something else; the novelty of getting new boards is going to wear off pretty quick. Here are some more clues/info that might be useful:

  • A surefire way to get my first board to crash is to do some heavy write activity on the 4-disk SATA array. For this I use bonnie++ and can usually get a crash within an hour.
  • I am using ZFS to manage the 4 HDDs, but I did some tests and confirmed that the problem still exists when using mdadm with a simple RAID5 as well.
  • I have 2 1TB Seagate Barracuda and 2 1TB WD Blue HDDs.
  • I am using the 60 W power supply that came with the SATA HAT kit.
  • The problem still exists with every combination of Armbian ramlogging and zram being on/off.
  • It seems like other members on these forums have faced similar issues that remain unresolved. For example, here and here. (There are more, but I’m only allowed to put two links in this post. Just search “Oops” on the forums.)

Any insight anyone has would be greatly appreciated. Thanks!

Can you test on the legacy Rockchip 4.4 kernel? I will also test the 5.10.43-rockchip64 kernel here. Which image are you using?

Hi, thanks for the response. I will test my first board with the legacy kernel and post the results when I have them. To answer your question, I was using Armbian Buster 5.10 on my first board and am currently running Armbian Focal 5.10 on the board that is in use (i.e., the one that produced the logs I posted).

Ok, I installed Armbian Buster Legacy with kernel version 4.4 onto my first board (the 4B) and ran a bunch of tests. Most tests involved using bonnie++ to hammer away at either the 4-disk ZFS array or the eMMC. In all cases the computer eventually locked up. This time I was able to catch some output by keeping a second terminal ssh’d in with journalctl -f running, so the text was still there after a lockup. Here is a representative example of an error:

Jul 07 05:21:55 rockpi-4b kernel: Unable to handle kernel paging request at virtual address ffffff00c17dda98
Jul 07 05:21:55 rockpi-4b kernel: pgd = ffffffc0d996a000
Jul 07 05:21:55 rockpi-4b kernel: [ffffff00c17dda98] *pgd=0000000000000000, *pud=0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: Internal error: Oops: 96000004 [#1] SMP
Jul 07 05:21:55 rockpi-4b kernel: Modules linked in: af_packet zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) midgard_kba
Jul 07 05:21:55 rockpi-4b kernel: CPU: 2 PID: 1623 Comm: dp_sync_taskq Tainted: P           O    4.4.213-rockchip64 #1
Jul 07 05:21:55 rockpi-4b kernel: Hardware name: ROCK PI 4B (DT)
Jul 07 05:21:55 rockpi-4b kernel: task: ffffffc0ec7daa00 task.stack: ffffffc0ec614000
Jul 07 05:21:55 rockpi-4b kernel: PC is at dmu_objset_userquota_find_data.isra.6+0x60/0xd0 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: LR is at dmu_objset_userquota_get_ids+0x7c/0x2e8 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: pc : [<ffffff80018a3a18>] lr : [<ffffff80018a3b04>] pstate: 20000145
Jul 07 05:21:55 rockpi-4b kernel: sp : ffffffc0ec617b10
Jul 07 05:21:55 rockpi-4b kernel: x29: ffffffc0ec617b10 x28: ffffffc0bb60bdc0 
Jul 07 05:21:55 rockpi-4b kernel: x27: ffffffc0d7a01e90 x26: ffffffc0d7a01ed0 
Jul 07 05:21:55 rockpi-4b kernel: x25: ffffffc0d9ba6600 x24: 0000000000000005 
Jul 07 05:21:55 rockpi-4b kernel: x23: 0000000000000000 x22: ffffffc0bb60bf40 
Jul 07 05:21:55 rockpi-4b kernel: x21: ffffffc0ecb88000 x20: ffffffc0eb0b4cf8 
Jul 07 05:21:55 rockpi-4b kernel: x19: ffffff00c17dda88 x18: 0000000030d00800 
Jul 07 05:21:55 rockpi-4b kernel: x17: 0000000000000000 x16: 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x15: 0000000000000000 x14: 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x13: 0000000000000000 x12: 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x11: 0000000000000000 x10: 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x9 : 0000000000000000 x8 : 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x7 : 0000000000000000 x6 : ffffffc0f3001a00 
Jul 07 05:21:55 rockpi-4b kernel: x5 : ffffff80010eb8fc x4 : 0000000000000001 
Jul 07 05:21:55 rockpi-4b kernel: x3 : 000000000001f4e2 x2 : 0000000000000000 
Jul 07 05:21:55 rockpi-4b kernel: x1 : 0000000000000000 x0 : ffffffc0c63aba78 
Jul 07 05:21:55 rockpi-4b kernel: 
                              PC: 0xffffff80018a3998:
Jul 07 05:21:55 rockpi-4b kernel: 3998  a94153f3 a9425bf5 a94363f7 f94023f9 a8c77bfd d65f03c0 019b3580 ffffff80
Jul 07 05:21:55 rockpi-4b kernel: 39b8  a9be7bfd 910003fd a90153f3 aa0003f3 aa0103f4 aa1e03e0 d503201f 39478e60
Jul 07 05:21:55 rockpi-4b kernel: 39d8  350000c0 f9400e73 aa1303e0 a94153f3 a8c27bfd d65f03c0 9106c260 91068261
Jul 07 05:21:55 rockpi-4b kernel: 39f8  f940da73 f9400283 eb13001f 54000160 f9400421 cb0103e2 8b020273 b4fffe73
Jul 07 05:21:55 rockpi-4b kernel: 3a18  f9400a64 eb04007f 540000c2 f8616a73 eb13001f 54ffff21 d2800013 17ffffeb
Jul 07 05:21:55 rockpi-4b kernel: 3a38  54ffffc1 f9401260 58000201 f9401400 91079421 940313f7 f9401261 f9406673
Jul 07 05:21:55 rockpi-4b kernel: 3a58  f9401420 f9406c02 79410842 350000a2 f9402c21 b100083f 54000041 f9403a73
Jul 07 05:21:55 rockpi-4b kernel: 3a78  940313dc 17ffffd9 0199d738 ffffff80 a9b97bfd 910003fd a90153f3 a9025bf5
Jul 07 05:21:55 rockpi-4b kernel: 
                              LR: 0xffffff80018a3a84:
Jul 07 05:21:55 rockpi-4b kernel: 3a84  ffffff80 a9b97bfd 910003fd a90153f3 a9025bf5 a90363f7 aa0003f3 2a0103f7
Jul 07 05:21:55 rockpi-4b kernel: 3aa4  aa1e03e0 aa0203f4 d503201f f9402e75 f90027ff b9443278 aa1503e0 97fff2dc
Jul 07 05:21:55 rockpi-4b kernel: 3ac4  340009e0 b94022a0 35000940 35000a37 79410a60 34000b40 f941ee60 b4000a20
Jul 07 05:21:55 rockpi-4b kernel: 3ae4  f90027e0 91030000 95cf6bc5 f94027e0 d5384101 f9008c01 9100e281 97ffffae
Jul 07 05:21:55 rockpi-4b kernel: 3b04  aa0003f4 52800016 f9400ea0 910143e2 f9416001 58001280 f8617803 aa1403e1
Jul 07 05:21:55 rockpi-4b kernel: 3b24  39421a60 d63f0060 2a0003f5 34000df7 f9402be0 f9020260 f9402fe0 f9020660
Jul 07 05:21:55 rockpi-4b kernel: 3b44  f94033e0 f9020a60 f94027e0 b4000140 f9008c1f 91040000 95cf7205 f94027e0
Jul 07 05:21:55 rockpi-4b kernel: 3b64  91030000 95cf6bd8 f94027e0 91040000 95cf7227 91050274 aa1403e0 95cf6ba0
Jul 07 05:21:55 rockpi-4b kernel: 
                              SP: 0xffffffc0ec617a90:
Jul 07 05:21:55 rockpi-4b kernel: 7a90  bb60bf40 ffffffc0 00000000 00000000 00000005 00000000 d9ba6600 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7ab0  d7a01ed0 ffffffc0 d7a01e90 ffffffc0 bb60bdc0 ffffffc0 ec617b10 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7ad0  018a3b04 ffffff80 ec617b10 ffffffc0 018a3a18 ffffff80 20000145 00000000
Jul 07 05:21:55 rockpi-4b kernel: 7af0  ec617b10 ffffffc0 08c7ea1c ffffff80 ffffffff ffffffff bb60db80 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b10  ec617b30 ffffffc0 018a3b04 ffffff80 bb60bdc0 ffffffc0 eb0b4cc0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b30  ec617ba0 ffffffc0 018b7c18 ffffff80 bb60bf00 ffffffc0 e3062200 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b50  00000002 00000000 bb60bf40 ffffffc0 eb0b4cc0 ffffffc0 ecb88000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b70  bb60bf40 ffffffc0 c63ab8c8 ffffffc0 ec617ba0 ffffffc0 018b7c08 ffffff80
Jul 07 05:21:55 rockpi-4b kernel: 
                              X0: 0xffffffc0c63ab9f8:
Jul 07 05:21:55 rockpi-4b kernel: b9f8  346545f4 00000000 00000000 dead4ead ffffffff 00000000 ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: ba18  c63aba18 ffffffc0 c63aba18 ffffffc0 00000000 dead4ead ffffffff 00000000
Jul 07 05:21:55 rockpi-4b kernel: ba38  ffffffff ffffffff c63aba00 ffffff00 c63aba00 ffffffc0 00000040 000000c0
Jul 07 05:21:55 rockpi-4b kernel: ba58  00000040 000000c0 00000001 00000000 00000100 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: ba78  c17dda88 ffffff00 c17dda28 ffffffc0 00000100 dead0000 00000200 dead0000
Jul 07 05:21:55 rockpi-4b kernel: ba98  ffffffff 00000000 7b97fe30 ffffffc0 01000001 00000000 0001d5f2 00000000
Jul 07 05:21:55 rockpi-4b kernel: bab8  ffffffff ffffffff 00000140 00000000 c17dde00 ffffffc0 ecb88000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: bad8  929090a8 ffffffc0 eca4b4f8 ffffffc0 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 
                              X5: 0xffffff80010eb87c:
Jul 07 05:21:55 rockpi-4b kernel: b87c  f9401bf7 a8cc7bfd d65f03c0 a9be7bfd 910003fd f9000bf3 aa0003f3 aa1e03e0
Jul 07 05:21:55 rockpi-4b kernel: b89c  d503201f aa1303e0 95c3a4d9 f9400bf3 a8c27bfd d65f03c0 a9be7bfd 910003fd
Jul 07 05:21:55 rockpi-4b kernel: b8bc  f9000bf3 aa0003f3 aa1e03e0 d503201f b2652fe0 12a90021 8b000260 f2c007a1
Jul 07 05:21:55 rockpi-4b kernel: b8dc  eb01001f aa1303e0 540000a8 95c35971 f9400bf3 a8c27bfd d65f03c0 95c3a4c4
Jul 07 05:21:55 rockpi-4b kernel: b8fc  17fffffc a9bd7bfd 910003fd a90153f3 f90013f5 aa0003f5 aa1e03e0 d503201f
Jul 07 05:21:55 rockpi-4b kernel: b91c  aa1503e0 95d30560 11000400 93407c13 d5384100 b9402c01 52a00140 6a00003f
Jul 07 05:21:55 rockpi-4b kernel: b93c  52884000 72a04800 52885801 72a04801 1a800021 aa1303e0 95c3a045 aa0003f4
Jul 07 05:21:55 rockpi-4b kernel: b95c  b4000080 aa1303e2 aa1503e1 95d303c6 aa1403e0 a94153f3 f94013f5 a8c37bfd
Jul 07 05:21:55 rockpi-4b kernel: 
                              X6: 0xffffffc0f3001980:
Jul 07 05:21:55 rockpi-4b kernel: 1980  ec1d33c0 ffffffc0 f3001a88 ffffffc0 f3001888 ffffffc0 ec11b4a8 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 19a0  ec11b480 ffffffc0 09494c90 ffffff80 ec1da9d8 ffffffc0 00000002 00000007
Jul 07 05:21:55 rockpi-4b kernel: 19c0  00000001 00000000 f30019c8 ffffffc0 f30019c8 ffffffc0 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 19e0  00000000 00000000 00000000 00000000 ec1d8200 ffffffc0 f3000d00 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1a00  09431570 ffffff80 40000000 00000000 00000005 00000000 00000200 00000200
Jul 07 05:21:55 rockpi-4b kernel: 1a20  00000000 0000000d 00010010 00000000 00010010 00000000 00000008 00000000
Jul 07 05:21:55 rockpi-4b kernel: 1a40  00004000 00000005 00000000 00000000 00000200 00000040 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 1a60  08f79da7 ffffff80 f3001b68 ffffffc0 f3001968 ffffffc0 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 
                              X20: 0xffffffc0eb0b4c78:
Jul 07 05:21:55 rockpi-4b kernel: 4c78  00000000 00000000 00000000 00000000 00000000 00000000 0947a068 ffffff80
Jul 07 05:21:55 rockpi-4b kernel: 4c98  0947a0d8 ffffff80 0947bac0 ffffff80 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 4cb8  00000000 00000000 00000048 00000000 00000008 00000000 eb0b4cd0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 4cd8  eb0b4cd0 ffffffc0 00000000 00000000 00000000 00000000 eb378000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 4cf8  0001f4e2 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 4d18  00000000 00000000 00000000 00000000 00000000 00000000 00000020 00000000
Jul 07 05:21:55 rockpi-4b kernel: 4d38  00000000 00000000 eb0b4d40 ffffffc0 eb0b4d40 ffffffc0 00000001 00000000
Jul 07 05:21:55 rockpi-4b kernel: 4d58  5bff2068 00000368 00000000 00000000 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 
                              X21: 0xffffffc0ecb87f80:
Jul 07 05:21:55 rockpi-4b kernel: 7f80  7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c
Jul 07 05:21:55 rockpi-4b kernel: 7fa0  7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c
Jul 07 05:21:55 rockpi-4b kernel: 7fc0  7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c
Jul 07 05:21:55 rockpi-4b kernel: 7fe0  7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c 7c7c7c7c
Jul 07 05:21:55 rockpi-4b kernel: 8000  d8bbf000 ffffffc0 dcbc0000 ffffffc0 e90bc400 ffffffc0 d87b8000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 8020  00000000 00000000 00000001 00000000 00000000 dead4ead ffffffff 00000000
Jul 07 05:21:55 rockpi-4b kernel: 8040  ffffffff ffffffff ecb88048 ffffffc0 ecb88048 ffffffc0 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 8060  00000000 00000000 00010001 dead4ead ffffffff 00000000 ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: 
                              X22: 0xffffffc0bb60bec0:
Jul 07 05:21:55 rockpi-4b kernel: bec0  00000100 dead0000 00000200 dead0000 00000100 dead0000 00000200 dead0000
Jul 07 05:21:55 rockpi-4b kernel: bee0  bb60db60 ffffffc0 6a864870 ffffffc0 00000100 dead0000 00000200 dead0000
Jul 07 05:21:55 rockpi-4b kernel: bf00  00000001 00000000 00000000 dead4ead ffffffff 00000000 ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: bf20  bb60bf20 ffffffc0 bb60bf20 ffffffc0 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: bf40  000e000e dead4ead ffffffff 00000000 ffffffff ffffffff 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: bf60  00000188 00000000 00000000 00000000 bb60bf70 ffffffc0 bb60bf70 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: bf80  00000188 00000000 00000000 00000000 bb60bf90 ffffffc0 bb60bf90 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: bfa0  00000188 00000000 00000000 00000000 c17dda00 ffffffc0 c17dda00 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 
                              X25: 0xffffffc0d9ba6580:
Jul 07 05:21:55 rockpi-4b kernel: 6580  00000000 00000000 00000000 00000000 00000001 00000001 00000000 dead4ead
Jul 07 05:21:55 rockpi-4b kernel: 65a0  ffffffff 00000000 ffffffff ffffffff 01020000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 65c0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 65e0  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 6600  00030003 dead4ead ffffffff 00000000 ffffffff ffffffff 00020002 dead4ead
Jul 07 05:21:55 rockpi-4b kernel: 6620  ffffffff ffffffbd ffffffff ffffffff d9ba6630 ffffffc0 d9ba6630 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 6640  c1668000 ffffffbd 00000000 00000000 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 6660  d9ba6640 ffffffc0 00000003 ffffffff ffffffff 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: 
                              X26: 0xffffffc0d7a01e50:
Jul 07 05:21:55 rockpi-4b kernel: 1e50  00000000 00000004 0000008b 00000001 7fffffff 00000002 80000008 45530073
Jul 07 05:21:55 rockpi-4b kernel: 1e70  00006b39 00000000 00006b33 00000000 d7a01e80 ffffffc0 d7a01e80 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1e90  d7a01e90 ffffffc0 d7a01e90 ffffffc0 d7a01ea0 ffffffc0 d7a01ea0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1eb0  d7a01eb0 ffffffc0 d7a01eb0 ffffffc0 d7a016c0 ffffffc0 d79706c0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1ed0  1bda1bda dead4ead ffffffff dead4ead ffffffff ffffffff d5cabd20 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1ef0  ec493d20 ffffffc0 e583e583 dead4ead ffffffff ffffffff ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: 1f10  dc943a70 ffffffc0 dc943a70 ffffffc0 00000000 dead4ead ffffffff ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1f30  ffffffff ffffffff 00030003 dead4ead ffffffff 00000000 ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: 
                              X27: 0xffffffc0d7a01e10:
Jul 07 05:21:55 rockpi-4b kernel: 1e10  ffffffff ffffffff db2ff1c0 ffffffc0 00000000 00000000 d7c30d80 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1e30  d7c30800 ffffffc0 d7c30f90 ffffffc0 d7c30810 ffffffc0 00000002 00000004
Jul 07 05:21:55 rockpi-4b kernel: 1e50  00000000 00000004 0000008b 00000001 7fffffff 00000002 80000008 45530073
Jul 07 05:21:55 rockpi-4b kernel: 1e70  00006b39 00000000 00006b33 00000000 d7a01e80 ffffffc0 d7a01e80 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1e90  d7a01e90 ffffffc0 d7a01e90 ffffffc0 d7a01ea0 ffffffc0 d7a01ea0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1eb0  d7a01eb0 ffffffc0 d7a01eb0 ffffffc0 d7a016c0 ffffffc0 d79706c0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1ed0  1bda1bda dead4ead ffffffff dead4ead ffffffff ffffffff d5cabd20 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 1ef0  ec493d20 ffffffc0 e583e583 dead4ead ffffffff ffffffff ffffffff ffffffff
Jul 07 05:21:55 rockpi-4b kernel: 
                              X28: 0xffffffc0bb60bd40:
Jul 07 05:21:55 rockpi-4b kernel: bd40  00000000 dead4ead ffffffff 00000000 ffffffff ffffffff bb60bd58 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: bd60  bb60bd58 ffffffc0 00000000 00000000 00000000 00000000 00000000 dead4ead
Jul 07 05:21:55 rockpi-4b kernel: bd80  ffffffff 00000000 ffffffff ffffffff 00000000 00000000 00000090 00000000
Jul 07 05:21:55 rockpi-4b kernel: bda0  00000080 00000000 bb60bda8 ffffffc0 bb60bda8 ffffffc0 bb60b900 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: bdc0  00000000 00000000 bb60bdc8 ffffffc0 bb60bdc8 ffffffc0 00000000 dead4ead
Jul 07 05:21:55 rockpi-4b kernel: bde0  ffffffff 00000000 ffffffff ffffffff 00000000 00000000 00000000 00000000
Jul 07 05:21:55 rockpi-4b kernel: be00  00000000 00000000 bb60b948 ffffffc0 bb60c2c8 ffffffc0 ecb88000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: be20  0001d5f1 00000000 eca4b4f8 ffffffc0 92908fc8 ffffffc0 e3062200 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 
                              X29: 0xffffffc0ec617a90:
Jul 07 05:21:55 rockpi-4b kernel: 7a90  bb60bf40 ffffffc0 00000000 00000000 00000005 00000000 d9ba6600 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7ab0  d7a01ed0 ffffffc0 d7a01e90 ffffffc0 bb60bdc0 ffffffc0 ec617b10 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7ad0  018a3b04 ffffff80 ec617b10 ffffffc0 018a3a18 ffffff80 20000145 00000000
Jul 07 05:21:55 rockpi-4b kernel: 7af0  ec617b10 ffffffc0 08c7ea1c ffffff80 ffffffff ffffffff bb60db80 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b10  ec617b30 ffffffc0 018a3b04 ffffff80 bb60bdc0 ffffffc0 eb0b4cc0 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b30  ec617ba0 ffffffc0 018b7c18 ffffff80 bb60bf00 ffffffc0 e3062200 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b50  00000002 00000000 bb60bf40 ffffffc0 eb0b4cc0 ffffffc0 ecb88000 ffffffc0
Jul 07 05:21:55 rockpi-4b kernel: 7b70  bb60bf40 ffffffc0 c63ab8c8 ffffffc0 ec617ba0 ffffffc0 018b7c08 ffffff80
Jul 07 05:21:55 rockpi-4b kernel: 
Jul 07 05:21:55 rockpi-4b kernel: Process dp_sync_taskq (pid: 1623, stack limit = 0xffffffc0ec614000)
Jul 07 05:21:55 rockpi-4b kernel: Stack: (0xffffffc0ec617b10 to 0xffffffc0ec618000)
Jul 07 05:21:55 rockpi-4b kernel: 7b00:                                   ffffffc0ec617b30 ffffff80018a3b04
Jul 07 05:21:55 rockpi-4b kernel: 7b20: ffffffc0bb60bdc0 ffffffc0eb0b4cc0 ffffffc0ec617ba0 ffffff80018b7c18
Jul 07 05:21:55 rockpi-4b kernel: 7b40: ffffffc0bb60bf00 ffffffc0e3062200 0000000000000002 ffffffc0bb60bf40
Jul 07 05:21:55 rockpi-4b kernel: 7b60: ffffffc0eb0b4cc0 ffffffc0ecb88000 ffffffc0bb60bf40 ffffffc0c63ab8c8
Jul 07 05:21:55 rockpi-4b kernel: 7b80: ffffffc0ec617ba0 ffffff80018b7c08 ffffffc0bb60bf00 ffffffc0d7a01ed0
Jul 07 05:21:55 rockpi-4b kernel: 7ba0: ffffffc0ec617c50 ffffff80018a2198 ffffffc0bb60bdc0 ffffffc069d6b8c0
Jul 07 05:21:55 rockpi-4b kernel: 7bc0: ffffffc069d6be00 ffffffc0eca90800 ffffffc0eb0b4cc0 ffffffc0d7a01e38
Jul 07 05:21:55 rockpi-4b kernel: 7be0: ffffffc0d9ba6600 ffffffc0d7a01ed0 ffffffc0d7a01e90 ffffffc0d7a01ea0
Jul 07 05:21:55 rockpi-4b kernel: 7c00: ffffffc0ec617c20 ffffff80018eef08 ffffffc06a864800 ffffffc0ec7daa00
Jul 07 05:21:55 rockpi-4b kernel: 7c20: ffffffc0bb60bfa0 ffffff80018a218c ffffffc0bb60bdc0 ffffffc069d6b8c0
Jul 07 05:21:55 rockpi-4b kernel: 7c40: ffffffc069d6be00 ffffffc0eca90800 ffffffc0ec617c90 ffffff80010f0b80
Jul 07 05:21:55 rockpi-4b kernel: 7c60: ffffffc0d7a01e00 ffffffc0d7c30f80 0000000000000000 0000000000000140
Jul 07 05:21:55 rockpi-4b kernel: 7c80: ffffffc0d7a01ef8 ffffffc0d7a01e38 ffffffc0ec617e00 ffffff80080d4040
Jul 07 05:21:55 rockpi-4b kernel: 7ca0: ffffffc0db2ff700 ffffffc0ec7daa00 ffffff8008f6b8ee ffffffc0d7c30f80
Jul 07 05:21:55 rockpi-4b kernel: 7cc0: ffffff80010f08a4 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7ce0: 0000000000000000 0000000000000000 ffffffc0ec7daa80 ffffffc0ec7daa00
Jul 07 05:21:55 rockpi-4b kernel: 7d00: ffffffffffffffff 0000000000000001 ffffffc0ec7daa00 ffffff80080e1178
Jul 07 05:21:55 rockpi-4b kernel: 7d20: dead000000000100 dead000000000200 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7d40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7d60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7d80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7da0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7dc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7de0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7e00: 0000000000000000 ffffff8008082ef0 ffffff80080d3f60 ffffffc0db2ff700
Jul 07 05:21:55 rockpi-4b kernel: 7e20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7e40: ffffffc0ec617e60 0000000000000000 ffffffc0f7ee51c0 ffffffc0d7c30f80
Jul 07 05:21:55 rockpi-4b kernel: 7e60: ffffffc000000000 dead4ead00000000 ffffffc0ffffffff ffffffffffffffff
Jul 07 05:21:55 rockpi-4b kernel: 7e80: ffffffc0ec617e80 ffffffc0ec617e80 0000000000000000 dead4ead00000000
Jul 07 05:21:55 rockpi-4b kernel: 7ea0: 00000000ffffffff ffffffffffffffff ffffffc0ec617eb0 ffffffc0ec617eb0
Jul 07 05:21:55 rockpi-4b kernel: 7ec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7ee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7f20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7fa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7fc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: Call trace:
Jul 07 05:21:55 rockpi-4b kernel: Exception stack(0xffffffc0ec617940 to 0xffffffc0ec617a70)
Jul 07 05:21:55 rockpi-4b kernel: 7940: ffffff00c17dda88 0000008000000000 ffffffc0ec617b10 ffffff80018a3a18
Jul 07 05:21:55 rockpi-4b kernel: 7960: ffffff80010ec89c ffffffc0df3d63d0 ffffffc0d60d6400 ffffffc0df3d63d0
Jul 07 05:21:55 rockpi-4b kernel: 7980: ffffffc0ec617a80 ffffff80081d4dc8 ffffffc08a555e00 ffffffc0c47925b8
Jul 07 05:21:55 rockpi-4b kernel: 79a0: ffffff80010eb8fc ffffff800199c754 ffffffc0ec6179e0 ffffff8008c80390
Jul 07 05:21:55 rockpi-4b kernel: 79c0: ffffffc0ec6179e0 ffffff8008c80430 ffffffc0c47926b8 ffffff800199c754
Jul 07 05:21:55 rockpi-4b kernel: 79e0: ffffffc0c63aba78 0000000000000000 0000000000000000 000000000001f4e2
Jul 07 05:21:55 rockpi-4b kernel: 7a00: 0000000000000001 ffffff80010eb8fc ffffffc0f3001a00 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7a20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7a40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: 7a60: 0000000000000000 0000000000000000
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80018a3a18>] dmu_objset_userquota_find_data.isra.6+0x60/0xd0 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80018a3b04>] dmu_objset_userquota_get_ids+0x7c/0x2e8 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80018b7c18>] dnode_sync+0xf8/0x770 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80018a2198>] sync_dnodes_task+0xb8/0x108 [zfs]
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80010f0b80>] taskq_thread+0x2dc/0x3cc [spl]
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff80080d4040>] kthread+0xe0/0xf0
Jul 07 05:21:55 rockpi-4b kernel: [<ffffff8008082ef0>] ret_from_fork+0x10/0x20
Jul 07 05:21:55 rockpi-4b kernel: Code: f9400421 cb0103e2 8b020273 b4fffe73 (f9400a64) 
Jul 07 05:21:55 rockpi-4b kernel: ---[ end trace 0ce8ddc67b7d1630 ]---

I’m not sure if this is relevant/useful, but I also found this in the startup logs:

Jul 07 04:17:01 rockpi-4b kernel: Virtual kernel memory layout:
                                  modules : 0xffffff8000000000 - 0xffffff8008000000   (   128 MB)
                                  vmalloc : 0xffffff8008000000 - 0xffffffbdbfff0000   (   246 GB)
                                    .init : 0xffffff8009330000 - 0xffffff8009460000   (  1216 KB)
                                    .text : 0xffffff8008080000 - 0xffffff8008c90000   ( 12352 KB)
                                  .rodata : 0xffffff8008c90000 - 0xffffff8009330000   (  6784 KB)
                                    .data : 0xffffff8009460000 - 0xffffff80095ec008   (  1585 KB)
                                  vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
                                            0xffffffbdc0008000 - 0xffffffbdc3e00000   (    61 MB actual)
                                  fixed   : 0xffffffbffe7fb000 - 0xffffffbffec00000   (  4116 KB)
                                  PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000   (    16 MB)
                                  memory  : 0xffffffc000200000 - 0xffffffc0f8000000   (  3966 MB)

I might be waay off base here, but the values in the dump in X21 - X27 look pretty suspicious, especially the dead values. According to the memory layout table those memory locations are all in physical memory.
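For what it’s worth, as far as I know the "dead" words are the kernel’s own debug constants rather than signs of corruption. 0xdead4ead is SPINLOCK_MAGIC, which is written into spinlocks when lock debugging is enabled. 0xdead000000000100 and 0xdead000000000200 are the list-poison values that older arm64 kernels store in list pointers once an entry has been removed. Here is a minimal Python sketch that labels them in rows copied from the dump above. The function name annotate_row is only for illustration, and it assumes each row is an offset followed by eight 32-bit words printed low half first.

```python
# Debug constants from the Linux kernel sources; seeing them is normal.
SPINLOCK_MAGIC = 0xDEAD4EAD          # spinlock debugging magic
LIST_POISON1 = 0xDEAD000000000100    # list "next" pointer after deletion (arm64)
LIST_POISON2 = 0xDEAD000000000200    # list "prev" pointer after deletion (older kernels)

QUAD_LABELS = {LIST_POISON1: "LIST_POISON1", LIST_POISON2: "LIST_POISON2"}


def annotate_row(row):
    """Return (offset, [(word_index, label), ...]) for one hex-dump row."""
    offset, *words = row.split()
    values = [int(w, 16) for w in words]
    labels = []
    # 32-bit matches: the spinlock magic is a single word.
    for i, value in enumerate(values):
        if value == SPINLOCK_MAGIC:
            labels.append((i, "SPINLOCK_MAGIC"))
    # 64-bit matches: pair up words (low half first) to rebuild pointers.
    for i in range(0, len(values) - 1, 2):
        quad = values[i] | (values[i + 1] << 32)
        if quad in QUAD_LABELS:
            labels.append((i, QUAD_LABELS[quad]))
    return offset, sorted(labels)


# Two rows copied from the X22 section of the dump.
ROWS = [
    "bec0  00000100 dead0000 00000200 dead0000 00000100 dead0000 00000200 dead0000",
    "bf00  00000001 00000000 00000000 dead4ead ffffffff 00000000 ffffffff ffffffff",
]

for row in ROWS:
    print(annotate_row(row))
```

If that reading is right, the dead values on their own don’t say much about the memory modules. The faulting address is the more interesting part.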

The first line of the errors says the system was unable to handle a request at ffffff00c17dda98. I’m not sure if this is a clue or not, but that address falls outside every range listed in the table above. (Maybe?? I’m out of my depth here, and the next line, pgd = ffffffc0d996a000, shows an in-range address.)
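The range check described above can be done mechanically. This is only a sketch: the region bounds are copied from the memory layout table, the .init/.text/.rodata/.data rows are left out because they sit inside the vmalloc range, and the function name classify is made up for the example.

```python
# Regions from the "Virtual kernel memory layout" boot message.
# Each entry is (name, start, end) with the end treated as exclusive.
LAYOUT = [
    ("modules", 0xFFFFFF8000000000, 0xFFFFFF8008000000),
    ("vmalloc", 0xFFFFFF8008000000, 0xFFFFFFBDBFFF0000),
    ("vmemmap", 0xFFFFFFBDC0000000, 0xFFFFFFBFC0000000),
    ("fixed", 0xFFFFFFBFFE7FB000, 0xFFFFFFBFFEC00000),
    ("PCI I/O", 0xFFFFFFBFFEE00000, 0xFFFFFFBFFFE00000),
    ("memory", 0xFFFFFFC000200000, 0xFFFFFFC0F8000000),
]


def classify(addr):
    """Return the name of the region containing addr, or None."""
    for name, start, end in LAYOUT:
        if start <= addr < end:
            return name
    return None


# Addresses quoted in the post and in the register dump.
ADDRESSES = {
    "fault address": 0xFFFFFF00C17DDA98,
    "pgd": 0xFFFFFFC0D996A000,
    "X21": 0xFFFFFFC0ECB87F80,
    "X22": 0xFFFFFFC0BB60BEC0,
    "X25": 0xFFFFFFC0D9BA6580,
    "X26": 0xFFFFFFC0D7A01E50,
    "X27": 0xFFFFFFC0D7A01E10,
}

for label, addr in ADDRESSES.items():
    region = classify(addr) or "outside every listed region"
    print(f"{label:14s} {addr:#018x} -> {region}")
```

With these bounds the register values and the pgd land in the "memory" range, while the fault address sits below the start of the modules range, which matches the reading above.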

Another big clue is that I did not get any crashes when running a test on just the bare board (i.e., without the SATA HAT installed). I had still installed the SATA HAT software from Radxa, but just didn’t have the hardware attached. When I ran the identical test (i.e., bonnie++ on the eMMC) with the SATA HAT installed I got another lockup.

One more thing to note is that these failures happen pretty randomly. In other words, it’s not the case that every time I run a test it fails exactly 3 minutes later. Sometimes the failure happened almost immediately and sometimes it could take up to an hour or 2 for the system to lock up.

Hi All,

I’ve been doing lots and lots of tests to try to home in on what is causing these crashes. I’ve tried to test each component of the SATA HAT stack as independently as I could, but, as you’ll see, the results are not totally conclusive. For a while I also suspected that the HAT-mounted 12V power socket was somehow faulty or that I had a bad 12V 60W power supply, so that’s why there are some different power source tests as well. Here are the results, such as they are:

Tests that NEVER failed

  • Bare board, writing to eMMC, USB-c 96W power
  • HAT with no HDD, writing to eMMC, USB-c 96W power
  • HAT with HDD attached, no fan/oled, writing to eMMC, 12V 60W power

Tests that almost never failed

  • HAT with HDD and fan, writing to HDD (zfs), USB-c 96W power

Tests that failed

  • HAT with HDD and fan, writing to eMMC, 12V 60W power
  • HAT with HDD and fan, writing to HDD (zfs), 12V 60W power
  • HAT with no HDD, writing to eMMC, 12V 60W power
  • HAT with HDD, no fan/oled, writing to HDD (zfs), 12V 60W power
  • HAT with HDD and fan, writing to eMMC, 12V 60W power
  • HAT with HDD and fan, writing to HDD (zfs), 12V 120W power

I had a question about that last one. When I was suspicious of the 60W power supply I was using I went and bought another 12V power supply and figured that more juice was better, but maybe 120W was too much? Is this possible? I haven’t yet tried a different 60W adapter or a 12V 90W adapter.

In any case, at the end of the day it’s not clear to me that there’s a single thing in common with any of the failed or passed tests. This fact and the frustratingly random nature of the failures seem to point clearly to some sort of hardware issue, but I’m not sure what. If anyone can think of other things to look into I would be more than happy to test them, but at this point I’m running out of ideas.
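One way to look for a common factor is to tally the outcomes per factor. The sketch below transcribes the three lists above into a table and counts outcomes by power source and by write target. The fan/oled column is left out because it isn’t stated for every test, one failed row appears twice in the list and is kept twice, and the names TESTS and tally are made up for the example.

```python
from collections import Counter

FIELDS = ("hat", "hdd", "target", "power")

# (HAT attached, HDD attached, write target, power source, outcome)
TESTS = [
    (False, False, "eMMC", "USB-C 96W", "never failed"),
    (True, False, "eMMC", "USB-C 96W", "never failed"),
    (True, True, "eMMC", "12V 60W", "never failed"),
    (True, True, "zfs", "USB-C 96W", "almost never failed"),
    (True, True, "eMMC", "12V 60W", "failed"),
    (True, True, "zfs", "12V 60W", "failed"),
    (True, False, "eMMC", "12V 60W", "failed"),
    (True, True, "zfs", "12V 60W", "failed"),
    (True, True, "eMMC", "12V 60W", "failed"),  # listed twice in the post
    (True, True, "zfs", "12V 120W", "failed"),
]


def tally(field):
    """Count outcomes for each value of one factor."""
    idx = FIELDS.index(field)
    counts = {}
    for test in TESTS:
        counts.setdefault(test[idx], Counter())[test[-1]] += 1
    return counts


for field in ("power", "target"):
    print(field)
    for value, outcomes in tally(field).items():
        print(f"  {value}: {dict(outcomes)}")
```

On this small sample every outright failure used the 12V socket, but one 12V run also passed and there are only a handful of USB-C runs, so it is a hint at best and not a conclusion.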

Thanks!

Hey JJJ, sorry, no good ideas on my side, just wanted to give some kudos. I’m struggling with a similar problem (random freezes over time) with my Rock Pi 4B+ with the Penta SATA kit, and at least now I have some clues about where and how to look.

Was planning on booting up memtest86 and do some runs but turns out it isn’t as easy as i expected it to be (or i’m doing something wrong).

This is very strange. 120W should be no problem. We will check for possible causes on the hardware side. It seems that powering the ROCK Pi 4 from the SATA HAT may not provide stable enough power. That’s my current guess; we need to run some tests here too.

Hi, JJJ

First of all, I would like to check whether the SATA HAT is failing to provide enough 5V current for the Pi 4 and your HDDs, since the HDDs draw from the 5V rail as well.

You can test with the following:
Remove two or three of the HDDs, then check whether it passes the burn-in test. 3.5-inch HDDs need 12V as well as 5V.

You’re powering an SBC, not a poorly designed soldering iron. A PSU capable of more current won’t hurt; it can only help. Go ahead and try a better PSU.

Hi All,

A lot more tests done over here. At samtu’s suggestion I tested my setup with 4, 3, 2, and 1 HDD, each with three different power sources: the 12V 60W HAT plug, a 96W USB-c, and a 65W USB-c. That’s 12 different tests! In all cases I used ZFS to construct a file system from the disks (raidz except for the 1 disk test). The actual test was the same as above: use bonnie++ to hammer away on the disks while watching with journalctl -f.
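To make the "12 different tests" arithmetic explicit, here is a small sketch that enumerates the combinations described above. The names DISK_COUNTS, POWER_SOURCES and pool_layout are invented for the example; the values come straight from the post.

```python
from itertools import product

DISK_COUNTS = (4, 3, 2, 1)
POWER_SOURCES = ("12V 60W HAT plug", "96W USB-C", "65W USB-C")


def pool_layout(disks):
    """raidz for multi-disk pools, a plain pool for the single-disk case."""
    return "raidz" if disks > 1 else "single disk"


# Every disk count paired with every power source.
MATRIX = [
    (disks, power, pool_layout(disks))
    for disks, power in product(DISK_COUNTS, POWER_SOURCES)
]

for disks, power, layout in MATRIX:
    print(f"{disks} HDD ({layout}) on {power}")
print(len(MATRIX), "combinations")
```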

Unfortunately, ALL of these tests failed. Some quickly and some slowly, but all in the same way (as seen above).

Just to rule out any ZFS weirdness I tried an additional test with 4 HDDs, 96W USB-c power, and a simple RAID-5 with mdadm. The system wasn’t even able to finish building the array before it experienced the same failure.

So… the frustrations continue. I still suspect a hardware issue and I don’t think we can rule out some sort of power issue, but I’m remembering those memtester failures that kicked this whole thing off. A while back I tried to get memtest86 to load on the board, but, as cmru has found, this doesn’t seem to be possible?

But back to power. I had an idea that I want to run by y’all before I give it a try. Would I be endangering any components if I unplugged the HAT from the GPIO pins (leaving the M2 ribbon cable attached) and then powered the HAT with the 12V 60W plug and the Pi with USB-c? Would this even work? I’m assuming the GPIO connection is only for power and controlling the fan/oled, but I might be wrong about that.

Thanks

Hi All,

It’s been a while, but I’ve been testing furiously and have returned with some good news and some bad news.

As hinted above, my tests involved powering the whole thing via both the 12V plug and USB-c (with a 96W adapter). To be super safe I disconnected the HAT from the GPIO pins during these tests. Not sure if this was necessary, but I didn’t want to take any chances. To make a long story short, all my tests still failed. This is the bad news.

In parallel to this a new RockPi4B+ with 4G of RAM showed up so I decided to test it as well. As before, the first thing I did after installing an OS (Armbian Focal 21.05.1) was to run memtester. And what do you know, I got NO ERRORS! This got me very excited and I integrated this 4B+ into my SATA HAT setup. Well, here I am over a week later and the system is running perfectly! I ran my bonnie++ tests non-stop for over 24 hours with not even a hint of a failure and have been using the system ever since.

So, to summarize, I was able to get a working system, but I had to buy 3 different RockPi’s to do it. The only thing that all failing tests had in common is that they were run on systems that had also failed memtester tests.

Hi, @JJJ

So you mean, it turns out that it’s the ROCK Pi 4 itself that has the memory issue?

Hi, @jack

I don’t want to say anything definitively, but I think all evidence I’ve collected does make that the most likely scenario (at least in my mind).

Something I didn’t say above was that I repeated a small subset of all the tests I’ve done with the new 4B+ and none of them failed. So across all the tests I did, the ONLY thing that successfully predicted the failure of a test was which core ROCK Pi 4 board I was using. The A and the B failed and the B+ didn’t.

There was only one test I did that both had the same success/failure segregation and operated only on the core board: memtester. This is why I say my current best guess is something wrong with the core boards themselves, and specifically something with the memory.

I don’t have anywhere near enough information to guess if this is an issue with my specific A and B models or something inherent to the design in general. My guess is the former because otherwise this forum would be flooded with people having the same issue as me.
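The reasoning in this post can be written down as a tiny table: one row per board, with the memtester result and the stress-test result taken from the thread. This is only a sketch with made-up field names, and three boards is far too few to prove anything, but it shows the memtester result lining up with the crashes.

```python
# One row per board, as reported in the thread.
BOARDS = [
    {"board": "ROCK Pi 4B (first board)", "memtester_clean": False, "stress_stable": False},
    {"board": "ROCK Pi 4A", "memtester_clean": False, "stress_stable": False},
    {"board": "ROCK Pi 4B+", "memtester_clean": True, "stress_stable": True},
]


def predictor_matches(boards, predictor, outcome):
    """True if the predictor column agrees with the outcome on every board."""
    return all(b[predictor] == b[outcome] for b in boards)


for b in BOARDS:
    print(f'{b["board"]}: memtester clean={b["memtester_clean"]}, '
          f'stable under bonnie++={b["stress_stable"]}')

print("memtester predicts stability on all boards:",
      predictor_matches(BOARDS, "memtester_clean", "stress_stable"))
```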