ROCK 5B Debug Party Invitation

Yeah, but until now I haven't connected any PCIe device to either of the M.2 slots. So I really need to get a good SSD to test with, since judging by reports on the ODROID forum, idle consumption rises to insane levels once an NVMe SSD is slapped into the M.2 slot with RK's BSP kernel (which should be 4.19 over there; unfortunately users only sometimes post the relevant details).

Could be an indication that there are driver/settings issues with NVMe's APST (Autonomous Power State Transition) besides ASPM.

If I can manage to set aside some time this weekend, I could run some tests with ASPM. I have M.2-to-PCIe adapters and an M.2 WiFi card, which should be sufficient.
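A minimal sketch of such a test on the Linux side, assuming the kernel exposes the usual ASPM policy knob and nvme-cli is installed (device names like /dev/nvme0 are placeholders):

# show and switch the global ASPM policy (as root)
cat /sys/module/pcie_aspm/parameters/policy
echo powersave > /sys/module/pcie_aspm/parameters/policy   # or: performance

# check whether the NVMe drive advertises and uses APST (nvme-cli)
nvme id-ctrl /dev/nvme0 | grep -i apsta
nvme get-feature /dev/nvme0 -f 0x0c -H   # feature 0x0c = Autonomous Power State Transition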

Testing LVGL on Rock 5B.

Dependencies:

  • SDL2 with HW accel (build and install my version, or the latest SDL2 might work better)
  • cmake up to date (a quick version check is shown after this list)
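A quick way to confirm the installed versions before building, assuming sdl2-config and pkg-config are available:

sdl2-config --version
pkg-config --modversion sdl2
cmake --version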

Build demo, recipe:

mkdir -p lvgl
cd lvgl
git clone --recursive https://github.com/littlevgl/pc_simulator.git
cd pc_simulator
mkdir -p build
cd build
cmake ..
make -j8

Before you build, you should enable, disable or change some settings:

diff --git a/lv_conf.h b/lv_conf.h
index 0b9a6dc..7cf0612 100644
--- a/lv_conf.h
+++ b/lv_conf.h
@@ -49,7 +49,7 @@
 #define LV_MEM_CUSTOM 0
 #if LV_MEM_CUSTOM == 0
     /*Size of the memory available for `lv_mem_alloc()` in bytes (>= 2kB)*/
-    #define LV_MEM_SIZE (128 * 1024U)          /*[bytes]*/
+    #define LV_MEM_SIZE (896 * 1024U)          /*[bytes]*/
 
     /*Set an address for the memory pool instead of allocating it as a normal array. Can be in external SRAM too.*/
     #define LV_MEM_ADR 0     /*0: unused*/
@@ -151,7 +151,7 @@
 
 /*Maximum buffer size to allocate for rotation.
  *Only used if software rotation is enabled in the display driver.*/
-#define LV_DISP_ROT_MAX_BUF (32*1024)
+#define LV_DISP_ROT_MAX_BUF (64*1024)
 
 /*-------------
  * GPU
@@ -184,7 +184,7 @@
 #if LV_USE_GPU_SDL
     #define LV_GPU_SDL_INCLUDE_PATH <SDL2/SDL.h>
     /*Texture cache size, 8MB by default*/
-    #define LV_GPU_SDL_LRU_SIZE (1024 * 1024 * 8)
+    #define LV_GPU_SDL_LRU_SIZE (1024 * 1024 * 64)
     /*Custom blend mode for mask drawing, disable if you need to link with older SDL2 lib*/
     #define LV_GPU_SDL_CUSTOM_BLEND_MODE (SDL_VERSION_ATLEAST(2, 0, 6))
 #endif
diff --git a/lv_drivers b/lv_drivers
--- a/lv_drivers
+++ b/lv_drivers
@@ -1 +1 @@
-Subproject commit 1bd4368e71df5cafd68d1ad0a37ce0f92b8f6b88
+Subproject commit 1bd4368e71df5cafd68d1ad0a37ce0f92b8f6b88-dirty
diff --git a/lv_drv_conf.h b/lv_drv_conf.h
index 4f6a4e2..b40db57 100644
--- a/lv_drv_conf.h
+++ b/lv_drv_conf.h
@@ -95,8 +95,8 @@
 #endif
 
 #if USE_SDL || USE_SDL_GPU
-#  define SDL_HOR_RES     480
-#  define SDL_VER_RES     320
+#  define SDL_HOR_RES     1920
+#  define SDL_VER_RES     1080
 
 /* Scale window by this factor (useful when simulating small screens) */
 #  define SDL_ZOOM        1

Tested with HDMI at 1920x1080 and with debug info. There is also a Wayland driver that may be a lot faster, but as I don't have Wayland I have not tested it.
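Independently of that lv_drivers Wayland backend, the SDL backend itself can be pointed at a specific display system via SDL's standard SDL_VIDEODRIVER environment variable; the ./demo binary name below is only a placeholder for whatever the build produced:

SDL_VIDEODRIVER=x11 ./demo       # placeholder binary name
SDL_VIDEODRIVER=wayland ./demo   # only works if SDL2 was built with Wayland support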

Screenshots:

Panfrost now almost works well enough that you might want to use it:

=======================================================
    glmark2 2021.12
=======================================================
    OpenGL Information
    GL_VENDOR:      Panfrost
    GL_RENDERER:    Mali-G610 (Panfrost)
    GL_VERSION:     OpenGL ES 3.1 Mesa 22.3.0-devel (git-7fce4e1bfd)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 481 FrameTime: 2.079 ms
[build] use-vbo=true: FPS: 453 FrameTime: 2.208 ms
[texture] texture-filter=nearest: FPS: 444 FrameTime: 2.252 ms
[texture] texture-filter=linear: FPS: 431 FrameTime: 2.320 ms
[texture] texture-filter=mipmap: FPS: 441 FrameTime: 2.268 ms
[shading] shading=gouraud: FPS: 423 FrameTime: 2.364 ms
[shading] shading=blinn-phong-inf: FPS: 8 FrameTime: 125.000 ms
[shading] shading=phong: FPS: 432 FrameTime: 2.315 ms
[shading] shading=cel: FPS: 427 FrameTime: 2.342 ms
[bump] bump-render=high-poly: FPS: 131 FrameTime: 7.634 ms
[bump] bump-render=normals: FPS: 449 FrameTime: 2.227 ms
[bump] bump-render=height: FPS: 8 FrameTime: 125.000 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 453 FrameTime: 2.208 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 411 FrameTime: 2.433 ms
[pulsar] light=false:quads=5:texture=false: FPS: 484 FrameTime: 2.066 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 0 FrameTime: inf ms
[desktop] effect=shadow:windows=4: FPS: 151 FrameTime: 6.623 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 228 FrameTime: 4.386 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 217 FrameTime: 4.608 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 174 FrameTime: 5.747 ms
[ideas] speed=duration: FPS: 162 FrameTime: 6.173 ms
[jellyfish] <default>: FPS: 8 FrameTime: 125.000 ms
[terrain] <default>: FPS: 0 FrameTime: inf ms
[shadow] <default>: FPS: 3 FrameTime: 333.333 ms
[refract] <default>: FPS: 0 FrameTime: inf ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 402 FrameTime: 2.488 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 8 FrameTime: 125.000 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 447 FrameTime: 2.237 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 431 FrameTime: 2.320 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 420 FrameTime: 2.381 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 434 FrameTime: 2.304 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 436 FrameTime: 2.294 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 423 FrameTime: 2.364 ms
=======================================================
                                  glmark2 Score: 285 
=======================================================

The current code is in the csf branch of https://gitlab.com/panfork/mesa/.
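For anyone who wants to try it, that branch should build with the usual Mesa meson flow; a sketch, assuming meson, ninja and the Mesa build dependencies are installed (the install prefix is just an example):

git clone -b csf https://gitlab.com/panfork/mesa.git
cd mesa
meson setup build -Dgallium-drivers=panfrost -Dvulkan-drivers= -Dprefix=/opt/panfork
ninja -C build install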

Great work! Besides, is there any score comparison with the ARM blob driver?

There's not much point in doing that yet… currently I wait for the GPU to power off between frames, so there's an overhead of a few milliseconds per frame and benchmarks will always be a lot worse. I think the blob is at least five times faster for things like glmark at the moment.

In terms of stability, it’s getting a lot better, and just a few minutes ago I fixed a bug with importing shared buffers, so now Weston, Sway and Mutter all work, and even Xwayland with acceleration can work to a degree, which the blob can’t do.

Also recently fixed is support for using the blob for clients when the compositor is using Panfrost, for non-wlroots compositors. This means that you can have (or will have, once I fix a few more issues) all the performance of the blob for Wayland GLES applications, while X11 can still be accelerated!

A few horrendous hacks later, and SuperTuxKart now runs, even with the fancy GLES3 renderer!

Which driver will win the race?

(Answer: They both run badly, because the real bottleneck is Weston’s screen capture. Maybe Sway+wf-recorder would work better, except that the blob doesn’t work with it unless I do some patching.)

Does this work with the mainline kernel, or do we still need the Rockchip BSP?

@icecream95 Do you have a kernel repo somewhere so everybody can try? :joy::rofl:

Any timeline for mainlining the mesa and kernel parts? :joy::rofl:

It uses kbase, as panfrost.ko doesn’t yet have any code for firmware loading and the like. So the driver will not work with mainline. Running an out-of-tree kbase module on top of mainline might work, but then you have to work out how to fix the power management bits to get the GPU to actually power on.

That’s unfortunate, as I have found kbase to be a very broken kernel driver, at least when misused as I do… stack corruption, NULL dereferences, undefined instruction faults, spurious MMU faults causing CPU lockup, accessing user memory when it shouldn’t, etc. etc. etc. It really needs a rewrite in Rust!
I think at least some of the bugs were added by Rockchip, though.

I have made no changes to the kernel that are actually required; the “5.10” BSP kernel should be good enough.

It will probably be a long while before anything is upstreamed; don't hold your breath.

Actually, it appears that rockchip-drm is causing problems at least as often as kbase… I wonder why my kernel is so unreliable; there shouldn't be too many changes from the Radxa repository.

For example, here’s a bug message I got without using the GPU at all:

[  721.336988] BUG: Bad page map in process InputThread  pte:6800010038044b pmd:10037d003
[  721.337040] page:0000000081d76052 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100380
[  721.337047] head:0000000081d76052 order:1 compound_mapcount:0
[  721.337056] flags: 0x800000000001020a(referenced|dirty|slab|head)
[  721.337067] raw: 800000000001020a dead000000000100 dead000000000122 ffffff83fc509d00
[  721.337074] raw: 0000000000000000 0000000000200020 00000001fffffffe 0000000000000000
[  721.337080] page dumped because: bad pte
[  721.337086] addr:0000007f9254e000 vm_flags:140440fb anon_vma:0000000000000000 mapping:ffffff8103f4ae28 index:49e
[  721.337111] file:card0 fault:0x0 mmap:rockchip_gem_mmap readpage:0x0

Hi Thomas,

I could finally run a quick test with/without ASPM here. So I've plugged this WiFi card into it:

0002:21:00.0 Network controller [0280]: Intel Corporation Wireless 3160 [8086:08b4] (rev 93)

The whole board consumes 1.735W with aspm/policy=powersave and 2.062W with performance, a 328mW difference.

Without the WiFi card (thus with the Realtek NIC only), I'm seeing 1.605W in powersave mode vs 1.760W in performance mode, a 155mW difference. This means the WiFi card in the M.2 slot was responsible for an extra 173mW of that difference (+130mW in powersave, +302mW in performance mode).

I don't know whether the savings are on the controller side, the device side, or both. I suspect it's both, though, not least because powersave could allow the PCIe clock to be stopped.
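For anyone repeating the measurement, the ASPM states supported and currently enabled on the card can be read back with lspci (0002:21:00.0 is the WiFi card from the listing above; run as root for the full dump):

lspci -vv -s 0002:21:00.0 | grep -iE 'LnkCap|LnkCtl'   # LnkCap lists supported ASPM states, LnkCtl the ones enabled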

I found a flaw.
System:
root@rock-5b:/home/rock# uname -a
Linux rock-5b 5.10.66-22-rockchip-g882edb720d40 #rockchip SMP Sat Sep 17 11:11:07 UTC 2022 aarch64 GNU/Linux
root@rock-5b:/home/rock#
When I use reboot, the system cannot restart properly; I have to cut the power manually.

@Stephen

Can you check this issue?

With the reboot command the green LED stays lit, and with the halt command the green LED also stays lit.

Which image are you using? I'll try to reproduce it.

Linux rock-5b 5.10.66-24-rockchip-gcb09ad15af75 #rockchip SMP Fri Sep 23 03:44:14 UTC 2022 aarch64 GNU/Linux,
rock-5b-debian-bullseye-xfce4-arm64-20220919-0912-gpt.img

After I flashed the system to NVMe and switched to a Lenovo laptop power supply, it works normally now. It might have been a problem with the Huawei phone charger.

Can some people using Radxa Debian go to https://webglsamples.org/persistence/persistence.html in Chromium and drag the persistence slider at the top all the way to the right?

What happens?

  • Hangs or reboots
  • Works fine

(I'm interested in the default configuration of XFCE with GPU-accelerated Chromium using libmali. “Hang” refers to a system hang; the webpage is supposed to stop visually updating when persistence reaches 1.0. If it is still working fine at that point, try dragging the slider back down to see if the system hangs then.)