Awesome work!
I’ve seen some dts files that overclock the rk3588 to higher values, but probably none as complete as this one.
Hopefully we will get optimized values in a future release.
Earlier @tkaiser was able to get the best from this SoC, maybe he can share his methods?
16% more GPU performance for Panthor
I have the values below. How do I know at which frequency the GPU is running?
root@rock5b:/home/rock# cat /sys/kernel/debug/clk/clk_summary | grep gpu
scmi_clk_gpu 1 1 0 1000000000 0 0 50000 Y
clk_gpu_pvtm 0 0 0 24000000 0 0 50000 N
clk_gpu_src 3 3 0 198000000 0 0 50000 Y
clk_core_gpu_pvtm 0 0 0 198000000 0 0 50000 N
clk_gpu_stacks 1 3 0 198000000 0 0 50000 Y
clk_gpu_coregroup 1 3 0 198000000 0 0 50000 Y
clk_gpu 1 3 0 198000000 0 0 50000 Y
rock@rock5b:~$ glmark2-es2-wayland -b terrain
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
glmark2 2021.02
=======================================================
OpenGL Information
GL_VENDOR: ARM
GL_RENDERER: Mali-LODX
GL_VERSION: OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[terrain] <default>: FPS: 319 FrameTime: 3.135 ms
=======================================================
glmark2 Score: 319
=======================================================
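As a rough aid for the question of which frequency the GPU clock tree reports, here is a minimal Python sketch that parses clk_summary-style lines such as the ones quoted above. The column layout (name, enable count, prepare count, protect count, rate, accuracy, phase, duty, hardware enable) is an assumption based on the output shown in this thread and differs between kernel versions. The function name parse_clk_summary is hypothetical. It only shows what the kernel reports, which may not be the frequency the GPU actually runs at.

```python
# Sample lines copied from the clk_summary output quoted in this thread.
SAMPLE = """\
scmi_clk_gpu 1 1 0 1000000000 0 0 50000 Y
clk_gpu_pvtm 0 0 0 24000000 0 0 50000 N
clk_gpu_src 3 3 0 198000000 0 0 50000 Y
clk_gpu 1 3 0 198000000 0 0 50000 Y
"""


def parse_clk_summary(text):
    """Return {clock_name: (rate_hz, hw_enabled)} for each parsable line.

    Assumed column layout (may differ on other kernels):
    name, enable_cnt, prepare_cnt, protect_cnt, rate, accuracy, phase,
    duty, hardware-enable flag.
    """
    clocks = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators and anything that does not fit the layout.
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        clocks[fields[0]] = (int(fields[4]), fields[8] == "Y")
    return clocks


clocks = parse_clk_summary(SAMPLE)
for name, (rate_hz, enabled) in clocks.items():
    print(f"{name:20s} {rate_hz / 1e6:8.1f} MHz  enabled={enabled}")
```

On a real board the text would come from /sys/kernel/debug/clk/clk_summary, read as root, instead of the embedded sample.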
Nice work! Just out of curiosity, have you checked whether the power draw increases a bit by bumping the PLL to 4 GHz? I think it should be negligible but possibly observable.
I did not check it, but just bumping the PLL shouldn't increase the consumption; it is just a free-running clock. The consumption comes from the core that is attached to that clock through its own driver. In our case the GPU will use a divider of 4 to get 1 GHz.
If you are concerned about 4 GHz, you might as well set it to 1 GHz; it will have the same effect.
That is actually interesting: you are running the blob driver.
When I dump the GPU clock with the blob driver I get the following:
[alarm@alarm mmm]$ sudo python mmm.py get -c rk3588 -d CRU -r GPU_CLKSEL
-c rk3588 -d CRU -r GPU_CLKSEL -p div = 5, (default=0), (values=[0~31])
-c rk3588 -d CRU -r GPU_CLKSEL -p sel = GPLL, (default=GPLL), (values=GPLL,CPLL,AUPLL,NPLL,SPLL)
-c rk3588 -d CRU -r GPU_CLKSEL -p testout_div = 31, (default=0), (values=[0~31])
-c rk3588 -d CRU -r GPU_CLKSEL -p testout_mux = PLL, (default=PLL), (values=PLL,PVTM)
-c rk3588 -d CRU -r GPU_CLKSEL -p mux = PLL, (default=PLL), (values=PLL,PVTM)
-c rk3588 -d CRU -r GPU_CLKSEL -p reserved = 0, (default=0)
-c rk3588 -d CRU -r GPU_CLKSEL -p clock = 198 Mhz
[alarm@alarm mmm]$ sudo python mmm.py get -c rk3588 -d CRU -p clock
-c rk3588 -d CRU -r V0PLL_CON0 -p clock = 1188 Mhz
-c rk3588 -d CRU -r AUPLL_CON0 -p clock = 786 Mhz
-c rk3588 -d CRU -r CPLL_CON0 -p clock = 1500 Mhz
-c rk3588 -d CRU -r GPLL_CON0 -p clock = 1188 Mhz
-c rk3588 -d CRU -r NPLL_CON0 -p clock = 850 Mhz
-c rk3588 -d CRU -r GPU_CLKSEL -p clock = 198 Mhz
So the GPU clock is set to use the PLL (not PVTM), the source is GPLL, and the divider is 5+1=6, so the clock is 1188/6 = 198 MHz.
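The arithmetic above can be written down as a small Python sketch. It assumes, as the post does, that the div field stores the divider minus one. The function name divided_clock_mhz is hypothetical and not part of mmm.

```python
def divided_clock_mhz(pll_mhz, div_field):
    """Clock after an integer divider whose register field holds (divider - 1)."""
    return pll_mhz / (div_field + 1)


# Values from the register dump above: GPLL at 1188 MHz, div = 5.
gpu_mhz = divided_clock_mhz(1188, 5)
print(gpu_mhz)  # 198.0
```

Under the same assumption, a 4 GHz PLL with a divider of 4 (field value 3) would give 1000 MHz, which matches the 1 GHz target mentioned earlier in the thread.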
But I cannot make sense of the glmark results. They are too high for 200 MHz.
My only theory is that the blob driver uses SMCCC to set the clocks, so the requested clocks are set by BL31. BL31 runs at a different exception level than the normal kernel, so the SoC might somehow have a different I/O base address for the BL31 part. What the normal registers report would then not be valid. In any case, a weird situation…
That's a very valid question, but a very hard one to answer.
It’s really not a concern, mostly a matter of curiosity. PLLs are free-running clocks controlled on their phase after the divide and at such frequencies they can usually draw a few milliamps. Thanks!
Just thinking, I’ve used opengl a tiny little bit several years ago and found that it was apparently possible to port generic code there, but the communication latency with the host was horrible for me (I really don’t know the right way to do things). Maybe it would be feasible to simply port the mhz utility to the GPU for this, if we find a way to accurately measure the processing time.
That should somehow be possible with the PMU (performance monitoring unit) of the GPU, but I assume the code will not be very portable.
I applied your proposed tuning, but there are no changes in the case of the Mali blob.
root@rock5b:/home/rock# cat /sys/kernel/debug/clk/clk_summary | grep gpu
scmi_clk_gpu 1 1 0 1000000000 0 0 50000 Y
clk_gpu_pvtm 0 0 0 24000000 0 0 50000 N
clk_gpu_src 3 3 0 666666667 0 0 50000 Y
clk_core_gpu_pvtm 0 0 0 666666667 0 0 50000 N
clk_gpu_stacks 1 3 0 666666667 0 0 50000 Y
clk_gpu_coregroup 1 3 0 666666667 0 0 50000 Y
clk_gpu 1 3 0 666666667 0 0 50000 Y
root@rock5b:/home/rock#
rock@rock5b:~$ glmark2-es2-wayland -b terrain
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
glmark2 2021.02
=======================================================
OpenGL Information
GL_VENDOR: ARM
GL_RENDERER: Mali-LODX
GL_VERSION: OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[terrain] <default>: FPS: 320 FrameTime: 3.125 ms
=======================================================
glmark2 Score: 320
=======================================================
rock@rock5b:~$
@willy
Unfortunately, porting your MHz to a GPU is beyond my knowledge, but I can run it here if someone does.
Yes, the Mali blob won't take advantage of it, because I think it is using PVTPLL. I am also checking this in detail; maybe I should update the title accordingly to be more precise.
It’s beyond my knowledge as well. But it makes an interesting project I should consider. I have no idea where to start to run code on the GPU there; I’m totally ignorant of these things.
So, I would like to clarify more about what I have learned about the clock adventures of rk35xx.
In the previous post I mentioned that the GPU takes its PLL frequency from one of CPLL, GPLL, AUPLL, V0PLL, SPLL. This is not complete. For small cores like i2c, spi, pcie it is even correct, but for bigger cores like CPU, GPU, NPU etc. there is another PLL source called PVTPLL.
PVTPLLs are dedicated to a core and not shared across different cores; sometimes there are even multiple PVTPLLs for a single core (e.g. the CPU has different PVTPLLs for the little cores and for each big-core cluster).
Unlike normal PLLs, PVTPLLs are meant to be dynamically configured, with a twist.
A PVTPLL gives the best possible frequency output for a given voltage, temperature, and chip quality.
E.g. you request 1 GHz from a PVTPLL, set the voltage to your target voltage, and start monitoring the PVTPLL circuit. The PVTPLL runs a very small hardware benchmark circuit called a ring oscillator and locks the frequency output to the maximum possible. It can be 999 MHz, 950 MHz, or 1 GHz. The core then gets this clock and uses it.
Now comes the complicated part. This is my understanding, and someone may correct me if I am wrong, but the PVTPLL is not directly configured by the kernel. Instead it is configured by BL31.
The kernel uses an interface called SMCCC to communicate with BL31 and request the frequency. BL31 sets the PVTPLL and configures the core. This whole communication between BL31 and the kernel is sometimes referred to as firmware or scmi. There are also other ways to communicate than smccc, but on our rk3588 it is smccc.
So the initial problem of the GPU clocks not reaching 1 GHz is that the GPU was using normal PLLs rather than PVTPLL with the Panthor driver. Even though the scmi clock of the GPU is defined in the GPU block of the mainline DTS, it looks to me like devfreq is not taking care of it. I think something needs to be done about this in mainline. When the issue in mainline is resolved, I can hopefully also backport this to the bsp.
When it comes to the Mali blob driver, it is actually using PVTPLL as a source and can successfully set the frequency to the desired 1 GHz. However, there still seems to be a problem. When you request a frequency from BL31 with PVTPLL, it reports the requested frequency as the set frequency, not what the PVTPLL provides. You can see that from the reference TF-A implementation for rk3588.
So how do we know what the actual frequency is?
When I probe the GPU_GRF register with the mmm tool, I directly get a kernel crash. I interpret this as some kind of security mechanism, since direct access to those registers from the kernel or mmapped userspace is not allowed (theory). So my approach now is to use pysmccc to probe BL31, but use the sip_smc_secure_reg_read and sip_smc_secure_reg_write callbacks to probe the GPU_GRF registers. Normally those callbacks are meant to access OTP registers, but it is worth giving it a shot. I also don't know if they are even implemented in BL31.
Awesome work!
I’ve got a great result: an excellent score in glmark2-wayland. Now 3093 with the commit and the performance governor; it was around 28XX before.
rico [ ~ ]$ glmark2-wayland
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: Mesa
GL_RENDERER: Mali-G610 (Panfrost)
GL_VERSION: 3.1 Mesa 25.0.0-devel (git-7d41cfa1a9)
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 3772 FrameTime: 0.265 ms
[build] use-vbo=true: FPS: 4143 FrameTime: 0.241 ms
[texture] texture-filter=nearest: FPS: 5000 FrameTime: 0.200 ms
[texture] texture-filter=linear: FPS: 5014 FrameTime: 0.199 ms
[texture] texture-filter=mipmap: FPS: 4979 FrameTime: 0.201 ms
[shading] shading=gouraud: FPS: 3353 FrameTime: 0.298 ms
[shading] shading=blinn-phong-inf: FPS: 3294 FrameTime: 0.304 ms
[shading] shading=phong: FPS: 2975 FrameTime: 0.336 ms
[shading] shading=cel: FPS: 3212 FrameTime: 0.311 ms
[bump] bump-render=high-poly: FPS: 2016 FrameTime: 0.496 ms
[bump] bump-render=normals: FPS: 5057 FrameTime: 0.198 ms
[bump] bump-render=height: FPS: 4980 FrameTime: 0.201 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3224 FrameTime: 0.310 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1634 FrameTime: 0.612 ms
[pulsar] light=false:quads=5:texture=false: FPS: 4919 FrameTime: 0.203 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 661 FrameTime: 1.515 ms
[desktop] effect=shadow:windows=4: FPS: 2618 FrameTime: 0.382 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 509 FrameTime: 1.966 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 510 FrameTime: 1.964 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 613 FrameTime: 1.633 ms
[ideas] speed=duration: FPS: 2284 FrameTime: 0.438 ms
[jellyfish] <default>: FPS: 2851 FrameTime: 0.351 ms
[terrain] <default>: FPS: 119 FrameTime: 8.463 ms
[shadow] <default>: FPS: 1895 FrameTime: 0.528 ms
[refract] <default>: FPS: 287 FrameTime: 3.493 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 4273 FrameTime: 0.234 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 3773 FrameTime: 0.265 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 4261 FrameTime: 0.235 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 4167 FrameTime: 0.240 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3686 FrameTime: 0.271 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 4151 FrameTime: 0.241 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 4159 FrameTime: 0.240 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3718 FrameTime: 0.269 ms
=======================================================
glmark2 Score: 3093
=======================================================
Hello, guys. Sorry for bothering, I have an Orange Pi 5, not a Radxa, but maybe this info about GPU performance is helpful. For the Orange Pi 5 we have Ubuntu 24.04 from Joshua Riek with a vendor-based 6.10 kernel and the Panfrost driver. There are also builds for the Radxa ROCK 5.
And with it I get a score of 4134 in glmark2-wayland:
owner@Enterprise:~/Downloads/mmm$ glmark2-wayland
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: Panfrost
GL_RENDERER: Mali-G610 (Panfrost)
GL_VERSION: 3.3 (Compatibility Profile) Mesa 23.0.0-devel
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 4699 FrameTime: 0.213 ms
[build] use-vbo=true: FPS: 5526 FrameTime: 0.181 ms
[texture] texture-filter=nearest: FPS: 6912 FrameTime: 0.145 ms
[texture] texture-filter=linear: FPS: 6901 FrameTime: 0.145 ms
[texture] texture-filter=mipmap: FPS: 6899 FrameTime: 0.145 ms
[shading] shading=gouraud: FPS: 4745 FrameTime: 0.211 ms
[shading] shading=blinn-phong-inf: FPS: 4314 FrameTime: 0.232 ms
[shading] shading=phong: FPS: 3781 FrameTime: 0.265 ms
[shading] shading=cel: FPS: 3698 FrameTime: 0.270 ms
[bump] bump-render=high-poly: FPS: 2140 FrameTime: 0.467 ms
[bump] bump-render=normals: FPS: 6631 FrameTime: 0.151 ms
[bump] bump-render=height: FPS: 6506 FrameTime: 0.154 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3733 FrameTime: 0.268 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1632 FrameTime: 0.613 ms
[pulsar] light=false:quads=5:texture=false: FPS: 6439 FrameTime: 0.155 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 1055 FrameTime: 0.948 ms
[desktop] effect=shadow:windows=4: FPS: 3502 FrameTime: 0.286 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 576 FrameTime: 1.738 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 579 FrameTime: 1.729 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 661 FrameTime: 1.514 ms
[ideas] speed=duration: FPS: 1969 FrameTime: 0.508 ms
[jellyfish] <default>: FPS: 3314 FrameTime: 0.302 ms
[terrain] <default>: FPS: 152 FrameTime: 6.617 ms
[shadow] <default>: FPS: 2947 FrameTime: 0.339 ms
[refract] <default>: FPS: 326 FrameTime: 3.070 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 6256 FrameTime: 0.160 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 5114 FrameTime: 0.196 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 6239 FrameTime: 0.160 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 6284 FrameTime: 0.159 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 5087 FrameTime: 0.197 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 6386 FrameTime: 0.157 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 6379 FrameTime: 0.157 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 5076 FrameTime: 0.197 ms
=======================================================
glmark2 Score: 4134
=======================================================
warning: queue 0xaaaafa025030 destroyed while proxies still attached:
wl_display@1 still attached
More info:
owner@Enterprise:~/Downloads/mmm$ sudo cat /sys/kernel/debug/clk/clk_summary | grep gpu
scmi_clk_gpu 1 1 0 1000000000 0 0 50000 Y
clk_gpu_pvtm 0 0 0 24000000 0 0 50000 N
clk_gpu_src 3 3 0 198000000 0 0 50000 Y
clk_core_gpu_pvtm 0 0 0 198000000 0 0 50000 N
clk_gpu_stacks 1 3 0 198000000 0 0 50000 Y
clk_gpu_coregroup 1 3 0 198000000 0 0 50000 Y
clk_gpu 1 3 0 198000000 0 0 50000 Y
owner@Enterprise:~/Downloads/mmm$ sudo python3 mmm.py get -c rk3588 -d CRU -p clock
-c rk3588 -d CRU -r V0PLL_CON0 -p clock = 1188 Mhz
-c rk3588 -d CRU -r AUPLL_CON0 -p clock = 786 Mhz
-c rk3588 -d CRU -r CPLL_CON0 -p clock = 1500 Mhz
-c rk3588 -d CRU -r GPLL_CON0 -p clock = 1188 Mhz
-c rk3588 -d CRU -r NPLL_CON0 -p clock = 850 Mhz
-c rk3588 -d CRU -r GPU_CLKSEL -p clock = 0 Mhz
Maybe this Ubuntu already contains some kernel/driver hacks that we can use to get some more performance? Like overclocking or such.
Link to the repo: GitHub - Joshua-Riek/ubuntu-rockchip: Ubuntu for Rockchip RK35XX Devices
On Joshua's kernel the default driver is the Mali DDK, and the user space is panfork. The Mali DDK uses PVTPLL, so the GPU clocks should be more or less fine.
Sorry to bother you guys here, I don’t wanna stain this beautiful thread with a stupid question, but is the GPU on the RK3588S so much different from the one in the RK3588?
How are you guys getting >3000 or even >4000 FPS?
My 5C barely does, like, 1600 where you get 3000-4500 and 3200 where you get 6600-6900 in glmark2-wayland (Joshua’s Ubuntu)…
There is no difference between these GPUs
So I thought. But then what’s up with the differences in performance?
That difference largely comes from the compositor and desktop environment (including e.g. X11 vs. Wayland). The tests themselves are too simple to put any significant load on that beastly GPU, so what you see is, basically, how quickly the application can swap buffers… Don’t look at those scores, as you won’t see that kind of difference in real life.
What you should look at is the score in the terrain test; that’s the only one that represents real performance.
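To compare runs on the terrain test alone, as suggested above, the relevant line can be pulled out of saved glmark2 output. This is only a sketch: the regular expression assumes the line format seen in the logs in this thread, and the helper name terrain_result and the run labels are made up for illustration.

```python
import re

# Matches e.g. "[terrain] <default>: FPS: 119 FrameTime: 8.463 ms"
TERRAIN_RE = re.compile(
    r"^\[terrain\].*?FPS:\s*(\d+)\s+FrameTime:\s*([\d.]+)\s*ms",
    re.MULTILINE,
)


def terrain_result(glmark2_output):
    """Return (fps, frame_time_ms) of the terrain scene, or None if absent."""
    match = TERRAIN_RE.search(glmark2_output)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


# Excerpts from the two glmark2-wayland logs posted in this thread.
RUNS = {
    "score 3093 run": "[jellyfish] <default>: FPS: 2851 FrameTime: 0.351 ms\n"
                      "[terrain] <default>: FPS: 119 FrameTime: 8.463 ms\n",
    "score 4134 run": "[jellyfish] <default>: FPS: 3314 FrameTime: 0.302 ms\n"
                      "[terrain] <default>: FPS: 152 FrameTime: 6.617 ms\n",
}

results = {label: terrain_result(text) for label, text in RUNS.items()}
for label, (fps, frame_ms) in results.items():
    print(f"{label}: terrain {fps} FPS ({frame_ms} ms per frame)")
```

Looking at the terrain numbers only (119 vs. 152 FPS here) gives a much smaller gap than the overall scores suggest.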