Talk about Zero2

stuartiannaylor · September 14, 2022, 1:13pm

Would that not be a very small upgrade over the S905Y2 Radxa Zero? Apart from the NPU and A55 vs A53 cores there is not an awful lot of difference, but I could be wrong. I think the memory bottleneck to the GPU was fixed in the S905D3, so even with the same Mali-G31 MP2 the S905D3 might give better results in operation, but my memory is foggy and I would have to check whether that is right.
I guess if the A311D is not cost effective, it’s not cost effective: the Khadas Vim3 Basic (2GB) is $10 less than a Rock5b (4GB), which is probably a hard sell, as I know which one I would prefer once it is available.

c0rnelius · September 14, 2022, 7:42pm

I agree, this makes for a far less intriguing SBC. Plus the A311D is a hexa-core processor with four Cortex-A73 cores and two Cortex-A53 cores. Also, wouldn’t using an S905D3 drop the possibility of having CAM support? I believe so.

RadxaYuntian · September 15, 2022, 1:56am

The primary issue with Zero from our users is the lack of CSI and DSI interfaces. S905D3 plugs those 2 holes with a basic NPU, so it could be a mid-step between S905Y2 and A311D. We also did not plan any SKU with less than 4GB memory on Zero 2 since the SoC is so premium, so S905D3 could allow us to fill the price gap in between on a largely similar/exact same design.

I do not believe the intention is to drop the A311D altogether. The only thing we have done with the S905D3 so far is a simple chip swap and a test boot with the A311D image (it didn’t boot). So if we were serious about retiring the A311D entirely we would surely do more than that.

stuartiannaylor · September 15, 2022, 3:21am

I actually kindly received a Zero2 pre-release board and it’s a great little board, and I have been wondering what happened to the release as things just seemed to stop.
My main curiosity was price, due to the Khadas pricing. I quite like the Khadas Vim3 but IMO the Rock5B pricing really kills its attractiveness, as that is a hell of a lot more SBC for $10 more.
From memory I had in my head a max of $75 for the A311D, and if its premium puts it above that then it’s likely a dead duck.
I am hoping for a Rock5A with the RK3588S onboard that will cost less than the Rock5B, which makes me think the $75 mark is approximately right, or maybe there are better options to go for at not much different cost.

Premium keeps being stressed and this is making me think otherwise; maybe it’s just not a wise idea, as the Radxa product line may already be bloated with offerings that are quite similar in function and performance, which must kill economies of scale and partition the user and support bases that are a big part of Raspberry’s success.

If an S905D3 2GB is nearer the Radxa Zero price, while an A311D is maybe nearer a Rock5b, it could be an option if the CSI & DSI interfaces are such a big issue. Can it be even nearer the Zero price and maybe even provide a 1GB model? Also maybe think of dropping the 40-pin GPIO for something of higher density that can use an optional breakout board with the standard Pi pinout, to stop the pin mux revision musical chairs (I like what Khadas did with the Edge2 with 2x 30 & pogo pads, and have been wondering for a while whether an onboard 40-pin header is now just an awkward legacy that could be an optional daughter board).
If an SBC is not 100% Pi format to garner drop-in compliance, then why stick with what is really a peculiar format based more on early Pi legacy than on any real advantage?

Raspberry’s Achilles heel is the VC4/6 GPU as it sucks big style, and whatever they may claim it’s still a pretty mediocre desktop / retro gaming experience. So maybe, if you’re not a Pi fanboy, DSI is important to garner lower-cost HAT-based displays, but CSI is likely a lesser need, as Raspberry has the $15 Zero2 with CSI-2 that can cover headless-type apps relatively easily even if it lacks an onboard NPU.

Things have gone bat-shit crazy: first we had the silicon shortage and now we have global inflation, and I am wondering if Raspberry’s current out-of-stock status is due more to cost than to silicon availability.
If you’re having such reservations, maybe it is better to park that bus than to implement another SBC competing in the same space as other Radxa products whilst further splitting your own user base.

Plastic.Panda · September 15, 2022, 5:49am

Stuartiannaylor has some great points. There are plenty of boards with GPIO that I almost never use (except to power fans…) and CSI/DSI which I never use. Unique and space-saving features like pogo-pins and 200-pin SO-DIMM connectors like those used on compute modules would be much more welcome.

The Rock5 model B is incredible but limited pre-orders mean we are already seeing artificial inflation ballooning prices well into the $200+ range. The Zero 2 presents a more affordable option, finally with enough processing power to work as a tiny, low-wattage PC replacement, beating every competitor in that form factor. The A311D is crucial to maintaining that performance advantage, and I’m sure many people would be willing to pay extra for it. (Within reason, of course.)

I understand that there are many different use-cases, but I greatly value the additional CPU cores, 4GB of RAM (the bare minimum for my needs), TF card slot, Wi-Fi, and good USB-C ports. In the future, I dream of M.2 support, MicroSD Express, USB4, 8GB of RAM, etc… But for now, just getting a Zero2 (with the A311D) before the holidays would be wonderful.

stuartiannaylor · September 15, 2022, 4:59pm

PS can someone help with my memory, as I cannot remember whether it was the G31 MP2 of the Zero or the G52 MP4 of the A311D that produced much worse results than expected due to the arrangement Amlogic had chosen for the memory bus, which acted as a choke point.

I think it might be the G52 MP4 on the A311D, but maybe it was the original Zero: the S905D3 is very similar to the S905Y2 but received a better GPU/memory implementation, and the 3rd iteration was largely to fix that oversight, plus a few other additions that the D chips give over the dongle-type Y ones.
So either the S905D3 is much faster in operation even though the GPU looks identical, or it’s the A311D that still retains a choke point.
I could google it, but I bet someone on here has read all about it and doesn’t have my level of amnesia.

With some further thought on the Zero2 and the Zero, there seems to be a concerted effort to clone Raspberry-named products while being far off a hardware clone or even a drop-in compatible product.
The Radxa Zero was a pretty close clone, with a much more powerful S905Y2 SoC crammed into a Zero-format board, and approximating a Zero was the closest it came hardware-wise.
By far the best product Radxa does, and I am eagerly awaiting its arrival, came when they finally stopped trying to make bad clones of the Raspberry Pi and created their own killer format and design from scratch. The Rock5b is absolutely amazing, and hopefully there will be a Rock5a (not a CM module but a cut-down, basic, no-frills RK3588S SBC) that likely competes in the same price space as the A311D and makes it a redundant product.

But also the Zero2 needed to be a bigger, deeper board with 2x USB-C, and it has absolutely no drop-in compatibility; it is a ‘Zero’ because it’s a bigger brother to the Zero using an Amlogic chip, and that is where it stops.
If you look at the datasheet of the S905Y2, S905D3 or A311D you will see how feature-rich these SoCs are, and that most of it has been cast off so the board can bear some obscure resemblance to a Pi product, with results that range from not very close to not close at all, whilst throwing away a huge array of features in the effort.

The A311D is hugely feature-rich and so much is cast off to fit into a completely non-Zero format; no wonder it’s termed a premium product in comparison to the $15 Zero2, because the chosen SoC and the destination design are an extremely ill fit.

You need to design around a SoC, not have a design format and then try to find a SoC to squeeze into it so that it resembles another product.
You need to maximise its feature list but make options modular via daughter boards, likely on short high-density FFC ribbon cables / pogo pads to stackable shims, for GPIO, ethernet, 8-channel audio and the vast array of interfaces a modern SoC embeds, and maybe combinations.
This means products with much better GPUs than Raspberry’s Achilles heel of VC4/6 can have cost-effective base units focusing on what beats Raspberry, with a series of shims that stack thanks to FFC ribbons that are high density and low cost.

“But for now, just getting a Zero2 (with the A311D) before the holidays would be wonderful” is a really bad idea: rushing out what is an extremely badly fitting product could lead to so many revisions that it ends up in a cul-de-sac similar to the rock-pi-s, where the SoC is great but the board implementation maybe not so, and which went through revision musical chairs so often that it’s a product I have lost interest in.
We are struggling with the premium price of the A311D, where the Zero premise was likely wrong for that premium SoC anyway, and now the plan is to keep the ill-fitting premise but force the round peg of an S905D3 into the square hole of a completely non-standard “zero” format board with a 40-pin GPIO that still jettisons the majority of the SoC’s interfaces, and to rush to get one produced out of slight embarrassment at jettisoning the initial idea as bad, because it’s very far from being cost effective…

If you’re going to employ an S905D3, then go back to the drawing board, cast off any Raspberry Zero aspirations and create a Radxa design that is best for that SoC, with a sensible release schedule, as from what I have seen that is when Radxa are at their best.

ustix · September 17, 2022, 7:36pm

I think Radxa Zero 2 is much more interesting, also the range of sales is wider, in my opinion, so it seems to be a strange choice.

stuartiannaylor · September 18, 2022, 11:44am

PS it was the G52-MP4 of the A311D that is choked by its memory arrangement, whereas the Mali-T860 MP4 of the ROCK 4 SE is supposedly faster.

Some of the GLmark2 benchmarks on the RadxaZero2 were good but not that great, and I think that added to the problem of its ‘premium’ price.

tkaiser · September 18, 2022, 1:13pm

Sure? The 4 SE is based on RK3399T and there according to datasheet max GPU frequency is 600 MHz and not 800 MHz as with RK3399.

And since mbw numbers were referenced (‘S922x: 4.8 GiB/s RK3399: 6.6 GiB/S’). What do these mbw numbers represent? -t0: memcpy() test, -t1: dumb (b[i]=a[i] style) test, -t2: memcpy() with arbitrary block size

This is a quick search through my sbc-bench results collection. Not generated by mbw but tinymembench:

VIM3/A311D:

Kernel	Clockspeeds	memcpy	memset
4.9	2208/1800 MHz	4600 MB/sec	8990 MB/sec
4.9	2208/1800 MHz	4660 MB/sec	9230 MB/sec
4.9	2208/1800 MHz	4660 MB/sec	9280 MB/sec
4.9	2208/1800 MHz	4690 MB/sec	9280 MB/sec
4.9	2400/2100 MHz	5080 MB/sec	9350 MB/sec
5.10	2400/2016 MHz	4370 MB/sec	6720 MB/sec
5.10	2400/2016 MHz	4420 MB/sec	6640 MB/sec
5.10	2400/2016 MHz	4770 MB/sec	6580 MB/sec
5.10	2400/2016 MHz	4770 MB/sec	6580 MB/sec
5.10	2400/2016 MHz	4840 MB/sec	8260 MB/sec
5.10	2400/2016 MHz	4850 MB/sec	7370 MB/sec
5.10	2400/2016 MHz	4850 MB/sec	7380 MB/sec
5.10	2400/2016 MHz	4850 MB/sec	8100 MB/sec
5.16	2208/1800 MHz	5000 MB/sec	9560 MB/sec
5.17	2208/1800 MHz	4800 MB/sec	9330 MB/sec
5.17	2208/1800 MHz	4860 MB/sec	9150 MB/sec
5.18	2208/1800 MHz	5000 MB/sec	9840 MB/sec
5.18	2208/1800 MHz	5020 MB/sec	9650 MB/sec
5.18	2208/1800 MHz	5070 MB/sec	9460 MB/sec

ODROID-N2/S922:

Kernel	Clockspeeds	memcpy	memset
5.10	1992/1908 MHz	3740 MB/sec	7500 MB/sec
5.10	1992/1908 MHz	4250 MB/sec	9090 MB/sec
5.10	1992/1908 MHz	4260 MB/sec	9080 MB/sec
5.10	1992/1908 MHz	4260 MB/sec	9080 MB/sec
5.10	1992/1908 MHz	4270 MB/sec	7670 MB/sec
5.15	1908/1800 MHz	3900 MB/sec	7440 MB/sec
5.15	1992/1908 MHz	3910 MB/sec	7700 MB/sec
5.15	1992/1908 MHz	3990 MB/sec	7970 MB/sec
5.15	2004/1992 MHz	3820 MB/sec	7790 MB/sec
5.15	2004/1992 MHz	3850 MB/sec	7630 MB/sec
5.15	2004/1992 MHz	3850 MB/sec	7710 MB/sec
5.17	1992/1908 MHz	4190 MB/sec	8690 MB/sec

ODROID-N2/S922-X:

Kernel	Clockspeeds	memcpy	memset
4.9	2400/2016 MHz	3850 MB/sec	5970 MB/sec
5.10	2400/2016 MHz	3770 MB/sec	7610 MB/sec
5.10	2400/2016 MHz	3770 MB/sec	7620 MB/sec
5.10	2400/2016 MHz	3910 MB/sec	7220 MB/sec
5.10	2400/2016 MHz	3980 MB/sec	7670 MB/sec
5.10	2400/2016 MHz	3990 MB/sec	7460 MB/sec
5.10	2400/2016 MHz	4000 MB/sec	6980 MB/sec
5.10	2400/2016 MHz	4000 MB/sec	7030 MB/sec
5.10	2400/2016 MHz	4020 MB/sec	7140 MB/sec
5.10	2400/2016 MHz	4020 MB/sec	7320 MB/sec
5.10	2400/2016 MHz	4030 MB/sec	7120 MB/sec
5.10	2400/2016 MHz	4030 MB/sec	7690 MB/sec
5.10	2400/2016 MHz	4030 MB/sec	7690 MB/sec
5.10	2400/2016 MHz	4070 MB/sec	7220 MB/sec
5.10	2400/2016 MHz	4090 MB/sec	7170 MB/sec
5.10	2400/2016 MHz	4140 MB/sec	7410 MB/sec
5.10	2400/2016 MHz	4140 MB/sec	7710 MB/sec
5.10	2400/2016 MHz	4160 MB/sec	7680 MB/sec
5.10	2400/2016 MHz	4180 MB/sec	7700 MB/sec
5.10	2400/2016 MHz	4190 MB/sec	7690 MB/sec
5.10	2400/2016 MHz	4200 MB/sec	7680 MB/sec
5.10	2400/2016 MHz	4210 MB/sec	7730 MB/sec
5.10	2400/2016 MHz	4220 MB/sec	7730 MB/sec
5.10	2400/2016 MHz	4220 MB/sec	7730 MB/sec
5.10	2400/2016 MHz	4240 MB/sec	7740 MB/sec
5.10	2400/2016 MHz	4290 MB/sec	7730 MB/sec
5.14	2400/2016 MHz	4030 MB/sec	7120 MB/sec
5.15	2400/2016 MHz	4000 MB/sec	7660 MB/sec
5.15	2400/2016 MHz	4010 MB/sec	7680 MB/sec
5.15	2400/2016 MHz	4030 MB/sec	7700 MB/sec
5.15	2400/2016 MHz	4040 MB/sec	7680 MB/sec
5.15	2400/2016 MHz	4100 MB/sec	7730 MB/sec
5.15	2400/2016 MHz	4140 MB/sec	7720 MB/sec
5.16	2400/2016 MHz	3960 MB/sec	7610 MB/sec
5.16	2400/2016 MHz	4160 MB/sec	7460 MB/sec
5.16	2400/2016 MHz	4190 MB/sec	7470 MB/sec
5.16	2400/2016 MHz	4200 MB/sec	7470 MB/sec
5.16	2400/2016 MHz	4200 MB/sec	7470 MB/sec
5.16	2400/2016 MHz	4200 MB/sec	7480 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7410 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7420 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7460 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7470 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7480 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7480 MB/sec
5.16	2400/2016 MHz	4210 MB/sec	7480 MB/sec
5.16	2400/2016 MHz	4220 MB/sec	7450 MB/sec
5.16	2400/2016 MHz	4220 MB/sec	7460 MB/sec
5.16	2400/2016 MHz	4220 MB/sec	7460 MB/sec
5.17	2400/2016 MHz	4020 MB/sec	7690 MB/sec

stuartiannaylor · September 18, 2022, 2:27pm

You can read about what Collabora found and what is in the Khadas forums.

I ran GLmark2 on the Radxa Zero2 and it was good but not great as some might expect.

radxa@radxa:~/mesa$ glmark2
=======================================================
    glmark2 2021.12
=======================================================
    OpenGL Information
    GL_VENDOR:      Panfrost
    GL_RENDERER:    Mali-G52 (Panfrost)
    GL_VERSION:     3.1 Mesa 21.3.8 (git-813ee839be)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 772 FrameTime: 1.295 ms
[build] use-vbo=true: FPS: 847 FrameTime: 1.181 ms
[texture] texture-filter=nearest: FPS: 895 FrameTime: 1.117 ms
[texture] texture-filter=linear: FPS: 895 FrameTime: 1.117 ms
[texture] texture-filter=mipmap: FPS: 905 FrameTime: 1.105 ms
[shading] shading=gouraud: FPS: 686 FrameTime: 1.458 ms
[shading] shading=blinn-phong-inf: FPS: 701 FrameTime: 1.427 ms
[shading] shading=phong: FPS: 601 FrameTime: 1.664 ms
[shading] shading=cel: FPS: 623 FrameTime: 1.605 ms
[bump] bump-render=high-poly: FPS: 367 FrameTime: 2.725 ms
[bump] bump-render=normals: FPS: 1044 FrameTime: 0.958 ms
[bump] bump-render=height: FPS: 1021 FrameTime: 0.979 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 768 FrameTime: 1.302 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 345 FrameTime: 2.899 ms
[pulsar] light=false:quads=5:texture=false: FPS: 857 FrameTime: 1.167 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 308 FrameTime: 3.247 ms
[desktop] effect=shadow:windows=4: FPS: 812 FrameTime: 1.232 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 95 FrameTime: 10.526 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 87 FrameTime: 11.494 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 228 FrameTime: 4.386 ms
[ideas] speed=duration: FPS: 445 FrameTime: 2.247 ms
[jellyfish] <default>: FPS: 583 FrameTime: 1.715 ms
[terrain] <default>: FPS: 39 FrameTime: 25.641 ms
[shadow] <default>: FPS: 439 FrameTime: 2.278 ms
[refract] <default>: FPS: 79 FrameTime: 12.658 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 796 FrameTime: 1.256 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 795 FrameTime: 1.258 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 796 FrameTime: 1.256 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 798 FrameTime: 1.253 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 793 FrameTime: 1.261 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 793 FrameTime: 1.261 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 791 FrameTime: 1.264 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 789 FrameTime: 1.267 ms
=======================================================
                                  glmark2 Score: 630 
=======================================================

radxa@radxa:~/mesa$ glmark2-es2
=======================================================
    glmark2 2021.12
=======================================================
    OpenGL Information
    GL_VENDOR:      Panfrost
    GL_RENDERER:    Mali-G52 (Panfrost)
    GL_VERSION:     OpenGL ES 3.1 Mesa 21.3.8 (git-813ee839be)
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 816 FrameTime: 1.225 ms
[build] use-vbo=true: FPS: 1019 FrameTime: 0.981 ms
[texture] texture-filter=nearest: FPS: 1061 FrameTime: 0.943 ms
[texture] texture-filter=linear: FPS: 1060 FrameTime: 0.943 ms
[texture] texture-filter=mipmap: FPS: 1077 FrameTime: 0.929 ms
[shading] shading=gouraud: FPS: 818 FrameTime: 1.222 ms
[shading] shading=blinn-phong-inf: FPS: 826 FrameTime: 1.211 ms
[shading] shading=phong: FPS: 711 FrameTime: 1.406 ms
[shading] shading=cel: FPS: 728 FrameTime: 1.374 ms
[bump] bump-render=high-poly: FPS: 409 FrameTime: 2.445 ms
[bump] bump-render=normals: FPS: 1283 FrameTime: 0.779 ms
[bump] bump-render=height: FPS: 1237 FrameTime: 0.808 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 777 FrameTime: 1.287 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 346 FrameTime: 2.890 ms
[pulsar] light=false:quads=5:texture=false: FPS: 984 FrameTime: 1.016 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 305 FrameTime: 3.279 ms
[desktop] effect=shadow:windows=4: FPS: 817 FrameTime: 1.224 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 90 FrameTime: 11.111 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 89 FrameTime: 11.236 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 224 FrameTime: 4.464 ms
[ideas] speed=duration: FPS: 473 FrameTime: 2.114 ms
[jellyfish] <default>: FPS: 630 FrameTime: 1.587 ms
[terrain] <default>: FPS: 41 FrameTime: 24.390 ms
[shadow] <default>: FPS: 477 FrameTime: 2.096 ms
[refract] <default>: FPS: 86 FrameTime: 11.628 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 906 FrameTime: 1.104 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 909 FrameTime: 1.100 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 907 FrameTime: 1.103 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 909 FrameTime: 1.100 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 862 FrameTime: 1.160 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 901 FrameTime: 1.110 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 903 FrameTime: 1.107 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 862 FrameTime: 1.160 ms
=======================================================
                                  glmark2 Score: 713 
=======================================================

I got rid of my RockPi4 a long time ago so can no longer post GLmark2 or ES results for it, but I think people were generally expecting the Zero2 to come somewhere in the middle between the Rock4 & Rock5, and as you can see above, with the current images it falls a long way short.

It was Alyssa Rosenzweig, who was working on the Mesa drivers at the time, who mentioned that the memory architecture was a choke point for the GPU and that this is why the figures are lower than expected for a G52 MP4. You cannot seem to get the full tweet anymore and, as per usual with my memory, I have forgotten the full details, but it didn’t matter as the glmark2 scores were much lower than expected.

As you can see above, though, the GPU ain’t that great and many were expecting more from the G52. Maybe someone will post current RK3399 glmark2 / es2 scores to compare with the above so we have a like-for-like, as it’s the GPU/memory architecture I am talking about, not just a series of memory speed tests.

The G31 MP2 on the Radxa Zero was posting glmark2 scores of approx 385, and many expected much more than a bit less than double from a G52 MP4. It was slightly disappointing with the Rock5b managing approx x10 the Radxa Zero and x5 the Zero 2, so yeah, it was a long way from being between the two.
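
To put the "a bit less than double" remark into numbers, here is a minimal Python sketch that divides the two Zero 2 scores posted above by the approximate Radxa Zero score. The 385 figure is only the rough value quoted from memory in this post, and the variable and function names are made up for illustration.

```python
# Scores quoted in this thread; the Zero figure is an approximation from memory.
ZERO_G31_SCORE = 385  # "approx 385" for the Radxa Zero (Mali-G31 MP2)
ZERO2_SCORES = {"glmark2": 630, "glmark2-es2": 713}  # Radxa Zero 2 runs above


def speedup(new_score: float, base_score: float) -> float:
    """Return how many times higher new_score is than base_score."""
    return new_score / base_score


# Ratio of each Zero 2 score to the approximate Zero score, rounded to 2 places.
ratios = {
    name: round(speedup(score, ZERO_G31_SCORE), 2)
    for name, score in ZERO2_SCORES.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the Radxa Zero score")
```

With these inputs the Zero 2 lands at roughly 1.6x to 1.9x the Zero, which matches the description above; the result is only as reliable as the remembered baseline.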

tkaiser · September 18, 2022, 3:10pm

No I can’t since there are just some numbers posted and we neither know what these numbers mean nor how they were generated. I understand the claim (you constantly repeat) but I don’t see numbers really backing this.

On modern SoCs there’s devfreq support for all sorts of cores (CPU, GPU, NPU) and memory. On my Rock 5B with performance dmc governor I get 10830 MB/s memcpy reported by tinymembench/sbc-bench. Now to mbw:

root@rock-5b:/home/tk# taskset -c 5 mbw -t0 256 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 0.07990	MiB: 256.00000	Copy: 3203.905 MiB/s
root@rock-5b:/home/tk# taskset -c 5 mbw -t0 4096 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 0.56973	MiB: 4096.00000	Copy: 7189.415 MiB/s

One time 3.2 GB/sec, one time +7 GB/sec. What’s the difference? The arraysize_in_MiB parameter being small one time and larger the other? Only indirectly, since what we really see is the dmc governor at work. The above was with dmc_ondemand switching somewhat dynamically between 528 MHz and 2112 MHz with LPDDR4 RAM.
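
For anyone wanting to check which governor is active before running a memory benchmark, here is a small Python sketch that lists the devfreq devices exposed under /sys/class/devfreq together with their `governor` and `cur_freq` attributes. Which devices show up (for example one for the memory controller) depends entirely on the kernel and board, so the device names are not assumed here, and the helper name `read_devfreq` is made up for this example.

```python
from pathlib import Path


def read_devfreq(root: str = "/sys/class/devfreq") -> dict:
    """Map each devfreq device name to its governor and current frequency.

    Attributes that are missing or unreadable are reported as None.
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        # No devfreq support exposed (or not running on Linux).
        return result
    for dev in sorted(base.iterdir()):
        entry = {}
        for attr in ("governor", "cur_freq"):
            try:
                entry[attr] = (dev / attr).read_text().strip()
            except OSError:
                entry[attr] = None
        result[dev.name] = entry
    return result


if __name__ == "__main__":
    for name, info in read_devfreq().items():
        print(name, info["governor"], info["cur_freq"])
```

Recording this output next to each benchmark run makes it easier to tell afterwards whether two results were taken at the same memory clock.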

With powersave (528 MHz) it looks like this:

root@rock-5b:/home/tk# taskset -c 5 mbw -t0 256 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 0.07938	MiB: 256.00000	Copy: 3224.982 MiB/s
root@rock-5b:/home/tk# taskset -c 5 mbw -t0 4096 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 1.28748	MiB: 4096.00000	Copy: 3181.400 MiB/s

And with performance (2112 MHz) like this:

root@rock-5b:/home/tk# taskset -c 5 mbw -t0 256 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 0.03323	MiB: 256.00000	Copy: 7703.511 MiB/s
root@rock-5b:/home/tk# taskset -c 5 mbw -t0 4096 | grep ^AVG
AVG	Method: MEMCPY	Elapsed: 0.54355	MiB: 4096.00000	Copy: 7535.656 MiB/s

Also, measured memory bandwidth is massively influenced by the CONFIG_HZ kernel config. You can get numbers differing by a factor of 2 between CONFIG_HZ=100 and CONFIG_HZ=1000.

And as we’ve seen above with dynamic memory clocking mbw behaves somewhat like a RNG depending on arraysize_in_MiB sizes. This is CPU. What about similar mechanisms with GPU?

Point is: you can generate with a CPU bound memory bandwidth tester numbers that might be completely irrelevant for what the GPU cores do.
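
As a rough illustration of how far the same board moves depending on governor and array size, here is a short Python sketch that parses the six mbw AVG lines posted above and reports the spread between the lowest and highest copy rate. The dictionary keys and helper name are made up for this example; the lines themselves are copied from the runs in this post.

```python
import re

# AVG lines copied from the Rock 5B runs above, keyed by
# (dmc governor, array size in MiB).
MBW_AVG_LINES = {
    ("dmc_ondemand", 256): "AVG\tMethod: MEMCPY\tElapsed: 0.07990\tMiB: 256.00000\tCopy: 3203.905 MiB/s",
    ("dmc_ondemand", 4096): "AVG\tMethod: MEMCPY\tElapsed: 0.56973\tMiB: 4096.00000\tCopy: 7189.415 MiB/s",
    ("powersave", 256): "AVG\tMethod: MEMCPY\tElapsed: 0.07938\tMiB: 256.00000\tCopy: 3224.982 MiB/s",
    ("powersave", 4096): "AVG\tMethod: MEMCPY\tElapsed: 1.28748\tMiB: 4096.00000\tCopy: 3181.400 MiB/s",
    ("performance", 256): "AVG\tMethod: MEMCPY\tElapsed: 0.03323\tMiB: 256.00000\tCopy: 7703.511 MiB/s",
    ("performance", 4096): "AVG\tMethod: MEMCPY\tElapsed: 0.54355\tMiB: 4096.00000\tCopy: 7535.656 MiB/s",
}

COPY_RE = re.compile(r"Copy:\s+([0-9.]+)\s+MiB/s")


def copy_rate(line: str) -> float:
    """Extract the copy rate in MiB/s from one mbw result line."""
    match = COPY_RE.search(line)
    if match is None:
        raise ValueError(f"no copy rate found in: {line!r}")
    return float(match.group(1))


rates = {key: copy_rate(line) for key, line in MBW_AVG_LINES.items()}
# Ratio between the fastest and slowest result on the very same hardware.
spread = max(rates.values()) / min(rates.values())

for (governor, size), rate in sorted(rates.items()):
    print(f"{governor:13s} {size:5d} MiB  {rate:9.3f} MiB/s")
print(f"highest / lowest: {spread:.2f}x")
```

The spread comes out at well over 2x without any hardware change, which is the reason a single quoted mbw figure says little unless the governor and array size are stated alongside it.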

As for your glmark numbers with A311D should glmark2-es2-wayland report same numbers as glmark2-es2?

stuartiannaylor · September 18, 2022, 3:41pm

You provide them, as I cannot be bothered. It’s generally known that the A311D and the GPUs in that series provided less than what was expected, and I will go on what was posted before, as I stopped testing several months ago when it became obvious the Zero2 had hit a cul-de-sac.

There are likely many enthusiasts who would want another board for the board’s sake, but without some long pointless discourse for Radxa, I think the Zero2 is a dead duck due to its ‘premium’ problems; it’s super cute and it was good, but GPU-wise it was a slight disappointment against expectations.

My take is to scrap the Zero2 idea completely, as the S905D3 in Zero2 format is an even worse idea when there is the possibility of a super stripped-down, cost-conscious 2/4/8GB RK3588S in the form of a Rock5a that is a 100% Radxa design without the shackles of a Zero moniker.

That’s what I think, and that is my personal advice to Radxa, rather than adding the clutter of another SBC that is not much different from what is available whilst retaining a ‘premium’.

You as always can post as many tangential benchmarks from your benchmark suite as you may deem fit, but I am not that interested and will likely not reply, but thanks for what you have provided.

tkaiser · September 18, 2022, 6:20pm

…continue to post nonsense all over the place?

Ok, there’s this piece of text you blindly trust (full twitter ‘thread’ archived here)

‘With Panfrost, the S922x is slower than the older RK3399. I’m investigating this at @Collabora. The suspect? Not enough memory bandwidth. The evidence? mbw, a benchmark. The verdict? Guess they’re both slow. S922x: 4.8 GiB/s RK3399: 6.6 GiB/S Apple M1: 30.2 GiB/s’ (archived)

While I tried to generate some understanding of why these mbw numbers are questionable, you have no interest in getting into details. Great.

Below memory bandwidth measured in two different ways (tinymembench using highest value that could be found per SoC and Alyssa’s mbw numbers):

SoC	Clockspeeds	tinymembench	mbw
A311D	2210 MHz	5050 MB/s	?
S922X	? MHz	4220 MB/s	4.8 GiB/s
RK3399	? MHz	3700 MB/s	6.6 GiB/s
M1 Pro	3000 MHz	27000 MB/s	30.2 GiB/s

The 5050 MB/s are from Radxa Zero 2 BTW…

1st insight: according to tinymembench memory bandwidth on A311D is 20% higher than S922X even when CPU cores are clocked lower (CPU clockspeed has a significant impact on ‘measured’ memory bandwidth)
2nd insight: according to tinymembench memory bandwidth with RK3399 is 12% lower than S922X (or even 27% lower compared with A311D)
3rd insight: mbw shows the opposite: RK3399 having a 38% higher memory bandwidth compared to S922X. Who’s right?
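
The three percentages above can be re-derived with a few lines of Python. This is only a sketch using the numbers from the table; the GiB/s to MB/s conversion assumes that tinymembench's "MB" means 10^6 bytes, which is an assumption made here rather than something stated in this thread.

```python
# Highest tinymembench memcpy values per SoC from the table above (MB/s).
TINYMEMBENCH_MB_S = {"A311D": 5050, "S922X": 4220, "RK3399": 3700}
# mbw values quoted from the tweet (GiB/s).
MBW_GIB_S = {"S922X": 4.8, "RK3399": 6.6}


def pct_diff(value: float, baseline: float) -> float:
    """Percentage by which value is above (+) or below (-) baseline."""
    return (value - baseline) / baseline * 100


def gib_to_mb(gib_per_s: float) -> float:
    """Convert GiB/s to MB/s, assuming MB = 10**6 bytes."""
    return gib_per_s * 1024**3 / 1e6


t = TINYMEMBENCH_MB_S
print(f"A311D vs S922X (tinymembench):  {pct_diff(t['A311D'], t['S922X']):+.1f}%")
print(f"RK3399 vs S922X (tinymembench): {pct_diff(t['RK3399'], t['S922X']):+.1f}%")
print(f"RK3399 vs A311D (tinymembench): {pct_diff(t['RK3399'], t['A311D']):+.1f}%")
print(f"RK3399 vs S922X (mbw):          "
      f"{pct_diff(MBW_GIB_S['RK3399'], MBW_GIB_S['S922X']):+.1f}%")

# The mbw figures expressed in MB/s, for comparison with the tinymembench column.
for soc, gib in MBW_GIB_S.items():
    print(f"{soc}: {gib} GiB/s is about {gib_to_mb(gib):.0f} MB/s")
```

Putting both columns into the same unit does not remove the contradiction: the two tools still rank RK3399 and S922X in opposite order.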

Since Alyssa is telling ‘mbw is used with 512 MB blocks to accommodate the memory pressure on the Mali boards’ let’s give it a try:

root@nanopim4:~# taskset -c 4 mbw -t2 512 -b 536870912
Long uses 8 bytes. Allocating 2*67108864 elements = 1073741824 bytes of memory.
Using 536870912 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0	Method: MCBLOCK	Elapsed: 0.20776	MiB: 512.00000	Copy: 2464.370 MiB/s
1	Method: MCBLOCK	Elapsed: 0.20726	MiB: 512.00000	Copy: 2470.363 MiB/s
2	Method: MCBLOCK	Elapsed: 0.20691	MiB: 512.00000	Copy: 2474.554 MiB/s
3	Method: MCBLOCK	Elapsed: 0.20754	MiB: 512.00000	Copy: 2467.006 MiB/s
4	Method: MCBLOCK	Elapsed: 0.20725	MiB: 512.00000	Copy: 2470.506 MiB/s
5	Method: MCBLOCK	Elapsed: 0.20590	MiB: 512.00000	Copy: 2486.608 MiB/s
6	Method: MCBLOCK	Elapsed: 0.20793	MiB: 512.00000	Copy: 2462.343 MiB/s
7	Method: MCBLOCK	Elapsed: 0.20763	MiB: 512.00000	Copy: 2465.961 MiB/s
8	Method: MCBLOCK	Elapsed: 0.20732	MiB: 512.00000	Copy: 2469.660 MiB/s
9	Method: MCBLOCK	Elapsed: 0.20754	MiB: 512.00000	Copy: 2467.042 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.20730	MiB: 512.00000	Copy: 2469.824 MiB/s

That’s not even 2.5 GiB/s. On another RK3399 I even got just this:

AVG	Method: MCBLOCK	Elapsed: 2.47109	MiB: 512.00000	Copy: 207.196 MiB/s

What happened? Throttling/zram. Stuff you discover once you monitor benchmark execution. Now measure on NanoPi M4 also with tinymembench to compare:

 standard memcpy                                      :   2821.3 MB/s (0.2%)

Results in this (RK3399 running with 4.4 BSP kernel and the A72 only running at 1800 MHz):

SoC	Clockspeeds	tinymembench	mbw
A311D	2208 MHz	5050 MB/s	?
S922X	? MHz	4220 MB/s	4.8 GiB/s
RK3399	1800 MHz	2821.3 MB/s	2.5 GiB/s
M1 Pro	3030 MHz	27000 MB/s	30.2 GiB/s

Now this starts to make sense. But I’m sure you will continue to spread the urban myth ‘A311D is choked by its memory arrangement’ and RK3399 would show way higher memory bandwidth than A311D, right?

stuartiannaylor · September 18, 2022, 6:53pm

No that is you creating an urban myth that the ‘A311d’ is choked by its memory arrangement.

What is choked is the G52 MP4, which doubles the cores and goes up in spec from the G31 to the G52, and still the GLmark2 scores are not that great.
Repeatedly, multiple times, it’s been said that there is something about the GPU/memory architecture that holds it back, but it has never been said what that is, apart from it ain’t that great; I did run GLmark2 on a Radxa Zero2 and yes, like Collabora said, it was less than expected.

So you carry on as usual, paraphrasing what others said with more tangential false claims and posting more pointless benchmarks from your own benchmark suite for no other reason than to validate yourself.

The GPUs on that series of Amlogic chips, even though they are G52s, are actually under par and that is no myth; why or what has not been exactly said.
Like the Vim3, the simple take is it’s a nice SBC but it really isn’t worth that premium, and the Rock5b pretty much proves that.

Thx so much once more for your wonderful if tangential info, but there is a simpler truth here as to why Radxa have frozen the A311D Zero2: likely mainly the premium price, as the 4GB Vim3 is a whopping $159.90!

So please don’t paraphrase and try to change what I have said so that you can once more spam your benchmark suite, which I have no interest in and which has no real context for the GPU. The ROI, like the Vim3’s, is pretty poor, hence why Radxa keep mentioning ‘Premium’.

Please Radxa scrap the Zero2 and focus on a 100% radxa Rock5A that has a great GPU and great ROI.

TK I could not give a damn what you have to say or what pointless benchmark you want to make an argument out of, so fire away and argue with yourself if you wish but that is the last reply to you.

tkaiser · September 18, 2022, 7:16pm

To be more precise, Alyssa never talked about a choked memory ‘architecture’, ‘bus’ or ‘arrangement’ but simply suspected memory bandwidth being to blame and used a CPU memory benchmark with some strange numbers for this as ‘evidence’. But this conclusion is highly questionable, as we’ve seen.

BTW: tinymembench unlike mbw has been invented by a guy interested in graphics performance (not just CPU) especially with looking at bandwidth issues in mind. That’s why it’s not only measuring memcpy but a bunch of other things as well.

There’s so much to learn every day if not being biased, isn’t it?

stuartiannaylor · September 18, 2022, 7:37pm

There is no bias, as the GPU on the A311D is relatively c-rap, and the Mali-G610 MP4 on the RK3588 proves that, as the difference is massively out of proportion to the architectural gains Arm purports.

No one is that bothered about exactly what is causing this; what matters is that, compared with mobile benchmarks of other silicon implementations of the G52, the Amlogic one is poor. That has no bias: look at the results of the RK3588 and its price range…

Your benchmarks mean nothing and you have posted a whole load of pointless numbers: the Amlogic G52 MP4 is OK, but for a G52 MP4 the results it provides are pretty bad compared with what the majority expected.

tkaiser · September 18, 2022, 7:47pm

Except you constantly badmouthing A311D’s memory bandwidth

My ‘free service’ here is just some sort of due diligence in the form of questioning strange numbers on which hasty conclusions seem to be based. I honestly don’t care about different GPUs or pricing and I also can’t provide all this free SBC design and market analysis consultancy to Radxa like you

stuartiannaylor · September 18, 2022, 7:55pm

It’s not me who keeps mentioning ‘premium’, that is Radxa, and the idea of putting an S905D3 into what is a Zero2 is completely derailed.
That would be a really bad idea, and if the A311D is so ‘premium’ then it should be said that it never really met expectations, at least with the GPU.
That is important to say, and this is typical of your self-interest: when it is important to say, you turn the point into a sideshow.

Like I always say, you often provide some great info and it’s such a shame about the rest.

I am replying to a question that was brought up here and once more you just turn this into a pointless personal argument, which is a shame as you do a lot of work and then devalue it, whilst I merely provide opinion.

I have had the Zero2 on my desk for months and it’s a cute little thing, but it’s sounding very much like a no to me and I am saying it.

tkaiser · September 18, 2022, 8:24pm

It’s only you who mentions ‘premium’ again and again.

BTW: it’s similarly funny to search for ‘GPU’ here

I understood Radxa’s single mentioning of ‘premium SoC’ as if they will provide only a 4GB version with A311D and possible SKUs with S905D3 then with less RAM at lower prices. And the differentiation to S905Y2 equipped Zero was also clearly explained: lack of CSI and DSI interfaces (still there with pin-compatible S905D3, quoting the datasheet ‘4-lane MIPI DSI interface,2–lane MIPI CSI interface’).

stuartiannaylor · September 18, 2022, 8:38pm

That is exactly what I am saying: having the whole support and production effort for a totally different SBC purely to provide CSI & DSI, because the A311D is too ‘premium’ to offer with less RAM at lower prices, is just a bad idea.

What do you not get about that? My opinion is that it’s a dead duck, as said.
I have one on my desk; it’s nice, but I am giving my opinion that I think it’s a bad idea.