Yes.
ROCK5 Android12 Bringup Status Update
lol !
So could you please build a recent ncnn benchncnn binary for Android and run the benchmark on CPU/GPU?
I’m quite curious how the performance compares with the other SoCs I can get hold of.
I can add the ROCK5 to the list if it is available --> https://github.com/nihui/ncnn-small-board
Reach me in the QQ group if you need help with any ncnn-related issues.
Cnxsoft and a few others quoted the speed the same way you did, as 2.4/2.6, which is a bit odd when there is usually just a single max clock. That was my confusion, as I started to wonder whether this was 2+2.
I can’t find a reference for the clock speed in the datasheet (max clock is just listed as TBA), but things are looking good if that’s without a heatsink, and it seems quite low on wattage.
I’m also still trying to find a rated max TDP for the RK3588. Tom posted some approximate details, and the NPU has yet to be run; I’m curious how much load/heat 6 TOPS will produce.
It would be interesting to see what the Mali can do with ncnn, though I guess you guys have enough to do. Power draw vs the NPU would be interesting too: the ncnn-small-board list seems to be purely GPU, with the Maxwell of the Nano leading, which is approx a 472 GFLOPS device, whilst the RK3588 supposedly has a 6 TOPS NPU, which is awesome, and I would love to see that tested as well.
Sawyer told me it was indeed 2+2 but both clusters are clocked at 2.4 GHz. Might need some optimizations to enable that extra 200 MHz.
I will leave you guys to that PMU, as it looks great, but boy, with its own MCU it looks a tad complex.
So prob the default offering will be 2.4/2.6 2+2 4 core, I guess? I presume that because it has a PMU MCU it is really flexible and nothing is set in stone.
I have absolutely no idea how to format an OPP table for 2+2, so I will just leave you guys to it.
console:/data/benchmark # ./benchncnn 10 $(nproc) 0 -1
loop_count = 10
num_threads = 8
powersave = 0
gpu_device = -1
cooling_down = 1
squeezenet min = 10.75 max = 11.04 avg = 10.82
squeezenet_int8 min = 10.92 max = 11.75 avg = 11.19
mobilenet min = 11.60 max = 12.29 avg = 11.77
mobilenet_int8 min = 10.79 max = 11.10 avg = 10.95
mobilenet_v2 min = 11.36 max = 11.68 avg = 11.47
mobilenet_v3 min = 10.04 max = 11.15 avg = 10.27
shufflenet min = 11.36 max = 14.80 avg = 11.84
shufflenet_v2 min = 6.93 max = 7.84 avg = 7.27
mnasnet min = 9.81 max = 11.61 avg = 10.04
proxylessnasnet min = 12.14 max = 12.44 avg = 12.24
efficientnet_b0 min = 19.68 max = 20.08 avg = 19.81
efficientnetv2_b0 min = 31.52 max = 31.93 avg = 31.76
regnety_400m min = 13.84 max = 14.03 avg = 13.91
blazeface min = 4.30 max = 4.73 avg = 4.58
googlenet min = 34.22 max = 37.59 avg = 34.84
googlenet_int8 min = 36.66 max = 37.02 avg = 36.86
resnet18 min = 22.71 max = 25.21 avg = 23.25
resnet18_int8 min = 33.30 max = 33.79 avg = 33.52
alexnet min = 35.89 max = 37.76 avg = 36.30
vgg16 min = 150.28 max = 153.17 avg = 151.37
vgg16_int8 min = 222.55 max = 229.33 avg = 225.95
resnet50 min = 59.84 max = 65.78 avg = 60.80
resnet50_int8 min = 66.24 max = 68.79 avg = 66.95
squeezenet_ssd min = 43.07 max = 43.42 avg = 43.24
squeezenet_ssd_int8 min = 47.50 max = 50.78 avg = 48.28
mobilenet_ssd min = 32.95 max = 35.40 avg = 33.45
mobilenet_ssd_int8 min = 30.31 max = 30.92 avg = 30.57
mobilenet_yolo min = 65.22 max = 68.17 avg = 66.00
mobilenetv2_yolov3 min = 38.73 max = 41.90 avg = 39.34
yolov4-tiny min = 53.76 max = 58.01 avg = 54.81
nanodet_m min = 16.85 max = 20.38 avg = 18.14
yolo-fastest-1.1 min = 9.51 max = 9.90 avg = 9.59
yolo-fastestv2 min = 9.42 max = 10.02 avg = 9.64
console:/data/benchmark #
Thanks!
Thank you!
I have added the data to the list.
The ROCK5B CPU runs faster than the Jetson Nano GPU for this workload.
Well, that is certainly impressive and is making us all jealous, as we also get to see an RK3588_s in the wild.
That is a Rockchip EVB board, isn’t it? What’s it clocked at?
We are running on an early version of the ROCK5 board.
A76 2.4GHz x 4
A55 1.8GHz x 4
I’m curious if it would be faster if only big cores are used.
./benchncnn 10 4 2 -1 1
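For readers unfamiliar with the positional arguments, here is a small sketch that labels them. The field order is taken from the parameter names benchncnn echoes back in the output posted in this thread; the `explain` helper itself is hypothetical and not part of ncnn. As far as I understand ncnn's powersave setting, 0 means all cores, 1 little cores only and 2 big cores only, which is why the command above uses 4 threads with powersave 2.

```python
import shlex

# Positional order, matching the names benchncnn prints back in its output.
FIELDS = ("loop_count", "num_threads", "powersave", "gpu_device", "cooling_down")

def explain(cmdline):
    """Map the numeric arguments of a benchncnn command line to their names."""
    args = shlex.split(cmdline)[1:]          # drop the program name
    return dict(zip(FIELDS, (int(a) for a in args)))

# Hypothetical helper applied to the command quoted above.
labelled = explain("./benchncnn 10 4 2 -1 1")
for name, value in labelled.items():
    print(f"{name} = {value}")
```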
Still didn’t have a heat sink; maybe faster DDR4? Or did they go all out with DDR5?
Also, does benchncnn run on the Mali, as the Vulkan drivers seemed to be in place? I still can’t see the GPU being faster than the NPU, as 6 TOPS is quite a lot, but it would be interesting, as I have never had my hands on one of the new G610s or anything above a G31.
Would it be possible to get a dump of /proc/cpuinfo? I’m curious what revision of the Cortex-A76 IP the RK3588 uses (so the variant and revision fields are most important).
Thanks!
console:/ # cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
processor : 1
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
processor : 2
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
processor : 3
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
processor : 4
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
processor : 5
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
processor : 6
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
processor : 7
BogoMIPS : 48.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
Serial : 0000000000000000
console:/data/benchmark # ./benchncnn 10 4 2 -1 1
loop_count = 10
num_threads = 4
powersave = 2
gpu_device = -1
cooling_down = 1
squeezenet min = 9.38 max = 9.66 avg = 9.50
squeezenet_int8 min = 12.04 max = 12.20 avg = 12.13
mobilenet min = 11.97 max = 12.12 avg = 12.05
mobilenet_int8 min = 6.74 max = 10.28 avg = 8.05
mobilenet_v2 min = 11.90 max = 11.99 avg = 11.94
mobilenet_v3 min = 10.33 max = 10.43 avg = 10.38
shufflenet min = 8.02 max = 8.11 avg = 8.06
shufflenet_v2 min = 4.11 max = 6.54 avg = 5.96
mnasnet min = 9.52 max = 9.63 avg = 9.58
proxylessnasnet min = 10.86 max = 10.95 avg = 10.90
efficientnet_b0 min = 16.42 max = 16.55 avg = 16.48
efficientnetv2_b0 min = 23.79 max = 23.90 avg = 23.82
regnety_400m min = 9.84 max = 14.25 avg = 12.46
blazeface min = 2.65 max = 2.76 avg = 2.68
googlenet min = 24.63 max = 35.44 avg = 33.17
googlenet_int8 min = 30.76 max = 44.20 avg = 41.17
resnet18 min = 20.59 max = 29.91 avg = 28.89
resnet18_int8 min = 47.93 max = 48.29 avg = 48.13
alexnet min = 34.20 max = 34.35 avg = 34.30
vgg16 min = 159.27 max = 212.36 avg = 199.48
vgg16_int8 min = 213.41 max = 335.72 avg = 287.41
resnet50 min = 66.80 max = 67.02 avg = 66.91
resnet50_int8 min = 78.17 max = 84.14 avg = 83.08
squeezenet_ssd min = 44.27 max = 44.51 avg = 44.37
squeezenet_ssd_int8 min = 41.50 max = 54.33 avg = 50.75
mobilenet_ssd min = 21.33 max = 30.37 avg = 28.20
mobilenet_ssd_int8 min = 18.67 max = 27.32 avg = 25.15
mobilenet_yolo min = 66.48 max = 66.76 avg = 66.62
mobilenetv2_yolov3 min = 27.37 max = 39.36 avg = 36.79
yolov4-tiny min = 42.01 max = 66.04 avg = 60.49
nanodet_m min = 10.97 max = 17.58 avg = 14.96
yolo-fastest-1.1 min = 8.93 max = 9.38 avg = 8.99
yolo-fastestv2 min = 9.26 max = 9.37 avg = 9.33
console:/data/benchmark #
I can’t find my tiny fan, so I ran it without a cooler this time.
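To compare the two runs without eyeballing the columns, a rough sketch like the one below parses the `avg` value from each benchncnn line and prints the ratio between the 8-thread run and the 4-thread big-core run. The rows are copied from the two dumps in this thread; `parse_bench` and `compare` are names made up for this example. Keep in mind the second run was done without a cooler, so the wide min/max spread on some models may be throttling rather than a real difference.

```python
import re

# One result line looks like: "squeezenet min = 10.75 max = 11.04 avg = 10.82"
ROW = re.compile(
    r"^\s*(\S+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+avg\s*=\s*([\d.]+)"
)

def parse_bench(text):
    """Return {model_name: avg_time} for every result line in the text."""
    results = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            results[m.group(1)] = float(m.group(4))
    return results

def compare(baseline, other):
    """Ratio > 1 means `other` was faster than `baseline` for that model."""
    return {name: baseline[name] / other[name] for name in baseline if name in other}

# A few rows from the 8-thread run (powersave = 0).
ALL_CORES = """\
squeezenet min = 10.75 max = 11.04 avg = 10.82
shufflenet_v2 min = 6.93 max = 7.84 avg = 7.27
blazeface min = 4.30 max = 4.73 avg = 4.58
resnet18_int8 min = 33.30 max = 33.79 avg = 33.52
vgg16 min = 150.28 max = 153.17 avg = 151.37
"""

# The same models from the 4-thread big-core run (powersave = 2).
BIG_ONLY = """\
squeezenet min = 9.38 max = 9.66 avg = 9.50
shufflenet_v2 min = 4.11 max = 6.54 avg = 5.96
blazeface min = 2.65 max = 2.76 avg = 2.68
resnet18_int8 min = 47.93 max = 48.29 avg = 48.13
vgg16 min = 159.27 max = 212.36 avg = 199.48
"""

ratios = compare(parse_bench(ALL_CORES), parse_bench(BIG_ONLY))
for name, ratio in ratios.items():
    print(f"{name:16s} {ratio:.2f}x")
```

On these rows the small models come out faster on the big cores alone, while the heavy ones (vgg16, resnet18_int8) are slower, so the answer to "is big-only faster" looks to depend on the model.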
Excellent! Rockchip has integrated the r4p0 revision of Arm’s Cortex-A76. This means we get support for the PSTATE Speculative Store Bypass Safe (SSBS) bit and the speculation barrier instructions that got introduced in the Armv8.5-A extension. (This was actually introduced in r3p0, but generally the newer the better.)
They also used the newest revision of the Cortex-A55 - r2p0.
Thanks for dumping this @anon39001862!
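For anyone wanting to read the revision off a dump themselves: the `CPU variant` field is the major revision (the rN part) and `CPU revision` is the minor one (the pM part), while `CPU part` identifies the core. A minimal sketch, assuming only the two Arm part numbers that appear in the dump above (0xd05 for Cortex-A55 and 0xd0b for Cortex-A76); the function names are made up for this example.

```python
# Part numbers for the Arm (implementer 0x41) cores seen in this dump.
ARM_PARTS = {0xD05: "Cortex-A55", 0xD0B: "Cortex-A76"}

def parse_cpuinfo(text):
    """Return one dict per 'processor' block of /proc/cpuinfo-style text."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({"processor": value})
        elif cpus:
            cpus[-1][key] = value
    return cpus

def describe(cpu):
    """Format a core as '<name> r<variant>p<revision>'."""
    part = int(cpu["CPU part"], 16)
    variant = int(cpu["CPU variant"], 16)   # major revision, printed as hex
    revision = int(cpu["CPU revision"])     # minor revision, printed as decimal
    name = ARM_PARTS.get(part, f"unknown part {part:#x}")
    return f"{name} r{variant}p{revision}"

# Two blocks trimmed from the dump above (one little core, one big core).
SAMPLE = """\
processor : 0
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
processor : 4
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part : 0xd0b
CPU revision : 0
"""

for cpu in parse_cpuinfo(SAMPLE):
    print(f"cpu{cpu['processor']}: {describe(cpu)}")
```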
Cool I will have to take your knowledge of revisions for that but great.
Guess with the TSMC vs Samsung, 7 vs 8 nm choice we ended up with Samsung, so it will likely clock a bit slower, but that is whoever Rockchip could get fab capacity with given the current situation, so no biggie.
Would be great if that heatsink/fan does ever turn up, just to see if there is any thermal throttling in the above, as that is always a good indication of headroom.
PS: a 12 V fan on 5 V, or a 5 V fan on 3.3 V, works great and doesn’t sound like a mini hairdryer.
I am wondering, though, if the MCUs have blobs that might lock the CPU frequency like the Amlogic A311D does. Rockchip does seem to be more open-source friendly than Amlogic, but it could even be an Arm blob.
PS on the board revisions: loving the Pico-ITX format, as like its big brother the PC, having the I/O on one plane is very easy and flexible.
Is the mini USB on the side PD power? That’s no problem, as an extension cable is easy and cheap, but the boot and reset switches would be more flexible as simple jumpers: for cases, racks and relays that just creates more and simpler options.
It also allows you to put a simple harness in the shop for that, but wow, it should make sourcing racks and cases and implementing them so much easier, and you could also use a remote relay.
If it’s all on one plane, or vertical via an extension, it’s just such a sweet layout.
How do we load binary files onto the M0 cores? Can we do this from the running system, or do we have to use the JTAG port? By the way, I’m curious whether the JTAG port for the Cortex-A/M cores has been brought out to external pins. JTAG/SWD would be very useful for developing on the M0 cores.
| Cool I will have to take your knowledge of revisions for that but great.
Thankfully you don’t have to take my word for it - Arm does a good job of documenting what changes for each different revision in the TRM (Technical Reference Manual) for each core IP.
So, in the case of the Cortex-A76 you can find the TRM for the latest revision here:
https://developer.arm.com/documentation/100798/0401/?lang=en
and in the “Product revisions” section, A1.7 in this case, you’ll find all the details.
Nah, I will just take your word on the revisions, as it means little to me.
I was just curious whether the MCU blobs are locked or not, purely wondering if community OC is possible.
Not a biggie: if performance is somewhere in the ballpark expected of an 8 nm A76, it’s still going to be pretty awesome.
About the only thing I don’t like about that board, and the quibble is tiny, is the onboard switches; jumpers would be my preference, as like a normal mobo that gives the option of wired switches.
Thanks for the info though.