Rock 5B can't use AES acceleration? How do I enable it?

Hello,

According to the datasheet (RK3588 Datasheet Rev 0.1, Copyright 2021 ©Rockchip Electronics Co., Ltd.):
“Secure System
- Embedded two cipher engine
- Support Link List Item (LLI) DMA transfer
- Support SHA-1, SHA-256/224, SHA-512/384, MD5, SM3 with hardware padding
- Support HMAC of SHA-1, SHA-256, SHA-512, MD5, SM3 with hardware padding
- Support AES-128, AES-192, AES-256 encrypt & decrypt cipher
- Support AES ECB/CBC/OFB/CFB/CTR/CTS/XTS/CCM/GCM/CBC-MAC/CMAC mode
- Support SM4 ECB/CBC/OFB/CFB/CTR/CTS/XTS/CCM/GCM/CBC-MAC/CMAC mode
- Support DES & TDES cipher, with ECB/CBC/OFB/CFB mode
- Support up to 4096 bits PKA mathematical operations for RSA/ECC/SM2
- Support generating random numbers”

But when I run cryptsetup benchmark (bear in mind this is a RAM-only test with no storage IO), I get this:

The top is my PC, the bottom the Rock 5. I would expect such poor performance only if there were no acceleration at all and the algorithm ran purely in software. Has anyone gotten it to work?

I’m not sure whether the terrible performance so far is just due to software implementations, or whether the datasheet is complete rubbish and simply lists everything the chip CAN do rather than actual CPU instructions (à la AVX, etc.).

Mine appears to work fine running Armbian, after just running “apt install cryptsetup”:

root@rock-5b:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1175533 iterations per second for 256-bit key
PBKDF2-sha256    2155346 iterations per second for 256-bit key
PBKDF2-sha512     923042 iterations per second for 256-bit key
PBKDF2-ripemd160  603323 iterations per second for 256-bit key
PBKDF2-whirlpool  293883 iterations per second for 256-bit key
argon2i       4 iterations, 901917 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 916515 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1024.0 MiB/s      1780.5 MiB/s
    serpent-cbc        128b               N/A               N/A
    twofish-cbc        128b       112.5 MiB/s       118.8 MiB/s
        aes-cbc        256b       834.9 MiB/s      1484.0 MiB/s
    serpent-cbc        256b               N/A               N/A
    twofish-cbc        256b       114.1 MiB/s       118.6 MiB/s
        aes-xts        256b      1475.6 MiB/s      1476.4 MiB/s
    serpent-xts        256b               N/A               N/A
    twofish-xts        256b       110.4 MiB/s       116.8 MiB/s
        aes-xts        512b      1253.8 MiB/s      1255.1 MiB/s
    serpent-xts        512b               N/A               N/A
    twofish-xts        512b       115.3 MiB/s       116.6 MiB/s

@tkaiser

Is this a reliable/valid test to compare RK3588 to RK3588S?
And how can it be optimized?

My results:

 CPU0-3  CPU4-5  CPU6-7     DDR     DSU     GPU     NPU
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200
   1800    2304    2352    2112    1800     200     200


/usr/sbin/cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1291349 iterations per second for 256-bit key
PBKDF2-sha256    2364320 iterations per second for 256-bit key
PBKDF2-sha512     958478 iterations per second for 256-bit key
PBKDF2-ripemd160  648069 iterations per second for 256-bit key
PBKDF2-whirlpool  294875 iterations per second for 256-bit key
argon2i       4 iterations, 992408 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
argon2id      4 iterations, 991492 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
#     Algorithm |       Key |      Encryption |      Decryption
        aes-cbc        128b      1054.9 MiB/s      1887.2 MiB/s
    serpent-cbc        128b               N/A               N/A
    twofish-cbc        128b       117.0 MiB/s       119.8 MiB/s
        aes-cbc        256b       859.8 MiB/s      1566.8 MiB/s
    serpent-cbc        256b               N/A               N/A
    twofish-cbc        256b       116.8 MiB/s       120.0 MiB/s
        aes-xts        256b      1551.3 MiB/s      1555.3 MiB/s
    serpent-xts        256b               N/A               N/A
    twofish-xts        256b       120.9 MiB/s       121.2 MiB/s
        aes-xts        512b      1308.5 MiB/s      1308.9 MiB/s
    serpent-xts        512b               N/A               N/A
    twofish-xts        512b       120.9 MiB/s       121.2 MiB/s
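To compare two saved runs like the ones above (e.g. one from an RK3588 and one from an RK3588S board) row by row, a small awk filter can pull out a single algorithm/key-size line. This is a minimal sketch assuming each `cryptsetup benchmark` output was first saved to a text file; the function name `bench_row` is hypothetical and not part of cryptsetup:

```shell
#!/bin/sh
# Hypothetical helper: print the encryption and decryption MiB/s values for
# one algorithm/key-size row of saved `cryptsetup benchmark` output.
# Usage: bench_row ALGORITHM KEYSIZE < saved-output.txt
bench_row() {
    # Rows look like "aes-xts  512b  1308.5 MiB/s  1308.9 MiB/s";
    # awk's default field splitting ignores the leading whitespace.
    # The "# Algorithm | Key | ..." header never matches because its $1 is "#".
    awk -v alg="$1" -v key="$2" '
        $1 == alg && $2 == key {
            # Unsupported ciphers show "N/A N/A" instead of numbers + units
            if ($3 == "N/A") print "N/A", "N/A"
            else print $3, $5
        }
    '
}
```

For example, with the run directly above saved to a file (name made up), `bench_row aes-xts 512b < rk3588s.txt` prints `1308.5 1308.9`, and `bench_row serpent-xts 512b` prints `N/A N/A`.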

Any reason why you call your Rock 5 starfive? In case this was running on a JH7110 board, please run my sbc-bench and report back. On RISC-V, the question of crypto acceleration (or its absence) and of (missing) software optimizations is an interesting one.

RK3588(s) is equipped with ARMv8 Crypto Extensions.
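A quick way to confirm that the kernel actually sees those extensions, and has registered AES drivers that use them, is to look at /proc/cpuinfo and /proc/crypto. A minimal sketch, assuming an arm64 Linux system; the helper names are hypothetical, and the exact driver names in /proc/crypto (e.g. `xts-aes-ce`) vary between kernel versions:

```shell
#!/bin/sh
# Sketch: check whether the ARMv8 Crypto Extensions are visible to Linux and
# whether CE-backed AES drivers are registered in the kernel crypto API.
# Assumption: arm64 Linux; on other CPUs both checks simply find nothing.

# has_cpu_feature FEATURE [CPUINFO]: succeed if the first "Features" line of
# /proc/cpuinfo (or the given file) lists FEATURE as a separate word.
has_cpu_feature() {
    grep '^Features' "${2:-/proc/cpuinfo}" 2>/dev/null | head -n 1 |
        tr ' \t' '\n\n' | grep -qx "$1"
}

# list_aes_ce_drivers [CRYPTO]: print registered crypto drivers whose name
# contains "aes-ce" (assumed naming; differs between kernel versions).
list_aes_ce_drivers() {
    grep -E '^driver[[:space:]]*:.*aes-ce' "${1:-/proc/crypto}" 2>/dev/null |
        sed 's/^driver[[:space:]]*:[[:space:]]*//'
}
```

On the board, `has_cpu_feature aes && echo yes` followed by `list_aes_ce_drivers` should print `yes` and a list of `*-aes-ce` drivers if the CE code paths are available, in which case the AES numbers above presumably already come from the CE instructions rather than a plain software implementation.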

Why would they differ here? AFAIK the only differences between RK3588 and RK3588S are that the latter is severely castrated wrt I/O, and that in the wild I've noticed RK3588S boards often being equipped with ‘lower quality’ silicon, resulting in lower real clockspeeds due to something called PVTM.


Just a question: the CPU clockspeeds reported by the script are not the real clockspeeds?

Nope, those are always just what sysfs reports. In the early Rock 5B days, Radxa shipped Rockchip's original implementation, which also adjusted the cpufreq OPPs on ‘weak’ silicon where the A76 cores can't reach 2.4 GHz. That’s why your board shows 2304 MHz and 2352 MHz for the two A76 clusters.

In the meantime Radxa decided to cheat a little, and now 2400 MHz is reported on each and every board even if the A76 cores run only at 2250 MHz or lower.

To measure, you need Willy’s mhz tool, or simply run sbc-bench. I recently implemented a new -r mode for reviews that collects all this info automagically, since, contrary to what we/I thought in the beginning, not only silicon quality matters but thermals too. So this mode measures clockspeeds, then runs cpuminer for 5 minutes to heat the board up, and then measures again.

for i in 1 5 7 ; do taskset -c $i ./mhz 3 100000 ; done
count=413212 us50=11381 us250=56911 diff=45530 cpu_MHz=1815.120
count=413212 us50=11381 us250=56913 diff=45532 cpu_MHz=1815.040
count=413212 us50=11380 us250=56911 diff=45531 cpu_MHz=1815.080
count=516515 us50=11229 us250=56150 diff=44921 cpu_MHz=2299.659
count=516515 us50=11231 us250=56154 diff=44923 cpu_MHz=2299.557
count=516515 us50=11231 us250=56152 diff=44921 cpu_MHz=2299.659
count=516515 us50=11153 us250=55775 diff=44622 cpu_MHz=2315.069
count=516515 us50=11155 us250=55776 diff=44621 cpu_MHz=2315.121
count=516515 us50=11155 us250=55777 diff=44622 cpu_MHz=2315.069
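Since each taskset call above prints three samples, averaging the cpu_MHz= values makes per-core comparisons easier to read. A minimal sketch; the helper name `avg_mhz` is hypothetical, and it only relies on the `cpu_MHz=` field visible in the output above:

```shell
#!/bin/sh
# Hypothetical helper: average all cpu_MHz=<value> fields read from stdin
# (the format printed by Willy's mhz tool in the runs above).
avg_mhz() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^cpu_MHz=/) {
                sum += substr($i, 9)   # skip the 8 characters of "cpu_MHz="
                n++
            }
    }
    END { if (n > 0) printf "%.1f\n", sum / n }'
}
```

Assuming mhz writes these lines to stdout as shown, it can be combined with the loop above: `for i in 1 5 7 ; do printf 'cpu%s: ' $i ; taskset -c $i ./mhz 3 100000 | avg_mhz ; done`. For the first three lines of output above this yields 1815.1.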

Sure: build and then:

for i in 1 5 7 ; do taskset -c $i /path/to/mhz 3 100000 ; done

If you think that’s weird, wait until I tell you the results are from an RK3588S.
But I believe thermals matter.

Yes, thermals matter. And with some RK3588(S) we even see PVTM downclocking the A76 cores to slightly above 400 MHz when they should be running fully loaded at the max cpufreq OPP: see ROCK 5B Debug Party Invitation

Yes, that’s it actually. I got confused about which SBC I was connected to; it is indeed the JH7110 board.
I reran the benchmark on the Rock 5 and the results are similar to what others reported. Not sure why it can’t do Serpent, though.

This maybe?

root@rock-5b:/home/tk# zgrep SERPENT /proc/config.gz 
# CONFIG_CRYPTO_SERPENT is not set
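To check this and related options in one go, here is a small sketch that reads the gzipped kernel config. Note that /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC; the helper name is hypothetical, and the option names are assumed to match recent mainline arm64 kernels (CONFIG_CRYPTO_AES_ARM64_CE_BLK covers the CE-based AES ECB/CBC/CTR/XTS modes):

```shell
#!/bin/sh
# Hypothetical helper: report how a kernel option is set in a gzipped
# kernel config (default: /proc/config.gz).
# Prints "built-in" (=y), "module" (=m) or "not set".
crypto_option_state() {
    # Match "OPTION=..." exactly; "# OPTION is not set" and a missing
    # option both fall through to "not set".
    line=$(gzip -dc "${2:-/proc/config.gz}" 2>/dev/null | grep -E "^$1=")
    case "$line" in
        "$1=y") echo "built-in" ;;
        "$1=m") echo "module" ;;
        *) echo "not set" ;;
    esac
}
```

For example: `for opt in CONFIG_CRYPTO_SERPENT CONFIG_CRYPTO_TWOFISH CONFIG_CRYPTO_AES_ARM64_CE_BLK ; do printf '%s: ' $opt ; crypto_option_state $opt ; done`. An option reported as "not set" (like Serpent here) would likely explain the N/A rows in cryptsetup benchmark.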