Hi all,
The page linked below lists the generate speed of the DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm model as 15.36 tokens/s, but when I ran the model myself, that figure did not match what I measured.
https://docs.radxa.com/en/rock5/rock5t/app-development/rkllm_deepseek_r1
The page states that performance on RK3588 reaches 15.36 tokens/s:
Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second |
---|---|---|---|---|
Prefill | 122.70 | 29 | 4.23 | 236.35 |
Generate | 27539.16 | 423 | 65.10 | 15.36 |
https://docs.radxa.com/en/assets/images/rkllm_ds_3-492e4c4fcea8ff85cea215ebdeec236c.webp
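For reference, the last two columns of that table appear to be derived from the first two. The sketch below assumes "Time per Token" is total time divided by token count and "Tokens per Second" is its inverse; the function name `throughput` is only an illustrative helper, not part of rkllm.

```python
def throughput(total_ms, tokens):
    """Derive per-token latency and tokens/s from a stage's total time and token count."""
    ms_per_token = total_ms / tokens
    tokens_per_second = tokens / (total_ms / 1000.0)
    return ms_per_token, tokens_per_second


# Generate row from the Radxa table: 27539.16 ms for 423 tokens.
ms_per_token, tps = throughput(27539.16, 423)
print(f"{ms_per_token:.2f} ms/token, {tps:.2f} tokens/s")
```

Under that assumption the Generate row works out to 65.10 ms per token and 15.36 tokens/s, so the published figure is the average over that single 423-token run.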
When I measured it myself, I got the numbers below.
The generate speed ranged from about 12 tokens/s at the low end to about 15 tokens/s at the high end.
Is the 15.36 tokens/s that Radxa reported a peak value, or is it an average?
If it is an average, why is the average lower in my tests?
Or is everyone else actually getting 15.36 tokens/s?
Let me know your thoughts. My rkllm logs follow.
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1929.53 / / /
I rkllm: Prefill 335.94 30 11.20 89.30
I rkllm: Generate 3014.55 44 68.51 14.60
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2270.14 / / /
I rkllm: Prefill 367.06 49 7.49 133.49
I rkllm: Generate 22758.78 319 71.34 14.02
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2270.14 / / /
I rkllm: Prefill 0.00 0 0.00 0.00
I rkllm: Generate 19324.20 268 72.11 13.87
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1889.13 / / /
I rkllm: Prefill 290.03 49 5.92 168.95
I rkllm: Generate 21828.04 319 68.43 14.61
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1941.29 / / /
I rkllm: Prefill 533.74 60 8.90 112.41
I rkllm: Generate 22971.40 332 69.19 14.45
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1998.16 / / /
I rkllm: Prefill 299.41 60 4.99 200.39
I rkllm: Generate 22896.06 332 68.96 14.50
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2143.00 / / /
I rkllm: Prefill 271.09 41 6.61 151.24
I rkllm: Generate 164455.45 2047 80.34 12.45
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2143.00 / / /
I rkllm: Prefill 192.91 33 5.85 171.07
I rkllm: Generate 164809.91 2047 80.51 12.42
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 1968.99 / / /
I rkllm: Prefill 205.80 31 6.64 150.63
I rkllm: Generate 31269.30 452 69.18 14.46
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2049.23 / / /
I rkllm: Prefill 226.38 33 6.86 145.78
I rkllm: Generate 32675.22 472 69.23 14.45
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2049.23 / / /
I rkllm: Prefill 0.00 0 0.00 0.00
I rkllm: Generate 34048.97 489 69.63 14.36
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2049.23 / / /
I rkllm: Prefill 413.06 34 12.15 82.31
I rkllm: Generate 36204.07 519 69.76 14.34
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 399.41 34 11.75 85.13
I rkllm: Generate 40692.71 581 70.04 14.28
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 144.45 8 18.06 55.38
I rkllm: Generate 3626.40 55 65.93 15.17
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 149.54 5 29.91 33.44
I rkllm: Generate 1952.60 29 67.33 14.85
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 170.69 10 17.07 58.59
I rkllm: Generate 6961.55 103 67.59 14.80
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 238.04 8 29.75 33.61
I rkllm: Generate 1894.84 26 72.88 13.72
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 191.33 6 31.89 31.36
I rkllm: Generate 2450.64 37 66.23 15.10
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 148.70 7 21.24 47.07
I rkllm: Generate 3972.95 59 67.34 14.85
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 194.88 4 48.72 20.53
I rkllm: Generate 2073.36 31 66.88 14.95
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 250.71 11 22.79 43.88
I rkllm: Generate 17271.81 253 68.27 14.65
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Init 2142.52 / / /
I rkllm: Prefill 236.68 6 39.45 25.35
I rkllm: Generate 2615.21 37 70.68 14.15
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Memory Usage (GB)
I rkllm: 1.71
I rkllm: --------------------------------------------------------------------------------------
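To compare runs like these against a single published number, it may help to summarize them. The following is a minimal sketch that assumes the log lines keep the `Generate <total ms> <tokens> <ms per token> <tokens per second>` layout shown above; `summarize_generate` and the regular expression are hypothetical helpers written for this post, not part of rkllm.

```python
import re
from statistics import mean

# Matches lines such as "I rkllm: Generate 3014.55 44 68.51 14.60".
GENERATE_RE = re.compile(
    r"Generate\s+(?P<total_ms>\d+(?:\.\d+)?)\s+(?P<tokens>\d+)\s+"
    r"(?P<ms_per_token>\d+(?:\.\d+)?)\s+(?P<tps>\d+(?:\.\d+)?)"
)


def summarize_generate(log_text):
    """Collect every Generate row and report min, max, and average tokens/s."""
    runs = [
        (float(m["total_ms"]), int(m["tokens"]), float(m["tps"]))
        for m in GENERATE_RE.finditer(log_text)
    ]
    if not runs:
        raise ValueError("no Generate lines found")
    total_ms = sum(r[0] for r in runs)
    total_tokens = sum(r[1] for r in runs)
    return {
        "runs": len(runs),
        "min_tps": min(r[2] for r in runs),
        "max_tps": max(r[2] for r in runs),
        # Plain mean: every run counts equally, regardless of length.
        "mean_tps": mean(r[2] for r in runs),
        # Token-weighted: all tokens divided by all generate time.
        "weighted_tps": total_tokens / (total_ms / 1000.0),
    }


# Three Generate rows copied from the logs above.
sample = """\
I rkllm: Generate 3014.55 44 68.51 14.60
I rkllm: Generate 164455.45 2047 80.34 12.45
I rkllm: Generate 3626.40 55 65.93 15.17
"""
summary = summarize_generate(sample)
print(summary)
```

The sketch reports both a plain mean and a token-weighted figure because the two can differ when run lengths vary widely. In the logs above, the two 2047-token runs are the slowest at 12.45 and 12.42 tokens/s, while the shorter runs sit between roughly 13.7 and 15.2 tokens/s.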