Hi all,
I bought the Orion O6 primarily with the intent of running LLMs on it (I needed something with relatively low power consumption, as I run exclusively on solar).
I figure some benchmarks for llama.cpp might interest others. Don't expect great speeds: token generation is constrained by RAM bandwidth, and prompt ingestion is currently CPU-only (though I would love to see how it performs using the NPU).
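As a back-of-envelope sanity check on the bandwidth claim, here's a sketch using the dense Qwen3-32B numbers from the tables below. It assumes each generated token streams the full weight set from RAM exactly once, which is a simplification (it ignores the KV cache and activations):

```python
# Rough sanity check: if token generation is RAM-bandwidth-bound, then for a
# dense model each generated token reads roughly the whole weight set once, so
#   effective memory traffic ~= model size * tokens/second.
# Numbers taken from the Qwen3-32B Q4_K_M tg128 result below.

model_size_gib = 18.40    # Qwen3-32B Q4_K_M weights
tg_tokens_per_s = 1.95    # measured tg128

effective_bw = model_size_gib * tg_tokens_per_s
print(f"implied memory traffic: ~{effective_bw:.1f} GiB/s")
```

That ~36 GiB/s figure is well below the theoretical LPDDR5 peak, which is typical for CPU inference; it also shows why the MoE models (which only touch their active experts per token) generate so much faster than the dense 32B.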
A preliminary note: I built llama.cpp using the following:
```shell
# NOTE: GGML_CPU_ARM_ARCH is required here - without it, llama.cpp may not
# detect NEON/SIMD support on the Cix P1 (this was the case for the Radxa Debian image).
cmake -B build -DGGML_CPU_ARM_ARCH=armv9-a+sve2+dotprod+i8mm+fp16+fp16fml+crypto+sha2+sha3+sm4+rcpc+lse+crc+aes+memtag+sb+ssbs+predres+pauth -DCMAKE_C_COMPILER=gcc-12 -DCMAKE_CXX_COMPILER=g++-12 -DGGML_NATIVE=off

# Build it (use 6 threads)
cmake --build build --config Release -j 6
```
The reason for this is that llama.cpp's build script has some conditionals when determining the CPU feature set, and these weren't being detected correctly on Radxa's Debian image - so it was being built without the ARM NEON kernels, which significantly hurts prompt processing. As for why that feature list is so long: I pretty much threw the kitchen sink at it in case other optimizations were possible (most are likely unused).
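To double-check whether the forced flags match what the SoC actually supports, you can compare against the features the kernel advertises. A quick sketch, assuming a Linux aarch64 `/proc/cpuinfo` with a `Features` line (note the kernel's hwcap names differ slightly from the `-march` extension names, e.g. `dotprod` shows up as `asimddp`):

```shell
# Print which of the extensions requested via -DGGML_CPU_ARM_ARCH the kernel
# actually reports for this CPU. On a non-ARM box the Features line is absent
# and everything reports "missing".
features=$(grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2)
for ext in sve2 asimddp i8mm fphp sha3 sm4; do
  case " $features " in
    *" $ext "*) echo "$ext: present" ;;
    *)          echo "$ext: missing" ;;
  esac
done
```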
Some other quick notes:
- I did manage a successful build with KleidiAI support, but prompt ingestion performance did not appear to improve, and token generation was actually significantly slower (tg on Qwen3-30B-A3B Q4_K_M dropped to around 12 t/s).
- Building with Vulkan technically succeeded, but it freezes (or is ridiculously slow) while trying to ingest/generate tokens.
- 7 threads appears to be the sweet spot. With `-t -1` (auto) it was ridiculously slow (perhaps trying to schedule onto small cores that aren't available?).
- My Orion O6 is in the Radxa AI Kit case. I haven't added a copper plate to make better contact with the heatsink yet, so it runs a little hot (CPU B1 @ ~65C). Power consumption (according to my USB-C charger) is around 20 W when running (20V, 1A).
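To find that sweet spot on your own unit, llama-bench can sweep thread counts in a single invocation, since `-t` accepts a comma-separated list of values (the model path below is just the one from these runs - substitute your own):

```shell
# Benchmark several thread counts in one go; llama-bench emits one table
# row per value, so the sweet spot is easy to read off.
MODEL=../../../models/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
./llama-bench -m "$MODEL" -t 4,6,7,8,12 -r 1
```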
Benchmarks
```shell
./llama-bench -t 7 -m ../../../models/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
```
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium | 17.35 GiB | 30.53 B | CPU | 7 | pp512 | 23.31 ± 0.07 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.35 GiB | 30.53 B | CPU | 7 | tg128 | 16.13 ± 0.08 |
build: cb9178f8 (5857)
```shell
# NOTE: Too slow, so only did one repeat.
./llama-bench -t 7 -m ../../../models/Qwen_Qwen3-32B-Q4_K_M.gguf -r 1
```
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen3 32B Q4_K - Medium | 18.40 GiB | 32.76 B | CPU | 7 | pp512 | 3.90 ± 0.00 |
| qwen3 32B Q4_K - Medium | 18.40 GiB | 32.76 B | CPU | 7 | tg128 | 1.95 ± 0.00 |
build: cb9178f8 (5857)
```shell
# NOTE: Too slow, so only did one repeat.
./llama-bench -t 7 -m ../../../models/Hunyuan-A13B-Instruct-Q4_K_M.gguf -r 1
```
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| hunyuan-moe A13B Q4_K - Medium | 45.43 GiB | 80.39 B | CPU | 7 | pp512 | 6.65 ± 0.00 |
| hunyuan-moe A13B Q4_K - Medium | 45.43 GiB | 80.39 B | CPU | 7 | tg128 | 3.71 ± 0.00 |
build: cb9178f8 (5857)
For anyone interested, here's the same Qwen3-30B-A3B model on the build without NEON/SIMD.
```shell
# NOTE: Built WITHOUT NEON/SIMD
./llama-bench -t 7 -m ../../../models/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
```
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q4_K - Medium | 17.35 GiB | 30.53 B | CPU | 7 | pp512 | 14.85 ± 0.03 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.35 GiB | 30.53 B | CPU | 7 | tg128 | 12.41 ± 0.01 |
build: cb9178f8 (5857)