How to measure NPU usage, and feedback on a llama.cpp build with Hexagon support

Hey everyone,

I just wanted to share a quick PoC and ask for some feedback. I managed to get the official ggml Hexagon backend running on the Radxa Fogwise AIRbox Q900 (Dragonwing Linux):

https://youtu.be/errdvVlbU5o

Has anyone else here successfully experimented with llama.cpp (Hexagon backend) on this specific board yet?

As a stress test, I currently have 6 instances (Qwen3.5-2B-Q4) running simultaneously (1x CPU, 4x HTP, 1x Adreno). It works surprisingly well, but I am running into a small issue: I would love to have better monitoring for the NPUs.

Does anyone know a good tool, command, or method to properly measure the real-time NPU hardware utilization on Dragonwing Linux?

(Note: I am planning to submit a PR to the main llama.cpp repo around mid-August once I have the time to clean up my build scripts after my exams).

I have also gotten llama.cpp with the Hexagon backend up and running on the Q900, under Radxa's OS. Llama 3.2 1B Q8 runs pretty quickly compared to Qwen 3.6 35B on the CPU. I also wanted some kind of utilization metric for the NPU. The best I could find was the NPU temperatures, available in /sys/class/thermal/nsp*. Maybe Qualcomm has something better internally, who knows.
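In case it is useful, here is a rough sketch of how those temperatures could be polled from a script. It assumes the standard Linux thermal sysfs layout, where each `thermal_zone*` directory has a `type` file and a `temp` file in millidegrees Celsius, and that the NPU zones have a type starting with `nsp`. Both are assumptions that I have not verified on the Q900, so adjust the prefix to whatever your board reports. The function name `read_npu_temps` is just made up for this example. Note that temperature is only an indirect proxy for load, not a real utilization figure.

```python
from pathlib import Path


def read_npu_temps(base="/sys/class/thermal", prefix="nsp"):
    """Return {zone_type: temperature in Celsius} for matching thermal zones.

    Assumption: zones follow the usual thermal_zone*/type and
    thermal_zone*/temp layout, with temp given in millidegrees Celsius.
    """
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            if not zone_type.startswith(prefix):
                continue  # not an NPU zone under the assumed naming
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone unreadable or reporting garbage, skip it
        temps[zone_type] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    # Prints one snapshot; returns an empty dict if nothing matches.
    print(read_npu_temps())
```

Running it under `watch -n 1` while the llama.cpp instances are busy gives a crude live view of how hot each NPU zone is getting.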

This might be helpful, perhaps?
