Hey everyone,
I just wanted to share a quick PoC and ask for some feedback. I managed to get the official ggml Hexagon backend running on the Radxa Fogwise AIRbox Q900 (Dragonwing Linux).
Has anyone else here successfully experimented with llama.cpp (Hexagon backend) on this specific board yet?
As a stress test, I currently have six instances of Qwen3.5-2B-Q4 running simultaneously: one on the CPU, four on the HTP (the Hexagon NPU), and one on the Adreno GPU. It works surprisingly well, but I have run into one issue: I have no good way to monitor the NPUs.
Does anyone know a good tool, command, or method to properly measure the real-time NPU hardware utilization on Dragonwing Linux?
(Note: I plan to submit a PR to the main llama.cpp repo around mid-August, once my exams are over and I have had time to clean up my build scripts.)