Hi,
We have been trying to run models on the NPU with llama.cpp, Ollama, and ONNX Runtime.
With llama.cpp and Ollama, inference runs only on the CPU; there seems to be no option for targeting the NPU with these frameworks.
For ONNX Runtime, there is a "ZhouyiExecutionProvider" for loading and running models on the NPU, but it complains about missing libraries and ultimately does not work.
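For reference, this is roughly how we tried to select the provider (a minimal sketch; `model.onnx` is a placeholder path, and `ZhouyiExecutionProvider` is the provider name as it appears on our system):

```python
import onnxruntime as ort

# List the execution providers this onnxruntime build actually exposes.
print(ort.get_available_providers())

# Try the NPU provider first, falling back to CPU if it is unavailable.
# "model.onnx" is a placeholder path for any exported model.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["ZhouyiExecutionProvider", "CPUExecutionProvider"],
)

# Shows which provider was actually selected for the session.
print(sess.get_providers())
```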
As far as we can tell, the NPU can only be used with NPU-optimized models from the CIX AI Model Hub, driven by custom Python scripts such as inference_npu.py.
Do these observations seem right?
Thanks