There’s a thread here showing an example YOLO (object detection) C++ implementation, but the Python SDK/APIs are far from optimal right now (to the point where it’s questionable whether they’ll be of much use).
For Ollama, etc., I wouldn’t get your hopes up: it might be a very long time until Llama.cpp supports the Cix NPU (if ever). Would love to see it though.
In the shorter term, if the Python API gets fixed up, we might see an example of the Cix NPU used with the Python Transformers library for LLMs. I’d really like to see this because it’d give some indication as to whether the Cix NPU is actually as capable as it sounds on paper (and it’s probably a lower-effort endeavour than getting Llama.cpp support).
Cix’s NPU SDK is, unfortunately, still “early access” - you have to explicitly sign up on their website to get granted access to it.
As an aside, does anyone know why using Vulkan with Llama.cpp is currently so slow? Has a particular bottleneck been identified?