How is the Orion considered an AI PC if Ollama can't use a GPU/NPU?

Either I'm doing something wrong, or all those TOPS are useless to Ollama,

because it clearly says it's running on the CPU.

No, you're not doing anything wrong. As far as I'm aware, CIX has not released anything yet that allows making use of the NPU. So for now it's just a PC with a pretty decent CPU that delivers PC-like AI performance on the CPU alone, simply because it has a 128-bit memory bus running at 6000 MT/s. That's not exceptional by today's PC standards, and the board can't use its full potential due to architectural limitations between the CPU cores and the memory controller, which the NPU could hopefully help work around.

In practice it delivers exactly the same memory bandwidth for me as an AMD 5800X. That's good, but not exceptional considering we've moved two generations forward since then, and that it's trivial to add more RAM to a PC. The PC also remains significantly faster thanks to a beefier CPU: with Llama-3.2-1B-Instruct-Q4_0.gguf I'm getting 362 t/s pp512 and 51.74 t/s tg128 on the AMD, versus 221 t/s pp512 and 39.76 t/s tg128 on the O6. And that's with my O6's RAM controller overclocked by around 12%.
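For rough context, here's a back-of-the-envelope sketch of the theoretical peaks (my own arithmetic, assuming a 128-bit bus on both machines, i.e. dual-channel DDR4-3200 on the 5800X):

```python
# Theoretical peak DRAM bandwidth: bus width in bytes x transfer rate.
# Real-world numbers are always lower; the point is only the relative gap.
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers: int) -> float:
    return bus_width_bits / 8 * mega_transfers / 1000

print(peak_bandwidth_gbs(128, 6000))  # O6: 128-bit @ 6000 MT/s       -> 96.0 GB/s
print(peak_bandwidth_gbs(128, 3200))  # 5800X: dual-channel DDR4-3200 -> 51.2 GB/s
```

If the O6 only matches the 5800X in measured bandwidth despite a nearly 2x higher theoretical peak, that's consistent with the CPU-to-memory-controller bottleneck mentioned above.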

So for now a second-hand PC remains a better choice, but if form factor matters, the O6 is probably the only option left.


There's a thread here showing an example YOLO (object detection) C++ implementation, but the Python SDK/APIs are far from optimal right now (to the point where it's questionable whether they'll really be of much use).

For Ollama etc., I wouldn't get your hopes up: it might be a very long time until Llama.cpp can support the Cix NPU (if ever). I'd love to see it, though.

Shorter term, if the Python API gets fixed up, we might see an example of the Cix NPU used with the Python Transformers library for LLMs. I'd really like to see this because it'd give some indication of whether the Cix NPU is actually as capable as it sounds on paper (and it's probably a lower-effort endeavour than the above).
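For what it's worth, the CPU-only baseline with Transformers already looks like this today. Purely illustrative: there's no public Cix NPU backend, and the model name here is just an example (it's gated on the Hub; any small causal LM would do). The `device` argument is where an NPU backend would presumably hook in:

```python
# CPU-only Transformers run -- illustrative baseline, no NPU involved.
# A Cix NPU integration would presumably show up as a different device
# or backend here; nothing like that exists publicly yet.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # same model family as the llama-bench numbers above
    device=-1,  # -1 = CPU in Transformers
)

print(generate("Hello from the Orion O6!", max_new_tokens=32)[0]["generated_text"])
```

Timing a loop around that call against the same run with a working NPU backend would answer the "capable as it sounds on paper" question pretty directly.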

Cix's NPU SDK is, unfortunately, still "early access": you have to explicitly sign up on their website to be granted access to it.

As an aside, does anyone know why using Vulkan with Llama.cpp is currently so slow? Has a particular bottleneck been identified?

So basically their web page is all hype, we fell for it, and nothing is quite as it seems.