I'd like to be able to run some 20B to 32B LLMs on the Orion using a GPU. Are there any GPUs that will work with Ollama on the Orion under Debian?
What GPU can I use with the Orion to run LLMs?
That’s what I’ve been doing as well, and it works pretty well (use only the big and medium cores, otherwise the little cores will slow everything down), though around 32B it gets a bit slow. I tried to use the iGPU with llama.cpp built with -DGGML_VULKAN=ON, but it’s super slow (~1 tok/s) even if I offload only a single layer. So it looks like the communication with the GPU is not efficient enough, or maybe the Vulkan support in llama.cpp isn’t great, or maybe some data needs to be converted between layers, which takes time.
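For reference, a rough sketch of the setup described above. The CMake flag is llama.cpp’s documented Vulkan option; the core IDs in the `taskset` mask are an assumption (they vary by SoC), so check `lscpu` to see which cores are the big/medium ones on your board:

```shell
# Build llama.cpp with Vulkan support (which the post above found very slow on this iGPU)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# CPU-only run pinned to the big/medium cores so the little cores don't
# slow the whole batch down. "4-11" is an example mask, not the Orion's
# actual layout -- verify with lscpu or /proc/cpuinfo first.
# -ngl 0 keeps every layer on the CPU; raising it offloads layers to the GPU.
taskset -c 4-11 ./build/bin/llama-cli -m model.gguf -t 8 -ngl 0 -p "Hello"
```

Matching `-t` (thread count) to the number of pinned cores avoids oversubscribing the big cores with threads that would otherwise migrate onto the little ones.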
I have not tried other GPUs for LLMs, so I don’t know what to expect from them, but at 20-32B you’d need a huge one with lots of VRAM, and I’m not convinced there’s much point in running that on this board instead of a regular PC, given the power requirements and cost.
Are you using the 64GB board? I wish I’d ordered that model, as I have the 32GB one.