Vulkan makes a huge positive impact on LLM inference with llama.cpp, especially with multimodal models.
I’m thinking about getting a Radxa Orion O6, but I could not find anything about the state of Vulkan support.
I did a llama.cpp build enabling Vulkan, but it was super slow, even when I offloaded only one layer to it. I'm clearly ignorant of the technologies around GPUs, so I don't understand what Vulkan exactly is (a driver, a library, etc.), how it compares or relates to CUDA, OpenCL, etc., or even whether it really used the GPU or fell back to emulation on the CPU. All of this is totally obscure to me; there are probably too many layers of abstraction and cryptic names for me :-/
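For reference, this is roughly what I did — a sketch, not exact commands (the model path is just a placeholder, and I'm assuming the current `GGML_VULKAN` CMake flag; older llama.cpp trees used `LLAMA_VULKAN` instead):

```shell
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Check which Vulkan driver is active — if this lists a software
# rasterizer like llvmpipe, "Vulkan" would actually run on the CPU
vulkaninfo --summary

# Offload one layer to the GPU (-ngl 1); llama.cpp logs the
# detected Vulkan device at startup, so the log shows what it used
./build/bin/llama-cli -m model.gguf -ngl 1 -p "test"
```

Maybe someone can tell me from this whether I built it wrong, or whether the O6's driver situation is the problem.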