I would like to share my recent project: running Linux Q83 on the Orion O6 platform.
The main goal of this work was to test the compatibility, performance, and stability of Linux Q83 on Orion O6 hardware. I focused on system installation, configuration, and optimization for this environment.
Ollama relies on llama.cpp as its underlying inference engine.
Therefore, if you need GPU inference for LLMs, I recommend using llama.cpp directly with GPU acceleration: configure the build with -DGGML_VULKAN=ON, compile llama.cpp, and you get llama-cli, the C++ command-line tool built with the Vulkan backend enabled.
Use this llama-cli binary for GPU inference on the Orion O6.
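
As a rough sketch of the build and run steps described above, the small Python helper below only assembles the command lines and does not execute anything. It assumes the llama.cpp sources are checked out in a directory named llama.cpp, that the CMake build places the binary under build/bin, and that a GGUF model exists at models/model.gguf. All three paths are placeholders of mine, not something tested on the Orion O6. The -ngl value of 99 is likewise just an illustrative "offload as many layers as possible" setting.

```python
import shlex
from pathlib import PurePosixPath


def build_commands(source_dir, build_dir="build"):
    """Return the two cmake invocations for a Vulkan-enabled llama.cpp build."""
    src = PurePosixPath(source_dir)
    out = src / build_dir
    # Configure step: turn on the Vulkan backend, as recommended in the post.
    configure = ["cmake", "-S", str(src), "-B", str(out), "-DGGML_VULKAN=ON"]
    # Compile step: build everything in Release mode.
    build = ["cmake", "--build", str(out), "--config", "Release"]
    return [configure, build]


def inference_command(source_dir, model_path, prompt, gpu_layers=99, build_dir="build"):
    """Return a llama-cli invocation that offloads layers to the GPU."""
    # Assumed binary location inside the build tree (placeholder path).
    cli = PurePosixPath(source_dir) / build_dir / "bin" / "llama-cli"
    return [str(cli), "-m", str(model_path), "-ngl", str(gpu_layers), "-p", prompt]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and pasted into a shell.
    for cmd in build_commands("llama.cpp"):
        print(shlex.join(cmd))
    print(shlex.join(inference_command("llama.cpp", "models/model.gguf", "Hello")))
```

Printing the commands instead of running them keeps the sketch safe to try on any machine. On the board itself you would paste the printed lines into a shell and adjust the model path and the number of offloaded layers to fit the available GPU memory.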