Hi all, while running llama-bench on larger models (e.g. 32B+ Q4_K_M), I kept getting hard crashes on my Orion O6 (the fan stayed at full pelt and the board became inaccessible, requiring a hard reset).
Turns out this was just my system config that needed tweaking to make sure I wasn’t overcommitting to memory that I didn’t have available.
With the following tweaks, larger models can be benched without issue (they’re obviously slow though).
sudo sh -c 'echo 2 > /proc/sys/vm/overcommit_memory'
sudo sh -c 'echo 80 > /proc/sys/vm/overcommit_ratio'
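For anyone wanting to sanity-check what those two values allow: in mode 2 the kernel caps committed memory at roughly swap plus RAM times overcommit_ratio / 100. Below is a rough sketch of that arithmetic. The helper name `commit_limit_kb` is my own, and it ignores huge pages, so treat the result as an estimate only.

```shell
# Hypothetical helper (not part of any tool): estimate the commit limit in kB
# when vm.overcommit_memory=2.
# Approximation: CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# (huge pages are ignored in this sketch).
commit_limit_kb() {
    mem_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + mem_kb * ratio / 100 ))
}

# Read the live totals from /proc/meminfo; fall back to 0 if they can't be read.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
echo "Estimated limit at ratio 80: $(commit_limit_kb "${mem_kb:-0}" "${swap_kb:-0}" 80) kB"
```

The kernel's own figure is the CommitLimit line in /proc/meminfo, so the two can be compared directly. If the model plus context needs more than that, allocation should fail cleanly rather than the board locking up.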
I got my O6 working again (BIOS issue, corrupted the SPI Flash) and gave the Cix GO drivers a shot on the default Debian Radxa image. Roughly:
###
# Update system packages
###
apt update
apt upgrade
###
# CIX GO drivers: https://developer.cixtech.com/
###
# Uninstall existing packages
./uninstall.sh
# Install new Cix GO packages
./install.sh
###
# LLAMA
###
# Clone Llama
git clone https://github.com/ggml-org/llama.cpp.git
# Make change to disable grouping feature
# NOTE: I had to make a few other changes in the Vulkan file to disable `VK_EXT_layer_settings`
# These aren't supported by the vulkan-dev package that's available in the Debian repo. I've read that this shouldn't impact performance, but maybe it's the reason why it's slow for me?
# Build Llama.
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON -DLLAMA_CURL=OFF
cmake --build build --config Release -j 8
# Run benchmark
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-bench -m ../llm/qwen2.5-3b-instruct-q4_0.gguf -pg 128,128 -t 8 -ngl 1000
Regarding vulkaninfo, I note you executed the command:
apt upgrade
This may have upgraded some Vulkan-related libs on your system. As llama.cpp works well with the Mali Vulkan driver, you might try pointing the Vulkan ICD at Mali when running vulkaninfo:
Thanks for the details, appreciate it! And sorry for late reply - have been sick the past week.
I did have another play around with all this yesterday but, unfortunately, am still stuck at very low PP (prompt processing) speed (~10 t/s for 3B Q4). I’m not too sure why yet, but I did try a few other things:
Built the Vulkan SDK to get the latest glslc, which correctly auto-detects the Vulkan extensions above (GGML_VULKAN_INTEGER_DOT_GLSLC_SUPPORT, etc.). For context, with the latest Git versions of llama.cpp, I had a bit of trouble building, and there are many #if defined … statements for the extensions in the ggml-vulkan.cpp file.
Made sure to export VK_ICD_FILENAMES=/etc/vulkan/icd.d/mali.json - which DOES fix the vulkaninfo issue.
Probably a lot of other things too - but I can’t recall them all sorry.
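In case it helps anyone repeating the ICD step, here is a small sketch that only sets the variable when the manifest actually exists. The function name `use_vulkan_icd` is made up for illustration. The path is the one from my setup and may differ on other images.

```shell
# Hypothetical helper: point the Vulkan loader at a single ICD manifest,
# but only if that manifest file is actually present.
use_vulkan_icd() {
    icd=$1
    if [ -f "$icd" ]; then
        export VK_ICD_FILENAMES="$icd"
        echo "Using Vulkan ICD: $icd"
    else
        echo "ICD manifest not found: $icd" >&2
        return 1
    fi
}

# Path taken from my system; adjust if your image installs it elsewhere.
use_vulkan_icd /etc/vulkan/icd.d/mali.json || echo "Leaving the loader's default ICD search in place"
```

Running vulkaninfo --summary in the same shell afterwards should show which device the loader picked up.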
When I get some time, I might try again from scratch with the Default Radxa Image… it’s possible, with all my tinkering, I’ve messed something up.
I do think the Cix GO driver itself is working though - using the monitoring tool mentioned in their docs, I was able to see the Mali GPU Cores all go to 100% while Llama was running, so I suspect the issue is probably somewhere in the Llama.cpp version I’m using.