MLC recently posted about running large language models (LLMs, e.g. ChatGPT) on the Mali G610 GPU using OpenCL: https://blog.mlc.ai/2023/08/09/GPU-Accelerated-LLM-on-Orange-Pi
I put together a couple of prebuilt Docker images with their demo if anyone is interested in playing with it: https://milas.dev/blog/mali-g610-rk3588-mlc-llm-docker/
Make sure you have the Mali firmware in /lib/firmware/ on your host (i.e. whatever OS you’re running natively on the Rock 5B) - the kernel can’t load the firmware from inside the Docker container, unfortunately.
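If you’re not sure it’s there, a quick sanity check on the host looks something like this (assuming the usual filename for the G610’s CSF firmware, mali_csffw.bin - check your vendor image if yours differs):

ls -l /lib/firmware/mali_csffw.bin   # firmware blob present on the host?
dmesg | grep -i mali                 # kernel messages mentioning the GPU/driver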
docker run --rm -it --privileged docker.io/milas/mlc-llm:redpajama-3b
The image is ~4.5GB once downloaded and uncompressed!
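If you’d rather not run the container with --privileged, passing just the GPU device node through might be enough - untested, and it assumes the blob driver exposes the GPU as /dev/mali0:

docker run --rm -it --device /dev/mali0 docker.io/milas/mlc-llm:redpajama-3b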
I also published a variant with the Llama-2-7b-chat-hf-q4f16_1 model (tag is llama2-7b). Be warned you’ll probably need 16GB of RAM for that.
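Running it is the same command, just with the other tag:

docker run --rm -it --privileged docker.io/milas/mlc-llm:llama2-7b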
Also, I’ve managed to hard-lock the kernel a few times when loading a model. I’m using an rkr3.4-based kernel; you might have better or worse luck on others.
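If you want to compare notes, this is a quick way to see which kernel you’re running:

uname -r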
These demos are more technically interesting than practically useful on their own, but I hope someone else finds them helpful - it was rather tricky getting everything compiling and happy in a container.
disclaimer: I work for Docker Inc, but this is all my personal work, opinions are my own, etc.