Mali / OpenCL Accelerated LLM Inference

MLC recently posted about running large language models (LLMs, the kind of model behind ChatGPT) on the Mali G610 GPU using OpenCL: https://blog.mlc.ai/2023/08/09/GPU-Accelerated-LLM-on-Orange-Pi

I put together a couple of prebuilt Docker images with their demo, if anyone is interested in playing with it: https://milas.dev/blog/mali-g610-rk3588-mlc-llm-docker/

Make sure you have the Mali firmware in /lib/firmware/ on your host (i.e. whatever OS you’re running natively on the Rock 5B). Unfortunately, the firmware can’t be loaded from inside the Docker container.
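
If you want to check the host before pulling the image, here is a minimal sketch. It assumes the firmware blob matches `mali*.bin` (on many RK3588 images it is named `mali_csf.bin`, but check your distro), and `has_mali_firmware` is just a hypothetical helper name, not something from the MLC demo.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Mali firmware blob exists in the given directory.
# ASSUMPTION: the blob matches mali*.bin; adjust the pattern for your distro.
has_mali_firmware() {
    dir="${1:-/lib/firmware}"
    for f in "$dir"/mali*.bin; do
        # If nothing matches, the glob stays literal and -f fails.
        [ -f "$f" ] && return 0
    done
    return 1
}

if has_mali_firmware /lib/firmware; then
    echo "Mali firmware found in /lib/firmware"
else
    echo "No Mali firmware found in /lib/firmware"
fi
```

Run it on the host, not inside the container, since the host kernel is what loads the firmware.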

docker run --rm -it --privileged docker.io/milas/mlc-llm:redpajama-3b

:warning: The image is ~4.5GB once downloaded and uncompressed!

I also published a variant with the Llama-2-7b-chat-hf-q4f16_1 model (tag is llama2-7b). Be warned you’ll probably need 16GB of RAM for that.
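
To see how much RAM your board actually reports before trying the larger model, something like the following should work. It is only a sketch: `mem_total_gib` is a hypothetical helper, and it simply reads the `MemTotal` line from `/proc/meminfo`. Note that a 16GB board will usually report slightly less than 16 GiB, because some memory is reserved.

```shell
#!/bin/sh
# Hypothetical helper: print total RAM in GiB (one decimal) from a meminfo-style file.
# MemTotal is reported in kB (really KiB), so divide by 1024*1024.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Total RAM: $(mem_total_gib) GiB"
fi
```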

Also, I’ve managed to hard lock the kernel a few times when loading a model. I’m using an rkr3.4-based kernel; you might have better or worse luck on others.

These demos are more technically interesting than useful on their own, but I hope someone else finds them helpful - it was rather tricky getting everything compiling and happy in a container.

disclaimer: I work for Docker Inc, but this is all my personal work, opinions are my own, etc.

This is an amazing development, congratulations.

I have finally got round to testing it on a 16 GB Rock 5B. After any prompt, it produces garbage output and then crashes. Any ideas?

Hmm…not sure. I did manage to lock up my kernel a couple times at random while playing with it :sweat_smile:

Is there anything relevant in journalctl -k output? What kernel are you running?
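
For anyone collecting this info, a rough sketch for narrowing the kernel log down to GPU-related lines is below. The `mali|gpu` pattern is a guess at what the driver prints, not an exhaustive filter, and `gpu_kernel_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines from stdin that mention the GPU.
# ASSUMPTION: relevant driver messages contain "mali" or "gpu" (case-insensitive).
gpu_kernel_lines() {
    grep -iE 'mali|gpu'
}

# Kernel messages from the current boot.
journalctl -k --no-pager 2>/dev/null | gpu_kernel_lines \
    || echo "No GPU-related kernel messages found"

# Kernel version, useful to include when reporting a crash.
uname -r
```

After a hard lock, the interesting messages are from the previous boot: `journalctl -k -b -1` shows those, but only if the journal is persisted to disk.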

I was trying to reproduce this but now it always seems to work (ignoring the “invalid tar header” errors that docker sometimes throws when unzipping).

Kernel:
Linux rock-5b 5.10.110-rockchip-rk3588 #23.02.2 SMP Fri Feb 17 23:59:20 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

I reproduced the crash. It happens only when I do something in the GUI while the model is running, e.g. resizing a window, so the GPU is handling the OpenCL calculations and rendering the display at the same time.

There are no journal messages.

Ahhh I’m running headless, so I don’t have much advice for you there, unfortunately :stuck_out_tongue: