With the help of TVM Unity, I have successfully run Stable Diffusion on the Rock 5B!
Generating a 512x512 image currently takes about 500 seconds. That figure includes model loading and GPU kernel compilation time, so the actual inference time is lower. The U-Net runs at 21 seconds per iteration.
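As a rough sanity check on those numbers, here is a small sketch that splits the total time into U-Net time and everything else. The step count of 20 is an assumption for illustration only (it is not stated in this post), and the function names are hypothetical.

```python
# Back-of-the-envelope timing split for the figures quoted above.
# ASSUMPTION: the number of denoising steps (20) is illustrative only;
# the post does not say how many steps were used.

UNET_SECONDS_PER_STEP = 21.0   # from the post
TOTAL_SECONDS = 500.0          # from the post (includes loading + kernel compilation)


def unet_seconds(steps, seconds_per_step=UNET_SECONDS_PER_STEP):
    """Time spent in the U-Net for a given number of denoising steps."""
    return steps * seconds_per_step


def implied_overhead(total_seconds, steps, seconds_per_step=UNET_SECONDS_PER_STEP):
    """Whatever is left after the U-Net: model loading, kernel compilation,
    text encoder, VAE decode, and so on."""
    return total_seconds - unet_seconds(steps, seconds_per_step)


if __name__ == "__main__":
    assumed_steps = 20  # hypothetical value, adjust to your own run
    print("U-Net time:", unet_seconds(assumed_steps), "s")
    print("Everything else:", implied_overhead(TOTAL_SECONDS, assumed_steps), "s")
```

If your run uses a different step count, change `assumed_steps` and the split changes with it; the U-Net part scales linearly with the number of iterations.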
The model is large, so you will probably need a version of the board with at least 16GB to run this.
Code: https://github.com/happyme531/RK3588-stable-diffusion-GPU (please star!!)