I have recently created go-rknnlite, a set of Go language bindings for the rknn-toolkit2 C API. These allow you to use the Go programming language to perform inference on the RK3588 NPU.
It features a Pooled Runtime mode where you can run multiple RKNN instances of the same model across all three NPU cores. With our EfficientNet-Lite0 model, average inference time is 7.9ms per image when running on a single NPU core, but running a pool of 9 runtimes across all three cores brings it down to 1.65ms per image.
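To illustrate the pooling idea, here is a minimal sketch in plain Go. It does not use the go-rknnlite API: `fakeRuntime`, `newPool`, and `processAll` are hypothetical names, and the inference call is simulated with a short sleep. The assumption is simply that each runtime is tied to one NPU core and that runtimes are handed out through a buffered channel, so a caller blocks until one is free.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeRuntime is a hypothetical stand-in for one loaded model instance
// assigned to a single NPU core. It is not a go-rknnlite type.
type fakeRuntime struct {
	id   int
	core int
}

// infer simulates an inference call; a real binding would run the model here.
func (r *fakeRuntime) infer(image int) string {
	time.Sleep(time.Millisecond)
	return fmt.Sprintf("image %d on runtime %d (core %d)", image, r.id, r.core)
}

// newPool creates size runtimes, assigns them round-robin to the cores,
// and parks them in a buffered channel that acts as the pool.
func newPool(size, cores int) chan *fakeRuntime {
	pool := make(chan *fakeRuntime, size)
	for i := 0; i < size; i++ {
		pool <- &fakeRuntime{id: i, core: i % cores}
	}
	return pool
}

// processAll pushes n images through the pool and returns how many
// images each core handled.
func processAll(pool chan *fakeRuntime, n, cores int) []int {
	perCore := make([]int, cores)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for img := 0; img < n; img++ {
		wg.Add(1)
		go func(img int) {
			defer wg.Done()
			rt := <-pool // blocks until a runtime is free
			_ = rt.infer(img)
			mu.Lock()
			perCore[rt.core]++
			mu.Unlock()
			pool <- rt // hand the runtime back to the pool
		}(img)
	}
	wg.Wait()
	return perCore
}

func main() {
	pool := newPool(9, 3)
	start := time.Now()
	perCore := processAll(pool, 90, 3)
	fmt.Println("images per core:", perCore, "elapsed:", time.Since(start))
}
```

With 9 runtimes and 3 cores, the round-robin assignment puts three runtimes on each core, which is roughly the arrangement described above. How the real library pins runtimes to cores may differ.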
For price/performance, the RK3588 NPU is the best option around for Edge AI applications. The following table compares benchmarks of our model running on different platforms.
This is nice. I think the ROCK 5C Lite and CM5 Lite would make the price/performance even better. We would like to send you a ROCK 5C Lite so you can compare the performance difference. I have sent you a PM.
Thanks for sending the Rock 5C Lite. I received it today and ran our benchmark for comparison to the Rock 5B using the pooled runtimes to spread inference across all NPU cores.
| Number of Runtimes | Execution Time (5B) | Execution Time (5C Lite) | Average Inference Time Per Image (5B) | Average Inference Time Per Image (5C Lite) |
|---|---|---|---|---|
| 1 | 59.97s | 66.91s | 7.91ms | 8.83ms |
| 2 | 34.56s | 33.76s | 4.55ms | 4.45ms |
| 3 | 22.94s | 23.75s | 3.02ms | 3.13ms |
| 6 | 13.89s | 18.16s | 1.83ms | 2.40ms |
| 9 | 12.54s | 17.37s | 1.65ms | 2.29ms |
| 12 | 11.97s | 16.69s | 1.57ms | 2.20ms |
| 15 | 12.03s | 16.63s | 1.58ms | 2.19ms |
With a pool of 9 runtimes, which I consider the optimal size, the Rock 5B averages 1.65ms per image and the 5C Lite averages 2.29ms.
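For anyone who wants to turn these figures into speedups, here is a small sketch that works from the per-image averages measured above. The helper names `speedup` and `imagesPerSecond` are just for illustration. With a pool of 9, the 5B comes out at roughly 4.8x faster than a single runtime and the 5C Lite at roughly 3.9x.

```go
package main

import "fmt"

// result holds the measured average inference time per image, in
// milliseconds, for one pool size on each board.
type result struct {
	runtimes int
	ms5B     float64
	ms5CLite float64
}

// results are the per-image averages from the benchmark runs above.
var results = []result{
	{1, 7.91, 8.83},
	{2, 4.55, 4.45},
	{3, 3.02, 3.13},
	{6, 1.83, 2.40},
	{9, 1.65, 2.29},
	{12, 1.57, 2.20},
	{15, 1.58, 2.19},
}

// speedup returns how many times faster the pooled time is than the baseline.
func speedup(baselineMs, pooledMs float64) float64 {
	return baselineMs / pooledMs
}

// imagesPerSecond converts an average per-image time into throughput.
func imagesPerSecond(ms float64) float64 {
	return 1000 / ms
}

func main() {
	base := results[0] // the single-runtime row is the baseline
	for _, r := range results {
		fmt.Printf("%2d runtimes: 5B %.2fx (%.0f img/s), 5C Lite %.2fx (%.0f img/s)\n",
			r.runtimes,
			speedup(base.ms5B, r.ms5B), imagesPerSecond(r.ms5B),
			speedup(base.ms5CLite, r.ms5CLite), imagesPerSecond(r.ms5CLite))
	}
}
```

The throughput figure is just the inverse of the average time per image, so it reflects the whole pool working in parallel rather than the latency of any single inference.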
The performance is very good for the price. I just need to wait for Arace to ship my CM5 order so I can build a carrier board and put it to use in a prototype product I'm developing.