Recently I updated https://github.com/swdee/go-rknnlite/ to support other SoCs in the RK35xx series, specifically to compare the two-core 6 TOPS NPU of the RK3576 (Rock 4D) against the three-core 6 TOPS NPU of the RK3588 (Rock 5B). I also added the RK3566, with its single-core 1 TOPS NPU (Zero 3E), for comparison, as other users were interested in it.
Overall, the RK3576’s NPU performance is comparable to the RK3588’s, and sometimes slightly faster, due to the Rock 4D having faster DDR5 memory. However, models that need a lot of CPU post-processing (such as segmentation models) run slower on the RK3576, because its CPU cores are much slower. The raw CPU speed of the RK3576 is about the same as that of the Raspberry Pi 5.