Use YoloV8 in RK3588 NPU

Thank you for this. It works amazingly. My camera only goes up to 30fps, but using the code from that project it is locked at a steady 30fps.

I am currently trying to convert my facial detection model to rknn and am encountering some issues. I know it's the post-processing. I am getting this error:

input0_data = outputs[0].reshape([3, -1]+list(outputs[0].shape[-2:]))
ValueError: cannot reshape array of size 42000 into shape (3,newaxis,8400,1)
E RKNN: [14:28:13.784] failed to submit!, op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 19, run task counter: 0, int status: 0
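For context on the ValueError itself: the reshape fails because 42000 elements cannot be split into chunks of 3 × 8400 × 1 = 25200. The constants (3, 8400, 42000) are taken from the error above; the snippet below just reproduces the arithmetic with numpy:

```python
import numpy as np

# The post-processing expects the output to reshape into (3, -1, 8400, 1),
# which requires the element count to be divisible by 3 * 8400 * 1 = 25200.
out = np.zeros(42000, dtype=np.float32)  # size taken from the error message
try:
    out.reshape([3, -1] + [8400, 1])
except ValueError as e:
    print("reshape failed:", e)  # 42000 is not divisible by 25200

# A tensor whose size is a multiple of 25200 reshapes fine:
ok = np.zeros(25200, dtype=np.float32).reshape([3, -1, 8400, 1])
print(ok.shape)  # (3, 1, 8400, 1)
```

So the output tensor the post-processing receives does not have the layout it expects, which points at a mismatch between the model head and the post-processing code rather than a numpy problem.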

I read above that you mentioned the v8 post-processing is a copy and paste, but it looks fairly different; I'm guessing I need to adapt it?

Hi, Dbenton,

Bravo!
Maybe it's locked at the fps of the camera?
The multi-thread repo uses a specific version of YoloV5.
You may find some clue from this link. (Credit user: 1117).
I am trying to update the YoloV8 accordingly, so that we don't have to modify the model and post-processing, but I don't have the time… :frowning:

This error means the first convolution layer is causing the trouble. Can you check that the rknn model was converted correctly?

For YoloV8, the Sub/Add nodes have to be removed from the YoloV8 model and the calculation moved into the post-processing. The modified model looks like the following figure.
[figure: the modified model graph]
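To illustrate what "moving the Sub/Add into the post-processing" means in practice, here is a minimal numpy sketch (my own illustration, not code from the modified model): the model now emits the raw `lt`/`rb` distance tensors, and the host code applies the anchor subtraction/addition that used to be the Sub/Add nodes. The anchor values and stride below are made-up placeholders:

```python
import numpy as np

def dist2bbox_post(lt, rb, anchor_points, strides):
    """Host-side replacement for the removed Sub/Add nodes:
    x1y1 = anchors - lt, x2y2 = anchors + rb, then scale by stride."""
    x1y1 = (anchor_points - lt) * strides
    x2y2 = (anchor_points + rb) * strides
    return np.concatenate([x1y1, x2y2], axis=-1)  # xyxy boxes

# Dummy data standing in for the model outputs (shapes are illustrative).
anchors = np.array([[0.5, 0.5], [1.5, 0.5]])   # (n, 2) grid-cell centers
lt = np.array([[0.2, 0.3], [0.1, 0.1]])        # left/top distances
rb = np.array([[0.4, 0.5], [0.2, 0.2]])        # right/bottom distances
boxes = dist2bbox_post(lt, rb, anchors, strides=8.0)
print(boxes.shape)  # (2, 4)
```

The NPU then only has to run plain conv layers, and the cheap elementwise math runs on the CPU.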

Let me know if there are any questions :)


Amazing, thank you!

The extent of my AI knowledge is one AI course and one ML course from my bachelor's, so a good bit of this is foreign to me.

This has been on the back burner for me for a while now, but I recently stumbled across onnxruntime having the ability to run on rknpu. Has anyone tried this?

The site seems to indicate only the RK1808 though, but I'm gonna give it a shot.

hello @nickliu973 and guys,

I have the same issue when running the yolov8n model on my rk3588 (a radxa rock5b).

E RKNN: [13:24:08.028] failed to submit!, op id: 171, op name: Add:/model.22/Add_1, flags: 0x5, task start: 1516, task number: 39, run task counter: 16, int status: 0

which, as you already mentioned in previous posts, is an OP not supported by the rk3588 NPU.

However, I am not experienced enough to modify/remove it myself. Could you please help me with that?

  • Can this OP be removed directly from ultralytics code (from model) so it is not included in trained model?

  • Can this OP be removed from exported ONNX model? If so how.

I believe this would help many of us struggling with this issue.

Thank you in advance!

I believe you need to change your opset

I have opset = 12; I need to remove the unsupported operation from the onnx model.

@Dbenton do you have working yolov8 on rk3588?

Oh I see, my fault. Yes I do, albeit fairly slowly, as I'm not using the NPU atm.

Hi, @hlacik,

diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index f7105bf..4516390 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -57,9 +57,11 @@ class Detect(nn.Module):
             cls = x_cat[:, self.reg_max * 4:]
         else:
             box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
-        dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
-        y = torch.cat((dbox, cls.sigmoid()), 1)
-        return y if self.export else (y, x)
+        lt, rb = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) 
+        #dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
+        #y = torch.cat((dbox, cls.sigmoid()), 1)
+        #return y if self.export else (y, x)
+        return  lt, rb, cls.sigmoid()#if self.export else (y, x)

And in /ultralytics/yolo/utils/tal.py:

diff --git a/ultralytics/yolo/utils/tal.py b/ultralytics/yolo/utils/tal.py
index aea8918..b0aef95 100644
--- a/ultralytics/yolo/utils/tal.py
+++ b/ultralytics/yolo/utils/tal.py
@@ -263,11 +263,13 @@ def dist2bbox(distance, anchor_points, xywh=True, dim=-1):
     lt, rb = distance.chunk(2, dim)
     x1y1 = anchor_points - lt
     x2y2 = anchor_points + rb
-    if xywh:
-        c_xy = (x1y1 + x2y2) / 2
-        wh = x2y2 - x1y1
-        return torch.cat((c_xy, wh), dim)  # xywh bbox
-    return torch.cat((x1y1, x2y2), dim)  # xyxy bbox
+    # if xywh:
+    #     c_xy = (x1y1 + x2y2) / 2
+    #     wh = x2y2 - x1y1
+    #     return torch.cat((c_xy, wh), dim)  # xywh bbox
+    #return torch.cat((x1y1, x2y2), dim)  # xyxy bbox
+    return lt, rb  # xyxy bbox
+    #return torch.cat((lt, rb), dim)  # xyxy bbox

Those two modifications generate the onnx without those Add/Sub nodes.
I did not have a chance to test it on the NPU.


I can confirm that after applying those patches, the exported onnx, converted to rknn (using rknn-toolkit2 v1.4 with quantization ON), is runnable on the rk3588 / radxa rock5b (using rknn_lite)! Thank you very much @nickliu973.

yolov8n runs at around 9fps on 1 NPU core, so theoretically running it threaded across the three cores should give around 27fps.
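On the threading point, the usual pattern (as in the multi-threaded YoloV5 repo mentioned earlier) is one inference context per NPU core, fed round-robin. Here is a rough stdlib-only sketch with a dummy `infer` callable standing in for the real per-core RKNNLite call; the worker layout is my own illustration, not code from that repo:

```python
import queue
import threading

def run_pool(frames, infer, n_workers=3):
    """Dispatch frames round-robin to n_workers threads (one per NPU core).
    `infer` stands in for a per-core RKNNLite inference call."""
    in_q, results = queue.Queue(), [None] * len(frames)

    def worker():
        while True:
            item = in_q.get()
            if item is None:        # sentinel: no more frames
                break
            idx, frame = item
            results[idx] = infer(frame)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, f in enumerate(frames):
        in_q.put((i, f))
    for _ in threads:
        in_q.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return results

# Dummy usage: "inference" just doubles the frame value.
print(run_pool([1, 2, 3, 4], infer=lambda x: x * 2))  # [2, 4, 6, 8]
```

Indexed results keep the output in frame order even though the workers finish out of order.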

Now the harder part: writing my own post-processing code…
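For anyone starting on that post-processing, here is a minimal numpy sketch of the score side (class sigmoid plus confidence filter). The shapes and the threshold are illustrative assumptions, and NMS is left out:

```python
import numpy as np

def filter_detections(cls_logits, conf_thres=0.5):
    """cls_logits: (n_boxes, n_classes) raw class outputs.
    Applies sigmoid, keeps boxes whose best class clears the threshold."""
    scores = 1.0 / (1.0 + np.exp(-cls_logits))   # sigmoid -> (0, 1)
    best = scores.max(axis=1)
    keep = best > conf_thres
    return np.flatnonzero(keep), scores.argmax(axis=1)[keep], best[keep]

# Three boxes, two classes; only the confident ones survive.
logits = np.array([[2.0, -1.0],    # sigmoid(2.0) ~ 0.88 -> kept
                   [-3.0, -2.0],   # best score ~ 0.12 -> dropped
                   [0.5, 3.0]])    # sigmoid(3.0) ~ 0.95 -> kept
idx, cls_ids, confs = filter_detections(logits)
print(idx, cls_ids)  # [0 2] [0 1]
```

With the patched model, the sigmoid is already applied inside the model, so that line can be dropped; it is shown here for the raw-logit case.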


Finally I was able to convert yolov8 to rknn and do inference on my orange pi 5. But the detection score is always 1.62; does anybody have any idea? [image: yolov8_result]

This is the code for converting and detecting an image: https://gist.github.com/gilankpam/777713f2899fd385c1792a526e0a1be0
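One sanity check worth trying (my guess, not a confirmed diagnosis): a sigmoid output can never exceed 1.0, so a "confidence" of 1.62 suggests the value being read is a raw logit, or an int8 tensor that was not dequantized, rather than a probability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sigmoid is bounded in (0, 1), so any score above 1.0 means the sigmoid
# (or the int8 -> float dequantization) was skipped somewhere upstream.
x = np.linspace(-10, 10, 1001)
print(sigmoid(x).max() < 1.0, sigmoid(x).min() > 0.0)  # True True
```

If the rknn output is quantized, checking whether the post-processing applies the output scale/zero-point before the sigmoid would be my first step.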


hi @gilankpam, can you share your onnx and rknn models please?

@phiber here you go https://drive.google.com/drive/folders/1PSXakI7D0oH6fxCj7KXmlpRcOD3vR8ox?usp=sharing

Use runtime v1.5.0.


Hi @gilankpam!

Thank you for sharing the postprocess code. It works perfectly for my yolov8n model, but only with QUANTIZE_ON = False, on the simulator. (Running rknn-toolkit2 1.5, python 3.10.) (Haven't had a chance to try it on the NPU yet.)

I also tried (converting and running in the simulator) your models from google drive, and got the same result: correct predictions for QUANTIZE_ON = False, and nothing (all zeroes instead of class confidences) for QUANTIZE_ON = True.

Did you use a big dataset for the quantization step? Maybe some additional steps are required to get quantization right?


Hi @Mikhael_Danilov
The dataset only contains one image, just copy-pasted from the example here: https://github.com/rockchip-linux/rknn-toolkit2/blob/master/examples/onnx/yolov5/dataset.txt

Thanks for mentioning the quantize stuff. I did try setting QUANTIZE_ON = False and it works, I got the expected scores, but the inference time is way worse.
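Since int8 calibration generally benefits from more than one representative image, building a bigger `dataset.txt` from a folder of images is easy to script. A sketch (the directory name is a placeholder; the one-path-per-line format matches the linked rknn-toolkit2 example):

```python
import glob
import os

def write_dataset_txt(image_dir, out_path="dataset.txt"):
    """List calibration images one path per line, as dataset.txt expects."""
    paths = sorted(glob.glob(os.path.join(image_dir, "*.jpg")))
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return len(paths)

# e.g. write_dataset_txt("./calib_images")  # returns the number of images listed
```

Whether a larger calibration set alone fixes the all-zero quantized scores is untested here; it is simply the first variable worth eliminating.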

@gilankpam thank you.

I tried your models and they run nicely on the NPU (both the rknn, and the onnx when I convert it with rknn-toolkit), however when I try to convert my own onnx (obtained by yolo export model=yolov8n.pt imgsz=640,640 format=onnx opset=12) I get:
E RKNN: [20:33:42.652] failed to submit!, op id: 122, op name: Mul:/model.22/Mul_2, flags: 0x5, task start: 331, task number: 15, run task counter: 5, int status: 0 - for int8 (quantized) model
E RKNN: [20:34:28.305] failed to submit!, op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 40, run task counter: 1, int status: 0 - for fp16 one

Could you please share some info on how to correctly make the onnx from the ultralytics model?

Update:
I was able to convert and run yolov8n and yolov8m on the NPU with imgsz=416,416 and imgsz=640,480, yet attempts to do so with imgsz=640,640 failed.

rock@rock-5a:~/rknpu2/examples/rknn_yolov8_demo$ ./build/build_linux_aarch64/rknn_yolov5_demo ~/yolov8s.rknn mode/bus.jpg
./build/build_linux_aarch64/rknn_yolov5_demo: error while loading shared libraries: librga.so: cannot open shared object file: No such file or directory

How can I solve this problem?

Install librga (download the .deb packages, then install them with dpkg):

wget https://gitlab.com/rk3588_linux/linux/debian/-/raw/master/packages/arm64/rga2/librga2_2.2.0-1_arm64.deb
wget https://gitlab.com/rk3588_linux/linux/debian/-/raw/master/packages/arm64/rga2/librga-dev_2.2.0-1_arm64.deb
sudo dpkg -i librga2_2.2.0-1_arm64.deb librga-dev_2.2.0-1_arm64.deb

Can you share your yolov8s.rknn ?

So has anybody managed to figure out how this is done? Any manual?

Re-program the pre- and post-processing code to fit the model into the RK3588 NPU hardware.
(The post-processing code for yolov8 can be copy-pasted into test.py.)