Use YoloV8 in RK3588 NPU

Thank you for this. It works amazingly. My camera only goes up to 30fps, but using the code from that project it is locked at a steady 30fps.

I am currently trying to convert my facial detection model to rknn and am encountering some issues. I know it's the post-processing. I am getting this error:

input0_data = outputs[0].reshape([3, -1]+list(outputs[0].shape[-2:]))
ValueError: cannot reshape array of size 42000 into shape (3,newaxis,8400,1)
E RKNN: [14:28:13.784] failed to submit!, op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 19, run task counter: 0, int status: 0
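For context on the ValueError itself: the reshape fails because 42000 elements cannot be split into chunks of 3 × 8400 × 1 = 25200. The constants (3, 8400, 42000) are taken from the error above; the snippet below just reproduces the arithmetic with numpy:

```python
import numpy as np

# The post-processing expects the output to reshape into (3, -1, 8400, 1),
# which requires the element count to be divisible by 3 * 8400 * 1 = 25200.
out = np.zeros(42000, dtype=np.float32)  # size taken from the error message
try:
    out.reshape([3, -1] + [8400, 1])
except ValueError as e:
    print("reshape failed:", e)  # 42000 is not divisible by 25200

# A tensor whose size is a multiple of 25200 reshapes fine:
ok = np.zeros(25200, dtype=np.float32).reshape([3, -1, 8400, 1])
print(ok.shape)  # (3, 1, 8400, 1)
```

So the output tensor the post-processing receives does not have the layout it expects, which points at a mismatch between the model head and the post-processing code rather than a numpy problem.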

I read above that you mentioned the v8 post-processing is a copy and paste, but it looks fairly different; I'm guessing I need to adapt it?

Hi, Dbenton,

Bravo!
Maybe it's locked at the fps of the camera?
The multi-thread repo uses a specific version of YoloV5.
You may find some clue from this link. (Credit user: 1117).
I am trying to update the YoloV8 accordingly, so that we don't have to modify the model and post-processing, but I don't have the time… :frowning:

This error means the first convolution layer is causing the trouble. Can you check that the rknn model was converted correctly?

For YoloV8, the Sub/Add nodes have to be removed from the YoloV8 model and the calculation moved into the post-processing. The modified model looks like the following figure.
[figure: the modified model graph]
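To illustrate what "moving the Sub/Add into the post-processing" means in practice, here is a minimal numpy sketch (my own illustration, not code from the modified model): the model now emits the raw `lt`/`rb` distance tensors, and the host code applies the anchor subtraction/addition that used to be the Sub/Add nodes. The anchor values and stride below are made-up placeholders:

```python
import numpy as np

def dist2bbox_post(lt, rb, anchor_points, strides):
    """Host-side replacement for the removed Sub/Add nodes:
    x1y1 = anchors - lt, x2y2 = anchors + rb, then scale by stride."""
    x1y1 = (anchor_points - lt) * strides
    x2y2 = (anchor_points + rb) * strides
    return np.concatenate([x1y1, x2y2], axis=-1)  # xyxy boxes

# Dummy data standing in for the model outputs (shapes are illustrative).
anchors = np.array([[0.5, 0.5], [1.5, 0.5]])   # (n, 2) grid-cell centers
lt = np.array([[0.2, 0.3], [0.1, 0.1]])        # left/top distances
rb = np.array([[0.4, 0.5], [0.2, 0.2]])        # right/bottom distances
boxes = dist2bbox_post(lt, rb, anchors, strides=8.0)
print(boxes.shape)  # (2, 4)
```

The NPU then only has to run plain conv layers, and the cheap elementwise math runs on the CPU.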

Let me know if there are any questions :)


Amazing, thank you!

The extent of my AI knowledge is one AI course and one ML course from my bachelor's, so a good bit of this is foreign to me.

This has been on the back burner for me for a while now, but I recently stumbled across onnxruntime having the ability to run on rknpu. Has anyone tried this?

The site seems to indicate only the RK1808 though, but I'm gonna give it a shot.

hello @nickliu973 and guys,

I have the same issue when running the yolov8n model on my rk3588 (a radxa rock5b).

E RKNN: [13:24:08.028] failed to submit!, op id: 171, op name: Add:/model.22/Add_1, flags: 0x5, task start: 1516, task number: 39, run task counter: 16, int status: 0

which, as you already mentioned in previous posts, is an OP not supported by the rk3588 NPU.

However, I am not experienced enough to modify/remove it myself. Could you please help me with that?

  • Can this OP be removed directly from ultralytics code (from model) so it is not included in trained model?

  • Can this OP be removed from exported ONNX model? If so how.

I believe this would help many of us struggling with this issue.

Thank you in advance!

I believe you need to change your opset

I have opset = 12; I need to remove the unsupported operation from the onnx model.

@Dbenton do you have working yolov8 on rk3588?

Oh I see, my fault. Yes I do, albeit fairly slowly, as I'm not using the NPU atm.

Hi, @hlacik,

diff --git a/ultralytics/nn/modules/head.py b/ultralytics/nn/modules/head.py
index f7105bf..4516390 100644
--- a/ultralytics/nn/modules/head.py
+++ b/ultralytics/nn/modules/head.py
@@ -57,9 +57,11 @@ class Detect(nn.Module):
             cls = x_cat[:, self.reg_max * 4:]
         else:
             box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
-        dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
-        y = torch.cat((dbox, cls.sigmoid()), 1)
-        return y if self.export else (y, x)
+        lt, rb = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) 
+        #dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
+        #y = torch.cat((dbox, cls.sigmoid()), 1)
+        #return y if self.export else (y, x)
+        return  lt, rb, cls.sigmoid()#if self.export else (y, x)

And in /ultralytics/yolo/utils/tal.py:

diff --git a/ultralytics/yolo/utils/tal.py b/ultralytics/yolo/utils/tal.py
index aea8918..b0aef95 100644
--- a/ultralytics/yolo/utils/tal.py
+++ b/ultralytics/yolo/utils/tal.py
@@ -263,11 +263,13 @@ def dist2bbox(distance, anchor_points, xywh=True, dim=-1):
     lt, rb = distance.chunk(2, dim)
     x1y1 = anchor_points - lt
     x2y2 = anchor_points + rb
-    if xywh:
-        c_xy = (x1y1 + x2y2) / 2
-        wh = x2y2 - x1y1
-        return torch.cat((c_xy, wh), dim)  # xywh bbox
-    return torch.cat((x1y1, x2y2), dim)  # xyxy bbox
+    # if xywh:
+    #     c_xy = (x1y1 + x2y2) / 2
+    #     wh = x2y2 - x1y1
+    #     return torch.cat((c_xy, wh), dim)  # xywh bbox
+    #return torch.cat((x1y1, x2y2), dim)  # xyxy bbox
+    return lt, rb  # xyxy bbox
+    #return torch.cat((lt, rb), dim)  # xyxy bbox

Those two modifications generate the onnx without those Add/Sub nodes.
I did not have a chance to test it on the NPU.


I can confirm that after applying those patches, the exported onnx, converted to rknn (using rknn-toolkit2 v1.4 with quantization ON), is runnable on the rk3588 / radxa rock5b (using rknn_lite)! Thank you very much @nickliu973.

yolov8n runs at around 9fps on 1 NPU core, so theoretically running it threaded across the three cores should give around 27fps.
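On the threading point, the usual pattern (as in the multi-threaded YoloV5 repo mentioned earlier) is one inference context per NPU core, fed round-robin. Here is a rough stdlib-only sketch with a dummy `infer` callable standing in for the real per-core RKNNLite call; the worker layout is my own illustration, not code from that repo:

```python
import queue
import threading

def run_pool(frames, infer, n_workers=3):
    """Dispatch frames round-robin to n_workers threads (one per NPU core).
    `infer` stands in for a per-core RKNNLite inference call."""
    in_q, results = queue.Queue(), [None] * len(frames)

    def worker():
        while True:
            item = in_q.get()
            if item is None:        # sentinel: no more frames
                break
            idx, frame = item
            results[idx] = infer(frame)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, f in enumerate(frames):
        in_q.put((i, f))
    for _ in threads:
        in_q.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return results

# Dummy usage: "inference" just doubles the frame value.
print(run_pool([1, 2, 3, 4], infer=lambda x: x * 2))  # [2, 4, 6, 8]
```

Indexed results keep the output in frame order even though the workers finish out of order.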

Now the harder part: writing my own post-processing code…
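For anyone starting on that post-processing, here is a minimal numpy sketch of the score side (class sigmoid plus confidence filter). The shapes and the threshold are illustrative assumptions, and NMS is left out:

```python
import numpy as np

def filter_detections(cls_logits, conf_thres=0.5):
    """cls_logits: (n_boxes, n_classes) raw class outputs.
    Applies sigmoid, keeps boxes whose best class clears the threshold."""
    scores = 1.0 / (1.0 + np.exp(-cls_logits))   # sigmoid -> (0, 1)
    best = scores.max(axis=1)
    keep = best > conf_thres
    return np.flatnonzero(keep), scores.argmax(axis=1)[keep], best[keep]

# Three boxes, two classes; only the confident ones survive.
logits = np.array([[2.0, -1.0],    # sigmoid(2.0) ~ 0.88 -> kept
                   [-3.0, -2.0],   # best score ~ 0.12 -> dropped
                   [0.5, 3.0]])    # sigmoid(3.0) ~ 0.95 -> kept
idx, cls_ids, confs = filter_detections(logits)
print(idx, cls_ids)  # [0 2] [0 1]
```

With the patched model, the sigmoid is already applied inside the model, so that line can be dropped; it is shown here for the raw-logit case.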


Finally I was able to convert yolov8 to rknn and do inference on my orange pi 5. But the detection score is always 1.62; does anybody have any idea? [image: yolov8_result]

This is the code for converting and detecting an image: https://gist.github.com/gilankpam/777713f2899fd385c1792a526e0a1be0
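One sanity check worth trying (my guess, not a confirmed diagnosis): a sigmoid output can never exceed 1.0, so a "confidence" of 1.62 suggests the value being read is a raw logit, or an int8 tensor that was not dequantized, rather than a probability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sigmoid is bounded in (0, 1), so any score above 1.0 means the sigmoid
# (or the int8 -> float dequantization) was skipped somewhere upstream.
x = np.linspace(-10, 10, 1001)
print(sigmoid(x).max() < 1.0, sigmoid(x).min() > 0.0)  # True True
```

If the rknn output is quantized, checking whether the post-processing applies the output scale/zero-point before the sigmoid would be my first step.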


hi @gilankpam, can you share your onnx and rknn models please?

@phiber here you go https://drive.google.com/drive/folders/1PSXakI7D0oH6fxCj7KXmlpRcOD3vR8ox?usp=sharing

Use runtime v1.5.0.


Hi @gilankpam!

Thank you for sharing the postprocess code. It works perfectly for my yolov8n model, but only with QUANTIZE_ON = False, on the simulator. (Running rknn-toolkit2 1.5, python 3.10.) (Haven't had a chance to try it on the NPU yet.)

I also tried (converting and running in the simulator) your models from google drive, and got the same result: correct predictions for QUANTIZE_ON = False, and nothing (all zeroes instead of class confidences) for QUANTIZE_ON = True.

Did you use a big dataset for the quantization step? Maybe some additional steps are required to get quantization right?


Hi @Mikhael_Danilov
The dataset only contains one image, just copy-pasted from the example here: https://github.com/rockchip-linux/rknn-toolkit2/blob/master/examples/onnx/yolov5/dataset.txt

Thanks for mentioning the quantize stuff. I did try setting QUANTIZE_ON = False and it works, I got the expected scores, but the inference time is way worse.
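Since int8 calibration generally benefits from more than one representative image, building a bigger `dataset.txt` from a folder of images is easy to script. A sketch (the directory name is a placeholder; the one-path-per-line format matches the linked rknn-toolkit2 example):

```python
import glob
import os

def write_dataset_txt(image_dir, out_path="dataset.txt"):
    """List calibration images one path per line, as dataset.txt expects."""
    paths = sorted(glob.glob(os.path.join(image_dir, "*.jpg")))
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return len(paths)

# e.g. write_dataset_txt("./calib_images")  # returns the number of images listed
```

Whether a larger calibration set alone fixes the all-zero quantized scores is untested here; it is simply the first variable worth eliminating.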

@gilankpam thank you.

I tried your models and they run nicely on the NPU (both the rknn, and the onnx when I convert it with rknn-toolkit), however when I try to convert my own onnx (obtained by yolo export model=yolov8n.pt imgsz=640,640 format=onnx opset=12) I get:
E RKNN: [20:33:42.652] failed to submit!, op id: 122, op name: Mul:/model.22/Mul_2, flags: 0x5, task start: 331, task number: 15, run task counter: 5, int status: 0 - for int8 (quantized) model
E RKNN: [20:34:28.305] failed to submit!, op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 40, run task counter: 1, int status: 0 - for fp16 one

Could you please share some info on how to correctly make the onnx from the ultralytics model?

Update:
I was able to convert and run yolov8n and yolov8m on the NPU with imgsz=416,416 and imgsz=640,480, yet attempts to do so with imgsz=640,640 failed.

rock@rock-5a:~/rknpu2/examples/rknn_yolov8_demo$ ./build/build_linux_aarch64/rknn_yolov5_demo ~/yolov8s.rknn mode/bus.jpg
./build/build_linux_aarch64/rknn_yolov5_demo: error while loading shared libraries: librga.so: cannot open shared object file: No such file or directory

How can I solve this problem?

Install librga (download the .deb packages, then install them with dpkg):

wget https://gitlab.com/rk3588_linux/linux/debian/-/raw/master/packages/arm64/rga2/librga2_2.2.0-1_arm64.deb
wget https://gitlab.com/rk3588_linux/linux/debian/-/raw/master/packages/arm64/rga2/librga-dev_2.2.0-1_arm64.deb
sudo dpkg -i librga2_2.2.0-1_arm64.deb librga-dev_2.2.0-1_arm64.deb

Can you share your yolov8s.rknn ?

So has anybody managed to figure out how this is done? Any manual?

Re-program the pre- and post-processing code to fit the model into the RK3588 NPU hardware.
(The post-processing code for yolov8 can be copy-pasted into test.py.)