YOLOv8s seems incompatible with RK3588 NPU

Well, I converted yolov8s.pt to ONNX:

>>> from ultralytics import YOLO
>>> model = YOLO("yolov8s.pt")
>>> model.export(format="onnx",imgsz=640,int8=True,opset=12)
Ultralytics YOLOv8.0.53  Python-3.10.10 torch-1.13.1+cpu CPU
YOLOv8s summary (fused): 168 layers, 11156544 parameters, 0 gradients, 28.6 GFLOPs

PyTorch: starting from yolov8s.pt with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (21.5 MB)

ONNX: starting export with onnx 1.13.1...
ONNX: export success  1.0s, saved as yolov8s.onnx (42.8 MB)

Export complete (1.7s)
Results saved to C:\Users\groff\yolov8\ultralytics
Predict:         yolo predict task=detect model=yolov8s.onnx imgsz=640
Validate:        yolo val task=detect model=yolov8s.onnx imgsz=640 data=coco.yaml
Visualize:       https://netron.app

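For reference, that (1, 84, 8400) output shape is what the YOLOv8 detection head produces: 84 = 4 box coordinates + 80 COCO class scores (no objectness channel, unlike YOLOv5), and 8400 is the anchor-free cell count summed across the three strides for a 640x640 input. A quick sanity check:

```python
# Sanity-check the YOLOv8 ONNX output shape (1, 84, 8400) for a 640x640 input.
imgsz = 640
strides = (8, 16, 32)                      # YOLOv8 detection head strides
cells = sum((imgsz // s) ** 2 for s in strides)  # 6400 + 1600 + 400
channels = 4 + 80                          # 4 box coords + 80 COCO classes, no objectness
print(channels, cells)                     # -> 84 8400
```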
Then I converted it to RKNN using the toolkit API, which produced very verbose output ending in

Total Weight memory Size:12103936
Total Internal memory Size:8199424
Predict Internal Memory RW Amount: 209435648
Predict Weight Memory RW Amount: 12103648
<<<<<<<< end: N4rknn21RKNNMemStatisticsPassE
I rknn building done
--> Export rknn model
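For anyone following along, the conversion step was roughly the standard rknn-toolkit2 recipe below. The paths, mean/std preprocessing values, and the calibration dataset file are illustrative assumptions, not taken from the original post:

```python
# Sketch of the ONNX -> RKNN conversion with rknn-toolkit2 (1.4.x era).
# Paths and preprocessing values are illustrative assumptions.
from rknn.api import RKNN

rknn = RKNN(verbose=True)
# Normalize 0-255 pixels to 0-1 to match the exported model's expected input.
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]],
            target_platform='rk3588')
rknn.load_onnx(model='yolov8s.onnx')
# do_quantization=True yields the INT8 model; 'dataset.txt' lists calibration images.
rknn.build(do_quantization=True, dataset='dataset.txt')
rknn.export_rknn('yolov8s.rknn')
rknn.release()
```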

But when loading that model to the NPU and attempting inference I get:

E RKNN: [01:25:20.303] failed to submit! op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 19, run task counter: 0, int status: 0

That's the message coming back from rknn_run(). My logs show the inputs loaded, the outputs loaded, and the tensors described as follows:

Trying load for:librknnrt.so
Init time:150 ms.
RKNN SDK - API version:1.4.0 (a10f100eb@2022-09-09T09:07:14), Driver version:0.8.2
Number inputs=1, Number outputs=1
Tensor input layer 0 attributes:
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, quantitative_type=AFFINE, zp=-128, scale=0.003922, fl=0, w_stride=640, size_with_stride=1228800, pass_through=0, h_stride=0

Tensor output layer 0 attributes:
  index=0, name=output0, n_dims=4, dims=[1, 84, 8400, 1], n_elems=705600, size=705600, fmt=NCHW, type=INT8, quantitative_type=AFFINE, zp=-128, scale=2.498975, fl=0, w_stride=0, size_with_stride=705600, pass_through=0, h_stride=0

Total category labels=80
Input Frames per second:0 Storing:true. Slew rate=195
545997754224 640 640 3 RKNN_TENSOR_INT8 RKNN_TENSOR_NHWC 1228800
Inputs set
Outputs set
Input Frames per second:7 Storing:true. Slew rate=147

So it seems to enumerate the layers properly, though they are radically different from YOLOv5's. It sets the inputs and outputs, then starts the message-loop thread that receives the input images, but the NPU barfs on inference.
This line:
545997754224 640 640 3 RKNN_TENSOR_INT8 RKNN_TENSOR_NHWC 1228800
is the context ID from the NPU init, followed by the stats from querying the model, which seem to comply with the NPU requirements as they do for YOLOv5.
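For what it's worth, the affine quantization parameters in those tensor attributes decode straightforwardly as real = scale * (q - zp). The input's scale of 0.003922 is roughly 1/255 with zp = -128, so an 8-bit pixel p just maps to the int8 value p - 128; the output dequantizes with scale 2.498975. A minimal sketch:

```python
# Affine (asymmetric) INT8 quantization as reported in the tensor attributes:
#   real_value = scale * (quantized_value - zero_point)
def dequantize(q, scale, zp):
    return scale * (q - zp)

def quantize(real, scale, zp):
    return round(real / scale) + zp

# Input tensor: scale=0.003922 (~1/255), zp=-128 -> pixel 255 maps to int8 +127.
print(quantize(255 / 255.0, 0.003922, -128))   # -> 127
# Output tensor: scale=2.498975, zp=-128 -> raw int8 -128 decodes to 0.0.
print(dequantize(-128, 2.498975, -128))        # -> 0.0
```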


There are a bunch of changes in this PR from Ultralytics. Have you tried reaching out to that developer on his PR?

Well, his last commit was over 2 months ago, and YOLOv8 came out 2 months ago :frowning_face:

The other thing I tried was turning off quantization, which caused the toolkit to convert the model to float16, which the guide said it would do for some reason. But that model blows up on set_inputs; i.e. it doesn't even get to inference.
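One thing worth double-checking for the float16 case (this is a guess, not something the logs confirm): the input buffer size doubles, since each element is 2 bytes instead of 1, and a size mismatch on the input buffer would make set_inputs fail before inference even starts:

```python
# Expected input buffer sizes for a 1x640x640x3 tensor (an assumption about
# why set_inputs might fail when the model is float16 instead of int8).
n_elems = 1 * 640 * 640 * 3
int8_bytes = n_elems * 1        # 1228800 bytes, matches the INT8 log above
fp16_bytes = n_elems * 2        # 2457600 bytes, what a float16 input would need
print(int8_bytes, fp16_bytes)
```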

So it borks out either way the conversion goes?

Conv probably means convolution or convolutional, but the error shows task start: 0 and run task counter: 0, so it's probably taking a dump right at the beginning.