Well, I converted yolov8s.pt to ONNX:
>>> from ultralytics import YOLO
>>> model = YOLO("yolov8s.pt")
>>> model.export(format="onnx",imgsz=640,int8=True,opset=12)
Ultralytics YOLOv8.0.53 Python-3.10.10 torch-1.13.1+cpu CPU
YOLOv8s summary (fused): 168 layers, 11156544 parameters, 0 gradients, 28.6 GFLOPs
PyTorch: starting from yolov8s.pt with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (21.5 MB)
ONNX: starting export with onnx 1.13.1...
ONNX: export success 1.0s, saved as yolov8s.onnx (42.8 MB)
Export complete (1.7s)
Results saved to C:\Users\groff\yolov8\ultralytics
Predict: yolo predict task=detect model=yolov8s.onnx imgsz=640
Validate: yolo val task=detect model=yolov8s.onnx imgsz=640 data=coco.yaml
Visualize: https://netron.app
'yolov8s.onnx'
>>>
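Before going to RKNN, a quick host-side sanity check of the exported ONNX with onnxruntime (just a sketch with a dummy zero frame; the input name images and the (1, 84, 8400) output shape come from the export log above):

import numpy as np
import onnxruntime as ort

# Run one dummy frame through the exported model on the CPU
sess = ort.InferenceSession("yolov8s.onnx")
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)      # expect: images [1, 3, 640, 640]
outputs = sess.run(None, {inp.name: np.zeros((1, 3, 640, 640), dtype=np.float32)})
print(outputs[0].shape)         # expect: (1, 84, 8400)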
Then I converted it to RKNN using the API, which produced very verbose output ending in:
...
------------------------------
Total Weight memory Size:12103936
Total Internal memory Size:8199424
Predict Internal Memory RW Amount: 209435648
Predict Weight Memory RW Amount: 12103648
------------------------------
<<<<<<<< end: N4rknn21RKNNMemStatisticsPassE
I rknn building done
done
--> Export rknn model
done
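For context, the conversion was the usual rknn-toolkit2 Python flow. A minimal sketch of that flow (the target_platform, the mean/std normalization, and the dataset.txt calibration list are placeholders here, not necessarily the exact values I used; mean 0 / std 255 is at least consistent with the zp=-128, scale=0.003922 the runtime reports for the input further down):

from rknn.api import RKNN

rknn = RKNN(verbose=True)
# Placeholder settings: adjust target_platform and normalization for your board/model
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform='rk3588')
rknn.load_onnx(model='yolov8s.onnx')
# dataset.txt lists calibration images for the INT8 quantization pass
rknn.build(do_quantization=True, dataset='./dataset.txt')
rknn.export_rknn('yolov8s.rknn')
rknn.release()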
But when loading that model onto the NPU and attempting inference, I get:
...
E RKNN: [01:25:20.303] failed to submit! op id: 1, op name: Conv:/model.0/conv/Conv, flags: 0x5, task start: 0, task number: 19, run task counter: 0, int status: 0
That's the message coming back from rknn_run(). In my logs I show inputs loaded, outputs loaded, and the tensors described thusly:
Trying load for:librknnrt.so
Init time:150 ms.
RKNN SDK - API version:1.4.0 (a10f100eb@2022-09-09T09:07:14), Driver version:0.8.2
Number inputs=1, Number outputs=1
Tensor input layer 0 attributes:
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, quantitative_type=AFFINE, zp=-128, scale=0.003922, fl=0, w_stride=640, size_with_stride=1228800, pass_through=0, h_stride=0
Tensor output layer 0 attributes:
index=0, name=output0, n_dims=4, dims=[1, 84, 8400, 1], n_elems=705600, size=705600, fmt=NCHW, type=INT8, quantitative_type=AFFINE, zp=-128, scale=2.498975, fl=0, w_stride=0, size_with_stride=705600, pass_through=0, h_stride=0
Total category labels=80
Input Frames per second:0 Storing:true. Slew rate=195
545997754224 640 640 3 RKNN_TENSOR_INT8 RKNN_TENSOR_NHWC 1228800
Inputs set
Outputs set
Input Frames per second:7 Storing:true. Slew rate=147
So it seems to enumerate the layers properly, though they are radically different from yolov5's. It sets the inputs and outputs, then starts the message-loop thread that gets the input images, but the NPU barfs on inference.
This line:
545997754224 640 640 3 RKNN_TENSOR_INT8 RKNN_TENSOR_NHWC 1228800
is the context ID from the NPU init, followed by the stats from the model query, which seem to comply with the NPU requirements for yolov5.
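To isolate whether the problem is in the .rknn model itself or in my C harness around rknn_run(), one cross-check would be to run the same model on the board through the rknnlite Python API. A minimal sketch (assumes rknn_toolkit_lite2 is installed on the device; the zero frame stands in for a real letterboxed 640x640 RGB image):

import numpy as np
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('yolov8s.rknn')
rknn_lite.init_runtime()            # uses librknnrt on the device

# 640x640 NHWC frame; zeros here, a real test would feed a resized RGB image
img = np.zeros((640, 640, 3), dtype=np.uint8)
outputs = rknn_lite.inference(inputs=[img])
print(len(outputs), outputs[0].shape)
rknn_lite.release()

If that also fails at inference, the model/driver combination is the suspect; if it runs, the C-side input setup is.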