YOLOv8: custom RKNN model fails to produce detections; output tensor reports fmt=UNDEFINED

I'm trying to deploy a custom model: the .pt file is converted to ONNX, and the ONNX model is then converted to RKNN. Both the .pt and ONNX models produce the correct output on the host, but the RKNN model doesn't give the expected output on the RK3588 target; the output image is identical to the input image.

Here are the detailed steps:

yolov8 $ yolo export model=./best.pt imgsz=640,640 format=onnx opset=12
rknn_model_zoo $ python convert.py ../model/best.onnx rk3588
W __init__: rknn-toolkit2 version: 1.6.0+81f21f4d
--> Config model
done
--> Loading model
W load_onnx: It is recommended onnx opset 19, but your onnx model opset is 12!
W load_onnx: Model converted from pytorch, 'opset_version' should be set 19 in torch.onnx.export for successful convert!
Loading : 100%|███████████████████████████████████████████████| 186/186 [00:00<00:00, 100989.07it/s]
done
--> Building model
W build: found outlier value, this may affect quantization accuracy
const name               abs_mean    abs_std     outlier value
model.0.conv.weight      5.34        7.41        62.182      
GraphPreparing : 100%|██████████████████████████████████████████| 227/227 [00:00<00:00, 1283.46it/s]
Quantizating : 100%|██████████████████████████████████████████████| 227/227 [00:13<00:00, 17.00it/s]
W build: The default input dtype of 'images' is changed from 'float32' to 'int8' in rknn model for performance!
                       Please take care of this change when deploy rknn model with Runtime API!
W build: The default output dtype of 'output0' is changed from 'float32' to 'int8' in rknn model for performance!
                      Please take care of this change when deploy rknn model with Runtime API!
done
--> Export rknn model
done

On the RK3588 target:

rknn_yolov8_demo$ ./rknn_yolov8_demo /mnt/leaf/yolov8.rknn /mnt/leaf/plant.jpg 
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 1
input tensors:
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
  index=0, name=output0, n_dims=3, dims=[1, 6, 8400, 0], n_elems=50400, size=50400, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=-128, scale=2.533227
model is NHWC input fmt
model input height=640, width=640, channel=3
origin size=640x640 crop size=640x640
input image: 640 x 640, subsampling: 4:2:0, colorspace: YCbCr, orientation: 1
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
src width=640 height=640 fmt=0x1 virAddr=0x0xaaaaffff7940 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0xaaab00123950 fd=0
src_box=(0 0 639 639)
dst_box=(0 0 639 639)
color=0x72
rga_api version 1.10.0_[2]
rknn_run
write_image path: out.png width=640 height=640 channel=3 data=0xaaaaffff7940

However, the default YOLOv8 ONNX model referenced in rknn_model_zoo works as expected after the same conversion:

rknn_yolov8_demo$ ./rknn_yolov8_demo /mnt/leaf/yolov8-default.rknn /mnt/leaf/plant.jpg 
load lable ./model/coco_80_labels_list.txt
model input num: 1, output num: 9
input tensors:
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
output tensors:
  index=0, name=318, n_dims=4, dims=[1, 64, 80, 80], n_elems=409600, size=409600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-56, scale=0.110522
  index=1, name=onnx::ReduceSum_326, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003452
  index=2, name=331, n_dims=4, dims=[1, 1, 80, 80], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003482
  index=3, name=338, n_dims=4, dims=[1, 64, 40, 40], n_elems=102400, size=102400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-17, scale=0.098049
  index=4, name=onnx::ReduceSum_346, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003592
  index=5, name=350, n_dims=4, dims=[1, 1, 40, 40], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003755
  index=6, name=357, n_dims=4, dims=[1, 64, 20, 20], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-49, scale=0.078837
  index=7, name=onnx::ReduceSum_365, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003817
  index=8, name=369, n_dims=4, dims=[1, 1, 20, 20], n_elems=400, size=400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003835
model is NHWC input fmt
model input height=640, width=640, channel=3
origin size=640x640 crop size=640x640
input image: 640 x 640, subsampling: 4:2:0, colorspace: YCbCr, orientation: 1
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
src width=640 height=640 fmt=0x1 virAddr=0x0xaaab0e0510c0 fd=0
dst width=640 height=640 fmt=0x1 virAddr=0x0xaaab0e17d0d0 fd=0
src_box=(0 0 639 639)
dst_box=(0 0 639 639)
color=0x72
rga_api version 1.10.0_[2]
rknn_run
vase @ (398 371 530 502) 0.801
potted plant @ (320 131 618 503) 0.798
write_image path: out.png width=640 height=640 channel=3 data=0xaaab0e0510c0

The most obvious difference I can see in the non-working case is that there is a single output tensor (instead of nine) and it reports fmt=UNDEFINED:

output tensors:
  index=0, name=output0, n_dims=3, dims=[1, 6, 8400, 0], n_elems=50400, size=50400, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=-128, scale=2.533227
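
As a sanity check on the numbers in that line, here is a minimal Python sketch. It assumes the usual affine formula real = (q - zp) * scale for qnt_type=AFFINE, which I have not verified against the RKNN runtime, and the helper names are made up for illustration. It confirms that n_elems matches 1 x 6 x 8400, and shows that with scale=2.533227 one int8 step is about 2.5 units, so a small value such as a 0.8 score would round to zero in this tensor.

```python
# Values copied from the logged output tensor of the custom model.
DIMS = (1, 6, 8400)   # n_dims=3, so the trailing 0 in the log is ignored
ZP = -128
SCALE = 2.533227


def num_elements(dims):
    """Return the product of the tensor dimensions."""
    n = 1
    for d in dims:
        n *= d
    return n


def dequantize(q, zp=ZP, scale=SCALE):
    """Assumed affine dequantization: real = (q - zp) * scale."""
    return (q - zp) * scale


def quantize(x, zp=ZP, scale=SCALE):
    """Assumed inverse mapping, clamped to the int8 range."""
    q = round(x / scale) + zp
    return max(-128, min(127, q))


if __name__ == "__main__":
    print("elements:", num_elements(DIMS))
    print("representable range:", dequantize(-128), "to", dequantize(127))
    # A 0.8 value is smaller than half a quantization step, so it is lost.
    print("0.8 after round trip:", dequantize(quantize(0.8)))
```

If this reading is right, the coarse scale on the single combined output may be worth checking alongside fmt=UNDEFINED, but that is only a guess on my side.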

Any idea where this is going wrong?

Thanks,
Jagan.