ROCK 5B Debug Party Invitation

Here are some notes from my attempt:

Kernel without SRAM

  • running with python3
  • your test.py was missing the time import; after adding import time I got this error:

rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:05:47.097] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:05:47.097] RKNN Driver Information: version: 0.8.2
I RKNN: [18:05:47.097] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
Traceback (most recent call last):
  File "/home/rock/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite/test.py", line 94, in <module>
    starttime = time()
TypeError: 'module' object is not callable
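
The fix (as admitted further down the thread) is to import the function rather than the module, since calling the time module object itself raises this TypeError:

# 'import time' binds the module, so time() is not callable;
# import the function instead:
from time import time

starttime = time()  # current time in seconds, as a float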

Kernel with SRAM

  • with untouched test.py

rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:15:58.610] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:15:58.612] RKNN Driver Information: version: 0.8.2
I RKNN: [18:15:58.613] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
E RKNN: [18:15:58.614] failed to allocate fd, ret: -1, errno: 22, errstr: Invalid argument
E RKNN: [18:15:58.614] failed to malloc npu memory!, size: 11959616, flags: 0x2
E RKNN: [18:15:58.614] rknn_init, load model failed!
E Catch exception when init runtime!
E Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/rknnlite/api/rknn_lite.py", line 148, in init_runtime
    self.rknn_runtime.build_graph(self.rknn_data, self.load_model_in_npu)
  File "rknnlite/api/rknn_runtime.py", line 840, in rknnlite.api.rknn_runtime.RKNNRuntime.build_graph
Exception: RKNN init failed. error code: RKNN_ERR_FAIL

Init runtime environment failed

Kernel log:

[ 290.501752] RKNPU: rknpu_mem_create_ioctl: malloc iommu memory unsupported in current!

I enabled this:

CONFIG_ROCKCHIP_RKNPU_DMA_HEAP=y

PS: I am a stranger to Python…

I don't know; my guess is you only did what's in https://github.com/rockchip-linux/rknpu2/blob/master/doc/RK3588_NPU_SRAM_usage.md

Add

syssram: sram@ff001000 {
    compatible = "mmio-sram";
    reg = <0x0 0xff001000 0x0 0xef000>;

    #address-cells = <1>;
    #size-cells = <1>;
    ranges = <0x0 0x0 0xff001000 0xef000>;
    /* allocate SRAM for the RKNPU */
    /* start address and size should be 4K aligned */
    rknpu_sram: rknpu_sram@0 {
        reg = <0x0 0xef000>; // 956KB
    };
};

Then convert to an overlay

rknpu: npu@fdab0000 {
    compatible = "rockchip,rk3588-rknpu";
    /* ... */
    /* add a reference to the RKNPU SRAM */
    rockchip,sram = <&rknpu_sram>;
    status = "disabled";
};
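
Converted to overlay source, it might look roughly like this (an untested sketch; it assumes the rknpu and rknpu_sram labels resolve against the base dtb, i.e. the sram@ff001000 node is already present there or added by another fragment):

/dts-v1/;
/plugin/;

/* hypothetical overlay: reference the SRAM carve-out and enable the NPU */
&rknpu {
    rockchip,sram = <&rknpu_sram>;
    status = "okay";
};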

Then tack that onto extlinux.conf to load the dtbo, I guess.
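
Something like this, perhaps (a minimal sketch; the overlay and file names here are assumptions, not tested):

# compile the overlay source to a blob
dtc -@ -I dts -O dtb -o rknpu-sram.dtbo rknpu-sram-overlay.dts

# /boot/extlinux/extlinux.conf -- illustrative entry
label kernel-5.10
    kernel /Image
    fdt /dtbs/rockchip/rk3588-rock-5b.dtb
    fdtoverlays /overlays/rknpu-sram.dtbo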

As for the errors, I don't know; it should be possible to run as a normal user without sudo, but I doubt that. And apologies, I forgot that the time import should be from time import time.

In https://github.com/rockchip-linux/rknpu2#140 it looks very much like SRAM would improve inference speed, but I have no idea about the kernel message RKNPU: rknpu_mem_create_ioctl: malloc iommu memory unsupported in current!

Thanks though

Python is a terrible language for noobs like me :slight_smile:

Used GEM instead of DMA:

  • First run without any optimization

--> Init runtime environment
I RKNN: [18:54:27.733] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:54:27.734] RKNN Driver Information: version: 0.8.2
I RKNN: [18:54:27.735] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done

  • Subsequent runs with the SRAM environment variables set:

rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ export RKNN_INTERNAL_MEM_TYPE=sram
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:56:43.499] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:56:43.499] RKNN Driver Information: version: 0.8.2
I RKNN: [18:56:43.499] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ export RKNN_INTERNAL_MEM_TYPE=sram#256
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:57:06.900] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:57:06.900] RKNN Driver Information: version: 0.8.2
I RKNN: [18:57:06.900] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ export RKNN_SEPARATE_WEIGHT_MEM=1
export RKNN_WEIGHT_MEM_TYPE=sram
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:57:21.863] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:57:21.864] RKNN Driver Information: version: 0.8.2
I RKNN: [18:57:21.864] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ export RKNN_SEPARATE_WEIGHT_MEM=1
export RKNN_WEIGHT_MEM_TYPE=sram#128
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:58:07.455] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:58:07.455] RKNN Driver Information: version: 0.8.2
I RKNN: [18:58:07.455] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ export RKNN_INTERNAL_MEM_TYPE=sram#256
export RKNN_SEPARATE_WEIGHT_MEM=1
export RKNN_WEIGHT_MEM_TYPE=sram#128
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ sudo python3 test.py
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [18:58:24.035] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:58:24.035] RKNN Driver Information: version: 0.8.2
I RKNN: [18:58:24.035] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$

It did not allocate and use SRAM:

cat /sys/kernel/debug/rknpu/mm
    SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
    [000] [................................]
    [001] [................................]
    [002] [................................]
    [003] [................................]
    [004] [................................]
    [005] [................................]
    [006] [................................]
    [007] [...............]
    SRAM total size: 978944, used: 0, free: 978944
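
Two guesses why the bitmap reads empty here: the debugfs bitmap only shows allocations currently held, so it will read used: 0 once the test process has exited; and variables exported in the shell are not passed through plain sudo, so the runtime under sudo may never have seen RKNN_INTERNAL_MEM_TYPE at all. Passing the variable through sudo explicitly would look like:

# pass the variable through sudo on the command line
sudo RKNN_INTERNAL_MEM_TYPE=sram python3 test.py
# or preserve the calling shell's exported environment
sudo -E python3 test.py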

[ 868.106264] RKNPU: allocate iova start: 0x00000000fe740000, size: 12091392
[ 868.118667] RKNPU: allocate size: 11960320 with sram size: 131072
[ 868.132467] RKNPU: allocate iova start: 0x00000000fe5a0000, size: 1417216
[ 868.133622] RKNPU: allocate size: 1155072 with sram size: 262144
[ 989.802333] RKNPU: allocate iova start: 0x00000000fd970000, size: 12939264
[ 989.815149] RKNPU: allocate size: 11960320 with sram size: 978944

@stuartiannaylor
Real-time detection is not possible to run, because the Radxa-shipped librknnrt version 1.2.0 (1867aec5b@2022-01-14T15:16:40) is older than 1.3.

Are the results what you expected?
Do you see the use of SRAM in your experiments?

No, I haven't made the DTS mods; I guess I should have, but thought it might make more sense to you.
I will maybe give it a go tomorrow and get back to you.
But from the run info:

I RKNN: [18:58:24.035] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [18:58:24.035] RKNN Driver Information: version: 0.8.2
I RKNN: [18:58:24.035] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW

We are running 1.4.0?

yes. I RKNN: [19:43:06.149] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)

I copied yolov5s-640-640.rknn from /home/rock/rockchip/npu/rknpu2/examples/rknn_yolov5_demo/model/RK3588/yolov5s-640-640.rknn. It should work, but I get this error:

E RKNN: [19:51:38.540] rknn_query, info_len(372) < sizeof(rknn_tensor_attr)(376)!
rknn_init error ret=-5

The rknpu2 demo works as expected:

./rknn_yolov5_demo ./model/RK3588/yolov5s-640-640.rknn bus.jpg
post process config: box_conf_threshold = 0.25, nms_threshold = 0.45
Read bus.jpg …
img width = 640, img height = 640
Loading mode…
sdk version: 1.4.0 (a10f100eb@2022-09-09T09:07:14) driver version: 0.8.2
model input num: 1, output num: 3
index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
index=0, name=output, n_dims=5, dims=[1, 3, 85, 80], n_elems=1632000, size=1632000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=77, scale=0.080445
index=1, name=371, n_dims=5, dims=[1, 3, 85, 40], n_elems=408000, size=408000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=56, scale=0.080794
index=2, name=390, n_dims=5, dims=[1, 3, 85, 20], n_elems=102000, size=102000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=69, scale=0.081305
model is NHWC input fmt
model input height=640, width=640, channel=3
once run use 35.951000 ms
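
(35.951 ms per inference works out to roughly 1000 / 35.951 ≈ 27.8 FPS on a single core.)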

I am not sure about the other examples, as it's a weird layout. The repo root is an x86 Ubuntu install of the toolkit used to convert models;
the rknn_toolkit_lite2 subfolder is the runtime for running on Debian and only has the one example.
I am not sure about the root examples.

I have just been playing with https://github.com/usefulsensors/openai-whisper, which is OpenAI's Whisper running on TensorFlow, but I don't know if the models will quantize to int8 or whether rknn-toolkit2 will create runnable models from them.
The CPU can probably run fp16 or fp32. It's yet another ML framework to get my head around, and conversion can be a dark art sometimes, so I need to do some reading and experiments.

It looks like the above is doing something similar, as it makes pseudo-float values out of integers: zp is the zero point, and then it scales with scale. But where it says fmt=UNDEFINED, maybe it was unable to handle those tensors?
I don't know, as I'm a complete noob currently with https://github.com/rockchip-linux/rknn-toolkit2
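
From what I gather, that zp/scale pair is standard affine quantization: the float value is recovered as scale * (q - zp). A quick sketch using the values printed for output tensor 0 in the demo output above:

import numpy as np

# Affine dequantization: real = scale * (quantized - zero_point).
# zp and scale taken from the demo output above (index=0, name=output).
zp, scale = 77, 0.080445

q = np.array([-128, 0, 77, 127], dtype=np.int8)
real = scale * (q.astype(np.float32) - zp)
print(real)  # 77 maps back to exactly 0.0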

The big "if" is whether the Whisper models will fully quantize, or will also end up with UNDEFINED tensors, as it could be such a good fit running the partitioned models on the three cores of the NPU with the leftovers maybe on GPU/CPU.
It is a really good SOTA ASR, and on a low-wattage rk3588 you can run it all on CPU, but that creates load while a more powerful (maybe 3x) NPU is available, if it will quantize to int8.

I will have a go, but it probably needs someone to recompile the kernel with CONFIG_ROCKCHIP_RKNPU_SRAM=y, and I haven't got anything set up yet.

Ahh, the API has changed; now it is working. I will check how to use SRAM.

root@rock5b:/home/rock/rockchip/camera/rknn_cam2# cat /sys/kernel/debug/rknpu/mm
SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
[000] [********************************]
[001] [********************************]
[002] [********************************]
[003] [********************************]
[004] [********************************]
[005] [********************************]
[006] [................................]
[007] [...............]
SRAM total size: 978944, used: 786432, free: 192512

What I did was just copy the original script https://github.com/rockchip-linux/rknn-toolkit2/blob/master/rknn_toolkit_lite2/examples/inference_with_lite/test.py

import cv2
import numpy as np
import platform
from rknnlite.api import RKNNLite
from time import time

# device tree for rk356x/rk3588
DEVICE_COMPATIBLE_NODE = '/proc/device-tree/compatible'

def get_host():
    # get platform and device type
    system = platform.system()
    machine = platform.machine()
    os_machine = system + '-' + machine
    if os_machine == 'Linux-aarch64':
        try:
            with open(DEVICE_COMPATIBLE_NODE) as f:
                device_compatible_str = f.read()
                if 'rk3588' in device_compatible_str:
                    host = 'RK3588'
                else:
                    host = 'RK356x'
        except IOError:
            print('Read device node {} failed.'.format(DEVICE_COMPATIBLE_NODE))
            exit(-1)
    else:
        host = os_machine
    return host

INPUT_SIZE = 224

RK356X_RKNN_MODEL = 'resnet18_for_rk356x.rknn'
RK3588_RKNN_MODEL = 'resnet18_for_rk3588.rknn'


def show_top5(result):
    output = result[0].reshape(-1)
    # softmax
    output = np.exp(output)/sum(np.exp(output))
    output_sorted = sorted(output, reverse=True)
    top5_str = 'resnet18\n-----TOP 5-----\n'
    for i in range(5):
        value = output_sorted[i]
        index = np.where(output == value)
        for j in range(len(index)):
            if (i + j) >= 5:
                break
            if value > 0:
                topi = '{}: {}\n'.format(index[j], value)
            else:
                topi = '-1: 0.0\n'
            top5_str += topi
    print(top5_str)


if __name__ == '__main__':

    host_name = get_host()
    if host_name == 'RK356x':
        rknn_model = RK356X_RKNN_MODEL
    elif host_name == 'RK3588':
        rknn_model = RK3588_RKNN_MODEL
    else:
        print("This demo cannot run on the current platform: {}".format(host_name))
        exit(-1)

    rknn_lite = RKNNLite()

    # load RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(rknn_model)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    #ori_img = cv2.imread('./space_shuttle_224.jpg')
    ori_img = cv2.imread('./orange-224.jpg')
    img = cv2.cvtColor(ori_img, cv2.COLOR_BGR2RGB)

    # init runtime environment
    print('--> Init runtime environment')
    # running on RK356x/RK3588 with Debian OS, no need to specify the target.
    if host_name == 'RK3588':
        ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0)
    else:
        ret = rknn_lite.init_runtime()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    starttime = time()
    print('--> Running model')
    for r in range(50):
        # Inference
        startinference = time()
        for i in range(1000):
            outputs = rknn_lite.inference(inputs=[img])

        print(1000 / (time() - startinference))
        show_top5(outputs)
        print('done ', time() - starttime)

    rknn_lite.release()

I just renamed and copied it to test_core0.py,
edited ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_1) and saved that as test_core1.py,
and made a 3rd one with ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_2), then just ran them at the same time in separate CLI windows (the copies were eventually named test.py.0 … test.py.3, as mapped below).
Also, supposedly you can run a model on all three cores at once with
ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2), but maybe it ends up the same, as the results seem very varied and did not scale well at all.
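
Rather than keeping four edited copies, a variant could pick the core mask from a command-line argument (a sketch; NPU_CORE_AUTO is assumed to exist as the default constant in RKNNLite):

import sys
from rknnlite.api import RKNNLite

# Select the NPU core mask from argv instead of editing copies of test.py,
# e.g. 'python3 test.py 0_1_2'. Missing/unknown arguments fall back to auto.
MASKS = {
    '0': RKNNLite.NPU_CORE_0,
    '1': RKNNLite.NPU_CORE_1,
    '2': RKNNLite.NPU_CORE_2,
    '0_1_2': RKNNLite.NPU_CORE_0_1_2,
}
arg = sys.argv[1] if len(sys.argv) > 1 else ''
core_mask = MASKS.get(arg, RKNNLite.NPU_CORE_AUTO)

rknn_lite = RKNNLite()
# ... rknn_lite.load_rknn(...) as in test.py ...
ret = rknn_lite.init_runtime(core_mask=core_mask)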

rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ python3 test.py.0
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [22:01:56.680] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [22:01:56.680] RKNN Driver Information: version: 0.8.2
I RKNN: [22:01:56.681] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ python3 test.py.1
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [22:01:59.933] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [22:01:59.933] RKNN Driver Information: version: 0.8.2
I RKNN: [22:01:59.933] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ python3 test.py.2
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [22:02:02.729] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [22:02:02.729] RKNN Driver Information: version: 0.8.2
I RKNN: [22:02:02.729] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$ python3 test.py.3
--> Load RKNN model
done
--> Init runtime environment
I RKNN: [22:02:06.881] RKNN Runtime Information: librknnrt version: 1.4.0 (a10f100eb@2022-09-09T09:07:14)
I RKNN: [22:02:06.881] RKNN Driver Information: version: 0.8.2
I RKNN: [22:02:06.882] RKNN Model Information: version: 1, toolkit version: 1.4.0-c15f5e0b(compiler version: 1.4.0 (c73777b51@2022-09-05T12:06:01)), target: RKNPU v2, target platform: rk3588, framework name: PyTorch, framework layout: NCHW
done
--> Running model
resnet18
-----TOP 5-----
[812]: 0.9996696710586548
[404]: 0.0002492684288881719
[657]: 1.632158637221437e-05
[833]: 1.0159346857108176e-05
[466 895]: 9.02384545042878e-06

done
rock@rock5b:~/rockchip/npu/rknn-toolkit2/rknn_toolkit_lite2/examples/inference_with_lite$

test.py.0 -> RKNNLite.NPU_CORE_0
test.py.1 -> RKNNLite.NPU_CORE_1
test.py.2 -> RKNNLite.NPU_CORE_2
test.py.3 -> RKNNLite.NPU_CORE_0_1_2

A screenshot with SRAM in use:


I just rough-hacked in an FPS counter and overall timing so I could compare a single core against multiple cores.

    starttime = time()
    print('--> Running model')
    for r in range(50):
        # Inference
        startinference = time()
        for i in range(1000):
            outputs = rknn_lite.inference(inputs=[img])

        print(1000 / (time() - startinference))
        show_top5(outputs)
        print('done ', time() - starttime)

ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2) didn't work out how I expected from the results returned.
I wonder if Radxa are going to add that as an overlay and enable the kernel config?
@jack


Some benchmarks in the above.

Presume it's the same as above; I only just read it and have already forgotten, but presume the core selection is on auto.

Not possible to run it.

I only have rknn_toolkit_lite2. Someone suggested changing it to the lite API as:

from rknnlite.api import RKNNLite as RKNN

es/onnx/yolov5/test.py
W Verbose file path is invalid, debug info will not dump to file.
--> Config model
Traceback (most recent call last):
  File "/home/rock/rockchip/npu/rknn-toolkit2/examples/onnx/yolov5/test.py", line 244, in <module>
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]])
AttributeError: 'RKNNLite' object has no attribute 'config'

Nope, it's the older one; apologies, I never noticed. It looks like we have just the one example for the new one.
The conversion API (config/build/export) only exists in the full rknn-toolkit2, and I haven't got round to running anything beyond using that toolkit to import and export models.

Still, though, the yolov5 FPS at 640x480 was quite impressive for a single core.

I keep meaning to find a model I can run on CPU/GPU/NPU to get a like-for-like comparison, but working out the frameworks is quite a task; it really comes down to finding a model that will fully convert to int8, which CNN models tend to do.
When you get recurrent models with LSTM or GRU layers, things start to say no.

I don’t think I’ve made any changes to the kernel that would affect this, but I am using CONFIG_ROCKCHIP_MULTI_RGA rather than RGA2, which builds the module rga3.

I guess I could detect the kernel driver in use and switch between job struct definitions…

There are two RGA configs: CONFIG_VIDEO_ROCKCHIP_RGA for rga2 and CONFIG_ROCKCHIP_MULTI_RGA for rga3. Should we disable the legacy one?

It doesn't matter if you have the original RGA or RGA2 enabled, as long as they aren't loaded… but I think it's safe to disable CONFIG_ROCKCHIP_RGA and CONFIG_ROCKCHIP_RGA2. CONFIG_VIDEO_ROCKCHIP_RGA is an upstream driver that does not seem to support RK3588, so it can also be disabled.
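
Putting that together, the RGA section of the kernel config would look something like this (a suggested sketch based on the advice above, not a tested config):

CONFIG_ROCKCHIP_MULTI_RGA=y
# CONFIG_ROCKCHIP_RGA is not set
# CONFIG_ROCKCHIP_RGA2 is not set
# CONFIG_VIDEO_ROCKCHIP_RGA is not set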


I still don't understand why we only have an x86 wheel for the conversion toolkit, but that is where things went wrong.
If you run the conversion on x86 and export yolov5s.rknn, you can just drag that onto your Rock 5B and put it in the same folder.
On the board, where we have the rknnlite module installed, you should then be able to run:

import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
from rknnlite.api import RKNNLite

ONNX_MODEL = 'yolov5s.onnx'
RKNN_MODEL = 'yolov5s.rknn'
IMG_PATH = './bus.jpg'
DATASET = './dataset.txt'

QUANTIZE_ON = True

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
IMG_SIZE = 640

CLASSES = ("person", "bicycle", "car", "motorbike ", "aeroplane ", "bus ", "train", "truck ", "boat", "traffic light",
           "fire hydrant", "stop sign ", "parking meter", "bench", "bird", "cat", "dog ", "horse ", "sheep", "cow", "elephant",
           "bear", "zebra ", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife ",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza ", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet ", "tvmonitor", "laptop	", "mouse	", "remote ", "keyboard ", "cell phone", "microwave ",
           "oven ", "toaster", "sink", "refrigerator ", "book", "clock", "vase", "scissors ", "teddy bear ", "hair drier", "toothbrush ")


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def xywh2xyxy(x):
    # Convert [x, y, w, h] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def process(input, mask, anchors):

    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    box_xy = sigmoid(input[..., :2])*2 - 0.5

    col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    grid = np.concatenate((col, row), axis=-1)
    box_xy += grid
    box_xy *= int(IMG_SIZE/grid_h)

    box_wh = pow(sigmoid(input[..., 2:4])*2, 2)
    box_wh = box_wh * anchors

    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs


def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes with box threshold. It's a bit different with origin yolov5 post process!

    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.

    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    boxes = boxes.reshape(-1, 4)
    box_confidences = box_confidences.reshape(-1)
    box_class_probs = box_class_probs.reshape(-1, box_class_probs.shape[-1])

    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score* box_confidences)[_class_pos]

    return boxes, classes, scores


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov5_post_process(input_data):
    masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
               [59, 119], [116, 90], [156, 198], [373, 326]]

    boxes, classes, scores = [], [], []
    for input, mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    boxes = xywh2xyxy(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(top, left, right, bottom))
        top = int(top)
        left = int(left)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


if __name__ == '__main__':

    # Create RKNN object

    rknn_lite = RKNNLite()

    # load RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(RKNN_MODEL)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')
    ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0)
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)    
    # Set inputs
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Inference
    print('--> Running model')
    outputs = rknn_lite.inference(inputs=[img])
    
    # post process
    input0_data = outputs[0]
    input1_data = outputs[1]
    input2_data = outputs[2]

    input0_data = input0_data.reshape([3, -1]+list(input0_data.shape[-2:]))
    input1_data = input1_data.reshape([3, -1]+list(input1_data.shape[-2:]))
    input2_data = input2_data.reshape([3, -1]+list(input2_data.shape[-2:]))

    input_data = list()
    input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))

    boxes, classes, scores = yolov5_post_process(input_data)

    img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if boxes is not None:
        draw(img_1, boxes, scores, classes)
    # show output
    # cv2.imshow("post process result", img_1)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()

    rknn_lite.release()

This should allow you to mess with cores and stuff. I really wish Rockchip would enable a wiki/discussions on these two libs; maybe users could set up a model zoo and code exchange.

I am starting to get there with converting models; the bit that trips me up is:
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]])
I sort of get it, as it's about the int8 input colorspace, but mean_values & std_values just aren't registering; they look like min/max to me?
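
From what I can tell reading the toolkit docs, they aren't a min/max range: mean_values/std_values describe a per-channel normalization, (pixel - mean) / std, applied to the input, so mean 0 and std 255 map 0..255 into 0.0..1.0:

import numpy as np

# What mean_values/std_values appear to do: per-channel normalization
# (pixel - mean) / std applied to the input inside the runtime.
mean = np.array([0.0, 0.0, 0.0])
std = np.array([255.0, 255.0, 255.0])

pixel = np.array([0.0, 128.0, 255.0])  # one sample value per channel
print((pixel - mean) / std)  # [0.  0.50196078  1.]  -- 0..255 scaled to 0..1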

Tested with:

  • export RKNN_INTERNAL_MEM_TYPE=sram#256
  • export RKNN_SEPARATE_WEIGHT_MEM=1
  • export RKNN_WEIGHT_MEM_TYPE=sram#128

I don't have any of these enabled. Must be some other kernel config.

I mean:
CONFIG_ROCKCHIP_RGA, CONFIG_ROCKCHIP_RGA2, and CONFIG_VIDEO_ROCKCHIP_RGA.
Only CONFIG_ROCKCHIP_MULTI_RGA is enabled:

CONFIG_ROCKCHIP_MULTI_RGA=y

@nyanmisaka, can you try it and see what you get?