ROCK 5B Debug Party Invitation

stuartiannaylor · November 20, 2022, 11:33pm

Nope, it's the older one, apologies, I never noticed. It looks like we only have the one example on the new one.
I haven't even got round to running anything yet; so far I have only used the toolkit (rknn-toolkit2) to import and export models.

Still though the yolov5 @640x480 fps was quite impressive for a single core.

I keep meaning to find a model to run on CPU/GPU/NPU to get a like-for-like comparison, but working out the frameworks is quite a task. The main thing really seems to be finding a model that will fully convert to int8, which CNN models tend to do.
When you get recurrent models with LSTM or GRU layers, the conversion starts to say no.
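
For anyone wondering what "convert to int8" means numerically, here is a minimal sketch of plain affine quantization in NumPy. This is the generic textbook scheme, not necessarily what rknn-toolkit2 does internally, and the function names are made up for illustration.

```python
import numpy as np


def quantize_int8(x):
    """Affine-quantize a float array to int8 (generic scheme, not RKNN-specific)."""
    x = np.asarray(x, dtype=np.float32)
    lo, hi = float(x.min()), float(x.max())
    # Guard against a constant tensor, which would otherwise give a zero scale.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    # Choose the zero point so that `lo` lands on (roughly) -128.
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round trip loses at most about one quantization step per value, which CNN weights and activations usually tolerate well.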

icecream95 · November 21, 2022, 3:57am

I don’t think I’ve made any changes to the kernel that would affect this, but I am using CONFIG_ROCKCHIP_MULTI_RGA rather than RGA2, which builds the module rga3.

I guess I could detect the kernel driver in use and switch between job struct definitions…
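
A rough sketch of what that detection could look like from userspace, assuming the driver shows up under /sys/module. The name rga3 comes from the post above; rga and rga2 are guesses at the older module names, and a built-in (=y) driver may not appear there at all.

```python
from pathlib import Path


def loaded_rga_modules(sys_module_dir="/sys/module"):
    """Return which candidate RGA module names exist under /sys/module.

    Only "rga3" is taken from the thread; "rga" and "rga2" are assumed names.
    """
    candidates = ("rga", "rga2", "rga3")
    base = Path(sys_module_dir)
    return [name for name in candidates if (base / name).is_dir()]
```

A caller could then pick the job struct layout based on whether "rga3" is in the returned list.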

nyanmisaka · November 21, 2022, 7:12am

There are two RGA configs CONFIG_VIDEO_ROCKCHIP_RGA for rga2 and CONFIG_ROCKCHIP_MULTI_RGA for rga3. Should we disable the legacy one?

icecream95 · November 21, 2022, 7:24am

It doesn’t matter if you have the original rga or rga2 enabled, as long as they aren’t loaded… but I think it’s safe to disable CONFIG_ROCKCHIP_RGA and CONFIG_ROCKCHIP_RGA2. CONFIG_VIDEO_ROCKCHIP_RGA is an upstream driver that does not seem to support RK3588, so can also be disabled.
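
To check which of these options a given kernel was built with, something like the sketch below can parse the kernel config text. Where that text comes from is an assumption: typically /boot/config-<version>, or /proc/config.gz if the kernel exposes it.

```python
def rga_config_state(config_text):
    """Report the value of each RGA-related option found in kernel config text."""
    options = (
        "CONFIG_ROCKCHIP_RGA",
        "CONFIG_ROCKCHIP_RGA2",
        "CONFIG_ROCKCHIP_MULTI_RGA",
        "CONFIG_VIDEO_ROCKCHIP_RGA",
    )
    # Options that are absent or commented out ("# ... is not set") count as "n".
    state = {name: "n" for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in state:
            state[key] = value
    return state
```

For example, `rga_config_state(open("/boot/config-5.10.110").read())` would work if that file exists on your image; the path here is only illustrative.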

stuartiannaylor · November 21, 2022, 3:40pm

I still don't understand why we only have an x86 wheel for the conversion toolkit, and that is where things went wrong.
If you run the conversion on x86 and export yolov5s.rknn, then copy that to your Rock 5B, just put it in the same folder as the script.
Provided the rknn_lite module is installed on the board, you should then be able to run:

import numpy as np
import cv2
from rknnlite.api import RKNNLite

ONNX_MODEL = 'yolov5s.onnx'
RKNN_MODEL = 'yolov5s.rknn'
IMG_PATH = './bus.jpg'
DATASET = './dataset.txt'

QUANTIZE_ON = True

OBJ_THRESH = 0.25
NMS_THRESH = 0.45
IMG_SIZE = 640

CLASSES = ("person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat", "traffic light",
           "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
           "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
           "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife",
           "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa",
           "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
           "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush")


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def xywh2xyxy(x):
    # Convert [x, y, w, h] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def process(input, mask, anchors):

    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    box_xy = sigmoid(input[..., :2])*2 - 0.5

    # Per-cell x (col) and y (row) offsets, each shaped (grid_h, grid_w)
    col = np.tile(np.arange(0, grid_w), grid_h).reshape(grid_h, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_w)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    grid = np.concatenate((col, row), axis=-1)
    box_xy += grid
    box_xy *= int(IMG_SIZE/grid_h)

    box_wh = pow(sigmoid(input[..., 2:4])*2, 2)
    box_wh = box_wh * anchors

    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs


def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes by the object threshold. This differs slightly from the original YOLOv5 post-processing.

    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.

    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    boxes = boxes.reshape(-1, 4)
    box_confidences = box_confidences.reshape(-1)
    box_class_probs = box_class_probs.reshape(-1, box_class_probs.shape[-1])

    _box_pos = np.where(box_confidences >= OBJ_THRESH)
    boxes = boxes[_box_pos]
    box_confidences = box_confidences[_box_pos]
    box_class_probs = box_class_probs[_box_pos]

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)
    _class_pos = np.where(class_max_score >= OBJ_THRESH)

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    scores = (class_max_score* box_confidences)[_class_pos]

    return boxes, classes, scores


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov5_post_process(input_data):
    masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
               [59, 119], [116, 90], [156, 198], [373, 326]]

    boxes, classes, scores = [], [], []
    for input, mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    boxes = xywh2xyxy(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]

        keep = nms_boxes(b, s)

        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.

    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects as [x1, y1, x2, y2].
        scores: ndarray, scores of objects.
        classes: ndarray, class indices of objects.
    """
    for box, score, cl in zip(boxes, scores, classes):
        # Boxes come out of xywh2xyxy as (left, top, right, bottom)
        left, top, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(left, top, right, bottom))
        left = int(left)
        top = int(top)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (left, top), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (left, top - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


if __name__ == '__main__':

    # Create RKNN object

    rknn_lite = RKNNLite()

    # load RKNN model
    print('--> Load RKNN model')
    ret = rknn_lite.load_rknn(RKNN_MODEL)
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')
    ret = rknn_lite.init_runtime(core_mask=RKNNLite.NPU_CORE_0)
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)

    # Set inputs
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    # Inference
    print('--> Running model')
    outputs = rknn_lite.inference(inputs=[img])
    
    # post process
    input0_data = outputs[0]
    input1_data = outputs[1]
    input2_data = outputs[2]

    input0_data = input0_data.reshape([3, -1]+list(input0_data.shape[-2:]))
    input1_data = input1_data.reshape([3, -1]+list(input1_data.shape[-2:]))
    input2_data = input2_data.reshape([3, -1]+list(input2_data.shape[-2:]))

    input_data = list()
    input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))

    boxes, classes, scores = yolov5_post_process(input_data)

    img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if boxes is not None:
        draw(img_1, boxes, scores, classes)
    # show output
    # cv2.imshow("post process result", img_1)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()

    rknn_lite.release()

Which should allow you to mess with cores and stuff. I really wish Rockchip would enable a wiki/discussions on these two libs, so users could set up a model zoo and code exchange.
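
If you do experiment with different core settings, a small timing helper makes the comparison less hand-wavy. This is only a sketch: measure_fps is a made-up helper, and it times whatever callable you hand it rather than knowing anything about the NPU.

```python
import time


def measure_fps(infer, runs=50, warmup=5):
    """Call `infer` repeatedly and return the average calls per second."""
    # Warm-up calls are excluded so one-off setup cost does not skew the result.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    # Avoid dividing by zero on a timer with coarse resolution.
    return runs / max(elapsed, 1e-9)


# Hypothetical usage with the script above:
# fps = measure_fps(lambda: rknn_lite.inference(inputs=[img]))
```

Re-run it after changing the core_mask passed to init_runtime to see what each setting gives you; note this measures inference only, not the NumPy post-processing.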

Where I am starting to get there with converting models is:
rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]])
I sort of get it, as the input is the colorspace as int8, but mean_values and std_values are just not registering, as they look like min/max to me?
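
As far as I understand the toolkit docs, these are not min/max but a per-channel normalization, (pixel - mean) / std, so mean 0 and std 255 simply scales 0..255 into 0..1. A small NumPy sketch of that arithmetic (the normalize helper is only for illustration, it is not part of rknn-toolkit2):

```python
import numpy as np


def normalize(pixels, mean_values, std_values):
    """Apply per-channel (pixel - mean) / std, broadcasting over the last axis."""
    pixels = np.asarray(pixels, dtype=np.float32)
    mean = np.asarray(mean_values, dtype=np.float32)
    std = np.asarray(std_values, dtype=np.float32)
    return (pixels - mean) / std


# One RGB pixel with channel values 0, 128 and 255:
pixel = [[0, 128, 255]]
zero_to_one = normalize(pixel, [0, 0, 0], [255, 255, 255])
minus_one_to_one = normalize(pixel, [127.5, 127.5, 127.5], [127.5, 127.5, 127.5])
```

So the values should match whatever preprocessing the model was trained with: mean 0 / std 255 for models expecting 0..1 input, mean 127.5 / std 127.5 for models expecting -1..1.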

avaf · November 21, 2022, 6:49pm

Test with:

export RKNN_INTERNAL_MEM_TYPE=sram#256
export RKNN_SEPARATE_WEIGHT_MEM=1
export RKNN_WEIGHT_MEM_TYPE=sram#128

avaf · November 21, 2022, 7:05pm

I don't have any of these enabled. It must be some other kernel config.

I mean CONFIG_ROCKCHIP_RGA, CONFIG_ROCKCHIP_RGA2 and CONFIG_VIDEO_ROCKCHIP_RGA are all disabled;
only CONFIG_ROCKCHIP_MULTI_RGA is enabled.

CONFIG_ROCKCHIP_MULTI_RGA=y

@nyanmisaka, can you try it and see what you get?

nyanmisaka · November 22, 2022, 4:21pm

Tried on Armbian Jammy with icecream95's patchset. Both hevc_rkmpp (8 & 10-bit) and h264_rkmpp work fine for me. I don't have the Rock 5B connected to a monitor, so I used an ffmpeg transcode to verify this.

RGA performs a pixel format conversion (nv12|na12->yuv420p) and copies the result back to memory.

rock@rock-5b:~/workspace/ffmpeg$ sudo ./ffmpeg -init_hw_device drm=dr:/dev/dri/renderD128 -c:v h264_rkmpp -i ~/Videos/jellyfish-10-mbps-hd-h264.mkv -an -sn -c:v libx264 -preset ultrafast -b:v 6M -maxrate 6M -y /tmp/1.mp4
ffmpeg version n4.4.2-1-g289a344 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.3.0-1ubuntu1~22.04)
  configuration: --arch=arm64 --toolchain=hardened --libdir=/usr/lib/aarch64-linux-gnu --incdir=/usr/include/aarch64-linux-gnu --prefix=/opt/ffbuild --enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-libx264
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, matroska,webm, from '/home/rock/Videos/jellyfish-10-mbps-hd-h264.mkv':
  Metadata:
    encoder         : libebml v1.2.0 + libmatroska v1.1.0
    creation_time   : 2016-02-06T04:00:51.000000Z
  Duration: 00:00:30.03, start: 0.000000, bitrate: 9955 kb/s
  Stream #0:0(eng): Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 1k tbn, 59.94 tbc (default)
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_rkmpp) -> h264 (libx264))
Press [q] to stop, [?] for help
[h264_rkmpp @ 0x55957e8520] Decoder noticed an info change (1920x1080), format=0
[libx264 @ 0x559587dae0] VBV maxrate specified, but no bufsize, ignored
[libx264 @ 0x559587dae0] using SAR=1/1
[libx264 @ 0x559587dae0] using cpu capabilities: ARMv8 NEON
[libx264 @ 0x559587dae0] profile Constrained Baseline, level 4.0, 4:2:0, 8-bit
[libx264 @ 0x559587dae0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=0:0:0 analyse=0:0 me=dia subme=0 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=0 intra_refresh=0 rc=abr mbtree=0 bitrate=6000 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=0
Output #0, mp4, to '/tmp/1.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0(eng): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 6000 kb/s, 29.97 fps, 19001 tbn (default)
    Metadata:
      encoder         : Lavc58.134.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 6000000/0/6000000 buffer size: 0 vbv_delay: N/A
frame=  899 fps=121 q=-1.0 Lsize=   20764kB time=00:00:29.99 bitrate=5670.7kbits/s dup=0 drop=1 speed=4.05x
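
For readers unfamiliar with the nv12 -> yuv420p step mentioned above, here is a CPU-side sketch of the 8-bit case in NumPy. It only illustrates the memory layouts (RGA does this in hardware), and it assumes even dimensions and a tightly packed buffer with no stride padding.

```python
import numpy as np


def nv12_to_yuv420p(frame, width, height):
    """Convert a packed NV12 buffer to planar YUV420P (I420).

    NV12:    Y plane, then one plane of interleaved U,V samples.
    YUV420P: Y plane, then a U plane, then a V plane.
    """
    frame = np.asarray(frame, dtype=np.uint8).ravel()
    y_size = width * height
    if frame.size != y_size * 3 // 2:
        raise ValueError("buffer size does not match width x height for NV12")
    y = frame[:y_size]
    uv = frame[y_size:].reshape(-1, 2)  # each row is one (U, V) pair
    u = uv[:, 0]
    v = uv[:, 1]
    return np.concatenate((y, u, v))
```

The luma plane is untouched; only the chroma samples are de-interleaved into separate planes.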

avaf · November 22, 2022, 5:24pm

Ahh, I tested it on kernel 5.10.66 and it worked. Kernel 5.10.110 does not. I will try it with linux-5.10-gen-rkr3.4 which has an updated mpp driver. Thanks.

avaf · November 22, 2022, 9:56pm

@icecream95

Confirmed. Your fix does not work for kernel 5.10.110 and 5.10.110-gen-rkr3.4.

Any workaround? Or is v3 the way?

Bruno · November 27, 2022, 8:47pm

Booted up Radxa Ubuntu server image today.
Installed Ubuntu desktop, and once I was in it asked me if I wanted to upgrade to 22.04.

I can confirm upgrading to 22.04 f***ed things up.

So it’s a bug

nicedude · November 28, 2022, 6:57am

Quick question: is rkmpp hardware decoding supposed to be working out of the box on the Debian Bullseye Radxa distribution? If not, has anyone achieved it with a custom kernel or some custom package?

stuartiannaylor · November 28, 2022, 1:38pm

Essentially the Rockchip BSP is a custom kernel with some custom packages, and devs here are trying various options; if you read back you will find info above.
Also, in the Discord server devs and community are sharing images where much has been added. The Radxa images are more conservative about bleeding-edge additions, so it's probably best to chat and ask in there.

nicedude · November 28, 2022, 2:34pm

Thanks for the Discord link! I think I'll switch to Armbian for the moment; if it can play 4K 60p videos smoothly, I'll be fine with that.

stake · December 20, 2022, 12:37pm

USA based dev here; late to the party.

Where can I get a Rock 5B with >= 16GB RAM at this point? Is Ameridroid the best bet?

stuartiannaylor · December 21, 2022, 2:29am

https://wiki.radxa.com/Buy

But yeah, Ameridroid, or go direct to Allnet.

dominik · December 21, 2022, 8:19am

If you are in the USA then Ameridroid seems to be the better option, but both are OK. I ordered this month from both of them, and it was about 3 working days for Ameridroid to ship to Nevada and about 5 for Allnet to ship to Europe.

Etienne · December 21, 2022, 2:36pm

Any chance you can send another Discord invite? This one is not valid anymore, it seems.

stuartiannaylor · December 24, 2022, 10:21am

apols haven’t been around

Konstantin_Lebedev · December 30, 2022, 9:36am

Hello, can you give brief step-by-step instructions on how you achieved the result?