NPU Run Test from the same system memory card: one board produces results, the other reports an error

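I have two ROCK 3A boards, both V1.31 but bought at different times: the older one has white silkscreen, the recently bought one has black silkscreen.
I flashed the image "rock-3a-debian-buster-xfce4-arm64-20220701-0158-gpt.img" to a memory card,
then followed https://wiki.radxa.com/Rock3/dev/npu-run-test to update the kernel and so on.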

With this card in the white-silkscreen board, the NPU Run Test produces results; in the black-silkscreen board, it fails with the following error:

Loading model …
rknn_init …
input tensors:
index=0 name=Preprocessor/sub:0 n_dims=4 dims=[1 300 300 3] n_elems=270000 size=270000 fmt=0 type=3 qnt_type=2 fl=0 zp=0 scale=0.007812
output tensors:
index=0 name=concat:0 n_dims=4 dims=[1 1917 1 4] n_elems=7668 size=30672 fmt=0 type=0 qnt_type=2 fl=0 zp=53 scale=0.089455
index=1 name=concat_1:0 n_dims=4 dims=[1 1917 91 1] n_elems=174447 size=697788 fmt=0 type=0 qnt_type=2 fl=0 zp=53 scale=0.143593
rknn_run
[ 233.789697] RKNPU: soft reset
E RKNN: [07:14:22.962] failed to submit!, op id: 1, op name: Conv:FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d/depthwise, flags: 0x5, task start: 0, task number: 15, run task counter: 0, int status: 0
rknn_run fail! ret=-1

$ uname -a
Linux rock-3a 4.19.193-48-rockchip-g04e835f38660 #rockchip SMP Thu Jul 14 12:15:37 UTC 2022 aarch64 GNU/Linux

NPU-related messages in the boot log:

$ dmesg|grep npu
[ 0.000000] OF: reserved mem: initialized node rknpu, compatible id shared-dma-pool
[ 0.076845] platform fde40000.npu: assigned reserved memory node rknpu
[ 0.144113] rockchip-pm-domain fdd90000.power-management:power-controller: Looking up pd_npu-supply from device tree
[ 0.144142] rockchip-pm-domain fdd90000.power-management:power-controller: Looking up pd_npu-supply property in node /power-management@fdd90000/power-controller failed
[ 1.093520] rk_gmac-dwmac fe010000.ethernet: clock input or output? (output).
[ 1.640500] vdd_npu: 500 <--> 1350 mV at 900 mV
[ 1.640755] vdd_npu: supplied by vcc3v3_sys
[ 1.961430] input: rk805 pwrkey as /devices/platform/fdd40000.i2c/i2c-0/0-0020/rk805-pwrkey/input/input0
[ 2.078887] rkisp_hw fdff0000.rkisp: max input:0x0@0fps
[ 2.813842] rockchip,bus bus-npu: Looking up bus-supply from device tree
[ 2.813939] rockchip,bus bus-npu: Linked as a consumer to regulator.12
[ 2.820136] rockchip,bus bus-npu: Failed to get leakage
[ 2.826349] rockchip,bus bus-npu: pvtm = 84857, form pvtm_value
[ 2.832538] rockchip,bus bus-npu: pvtm-volt-sel=1
[ 2.838894] rockchip,bus bus-npu: avs=0
[ 3.439456] input: rk-headset as /devices/platform/rk-headset/input/input1
[ 3.450408] iommu: Adding device fde40000.npu to group 0
[ 3.459581] RKNPU fde40000.npu: Linked as a consumer to fde4b000.iommu
[ 3.469968] RKNPU fde40000.npu: RKNPU: rknpu iommu is enabled, using iommu mode
[ 3.477906] RKNPU fde40000.npu: Looking up rknpu-supply from device tree
[ 3.478099] RKNPU fde40000.npu: Linked as a consumer to regulator.15
[ 3.485009] RKNPU fde40000.npu: can't request region for resource [mem 0xfde40000-0xfde4ffff]
[ 3.492465] [drm] Initialized rknpu 0.4.2 20210701 for fde40000.npu on minor 1
[ 3.499208] RKNPU fde40000.npu: leakage=4
[ 3.506490] RKNPU fde40000.npu: avs=0
[ 3.513944] RKNPU fde40000.npu: l=0 h=2147483647 hyst=5000 l_limit=0 h_limit=0 h_table=0

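The white-silkscreen board has 2 GB of RAM and the black-silkscreen board has 8 GB. I solved the problem by modifying the device tree, mainly by deleting
the rknpu_reserved node. The change is as follows: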

diff --git a/arch/arm64/boot/dts/rockchip/rk3568-rock-3a.dts b/arch/arm64/boot/dts/rockchip/rk3568-rock-3a.dts
index e25c16e2c8f5..073f6f7915c8 100644
--- a/arch/arm64/boot/dts/rockchip/rk3568-rock-3a.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3568-rock-3a.dts
@@ -99,20 +99,6 @@
	        };
	};
 
-       reserved-memory {
-               #address-cells = <2>;
-               #size-cells = <2>;
-               ranges;
-
-               rknpu_reserved: rknpu {
-                       compatible = "shared-dma-pool";
-                       inactive;
-                       reusable;
-                       size = <0x0 0x20000000>;
-                       alignment = <0x0 0x1000>;
-               };
-       };
-
	rk809_sound: rk809-sound {
	        status = "okay";
	        compatible = "simple-audio-card";
@@ -721,7 +707,6 @@
 };
 
 &rknpu {
-       memory-region = <&rknpu_reserved>;
	rknpu-supply = <&vdd_npu>;
	status = "okay";
 };
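
For scale, the `size` and `alignment` cells of the removed node can be decoded by hand. This is a minimal sketch, assuming the usual devicetree convention that with `#size-cells = <2>` the two 32-bit cells form one 64-bit value, high cell first. The helper name `cells_to_u64` is made up for illustration.

```python
def cells_to_u64(high: int, low: int) -> int:
    """Combine two 32-bit devicetree cells into one 64-bit value (high cell first)."""
    if not (0 <= high <= 0xFFFFFFFF and 0 <= low <= 0xFFFFFFFF):
        raise ValueError("each cell must fit in 32 bits")
    return (high << 32) | low


# Values copied from the rknpu_reserved node in the diff above.
size_bytes = cells_to_u64(0x0, 0x20000000)   # size = <0x0 0x20000000>
align_bytes = cells_to_u64(0x0, 0x1000)      # alignment = <0x0 0x1000>

size_mib = size_bytes // (1024 * 1024)
align_kib = align_bytes // 1024

print(f"reserved pool: {size_mib} MiB, alignment: {align_kib} KiB")
```

In other words, the node asks for a 512 MiB pool with 4 KiB alignment.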

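This is only a temporary workaround; the root cause still seems to be related to the rknpu_reserved settings.
Do the black-silkscreen ROCK 3A boards you tested include the 8 GB version?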

You need to update librknnrt.so to version 1.3.0.
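
One way to see which runtime version is installed is to look for a version banner inside the library file. The sketch below is hedged: it assumes the library embeds a string of the form `librknnrt version: X.Y.Z`, which is not confirmed anywhere in this thread, and the function names are hypothetical. The install path is deliberately left for you to supply.

```python
import re

# Assumption: the runtime embeds a banner like "librknnrt version: 1.3.0 (...)".
# If your build words it differently, adjust this pattern.
VERSION_RE = re.compile(rb"librknnrt version: (\d+)\.(\d+)\.(\d+)")


def find_runtime_version(data: bytes):
    """Return (major, minor, patch) from raw library bytes, or None if no banner is found."""
    match = VERSION_RE.search(data)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=(1, 3, 0)) -> bool:
    """True if a version tuple was found and is at least `minimum`."""
    return version is not None and version >= minimum


def check_library(path: str) -> bool:
    """Read the library at `path` (you supply the location) and compare against 1.3.0."""
    with open(path, "rb") as handle:
        return is_new_enough(find_runtime_version(handle.read()))
```

Calling `check_library()` with the location of your `librknnrt.so` returns `True` only if a banner is found and reports 1.3.0 or later.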

I have also updated the NPU test package and the instructions. You can give it a try: https://wiki.radxa.com/Rock3/dev/npu-run-test

Tested and passing: version 1.3.0 works correctly on both boards.
rknpu_reserved is still kept, so the device tree no longer needs to be modified.

Also, for model/road.bmp, the new version gives better detection results.
:+1:

With the reserved memory, you can adjust how much memory is reserved for the NPU.

With a good DTS configuration you can use one of two options:

either disable the reserved memory for the NPU and enable rknpu_mmu,

or disable rknpu_mmu and enable the reserved memory for the NPU.
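
To see which of these is active on a running board, the `dmesg` output can be scanned for the two messages that appear in the boot log earlier in this thread. This is only a sketch: the matched substrings are copied from that log (driver 0.4.2) and may be worded differently in other driver versions, and the function name is hypothetical.

```python
def npu_memory_modes(dmesg_text: str) -> dict:
    """Report whether the boot log mentions the reserved pool and/or IOMMU mode for the NPU."""
    lines = dmesg_text.splitlines()
    return {
        # e.g. "platform fde40000.npu: assigned reserved memory node rknpu"
        "reserved_pool": any("assigned reserved memory node rknpu" in line for line in lines),
        # e.g. "RKNPU fde40000.npu: RKNPU: rknpu iommu is enabled, using iommu mode"
        "iommu": any("rknpu iommu is enabled" in line for line in lines),
    }


# Two lines taken from the boot log posted at the top of the thread.
sample = """\
[ 0.076845] platform fde40000.npu: assigned reserved memory node rknpu
[ 3.469968] RKNPU fde40000.npu: RKNPU: rknpu iommu is enabled, using iommu mode
"""

modes = npu_memory_modes(sample)
print(modes)
```

On the log from the first post both flags come out true, meaning the reserved pool and the IOMMU were both present in that configuration.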

So this problem turned out to be related to the NPU development environment, not the Linux kernel or anything like that?