Updated FFmpeg with MPP

Testing jellyfish-200-mbps-8k-uhd-hevc-10bit.mkv:

Input #0, matroska,webm, from '/apps/videos_rknn/jellyfish-200-mbps-8k-uhd-hevc-10bit.mkv':
  Metadata:
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: iso4hvc1iso6
    MAJOR_BRAND     : iso4
    ENCODER         : Lavf60.3.100
  Duration: 00:00:30.03, start: 0.000000, bitrate: 200189 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, progressive), 7680x4320 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 1k tbn (default)
    Metadata:
      HANDLER_NAME    : hevc@GPAC0.5.2-DEV-rev565-g71748d7-ab-suite
      ENCODER         : Lavc60.3.100 hevc_nvenc
      DURATION        : 00:00:30.030000000
[hevc_rkmpp_decoder @ 0x7f74007d00] Using partial libyuv soft conversion for yuv420p10le (7680x4320)
[hevc_rkmpp_decoder @ 0x7f74007d00] Pixfmt (yuv420p10le), Conversion (nv15[FBC]->p010le->yuv420p10le[LIBYUV])
rga_api version 1.8.1_[5] 0 aq=    0KB vq=15076KB sq=    0B f=0/0   
 125.95 M-V:  0.112 fd= 395 aq=    0KB vq=    0KB sq=    0B f=0/0   

Is this expected?

Yes. To use the P010 format you need this fix in the kernel, and this fix in rgamulti.

I would suggest using P010 instead of yuv420p10. P010 (set with the FFMPEG_RKMPP_PIXFMT=p010 environment variable) is much faster.

Which librga sources do you use?

Ah yeah rgamulti also needs to be fixed with a patch. I edited the previous comment with the links for completeness.

Now it’s fine.

FFMPEG_RKMPP_PIXFMT=p010  DISPLAY=:0.0 ./ffplay /apps/videos_rknn/jellyfish-200-mbps-8k-uhd-hevc-10bit.mkv 
ffplay version 2bcff74 Copyright (c) 2003-2023 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
  configuration: --prefix=/usr --disable-libopenh264 --disable-vaapi --disable-vdpau --disable-decoder=h264_v4l2m2m --disable-decoder=vp8_v4l2m2m --disable-decoder=mpeg2_v4l2m2m --disable-decoder=mpeg4_v4l2m2m --disable-libxvid --disable-libx264 --disable-libx265 --enable-rkmpp --enable-nonfree --enable-gpl --enable-version3 --enable-libmp3lame --enable-libpulse --enable-libv4l2 --enable-libdrm --enable-libxml2 --enable-librtmp --enable-libfreetype --enable-openssl --enable-opengl --enable-libopus --enable-libvorbis --disable-shared --enable-decoder='aac,ac3,flac' --extra-cflags=-I/usr/src/linux-headers-5.10.110-rk3588-v4l2/include --disable-cuvid
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
arm_release_ver: g13p0-01eac0, rk_so_ver: 9
Input #0, matroska,webm, from '/apps/videos_rknn/jellyfish-200-mbps-8k-uhd-hevc-10bit.mkv':
  Metadata:
    MINOR_VERSION   : 1
    COMPATIBLE_BRANDS: iso4hvc1iso6
    MAJOR_BRAND     : iso4
    ENCODER         : Lavf60.3.100
  Duration: 00:00:30.03, start: 0.000000, bitrate: 200189 kb/s
  Stream #0:0: Video: hevc (Main 10), yuv420p10le(tv, progressive), 7680x4320 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 1k tbn (default)
    Metadata:
      HANDLER_NAME    : hevc@GPAC0.5.2-DEV-rev565-g71748d7-ab-suite
      ENCODER         : Lavc60.3.100 hevc_nvenc
      DURATION        : 00:00:30.030000000
[hevc_rkmpp_decoder @ 0x7f5c007d00] Pixfmt (p010le), Conversion (nv15[FBC]->p010le)
rga_api version 1.9.3_[2] 0 aq=    0KB vq=15891KB sq=    0B f=0/0   

Can you share your 8k@60fps HDR file so I can see how smooth this is on ffplay?

Don't use ffplay if you want to test HDR files, because ffplay forces the output to yuv420p and cannot handle 10-bit output. You can test SDR files though.

To test HDR, use something like mpv or kodi.

Some 8k files (hdr & sdr):
https://drive.google.com/drive/folders/1TSdV36G_npDtjRJze54GEYpdxpBt7nCK
Jellyfish 8k files (hdr):
https://drive.google.com/drive/folders/1rKpr5soXvwx65v3fEDCOsXwmiWERz6xH?usp=sharing

mpv also requires the fixes below to play drm_prime properly. I mainlined the NV16 part, but P010 is probably still missing in mainline mpv.

From 504b2de386bffc0746c243b5f937fb0c5b7c4447 Mon Sep 17 00:00:00 2001
From: hbiyik <boogiepop@gmx.com>
Date: Thu, 28 Sep 2023 22:25:25 +0200
Subject: [PATCH 1/2] hwdec_drmprime: add nv16 support

NV16 is the half subsampled version of NV12 format. Decoders which
support High 4:2:2 of h264 provide the frame in NV16 format to establish
richer colorspace. Similar profiles are also available in HEVC and other
popular codecs. This commit allows NV16 frames to be displayed over
drmprime layers.

Signed-off-by: hbiyik <boogiepop@gmx.com>
---
 video/out/hwdec/dmabuf_interop_gl.c | 1 +
 video/out/hwdec/hwdec_drmprime.c    | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/video/out/hwdec/dmabuf_interop_gl.c b/video/out/hwdec/dmabuf_interop_gl.c
index bd33474289..e7fb1031a0 100644
--- a/video/out/hwdec/dmabuf_interop_gl.c
+++ b/video/out/hwdec/dmabuf_interop_gl.c
@@ -176,6 +176,7 @@ static bool vaapi_gl_map(struct ra_hwdec_mapper *mapper,
         if (p_mapper->desc.layers[i].nb_planes > 1) {
             switch (p_mapper->desc.layers[i].format) {
             case DRM_FORMAT_NV12:
+            case DRM_FORMAT_NV16:
                 format[0] = DRM_FORMAT_R8;
                 format[1] = DRM_FORMAT_GR88;
                 break;
diff --git a/video/out/hwdec/hwdec_drmprime.c b/video/out/hwdec/hwdec_drmprime.c
index 5051207413..290f11c535 100644
--- a/video/out/hwdec/hwdec_drmprime.c
+++ b/video/out/hwdec/hwdec_drmprime.c
@@ -29,6 +29,7 @@
 
 #include "libmpv/render_gl.h"
 #include "options/m_config.h"
+#include "video/fmt-conversion.h"
 #include "video/out/drm_common.h"
 #include "video/out/gpu/hwdec.h"
 #include "video/out/hwdec/dmabuf_interop.h"
@@ -117,6 +118,7 @@ static int init(struct ra_hwdec *hw)
     int num_formats = 0;
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_NV12);
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_420P);
+    MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_NV16));
     MP_TARRAY_APPEND(p, p->formats, num_formats, 0); // terminate it
 
     p->hwctx.hw_imgfmt = IMGFMT_DRMPRIME;
-- 
2.42.0


From 76a34a8f31fd69094adef09c4b2146da7a842a10 Mon Sep 17 00:00:00 2001
From: boogie <boogiepop@gmx.com>
Date: Wed, 1 Nov 2023 20:35:19 +0100
Subject: [PATCH 2/2] hwdec_drmprime: add p010 support

Removes the limitation that P010 DRMPrime Avframes were filtered out

Signed-off-by: hbiyik <boogiepop@gmx.com>
---
 video/out/hwdec/hwdec_drmprime.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/video/out/hwdec/hwdec_drmprime.c b/video/out/hwdec/hwdec_drmprime.c
index 290f11c535..9c63ab49ff 100644
--- a/video/out/hwdec/hwdec_drmprime.c
+++ b/video/out/hwdec/hwdec_drmprime.c
@@ -119,6 +119,7 @@ static int init(struct ra_hwdec *hw)
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_NV12);
     MP_TARRAY_APPEND(p, p->formats, num_formats, IMGFMT_420P);
     MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_NV16));
+    MP_TARRAY_APPEND(p, p->formats, num_formats, pixfmt2imgfmt(AV_PIX_FMT_P010));
     MP_TARRAY_APPEND(p, p->formats, num_formats, 0); // terminate it
 
     p->hwctx.hw_imgfmt = IMGFMT_DRMPRIME;
-- 
2.42.0

Isn’t this HDR? Won't the colors be wrong in ffplay?

Yes, 10bit=HDR, and with ffplay the colors will be downsampled to 8-bit on the CPU. Unless you have an HDR display you won't see a difference, but the problem is that ffplay does this conversion purely on the CPU, which kills the performance.

The best you can get out of ffplay is:

FFMPEG_RKMPP_PIXFMT=yuv420p ffplay pathtohdrfile.mp4

This will convert the HDR to SDR using RGA+libyuv, so you won't waste performance on the CPU.

ffplay simply cannot do HDR.

Got it. 450% CPU usage.

In which case? ffplay’s internal conversion?

In fact, 10-bit video is not necessarily HDR. Many BDrip videos are encoded in 10-bit SDR to avoid color banding and improve quality. The HDR10 (PQ EOTF/smpte2084) specification requires 10-bit.
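
To make the 10-bit versus HDR distinction concrete, here is a minimal sketch that classifies a video stream from ffprobe's JSON stream description. It assumes the stream reports `pix_fmt` and `color_transfer`, and that a PQ (`smpte2084`) or HLG (`arib-std-b67`) transfer means HDR. The helper names and the pixel-format heuristic are illustrative, not part of FFmpeg.

```python
import json
import subprocess
import sys

# Transfer characteristics treated as HDR here (assumption): PQ and HLG.
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}


def classify_stream(stream):
    """Return e.g. '10-bit HDR' or '10-bit SDR' for an ffprobe stream dict."""
    pix_fmt = stream.get("pix_fmt", "")
    # Heuristic only: covers names like yuv420p10le and p010le.
    is_10bit = pix_fmt.startswith("p010") or pix_fmt.endswith(("p10le", "p10be"))
    is_hdr = stream.get("color_transfer") in HDR_TRANSFERS
    depth = "10-bit" if is_10bit else "8-bit"
    return f"{depth} {'HDR' if is_hdr else 'SDR'}"


def probe_first_video_stream(path):
    """Ask ffprobe for the first video stream of `path` as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_streams", "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"][0]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(classify_stream(probe_first_video_stream(sys.argv[1])))
```

A 10-bit BDrip with a `bt709` transfer would come out as "10-bit SDR" here, while an HDR10 file tagged `smpte2084` would come out as "10-bit HDR".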

Yes.

FFMPEG_RKMPP_PIXFMT=yuv420p ffplay pathtohdrfile.mp4

is about 110% CPU usage.

Correct, the proper term is 10-bit. But HDR requires 10-bit, so the two are used interchangeably to simplify. 10-bit it is, then.

Makes sense, because an 8K 10-bit frame takes around 100 MB. At 60 fps you need to process approximately 5.5 to 6 GB of data each second, and this requires specialized hardware.

The real bottleneck here is actually RAM, check the memory usage :)
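
The numbers above can be checked with a quick calculation. This sketch assumes a 4:2:0 layout (1.5 samples per pixel) with each 10-bit sample stored in 16 bits, as in P010. The function names are illustrative.

```python
def frame_bytes(width, height, bytes_per_sample=2):
    """Bytes in one uncompressed 4:2:0 frame (1.5 samples per pixel)."""
    samples = width * height * 3 // 2  # luma plane plus quarter-size Cb and Cr
    return samples * bytes_per_sample


def bytes_per_second(width, height, fps, bytes_per_sample=2):
    """Raw frame data that must be moved each second at the given frame rate."""
    return frame_bytes(width, height, bytes_per_sample) * fps


if __name__ == "__main__":
    size = frame_bytes(7680, 4320)
    rate = bytes_per_second(7680, 4320, 60)
    print(f"frame:  {size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
    print(f"60 fps: {rate / 1e9:.2f} GB/s ({rate / 2**30:.2f} GiB/s)")
```

For 7680x4320 this gives 99,532,800 bytes per frame and about 5.97 GB/s (5.56 GiB/s) at 60 fps, which matches the 5.5 to 6 GB range depending on the unit.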

top - 11:17:33 up 44 min,  0 users,  load average: 1.46, 0.75, 0.35
Tasks: 249 total,   2 running, 247 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.6 us,  2.9 sy,  0.0 ni, 82.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15721.4 total,  11560.4 free,   2909.1 used,   1251.9 buff/cache
MiB Swap:   7860.7 total,   7860.7 free,      0.0 used.  12691.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
   2443 rock      20   0 1959936 162184 111460 R 109.9   1.0   0:18.93 ffplay   
    670 root      20   0 7376880 331052 260376 S  12.9   2.1   0:33.20 Xorg     
    193 root      rt   0       0      0      0 S   3.3   0.0   0:06.80 sugov:0  
   1242 rock      20   0 1792104  96424  73368 S   3.0   0.6   0:06.74 xfwm4    
    212 root       0 -20       0      0      0 I   1.7   0.0   0:02.26 kworker+ 
   1508 root       0 -20       0      0      0 I   1.7   0.0   0:02.10 kworker+ 
   1757 rock      20   0   18344   6132   3936 S   1.3   0.0   0:03.66 sshd     
    172 root      20   0       0      0      0 S   1.0   0.0   0:01.28 spi2     
    194 root      rt   0       0      0      0 S   1.0   0.0   0:02.27 sugov:4  
      7 root      20   0       0      0      0 I   0.3   0.0   0:01.09 kworker+ 
     11 root      20   0       0      0      0 I   0.3   0.0   0:01.40 rcu_sch+ 
     53 root      20   0       0      0      0 I   0.3   0.0   0:00.75 kworker+ 
    138 root      20   0       0      0      0 S   0.3   0.0   0:00.89 queue_w+ 
    196 root      rt   0       0      0      0 S   0.3   0.0   0:01.37 sugov:6  
    206 root     -51   0       0      0      0 S   0.3   0.0   0:00.54 irq/34-+ 
   1308 rock      20   0  313088   7056   6060 S   0.3   0.0   0:00.24 gvfs-af+ 
   1412 rock      20   0  730880  40504  31012 S   0.3   0.3   0:07.36 panel-8+

This is wrong because top does not show the mmapped memory space. The correct way would be to analyze the player's memory maps, for example for ffplay:

cat /proc/$(pidof ffplay)/maps

I do not know if there are tools that automate this.
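
As a rough way to automate this, here is a minimal sketch that sums the size of every mapping in a `/proc/<pid>/maps` listing, grouped by the name in the last column. It reports mapped address space, not resident memory. How buffer mappings are named (for example `anon_inode:dmabuf`) varies by kernel, so the sample name in the comment is an assumption.

```python
import sys
from collections import defaultdict


def mapped_bytes_by_name(maps_text):
    """Sum mapping sizes per pathname from the text of /proc/<pid>/maps."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        # Mappings without a pathname are anonymous memory.
        name = fields[5].strip() if len(fields) > 5 else "[anon]"
        totals[name] += end - start
    return dict(totals)


def read_maps(pid):
    """Read the raw maps listing of a running process."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    totals = mapped_bytes_by_name(read_maps(sys.argv[1]))
    # Largest mappings first, e.g. a line for anon_inode:dmabuf (name assumed).
    for name, size in sorted(totals.items(), key=lambda kv: -kv[1])[:15]:
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Run it with the player's PID as the only argument while playback is active.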

Don’t know if it helps to figure out how much memory is in use:

smem.txt.zip (1.8 KB)

Yes, for 8K 10-bit around 2.5 to 3 GB is necessary.

Great stuff!
I’ve just rebuilt my frigate docker yesterday with your ffmpeg ex_refactor_all branch. I use the encoder on just 1 camera so it’s using CPU as far as I can tell. Cameras are streaming 4k HEVC and those are just -c:v copied by ffmpeg. When it comes to the detection streams, these are at various resolutions and are all in HEVC and lower framerates (10 fps). Detection is done by google coral TPU and is at 3-11% cpu usage.

The Rock 5B is staying at a load of around 8.

Just saw you pushed a new commit with “best performing option”.

The encoder won't work in this branch. I have rewritten everything, and the encoder is still on the old architecture.