[FFmpeg] Introduce FFmpeg-Rockchip for hyper fast video transcoding via CLI

The old rkmpp works fine on kernel 6.1, and it uses RGA. If I recall correctly, you fixed alignment issues in the FFmpeg encoder back then.

What do you mean by old rkmpp? An old version of MPP?

Jeffy’s code rkmpp.c

Ahh, that old one. Yes, but all of those fixes are also in ffmpeg-rockchip. It is no longer in a form comparable with ffmpeg-rockchip. Even though it works, it used lots of hacks and basically did everything itself without relying on FFmpeg, RGA, or MPP. Maybe that's the reason it works, so better not to use it even though it works.

Maybe the hstride is wrong. It is a pity you burned your board, but I hope @nyanmisaka can reproduce it.

Btw, the 2nd picture in the encoding artifacts looks like a DMA sync issue, possibly with RGA.

To reproduce:

~/rockchip/ffmpeg/ffmpeg-rockchip/ffmpeg -f v4l2 -input_format nv12 -framerate 30 -video_size 1920x1080 -i /dev/video11 -c:v h264_rkmpp -qp_init 22 -movflags frag_keyframe+empty_moov+faststart -b:v 4000K -vprofile main -level:v 4.2 -vf format=yuv420p -r 25 -bufsize 600k cam-1.h264 -y

and then play back the stream.

In my tests, ffmpeg received 1080p nv12 input from the following sources. After applying the software format=yuv420p filter, it encoded fine, and there was no green line on my end.

The versions of mpp and librga used are

  1. 1080p video files (mp4, mkv, raw h264, raw hevc…)
# Prepare
cd ~/
curl -OL https://repo.jellyfin.org/jellyfish/media/jellyfish-15-mbps-hd-h264.mkv

# Software Decode H.264 file (yuv420p) -> Hardware Encode
ffmpeg -i ~/jellyfish-15-mbps-hd-h264.mkv -vf format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/1.mp4

# Hardware Decode H.264 file (nv12) -> RGA2 convert (yuv420p) -> Hardware Encode
ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -i ~/jellyfish-15-mbps-hd-h264.mkv \
-vf scale_rkrga=format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/2.mp4
  2. FFmpeg’s built-in test sources at 1080p (-f lavfi -i testsrc=s=1920x1080,format=nv12)
# Raw YUV from lavfi (yuv420p) -> Hardware Encode
ffmpeg -f lavfi -i testsrc=s=1920x1080,format=yuv420p -vf format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/3.mp4

# Raw YUV from lavfi (nv12) -> Software Convert (yuv420p) -> Hardware Encode
ffmpeg -f lavfi -i testsrc=s=1920x1080,format=nv12 -vf format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/4.mp4
  3. 1080p raw YUV files
# Prepare
ffmpeg -f lavfi -i testsrc=s=1920x1080,format=nv12 -t 5 -y ~/nv12.yuv
ffmpeg -f lavfi -i testsrc=s=1920x1080,format=yuv420p -t 5 -y ~/yuv420p.yuv

# Raw YUV file (yuv420p) -> Hardware Encode
ffmpeg -pix_fmt yuv420p -s 1920x1080 -i ~/yuv420p.yuv -vf format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/5.mp4

# Raw YUV file (nv12) -> Software Convert (yuv420p) -> Hardware Encode
ffmpeg -pix_fmt nv12 -s 1920x1080 -i ~/nv12.yuv -vf format=yuv420p -c:v h264_rkmpp -qp_init 22 -y /tmp/6.mp4
  4. 1080p HDMI RX input
    (Can’t test as of writing this, but I tested it in both 1080p and 4k not long ago)


From what I can find, /dev/video11 seems to be the RK-ISP used by the camera. I don’t have the hardware to verify it, and can’t say for sure what quirks it has.

Yep, NV12 from a MIPI camera. I don’t know if 12-bit modes (MEDIA_BUS_FMT_SRGGB12_1X12) could trigger this. Note: I think 2016x1080 is the 10-bit mode.

I will try to investigate a bit more.

But some feedback:

-- mpp latest commit --
Author: Yanjun Liao <yanjun.liao@rock-chips.com>
Date:   Wed Jul 24 15:47:54 2024 +0800

    fix[265e]:Fix the st refernce frame err in tsvc
    
    This is a bug caused by the mark and use of ltr frames,
    when presence of multiple short temporal refenence frame,
    may lead to errors in reference relationships.
    
    Change-Id: I1962d81e39b704086a51b4e4098ba3feb64c47c6
    Signed-off-by: Yanjun Liao <yanjun.liao@rock-chips.com>

-- rga-multi latest commit --
Author: Yu Qiaowei <cerf.yu@rock-chips.com>
Date:   Thu Aug 29 15:01:34 2024 +0800

    normal: fix wrong full_csc_clip size in memcpy
    
    This causes the gauss mode to be checked when using full_csc with driver
    versions 1.3.5 and above.
    
    update to 1.10.1_[3]
    
    Signed-off-by: Yu Qiaowei <cerf.yu@rock-chips.com>
    Change-Id: I40c3c503c68a77f2c435a8932de0195b447d1d46

-- ffmpeg-rockchip latest commit --
Author: nyanmisaka <nst799610810@gmail.com>
Date:   Wed Oct 23 21:42:03 2024 +0800

    fixup! lavf/rkrga: add RKRGA scale, vpp and overlay filter
    
    fix nv24/nv42 check on rga2p
    
    Signed-off-by: nyanmisaka <nst799610810@gmail.com>

works, but dma-buf issue:

[AVFilterGraph @ 0xaaaae71af360] No such filter: 'scale_rkrga'

Have I missed some build parameters?

works, but dma-buf issue.

Update: the dma-buf issue only occurs when rendering on screen (ffplay).

In the RK 6.1 kernel, their custom dma-heap driver was removed. It was replaced with the upstream dma-heap driver, which might be the problem. The patch added in my MPP branch will use the DRM allocator instead of dma-heap by default.

As for the missing scale_rkrga filter, check if --enable-rkrga is configured in your FFmpeg.

--enable-gpl --enable-version3 --enable-libdrm --enable-rkmpp --enable-rkrga
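To confirm that a given build really contains the filter, the list printed by `ffmpeg -filters` can be searched; the filter name is the second column of each row. This is only a sketch: `has_filter`, `check_rkrga`, and the `FFMPEG` variable are illustrative names and not part of ffmpeg-rockchip.

```shell
#!/bin/sh
# has_filter NAME: read `ffmpeg -filters` output on stdin and succeed
# only if a filter called exactly NAME is listed (name is column 2).
has_filter() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# check_rkrga: query the ffmpeg binary under test and report the result.
# FFMPEG is a hypothetical override; it defaults to `ffmpeg` from PATH.
check_rkrga() {
    if "${FFMPEG:-ffmpeg}" -hide_banner -filters 2>/dev/null | has_filter scale_rkrga; then
        echo "scale_rkrga is available"
    else
        echo "scale_rkrga is missing: rebuild with --enable-rkrga"
    fi
}
```

Source the file and call `check_rkrga`, for example with `FFMPEG=./ffmpeg check_rkrga` from the build directory, so that the freshly built binary is tested rather than a system-wide one.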

This fixed the 1px problem, and possibly the dma-buf issue (I think). Thanks!

PSA: please always use the MPP and RGA branches I mentioned in the Wiki and the latest ffmpeg-rockchip. Those small fixes are carefully selected and verified in Jellyfin Server.

Just for the record, I still need to test with your branches. I just tested a UVC webcam that outputs H.264 and H.265 with the Rockchip branches, and the 1-pixel line (it looks more like 8 pixels) is back.
In order to decode HEVC from the webcam I added this:

diff --git a/libavdevice/v4l2-common.c b/libavdevice/v4l2-common.c
index 1926179..d169cba 100644
--- a/libavdevice/v4l2-common.c
+++ b/libavdevice/v4l2-common.c
@@ -57,6 +57,9 @@ const struct fmt_map ff_fmt_conversion_table[] = {
 #ifdef V4L2_PIX_FMT_H264
     { AV_PIX_FMT_NONE,    AV_CODEC_ID_H264,     V4L2_PIX_FMT_H264    },
 #endif
+#ifdef V4L2_PIX_FMT_HEVC
+    { AV_PIX_FMT_NONE,    AV_CODEC_ID_H265,     V4L2_PIX_FMT_HEVC    },
+#endif
 #ifdef V4L2_PIX_FMT_MPEG4
     { AV_PIX_FMT_NONE,    AV_CODEC_ID_MPEG4,    V4L2_PIX_FMT_MPEG4   },
 #endif

I haven’t looked at libavdevice/v4l2-common.c, except to include a patch from rigaya to add nv16/nv24 formats. Please dump a frame as rawvideo and inspect it with YUView. Either it’s a problem in the upstream FFmpeg code, or RK needs to fix their kernel driver.

... -vframes 1 -f rawvideo /path/to/1.yuv
... -vframes 1 -f rawvideo /path/to/2.rgb
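Before opening the dump in YUView, a quick size check can tell whether the file holds exactly one tightly packed frame. The sketch below assumes 8-bit formats, even dimensions, and that the `.rgb` dump is `rgb24`, which the commands above do not state; `raw_frame_size` and `check_dump` are illustrative helper names.

```shell
#!/bin/sh
# raw_frame_size PIX_FMT WIDTH HEIGHT: bytes in one tightly packed frame.
raw_frame_size() {
    case "$1" in
        nv12|yuv420p) echo $(( $2 * $3 * 3 / 2 )) ;;  # 8-bit 4:2:0
        rgb24)        echo $(( $2 * $3 * 3 )) ;;      # 8-bit packed RGB (assumed)
        *) echo "unsupported pix_fmt: $1" >&2; return 1 ;;
    esac
}

# check_dump FILE PIX_FMT WIDTH HEIGHT: compare the size on disk with
# the expected size of a single frame.
check_dump() {
    expected=$(raw_frame_size "$2" "$3" "$4") || return 1
    actual=$(wc -c < "$1" | tr -d ' ') || return 1
    if [ "$actual" -eq "$expected" ]; then
        echo "$1: OK ($actual bytes)"
    else
        echo "$1: expected $expected bytes, got $actual"
        return 1
    fi
}
```

For a 1920x1080 NV12 or YUV420P dump the expected size is 3110400 bytes, e.g. `check_dump /path/to/1.yuv nv12 1920 1080`. A different size suggests the dump has other dimensions or another pixel format than assumed.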

FYI, I cloned the latest ffmpeg-rockchip, built it, and ran it again. get_format() is choosing NV12 (which is the wrong format; it should be yuv420p).

./ffplay -f hevc -vcodec hevc_rkmpp -i ~/ffmpeg_1920x1080_100frames.h265 -loglevel debug
ffplay version 9dbaf5a Copyright (c) 2003-2023 the FFmpeg developers
  built with gcc 12 (Ubuntu 12.3.0-1ubuntu1~22.04)
  configuration: --prefix=/usr --disable-libopenh264 --disable-vaapi --disable-vdpau --disable-decoder=h264_v4l2m2m --disable-decoder=vp8_v4l2m2m --disable-decoder=mpeg2_v4l2m2m --disable-decoder=mpeg4_v4l2m2m --disable-libxvid --disable-libx264 --disable-libx265 --enable-rkmpp --enable-nonfree --enable-gpl --enable-version3 --enable-libmp3lame --enable-libpulse --enable-libv4l2 --enable-libdrm --enable-libxml2 --enable-librtmp --enable-libfreetype --enable-openssl --enable-opengl --enable-libopus --enable-libvorbis --disable-shared --enable-decoder='aac,ac3,flac' --disable-cuvid --enable-rkrga
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
Initialized opengles2 renderer.
[hevc @ 0xffff5c000c20] Opening '/home/rock/ffmpeg_1920x1080_100frames.h265' for reading
[file @ 0xffff5c001290] Setting default whitelist 'file,crypto,data'
[hevc @ 0xffff5c000c20] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:1
[hevc @ 0xffff5c009a10] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] Decoding VPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding SPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding VUI
[hevc @ 0xffff5c009a10] Decoding PPS
[extract_extradata @ 0xffff5c01cde0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[extract_extradata @ 0xffff5c01cde0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[extract_extradata @ 0xffff5c01cde0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[extract_extradata @ 0xffff5c01cde0] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] Decoding VPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding SPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding VUI
[hevc @ 0xffff5c009a10] Decoding PPS
[hevc @ 0xffff5c009a10] Format yuvj420p chosen by get_format().
[hevc @ 0xffff5c009a10] Output frame with POC 0.
[hevc @ 0xffff5c009a10] Decoded frame with POC 0.
[hevc @ 0xffff5c009a10] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] Decoding VPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding SPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding VUI
[hevc @ 0xffff5c009a10] Decoding PPS
[hevc @ 0xffff5c009a10] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
    Last message repeated 2 times
[hevc @ 0xffff5c009a10] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0xffff5c009a10] Decoding VPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding SPS
[hevc @ 0xffff5c009a10] Main profile bitstream
[hevc @ 0xffff5c009a10] Decoding VUI
[hevc @ 0xffff5c009a10] Decoding PPS
[hevc @ 0xffff5c009a10] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
    Last message repeated 44 times
[hevc @ 0xffff5c000c20] All info found
[hevc @ 0xffff5c000c20] After avformat_find_stream_info() pos: 524288 bytes read:524288 seeks:0 frames:50
Input #0, hevc, from '/home/rock/ffmpeg_1920x1080_100frames.h265':
  Duration: N/A, bitrate: N/A
  Stream #0:0, 50, 1/1200000: Video: hevc (Main), 1 reference frame, yuvj420p(pc, bt709, left), 1920x1080 (1920x1088), 0/1, 25 fps, 25 tbr, 1200k tbn
[hevc_mp4toannexb @ 0xffff5c01cde0] The input looks like it is Annex B already
[hevc_rkmpp @ 0xffff5c307780] Format nv12 chosen by get_format().
[hevc_rkmpp @ 0xffff5c307780] Created a RKMPP hardware device
[hevc_rkmpp @ 0xffff5c307780] Decoder flushing
[hevc_rkmpp @ 0xffff5c307780] Wrote 14425 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Wrote 16502 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Wrote 5682 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Noticed an info change
[hevc_rkmpp @ 0xffff5c307780] Format nv12 chosen by get_format().
[hevc_rkmpp @ 0xffff5c307780] Decoder options: deint=true afbc=0 fast_parse=true buf_mode=0
[hevc_rkmpp @ 0xffff5c307780] Configured with size: 1920x1080 | pix_fmt: nv12 | sw_pix_fmt: nv12
[hevc_rkmpp @ 0xffff5c307780] Wrote 14920 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Received a frame
Video frame changed from size:0x0 format:none serial:-1 to size:1920x1080 format:nv12 serial:1
detected 8 logical cores
[ffplay_buffer @ 0xffff540016b0] Setting 'video_size' to value '1920x1080'
[ffplay_buffer @ 0xffff540016b0] Setting 'pix_fmt' to value '23'
[ffplay_buffer @ 0xffff540016b0] Setting 'time_base' to value '1/1200000'
[ffplay_buffer @ 0xffff540016b0] Setting 'pixel_aspect' to value '0/1'
[ffplay_buffer @ 0xffff540016b0] Setting 'frame_rate' to value '25/1'
[ffplay_buffer @ 0xffff540016b0] w:1920 h:1080 pixfmt:nv12 tb:1/1200000 fr:25/1 sar:0/1
[auto_scale_0 @ 0xffff540020f0] w:iw h:ih flags:'' interl:0
[ffplay_buffersink @ 0xffff54001aa0] auto-inserting filter 'auto_scale_0' between the filter 'ffplay_buffer' and the filter 'ffplay_buffersink'
[AVFilterGraph @ 0xffff54003860] query_formats: 2 queried, 0 merged, 1 already done, 0 delayed
[auto_scale_0 @ 0xffff540020f0] picking yuv420p out of 5 ref:nv12 alpha:0
[auto_scale_0 @ 0xffff540020f0] w:1920 h:1080 fmt:nv12 sar:0/1 -> w:1920 h:1080 fmt:yuv420p sar:0/1 flags:0x00000004
[auto_scale_0 @ 0xffff540020f0] w:1920 h:1080 fmt:nv12 sar:0/1 -> w:1920 h:1080 fmt:yuv420p sar:0/1 flags:0x00000004
    Last message repeated 2 times
[hevc_rkmpp @ 0xffff5c307780] Wrote 48590 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Received a frame
Created 1920x1080 texture with SDL_PIXELFORMAT_IYUV.
[hevc_rkmpp @ 0xffff5c307780] Received a frame
    Last message repeated 2 times
[hevc_rkmpp @ 0xffff5c307780] Wrote 7279 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Wrote 7487 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Wrote 6723 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Received a frame
[hevc_rkmpp @ 0xffff5c307780] Wrote 6448 bytes to decoder
[hevc_rkmpp @ 0xffff5c307780] Received a frame
    Last message repeated 2 times

ffmpeg_1920x1080_100frames.h265.zip (1.0 MB)

This is expected behavior. The hardware decoders are designed to output semi-planar formats, namely NV12. YUV420P is only used by the software decoder.

ffmpeg-rockchip no longer glues the RGA filter and MPP decoder together, because this does not conform to FFmpeg conventions. You have to use the scale_rkrga filter in your code.

I think I get it, but how do I force auto_scale to pick NV12?

[auto_scale_0 @ 0xffff640022a0] w:1920 h:1080 fmt:nv12 sar:0/1 -> w:1920 h:1080 fmt:yuv420p sar:0/1 flags:0x00000004

-vf scale_rkrga=format=nv12

I tried to force nv12 but I got an error:

[ffplay_buffer @ 0xffff68001860] Setting 'video_size' to value '1920x1080'
[ffplay_buffer @ 0xffff68001860] Setting 'pix_fmt' to value '23'
[ffplay_buffer @ 0xffff68001860] Setting 'time_base' to value '1/1200000'
[ffplay_buffer @ 0xffff68001860] Setting 'pixel_aspect' to value '0/1'
[ffplay_buffer @ 0xffff68001860] Setting 'frame_rate' to value '25/1'
[ffplay_buffer @ 0xffff68001860] w:1920 h:1080 pixfmt:nv12 tb:1/1200000 fr:25/1 sar:0/1
[AVFilterGraph @ 0xffff680039c0] Setting 'format' to value 'nv12'
[auto_scale_0 @ 0xffff68002770] w:iw h:ih flags:'' interl:0
[Parsed_scale_rkrga_0 @ 0xffff68001dc0] auto-inserting filter 'auto_scale_0' between the filter 'ffplay_buffer' and the filter 'Parsed_scale_rkrga_0'
Impossible to convert between the formats supported by the filter 'ffplay_buffer' and the filter 'auto_scale_0'

ffplay does not support hardware filters. You can use MPV player instead.

MPV is not an option in my case. GStreamer and ffplay 4.4.2 decode and render it fine (hw decoder).
I suppressed the filter in ffplay and rendered the NV12 frame as a texture; it works, but the green bar still exists. I need to investigate the NV12 frame format. I will dig into it. Thanks for your information and time.

The green bar means that the actual pitch/stride or offset of the decoder output image does not match the one you calculated. You have to fix it in your code.
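To illustrate how such a mismatch can arise, here is a minimal sketch of NV12 buffer arithmetic. The 16-pixel alignment is an assumption chosen because the log above reports a coded size of 1920x1088 for the 1920x1080 stream; the real pitch and offsets should be read from the frame the decoder returns, not computed. `align_up` and `nv12_layout` are illustrative names.

```shell
#!/bin/sh
# align_up VALUE ALIGN: round VALUE up to the next multiple of ALIGN.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# nv12_layout WIDTH HEIGHT ALIGN: print the horizontal stride, vertical
# stride, byte offset of the interleaved UV plane, and total buffer size
# for an 8-bit NV12 buffer whose strides are both aligned to ALIGN.
nv12_layout() {
    hstride=$(align_up "$1" "$3")
    vstride=$(align_up "$2" "$3")
    uv_offset=$(( hstride * vstride ))              # UV starts after the padded Y plane
    total=$(( uv_offset + hstride * vstride / 2 ))  # UV plane is half the Y plane
    echo "hstride=$hstride vstride=$vstride uv_offset=$uv_offset total=$total"
}

# Layout with padded rows (assumed 16 alignment).
nv12_layout 1920 1080 16
# prints: hstride=1920 vstride=1088 uv_offset=2088960 total=3133440

# Layout a renderer would assume for a tightly packed frame.
nv12_layout 1920 1080 1
# prints: hstride=1920 vstride=1080 uv_offset=2073600 total=3110400
```

Under these assumptions the two UV offsets differ by 15360 bytes, which is 8 rows of 1920 bytes. A renderer that reads chroma from the tightly packed offset would pick up padding rows instead of real chroma data, which would be consistent with a bar about 8 pixels tall.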