Question: ffmpeg encoding/decoding with VPU ... V4L2M2M?

I am trying to work out how to do video encoding/decoding with ffmpeg using the VPU.

Radxa’s Debian 12 image (b3) comes with cix-ffmpeg installed.

$ dpkg -l | grep cix-ffmpeg
ii  cix-ffmpeg                                     1.0.0                               arm64        cix-ffmpeg package

From what I can tell, the encoder and decoder are exposed through v4l2m2m:

$ ffmpeg -hide_banner -encoders | grep v4l2
DSP API version: DSP Wrapper Build On Jan  7 2025 22:06:04 eb4a506
 V..... h263_v4l2m2m         V4L2 mem2mem H.263 encoder wrapper (codec h263)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
 V..... hevc_v4l2m2m         V4L2 mem2mem HEVC encoder wrapper (codec hevc)
 V..... mpeg4_v4l2m2m        V4L2 mem2mem MPEG4 encoder wrapper (codec mpeg4)
 V..... vp8_v4l2m2m          V4L2 mem2mem VP8 encoder wrapper (codec vp8)

$ ffmpeg -hide_banner -decoders | grep v4l2
DSP API version: DSP Wrapper Build On Jan  7 2025 22:06:04 eb4a506
 V..... h263_v4l2m2m         V4L2 mem2mem H.263 decoder wrapper (codec h263)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 decoder wrapper (codec h264)
 V..... hevc_v4l2m2m         V4L2 mem2mem HEVC decoder wrapper (codec hevc)
 V..... mpeg1_v4l2m2m        V4L2 mem2mem MPEG1 decoder wrapper (codec mpeg1video)
 V..... mpeg2_v4l2m2m        V4L2 mem2mem MPEG2 decoder wrapper (codec mpeg2video)
 V..... mpeg4_v4l2m2m        V4L2 mem2mem MPEG4 decoder wrapper (codec mpeg4)
 V..... vc1_v4l2m2m          V4L2 mem2mem VC1 decoder wrapper (codec vc1)
 V..... vp8_v4l2m2m          V4L2 mem2mem VP8 decoder wrapper (codec vp8)
 V..... vp9_v4l2m2m          V4L2 mem2mem VP9 decoder wrapper (codec vp9)

I think the VPU is exposed as /dev/video3 and /dev/video4:

$ v4l2-ctl --list-devices
...
Linlon Video device (platform:mvx):
        /dev/video3
        /dev/video4
...
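
To double-check which nodes belong to the VPU without v4l2-ctl, the kernel also exposes a name for each node under /sys/class/video4linux. A minimal sketch; the function name list_v4l2_nodes and its optional directory argument are my own for illustration, and the sysfs name usually, but not necessarily, matches the card name that v4l2-ctl prints:

```shell
# Print "<device node>: <name>" for every V4L2 node found in sysfs.
# The optional first argument (sysfs directory) is an illustrative
# assumption so the function can be pointed at another directory.
list_v4l2_nodes() {
    sysfs_dir="${1:-/sys/class/video4linux}"
    for entry in "$sysfs_dir"/video*; do
        # Skip the unexpanded glob when no nodes exist.
        [ -r "$entry/name" ] || continue
        printf '/dev/%s: %s\n' "${entry##*/}" "$(cat "$entry/name")"
    done
}
```

Running list_v4l2_nodes | grep -i linlon should then narrow the list down to the VPU nodes, assuming the driver registers them under that name.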

Using ffmpeg's software encoder libx264, I see around 30% CPU usage for the process in top with the following command:

ffmpeg  -threads 2 -c:v:1 libx264 -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv

When I try the hardware-accelerated encoder h264_v4l2m2m, I still see 30% CPU usage in top, which suggests it has fallen back to software/CPU processing:

ffmpeg  -threads 2 -c:v:1 h264_v4l2m2m -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv
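
One thing worth noting about the two commands above: ffmpeg applies options to the next file named on the command line, so a -c:v placed before -i selects a decoder and a -c:v placed after -i selects an encoder. The specifier -c:v:1 also addresses the second video stream rather than the first. A sketch of how I understand the placement, assuming an H.264 input; the function name and the FFMPEG override variable are hypothetical, and I have not verified hardware encoding on this VPU:

```shell
# Run ffmpeg with the V4L2 M2M wrapper named on both sides of -i.
# Arguments: input file, output file.
# FFMPEG is a hypothetical override for the binary to run.
hw_transcode() {
    # Before -i: -c:v picks the DECODER for the input's video streams.
    # After  -i: -c:v picks the ENCODER for the output's video stream.
    "${FFMPEG:-ffmpeg}" -hide_banner \
        -c:v h264_v4l2m2m -i "$1" \
        -c:v h264_v4l2m2m "$2"
}
```

For example, hw_transcode in.mp4 out.mp4 would request hardware decode and hardware encode in one pass; the encoder may well need extra options such as a bitrate or pixel format.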

So the question is: how do we do hardware video encoding/decoding using the VPU?

Some further debugging shows ffmpeg falling back to software decoding, while the h264_v4l2m2m decoder eventually fails with an error:

[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x733a830] Terminating thread with return code 0 (success)
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Decoder thread received EOF packet
[h264_v4l2m2m @ 0x7342370] output stop_decode
[h264_v4l2m2m @ 0x7342370] capture POLLERR
Assertion pkt failed at fftools/ffmpeg_dec.c:710
Aborted

In the O6 Debug forum thread I found a patch from nyanmisaka, applied it to ffmpeg v7.1.1, and built it. Now the following command runs with only 3% CPU usage for the process:

ffmpeg  -loglevel debug -threads 2 -c:v h264_v4l2m2m -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv

However, it seems the hardware decoder does not support -stream_loop, so the input only plays through once:

[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Decoder returned EOF, resetting
[Parsed_fps_0 @ 0xffffb0001730] Read frame with in pts 388000, out pts 78
[Parsed_fps_0 @ 0xffffb0001730] Writing frame with pts 77 to pts 77
[h264 @ 0xaaaaf24ed440] ct_type:0 pic_struct:0
time=00:00:15.20 bitrate=54498.4kbits/s speed=1.01x
[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Decoder returned EOF, finishing
[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Terminating thread with return code 0 (success)
[Parsed_fps_0 @ 0xffffb0001730] EOF is at pts 78
[Parsed_fps_0 @ 0xffffb0001730] Dropping frame with pts 78
[Parsed_scale_1 @ 0xffffb00019a0] [framesync @ 0xffffb0001a78] Sync level 0
[out_#0:0 @ 0xffffb000f940] EOF on sink link out_#0:0:default.
[vf#0:0 @ 0xaaaaf2451c50] Filtergraph returned EOF, finishing
[vf#0:0 @ 0xaaaaf2451c50] All consumers returned EOF
[vost#0:0/rawvideo @ 0xaaaaf24515c0] Encoder thread received EOF
[vost#0:0/rawvideo @ 0xaaaaf24515c0] Terminating thread with return code 0 (success)
[Parsed_fps_0 @ 0xffffb0001730] 389 frames in, 78 frames out; 311 frames dropped, 0 frames duplicated.
[vf#0:0 @ 0xaaaaf2451c50] Terminating thread with return code 0 (success)
[out#0/rawvideo @ 0xaaaaf2450ee0] All streams finished
[out#0/rawvideo @ 0xaaaaf2450ee0] Terminating thread with return code 0 (success)
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820] Terminating thread with return code 0 (success)
[AVIOContext @ 0xaaaaf24b3ec0] Statistics: 107827200 bytes written, 0 seeks, 412 writeouts
[out#0/rawvideo @ 0xaaaaf2450ee0] Output file #0 (out.yuv):
[out#0/rawvideo @ 0xaaaaf2450ee0]   Output stream #0:0 (video): 78 frames encoded; 78 packets muxed (107827200 bytes); 
[out#0/rawvideo @ 0xaaaaf2450ee0]   Total: 78 packets (107827200 bytes) muxed
[out#0/rawvideo @ 0xaaaaf2450ee0] video:105300KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.000000%
frame=   78 fps=5.2 q=-0.0 Lsize=  105300KiB time=00:00:15.60 bitrate=55296.0kbits/s speed=1.04x    
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820] Input file #0 (/home/radxa/devel/frigate/debug/thief-house.mp4):
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820]   Input stream #0:0 (video): 391 packets read (5294607 bytes); 389 frames decoded; 0 decode errors; 
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820]   Total: 391 packets (5294607 bytes) demuxed
[AVIOContext @ 0xaaaaf2436190] Statistics: 5305313 bytes read, 1 seeks

Also, it looks like the patch only covers decoding, with nothing for hardware encoding?

The patch is from January. @nyanmisaka, have you made any further developments since then?

So I found that CIX provides an ffmpeg fork here with support for V4L2 M2M AV1 hardware decoding via the VPU.

After compiling the source, you can make use of hardware-accelerated decoding by passing -c:v h264_v4l2m2m before the input.

A full command example:

ffmpeg  -loglevel debug -threads 2 -c:v h264_v4l2m2m -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv

The debug output shows hardware decoding being activated:

[h264_v4l2m2m @ 0xaaaaf02cf660] probing device /dev/video-cixdec0
[h264_v4l2m2m @ 0xaaaaf02cf660] driver 'mvx' on card 'Linlon Video device' in mplane mode
[h264_v4l2m2m @ 0xaaaaf02cf660] Using device /dev/video-cixdec0
[h264_v4l2m2m @ 0xaaaaf02cf660] driver 'mvx' on card 'Linlon Video device' in mplane mode
[h264_v4l2m2m @ 0xaaaaf02cf660] requesting formats: output=H264 capture=YM12
[h264_v4l2m2m @ 0xaaaaf02cf660] output: H264 16 buffers initialized: 1280x0720, sizeimage 00921600, bytesperline 00000000
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (h264_v4l2m2m) -> rawvideo (native))
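
To check this from a script rather than by eye, the debug output can be filtered for the "Using device" line. A small sketch; the function name is hypothetical and the pattern is based only on the log lines shown above, so it may need adjusting for other ffmpeg builds:

```shell
# Read ffmpeg debug output on stdin and print the V4L2 device that the
# h264_v4l2m2m wrapper selected. Returns non-zero if none was reported.
v4l2m2m_device() {
    # sed strips the log prefix; grep . fails when nothing was printed.
    sed -n 's/^\[h264_v4l2m2m @ .*] Using device //p' | grep .
}
```

Piping the command's output through it, with 2>&1 | v4l2m2m_device appended, should print /dev/video-cixdec0 when the hardware decoder is picked up.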

When running the above command, the ffmpeg process only uses ~3% CPU. That is good enough to be “production ready” for use with Frigate NVR; I just need to get the NPU performing decently now.

As a further note, the Debian b3 OS image ships CIX’s ffmpeg at /usr/share/cix/bin/ffmpeg, but it errors out when used. The b6 image does not package ffmpeg, so for now you have to compile from source. There are prebuilt Debian packages available here.
