Question: ffmpeg encoding/decoding with VPU ... V4L2M2M?

I am trying to work out how to do video encoding/decoding with ffmpeg while making use of the VPU.

I am using Radxa’s Debian 12 image (b3), which has cix-ffmpeg installed.

$ dpkg -l | grep cix-ffmpeg
ii  cix-ffmpeg                                     1.0.0                               arm64        cix-ffmpeg package

From what I can tell the encoder and decoder are interfaced via v4l2m2m.

$ ffmpeg -hide_banner -encoders | grep v4l2
DSP API version: DSP Wrapper Build On Jan  7 2025 22:06:04 eb4a506
 V..... h263_v4l2m2m         V4L2 mem2mem H.263 encoder wrapper (codec h263)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 encoder wrapper (codec h264)
 V..... hevc_v4l2m2m         V4L2 mem2mem HEVC encoder wrapper (codec hevc)
 V..... mpeg4_v4l2m2m        V4L2 mem2mem MPEG4 encoder wrapper (codec mpeg4)
 V..... vp8_v4l2m2m          V4L2 mem2mem VP8 encoder wrapper (codec vp8)

$ ffmpeg -hide_banner -decoders | grep v4l2
DSP API version: DSP Wrapper Build On Jan  7 2025 22:06:04 eb4a506
 V..... h263_v4l2m2m         V4L2 mem2mem H.263 decoder wrapper (codec h263)
 V..... h264_v4l2m2m         V4L2 mem2mem H.264 decoder wrapper (codec h264)
 V..... hevc_v4l2m2m         V4L2 mem2mem HEVC decoder wrapper (codec hevc)
 V..... mpeg1_v4l2m2m        V4L2 mem2mem MPEG1 decoder wrapper (codec mpeg1video)
 V..... mpeg2_v4l2m2m        V4L2 mem2mem MPEG2 decoder wrapper (codec mpeg2video)
 V..... mpeg4_v4l2m2m        V4L2 mem2mem MPEG4 decoder wrapper (codec mpeg4)
 V..... vc1_v4l2m2m          V4L2 mem2mem VC1 decoder wrapper (codec vc1)
 V..... vp8_v4l2m2m          V4L2 mem2mem VP8 decoder wrapper (codec vp8)
 V..... vp9_v4l2m2m          V4L2 mem2mem VP9 decoder wrapper (codec vp9)

I think the VPU is exposed on /dev/video3 and /dev/video4:

$ v4l2-ctl --list-devices
...
Linlon Video device (platform:mvx):
        /dev/video3
        /dev/video4
...
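
To see what each of the two nodes offers, the formats on each queue can be listed. This is a minimal sketch, assuming v4l2-ctl from v4l-utils is installed and that the node names match the listing above. On a V4L2 mem2mem decoder the OUTPUT queue takes the compressed bitstream and the CAPTURE queue returns raw frames; an encoder is the reverse. The helper name probe_node is made up for this example.

```shell
#!/bin/sh
# List the formats on both queues of one V4L2 node.
# probe_node is a hypothetical helper name, not part of any tool.
probe_node() {
    dev="$1"
    if [ ! -e "$dev" ]; then
        echo "skip: $dev not present"
        return 0
    fi
    if ! command -v v4l2-ctl >/dev/null 2>&1; then
        echo "skip: v4l2-ctl not installed"
        return 0
    fi
    echo "== $dev =="
    # OUTPUT queue: what the application feeds into the device.
    v4l2-ctl -d "$dev" --list-formats-out
    # CAPTURE queue: what the device hands back.
    v4l2-ctl -d "$dev" --list-formats
}

# Node names taken from the --list-devices output above.
for dev in /dev/video3 /dev/video4; do
    probe_node "$dev"
done
```

If a node lists H.264 on the OUTPUT queue and a raw format such as NV12 on the CAPTURE queue, it is presumably the decoder side.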

Using ffmpeg's software encoder libx264, I see around 30% CPU usage for the process in top with the following command:

ffmpeg  -threads 2 -c:v:1 libx264 -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv

When I try the hardware-accelerated encoder h264_v4l2m2m instead, I still see 30% CPU usage in top, which suggests it has fallen back to software (CPU) processing:

ffmpeg  -threads 2 -c:v:1 h264_v4l2m2m -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv
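
CPU usage is only an indirect signal. A more direct check is whether the running ffmpeg process holds a /dev/video* node open. The sketch below reads /proc/<pid>/fd, so it is Linux-only and needs permission to inspect the process. It assumes the v4l2m2m wrapper keeps the node open while the codec is active, and has_video_node_open is a made-up helper name.

```shell
#!/bin/sh
# Return 0 if process $1 has any /dev/video* node open, otherwise 1.
# has_video_node_open is a hypothetical helper name.
has_video_node_open() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the open file; skip unreadable ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /dev/video*)
                echo "$pid has $target open"
                return 0
                ;;
        esac
    done
    return 1
}

# Check every running ffmpeg process (pgrep comes from procps).
for pid in $(pgrep -x ffmpeg 2>/dev/null); do
    has_video_node_open "$pid" || echo "$pid has no /dev/video* node open"
done
```

Run it from a second terminal while the ffmpeg command is still going.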

So the question is: how do we do hardware video encoding/decoding using the VPU?
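
One thing that may matter here is where the codec option sits on the command line. In ffmpeg, options placed before -i apply to the input, so a -c:v there selects the decoder, while options after -i apply to the output and select the encoder. The stream specifier v:1 matches the second video stream, and the clip used here appears to have only one (#0:0), so -c:v:1 may not match anything. The sketch below only prints a command line with both sides set explicitly. The file names are placeholders, print_transcode_cmd is a made-up helper, and whether the hardware encoder works with this build is not confirmed.

```shell
#!/bin/sh
# Print (not run) an ffmpeg command line that picks the decoder and the
# encoder separately. print_transcode_cmd is a hypothetical helper name.
print_transcode_cmd() {
    in="$1"
    out="$2"
    # Before -i: input options; "-c:v" with no index matches every
    # video stream of the input and selects the DECODER.
    # After -i: output options; "-c:v" there selects the ENCODER.
    echo ffmpeg -hide_banner \
        -c:v h264_v4l2m2m -i "$in" \
        -vf fps=5,scale=1280:720 \
        -c:v h264_v4l2m2m "$out"
}

# Placeholder file names; substitute real paths.
print_transcode_cmd input.mp4 out.mp4
```

Printing the command first makes it easy to compare against the ones above before running anything on the board.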

Some further debugging shows ffmpeg falling back to software decoding, while the h264_v4l2m2m decoder eventually fails with an error:

[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[h264 @ 0x74120b0] ct_type:0 pic_struct:0
[h264_v4l2m2m @ 0x7342370] output POLLERR
    Last message repeated 1 times
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Error submitting packet to decoder: Input/output error
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x733a830] Terminating thread with return code 0 (success)
[vist#0:0/h264 @ 0x733dc00] [dec:h264_v4l2m2m @ 0x7341f80] Decoder thread received EOF packet
[h264_v4l2m2m @ 0x7342370] output stop_decode
[h264_v4l2m2m @ 0x7342370] capture POLLERR
Assertion pkt failed at fftools/ffmpeg_dec.c:710
Aborted

In the O6 Debug forum thread I found a patch from nyanmisaka, applied it to ffmpeg v7.1.1, and rebuilt. The following command now runs with only 3% CPU usage for the process:

ffmpeg  -loglevel debug -threads 2 -c:v h264_v4l2m2m -re -stream_loop -1 -fflags +genpts -i frigate/debug/thief-house.mp4 -r 5 -vf fps=5,scale=1280:720 -threads 2 -f rawvideo -pix_fmt yuv420p out.yuv

However, it seems the hardware decoder does not support -stream_loop, so the input only plays once:

[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Decoder returned EOF, resetting
[Parsed_fps_0 @ 0xffffb0001730] Read frame with in pts 388000, out pts 78
[Parsed_fps_0 @ 0xffffb0001730] Writing frame with pts 77 to pts 77
[h264 @ 0xaaaaf24ed440] ct_type:0 pic_struct:0
[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Decoder returned EOF, finishing
[vist#0:0/h264 @ 0xaaaaf2430c50] [dec:h264_v4l2m2m @ 0xaaaaf2435d00] Terminating thread with return code 0 (success)
[Parsed_fps_0 @ 0xffffb0001730] EOF is at pts 78
[Parsed_fps_0 @ 0xffffb0001730] Dropping frame with pts 78
[Parsed_scale_1 @ 0xffffb00019a0] [framesync @ 0xffffb0001a78] Sync level 0
[out_#0:0 @ 0xffffb000f940] EOF on sink link out_#0:0:default.
[vf#0:0 @ 0xaaaaf2451c50] Filtergraph returned EOF, finishing
[vf#0:0 @ 0xaaaaf2451c50] All consumers returned EOF
[vost#0:0/rawvideo @ 0xaaaaf24515c0] Encoder thread received EOF
[vost#0:0/rawvideo @ 0xaaaaf24515c0] Terminating thread with return code 0 (success)
[Parsed_fps_0 @ 0xffffb0001730] 389 frames in, 78 frames out; 311 frames dropped, 0 frames duplicated.
[vf#0:0 @ 0xaaaaf2451c50] Terminating thread with return code 0 (success)
[out#0/rawvideo @ 0xaaaaf2450ee0] All streams finished
[out#0/rawvideo @ 0xaaaaf2450ee0] Terminating thread with return code 0 (success)
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820] Terminating thread with return code 0 (success)
[AVIOContext @ 0xaaaaf24b3ec0] Statistics: 107827200 bytes written, 0 seeks, 412 writeouts
[out#0/rawvideo @ 0xaaaaf2450ee0] Output file #0 (out.yuv):
[out#0/rawvideo @ 0xaaaaf2450ee0]   Output stream #0:0 (video): 78 frames encoded; 78 packets muxed (107827200 bytes); 
[out#0/rawvideo @ 0xaaaaf2450ee0]   Total: 78 packets (107827200 bytes) muxed
[out#0/rawvideo @ 0xaaaaf2450ee0] video:105300KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.000000%
frame=   78 fps=5.2 q=-0.0 Lsize=  105300KiB time=00:00:15.60 bitrate=55296.0kbits/s speed=1.04x    
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820] Input file #0 (/home/radxa/devel/frigate/debug/thief-house.mp4):
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820]   Input stream #0:0 (video): 391 packets read (5294607 bytes); 389 frames decoded; 0 decode errors; 
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0xaaaaf242d820]   Total: 391 packets (5294607 bytes) demuxed
[AVIOContext @ 0xaaaaf2436190] Statistics: 5305313 bytes read, 1 seeks
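
A possible workaround for the -stream_loop behaviour is to repeat the clip at the demuxer level with ffmpeg's concat demuxer, so the decoder sees one long stream instead of being reset at each loop. I have not confirmed that this avoids the EOF on this hardware. The sketch below only generates the playlist. make_concat_list is a made-up helper, the clip path is a placeholder, and paths containing a single quote would need escaping.

```shell
#!/bin/sh
# Write a concat-demuxer playlist that repeats one clip N times.
# make_concat_list is a hypothetical helper name.
make_concat_list() {
    clip="$1"
    repeats="$2"
    list="$3"
    : > "$list"                      # truncate or create the playlist
    i=0
    while [ "$i" -lt "$repeats" ]; do
        printf "file '%s'\n" "$clip" >> "$list"
        i=$((i + 1))
    done
}

# Placeholder clip path; use an absolute path together with "-safe 0".
make_concat_list "$PWD/input.mp4" 3 "${TMPDIR:-/tmp}/loop.txt"
cat "${TMPDIR:-/tmp}/loop.txt"

# Then, untested on this board:
# ffmpeg -c:v h264_v4l2m2m -re -f concat -safe 0 -i "${TMPDIR:-/tmp}/loop.txt" \
#     -vf fps=5,scale=1280:720 -f rawvideo -pix_fmt yuv420p out.yuv
```

A finite repeat count is not the same as -stream_loop -1, but a large count would be enough for a soak test.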

Also, it looks like the patch only covers decoding, with nothing for hardware encoding?

The patch is from January. @nyanmisaka, have there been any further developments since then?