[FFmpeg] Introduce FFmpeg-Rockchip for hyper fast video transcoding via CLI

Thanks for such an effort. I am planning to merge the decoder-focused parts of my fork into yours so that we can have at least one single, properly clean fork. My fork does better in terms of current decoding capabilities (AFBC->DRMPrime things), but those will work better on your fork once we have a proper Mesa that can handle AFBC. And your fork is definitely cleaner; mine turned into spaghetti due to the various workarounds implemented to cover up the missing parts of Mesa.

With my latest PRs I got DRM PRIME decoder support working in mpv and Kodi, though it still needs testing.


That outputs the drm_prime format:

Stream #0:0: Video: hevc (Main) (hev1 / 0x31766568), drm_prime(pc, bt470bg/bt470bg/smpte170m, progressive), 1920x1080, q=2-31, 4000 kb/s, 25 fps, 12800 tbn

drm_prime means DMA memory, and with this CLI the encoder at the end of the pipeline will only output the HEVC stream, not the rawvideo stream.

To access DMA memory, you must use hwmap=mode=read,format=yuv420p or hwdownload,format=yuv420p and remove the encoder -c:v hevc_rkmpp.

./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -i rtsp://camera1 -vf scale_rkrga=w=1920:h=1080:format=yuv420p,hwmap=mode=read,format=yuv420p -y -f rawvideo /dev/null

This one produces extremely high CPU usage that just keeps climbing: it starts at 16% and goes up to 160%.

[rtsp @ 0x558b118880] jitter buffer fullkB time=00:00:02.48 bitrate=310344.7kbits/s dup=2 drop=1 speed=1.46x
[rtsp @ 0x558b118880] RTP: missed 14 packets
[rtsp @ 0x558b118880] jitter buffer fullkB time=00:00:06.72 bitrate=310827.9kbits/s dup=48 drop=1 speed=1.19x
[rtsp @ 0x558b118880] RTP: missed 18 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packetrate=306004.3kbits/s dup=247 drop=1 speed=1.97x
[rtsp @ 0x558b118880] RTP: missed 13643 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 4570 packets
[rtsp @ 0x558b118880] RTP: PT=68: bad cseq 24e3 expected=1309
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 1901 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 18 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packetrate=310971.5kbits/s dup=247 drop=1 speed=1.55x
[rtsp @ 0x558b118880] RTP: missed 1337 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packetrate=310971.5kbits/s dup=247 drop=1 speed=1.45x
[rtsp @ 0x558b118880] RTP: missed 6 packets
[rtsp @ 0x558b118880] jitter buffer fullkB time=00:00:25.04 bitrate=310971.5kbits/s dup=247 drop=1 speed=1.22x
[rtsp @ 0x558b118880] RTP: missed 5 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packetrate=310971.5kbits/s dup=247 drop=1 speed=1.13x
[rtsp @ 0x558b118880] RTP: missed 1 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 1663 packets
[rtsp @ 0x558b118880] RTP: dropping old packet received too lateate=310971.5kbits/s dup=247 drop=1 speed=0.942x
[rtsp @ 0x558b118880] jitter buffer full
[rtsp @ 0x558b118880] RTP: missed 45 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 1712 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 8711 packets
[rtsp @ 0x558b118880] max delay reached. need to consume packet
[rtsp @ 0x558b118880] RTP: missed 23491 packets
[rtsp @ 0x558b118880] RTP: PT=68: bad cseq 6fa8 expected=13e5
[vost#0:0/rawvideo @ 0x558b18bf20] More than 1000 frames duplicated=310971.5kbits/s dup=247 drop=1 speed=0.804x

That’s just an example, used to dump rawvideo. You need to modify the CLI yourself to suit your use case.

Frigate uses rawvideo.

Frigate: NVR with realtime local object detection for IP cameras

For real-time object detection applications, processing hundreds or thousands of FPS is unnecessary. Use the -re parameter or -vf realtime,... to limit the FPS to match your camera’s frame rate.

./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -i rtsp://camera1 -vf realtime,scale_rkrga=w=1920:h=1080:format=yuv420p,hwmap=mode=read,format=yuv420p -y -f rawvideo /dev/null

./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -re -i rtsp://camera1 -vf scale_rkrga=w=1920:h=1080:format=yuv420p,hwmap=mode=read,format=yuv420p -y -f rawvideo /dev/null

https://ffmpeg.org/ffmpeg-all.html
https://ffmpeg.org/ffmpeg-all.html#toc-realtime_002c-arealtime

-readrate speed (input)
Limit input read speed.

Its value is a floating-point positive number which represents the maximum duration of media, in seconds, that should be ingested in one second of wallclock time. Default value is zero and represents no imposed limitation on speed of ingestion. Value 1 represents real-time speed and is equivalent to -re.

Mainly used to simulate a capture device or live input stream (e.g. when reading from a file). Should not be used with a low value when input is an actual capture device or live stream as it may cause packet loss.

It is useful for when flow speed of output packets is important, such as live streaming.

-re (input)
Read input at native frame rate. This is equivalent to setting -readrate 1.
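The pacing that -readrate describes can be sketched in Python. This is a hypothetical helper illustrating the documented behaviour (at most `readrate` seconds of media per second of wall-clock time), not ffmpeg's actual implementation:

```python
def readrate_delay(media_ingested_s: float, wall_elapsed_s: float,
                   readrate: float) -> float:
    """Seconds to wait before ingesting more input, so that at most
    `readrate` seconds of media are read per second of wall-clock time.
    A readrate of 0 means no limit; 1 corresponds to -re (real time)."""
    if readrate <= 0:
        return 0.0
    # Earliest wall-clock time at which this much media may have been read.
    earliest_wall_s = media_ingested_s / readrate
    return max(0.0, earliest_wall_s - wall_elapsed_s)
```

For example, after ingesting 10 seconds of media in only 5 seconds of wall time with -readrate 1, the demuxer would hold off for another 5 seconds before reading more. This is why -re makes no sense for a live RTSP camera: the source already delivers at real time, and any extra throttling just lets packets pile up and drop.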

Actually, you would be surprised. The entire reason I started building my own ffmpeg and opencv on my rock5b was that I was experiencing errors related to RKMPP when attempting to play and record videos. I built ffmpeg and opencv myself, and there were no issues.

nah it doesn’t work.

Command options #1 that you suggested, combined with my RTSP-camera-specific flags:

./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -re -i "rtsp://camera1" -vf scale_rkrga=w=1920:h=1080:format=yuv420p,hwmap=mode=read,format=yuv420p -avoid_negative_ts make_zero -fflags genpts+discardcorrupt -strict experimental -rtsp_transport tcp -timeout 5000000 -y -f rawvideo /dev/null

This generates the following errors; note the slower-than-realtime speed and the dropped packets galore:

[rtsp @ 0x55d18fa890] Multi-layer HEVC coding is not implemented. Update your FFmpeg version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
[rtsp @ 0x55d18fa890] max delay reached. need to consume packet
[rtsp @ 0x55d18fa890] RTP: missed 569 packets
[rtsp @ 0x55d18fa890] max delay reached. need to consume packetrate=310767.5kbits/s dup=33 drop=1 speed=0.886x
[rtsp @ 0x55d18fa890] RTP: missed 905 packets
[rtsp @ 0x55d18fa890] max delay reached. need to consume packetrate=310773.6kbits/s dup=33 drop=1 speed=0.567x
[rtsp @ 0x55d18fa890] RTP: missed 903 packets
[rtsp @ 0x55d18fa890] max delay reached. need to consume packetrate=310773.6kbits/s dup=33 drop=1 speed= 0.4x
[rtsp @ 0x55d18fa890] RTP: missed 1124 packets
[rtsp @ 0x55d18fa890] max delay reached. need to consume packetrate=310773.6kbits/s dup=33 drop=1 speed=0.31x
[rtsp @ 0x55d18fa890] RTP: missed 1127 packets
[rtsp @ 0x55d18fa890] max delay reached. need to consume packetrate=310773.6kbits/s dup=33 drop=1 speed=0.253x
[rtsp @ 0x55d18fa890] RTP: missed 1173 packets

Command options #2 that you suggested, combined with my RTSP-camera-specific flags:

./ffmpeg -hwaccel rkmpp -hwaccel_output_format drm_prime -i "rtsp://camera1" -vf realtime,scale_rkrga=w=1920:h=1080:format=yuv420p,hwmap=mode=read,format=yuv420p -avoid_negative_ts make_zero -fflags genpts+discardcorrupt -strict experimental -rtsp_transport tcp -timeout 5000000 -y -f rawvideo /dev/null

This generates the following errors, with dropped packets galore, after some time:

[rtsp @ 0x55a3dd9880] Multi-layer HEVC coding is not implemented. Update your FFmpeg version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
[rtsp @ 0x55a3dd9880] max delay reached. need to consume packet
[rtsp @ 0x55a3dd9880] RTP: missed 606 packets
[rtsp @ 0x55a3dd9880] max delay reached. need to consume packetrate=292213.5kbits/s dup=3 drop=0 speed=0.822x
[rtsp @ 0x55a3dd9880] RTP: missed 96 packets
[rtsp @ 0x55a3dd9880] max delay reached. need to consume packet
[rtsp @ 0x55a3dd9880] RTP: missed 701 packets
[Parsed_realtime_0 @ 0x55a3e6ef30] time discontinuity detected: 2506344 us, resettingup=3 drop=0 speed=0.506x
[Parsed_realtime_0 @ 0x55a3e6ef30] time discontinuity detected: -2155967 us, resettingp=75 drop=16 speed=1.02x

But when using the software scaler and the native HEVC decoder:

./ffmpeg -i "rtsp://camera1" -vf scale=1920:1080 -avoid_negative_ts make_zero -fflags discardcorrupt+genpts -strict experimental -rtsp_transport tcp -timeout 5000000 -y -pix_fmt yuv420p -f rawvideo /dev/null

There are no issues, except for the fact that it’s using the CPU instead of the VPU for everything.

Firstly, many thanks @nyanmisaka for the work on this. I’ve been waiting for hardware acceleration to be workable on Rockchip for years; I never got anything working properly on rk3399 using BSP or mainline drm_prime.

I am testing this out on a Rock 3A (rk3568) with a view to buying a Rock 5B that I intend to use with Frigate. I have distilled the Frigate ffmpeg pipeline down to only the decode to yuv420p at 5 fps, scaled from 1600x900 to 1280x720, for the object recognition part. My input is:
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1600x900, 25 fps, 10 tbr, 90k tbn
I am using this command to test:
ffmpeg -loglevel info -hwaccel rkmpp -hwaccel_output_format drm_prime -rtsp_transport tcp -i rtsp://nano.local:8554/back -vf scale_rkrga=w=1280:h=720:format=yuv420p,hwmap=mode=read,format=yuv420p -r 5 -f rawvideo -y pipe: |cat > 1.yuv
This results in about 10% CPU usage of one core, which I think is good for the low-power A55 cores the rk3568 has. I am using pipe: and cat to separate the CPU spent on filesystem use from that of ffmpeg.

Is there any further optimisation possible on the above ffmpeg command?


IMHO this pipeline is highly optimized, but a memcpy still happens between the hwmap-mapped yuv420p pointers and the rawvideo output.

To continue optimizing, Frigate needs to be able to read yuv420p frames directly through the Libav* API, or even consume drm_prime(dma fd) frames directly.

I think piping is another copy, why pipe?

Frigate reads from a pipe, and I wanted to emulate that part, but also capture the raw output to a file. I did not want the filesystem CPU activity to be attributed to ffmpeg, so I could see how much CPU ffmpeg uses writing to a pipe only. This allows me to make comparisons with other systems for HW decode, e.g. I am currently using a Jetson Nano for this task and ffmpeg uses about 6% of one CPU. The A57 cores @ 1.43GHz on the Jetson are slightly more powerful than the A55 cores @ 1.9GHz on the Rock 3A. So I conclude that ffmpeg’s HW-accelerated CPU use is marginally better on the Jetson using their NVMPI decoder than on the Rockchip decoder, but in the same ballpark really.

Datasheets state that the Jetson can do 4K 60FPS and that the RK3588, which I’m planning on buying, can do 8K 60FPS, so I can support more cameras at higher resolution on the RK3588. So I think I will go ahead and buy a Rock 5B to replace my current two SBCs, the rk3399 and the Jetson Nano; then I can run my Home Assistant, Frigate, etc. all on one system.

I see. I think the Frigate guys should consider using PyAV instead; you just cannot pipe and expect performance. A lot of performance is lost in several stages, including piping.

Hello, I am trying to debug high cpu usage with Frigate using Rock5b.

The camera is a 1920x1080p feed; is such high CPU usage normal?

And also Frigate shows GPU hardware acceleration has not been setup.

You probably need to use the rockchip preset in your config. I can see that you are successfully using the Rockchip NPU, so you have the right docker image. The current images are not using the ffmpeg mentioned in this thread; they use the other build, so the ffmpeg options will be different. Look in ffmpeg_presets.py.

Scote, what is the right docker image for the NPU?

Is it the image: blakeblackshear/frigate:stable-rk

What detector should I be using?

I have a Coral plugged in by USB.

This works:

detectors:
  coral:
    type: edgetpu
    device: usb

This does not work:

detectors:
  rknn:
    type: rknn
    core_mask: 0b111

Any help on this is greatly appreciated! :slight_smile:


The builds with RK should work, but I don’t know if the new ffmpeg has been integrated yet. You can tell by looking in the presets Python file; you need to look in there anyway to determine the presets to use.

I built my own ffmpeg and added it to the image using the supported method, and I also modded the presets to match.

The Home Assistant addon version of Frigate does not currently work with the built-in NPU; something prevents access to a file in /proc which the closed-source Rockchip library uses to determine what board variant it’s running on. To get around this, just run the standalone version of Frigate instead. I had both a Coral and the NPU running at one point just fine. I’ve stopped using the Coral now and freed up a USB port.

There is more info on the Frigate GitHub; search for rockchip in the issues. My username is spattinson over there; you can see some interaction with the dev who kindly maintains the Rockchip variant.


I am running Frigate separately from Home Assistant. Thanks for the reply! I’ll check the Frigate GitHub!

ffmpeg_presets.py does not have any of the RK hardware acceleration. :frowning:

I do not understand why I can not get the NPU to work. :frowning:

I have other boards, and I’m ready to spin one up and start from scratch.