
I'm trying to compile ffmpeg into JavaScript so that I can decode H.264 video streams using Node. The streams are H.264 frames packed into RTP NALUs, so any solution has to accept H.264 frames rather than a whole file name. These frames can't be in a container like MP4 or AVI, because then the demuxer would need the timestamp of every frame before demuxing could occur, and I'm dealing with a real-time stream: no containers.

Streaming H.264 over RTP

Below is the basic code I'm using to listen on a UDP socket. Inside the 'message' callback, the data packet is an RTP datagram, and the data portion of the datagram is an H.264 frame (P-frames and I-frames).

var PORT = 33333;
var HOST = '127.0.0.1';

var dgram = require('dgram');
var server = dgram.createSocket('udp4');

server.on('listening', function () {
    var address = server.address();
    console.log('UDP Server listening on ' + address.address + ":" + address.port);
});

server.on('message', function (message, remote) {
    console.log(remote.address + ':' + remote.port +' - ' + message);
    var frame = parse_rtp(message); // strip the RTP header, keep the H.264 payload

    var rgb_frame = some_library.decode_h264(frame); // This is what I need.

});

server.bind(PORT, HOST);  

I found the Broadway.js library, but I couldn't get it working, and it doesn't handle the P-frames that I need. I also found ffmpeg.js, but I couldn't get that to work either, and it needs a whole file rather than a stream. Likewise, fluent-ffmpeg doesn't appear to support file streams; all of the examples show a filename being passed to the constructor. So I decided to write my own API.

My current solution attempt

I have been able to compile ffmpeg into one big JS file, but I can't use it like that. I want to write an API around ffmpeg and then expose those functions to JS. So it seems to me that I need to do the following:

  1. Compile ffmpeg components (avcodec, avutil, etc.) into llvm bitcode.
  2. Write a C wrapper that exposes the decoding functionality and uses EMSCRIPTEN_KEEPALIVE.
  3. Use emcc to compile the wrapper and link it against the bitcode created in step 1, so I can call it from Node as sketched below.
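
On the JavaScript side, I imagine the calls looking roughly like this (just a sketch; decoder_init and decode_frame are placeholder names for whatever the C wrapper actually exports, and cwrap may need to be explicitly exported at compile time):

// Sketch: assumes step 3 produced decoder.js from a C wrapper exporting
// decoder_init(void) and decode_frame(uint8_t *data, int size).
// Exact loading depends on emcc settings (e.g. MODULARIZE).
var Module = require('./decoder.js');

var decoderInit = Module.cwrap('decoder_init', 'number', []);
var decodeFrame = Module.cwrap('decode_frame', 'number', ['number', 'number']);

decoderInit();

function decode(frame) { // frame is a Node Buffer holding one H.264 frame
    // Copy the frame into the Emscripten heap so the C code can read it.
    var ptr = Module._malloc(frame.length);
    Module.HEAPU8.set(frame, ptr);
    var ret = decodeFrame(ptr, frame.length);
    Module._free(ptr);
    return ret;
}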

I found WASM+ffmpeg, but it's in Chinese and some of the steps aren't clear. In particular there is this step:

emcc web.c process.c ../lib/libavformat.bc ../lib/libavcodec.bc ../lib/libswscale.bc ../lib/libswresample.bc ../lib/libavutil.bc \

:( Where I think I'm stuck

I don't understand how all the ffmpeg components get compiled into separate *.bc files. I followed the emmake commands in that article and ended up with one big .bc file.

Two questions

1. Does anyone know the steps to compile ffmpeg using Emscripten so that I can expose an API to JavaScript?
2. Is there a better way (with decent documentation/examples) to decode H.264 video streams using Node?

noel
  • try looking at fluent api : https://www.npmjs.com/package/fluent-ffmpeg – Robert Rowntree Sep 18 '18 at 00:39
  • @RobertRowntree fluent-ffmpeg doesn't appear to support file stream judging by their examples. I updated my question. – noel Sep 18 '18 at 01:28
  • The documentation says "You may pass an input file name or readable stream, a configuration object, or both to the constructor". What do you mean by 'file stream'? – Alex Taylor Sep 18 '18 at 01:41
  • https://github.com/fluent-ffmpeg/node-fluent-ffmpeg/blob/master/examples/input-stream.js looks like input stream to me – Robert Rowntree Sep 18 '18 at 02:16
  • I updated my post to show an example of the type of "stream" I'm talking about. – noel Sep 18 '18 at 04:45
  • You could open a child process to run ffmpeg and then feed your RTP messages into stdin of that child process and read your rgb_frames from stdout of the child process. – GroovyDotCom Sep 26 '18 at 20:58
  • If you want even simpler and don't really need to be parsing the RTP yourself, you could let ffmpeg handle the RTP transport also and the child command would be something like ffmpeg -i rtp://127.0.0.1:9999 -f rawvideo -pix_fmt rgb24 -s 320x240 -vcodec rawvideo. Each 320x240x3 bytes you read from the child will be one RGB frame. – GroovyDotCom Sep 26 '18 at 21:26
  • Please describe your "H.264 frames packed into RTP NALUs" in more detail -- without that, it's impossible to say how one could possibly work with that. E.g. you could show a [mcve] that generates them. E.g. how is the receiving end supposed to figure out the metadata? timestamps? connect interdependent frames together? – ivan_pozdeev Sep 27 '18 at 00:21
  • Final comment: It seems to me you have wrongly convinced yourself that your solution "shouldn't be calling some sort of subprocess but should actually expose the ffmpeg API". The ffmpeg CLI does a pretty good job of exposing the FFMPEG functionality in a way that is likely easier and more efficient to invoke from Javascript. – GroovyDotCom Sep 27 '18 at 06:28
  • I kind of agree with that. I was inspired by Broadway.js et al. but I think I've found a solution that works nicely using websockets. Might post a MCVE for the future. I'd still like to figure out how to use ffmpeg with wasm though. – noel Sep 27 '18 at 07:00

2 Answers


To question 1: just follow the official Emscripten docs.

Consider the case where you normally build with the following commands:

./configure
make

To build with Emscripten, you would instead use the following commands:

# Run emconfigure with the normal configure command as an argument.
./emconfigure ./configure

# Run emmake with the normal make to generate linked LLVM bitcode.
./emmake make

# Compile the linked bitcode generated by make (project.bc) to JavaScript.
# 'project.bc' should be replaced with the make output for your project (e.g. 'yourproject.so')
#  [-Ox] represents build optimisations (discussed in the next section).
./emcc [-Ox] project.bc -o project.js

To question 2: C/C++ libraries can be called from a Node environment. You can write some C/C++ glue code (a native addon) or use a proxy node module like node-ffi.

Using node-ffi to call existing libs may be easier. Hope it helps :)
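
For illustration, a minimal binding might look like this (the library name libdecoder and the function signature are hypothetical, just to show the shape of the API):

var ffi = require('ffi'); // the node-ffi package

// Hypothetical: libdecoder exposes `int decode_frame(uint8_t *data, int size)`.
var decoder = ffi.Library('libdecoder', {
    'decode_frame': ['int', ['pointer', 'int']]
});

// node-ffi accepts a Node Buffer wherever a pointer is expected,
// so an H.264 frame held in a Buffer can be passed straight through.
var ret = decoder.decode_frame(frameBuffer, frameBuffer.length);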

Dragonite

The easiest way, especially if you need to run it in web browsers, is to use Media Source Extensions (MSE). I managed to do it in just 3 days. Moreover, it automatically uses GPU hardware acceleration (CUDA, Intel QSV, ...) as far as the browser has built-in support, which is important for a real-world app. I tested yesterday: decoding a raw H.264 NAL live stream from a 4K IP camera (4 times the pixels of 1080p) took just 5% of my old i7 machine's CPU. I am not sure about server-side JS like Node.js, but I expect the result is similar. Contact me if you need anything further.

For H.265 / HEVC, you need to compile the relevant parts of ffmpeg, x265, or OpenHEVC with Emscripten in a similar way, with a minimal .bc size (less than 1-3 MB depending on the config). Good luck...
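
For reference, a minimal sketch of the MSE side looks like the following. It assumes the raw H.264 NALUs are first remuxed into fragmented MP4 segments (for example with a JS muxer library), because a SourceBuffer does not accept raw Annex-B NAL units, and it assumes the segments arrive over a WebSocket named ws:

var video = document.querySelector('video');
var mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', function () {
    // The codec string must match the stream's actual profile/level.
    var sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
    var queue = [];

    // A SourceBuffer rejects appends while it is busy, so queue segments.
    sourceBuffer.addEventListener('updateend', function () {
        if (queue.length > 0) sourceBuffer.appendBuffer(queue.shift());
    });

    ws.binaryType = 'arraybuffer';
    ws.onmessage = function (event) {
        var segment = new Uint8Array(event.data);
        if (sourceBuffer.updating || queue.length > 0) queue.push(segment);
        else sourceBuffer.appendBuffer(segment);
    };
});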