
I am trying to integrate the face detection API into a video stream I am receiving from a Parrot Bebop drone.

The stream is decoded with the MediaCodec class (http://developer.android.com/reference/android/media/MediaCodec.html), and this works fine. Rather than rendering the decoded frame data to a SurfaceView, I can successfully access the ByteBuffer with the decoded frame data from the decoder.

I can also access the decoded Image objects (https://developer.android.com/reference/android/media/Image.html) from the decoder. They carry a timestamp, and they report the following properties (a simplified sketch of how I read them follows the list):

  • width: 640
  • height: 368
  • format: YUV_420_888
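
For reference, these values come straight off the decoder output, roughly like this (simplified; outIndex comes from mediaCodec.dequeueOutputBuffer(...)):

Image image = mediaCodec.getOutputImage(outIndex);
int width = image.getWidth();       // 640
int height = image.getHeight();     // 368
int format = image.getFormat();     // ImageFormat.YUV_420_888
long timestamp = image.getTimestamp();
...
mediaCodec.releaseOutputBuffer(outIndex, false);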

The first thing I tried was generating Frame objects for the Vision API (com.google.android.gms.vision.Frame) via the Frame.Builder (com.google.android.gms.vision.Frame.Builder):

...
ByteBuffer decodedOutputByteBufferFrame = mediaCodec.getOutputBuffer(outIndex);
Image image = mediaCodec.getOutputImage(outIndex);
...
decodedOutputByteBufferFrame.position(bufferInfo.offset);
decodedOutputByteBufferFrame.limit(bufferInfo.offset + bufferInfo.size);
frameBuilder.setImageData(decodedOutputByteBufferFrame, 640, 368, ImageFormat.YV12);
frameBuilder.setTimestampMillis(image.getTimestamp());
Frame googleVisFrame = frameBuilder.build();

This code does not give me any errors, and the googleVisFrame object is not null, but when I call googleVisFrame.getBitmap(), I get null. Subsequently, face detection does not work (I suppose because there is an issue with my vision Frame objects...).
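
One direction I am experimenting with, based on the suspicion that the raw decoder buffer is simply not packed the way ImageFormat.YV12 promises, is to repack the YUV_420_888 Image planes into a tightly packed NV21 buffer and hand that to the Frame.Builder instead. This is only a rough, untested sketch (imageToVisionFrame is a helper name I made up):

import java.nio.ByteBuffer;

import android.graphics.ImageFormat;
import android.media.Image;

import com.google.android.gms.vision.Frame;

private Frame imageToVisionFrame(Image image) {
    Image.Plane[] planes = image.getPlanes();
    int width = image.getWidth();   // 640 in my case
    int height = image.getHeight(); // 368 in my case
    byte[] nv21 = new byte[width * height * 3 / 2];
    int pos = 0;

    // Copy the luma (Y) plane row by row, respecting its row stride.
    ByteBuffer yBuffer = planes[0].getBuffer();
    int yRowStride = planes[0].getRowStride();
    for (int row = 0; row < height; row++) {
        yBuffer.position(row * yRowStride);
        yBuffer.get(nv21, pos, width);
        pos += width;
    }

    // Interleave V and U samples (NV21 expects V first), honouring
    // the pixel and row strides reported by the chroma planes.
    ByteBuffer uBuffer = planes[1].getBuffer();
    ByteBuffer vBuffer = planes[2].getBuffer();
    int uvRowStride = planes[1].getRowStride();
    int uvPixelStride = planes[1].getPixelStride();
    for (int row = 0; row < height / 2; row++) {
        for (int col = 0; col < width / 2; col++) {
            int offset = row * uvRowStride + col * uvPixelStride;
            nv21[pos++] = vBuffer.get(offset);
            nv21[pos++] = uBuffer.get(offset);
        }
    }

    return new Frame.Builder()
            .setImageData(ByteBuffer.wrap(nv21), width, height, ImageFormat.NV21)
            .setTimestampMillis(image.getTimestamp()) // units still to verify
            .build();
}

Whether this gives correct colours I cannot say yet; for face detection alone, getting the luma plane right might already be the important part.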

Even if this worked, I am not sure how to handle the video stream with the Vision API, as all the sample code I can find demonstrates use with the internal camera.
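
My current understanding (which may well be wrong) is that the CameraSource plumbing from the samples is optional, and that a FaceDetector can be driven manually with one Frame per decoded video frame, roughly like this:

import android.util.SparseArray;

import com.google.android.gms.vision.face.Face;
import com.google.android.gms.vision.face.FaceDetector;

FaceDetector detector = new FaceDetector.Builder(context)
        .setTrackingEnabled(true) // the frames form a continuous stream
        .build();

// once per decoded frame:
if (detector.isOperational()) {
    SparseArray<Face> faces = detector.detect(googleVisFrame);
    for (int i = 0; i < faces.size(); i++) {
        Face face = faces.valueAt(i);
        // face.getPosition(), face.getWidth(), face.getHeight(), ...
    }
}

// detector.release() when the stream ends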

If you could point me in the right direction, I would be very thankful.

  • If you haven't already, you should try converting the output frame to a bitmap and view it to see if the data looks correct. The ByteBuffer output from MediaCodec is not in the same format as the output of Camera, so treating it as YV12 or NV21 will be slightly off (wrong colors) or completely wrong. See also http://bigflake.com/mediacodec/#q5 – fadden Oct 16 '15 at 15:42
  • What do you use the `Image` for here - only for the timestamp? Accessing the buffers via `ByteBuffer` and `Image` are mutually exclusive - when you call `getOutputImage`, the `ByteBuffer` you got on the line above gets invalidated. If the receiving API can use `Image`, that'd be better - there's no guarantee that the data in the decoder output buffers are packed exactly in the right way to match e.g. `ImageFormat.YV12` (it most probably isn't, but if you only look at the luma plane, it might be fine). – mstorsjo Oct 16 '15 at 18:20
  • Note that the Frame class will only return a Bitmap if it was created using a Bitmap. Supplying a ByteBuffer in creating the frame assumes that the leading bytes represent the Y channel (the rest is ignored). I'd guess that the problem is either the invalidation issue that mstorsjo indicated, or that the data isn't really in YV12 format, or maybe the image is really a different size. – pm0733464 Oct 19 '15 at 15:33
  • Many thanks for your reply and sorry for my late response. "Accessing the buffers via ByteBuffer and Image are mutually exclusive" – I guessed that but was not sure; thanks for the clarification. "If the receiving API can use Image, that'd be better" – unfortunately, I don't see a way to "feed" the Google Vision API with this Image object. "Or that the data isn't really in YV12 format" – you're right, it is not. I have to dig deeper into the formats and will report back if I come up with a solution. – Norman Süsstrunk Oct 22 '15 at 07:31
  • Any solutions or update so far? – Rahul Patel Jun 29 '18 at 08:24

0 Answers