I was hoping someone could give me some ideas or point me towards further reading material for building custom Android applications with MediaPipe using an Iris .aar. I've pored over the official MediaPipe documentation but found it a bit limited, and now I'm struggling to make progress. Specifically, I'm stuck on adding the side packet that the iris model expects and on extracting specific landmark coordinates in real time.

My aim is to create an open-source, gaze-driven text-to-speech keyboard for accessibility purposes that uses a modified MediaPipe Iris solution to infer the user's gaze direction and control the app. I would really appreciate any help towards this.

Here's my development plan and progress so far:

  1. Set up Mediapipe and build examples from the command line DONE
  2. Generate .aars for face detection and iris tracking DONE
  3. Set up Android Studio for building Mediapipe apps DONE
  4. Build and test Face Detection example app using an .aar DONE
  5. Modify face detection example to use Iris .aar IN PROGRESS
  6. Output the coordinates of the iris and the edges of the eyes, and the distances between them, to estimate gaze direction in real time (see the sketch after this list); or, if possible, modify the graphs and calculators to infer this for me and rebuild the .aar
  7. Integrate gaze direction into a control scheme in the app.
  8. Extend app functionality once initial control is implemented.
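
To make step 6 a bit more concrete, here is a rough sketch of the kind of calculation I have in mind, assuming I can already read the normalized iris-centre and eye-corner coordinates out of the graph's output stream (the threshold value and the left/right orientation are placeholders, not something I've verified):

    /** Rough sketch: estimate horizontal gaze direction from normalized landmark coordinates. */
    public final class GazeEstimator {

        public enum Direction { LEFT, CENTER, RIGHT }

        // Placeholder: how far the iris must sit from the eye centre (as a fraction of
        // eye width) before it counts as a left/right glance.
        private static final float HORIZONTAL_THRESHOLD = 0.15f;

        /**
         * Returns where the iris centre lies between the two eye corners:
         * roughly 0 near the first corner, 1 near the second, 0.5 when centred.
         */
        public static float horizontalRatio(
                float irisCenterX, float eyeCornerLeftX, float eyeCornerRightX) {
            float eyeWidth = eyeCornerRightX - eyeCornerLeftX;
            if (Math.abs(eyeWidth) < 1e-6f) {
                return 0.5f; // Degenerate eye width; treat as centred.
            }
            return (irisCenterX - eyeCornerLeftX) / eyeWidth;
        }

        /** Maps the ratio to a coarse direction using the placeholder threshold. */
        public static Direction classify(float ratio) {
            if (ratio < 0.5f - HORIZONTAL_THRESHOLD) {
                return Direction.LEFT;
            }
            if (ratio > 0.5f + HORIZONTAL_THRESHOLD) {
                return Direction.RIGHT;
            }
            return Direction.CENTER;
        }

        private GazeEstimator() {}
    }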

So far I have generated an Iris .aar using the build file below. Does the .aar I built contain the calculators for the subgraphs as well as the main graph, or do I need to add something else to my AAR BUILD file?

.aar BUILD File:

load("//mediapipe/java/com/google/mediapipe:mediapipe_aar.bzl", "mediapipe_aar")
mediapipe_aar(
name = "mp_iris_tracking_aar",
calculators = ["//mediapipe/graphs/iris_tracking :iris_tracking_gpu_deps"],
)

At the moment I have an Android Studio project set up with the assets below and the aforementioned Iris .aar.

Android Studio Assets:
iris_tracking_gpu.binarypb
face_landmark.tflite
iris_landmark.tflite
face_detection_front.tflite

For now, I'm just trying to build this as is so I better understand the process and can verify that my build environment is set up correctly. I've already successfully built and tested the face detection example listed in the docs, and it runs correctly. However, when I modify the project to use the Iris .aar, it builds but crashes at runtime with the exception: Side Packet "focal_length_pixel" is required but not provided.

I've tried adding code for the focal length to onCreate, based on the Iris example in the MediaPipe repo, but I don't know how to adapt it to work with an Iris .aar. Are there any further docs I can read to point me in the right direction?

I think I need to integrate the snippet below into the modified code for the face detection example, but I'm not sure how. Thanks for your help :)

    float focalLength = cameraHelper.getFocalLengthPixels();
    if (focalLength != Float.MIN_VALUE) {
        Packet focalLengthSidePacket = processor.getPacketCreator().createFloat32(focalLength);
        Map<String, Packet> inputSidePackets = new HashMap<>();
        inputSidePackets.put(FOCAL_LENGTH_STREAM_NAME, focalLengthSidePacket);
        processor.setInputSidePackets(inputSidePackets);
    }
    haveAddedSidePackets = true;
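
For reference, this is roughly how I picture wiring that snippet into the startCamera() method of the activity below, mirroring what the iris example in the repo does once the camera has started (the class already imports Packet, Map and HashMap). This is a guess on my part and not something I've verified against the .aar build:

    private void startCamera() {
        cameraHelper = new CameraXPreviewHelper();
        cameraHelper.setOnCameraStartedListener(
                surfaceTexture -> {
                    previewFrameTexture = surfaceTexture;
                    // Provide the focal_length_pixel side packet exactly once, before the
                    // first frame is sent into the graph.
                    if (!haveAddedSidePackets) {
                        float focalLength = cameraHelper.getFocalLengthPixels();
                        if (focalLength != Float.MIN_VALUE) {
                            Packet focalLengthSidePacket =
                                    processor.getPacketCreator().createFloat32(focalLength);
                            Map<String, Packet> inputSidePackets = new HashMap<>();
                            inputSidePackets.put(FOCAL_LENGTH_STREAM_NAME, focalLengthSidePacket);
                            processor.setInputSidePackets(inputSidePackets);
                        }
                        haveAddedSidePackets = true;
                    }
                    // Make the display view visible to start showing the preview.
                    previewDisplayView.setVisibility(View.VISIBLE);
                });
        cameraHelper.startCamera(this, CAMERA_FACING, /*surfaceTexture=*/ null);
    }
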
Modified Face Tracking AAR example:
package com.example.iristracking;

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

import android.graphics.SurfaceTexture;
import android.os.Bundle;
import android.util.Log;
import java.util.HashMap;
import java.util.Map;
import androidx.appcompat.app.AppCompatActivity;
import android.util.Size;
import android.view.SurfaceHolder;
import android.view.SurfaceView;
import android.view.View;
import android.view.ViewGroup;
import com.google.mediapipe.components.CameraHelper;
import com.google.mediapipe.components.CameraXPreviewHelper;
import com.google.mediapipe.components.ExternalTextureConverter;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.components.PermissionHelper;
import com.google.mediapipe.framework.AndroidAssetUtil;
import com.google.mediapipe.framework.Packet;
import com.google.mediapipe.glutil.EglManager;

/** Main activity of MediaPipe example apps. */
public class MainActivity extends AppCompatActivity {
private static final String TAG = "MainActivity";
private boolean haveAddedSidePackets = false;

private static final String FOCAL_LENGTH_STREAM_NAME = "focal_length_pixel";
private static final String OUTPUT_LANDMARKS_STREAM_NAME = "face_landmarks_with_iris";

private static final String BINARY_GRAPH_NAME = "iris_tracking_gpu.binarypb";
private static final String INPUT_VIDEO_STREAM_NAME = "input_video";
private static final String OUTPUT_VIDEO_STREAM_NAME = "output_video";
private static final CameraHelper.CameraFacing CAMERA_FACING = CameraHelper.CameraFacing.FRONT;

// Flips the camera-preview frames vertically before sending them into FrameProcessor to be
// processed in a MediaPipe graph, and flips the processed frames back when they are displayed.
// This is needed because OpenGL represents images assuming the image origin is at the bottom-left
// corner, whereas MediaPipe in general assumes the image origin is at top-left.
private static final boolean FLIP_FRAMES_VERTICALLY = true;

static {
    // Load all native libraries needed by the app.
    System.loadLibrary("mediapipe_jni");
    System.loadLibrary("opencv_java3");
}

// {@link SurfaceTexture} where the camera-preview frames can be accessed.
private SurfaceTexture previewFrameTexture;
// {@link SurfaceView} that displays the camera-preview frames processed by a MediaPipe graph.
private SurfaceView previewDisplayView;

// Creates and manages an {@link EGLContext}.
private EglManager eglManager;
// Sends camera-preview frames into a MediaPipe graph for processing, and displays the processed
// frames onto a {@link Surface}.
private FrameProcessor processor;
// Converts the GL_TEXTURE_EXTERNAL_OES texture from Android camera into a regular texture to be
// consumed by {@link FrameProcessor} and the underlying MediaPipe graph.
private ExternalTextureConverter converter;

// Handles camera access via the {@link CameraX} Jetpack support library.
private CameraXPreviewHelper cameraHelper;


@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    previewDisplayView = new SurfaceView(this);
    setupPreviewDisplayView();

    // Initialize asset manager so that MediaPipe native libraries can access the app assets, e.g.,
    // binary graphs.
    AndroidAssetUtil.initializeNativeAssetManager(this);

    eglManager = new EglManager(null);
    processor =
            new FrameProcessor(
                    this,
                    eglManager.getNativeContext(),
                    BINARY_GRAPH_NAME,
                    INPUT_VIDEO_STREAM_NAME,
                    OUTPUT_VIDEO_STREAM_NAME);
    processor.getVideoSurfaceOutput().setFlipY(FLIP_FRAMES_VERTICALLY);

    PermissionHelper.checkAndRequestCameraPermissions(this);


}

@Override
protected void onResume() {
    super.onResume();
    converter = new ExternalTextureConverter(eglManager.getContext());
    converter.setFlipY(FLIP_FRAMES_VERTICALLY);
    converter.setConsumer(processor);
    if (PermissionHelper.cameraPermissionsGranted(this)) {
        startCamera();
    }
}

@Override
protected void onPause() {
    super.onPause();
    converter.close();
}

@Override
public void onRequestPermissionsResult(
        int requestCode, String[] permissions, int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    PermissionHelper.onRequestPermissionsResult(requestCode, permissions, grantResults);
}

private void setupPreviewDisplayView() {
    previewDisplayView.setVisibility(View.GONE);
    ViewGroup viewGroup = findViewById(R.id.preview_display_layout);
    viewGroup.addView(previewDisplayView);

    previewDisplayView
            .getHolder()
            .addCallback(
                    new SurfaceHolder.Callback() {
                        @Override
                        public void surfaceCreated(SurfaceHolder holder) {
                            processor.getVideoSurfaceOutput().setSurface(holder.getSurface());
                        }

                        @Override
                        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
                            // (Re-)Compute the ideal size of the camera-preview display (the area that the
                            // camera-preview frames get rendered onto, potentially with scaling and rotation)
                            // based on the size of the SurfaceView that contains the display.
                            Size viewSize = new Size(width, height);
                            Size displaySize = cameraHelper.computeDisplaySizeFromViewSize(viewSize);

                            // Connect the converter to the camera-preview frames as its input (via
                            // previewFrameTexture), and configure the output width and height as the computed
                            // display size.
                            converter.setSurfaceTextureAndAttachToGLContext(
                                    previewFrameTexture, displaySize.getWidth(), displaySize.getHeight());
                        }

                        @Override
                        public void surfaceDestroyed(SurfaceHolder holder) {
                            processor.getVideoSurfaceOutput().setSurface(null);
                        }
                    });
}

private void startCamera() {
    cameraHelper = new CameraXPreviewHelper();
    cameraHelper.setOnCameraStartedListener(
            surfaceTexture -> {
                previewFrameTexture = surfaceTexture;
                // Make the display view visible to start showing the preview. This triggers the
                // SurfaceHolder.Callback added to (the holder of) previewDisplayView.
                previewDisplayView.setVisibility(View.VISIBLE);
            });
    cameraHelper.startCamera(this, CAMERA_FACING, /*surfaceTexture=*/ null);

}
}
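
On the other blocker (extracting specific landmark coordinates in real time), my current plan is to attach a packet callback to the face_landmarks_with_iris output stream once the processor is constructed in onCreate, along the lines of the sketch below. I'm assuming the stream carries a NormalizedLandmarkList proto like the other MediaPipe landmark examples, and that it holds the face mesh points followed by the iris points, but I still need to confirm both of those:

    // Extra imports this would need, alongside the ones already in MainActivity:
    //   import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmarkList;
    //   import com.google.mediapipe.framework.PacketGetter;
    //   import com.google.protobuf.InvalidProtocolBufferException;

    // In onCreate(), after `processor` is constructed:
    processor.addPacketCallback(
            OUTPUT_LANDMARKS_STREAM_NAME,
            (packet) -> {
                byte[] landmarksRaw = PacketGetter.getProtoBytes(packet);
                try {
                    NormalizedLandmarkList landmarks = NormalizedLandmarkList.parseFrom(landmarksRaw);
                    // Expecting the face mesh landmarks followed by the iris landmarks here;
                    // the exact count and indices still need to be confirmed.
                    Log.v(TAG, "Received " + landmarks.getLandmarkCount() + " landmarks");
                } catch (InvalidProtocolBufferException e) {
                    Log.e(TAG, "Couldn't parse landmarks packet.", e);
                }
            });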

1 Answer

override fun onResume() {
        super.onResume()
        converter = ExternalTextureConverter(eglManager?.context, NUM_BUFFERS)

        if (PermissionHelper.cameraPermissionsGranted(this)) {
            var rotation: Int = 0
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
                rotation = this.display!!.rotation
            } else {
                rotation = this.windowManager.defaultDisplay.rotation
            }

            converter!!.setRotation(rotation)
            converter!!.setFlipY(FLIP_FRAMES_VERTICALLY)

            startCamera(rotation)

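            // Provide the focal_length_pixel input side packet exactly once,
            // before any frames reach the graph.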
            if (!haveAddedSidePackets) {
                val packetCreator = mediapipeFrameProcessor!!.getPacketCreator();
                val inputSidePackets = mutableMapOf<String, Packet>()

                focalLength = cameraHelper?.focalLengthPixels!!
                Log.i(TAG_MAIN, "OnStarted focalLength: ${cameraHelper?.focalLengthPixels!!}")
                inputSidePackets.put(
                    FOCAL_LENGTH_STREAM_NAME,
                    packetCreator.createFloat32(focalLength.width.toFloat())
                )
                mediapipeFrameProcessor!!.setInputSidePackets(inputSidePackets)
                haveAddedSidePackets = true

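                // Build a simple pinhole intrinsics matrix from the focal length and image
                // size, and invert it so pixel coordinates can be mapped back to
                // normalized camera coordinates.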
                val imageSize = cameraHelper!!.imageSize
                val calibrateMatrix = Matrix()
                calibrateMatrix.setValues(
                    floatArrayOf(
                        focalLength.width * 1.0f,
                        0.0f,
                        imageSize.width / 2.0f,
                        0.0f,
                        focalLength.height * 1.0f,
                        imageSize.height / 2.0f,
                        0.0f,
                        0.0f,
                        1.0f
                    )
                )
                val isInvert = calibrateMatrix.invert(matrixPixels2World)
                if (!isInvert) {
                    matrixPixels2World = Matrix()
                }
            }
            converter!!.setConsumer(mediapipeFrameProcessor)
        }
    }