13

In my app I'm trying to do face recognition on a specific image using Open CV, here first I'm training one image and then after training that image if I run face recognition on that image it successfully recognizes that trained face. However, when I turn to another picture of the same person recognition does not work. It just works on the trained image, so my question is how do I rectify it?

Update: What i want to do is that user should select image of a person from storage and then after training that selected image i want to fetch all images from storage which matches face of my trained image

Here is my activity class:

public class MainActivity extends AppCompatActivity {
    private Mat rgba,gray;
    private CascadeClassifier classifier;
    private MatOfRect faces;
    private ArrayList<Mat> images;
    private ArrayList<String> imagesLabels;
    private Storage local;
    ImageView mimage;
    Button prev,next;
    ArrayList<Integer> imgs;
    private int label[] = new int[1];
    private double predict[] = new double[1];
    Integer pos = 0;
    private String[] uniqueLabels;
    FaceRecognizer recognize;
    private boolean trainfaces() {
        if(images.isEmpty())
            return false;
        List<Mat> imagesMatrix = new ArrayList<>();
        for (int i = 0; i < images.size(); i++)
            imagesMatrix.add(images.get(i));
        Set<String> uniqueLabelsSet = new HashSet<>(imagesLabels); // Get all unique labels
        uniqueLabels = uniqueLabelsSet.toArray(new String[uniqueLabelsSet.size()]); // Convert to String array, so we can read the values from the indices

        int[] classesNumbers = new int[uniqueLabels.length];
        for (int i = 0; i < classesNumbers.length; i++)
            classesNumbers[i] = i + 1; // Create incrementing list for each unique label starting at 1
        int[] classes = new int[imagesLabels.size()];
        for (int i = 0; i < imagesLabels.size(); i++) {
            String label = imagesLabels.get(i);
            for (int j = 0; j < uniqueLabels.length; j++) {
                if (label.equals(uniqueLabels[j])) {
                    classes[i] = classesNumbers[j]; // Insert corresponding number
                    break;
                }
            }
        }
        Mat vectorClasses = new Mat(classes.length, 1, CvType.CV_32SC1); // CV_32S == int
        vectorClasses.put(0, 0, classes); // Copy int array into a vector

        recognize = LBPHFaceRecognizer.create(3,8,8,8,200);
        recognize.train(imagesMatrix, vectorClasses);
        if(SaveImage())
            return true;

        return false;
    }
    public void cropedImages(Mat mat) {
        Rect rect_Crop=null;
        for(Rect face: faces.toArray()) {
            rect_Crop = new Rect(face.x, face.y, face.width, face.height);
        }
        Mat croped = new Mat(mat, rect_Crop);
        images.add(croped);
    }
    public boolean SaveImage() {
        File path = new File(Environment.getExternalStorageDirectory(), "TrainedData");
        path.mkdirs();
        String filename = "lbph_trained_data.xml";
        File file = new File(path, filename);
        recognize.save(file.toString());
        if(file.exists())
            return true;
        return false;
    }

    private BaseLoaderCallback callbackLoader = new BaseLoaderCallback(this) {
        @Override
        public void onManagerConnected(int status) {
            switch(status) {
                case BaseLoaderCallback.SUCCESS:
                    faces = new MatOfRect();

                    //reset
                    images = new ArrayList<Mat>();
                    imagesLabels = new ArrayList<String>();
                    local.putListMat("images", images);
                    local.putListString("imagesLabels", imagesLabels);

                    images = local.getListMat("images");
                    imagesLabels = local.getListString("imagesLabels");

                    break;
                default:
                    super.onManagerConnected(status);
                    break;
            }
        }
    };

    @Override
    protected void onResume() {
        super.onResume();
        if(OpenCVLoader.initDebug()) {
            Log.i("hmm", "System Library Loaded Successfully");
            callbackLoader.onManagerConnected(BaseLoaderCallback.SUCCESS);
        } else {
            Log.i("hmm", "Unable To Load System Library");
            OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION, this, callbackLoader);
        }
    }

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        prev = findViewById(R.id.btprev);
        next = findViewById(R.id.btnext);
        mimage = findViewById(R.id.mimage);
       local = new Storage(this);
       imgs = new ArrayList();
       imgs.add(R.drawable.jonc);
       imgs.add(R.drawable.jonc2);
       imgs.add(R.drawable.randy1);
       imgs.add(R.drawable.randy2);
       imgs.add(R.drawable.imgone);
       imgs.add(R.drawable.imagetwo);
       mimage.setBackgroundResource(imgs.get(pos));
        prev.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                if(pos!=0){
                  pos--;
                  mimage.setBackgroundResource(imgs.get(pos));
                }
            }
        });
        next.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                if(pos<5){
                    pos++;
                    mimage.setBackgroundResource(imgs.get(pos));
                }
            }
        });
        Button train = (Button)findViewById(R.id.btn_train);
        train.setOnClickListener(new View.OnClickListener() {
            @RequiresApi(api = Build.VERSION_CODES.KITKAT)
            @Override
            public void onClick(View view) {
                rgba = new Mat();
                gray = new Mat();
                Mat mGrayTmp = new Mat();
                Mat mRgbaTmp = new Mat();
                classifier = FileUtils.loadXMLS(MainActivity.this);
                Bitmap icon = BitmapFactory.decodeResource(getResources(),
                        imgs.get(pos));
                Bitmap bmp32 = icon.copy(Bitmap.Config.ARGB_8888, true);
                Utils.bitmapToMat(bmp32, mGrayTmp);
                Utils.bitmapToMat(bmp32, mRgbaTmp);
                Imgproc.cvtColor(mGrayTmp, mGrayTmp, Imgproc.COLOR_BGR2GRAY);
                Imgproc.cvtColor(mRgbaTmp, mRgbaTmp, Imgproc.COLOR_BGRA2RGBA);
                /*Core.transpose(mGrayTmp, mGrayTmp); // Rotate image
                Core.flip(mGrayTmp, mGrayTmp, -1); // Flip along both*/
                gray = mGrayTmp;
                rgba = mRgbaTmp;
                Imgproc.resize(gray, gray, new Size(200,200.0f/ ((float)gray.width()/ (float)gray.height())));
                if(gray.total() == 0)
                    Toast.makeText(getApplicationContext(), "Can't Detect Faces", Toast.LENGTH_SHORT).show();
                classifier.detectMultiScale(gray,faces,1.1,3,0|CASCADE_SCALE_IMAGE, new Size(30,30));
                if(!faces.empty()) {
                    if(faces.toArray().length > 1)
                        Toast.makeText(getApplicationContext(), "Mutliple Faces Are not allowed", Toast.LENGTH_SHORT).show();
                    else {
                        if(gray.total() == 0) {
                            Log.i("hmm", "Empty gray image");
                            return;
                        }
                        cropedImages(gray);
                        imagesLabels.add("Baby");
                        Toast.makeText(getApplicationContext(), "Picture Set As Baby", Toast.LENGTH_LONG).show();
                        if (images != null && imagesLabels != null) {
                            local.putListMat("images", images);
                            local.putListString("imagesLabels", imagesLabels);
                            Log.i("hmm", "Images have been saved");
                            if(trainfaces()) {
                                images.clear();
                                imagesLabels.clear();
                            }
                        }
                    }
                }else {
                   /* Bitmap bmp = null;
                    Mat tmp = new Mat(250, 250, CvType.CV_8U, new Scalar(4));
                    try {
                        //Imgproc.cvtColor(seedsImage, tmp, Imgproc.COLOR_RGB2BGRA);
                        Imgproc.cvtColor(gray, tmp, Imgproc.COLOR_GRAY2RGBA, 4);
                        bmp = Bitmap.createBitmap(tmp.cols(), tmp.rows(), Bitmap.Config.ARGB_8888);
                        Utils.matToBitmap(tmp, bmp);
                    } catch (CvException e) {
                        Log.d("Exception", e.getMessage());
                    }*/
                    /*    mimage.setImageBitmap(bmp);*/
                    Toast.makeText(getApplicationContext(), "Unknown Face", Toast.LENGTH_SHORT).show();
                }
            }
        });
        Button recognize = (Button)findViewById(R.id.btn_recognize);
        recognize.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                if(loadData())
                    Log.i("hmm", "Trained data loaded successfully");
                rgba = new Mat();
                gray = new Mat();
                faces = new MatOfRect();
                Mat mGrayTmp = new Mat();
                Mat mRgbaTmp = new Mat();
                classifier = FileUtils.loadXMLS(MainActivity.this);
                Bitmap icon = BitmapFactory.decodeResource(getResources(),
                        imgs.get(pos));
                Bitmap bmp32 = icon.copy(Bitmap.Config.ARGB_8888, true);
                Utils.bitmapToMat(bmp32, mGrayTmp);
                Utils.bitmapToMat(bmp32, mRgbaTmp);
                Imgproc.cvtColor(mGrayTmp, mGrayTmp, Imgproc.COLOR_BGR2GRAY);
                Imgproc.cvtColor(mRgbaTmp, mRgbaTmp, Imgproc.COLOR_BGRA2RGBA);
                /*Core.transpose(mGrayTmp, mGrayTmp); // Rotate image
                Core.flip(mGrayTmp, mGrayTmp, -1); // Flip along both*/
                gray = mGrayTmp;
                rgba = mRgbaTmp;
                Imgproc.resize(gray, gray, new Size(200,200.0f/ ((float)gray.width()/ (float)gray.height())));
                if(gray.total() == 0)
                    Toast.makeText(getApplicationContext(), "Can't Detect Faces", Toast.LENGTH_SHORT).show();
                classifier.detectMultiScale(gray,faces,1.1,3,0|CASCADE_SCALE_IMAGE, new Size(30,30));
                if(!faces.empty()) {
                    if(faces.toArray().length > 1)
                        Toast.makeText(getApplicationContext(), "Mutliple Faces Are not allowed", Toast.LENGTH_SHORT).show();
                    else {
                        if(gray.total() == 0) {
                            Log.i("hmm", "Empty gray image");
                            return;
                        }
                        recognizeImage(gray);
                    }
                }else {
                    Toast.makeText(getApplicationContext(), "Unknown Face", Toast.LENGTH_SHORT).show();
                }
            }
        });


    }
    private void recognizeImage(Mat mat) {
        Rect rect_Crop=null;
        for(Rect face: faces.toArray()) {
            rect_Crop = new Rect(face.x, face.y, face.width, face.height);
        }
        Mat croped = new Mat(mat, rect_Crop);
        recognize.predict(croped, label, predict);
        int indice = (int)predict[0];
        Log.i("hmmcheck:",String.valueOf(label[0])+" : "+String.valueOf(indice));
        if(label[0] != -1 && indice < 125)
            Toast.makeText(getApplicationContext(), "Welcome "+uniqueLabels[label[0]-1]+"", Toast.LENGTH_SHORT).show();
        else
            Toast.makeText(getApplicationContext(), "You're not the right person", Toast.LENGTH_SHORT).show();
    }
    private boolean loadData() {
        String filename = FileUtils.loadTrained();
        if(filename.isEmpty())
            return false;
        else
        {
            recognize.read(filename);
            return true;
        }
    }
}

My File Utils Class:

   public class FileUtils {
        private static String TAG = FileUtils.class.getSimpleName();
        private static boolean loadFile(Context context, String cascadeName) {
            InputStream inp = null;
            OutputStream out = null;
            boolean completed = false;
            try {
                inp = context.getResources().getAssets().open(cascadeName);
                File outFile = new File(context.getCacheDir(), cascadeName);
                out = new FileOutputStream(outFile);

                byte[] buffer = new byte[4096];
                int bytesread;
                while((bytesread = inp.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesread);
                }

                completed = true;
                inp.close();
                out.flush();
                out.close();
            } catch (IOException e) {
                Log.i(TAG, "Unable to load cascade file" + e);
            }
            return completed;
        }
        public static CascadeClassifier loadXMLS(Activity activity) {


            InputStream is = activity.getResources().openRawResource(R.raw.lbpcascade_frontalface);
            File cascadeDir = activity.getDir("cascade", Context.MODE_PRIVATE);
            File mCascadeFile = new File(cascadeDir, "lbpcascade_frontalface_improved.xml");
            FileOutputStream os = null;
            try {
                os = new FileOutputStream(mCascadeFile);
                byte[] buffer = new byte[4096];
                int bytesRead;
                while ((bytesRead = is.read(buffer)) != -1) {
                    os.write(buffer, 0, bytesRead);
                }
                is.close();
                os.close();

            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }


            return new CascadeClassifier(mCascadeFile.getAbsolutePath());
        }
        public static String loadTrained() {
            File file = new File(Environment.getExternalStorageDirectory(), "TrainedData/lbph_trained_data.xml");

            return file.toString();
        }
    }

These are the images i'm trying to compare here face of person is same still in recognition it's not matching! Image 1 Image 2

Farzad Vertigo
  • 2,458
  • 1
  • 29
  • 32
R.Coder
  • 257
  • 3
  • 17
  • When I built my final year assignment for Automatic Attendance System, I used 8-10 images of myself with slightly different poses and lighting conditions to train the classifier. – ZdaR Nov 18 '19 at 11:11
  • You could flip your training image mat horizontally to handle that requirement. – nfl-x Nov 20 '19 at 23:39
  • @nfl-x flipping images won't solve the problem of accuracy we need something better recent answer on tensorflow seems okay but there isn't sufficient information or tutorials available on it's implementation for android so our best guess is to keep up voting this post such that an expert can intervene and provide a proper solution for android – Mr. Patel Nov 22 '19 at 14:22

3 Answers3

5

Update

According to the new edit in the question, you need a way to identify new people on the fly whose photos might not have been available during the training phase of the model. These tasks are called few shot learning. This is similar to the requirements of the intelligence/police agencies to find their targets using CCTV camera footage. As usually there are not enough images of a specific target, during training, they use models such as FaceNet. I really suggest reading the paper, however, I explain a few of its highlights here:

  • Generally, the last layer of a classifier is a n*1 vector with n-1 of the elements almost equal to zero, and one close to 1. The element close to 1, determines the prediction of the classifier about the input's label. Typical CNN architecture
  • The authors figured out that if they train a classifier network with a specific loss function on a huge dataset of faces, you can use the semi-final layer output as a representation of any face, irrespective of it being in the training set or not, the authors call this vector Face Embedding.
  • The previous result means that with a very well trained FaceNet model, you can summarise any face into a vector. The very interesting attribute of this approach is that the vectors of a specific person's face in different angles/positions/states have are proximate in the euclidian space (this property is enforced by the loss function that the authors chose).enter image description here
  • In summary, you have a model that gets faces as input and returns vectors. The vectors close to each other are very likely to belong to the same person (For checking that you can use KNN or just simple euclidian distance).

One implementation of FaceNet can be found here. I suggest you try to run it on your computer to get to know what you are actually dealing with. After that, it might be best to do the following:

  1. Transform the FaceNet model mentioned in the repository to its tflite version (this blogpost might help)
  2. For each photo submitted by the user, use Face API to extract the face(s)
  3. Use the minified model in your app to get the face embeddings of the extracted face.
  4. Process all the images in the gallery of the user, getting the vectors for the faces in the photos.
  5. Then compare each vector found in step4 with each vector found in step3 to get the matches.

Original Answer

You came across one of the most prevalent challenges of machine learning: Overfitting. Face detection and recognition is a huge area of research on its own and almost all the reasonably accurate models are using some kind of deep learning. Note that even detecting a face accurately is not as easy as it seems, however, as you are doing it on android, you can use Face API for this task. (Other more advanced techniques such as MTCNN are too slow/difficult to deploy on a handset). It has been shown that just feeding the model with a face photo with a lot of background noise or multiple people inside does not work. So, you really cannot skip this step.

After getting a nice trimmed face of the candidate targets from the background, you need to overcome the challenge of recognising the detected faces. Again, all the competent models to the best of my knowledge, are using some sort of deep learning/convolutional neural networks. Using them on a mobile phone is a challenge, but thanks to Tensorflow Lite you can minify them and run them within your app. A project about face recognition on android phones that I had worked on is here that you can check. Keep in mind that any good model should be trained on numerous instances of labelled data, however there are a plethora of models already trained on large datasets of faces or other image recognition tasks, to tweak them and use their existing knowledge, we can employ transfer learning, for a quick start on object detection and transfer learning that is closely related to your case check this blog post.

Overall, you have to get numerous instances of the faces that you want to detect plus numerous face pics of people that you don't care about, then you need to train a model based on the above-mentioned resources, and then you need to use TensorFlow lite to decrease its size and embed it within your app. For each frame then, you call android Face API and feed (the probably detected face) into the model and identify the person.

Depending on your level of tolerance for delay and the number of training set size and number of targets, you can get various results, however, %90+ accuracy is easily achievable if you have only a few target people.

Community
  • 1
  • 1
Farzad Vertigo
  • 2,458
  • 1
  • 29
  • 32
  • I don't want to use network connection in my app so google cloud vision is out of question but tensor flow lite seems to be quite interesting is it free? and if you can provide a working example of it i'll appreciate it! Thanks – R.Coder Nov 18 '19 at 11:25
  • Great answer by the way! – R.Coder Nov 18 '19 at 11:26
  • It's free. Check [this](https://github.com/farzadz/Australian-Politician-Face-Recognition) for a working example. We were able to identify faces of 225 people on without using network connection with very high accuracy although there were some glitches in the user experience side. But that should be a good kick off. – Farzad Vertigo Nov 18 '19 at 11:30
  • Okay i'll give it a try – R.Coder Nov 18 '19 at 11:35
  • In your code example there is a tflite file in assests what's that? and in my case in need user to choose a image from storage and face recognisation is done on that image dynamically to rest of images in storage so a statically available model won't be useful in my case – R.Coder Nov 19 '19 at 08:40
  • If you can give a working code example correlating to my case i'll my answer as correct! tensorflow is new to me so i'm struggling – R.Coder Nov 19 '19 at 08:41
  • Update your question with this requirement of identifying faces with only one photo. I'll edit the answer and will mention how to reach this as well. – Farzad Vertigo Nov 20 '19 at 04:55
  • We are not limited to choosing only one photo we can make a user select 5 to 6 images of a person and then do face recognition on images of external storage – R.Coder Nov 20 '19 at 05:38
  • Personally i'm getting a bad feeling about Open CV after getting introduced to tensorflow however being new to it i don't know how to make things go as i want, it'll be really helpful of you if you can provide a working example in android as per my case and i'll surely mark your answer as correct if it helps! – R.Coder Nov 20 '19 at 05:47
  • Check the updated answer. You can see that even the theory behind what you need to achieve is also quite extensive, let alone the implementation. Check [this](https://www.coursera.org/lecture/convolutional-neural-networks/one-shot-learning-gjckG) video as well. – Farzad Vertigo Nov 20 '19 at 09:24
  • Follow, the links I put in the answer and take your time. Unfortunately, even a minimal example that does all the stuff that I mentioned is a full-blown project, and I am not aware of the existence of any opensource implementation. However, by following the links and tweaking them here and there, you can first achieve what you want on on a laptop and then transform it into the mobile version. – Farzad Vertigo Nov 20 '19 at 09:25
  • I have a doubt the model is claimed to about 95 mb so if i place that model in my app won't it increase my apk size? i'm looking for a low sized apk – R.Coder Nov 20 '19 at 10:40
  • When quantized by tensorflow lite, the size decreases (sometimes by a factor of 3). But this is a very complex task to achieve, so, you need to balance factors like size vs. accuracy. – Farzad Vertigo Nov 20 '19 at 11:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202770/discussion-between-r-coder-and-farzad-vertigo). – R.Coder Nov 21 '19 at 04:08
  • I tried Everything but still nothing help i'm stuck!!!! the blog post you gave is bad in step 2 it says to edit some files which are not present – R.Coder Nov 21 '19 at 10:44
  • 1
    It worked!!!! i eventually extracted that face net model tflite and got above 80% accuracy on a single trained image. but time complexity is really really huge!!,For comparing two images it takes minimum 5 to 6 seconds any idea on how to reduce that? – R.Coder Nov 28 '19 at 04:16
  • I will create another question with 100 rep of bounty again if you can solve and answer that!! – R.Coder Nov 28 '19 at 04:18
2

If I understand correctly, you're training the classifier with a single image. In that case, this one specific image is everything the classifier will be able to ever recognise. You would need a noticeably bigger training set of pictures showing the same person, something like 5 or 10 different images at the very least.

Florian Echtler
  • 2,148
  • 1
  • 15
  • 28
0

1) Change threshold value while initializing LBPHrecognizer to -> LBPHFaceRecognizer(1, 8, 8, 8, 100)

2) train each face with atleast 2-3 pictures since these recognizers mainly work on comparison

3) Set accuracy threshold while recognizing. Do something like this:

//predicting result
// LoadData is a static class that contains trained recognizer
// _result is the gray frame image captured by the camera
LBPHFaceRecognizer.PredictionResult ER = LoadData.recog.Predict(_result);
int temp_result = ER.Label;

imageBox1.SizeMode = PictureBoxSizeMode.StretchImage;
imageBox1.Image = _result.Mat;

//Displaying predicted result on screen
// LBPH returns -1 if face is recognized
if ((temp_result != -1) && (ER.Distance < 55)){  
     //I get best accuracy at 55, you should try different values to determine best results
     // Do something with detected image
}
dasunse
  • 2,839
  • 1
  • 14
  • 32
Riz
  • 1
  • 3