
I've had some confusion about this for some time now. When FaceNet is run on an image, it returns a 128-element embedding vector in Euclidean (L2) space (even this is something I don't completely understand). I've had the thought that maybe this embedding can be used to recognize faces via triplet loss.

My question is: can I compute something like triplet loss using these embeddings? And if so, how is it done? Do I subtract each element of one embedding from the corresponding element of the other, like this:

    arr = []  # supposed new embedding after the element-wise difference is calculated
    for i in range(128):
        arr.append(b[i] - a[i])

Is this how it's done, and can I perform facial identification with this?
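From what I've gathered (this is my understanding rather than something I've verified), the element-wise difference is only an intermediate step: the comparison itself is the L2 norm of that difference, and the triplet loss from the FaceNet paper combines the anchor-positive and anchor-negative distances with a margin. A minimal sketch, assuming anchor, positive, and negative are 128-element numpy arrays:

    import numpy as np

    def squared_l2(a, b):
        # squared Euclidean (L2) distance between two 128-d embeddings
        return np.sum(np.square(a - b))

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # pull the positive at least `margin` closer to the anchor than the negative;
        # the loss is zero once the triplet is separated well enough
        return max(squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin, 0.0)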

Please kindly move this to the right forum if it's not appropriate here.

  • It's going to be fairly simple, right? You set a threshold on the distance (typically cosine distance) between the embeddings; if it is within the threshold they are the same, and if it is beyond the threshold they are far apart and different. – venkata krishnan Dec 16 '19 at 01:23
  • Yes, but is it done per element across all embeddings? How is the distance computed on a technical level? During training of a system using this architecture, the distances will seem almost arbitrary. Some embeddings, even though they're positive anchors, might be far apart, right? So the weights are optimized to make it so the positives are closer? – Jerome Ariola Dec 16 '19 at 10:23
  • I would like you to attempt something: get all the embeddings for a sample set of images with similar and dissimilar faces (say, in a numpy array), then compute the cross product, i.e. the one-vs-all cosine distance between the embeddings (see the sketch after these comments). Look at the outcome of it; maybe that will give you a better understanding. So far in my case, I compute the embedding and take the cosine distance, which has given fairly accurate results. – venkata krishnan Dec 17 '19 at 01:58
  • I'll get to it! I'll report as soon as I figure this out and post the results in an answer. I'll ask if I'm confused about anything technical. – Jerome Ariola Dec 17 '19 at 02:40
  • Wait, how do I go about doing this? Do I load the FaceNet embeddings into variables and then compute the cosine distance between them? – Jerome Ariola Dec 17 '19 at 11:12
  • Yes, you can store them in a numpy array, or even in .npz files, and then go from there. – venkata krishnan Dec 18 '19 at 02:24
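For reference, here is a minimal sketch of the one-vs-all comparison suggested above, assuming the embeddings have already been computed and stacked into an (N, 128) numpy array (the file name and array key are hypothetical, matching numpy's np.savez defaults):

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # hypothetical (N, 128) array of FaceNet embeddings, one row per face;
    # 'arr_0' is the default key np.savez assigns to an unnamed array
    embeddings = np.load('embeddings.npz')['arr_0']

    # one-vs-all: sim[i, j] is the cosine similarity between faces i and j;
    # same-person pairs should score noticeably higher than different-person pairs
    sim = cosine_similarity(embeddings)
    print(np.round(sim, 3))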

1 Answer


I'm back, and here's what I came up with. It's a bit confusing, because there doesn't seem to be much difference between the scores. I used two pictures each of Trump and Obama, and computing the cosine similarity doesn't show anything significant. Maybe I'm doing something wrong?

    import numpy as np
    from PIL import Image

    from tensorflow.keras.models import load_model
    from tensorflow.keras.preprocessing.image import img_to_array

    from sklearn.metrics.pairwise import cosine_similarity

    # load the model; compile=False (the boolean, not the string 'False')
    # is enough here, since we only run inference and never train
    facenet = load_model('facenet_keras.h5', compile=False)

    def dist(a, b):
        # prepare the images for FaceNet: load, resize to 160x160, add a batch dimension
        a, b = Image.open(a), Image.open(b)
        a, b = a.resize((160, 160)), b.resize((160, 160))
        a, b = img_to_array(a), img_to_array(b)
        a, b = np.expand_dims(a, axis=0), np.expand_dims(b, axis=0)

        # get the 128-d FaceNet embedding vector for each image
        a, b = facenet.predict(a), facenet.predict(b)

        # compute the distance metric
        print(cosine_similarity(a, b))

    dist("obamaface2.jpg", "trumpface1.jpg")  # images cropped to the face

The output of comparing obamaface2 to trumpface1 was [[0.9417696]], while trumpface1 to trumpface2 gave [[0.9754221]].
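One thing I suspect (an assumption on my part, not something I've verified for this exact .h5 file): FaceNet models are typically fed per-image standardized pixels ("prewhitening"), and skipping that step can make all embeddings look alike, which would explain why both pairs score above 0.9. A sketch of the preprocessing step, applied to the img_to_array output before expand_dims:

    def preprocess(pixels):
        # per-image standardization ("prewhitening"); FaceNet models are
        # usually trained on inputs normalized to zero mean and unit variance
        pixels = pixels.astype('float32')
        mean, std = pixels.mean(), pixels.std()
        return (pixels - mean) / std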
