I've asked about this on the forum, but it seemed niche enough to deserve its own question.

I took the snippet with cosine distance online from here. The output doesn't seem right though...

Here's my code. (Note: I changed from np.matmul to np.dot, but there's still no difference. I'm also confused as to why I need to use transpose; it won't work without it.)

import PIL
from PIL import Image   

import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array

from sklearn.metrics.pairwise import cosine_similarity

#load model and compile
facenet = load_model('facenet_keras.h5', compile=False)  # boolean False; the string 'False' is truthy
facenet.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])

def findCosineDistance(a, b):
    x = np.dot(np.transpose(a),b)
    y = np.dot(np.transpose(a),a)
    z = np.dot(np.transpose(b),b)
    return (1 - (x / (np.sqrt(y) * np.sqrt(z))))

def dist(a,b):

    #prepare image for FaceNet
    a,b = Image.open(a), Image.open(b)

    a,b = np.array(a), np.array(b)
    a,b = Image.fromarray(a), Image.fromarray(b)
    a,b = a.resize((160,160)), b.resize((160,160))
    a,b = img_to_array(a), img_to_array(b)
    a = a.reshape((1,a.shape[0], a.shape[1], a.shape[2]))
    b = b.reshape((1,b.shape[0], b.shape[1], b.shape[2]))


    #get FaceNet embedding vector
    a, b = facenet.predict(a), facenet.predict(b)

    #compute distance metric
    output = findCosineDistance(a,b)
    #print(output)
    #print((cosine_similarity(a, b)))
    print(output)

Output:

c:/Users/Jerome Ariola/Desktop:     RuntimeWarning: invalid value encountered in sqrt
  return (1 - (x / (np.sqrt(y) * np.sqrt(z))))
[[ 0.         -0.3677783  -0.1329441  ...  0.2845478  -0.33033693
          nan]
 [ 0.26888728  0.          0.17169017 ...  0.47692382  0.02737373
          nan]
 [ 0.1173439  -0.2072779   0.         ...  0.36850178 -0.17422998
          nan]
 ...
 [-0.39771736 -0.9117675  -0.58353555 ...  0.         -0.85943496
          nan]
 [ 0.24831063 -0.02814436  0.14837813 ...  0.4622023   0.
          nan]
 [        nan         nan         nan ...         nan         nan
   0.        ]]
Jerome Ariola

2 Answers

It seems FaceNet's predict() method is returning face embeddings that contain NaN values. Clipping the values before computing cosine similarity might help, for example:

a, b = np.clip(a, -1000, 1000), np.clip(b, -1000, 1000)

Note: Choose an appropriate clipping threshold based on the range of values in a and b.
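
Before clipping, it may be worth checking whether the embeddings really contain NaN. The sketch below assumes `emb` stands in for an array returned by predict(); its values, including the NaN, are made up for illustration. Note that np.clip only bounds finite values and passes NaN through unchanged, so np.nan_to_num would be needed if NaN were actually present.

```python
import numpy as np

# Hypothetical stand-in for an embedding returned by facenet.predict();
# the values, including the NaN, are made up for illustration.
emb = np.array([[0.5, -1.2, np.nan, 2000.0]], dtype=np.float32)

has_nan = bool(np.isnan(emb).any())         # True if any entry is NaN
clipped = np.clip(emb, -1000, 1000)         # bounds finite values; NaN stays NaN
cleaned = np.nan_to_num(clipped, nan=0.0)   # replaces NaN with 0.0

print(has_nan)
print(clipped)
print(cleaned)
```

If `has_nan` turns out to be False for the real embeddings, the NaN in the output is being introduced later, in the distance computation itself.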

Balraj Ashwath

I'm still working on a solution, so I'll try to update this. The error stems from negative values in the output of facenet.predict(): the model runs on an image before the cosine distance is computed, and the resulting array contains negative values. The formula for the cosine distance involves np.sqrt(). I tried the following:

>>> import numpy as np
>>> a = [-1,0,1,2,3,4] # simple array with a negative number
>>> a = np.sqrt(a)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
>>> a
array([       nan, 0.        , 1.        , 1.41421356, 1.73205081,
       2.        ])

...and as we can see, there is a NaN value in the array.

TL;DR: would it matter if I apply np.abs, or would that change the "meaning" of the entire embedding? Taking the absolute value allows the distance to be computed, but I'm not sure whether that's, for lack of a better term, okay.
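
One way to probe this is to look at what actually reaches np.sqrt(). The sketch below assumes the embeddings have shape (1, n), as a single-image predict() call would typically return; `a` and `b` are made-up stand-ins, not real FaceNet output. With that shape, np.dot(a.T, a) is an n-by-n matrix of pairwise products, which can contain negatives, while np.dot(a, a.T) is the sum of squares and cannot be negative.

```python
import numpy as np

# Made-up stand-ins for two embeddings of shape (1, n); real ones are longer.
a = np.array([[0.5, -1.0, 2.0]])
b = np.array([[1.5, 0.5, -1.0]])

outer = np.dot(a.T, a)   # shape (3, 3): every pairwise product a_i * a_j
inner = np.dot(a, a.T)   # shape (1, 1): sum of squares, never negative

print(outer.shape, bool((outer < 0).any()))
print(inner.shape, inner.item())

# Taking absolute values changes the inner product between the two
# vectors, and therefore the angle that cosine distance measures.
signed = np.dot(a, b.T).item()
unsigned = np.dot(np.abs(a), np.abs(b).T).item()
print(signed, unsigned)
```

Under these assumptions the negative inputs to sqrt come from the order of the transpose rather than from the embeddings themselves, and np.abs alters the result instead of repairing it.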

Jerome Ariola
  • Yes, adding `np.abs` would definitely change the meaning of the embeddings. But I'm wondering if we need to do that because the inner product of an embedding with itself would always result in a value >=0 – Balraj Ashwath Jan 01 '20 at 01:39
  • For the sake of testing I used `np.abs` on the embeddings, and there's still an issue: when computing the distance, shouldn't I get ONE number and not an array? Output: ```python [[ 0. -0.36777842 -0.13294446 ... 0.2845481 -0.33033693 [ 0.26888746 0. 0.1716901 ... 0.47692412 0.02737391 [ 0.11734432 -0.20727754 0. ... 0.36850214 -0.17422962 2.045044 ] ... [-0.39771795 -0.91176844 -0.583537 ... 0. -0.8594358 2.6548653 ] [ 0.2483108 -0.02814436 0.14837784 ... 0.46220255 0. 1.8899825 ] 0. ]] ``` – Jerome Ariola Jan 01 '20 at 02:11
  • A crucial part I missed is the summing of values when computing the distance. I guess I'll have to find a way to add them all up. – Jerome Ariola Jan 01 '20 at 16:27
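
The summing mentioned in the last comment is what a dot product of two flat vectors already does. A minimal sketch, assuming each embedding can be flattened to a 1-D vector; `cosine_distance`, `u`, and `v` are hypothetical names and made-up values for illustration, not part of the original code:

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) as a single float.

    Both inputs are flattened first, so (1, n) arrays such as
    single-image embeddings are treated as plain length-n vectors.
    """
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    dot = np.dot(a, b)               # sum of elementwise products
    norm_a = np.sqrt(np.dot(a, a))   # sum of squares is >= 0, so sqrt is safe
    norm_b = np.sqrt(np.dot(b, b))
    return float(1.0 - dot / (norm_a * norm_b))

# Made-up (1, 3) arrays standing in for embeddings
u = np.array([[1.0, -2.0, 3.0]])
v = np.array([[-1.0, 2.0, -3.0]])

print(cosine_distance(u, u))   # same direction: close to 0
print(cosine_distance(u, v))   # opposite direction: close to 2
```

The result is one number between 0 and 2. The sketch does not guard against an all-zero vector, which would cause a division by zero.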