
I am a beginner in Python and searched hard for an answer here before asking this question. I have several designs, each with a few photos, and I want to compare the images by Hamming distance. However, I don't want to compare images of the same design, which are located in the same folder. I do the comparison with a library called ImageHash. After comparing the different combinations of images, I want to keep the ones with the highest Hamming distance. Let me explain what I want with a simple example:

In folder table there are three images: table_1.jpg, table_2.jpg, table_3.jpg.
In folder chair there are two images: chair_1.jpg, chair_2.jpg.

What I want is the file paths of the files (which I can already get), so that I can later pass them to the Image.open() and imagehash.phash functions. The combinations should look like this:

(table_1.jpg, chair_1.jpg), (table_1.jpg, chair_2.jpg), (table_2.jpg, chair_1.jpg), (table_2.jpg, chair_2.jpg), (table_3.jpg, chair_1.jpg), (table_3.jpg, chair_2.jpg)

Then, I guess, I have to split each name on "_" and use groupby and itemgetter.

1 Answer

You need itertools.product to compute the tuples you want:

from itertools import product

table = ['table_1.jpg', 'table_2.jpg', 'table_3.jpg']
chair = ['chair_1.jpg', 'chair_2.jpg']

print(list(product(table, chair)))
# [('table_1.jpg', 'chair_1.jpg'), ('table_1.jpg', 'chair_2.jpg'), ('table_2.jpg', 'chair_1.jpg'), ('table_2.jpg', 'chair_2.jpg'), ('table_3.jpg', 'chair_1.jpg'), ('table_3.jpg', 'chair_2.jpg')]
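With more than two folders, the same idea could be extended by taking every pair of folders and applying product to each pair. The following is only a sketch: the `folders` dictionary, the extra `lamp` folder, and the helper name `cross_folder_pairs` are made up for illustration and are not part of the question.

```python
from itertools import combinations, product

# Hypothetical mapping of folder name -> files found in that folder.
folders = {
    "table": ["table_1.jpg", "table_2.jpg", "table_3.jpg"],
    "chair": ["chair_1.jpg", "chair_2.jpg"],
    "lamp": ["lamp_1.jpg"],
}


def cross_folder_pairs(folders):
    """Yield every pair of files that come from two different folders."""
    # combinations() picks each unordered pair of folders exactly once,
    # product() then pairs every file of one folder with every file of the other.
    for files_a, files_b in combinations(folders.values(), 2):
        yield from product(files_a, files_b)


pairs = list(cross_folder_pairs(folders))
print(len(pairs))  # 3*2 + 3*1 + 2*1 = 11
```

Files from the same folder never meet, because product only ever receives lists from two different folders.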

If the filenames are all in the same list, you can use combinations and keep only the pairs whose elements don't start with the same prefix:

from itertools import combinations
filenames = ['table_1.jpg', 'table_2.jpg', 'table_3.jpg', 'chair_1.jpg', 'chair_2.jpg']

print([(x, y) for x, y in combinations(filenames, 2) if x.split('_')[0] != y.split('_')[0]])
# [('table_1.jpg', 'chair_1.jpg'), ('table_1.jpg', 'chair_2.jpg'), ('table_2.jpg', 'chair_1.jpg'), ('table_2.jpg', 'chair_2.jpg'), ('table_3.jpg', 'chair_1.jpg'), ('table_3.jpg', 'chair_2.jpg')]
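To keep the pair with the highest Hamming distance, the filtered pairs could be scored and passed to max. This is a rough sketch under stated assumptions: the 8-bit integers in `hashes` are invented stand-ins for real perceptual hashes, and `hamming` and `design` are hypothetical helpers. With the imagehash library, subtracting two hash objects should give the same kind of distance, so `hamming` could presumably be replaced by that subtraction.

```python
from itertools import combinations

# Made-up 8-bit values standing in for real perceptual hashes (assumption).
hashes = {
    "table_1.jpg": 0b10110010,
    "table_2.jpg": 0b10110011,
    "chair_1.jpg": 0b01001101,
    "chair_2.jpg": 0b11110000,
}


def hamming(a, b):
    """Number of bit positions in which the two integers differ."""
    return bin(a ^ b).count("1")


def design(filename):
    """Design name, i.e. the part of the filename before the first '_'."""
    return filename.split("_")[0]


# Score every pair of images that belong to different designs.
scored = [
    (x, y, hamming(hashes[x], hashes[y]))
    for x, y in combinations(hashes, 2)
    if design(x) != design(y)
]

# Keep the pair with the largest distance.
best = max(scored, key=lambda item: item[2])
print(best)  # ('table_1.jpg', 'chair_1.jpg', 8)
```

Sorting `scored` by the third element instead of calling max would give a full ranking if more than one pair is needed.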
Eric Duminil
  • Yes but my question is how I could prevent taking combinations of the files in the same folder – Kenan Dalay Mar 27 '17 at 11:07
  • `product` doesn't mix elements from the same list. – Eric Duminil Mar 27 '17 at 11:08
  • Just one more question: `combined = ((x, y, (64 - (x - y))/64) for x, y in combinations(df['hash_1'], 2) if x != y)` `series = Series(list(g) for k, g in groupby(combined, key=itemgetter(0)))` Actually, I calculate the phash first and then use it in a function rather than the file names. How could I combine the image-name split with the hash? – Kenan Dalay Mar 27 '17 at 11:42
  • @KenanDalay: This looks like a separate question. Comments aren't convenient for code – Eric Duminil Mar 27 '17 at 11:51
  • I would have asked if I didn't get -3 :) But thank you very much in any case – Kenan Dalay Mar 27 '17 at 14:04