I have the following code as an example (this is based on pairwise similarity of textual definitions):

import pandas as pd

df = pd.read_csv("pairings.csv")

sample_list = df['fruit'].tolist()

And the output of sample_list looks like:

['Apple, Orange', 'Pear, Apple, Grape',
 'Plum, Orange, Pear, Banana, Grape, Apple'] 

Again, the fruits are arbitrary placeholders. My actual dataset groups techniques by the cosine similarity of their definitions.

I have tried

for n in range(len(sample_list) + 1):
     list_combinations += list(combinations(sample_list,n))
     print(list_combinations)

and

for i in sample_test:
    res = [(a, b) for idx, a in enumerate(sample_test) for b in sample_test[idx + 1:]]

Neither attempt worked. My goal is a new CSV that lists every pair within each inner list, so it would read as follows (the bracketed notes would not be in the file; they are only there for explanation):

0        1
0 Apple  Orange [from list 1]
1 Pear   Apple [from list 2]
2 Pear   Grape [from list 2]
3 Apple  Grape [from list 2]
4 Plum   Orange [from list 3]
5 Plum   Pear [from list 3]
6 etc. 

I need to know how to iterate through each list and get all possible pairs. Thanks!

    You don't have a list of lists, you have a list of strings. You need to use `split()` to convert the string to a list. – Barmar May 04 '23 at 21:54

2 Answers

You need to split the string into a list before calling combinations.

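from itertools import combinations

result = []
for s in sample_list:
    # each element is one comma-separated string, so split it into a list first
    result.extend(combinations(s.split(', '), r=2))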

print(result)

output:

[('Apple', 'Orange'),
 ('Pear', 'Apple'),
 ('Pear', 'Grape'),
 ('Apple', 'Grape'),
 ('Plum', 'Orange'),
 ('Plum', 'Pear'),
 ('Plum', 'Banana'),
 ('Plum', 'Grape'),
 ('Plum', 'Apple'),
 ('Orange', 'Pear'),
 ('Orange', 'Banana'),
 ('Orange', 'Grape'),
 ('Orange', 'Apple'),
 ('Pear', 'Banana'),
 ('Pear', 'Grape'),
 ('Pear', 'Apple'),
 ('Banana', 'Grape'),
 ('Banana', 'Apple'),
 ('Grape', 'Apple')]
Barmar

I think of this as a two-step problem. First, convert each comma-separated string into a real list (with [elem.strip() for elem in liststr.split(',')], as in the code below). Then find all pairs of elements in that list.

We could write our own method to do that, or we could use itertools.combinations, which does this for us. I use itertools below.

from itertools import combinations

samplelist = [
    'Apple, Orange', 'Pear, Apple, Grape',
     'Plum, Orange, Pear, Banana, Grape, Apple'
]

def all_pairs(inlist):
    ret = []
    for liststr in inlist:
        innerlist = [elem.strip() for elem in liststr.split(',')]
        ret += list(combinations(innerlist, 2))
    return ret

all_pairs(samplelist)
""" returns
[('Apple', 'Orange'),
 ('Pear', 'Apple'),
 ('Pear', 'Grape'),
 ('Apple', 'Grape'),
 ('Plum', 'Orange'),
 ('Plum', 'Pear'),
 ('Plum', 'Banana'),
 ('Plum', 'Grape'),
 ('Plum', 'Apple'),
 ('Orange', 'Pear'),
 ('Orange', 'Banana'),
 ('Orange', 'Grape'),
 ('Orange', 'Apple'),
 ('Pear', 'Banana'),
 ('Pear', 'Grape'),
 ('Pear', 'Apple'),
 ('Banana', 'Grape'),
 ('Banana', 'Apple'),
 ('Grape', 'Apple')]
"""
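For comparison, the "write our own method" option mentioned above could look roughly like the sketch below. The names pairs_by_hand and all_pairs_by_hand are made up for illustration; the pairing logic is the enumerate-and-slice idea from the question, applied to each split list rather than to the outer list.

```python
def pairs_by_hand(items):
    """Return every unordered pair from items, in the order combinations would."""
    # Pair each element with every element that comes after it.
    return [(a, b) for idx, a in enumerate(items) for b in items[idx + 1:]]


def all_pairs_by_hand(inlist):
    """Split each comma-separated string, then collect the pairs from each list."""
    ret = []
    for liststr in inlist:
        innerlist = [elem.strip() for elem in liststr.split(',')]
        ret.extend(pairs_by_hand(innerlist))
    return ret


sample = ['Apple, Orange', 'Pear, Apple, Grape']
print(all_pairs_by_hand(sample))
# [('Apple', 'Orange'), ('Pear', 'Apple'), ('Pear', 'Grape'), ('Apple', 'Grape')]
```

This gives the same pairs in the same order as itertools.combinations(innerlist, 2), so the choice between the two is mostly a matter of taste.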

This resolves your question of iterating through and finding all pairs. I suppose you can convert from a list of pairs to a CSV?
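In case the CSV step is also useful, here is a minimal sketch using only the standard library. The helper write_pairs_csv and the output name "pairs.csv" are assumptions for illustration, and the header row just mimics the 0 / 1 column labels shown in the question.

```python
import csv
from itertools import combinations


def all_pairs(inlist):
    """Split each comma-separated string and collect all pairs from each list."""
    ret = []
    for liststr in inlist:
        innerlist = [elem.strip() for elem in liststr.split(',')]
        ret.extend(combinations(innerlist, 2))
    return ret


def write_pairs_csv(pairs, path):
    """Write one pair per row, with a running row number in the first column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["", "0", "1"])  # header mimicking the layout in the question
        for i, (a, b) in enumerate(pairs):
            writer.writerow([i, a, b])


samplelist = [
    'Apple, Orange', 'Pear, Apple, Grape',
    'Plum, Orange, Pear, Banana, Grape, Apple'
]

# "pairs.csv" is a hypothetical output file name
write_pairs_csv(all_pairs(samplelist), "pairs.csv")
```

Since you already use pandas, pd.DataFrame(all_pairs(samplelist)).to_csv("pairs.csv") should produce a very similar file, with the default 0 and 1 column labels and a row index.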

davidlowryduda