I get an error when I try to compare every distinct pair of values in a dataframe column. Each value is written like this:

['http://dbpedia.org/resource/Category:American_books,http://dbpedia.org/resource/Category:American_literature_by_medium,http://dbpedia.org/resource/Category:Autobiographies,http://dbpedia.org/resource/Category:Bertelsmann_subsidiaries']

i = 0
j = 0

for i in range(len(book_dc.dc_term)):
    values_i = set(book_dc['dc_term'][i].split(','))
    for j in range(i+1, len(book_dc.dc_term)):
        values_j = set(book_dc['dc_term'][j].split(','))
        num_matching = len(values_i.intersection(values_j))
        print("i:", i, "j:", j, "num_matching:", num_matching)
        print('\n')

I expect to get the number of matching values between every pair of cells. Instead, I am getting this error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

5 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
   3363                 raise KeyError(key) from err
   3364
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 1
  • Please paste in the traceback so it's readable. Put it in a code block rather than a quotation, so it won't be reformatted. – Barmar Jan 06 '23 at 18:17
  • Get out of the habit of using `for index in range(len(list)):`. Use `for item in list:` or `for index, item in enumerate(list):` – Barmar Jan 06 '23 at 18:19
  • Put the solution in an Answer below, not a comment. – Barmar Jan 06 '23 at 18:53

1 Answer

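Solved. Iterating over the column values with `enumerate` avoids the label-based lookup `book_dc['dc_term'][i]`, which raises `KeyError` when the index has no label `i` (for example, after rows have been filtered out):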

    for i, item_i in enumerate(book_dc.dc_term):
        values_i = set(item_i.split(','))
        for j, item_j in enumerate(book_dc.dc_term[i+1:]):
            values_j = set(item_j.split(','))
            num_matching = len(values_i.intersection(values_j))
            print("i:", i, "j:", j+i+1, "num_matching:", num_matching)
            print('\n')
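As a possible alternative, here is a minimal sketch of the same pairwise comparison using `itertools.combinations`, which splits each cell only once and visits every unordered pair exactly once. The helper name `count_matching_terms` and the sample strings are made up for illustration, and the function works on a plain list of cell values rather than on the dataframe itself, so the pairs are identified by position, not by index label.

```python
from itertools import combinations


def count_matching_terms(cells):
    """Map each position pair (i, j), i < j, to the number of shared terms."""
    # Split every cell into a set of terms once, up front.
    term_sets = [set(cell.split(",")) for cell in cells]
    # combinations() yields each unordered pair of positions exactly once.
    return {
        (i, j): len(term_sets[i] & term_sets[j])
        for i, j in combinations(range(len(term_sets)), 2)
    }


# Hypothetical sample values standing in for the dc_term column.
cells = ["a,b,c", "b,c,d", "x,y"]

matches = count_matching_terms(cells)
for (i, j), num_matching in matches.items():
    print("i:", i, "j:", j, "num_matching:", num_matching)
```

With the dataframe from the question, the call would presumably be `count_matching_terms(book_dc["dc_term"].tolist())`; converting to a list first sidesteps the index labels entirely.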
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 07 '23 at 18:56