I am new to Python and am trying to compute the condensed distance matrix of the elements of a DataFrame column using pdist.

This is what the data looks like; I want to use the "Sequence" column:

In [90]: print(a_10)
        Sequence  Occurrences  Size
12     FJGKFLDKFJ         4185    10
13     FJGKFLEKFJ         4074    10
15     FJGEELKJFD         3392    10
16     AFLJSFLSKD         3240    10
22     EOAIJFFEOF         2652    10
...           ...          ...   ...
29963  ELFKAJLFKA            1    10
29975  VEOIAJSEIJ            1    10
29983  ELKSJFLSEK            1    10
29989  ESKJFSLEKF            1    10
30002  ECSKCJSOEC            1    10

[3369 rows x 3 columns]

First I reshape it:

v = a_10["Sequence"].to_numpy().reshape(-1,1)

And then I try to apply pdist:

from scipy.spatial.distance import pdist
matrix = pdist(v, "euclidean")

But I get the following error:

ValueError: could not convert string to float: 'FJGKFLDKFJ'

Does anyone have a suggestion on how to overcome this? Thank you in advance.
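
For context on the error: pdist's built-in metrics such as "euclidean" need a numeric array, so the strings cannot be converted to floats. A minimal sketch of one possible workaround, assuming all sequences have the same length (Size is 10 here): encode each sequence as a row of character codes and use the built-in "hamming" metric, which gives the fraction of positions where two sequences differ. The three sequences are taken from the table above; the names `sequences`, `encoded`, and `hamming` are illustrative only. Note that this swaps Euclidean distance for Hamming distance, which may or may not be the right notion of similarity for this data.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Three equal-length sequences copied from the "Sequence" column above.
sequences = ["FJGKFLDKFJ", "FJGKFLEKFJ", "FJGEELKJFD"]

# One row per sequence, one column per character (stored as its code point).
encoded = np.array([[ord(ch) for ch in seq] for seq in sequences], dtype=float)

# Hamming distance: fraction of positions at which two rows differ.
# Condensed order for three items is (0,1), (0,2), (1,2).
hamming = pdist(encoded, "hamming")
print(hamming)  # approximately [0.1, 0.5, 0.5]
```

With the real data, `sequences` would presumably be `a_10["Sequence"].tolist()`.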

mantunes
  • Why would you want to calculate Euclidean distance on text data? – Alex Metsai Mar 12 '21 at 12:02
  • Hi Alex, I want to cluster that column based on the similarity between each pair of sequences. From what I've read, that similarity can be calculated using the Euclidean distance, but please correct me if I'm wrong, as I am still new to this topic. I also wanted to try the Levenshtein distance. – mantunes Mar 12 '21 at 12:16
  • The thing is that you can't calculate a Euclidean distance on raw text. You have to preprocess your data and/or define a distance metric. – Alex Metsai Mar 12 '21 at 12:20
  • I see, I will look into that. Thank you for your comment! – mantunes Mar 12 '21 at 12:30
  • You may want to look into the Levenshtein distance. – RomainL. Mar 12 '21 at 12:53
  • 'from Levenshtein import distance ... matrix = pdist(v, lambda x, y: distance(x[0], y[0]))' I ended up trying Levenshtein like this, based on another post. However, it returns a 1D array rather than the square matrix I was expecting. – mantunes Mar 12 '21 at 13:15
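
On the last comment: pdist always returns the condensed (1D) form, with one entry per pair of rows, and scipy.spatial.distance.squareform expands it into the full symmetric matrix. A rough sketch under some assumptions: `levenshtein` below is a small pure-Python stand-in for `Levenshtein.distance` so the example runs without the third-party package, `pair_metric` is a hypothetical helper, and pdist is given row indices instead of the strings themselves so that its input stays numeric.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def levenshtein(a, b):
    """Edit distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Three sequences copied from the table in the question.
sequences = ["FJGKFLDKFJ", "FJGKFLEKFJ", "FJGEELKJFD"]

# Each "observation" is just the index of a sequence.
indices = np.arange(len(sequences)).reshape(-1, 1)

def pair_metric(x, y):
    # x and y are length-1 rows holding an index into `sequences`.
    return levenshtein(sequences[int(x[0])], sequences[int(y[0])])

condensed = pdist(indices, pair_metric)  # 1D, length n * (n - 1) / 2
square = squareform(condensed)           # n x n, symmetric, zero diagonal
print(square)
```

The pure-Python loop is slow for 3369 sequences (about 5.7 million pairs), so swapping `levenshtein` for the compiled `Levenshtein.distance` would likely be preferable on the full column.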
