
For example, I have 4 columns in my dataframe, and I want to compute the Jaro similarity of columns A, B against columns C, D, all of which contain strings.

Currently I am using it between 2 columns with:

df.apply(lambda x: textdistance.jaro(x['A'], x['C']), axis=1)

So far I have only been comparing names:

| A | C | result |
| --- | --- | --- |
| Kevin | kenny | 0.67 |
| Danny | Danny | 1 |
| Aiofa | Avril | 0.75 |

I have over 100K records in my dataframe.

COLUMN A - contains person names

COLUMN B - contains city names

COLUMN C - contains person names (to compare with)

COLUMN D - contains city names (to compare with)

Expected output:

| A | B | C | D | result |
| --- | --- | --- | --- | --- |
| Kevin | London | kenny | Leeds | 0.4 |
| Danny | Dublin | Danny | dublin | 1 |
| Aiofa | Madrid | Avril | Male | 0.65 |

Kevin D
  • Please provide a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Add the data sample as text, not as a picture. E.g. try `df.head().to_dict(orient='list')` and post in a block between triple backticks (```). Show both input *and* expected output. Also, show us what you have tried so far, and why your attempt isn't giving you the result that you expect. See: [Research Effort](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users). – ouroboros1 Aug 10 '22 at 22:04
  • It depends on the application. For your purpose, would it make sense to compare concatenations of the strings in each column pair? Meaning: `df.apply(lambda x: textdistance.jaro(x['A'] + x['B'], x['C'] + x['D']), axis=1)` – DarrylG Aug 10 '22 at 22:22
  • Hi DarrylG, thank you so much, that worked well. That's what I was looking for. – Kevin D Aug 15 '22 at 10:47

1 Answer


df.apply(lambda x: textdistance.jaro(x['A'] + x['B'], x['C'] + x['D']), axis=1)

Thank you, DarrylG.
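For anyone who wants to see what the concatenation approach computes without installing anything, here is a minimal sketch. The `jaro` function below is a hand-written, pure-Python stand-in for `textdistance.jaro` and may differ from the library in edge cases. The helper `combined_jaro` and its `normalize` flag are hypothetical names, and lowercasing is an assumption on my part, suggested by the expected output scoring `Dublin` vs `dublin` as 1. The scores it produces are not guaranteed to match the figures in the question.

```python
def jaro(s1, s2):
    """Plain Jaro similarity in [0, 1]; a stand-in for textdistance.jaro."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def combined_jaro(row, normalize=True):
    """Compare A+B against C+D for one (A, B, C, D) row (hypothetical helper)."""
    a, b, c, d = row
    if normalize:
        # Assumption: ignore case and stray whitespace before comparing.
        a, b, c, d = (s.strip().lower() for s in (a, b, c, d))
    return jaro(a + b, c + d)


rows = [
    ("Kevin", "London", "kenny", "Leeds"),
    ("Danny", "Dublin", "Danny", "dublin"),
    ("Aiofa", "Madrid", "Avril", "Male"),
]
scores = [combined_jaro(r) for r in rows]
```

With pandas, the same idea is the `df.apply(..., axis=1)` call above, optionally with `.str.lower()` applied to the columns first. Note that plain concatenation lets characters from the name match characters from the city, so if that matters for your data, scoring the two pairs separately and averaging is another option to consider.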

Kevin D