
I have a simple pandas DataFrame and a list, as follows:

import pandas as pd

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

mylist =['cat blue', 'sky green', 'dog black']

How can I find the matches between this DataFrame and the list? I got the expected result when the list looks like this:

mylist_1 = ['cat','sky','dog']

But when I try it with mylist, the DataFrame rows don't match. Here is the code I used:

import pandas as pd

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

print(frame)

mylist_1 =['cat', 'sky', 'dog']

import nltk
frame['Data'] = frame['a'].apply(lambda x :  ([i for i in nltk.word_tokenize(x) if i in mylist_1]))

print(frame)

But how can I match mylist against the DataFrame? Please help me with this issue.

  • `But when i try to solve with mylist the dataframe is not matching.`: Can you [edit](https://stackoverflow.com/posts/50317114/edit) to show us *precisely* what you see? And also what you are expecting? – jpp May 13 '18 at 14:28

1 Answer


IIUC, you don't need nltk.word_tokenize; you can just use split(' ') in a list comprehension, keeping the same structure you were already trying to use:

frame['data'] = (frame.a.apply(lambda x: [w for i in mylist
                                          for w in i.split(' ')
                                          if w in x]))
>>> frame
                  a          data
0   the cat is blue   [cat, blue]
1  the sky is green  [sky, green]
2  the dog is black  [dog, black]

The list comprehension [w for i in mylist for w in i.split(' ')] flattens your list to ['cat', 'blue', 'sky', 'green', 'dog', 'black'].
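If it helps, here is a small variant (my own sketch, not part of the original answer) that flattens mylist into a set once up front and then checks whole tokens from each sentence. The `keywords` name is just illustrative; this avoids re-splitting mylist for every row and avoids substring matches such as 'cat' inside 'catalog':

import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['cat blue', 'sky green', 'dog black']

# Flatten mylist once into a set of individual words
# -> {'cat', 'blue', 'sky', 'green', 'dog', 'black'}
keywords = {w for phrase in mylist for w in phrase.split(' ')}

# Split each sentence and keep only the words that appear in the set
frame['data'] = frame['a'].apply(lambda x: [w for w in x.split(' ') if w in keywords])
print(frame)

This produces the same output as above.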

sacuL