Python FastText: How to create a corpus from a Dataframe Column

Question

I need to create a corpus for my email classifier. Right now I am using fasttext 0.8.3, but it expects a text file as input, whereas I need to pass a DataFrame as input.

It shows an error when I use the following code:

```
import fasttext

x_val = df['Message']
y_val = df['Categories']
model = fasttext.skipgram(x_val, y_val)
print model.words

TypeError:
<ipython-input-105-58241a9688b5> in <module>()
----> 1 model = fasttext.skipgram(x_val, y_val)
      2 print model.words # list of words in dictionary

fasttext/fasttext.pyx in fasttext.fasttext.skipgram (fasttext/fasttext.cpp:6451)()

fasttext/fasttext.pyx in fasttext.fasttext.train_wrapper (fasttext/fasttext.cpp:5223)()

/root/anaconda2/lib/python2.7/genericpath.pyc in isfile(path)
     35     """Test whether a path is a regular file"""
     36     try:
---> 37         st = os.stat(path)
     38     except os.error:
     39         return False

TypeError: coercing to Unicode: need string or buffer, Series found
```
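The last frames of the traceback show `train_wrapper` handing its first argument to `isfile(path)`, so the 0.8.x wrapper wants the *path* of a text file, not the data itself. As far as I recall the old wrapper's README (`fasttext.skipgram('data.txt', 'model')`), the second positional argument is also the output model name rather than a labels column. A minimal sketch of the usual workaround, written for Python 3: dump the column to a temporary file with one message per line and pass that path instead. The helper `write_corpus` and the sample rows are hypothetical stand-ins for the question's `df`.

```python
import os
import tempfile

import pandas as pd


def write_corpus(texts, path):
    """Write one message per line (hypothetical helper).

    Runs of whitespace inside a message, including newlines, are
    collapsed to single spaces so each message stays on one line.
    """
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            line = " ".join(str(text).split())
            f.write(line + "\n")
    return path


# Made-up stand-in for the question's DataFrame.
df = pd.DataFrame({"Message": ["Meeting moved\nto Friday", "Your invoice is attached"]})

corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
write_corpus(df["Message"], corpus_path)

# Assumption: with the 0.8.x wrapper, the file path (not the Series) is passed, e.g.
# model = fasttext.skipgram(corpus_path, "model")
```

Collapsing whitespace keeps each mail on a single line, so the number of lines in the file matches the number of rows in the DataFrame.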

In the above code, `df['Message']` and `df['Categories']` are the DataFrame columns containing the mails and their categories, respectively.
There are 30,123 mails in the DataFrame.
I have already gone through the fastText documentation but didn't find anything useful.
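For the classifier itself, fastText's supervised mode reads a file in which each line starts with a label carrying the `__label__` prefix, followed by the text. A sketch of building that file from the two columns, under a few assumptions of this sketch: category names may contain spaces (turned into underscores here so each label is one token), the rows are made up, and `write_supervised_file` is a hypothetical helper.

```python
import os
import tempfile

import pandas as pd

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def write_supervised_file(df, text_col, label_col, path):
    """Write '__label__<category> <message>' lines (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for text, label in zip(df[text_col], df[label_col]):
            # A label must be a single token, so spaces become underscores.
            label_token = LABEL_PREFIX + "_".join(str(label).split())
            # Collapse whitespace so each message stays on one line.
            clean_text = " ".join(str(text).split())
            f.write(f"{label_token} {clean_text}\n")
    return path


df = pd.DataFrame({
    "Message": ["Please find the invoice attached", "Lunch on Friday?"],
    "Categories": ["Billing", "Social Events"],
})

train_path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_supervised_file(df, "Message", "Categories", train_path)
```

The resulting path is what the training call takes: in the old 0.8.x wrapper that was, as far as I recall, `fasttext.supervised(train_path, 'model', label_prefix='__label__')`, and in the official `fasttext` Python module it is `fasttext.train_supervised(input=train_path)`. Check the README of the version you have installed.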

fastText tutorial reference

Thanks for the help.

Since `x_val` is a Series (selecting a single column from a DataFrame gives you a Series), you can use `x_val.to_string()` to convert it to a string or buffer. Documentation [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_string.html). — gionni, Jun 27 '17 at 13:40
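`to_string()` does return a `str`, but that string is the rendered contents of the Series (index labels included), not a file name, and the traceback shows the wrapper checking its argument with `isfile(path)`. That is a plausible cause of the "cannot load" error in the follow-up comment, though the truncated message doesn't confirm it. A quick check with made-up messages:

```python
import os

import pandas as pd

x_val = pd.Series(["hello world", "spam eggs"])  # made-up messages

rendered = x_val.to_string()
print(repr(rendered))

# The result is one string with an index label in front of each value...
first_line = rendered.splitlines()[0]
print(first_line.split()[0])  # the index label, not message text

# ...and it is not the path of an existing file, which is what the
# wrapper's isfile(path) check in the traceback looks for.
print(os.path.isfile(rendered))  # False
```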
`ValueError: (, UnicodeEncodeError('ascii', u'fastText: cannot load` — kshitij chaurasiya, Jun 28 '17 at 06:07
