I am trying to do what has been asked in this question. The problem I am having is that .apply() does not properly iterate over the rows. I have a dataframe which looks like this:
stuff, body
12, "Je parle francais"
25, "This is english"
I have tried three things. The first was df['body'].apply(lambda row: (detect == "en"))
which returned False for every row regardless of language, because detect is never called there: the lambda compares the function object itself (<function detect at random_bytes>) to "en".
The other two were df['body'].apply(detect)
and df['body'].apply(lambda row: detect(row))
both of which raised:
LangDetectException: No features in text.
I cannot really afford to go through every single row with a for loop because of the amount of data I have. So how can I find out which rows in the body column are English and which are not, using the langdetect library?
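For reference, here is a minimal sketch of the kind of guard I have in mind. It assumes, without my having confirmed it, that the exception comes from rows in body that are empty, NaN, or contain no letters at all. The names safe_detect and fake_detect are hypothetical helpers of mine, not part of langdetect, and fake_detect is only a stand-in so the snippet runs without the library installed.

```python
def safe_detect(text, detector, failures=(Exception,)):
    """Return detector(text), or None if the text is blank or detection fails."""
    # NaN cells (floats) and blank strings have nothing to detect, so skip them.
    if not isinstance(text, str) or not text.strip():
        return None
    try:
        return detector(text)
    except failures:
        # Assumption: this is where "No features in text." would be caught.
        return None


def fake_detect(text):
    """Hypothetical stand-in for langdetect.detect, used only for this demo."""
    if not any(ch.isalpha() for ch in text):
        raise ValueError("No features in text.")
    return "en" if "english" in text.lower() else "fr"


bodies = ["Je parle francais", "This is english", "", "12345", None]
is_english = [safe_detect(b, fake_detect, (ValueError,)) == "en" for b in bodies]
print(is_english)  # [False, True, False, False, False]

# With the real libraries (assumed installed) the same idea would be:
#   from langdetect import detect
#   from langdetect.lang_detect_exception import LangDetectException
#   df["is_english"] = df["body"].apply(
#       lambda text: safe_detect(text, detect, (LangDetectException,)) == "en"
#   )
```

Rows that cannot be classified end up as False here, not as an error. I do not know whether that is the right way to treat them, or whether .apply() is even the right tool for this amount of data.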