Applying language detector to every row in pandas

Question

I am trying to do what has been asked in this question. The problem I am having is that .apply() does not properly iterate over the rows. I have a dataframe which looks like this:

stuff, body
 12, "Je parle francais"
 25,  "This is english"

I have tried three things. The first, df['body'].apply(lambda row: (detect == "en")), returned False for every row regardless of language, because it never calls detect and just puts <function detect at random_bytes> into every row. The other two, df['body'].apply(detect) and df['body'].apply(lambda row: detect(row)), both ended up raising:

LangDetectException: No features in text.

I cannot really afford to run through every single row with a for loop because of the amount of data I have. So how can I find out which rows in the body column are English and which are not, using the langdetect library?

FYI, `apply` is a fancy way to write a `for` loop, which generally runs more slowly. Either `apply` or `for` loop is going to be your best bet due to the way `langdetect` works. — Quang Hoang, Jul 11 '22 at 16:12
On another note, `LangDetectException: No features in text.` might be because you have empty string or NaN values in column `body`. — Quang Hoang, Jul 11 '22 at 16:18
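The empty-string/NaN point in the comment above could be handled with a small guard that runs before the detector is called. This is only a sketch: `detect_or_none` and `fake_detect` are hypothetical names, and `fake_detect` is a stand-in so the snippet runs without langdetect installed; in practice you would pass `langdetect.detect` instead.

```python
import pandas as pd


def detect_or_none(text, detector):
    """Return detector(text), or None when there is nothing to detect."""
    # NaN, None and any other non-string value are skipped outright,
    # as are empty or whitespace-only strings.
    if not isinstance(text, str) or not text.strip():
        return None
    return detector(text)


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect, for illustration only.
    return "fr" if "parle" in text else "en"


df = pd.DataFrame(
    {
        "stuff": [12, 25, 30, 41],
        "body": ["Je parle francais", "This is english", "", None],
    }
)

# Rows with usable text get a language code; the rest stay missing.
df["lang"] = df["body"].apply(lambda text: detect_or_none(text, fake_detect))
print(df)
```

The guard only covers missing and blank values. Text without any letters (digits or punctuation only, for example) may still raise inside the real detector, so a try/except around the call is still worth keeping.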

Scott Boston · Accepted Answer · 2022-07-11T19:24:40.953

Try this:

import pandas as pd
from langdetect import detect, LangDetectException

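df = pd.read_clipboard(sep=', ')  # Create the dataframe from the clipboard
df.loc[3, :] = [30, '']  # Add a row with blank text to the dataframe

def f(x):
    # Return the detected language code, or the error message if detection fails
    try:
        result = detect(x)
    except LangDetectException as e:
        result = str(e)
    return result


# Run the wrapper on every value in the body column
df["lang"] = df["body"].apply(f)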

Output:

   stuff                 body                  lang
0   12.0  "Je parle francais"                    fr
1   25.0    "This is english"                    en
3   30.0                       No features in text.
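To get from the `lang` column to the English / not-English split the question asks for, one option is a boolean mask. The sketch below assumes a frame shaped like the output above (rebuilt by hand here so it runs on its own); the `is_english` column name is made up for illustration.

```python
import pandas as pd

# Hand-built copy of the output above: "lang" holds either a language
# code or the error message returned when detection failed.
df = pd.DataFrame(
    {
        "stuff": [12.0, 25.0, 30.0],
        "body": ["Je parle francais", "This is english", ""],
        "lang": ["fr", "en", "No features in text."],
    },
    index=[0, 1, 3],
)

# Compare the detected code (not the detect function itself) with "en".
df["is_english"] = df["lang"].eq("en")

english_rows = df[df["is_english"]]
other_rows = df[~df["is_english"]]

print(english_rows)
print(other_rows)
```

Rows where detection failed land in `other_rows` together with the non-English ones. If they need separate treatment, returning None instead of `str(e)` from the wrapper would make them easy to pick out with `isna()`.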
