I have a dataframe of songs with their singers and lyrics. Some songs have lyrics that are not in English but are still written in Latin letters. Is there a way to separate English words (words that have a meaning in English) from non-English words (written in Latin letters but with no English meaning)? Is there a Python library or some code for this? My main goal is to run sentiment analysis on the lyrics.
Viewed 967 times
1 Answer
2
There is a Python library for this called langdetect. It detects the language of a whole piece of text, so you can run it on each song's lyrics.
Here is an example of using it:
>>> from langdetect import detect
>>> detect("War doesn't show who's right, just who's left.")
'en'
>>> detect("Ein, zwei, drei, vier")
'de'

Fatemeh Rahimi
-
Thank you so much Fatemeh Rahimi :) Your answer can help me a lot. – Diana Mart Jun 20 '20 at 21:16
-
But Fatemeh, I need your help one more time. How can I select the rows of my dataframe whose lyrics are detected as 'en'? Something like rslt_df = df[df['lyrics'].detect = "en"] – Diana Mart Jun 20 '20 at 21:41
-
You're welcome. You can use the [apply function from pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) to apply this function to all the rows of your dataframe, and then put the output into another column. [Example](https://stackoverflow.com/questions/33518124/how-to-apply-a-function-on-every-row-on-a-dataframe) – Fatemeh Rahimi Jun 22 '20 at 13:56