Python pandas: Remove emojis from DataFrame

Question

I have a dataframe which contains a lot of different emojis and I want to remove them. I looked at answers to similar questions but they didn't work for me.

index| messages
----------------
1    |Hello!  
2    |Good Morning   
3    |How are you ?
4    | Good 
5    | Ländern

Now I want to remove all these emojis from the DataFrame so it looks like this:

    index| messages
    ----------------
    1    |Hello!
    2    |Good Morning   
    3    |How are you ?
    4    | Good 
    5    |Ländern

I tried the solution here, but unfortunately it also removes all non-English letters such as "ä". How can I remove only the emojis from a DataFrame?

Please paste the dataframe as-is from the output/console. Don't format it. — Vishnudev Krishnadas, Dec 02 '20 at 13:31
https://stackoverflow.com/questions/51217909/removing-all-emojis-from-text#52571541 — Hielke Walinga, Dec 02 '20 at 13:36
@Vishnudev Unfortunately I am not allowed to share the actual dataframe, that's why I formatted it — Sam, Dec 02 '20 at 13:44
Oh I see, thank you. (it's my first time posting a question so I didn't know how to write it) — Sam, Dec 02 '20 at 13:51
@Moe: Hi Moe, please put the "ä" character in your data example. — Ruthger Righart, Dec 02 '20 at 13:56
Does this answer your question? [Removing all Emojis from Text](https://stackoverflow.com/questions/51217909/removing-all-emojis-from-text) — flyingdutchman, Dec 02 '20 at 14:00
@RuthgerRighart I had it in the last line as one of the things why the solution in the given link didn't work but I added it now, thank you for the remark. — Sam, Dec 02 '20 at 14:01
I found an existing answer here which might help you out: https://stackoverflow.com/a/57514515/14718928 — Nicole, Dec 02 '20 at 14:10

xjcl · Accepted Answer · 2020-12-02T15:12:31.797

3

This solution keeps all ASCII and Latin-1 characters, i.e. characters between U+0000 and U+00FF in this list, and drops everything else. For extended Latin plus Greek, use < 1024:

import pandas as pd

df = pd.DataFrame({'messages': ['Länder ❤️', 'Hello! ']})

filter_char = lambda c: ord(c) < 256
df['messages'] = df['messages'].apply(lambda s: ''.join(filter(filter_char, s)))

Result:

  messages
0  Länder 
1  Hello!

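Note that this does not work for Japanese text, for example, because those characters lie above U+00FF and are removed as well. Another problem is that the heart "emoji" is actually a Dingbat inside the Basic Multilingual Plane, so I can't simply filter for the Basic Multilingual Plane of Unicode, oh well.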
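One way around both limitations might be to remove only code points from ranges that emoji typically occupy, instead of keeping a fixed low range. The following is a minimal sketch, not a complete emoji definition: the name `strip_emoji` and the chosen ranges are assumptions for illustration, and the ranges also contain some symbols that are not emoji.

```python
import re

# Assumed, non-exhaustive code point ranges commonly used by emoji.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flag pairs)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, newer symbols
    "\u2600-\u27BF"          # miscellaneous symbols and Dingbats (heart is U+2764)
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero width joiner used inside emoji sequences
    "]+"
)


def strip_emoji(text):
    """Remove characters matched by EMOJI_PATTERN, keep everything else."""
    return EMOJI_PATTERN.sub("", text)


# Heart + variation selector, a flag pair, and Japanese text with a smiley.
samples = ["Länder \u2764\uFE0F", "Hello! \U0001F1E9\U0001F1EA", "日本語 \U0001F600"]
cleaned = [strip_emoji(s) for s in samples]
print(cleaned)
```

With a DataFrame this could be applied as `df['messages'] = df['messages'].apply(strip_emoji)`. Unlike the `ord(c) < 256` filter, letters such as "ä" and Japanese characters are left alone because they fall outside the listed ranges.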

edited Dec 02 '20 at 15:12

answered Dec 02 '20 at 14:24

xjcl

Works well, extra credits for the flag ;-) – Ruthger Righart Dec 02 '20 at 14:35
Thank you a lot this worked for me (Vielen Dank) – Sam Dec 02 '20 at 14:52
If this doesn't work for some cases, you can also try `filter(lambda c: c.isalpha(), s)` -- that should handle Japanese for example. But it does filter `!` -- oh well. – xjcl Dec 02 '20 at 15:00
We are not supposed to assign a lambda expression to a variable. `df['messages'] = df['messages'].apply(lambda s: ''.join(filter(lambda c: ord(c) < 256, s)))` would be the correct one. – Rahul Kumeriya Sep 08 '21 at 04:06
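A possible middle ground between the `ord(c) < 256` filter and `c.isalpha()` is to filter on the Unicode general category, which keeps punctuation such as `!` as well as non-Latin letters. This is only a sketch under the assumption that "Symbol, other" (`So`) is a good enough stand-in for emoji; it will also drop non-emoji symbols like "©" and "°". The names `drop_symbols` and `EXTRA_DROPPED` are made up for this example, and it uses a named function rather than an assigned lambda.

```python
import unicodedata

# Assumption: besides category "So", also drop the invisible characters that
# accompany emoji sequences (variation selector-16 and zero width joiner).
EXTRA_DROPPED = {"\uFE0F", "\u200D"}


def drop_symbols(text):
    """Keep every character except 'Symbol, other' and the extras above."""
    return "".join(
        c for c in text
        if unicodedata.category(c) != "So" and c not in EXTRA_DROPPED
    )


# A flag pair, a heart with variation selector, and Japanese with punctuation.
samples = ["Hello! \U0001F1E9\U0001F1EA", "Länder \u2764\uFE0F", "こんにちは!"]
cleaned = [drop_symbols(s) for s in samples]
print(cleaned)
```

On a DataFrame the same function could be passed to `df['messages'].apply(drop_symbols)`.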

score 1 · Answer 2 · answered Dec 02 '20 at 14:22

I think the following answers your question. I added some other characters for verification.

import pandas as pd
df = pd.DataFrame({'messages':['Hello! ', 'Good-Morning ', 'How are you ?', ' Goodé ', 'Ländern' ]})

df['messages'].astype(str).apply(lambda x: x.encode('latin-1', 'ignore').decode('latin-1'))
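To make the effect of the encode/decode round trip easier to see, here is a small sketch on plain strings. The helper name `keep_latin1` and the sample messages are assumptions for illustration. Anything that cannot be represented in Latin-1 is silently dropped, which includes emojis but also, for instance, Japanese text.

```python
def keep_latin1(text):
    """Drop every character that has no Latin-1 encoding."""
    # errors="ignore" discards unencodable characters instead of raising.
    return text.encode("latin-1", "ignore").decode("latin-1")


# Sample messages: a smiley, a thumbs-up, plain Latin-1 text, and Japanese.
messages = ["Hello! \U0001F600", " Goodé \U0001F44D", "Ländern", "日本語"]
cleaned = [keep_latin1(m) for m in messages]
print(cleaned)
```

Note that the line above only displays the cleaned column. To store the result, it would have to be assigned back, for example `df['messages'] = df['messages'].astype(str).apply(keep_latin1)`.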

score 1 · Answer 3 · answered Feb 28 '23 at 12:07

You can use the emoji package:

import emoji

df = ...
df['messages'] = df['messages'].apply(lambda s: emoji.replace_emoji(s, ''))

answered Feb 28 '23 at 12:07

Guru Stron
