4

I know that

test = []
for item in my_texts:
    test.append(item.encode('ascii', 'ignore').decode('ascii'))

removes emojis from a list. But how can I remove emojis from a dataframe? When I try

a = []
for item in goldtest['Text']:
    a.append(item.encode('ascii', 'ignore').decode('ascii'))

I get only the last entry of goldtest. When I try the code on the whole dataframe, I get `AttributeError: 'DataFrame' object has no attribute 'encode'`.

maybeyourneighour
  • A DataFrame is not a string. So ask yourself what it is that you are actually calling `encode` on; as your error suggests, it's a DataFrame – William Bright Aug 15 '19 at 18:22
  • And this pattern will not only remove emojis, but also all accented characters, non-Latin letters, and punctuation signs besides a few of the more common ones - effectively corrupting any text data you have. – jsbueno Aug 15 '19 at 18:45
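To make the point in the comment above concrete, here is a minimal sketch using only the standard library. The helper name `strip_non_ascii` and the sample strings are made up for illustration; the encode/decode call is the one from the question.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    # errors='ignore' silently discards anything outside the ASCII range,
    # which covers emojis but also accented and non-Latin letters.
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical samples: one emoji, one Polish sentence, one with accents.
samples = ["thumbs up 👍", "Zażółć gęślą jaźń", "naïve café"]
for s in samples:
    print(repr(s), "->", repr(strip_non_ascii(s)))
```

Running this should show the emoji disappearing, but the Polish and accented words losing letters as well, so the approach is probably only safe for text that is expected to be plain English.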

3 Answers

10

This would be the equivalent code for pandas. It operates column by column.

df.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
ivallesp
  • With my data this gives me an AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index Index'). Meaning one value in Index is a comma? – maybeyourneighour Aug 15 '19 at 18:48
  • Try now, I edited it to coerce all the values to str – ivallesp Aug 15 '19 at 18:55
  • Thanks for your vote. Next time it will be much more useful if you can add a simple reproducible example :D – ivallesp Aug 15 '19 at 19:03
  • Works well, but unfortunately it also deletes all Polish special characters like ą, ę, ń, ó, ż... Do you know how to overcome this? – AAAA Feb 08 '22 at 09:41
  • I tried this on a DataFrame with multiple columns, specifying the desired column, and it threw `AttributeError: 'str' object has no attribute 'str'`. I then tried to recreate the setup described here with a one-column DataFrame and still got the same error. The text is in German. Instead of investigating further, I used [this solution](https://stackoverflow.com/questions/65109065/python-pandas-remove-emojis-from-dataframe/65109987#65109987) from xjcl, which worked without tweaking – Simone Jul 15 '23 at 12:50
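For a single column, a sketch like the following may avoid the `'str' object has no attribute 'str'` error mentioned in the last comment. One likely cause (an assumption, since the failing code is not shown) is calling `.apply` on a Series, which passes individual strings to the lambda rather than whole columns. The sample data here is hypothetical; the column name `Text` follows the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Text' comes from the question.
goldtest = pd.DataFrame({"Text": ["hello 😀", "Zażółć", "plain"], "n": [1, 2, 3]})

# Work on the Series directly, without apply, so that `.str` is the
# Series string accessor rather than an attribute lookup on a Python str.
goldtest["Text"] = (
    goldtest["Text"]
    .astype(str)                    # coerce non-string values first
    .str.encode("ascii", "ignore")  # drop everything outside ASCII
    .str.decode("ascii")
)
print(goldtest)
```

Only the `Text` column is touched and the other columns keep their dtypes. As noted in the comments, this still strips Polish, German, and other non-ASCII letters along with the emojis.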
2

You can use the emoji package:

import emoji
import pandas as pd
df = pd.DataFrame(data={'str_data':['يااا واجعوط هذا راه باغي يبدع فالسانكيام‍♀️']})
df['str_data'] = df['str_data'].apply(lambda s: emoji.replace_emoji(s, ''))
df

Output:

str_data
يااا واجعوط هذا راه باغي يبدع فالسانكيام
Guru Stron
0

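This removes every character that is not an ASCII letter or digit from the given column, emojis included. Note that spaces, punctuation, and non-ASCII letters are removed as well.

# regex=True is needed in pandas 2.0+, where str.replace defaults to literal matching
goldtest['Text'] = goldtest['Text'].str.replace(r'[^A-Za-z0-9]', '', regex=True)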
Skynet
  • It works. Just a quick note - it will also replace space with '', thus merging text. Make sure to change '' to ' ' so you still have spaces between words – Ernest May 11 '23 at 07:34
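If the goal is to drop emojis and punctuation but keep spaces and non-ASCII letters, a variant of the pattern above might look like this. It is only a sketch: the helper name `keep_words_and_spaces` is made up, and it relies on `\w` being Unicode-aware for `str` patterns in Python 3.

```python
import re

# Matches anything that is neither a word character (\w: letters, digits,
# underscore, in any script) nor whitespace (\s).
_NOT_WORD_OR_SPACE = re.compile(r"[^\w\s]")


def keep_words_and_spaces(text: str) -> str:
    """Remove emojis and punctuation, keeping letters, digits, '_' and spaces."""
    return _NOT_WORD_OR_SPACE.sub("", text)


print(keep_words_and_spaces("Cześć, świat 😀!"))
```

On a DataFrame this could be applied with `goldtest['Text'].apply(keep_words_and_spaces)`. Two caveats: underscores survive because they count as word characters, and combining marks (used in some scripts for vowels or diacritics) are not matched by `\w`, so text that depends on them may still be damaged.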