
This is a follow-up to this question:

Removing Non ASCII characters and replacing with spaces from Pandas data frame

That question shows how to replace non-ASCII characters in a Pandas column with spaces:

 df['DB_user'] = df["DB_user"].apply(lambda x: ''.join([" " if ord(i) < 32 or ord(i) > 126 else i for i in x]))

From the Wikipedia article on UTF-8, my reading is that UTF-8 is

the first 128 characters of Unicode

https://en.wikipedia.org/wiki/UTF-8

So my guess is that the solution for removing non-UTF-8 characters would be:

 df['DB_user'] = df["DB_user"].apply(lambda x: ''.join([" " if ord(i) > 127 else i for i in x]))
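
As a quick check of what this guess actually does, here is a minimal sketch on a few made-up strings (the sample values and the helper name `replace_above_127` are hypothetical, not from my real data). The helper is the same expression as the lambda above, so it could be passed to `.apply` unchanged. It keeps every code point up to 127, control characters included, and replaces everything else with a space:

```python
# Hypothetical sample values standing in for the DB_user column.
samples = ["alice", "jos\u00e9", "tab\there", "\u65e5\u672c"]

def replace_above_127(text):
    # Replace every character whose code point is above 127 with a space.
    return "".join(" " if ord(ch) > 127 else ch for ch in text)

cleaned = [replace_above_127(s) for s in samples]
print(cleaned)
```

Accented and CJK characters are replaced even though both can be encoded in UTF-8, so this appears to filter down to ASCII rather than to UTF-8.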
SantoshGupta7
  • Try using the str methods instead: `df["DB_user"].str.encode('utf-8', 'ignore').str.decode('utf-8')`. – cs95 Jun 24 '19 at 22:39
  • Also, you're probably misunderstanding what the Wikipedia article says. It mentions the first 128 characters of Unicode to make the point that "valid ASCII text is valid UTF-8-encoded Unicode as well." UTF-8 supports far more than 128 characters. – cs95 Jun 24 '19 at 22:42
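
To illustrate the point made in the comments, here is a minimal sketch in plain Python with made-up sample strings (the helper names `utf8_round_trip` and `ascii_only` are hypothetical). It applies the same encode/decode round trip per string that the pandas `.str.encode(...).str.decode(...)` chain would apply per element, and contrasts it with an ASCII round trip:

```python
# Hypothetical sample strings, not from the original question.
samples = ["alice", "jos\u00e9", "\u65e5\u672c"]

def utf8_round_trip(text):
    # Encode to UTF-8 and decode back; errors="ignore" drops anything unencodable.
    return text.encode("utf-8", "ignore").decode("utf-8")

def ascii_only(text):
    # Encode to ASCII, dropping every character outside the ASCII range.
    return text.encode("ascii", "ignore").decode("ascii")

utf8_result = [utf8_round_trip(s) for s in samples]
ascii_result = [ascii_only(s) for s in samples]

print(utf8_result == samples)  # True: these strings survive the UTF-8 round trip
print(ascii_result)
```

The UTF-8 round trip leaves ordinary text untouched because UTF-8 can encode every Unicode character; only the ASCII variant removes the accented and CJK characters.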
