How would I change/remove 'non-printable' characters, e.g. Â, from df.columns values, incorporating the regex statements already in place?

Question

I have tried this with no success. Note: this applies to the column headings only, not the column values.

df.columns = [x.lower().replace(" ","").replace("?","").replace("_","").replace( "Â" , "") for x in df.columns]

I expected this to remove the non-printable character, but it did not.

Can anyone help?

Usually this means that you have [mojibake](https://en.wikipedia.org/wiki/Mojibake) or other corruption in your input, or are reading it incorrectly. A much better fix is to repair the upstream source so that the root cause gets addressed. — tripleee, Feb 15 '23 at 14:07
Consider this [answer](https://stackoverflow.com/a/32201665/3155240), which uses regex to replace text with regex. — Shmack, Feb 15 '23 at 19:21

Pawel Kam · Answer 1 · 2023-02-15T19:18:15.250


First of all, please remember that replace is case sensitive. Also, when chaining functions, the order is important.

"Â".lower().replace("Â", "") # "â"
"Â".replace("Â", "").lower() # ""

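Applied to the list comprehension from the question, the ordering point could look like the sketch below. The helper name `clean_column_name` and the sample headers are made up for illustration. With a real DataFrame the result would be assigned back via `df.columns = [clean_column_name(c) for c in df.columns]`.

```python
def clean_column_name(name: str) -> str:
    # Strip unwanted characters first, while "Â" is still uppercase,
    # and lowercase only at the very end.
    for unwanted in ("\u00c2", " ", "?", "_"):  # "\u00c2" is "Â"
        name = name.replace(unwanted, "")
    return name.lower()

# Hypothetical headers standing in for df.columns
columns = ["\u00c2 Unit_Price?", "Order ID"]
cleaned = [clean_column_name(c) for c in columns]
print(cleaned)
```

Calling `lower()` last means every `replace` sees the text exactly as it was read in, so the uppercase "Â" is still there to match.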
If the cause is a mojibake encoding/decoding issue, you can try this quick fix with the ftfy library. You can use it together with the rename function.

import ftfy

def _change_column_name(val):
    # fix mojibake
    val = ftfy.fix_text(val)
    # whatever data processing you need
    return val.replace("Â", "").lower()

df.rename(columns=_change_column_name, inplace=True)
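
If adding ftfy is not an option, a standard-library regex is a blunter alternative. This sketch assumes the headers should contain printable ASCII only, so it would also drop legitimate accented letters. The names `NON_PRINTABLE_ASCII` and `strip_non_printable` and the sample headers are illustrative.

```python
import re

# Matches anything outside the printable ASCII range (space through tilde).
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")

def strip_non_printable(name: str) -> str:
    # Removes "Â", non-breaking spaces, tabs and other control characters.
    return NON_PRINTABLE_ASCII.sub("", name)

# Hypothetical headers standing in for df.columns
columns = ["Price\u00c2\u00a0(USD)", "Order\tID"]
cleaned = [strip_non_printable(c).lower() for c in columns]
print(cleaned)
```

The same function can be passed to `df.rename(columns=...)` in place of `_change_column_name`.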

@tripleee is right, though. Instead of a quick fix, you may want to fix the encoding/decoding errors in your source data.
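
To illustrate what such an encoding error can look like, the sketch below shows one common way a stray "Â" appears: a non-breaking space stored as UTF-8 but decoded as Latin-1. Whether this is what happened to the data in the question is an assumption, and the header text is invented.

```python
# Hypothetical header containing a non-breaking space (U+00A0)
original = "Price\u00a0(USD)"

raw = original.encode("utf-8")     # U+00A0 becomes the bytes C2 A0
garbled = raw.decode("latin-1")    # wrong codec: C2 -> "Â", A0 -> U+00A0

# Removing only "Â" leaves the invisible non-breaking space behind.
still_hidden = garbled.replace("\u00c2", "")

# Reversing the wrong decode recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled), repr(still_hidden), repr(repaired))
```

If the file really is UTF-8, reading it with the matching codec (for example `pd.read_csv(path, encoding="utf-8")`) avoids the problem at the source. In this scenario, deleting only "Â" would leave a hidden character in the header.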

edited Feb 15 '23 at 19:18

answered Feb 14 '23 at 18:11

Pawel Kam


Unfortunately, the hidden character is still present when I export to CSV. What could be the reason for this? – Peter R Feb 15 '23 at 13:47
@PeterR, I updated my answer. Does it solve your problem? – Pawel Kam Feb 15 '23 at 23:49
