How to replace similarly written values that refer to the same thing (to give them the same value)?

Question

I have a column Cities inside a pandas DataFrame that contains many values written similarly but not identically.

For example: "Example City", " Example City" and "Example City ".

This bothers me because when I look for the unique values inside the column, it treats these cities as different.

Please provide a clear and complete description of the operation you're trying to perform. — AMC, Mar 22 '20 at 18:44
Does this answer your question? [Pandas - Strip white space](https://stackoverflow.com/questions/43332057/pandas-strip-white-space) — AMC, Mar 22 '20 at 18:47

score 1 · Accepted Answer · answered Mar 22 '20 at 16:21

If the problem is only whitespace at the start or end of the strings, you can use `str.strip()`. If some values also contain runs of multiple spaces (e.g. "Example  City" instead of "Example City"), you can collapse them with `str.replace()` and a regular expression:

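# Remove leading and trailing whitespace
df['Cities'] = df['Cities'].str.strip()
# Collapse runs of whitespace into one space (regex=True is required in pandas 2.0+)
df['Cities'] = df['Cities'].str.replace(r'\s\s+', ' ', regex=True)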
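To check that the cleanup merges the variants, here is a minimal self-contained sketch. The sample values are hypothetical (the question does not show the real data), and the pattern `\s+` is used instead of `\s\s+` so that a single tab or newline is also normalized to a space; that is an assumption about what is wanted, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the real Cities column.
df = pd.DataFrame(
    {"Cities": ["Example City", " Example City", "Example City ", "Example   City"]}
)

# Number of distinct values before cleaning: each variant counts separately.
before = df["Cities"].nunique()

# Strip the ends, then collapse any internal run of whitespace to one space.
df["Cities"] = (
    df["Cities"]
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
)

# Number of distinct values after cleaning.
after = df["Cities"].nunique()
print(before, after)
```

With this sample, the four variants collapse into a single value, so `unique()` on the column reports one city instead of four.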

FBruzzesi

It worked for the spaces, which was the main source of the problem. Thanks. – mszsorondo Mar 22 '20 at 18:39
@AMC `strip()` removes white spaces both on the left and on the right, while `lstrip()` and `rstrip()` only do one side. – FBruzzesi Mar 22 '20 at 18:49
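The difference between the three methods mentioned in the last comment can be seen in a short sketch. The padded sample string is hypothetical and chosen only to make each side visible.

```python
import pandas as pd

# Hypothetical value with two spaces on each side.
s = pd.Series(["  Example City  "])

both = s.str.strip()[0]    # whitespace removed on both sides
left = s.str.lstrip()[0]   # whitespace removed on the left only
right = s.str.rstrip()[0]  # whitespace removed on the right only

print(repr(both), repr(left), repr(right))
```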
