How do I replace the similar looking values in a pandas dataframe?

Question

I am new to Pandas. I have the following data types in my dataset. (The dataset is Indian Startup Funding downloaded from Kaggle.)

Date                datetime64[ns]
StartupName                 object
IndustryVertical            object
CityLocation                object
InvestorsName               object
InvestmentType              object
AmountInUSD                 object
dtype: object

data['AmountInUSD'].groupby(data['CityLocation']).describe()

I ran the operation above and found that many city names are similar, for example:

Bangalore   
Bangalore / Palo Alto
Bangalore / SFO
Bangalore / San Mateo
Bangalore / USA
Bangalore/ Bangkok

I want to do the following operation, but I do not know how to write the code for it.

In the column CityLocation, find all cells that start with 'Bang' and replace them all with 'Bangalore'. Any help will be appreciated.

I did this

data[data.CityLocation.str.startswith('Bang')]

and I do not know what to do after this.

Please show what code you have written. – aydow, Jun 25 '18 at 23:01
data[data.CityLocation.str.startswith('Bang')] – Jun 25 '18 at 23:07

John Karasinski · Answer 1 · 2018-06-25T23:15:05.590


You can use the loc indexer to select the rows whose value in the column starts with a given substring and replace those values with one of your choosing.

import pandas as pd

df = pd.DataFrame({'CityLocation': ['Bangalore', 'Dangerlore', 'Bangalore/USA'], 'Values': [1, 2, 3]})
print(df)
#     CityLocation  Values
# 0      Bangalore       1
# 1     Dangerlore       2
# 2  Bangalore/USA       3


df.loc[df.CityLocation.str.startswith('Bang'), 'CityLocation'] = 'Bangalore'
print(df)
#   CityLocation  Values
# 0    Bangalore       1
# 1   Dangerlore       2
# 2    Bangalore       3
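If the column contains missing values (an assumption about the real dataset, not something shown above), str.startswith returns NaN for those rows and the result can no longer be used as a boolean mask. A minimal sketch that passes na=False so missing entries count as non-matches; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data that includes a missing city
df = pd.DataFrame({
    'CityLocation': ['Bangalore', np.nan, 'Bangalore / SFO', 'Dangerlore'],
    'Values': [1, 2, 3, 4],
})

# na=False makes missing entries evaluate to False instead of NaN,
# so the mask is purely boolean and safe to pass to loc
mask = df['CityLocation'].str.startswith('Bang', na=False)
df.loc[mask, 'CityLocation'] = 'Bangalore'
print(df)
```

Only the CityLocation column is assigned to, so the Values column and the missing entry are left untouched.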

edited Jun 25 '18 at 23:15

answered Jun 25 '18 at 23:05

John Karasinski


Hi John, thanks for the answer :), but it is the string itself that I want to change. I do not want to change the Values corresponding to Bangalore and Dangerlore. – Jun 25 '18 at 23:10

score 1 · Accepted Answer · answered Jun 25 '18 at 23:10


pandas 0.23 has a nice way to handle text. See the docs Working with Text Data. You can use regular expressions to capture and replace text.

import pandas as pd
df = pd.DataFrame({'CityLocation': ["Bangalore / Palo Alto", "Bangalore / SFO", "Other"]})

# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
df['CityLocation'] = df['CityLocation'].str.replace(r"^Bang.*", "Bangalore", regex=True)

print(df)

This yields:

  CityLocation
0    Bangalore
1    Bangalore
2        Other
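To apply the same cleanup to other cities without writing one pattern per city, one option is to keep only the text before the first slash. This is a sketch under the assumption that the primary city always comes first and that '/' is the only separator, which may not hold for the full dataset; the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample values in the style of the question
df = pd.DataFrame({
    'CityLocation': ['Bangalore / Palo Alto', 'Bangalore/ Bangkok', 'Pune / US', 'Mumbai'],
})

# Split on the first '/', keep the left part, and trim surrounding whitespace
df['CityLocation'] = df['CityLocation'].str.split('/', n=1).str[0].str.strip()
print(df['CityLocation'].tolist())
```

Values without a slash pass through unchanged apart from the whitespace trim, so it is worth checking the resulting unique values before relying on them.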

answered Jun 25 '18 at 23:10

ascripter


Thanks for the answer. It worked. I will do the same with other cities in the data. – Jun 25 '18 at 23:14
