
Say I have a dataframe which contains the locations of places.

import pandas as pd

df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'] })

Then I have these places, and the main city each one belongs to, stored in a dictionary:

dict ={
['Hackney', 'Mile End', 'Croydon', 'Wembley'] : 'London',
['Edgbaston'] : 'Birmingham'
}

Question: How could I create a new column (say df1['city']) that uses this dictionary to fill in the city each entry of the location column belongs to? Note: if a dictionary isn't the ideal way to do this, feel free to suggest an alternative.

Ideal output: something like the following (this should generalise to more entries, provided the dictionary is extended as needed):

df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'], 'city': ['London','London','London','Birmingham','London'] })

Tried: using the apply method, but it seems to give an error:

df1['city'] = df1['location'].apply(dict)
Curious

2 Answers


Your dictionary is not valid: lists are mutable (unhashable), so they cannot be dictionary keys, but you can use lists as the dictionary values. Also, don't name a dictionary dict, because that shadows the Python builtin of the same name:

d = { 'London': ['Hackney', 'Mile End', 'Croydon'],
     'Birmingham': ['Edgbaston']}
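A quick plain-Python check of why the original dictionary fails: Python rejects a list as a key before pandas is ever involved, while the same list is fine as a value. A minimal sketch; the variable names are illustrative only:

```python
# Lists are mutable, hence unhashable, so they cannot be used as dict keys.
try:
    _ = {['Hackney', 'Mile End']: 'London'}
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # reports that 'list' is an unhashable type

# Tuples are immutable (and hashable when their items are), so they can be keys,
# while lists are perfectly fine as dict values, as in d above.
tuple_keys = {('Hackney', 'Mile End'): 'London'}
list_values = {'London': ['Hackney', 'Mile End']}
print(tuple_keys[('Hackney', 'Mile End')])  # London
print(list_values['London'])  # ['Hackney', 'Mile End']
```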

You can then flatten the lists into a one-to-one location-to-city dictionary and use Series.map; any location not found in it gets a missing value (NaN):

d1 = {x: k for k, v in d.items() for x in v}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}

df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
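If a label is preferable to NaN for unmatched locations, one option is to chain fillna after map. A minimal sketch, where the placeholder 'Unknown' is an arbitrary, hypothetical choice:

```python
import pandas as pd

d = {'London': ['Hackney', 'Mile End', 'Croydon'],
     'Birmingham': ['Edgbaston']}
# Flatten to one key per location: {'Hackney': 'London', ...}
d1 = {x: k for k, v in d.items() for x in v}

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon',
                                 'Edgbaston', 'Wembley']})

# map() leaves NaN for any location missing from d1 (here 'Wembley');
# fillna() replaces those gaps with the placeholder label.
df1['city'] = df1['location'].map(d1).fillna('Unknown')
print(df1)
```

Adding 'Wembley' to the London list instead makes the result match the ideal output from the question exactly.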

If the dictionary instead uses tuples as keys:

d ={('Hackney', 'Mile End', 'Croydon') : 'London', ('Edgbaston', ) : 'Birmingham'}


d1 = {x: v for k, v in d.items() for x in k}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}

df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
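Since the question invites alternatives, the same inversion can also be done inside pandas rather than with a dict comprehension. A sketch, assuming pandas 0.25 or later (for Series.explode); the names pairs and lookup are illustrative, and 'Wembley' is added to the London list so the result matches the ideal output:

```python
import pandas as pd

d = {'London': ['Hackney', 'Mile End', 'Croydon', 'Wembley'],
     'Birmingham': ['Edgbaston']}

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon',
                                 'Edgbaston', 'Wembley']})

# One row per (city, location) pair: index = city, values = location.
pairs = pd.Series(d).explode()

# Swap them round so the locations become the index of a lookup Series.
lookup = pd.Series(pairs.index.to_numpy(), index=pairs.to_numpy())

# map() also accepts a Series and matches df1['location'] against its index.
df1['city'] = df1['location'].map(lookup)
print(df1)
```

This relies on each location appearing under only one city, so that the lookup index is unique.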
jezrael
  • I think it is "allowed" to name a dictionary `dict`, but I agree it is probably not a good idea – Mortz Apr 19 '21 at 10:00
  • Thanks, but could you explain what you mean by "because python code name, builtins"? – Curious Apr 19 '21 at 10:01
  • @nishcs - Sure: `dict` is a builtin (the dictionary type itself), so you can create a new dictionary with `d = dict()`. But if you write `dict = {'key': 'val'}`, the builtin is overwritten, which can cause weird "object is not callable" errors that are really hard to track down, so it is best never to do it. The same goes for naming a list `list`. – jezrael Apr 19 '21 at 10:03
  • @nishcs - It is the same issue as with `type`; check [this](https://stackoverflow.com/questions/10568087/is-it-safe-to-use-the-python-word-type-in-my-code) – jezrael Apr 19 '21 at 10:07
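The "not callable" problem described in the comments can be reproduced in a few lines. A minimal sketch run at module level, where assigning to dict hides the builtin until the name is deleted:

```python
# Assigning to the name `dict` hides the builtin type for the rest of the module.
dict = {('Hackney', 'Mile End'): 'London'}

try:
    dict(a=1)  # intended to build a new dictionary, but `dict` is now an instance
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'dict' object is not callable

# Deleting the module-level name makes the builtin visible again.
del dict
restored = dict(a=1)
print(restored)  # {'a': 1}
```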

You cannot use a mutable object such as a list as a Python dict key (it is unhashable), which means you need a tuple instead of a list:

city_map = {
    ('Hackney', 'Mile End', 'Croydon'): 'London',
    ('Edgbaston',): 'Birmingham',
}

Once you have this, you can use map to look up the city for each location. If the keys were individual locations rather than tuples, you could pass the dict to map directly; in this case, you can define a function instead:

def get_city(location):
    # Scan each tuple of locations and return the city of the first match.
    for key in city_map:
        if location in key:
            return city_map[key]
    # Falls through (implicitly returning None) if no tuple contains the location.

df1['location'].map(get_city)
#0        London
#1        London
#2        London
#3    Birmingham
#4          None
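A variant that passes the mapping in explicitly, instead of reading a global, and returns a chosen default for unmatched locations. It also shows how Series.apply, which the question tried, can be used here: apply forwards extra positional arguments through args. A sketch; the name city_map and the 'Unknown' default are illustrative assumptions:

```python
import pandas as pd

city_map = {('Hackney', 'Mile End', 'Croydon'): 'London',
            ('Edgbaston',): 'Birmingham'}

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon',
                                 'Edgbaston', 'Wembley']})

def get_city(location, mapping, default=None):
    """Return the city whose tuple of locations contains `location`."""
    for locations, city in mapping.items():
        if location in locations:
            return city
    return default  # no tuple contained this location

# Series.apply passes `args` to get_city after the element itself.
df1['city'] = df1['location'].apply(get_city, args=(city_map, 'Unknown'))
print(df1)
```

Note that this scans every tuple for every row; flattening the mapping once, as in the other answer, turns each lookup into a single dict access.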
Mortz