map multiple columns by a single dictionary in pandas

Question

I have a DataFrame with multiple columns containing 'yes' and 'no' strings. I want to convert all of them to a boolean dtype. To map one column, I would use

dict_map_yn_bool={'yes':True, 'no':False}
df['nearby_subway_station'].map(dict_map_yn_bool)

This would do the job for one column. How can I replace multiple columns with a single line of code?

Pandas equivalent to a `for` loop is `apply`: `df = df.apply(lambda x: x.map(dict_map_yn_bool))`. — jpp, Jul 15 '19 at 09:53

score 15 · Answer 1 · edited May 23 '17 at 11:47

You can use applymap:

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
print (df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no

dict_map_yn_bool={'yes':True, 'no':False}

df = df.applymap(dict_map_yn_bool.get)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False

Another solution:

for x in df:
    df[x] = df[x].map(dict_map_yn_bool)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False

Thanks to Jon Clements for a very nice idea, using replace:

df = df.replace({'yes': True, 'no': False})
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False

There are some differences if the data contain values that are not in the dict:

df = pd.DataFrame({'nearby_subway_station':['yes','no','a'], 'Station':['no','yes','no']})
print (df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no
2      no                     a

applymap creates None for boolean and string data, and NaN for numeric data:

df = df.applymap(dict_map_yn_bool.get)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                  None

map creates NaN:

for x in df:
    df[x] = df[x].map(dict_map_yn_bool)

print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                   NaN

replace doesn't create NaN or None; values that are not in the dict are left untouched:

df = df.replace(dict_map_yn_bool)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                     a
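
To see the two behaviours side by side, here is a minimal sketch that is not part of the original answer. It uses apply with Series.map rather than applymap, because DataFrame.applymap has been deprecated since pandas 2.1 in favour of DataFrame.map. The dtype=object argument is an assumption added here so the example behaves the same across pandas versions.

```python
import pandas as pd

dict_map_yn_bool = {'yes': True, 'no': False}

# dtype=object is an assumption for this sketch: it keeps the columns as
# plain object columns even if a newer pandas would infer a string dtype.
df = pd.DataFrame(
    {'nearby_subway_station': ['yes', 'no', 'a'], 'Station': ['no', 'yes', 'no']},
    dtype=object,
)

# Column-wise Series.map: values missing from the dict become NaN.
mapped = df.apply(lambda col: col.map(dict_map_yn_bool))

# DataFrame.replace: values missing from the dict are kept as they are.
replaced = df.replace(dict_map_yn_bool)

print(mapped['nearby_subway_station'].isna().tolist())  # [False, False, True]
print(replaced['nearby_subway_station'].tolist())       # [True, False, 'a']
```

Which behaviour is preferable depends on whether unexpected values should be flagged as missing or passed through unchanged.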

answered May 01 '17 at 20:12

jezrael

Why not just `df.replace({'yes': True, 'no': False})` ? – Jon Clements May 01 '17 at 20:16
@JonClements why not just add that as a perfectly good answer so we can upvote it, because it's a good one? That said, I'd say having a go-to method for using `map` across multiple columns is useful, even if you could use `replace` in this context. – piRSquared May 01 '17 at 20:17
@piRSquared well - to me it seems obvious - but I'm well aware I'm still nowhere near the level of others when it comes to pandas... so it's a question and I'm happy to have the suggestion added to the answer so we've got a good reference for anyone else. I just don't know if it has any pros/cons that I can't see etc... so wasn't willing to add as an answer is all. – Jon Clements May 01 '17 at 20:20
@JonClements Sometimes we get lost in the question, we forget all the other ways to accomplish the same thing. – piRSquared May 01 '17 at 20:21
@piRSquared indeed. We've got a good answer now anyway covering all aspects. If jezrael wanted to put in some details about the difference between applymap, map and replace (hint hint)... between us we'd have a fantastic answer for others in future. – Jon Clements May 01 '17 at 20:23
You can also supply the `value` argument to `df.replace` to make it work the same way as `.map` if required – Jon Clements May 01 '17 at 20:25
@JonClements - `value`? Do you mean `df = df.replace(dict_map_yn_bool, value=np.nan)`? But it doesn't work. Or is something missing? – jezrael May 01 '17 at 20:30
@piRSquared see - that's why I only comment :p – Jon Clements May 01 '17 at 20:34
@JonClements - no problem ;) – jezrael May 01 '17 at 20:35
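
The thread does not say which use of `value` was meant. One possible way to get map-like behaviour from replace (unmatched values become NaN) is to mask everything that is not a key of the dict first. This is a sketch under that assumption, not the approach hinted at in the comments; dtype=object is again an assumption made for consistency across pandas versions.

```python
import pandas as pd

dict_map_yn_bool = {'yes': True, 'no': False}

df = pd.DataFrame(
    {'nearby_subway_station': ['yes', 'no', 'a'], 'Station': ['no', 'yes', 'no']},
    dtype=object,
)

# True where the cell holds one of the dict keys, False otherwise.
known = df.isin(list(dict_map_yn_bool))

# where() turns the unknown cells into NaN, then replace maps the rest.
map_like = df.where(known).replace(dict_map_yn_bool)

print(map_like['nearby_subway_station'].isna().tolist())  # [False, False, True]
```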

piRSquared · Accepted Answer · 2017-05-01T20:26:21.207

score 10

You could use a stack/unstack idiom

df.stack().map(dict_map_yn_bool).unstack()

Using @jezrael's setup

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
dict_map_yn_bool={'yes':True, 'no':False}

Then

df.stack().map(dict_map_yn_bool).unstack()

  Station nearby_subway_station
0   False                  True
1    True                 False

Timing: [benchmark figures for small data and bigger data]
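
A timing comparison of this kind can be reproduced roughly as follows. This is only a sketch: the data size, the column names and the choice of candidates are assumptions made for illustration, and no particular ranking is claimed, since results depend on the pandas version and the data.

```python
import timeit

import numpy as np
import pandas as pd

dict_map_yn_bool = {'yes': True, 'no': False}

# Hypothetical benchmark data: 10,000 rows by 5 columns of 'yes'/'no'.
rng = np.random.default_rng(0)
data = rng.choice(['yes', 'no'], size=(10_000, 5))
big = pd.DataFrame(
    data,
    columns=['col_{}'.format(i) for i in range(1, 6)],
    dtype=object,  # assumption: plain object columns on every pandas version
)

candidates = {
    'stack/unstack': lambda: big.stack().map(dict_map_yn_bool).unstack(),
    'apply + map': lambda: big.apply(lambda col: col.map(dict_map_yn_bool)),
    'replace': lambda: big.replace(dict_map_yn_bool),
}

# Best of 3 repeats, 3 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(func, number=3, repeat=3)) / 3
    for name, func in candidates.items()
}

for name, seconds in timings.items():
    print('{:<15}{:.4f} s per call'.format(name, seconds))
```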

edited May 01 '17 at 20:26

answered May 01 '17 at 20:13

piRSquared

Yes, I think a down vote is not a good idea. This is a nice solution also. – jezrael May 01 '17 at 20:30
Thanks @jezrael. Also, this is a general idiom that can be applied to many scenarios in which you want to use a `pd.Series` specific method on many columns. – piRSquared May 01 '17 at 20:32
I think converting to `Series` and then back is not a good idea - reshaping is very slow. So in my opinion it is better to avoid it. – jezrael May 02 '17 at 15:29
@jezrael if the size of the dataframe is small and performance isn't the focus, then this can be a simple and intuitive solution. – piRSquared May 02 '17 at 15:30
I think `replace` is number one here, then `applymap`, and `stack/unstack` last. Btw, what about the performance of `for x in df: df[x] = df[x].map(dict_map_yn_bool)`? – jezrael May 02 '17 at 15:31
In my opinion it can be faster than the `stack/unstack` solution. – jezrael May 02 '17 at 15:32
You are correct! However, *if* performance isn't the concern then it is the preference of OP as to which style they choose. – piRSquared May 02 '17 at 15:36
With map and applymap, a value that is not found in the dict gets replaced with NaN / None. With replace, it stays as is. That's one more thing to consider. – Joe Ferndz Jan 08 '21 at 00:11

unsupervised_learner · Answer 3 · 2017-05-01T21:14:29.147

I would work with pandas.DataFrame.replace, as I think it is the simplest option and has built-in arguments to support this task. It is also a one-liner, as requested.

First case, replace all instances of 'yes' or 'no':

import pandas as pd
import numpy as np
from numpy import random

# Generating the data, 20 rows by 5 columns.
data = random.choice(['yes','no'], size=(20, 5), replace=True)
col_names = ['col_{}'.format(a) for a in range(1,6)]
df = pd.DataFrame(data, columns=col_names)

# Supplying lists of values to what they will replace. No dict needed.
df_bool = df.replace(to_replace=['yes','no'], value=[True, False])

Second case, where you only want to replace in a subset of columns, as described in the documentation for DataFrame.replace. Use a nested dictionary where the first set of keys are columns with values to replace, and values are dictionaries mapping values to their replacements:

dict_map_yn_bool={'yes':True, 'no':False}
replace_dict = {'col_1': dict_map_yn_bool,
                'col_2': dict_map_yn_bool}
df_bool = df.replace(to_replace=replace_dict)
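
As a quick check of the nested-dictionary form, the sketch below uses a small hypothetical frame (three columns instead of the random data above, with dtype=object as an added assumption) and builds the outer dictionary from a list of column names. Columns that are not listed keep their original strings.

```python
import pandas as pd

dict_map_yn_bool = {'yes': True, 'no': False}

# Small hypothetical frame; only col_1 and col_2 should be converted.
df = pd.DataFrame(
    {'col_1': ['yes', 'no'], 'col_2': ['no', 'no'], 'col_3': ['yes', 'yes']},
    dtype=object,
)

cols = ['col_1', 'col_2']
# Outer keys are column names, inner dict maps old values to new ones.
replace_dict = {col: dict_map_yn_bool for col in cols}
df_bool = df.replace(to_replace=replace_dict)

print(df_bool['col_1'].tolist())  # [True, False]
print(df_bool['col_3'].tolist())  # ['yes', 'yes']
```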
