
I have a data frame with multiple categorical variables that I need to convert into dummy variables. Gender and region (4 categories) are easy with pd.get_dummies. However, the variables after those are all yes/no. How can I make the dummy yes and no columns include the variable name? For example, the 'married' variable would become married_yes and married_no.

Here's my current code and a screenshot of the first five rows:

import pandas as pd

genderdummy = pd.get_dummies(bank_df['gender'])
regiondummy = pd.get_dummies(bank_df['region'])
marrieddummy = pd.get_dummies(bank_df['married'])
cardummy = pd.get_dummies(bank_df['car'])
savingsdummy = pd.get_dummies(bank_df['savings_acct'])
currentdummy = pd.get_dummies(bank_df['current_acct'])
mortgagedummy = pd.get_dummies(bank_df['mortgage'])
pepdummy = pd.get_dummies(bank_df['pep'])
newdata_df = pd.concat([genderdummy, regiondummy, marrieddummy, cardummy, savingsdummy, currentdummy, mortgagedummy, pepdummy], axis=1)
newdata_df.head()


Based on the suggestions, here's what I have now:

## HW Part 6:  Converting Categorical Variables and Exporting Data
genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
dummy_vars = [bank_df('married'), bank_df('car'),bank_df('savings_acct'),bank_df('current_acct'),bank_df('mortgage'),bank_df('pep')]
pd.get_dummies(bank_df[dummy_vars])
newdata_df=pd.concat([genderdummy,regiondummy,dummy_vars], axis=1)
newdata_df.head()


2 Answers


If you change your approach slightly, pandas does this automatically: call pd.get_dummies on a DataFrame rather than on a Series. For a DataFrame, each dummy column is prefixed with the name of the column it came from:

import numpy as np
import pandas as pd

# Define sample data and columns for dummy variables
df = pd.DataFrame(np.random.choice(['yes', 'no'], size=(6, 3)), columns=['gender', 'region', 'married'])
dummy_vars = ['gender', 'married']

# Create dummy variables
pd.get_dummies(df[dummy_vars])

   gender_no  gender_yes  married_no  married_yes
0          0           1           1            0
1          1           0           0            1
2          0           1           1            0
3          1           0           1            0
4          1           0           1            0
5          0           1           1            0
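The 0/1 output above comes from older pandas versions. Since pandas 2.0, get_dummies returns boolean (True/False) columns by default; passing dtype=int gives 0/1 integers again. A minimal sketch, using fixed made-up values instead of np.random so the result is reproducible:

```python
import pandas as pd

# Fixed, made-up sample data (instead of np.random) so the result is reproducible
df = pd.DataFrame({
    'gender': ['yes', 'no', 'yes'],
    'married': ['no', 'yes', 'no'],
})

# dtype=int forces 0/1 integer columns; pandas >= 2.0 would otherwise return bool
dummies = pd.get_dummies(df[['gender', 'married']], dtype=int)
print(dummies)
# columns: gender_no, gender_yes, married_no, married_yes
```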

Or you can be explicit and use the prefix parameter:

pd.get_dummies(df[dummy_vars], prefix=dummy_vars)
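prefix also accepts a dict that maps each column name to its own prefix, which is handy if you want a shorter label than the column name. A sketch with made-up data; the prefix 'savings' is an arbitrary choice for illustration:

```python
import pandas as pd

# Made-up yes/no data using two of the question's column names
df = pd.DataFrame({'married': ['yes', 'no'], 'savings_acct': ['no', 'yes']})

# Dict prefix: each column gets its own label; 'savings' is an arbitrary shorter name
dummies = pd.get_dummies(df, prefix={'married': 'married', 'savings_acct': 'savings'}, dtype=int)
print(list(dummies.columns))
# ['married_no', 'married_yes', 'savings_no', 'savings_yes']
```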

Update:

Using your variables, it should look like this:

genderdummy = pd.get_dummies(bank_df['gender'])
regiondummy = pd.get_dummies(bank_df['region'])
dummy_vars = ['married', 'car', 'savings_acct', 'current_acct', 'mortgage', 'pep']
other_dummies = pd.get_dummies(bank_df[dummy_vars])
newdata_df = pd.concat([genderdummy, regiondummy, other_dummies], axis=1)
newdata_df.head()

Notice that dummy_vars is just a list of column names in bank_df.
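If you also want gender and region prefixed, you could do everything in one call with the columns parameter, which encodes only the listed columns and names each dummy after its column. A sketch on a hypothetical stand-in for bank_df; the rows and category values below are made up, since the real data isn't shown:

```python
import pandas as pd

# Hypothetical stand-in for bank_df: made-up rows using some of the question's columns
bank_df = pd.DataFrame({
    'gender': ['female', 'male', 'female', 'male'],
    'region': ['north', 'south', 'east', 'west'],
    'married': ['no', 'yes', 'yes', 'no'],
    'car': ['no', 'yes', 'no', 'no'],
})

categorical = ['gender', 'region', 'married', 'car']

# One call encodes every listed column; dummies are named <column>_<value>
newdata_df = pd.get_dummies(bank_df, columns=categorical, dtype=int)
print(newdata_df.head())
```

Unlike the answer's code, this also prefixes the gender and region dummies (e.g. gender_female, region_north).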

busybear
  • I'm sorry, but I'm very new to Python, so I'm probably making a simple mistake. Here's what I tried on just two of these yes/no variables to see if it would work: `dummy_vars = [bank_df('married'), bank_df('car')]` followed by `pd.get_dummies(df[dummy_vars])`. As you can probably tell, bank_df is the name of my original df. The first line raises `TypeError: 'DataFrame' object is not callable`. – immaprogrammingnoob Jan 30 '19 at 05:22
  • `dummy_vars` should just be the column names as in the example I provided. So try this instead: `dummy_vars = ['married', 'car']`. – busybear Jan 30 '19 at 05:27
  • I'm sorry, but I'm still getting an error. I thought I understood your code but I guess I don't. The first line of code after you call numpy and pandas creates a data frame for use in the example, right? If so, I tried to adapt your code to the dataframe I'm using: "bank_df." That's the reason I did: dummy_vars=[bank_df('married'), bank_df('car')] – immaprogrammingnoob Jan 30 '19 at 21:34
  • Yes `df` is just an example dataframe. It's just in place of your dataframe `bank_df`. `pd.get_dummies(bank_df[dummy_vars])` should work in your case. As long as `dummy_vars` is a list of column names. What error are you getting? You should edit your post to show this new error. – busybear Jan 30 '19 at 22:04
  • @immaprogrammingnoob See update on how to change your code. FYI `bank_df('car')` is not proper syntax, hence `TypeError: 'DataFrame' object is not callable`. – busybear Jan 31 '19 at 02:30

Use the prefix parameter of pandas.get_dummies():

import pandas as pd

df = pd.DataFrame({'text': ['cat', 'dog', 'cat', 'dog']})
df = pd.get_dummies(df['text'], prefix='text')
print(df)

Output

    text_cat    text_dog
0   1           0
1   0           1
2   1           0
3   0           1
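The same idea works for several columns if you keep the question's per-Series calls: pass each column's name as its prefix and concatenate the results. A sketch with made-up data, borrowing two column names from the question:

```python
import pandas as pd

# Made-up data; column names borrowed from the question
df = pd.DataFrame({'married': ['yes', 'no', 'yes'], 'car': ['no', 'no', 'yes']})

# Encode each column separately, prefixing its dummies with the column name
parts = [pd.get_dummies(df[col], prefix=col, dtype=int) for col in ['married', 'car']]
newdata_df = pd.concat(parts, axis=1)
print(list(newdata_df.columns))
# ['married_no', 'married_yes', 'car_no', 'car_yes']
```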
Sociopath