Create dummy coded columns for a column and concatenate it to the dataset

Question

I am working with a dataset on cell phone churn rates. I am trying to dummy code a column of state abbreviations in a dataset with a shape of 3333 rows × 20 columns. I need to leave out one of the state dummy columns to serve as the "reference" column for use in modeling. What I think should happen is that a column is created for each row, and a 1 is placed in the row that corresponds to the newly created dummy column. Instead, I am getting 0s in every row except the first, which is populated with all 1s. I need the dummy variables to put a marker in the appropriate column for each row. I also think I should collapse the columns down to unique columns only (in this case, one for each state), but I am not sure whether that would defeat the point of dummy coding.

I currently have the following code:

1. Creating dummy variables for 'state' and excluding the first dummy column:

churn_dummies = pd.get_dummies(churn, columns='state', prefix='st').iloc[:,20:]

This returns a dataframe that is 3333x3332.
The first few rows and columns of the churn_dummies dataframe look like this:

   st_OH  st_NJ  st_OH  st_OK  st_AL  st_MA  st_MO  st_LA  st_WV  st_IN  st_RI
0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
2    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
3    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
4    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN

This result seems to continue through the entire gigantic dataframe that's created, and from spot checks, the rows don't seem to contain the appropriate 1's marked with their corresponding column. I've been using the following pandas doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

2. Then concatenating the columns onto the dataframe:

churn = pd.concat([churn, churn_dummies], axis=1)

Can you give some example rows of your input data? – Josh Friedlander, Mar 01 '20 at 09:17

score 0 · Answer 1 · answered Mar 01 '20 at 23:57


I figured out the issue. The columns argument expects a list of column names, so the name needs to be wrapped in square brackets, columns=['state'], for get_dummies to dummy code that column.
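For anyone landing here, a minimal sketch of that fix on a toy dataframe. The sample rows and the account_length column are made up for illustration and are not taken from the real churn data. It also uses drop_first=True to leave out the reference column, which is a swapped-in alternative to the .iloc[:,20:] slicing from the question, not what the original code did.

```python
import pandas as pd

# Toy stand-in for the churn data; only 'state' mirrors the question,
# 'account_length' and the values are invented for this example.
churn = pd.DataFrame({
    "state": ["OH", "NJ", "OH", "OK", "AL"],
    "account_length": [128, 107, 137, 84, 75],
})

# Pass the column name inside a list. drop_first=True omits the first
# category in sorted order (here 'AL'), which then acts as the reference.
churn_dummies = pd.get_dummies(
    churn[["state"]],      # encode only the 'state' column
    columns=["state"],
    prefix="st",
    drop_first=True,
    dtype=int,             # 0/1 integers rather than booleans
)

# One dummy column per remaining state, one row per original row.
churn = pd.concat([churn, churn_dummies], axis=1)
print(churn)
```

With this input, churn_dummies has one column per state other than the reference (st_NJ, st_OH, st_OK), and a row whose state is the reference (AL) is all zeros.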


tralford

