Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
Questions tagged [dummy-variable]
868 questions
9
votes
3 answers
Creating categorical variables from mutually exclusive dummy variables
My question regards an elaboration on a previously answered question about combining multiple dummy variables into a single categorical variable.
In the question previously asked, the categorical variable was created from dummy variables that were…

roody
8
votes
1 answer
Warning message - dummy from dummies package
I am using the dummies package to generate dummy variables for categorical variables, some with more than two categories.
testdf<- data.frame(
"A" = as.factor(c(1,2,2,3,3,1)),
"B" = c('A','B','A','B','C','C'),
"C"=…

Max_IT
8
votes
2 answers
Dummy code categorical / ordinal variables in the tidyverse r
Let's say I have a tibble.
library(tidyverse)
tib <- as.tibble(list(record = c(1:10),
gender = as.factor(sample(c("M", "F"), 10, replace = TRUE)),
like_product = as.factor(sample(1:5, 10, replace =…

Jacob Nelson
7
votes
2 answers
Keep other variables when executing get_dummies in Pandas
I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies.
dum = pd.get_dummies(df)
However, this makes the ID variable disappear. And I need this ID…

Bert Carremans
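One possible way to keep the ID column, sketched under assumptions: the column names `ID` and `category` and the sample values below are hypothetical, since the question's data is not shown. Passing `columns=` restricts the encoding to the listed columns and leaves the rest in place.

```python
import pandas as pd

# Hypothetical data: "ID" and "category" are assumed names, not from the question.
df = pd.DataFrame({
    "ID": ["x1", "x2", "x3"],
    "category": ["a", "b", "a"],
})

# Only the columns listed in `columns` are encoded; ID is carried over unchanged.
# dtype=int gives 0/1 columns regardless of the pandas version's default.
dum = pd.get_dummies(df, columns=["category"], dtype=int)
print(dum)
```

Without `columns=`, `get_dummies` encodes every object-typed column, which would explain a string ID column being replaced by indicator columns.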
6
votes
1 answer
Ordinal Encoding or One-Hot-Encoding
If we are not sure about the nature of categorical features, such as whether they are nominal or ordinal, which encoding should we use: ordinal encoding or one-hot encoding?
Is there a clearly defined rule on this topic?
I see a lot of people using…

letdatado
6
votes
1 answer
How to create dummy variables using pandas with reference to one value?
test = {'ngrp' : ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
test = pd.DataFrame(test)
dummy = pd.get_dummies(test['ngrp'], drop_first = True)
This gives me:
Brooklyn Manhattan Queens Staten Island
0 0 1 …

John peter
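A minimal sketch of choosing the reference level explicitly rather than relying on `drop_first`, which drops the first level in sorted order (here `Bronx`). The choice of `Manhattan` as the reference is an assumption for illustration, because the question text is truncated before it names one.

```python
import pandas as pd

test = pd.DataFrame(
    {"ngrp": ["Manhattan", "Brooklyn", "Queens", "Staten Island", "Bronx"]}
)

# Assumed reference category; swap in whichever level should be the baseline.
reference = "Manhattan"

# Encode every level, then drop the column for the reference level.
dummy = pd.get_dummies(test["ngrp"], dtype=int).drop(columns=reference)
print(dummy)
```

Rows belonging to the reference level end up as all zeros, which is what makes it the baseline in a regression.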
6
votes
3 answers
Mutating dummy variables in dplyr
I want to create 7 dummy variables, one for each day, using dplyr.
So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in two steps: 1) create a df of dummies, 2) append it to the original df.
#Sample…

Lefkios Paikousis
6
votes
1 answer
Pandas DataFrame: How to convert binary columns into one categorical column?
Given a pandas DataFrame, how does one convert several binary columns (where 1 denotes the value exists, 0 denotes it doesn't) into a single categorical column?
Another way to think of this is how to perform the "reverse pd.get_dummies()"?
Here is…

ShanZhengYang
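One way the "reverse `pd.get_dummies()`" idea might look, assuming each row has exactly one column set to 1. The column names and values below are hypothetical; the question's own data is cut off.

```python
import pandas as pd

# Hypothetical binary columns; exactly one 1 per row is assumed.
df = pd.DataFrame({
    "red":   [1, 0, 0],
    "green": [0, 1, 0],
    "blue":  [0, 0, 1],
})

binary_cols = ["red", "green", "blue"]

# idxmax(axis=1) returns, for each row, the label of the column holding the
# largest value, which is the column containing the 1.
df["color"] = df[binary_cols].idxmax(axis=1)
print(df)
```

If a row could contain no 1 or several 1s, `idxmax` would still return a single label, so such rows would need to be checked separately.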
5
votes
1 answer
How to create a Dummy Variable in Python if Missing Values are included?
How can I create a dummy variable if missing values are included? I have the following data and I want to create a dummy variable based on several conditions. My problem is that it automatically converts my missing values to 0, but I want to keep them…

Lisa
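A minimal sketch of keeping missing values missing, with an invented series and an invented condition (`> 7`), since the question's data and conditions are not shown. A plain comparison treats NaN as False, which is the likely source of the unwanted zeros; masking with `where` restores them.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the threshold 7 is an assumption for illustration.
s = pd.Series([5.0, np.nan, 12.0, 8.0])

# (s > 7) evaluates to False for NaN, so on its own it would yield 0 there.
# where(s.notna()) keeps the result only where the input was present and
# puts NaN back everywhere else.
dummy = (s > 7).astype(float).where(s.notna())
print(dummy)
```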
5
votes
5 answers
How to create dummies based on two columns in R
Assume I have a dataframe:
Gender can take F as female or M as male
Race can take A as Asian, W as White, B as Black and H as Hispanic
| id | Gender | Race |
| --- | ----- | ---- |
| 1 | F | W |
| 2 | F | B |
| 3 | M | A |
| 4 | F …

xxx
5
votes
4 answers
Pandas Group By And Get Dummies
I want to create dummy variables per unique value. The idea is to turn the data frame into a multi-label target. How can I do it?
Data:
ID L2
A Firewall
A Security
…

Krishnang K Dalal
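A sketch of one approach to a multi-label target: encode `L2`, then collapse the rows of each `ID` with `max`. The first two rows mirror the excerpt; the third row (`B`) is invented to show a second group.

```python
import pandas as pd

# Rows for ID "A" follow the excerpt; the row for "B" is hypothetical.
df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "L2": ["Firewall", "Security", "Firewall"],
})

# One indicator column per L2 value, then one row per ID: max() yields 1 if
# the ID has that label on any of its rows, 0 otherwise.
multi = pd.get_dummies(df["L2"], dtype=int).groupby(df["ID"]).max()
print(multi)
```

Using `max` rather than `sum` keeps the result binary even when an ID repeats the same label.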
5
votes
1 answer
Dummy variables: is it necessary to standardize them?
I have the following dataset, represented as a NumPy array:
direccion_viento_pos
Out[32]:
array([['S'],
['S'],
['S'],
...,
['SO'],
['NO'],
['SO']], dtype=object)
The…

bgarcial
5
votes
1 answer
Dummy Encoding using Pyspark
I am hoping to dummy encode my categorical variables as numerical variables, as shown in the image below, using PySpark syntax.
I read in the data like this:
data = sqlContext.read.csv("data.txt", sep = ";", header = "true")
In Python I am able to…

ALK
5
votes
3 answers
Is it possible to add a third dummy variable using ifelse() in R?
I was using this code to create a new Group column based on partial strings found inside the column var for 2 groups, Sui and Swe. I had to add another group, TRD, and I've been trying to tweak the ifelse function to do this, but without success. Is this…

Adri
5
votes
2 answers
How to save mapping of data.frame-to-model.matrix and apply to new observations?
Some modeling functions, e.g. glmnet(), require (or just allow for) the data to be passed in as a predictor matrix and a response matrix (or vector) as opposed to using a formula. In these cases, it's typically the case that the predict() method,…

SamyIshak