Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
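To make the tag description concrete, here is a minimal, package-independent sketch in plain Python. The helper `make_dummies` and its `drop_first` flag are hypothetical names chosen for illustration. A categorical variable with k levels becomes one 0/1 column per level. In a regression with an intercept, one level is usually dropped and serves as the reference category, because keeping all k columns makes them sum to the intercept column (the so-called dummy variable trap).

```python
def make_dummies(values, drop_first=True):
    """Encode a list of category labels as 0/1 indicator columns.

    Hypothetical helper for illustration. Levels are sorted; with
    drop_first=True the first level is omitted and acts as the reference.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    # One entry per level: column name -> list of 0/1 indicators, one per row
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


sex = ["M", "F", "F", "M"]
print(make_dummies(sex))                    # {'M': [1, 0, 0, 1]}
print(make_dummies(sex, drop_first=False))  # {'F': [0, 1, 1, 0], 'M': [1, 0, 0, 1]}
```

Here "F" sorts first and becomes the reference level, so the single remaining column `M` carries all the information needed alongside an intercept.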
Questions tagged [dummy-variable]
868 questions
5 votes, 7 answers
Dummify character column and find unique values
I have a dataframe with the following structure:
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))
Now I want to create a dataframe from this which contains a named column for each of the unique values in the test dataframe. A unique…

Michael
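The question above is about R, but the underlying steps are language-neutral. As a minimal plain-Python sketch using the two strings from the question: split each string on `;`, trim whitespace, drop the empty piece left by the trailing separator, collect the unique values, and emit one 0/1 column per value. The function name `dummify_delimited` is hypothetical, and sorting the column names alphabetically is an arbitrary choice made here.

```python
def dummify_delimited(rows, sep=";"):
    """Turn delimited strings into one 0/1 column per unique token."""
    # Split, strip whitespace, and drop empty tokens (e.g. after a trailing ';')
    tokens = [{t.strip() for t in row.split(sep) if t.strip()} for row in rows]
    # Union of all tokens across rows gives the set of output columns
    columns = sorted(set().union(*tokens))
    return {col: [int(col in toks) for toks in tokens] for col in columns}


col = ["a; ff; cc; rr;", "rr; a; cc; e;"]
print(dummify_delimited(col))
# {'a': [1, 1], 'cc': [1, 1], 'e': [0, 1], 'ff': [1, 0], 'rr': [1, 1]}
```

Storing each row's tokens as a set makes the membership test per column cheap and also ignores accidental duplicate tokens within a row.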
5 votes, 2 answers
Speed up this loop to create dummy columns with data.table and set in R
I have a data table and I want to create a new column for each unique day, and then assign a 1 in each row where the day matches the column name.
I have done this using a for loop, but I was wondering if there was any way to optimise it using…

MidnightDataGeek
4 votes, 1 answer
How to include factors in a regression model using package "caret" in R?
I am trying to build different regression models using the R package caret. The data include both numeric values and factors.
Question 1: What is the correct way to include both numerical values and factors in a regression model in…

Yang Yang
4 votes, 5 answers
Split variable into multiple factor variables
I have a dataset similar to this:
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
I'm looking for guidance on how to split variable x into multiple categorical variables with range 0-1.
In the end it…

lewkaj
4 votes, 5 answers
Turn column containing list into dummies
I have a dataframe with a list of (space-separated) years that I would like to turn into dummies for each year.
Consider the following toy data:
raw <- data.frame(textcol = c("case1", "case2", "case3"), years=c('1996 1997 1998','1997 1999 2000',…

Ivo
4 votes, 2 answers
Formula with interaction terms in event-study designs using R
I am estimating what's often called the "event-study" specification of a difference-in-differences model in R. Basically, we observe treated and control units over time and estimate a two-way fixed effects model with parameters for the "effect" of…

ChrisP
4 votes, 2 answers
Does dummyVars predict really return a data frame?
The predict method for dummyVars from the caret library has documentation that clearly states:
"The predict function produces a data frame."
However, every example that I've produced appears to be only a matrix. The following code is an example of…

J. Mini
4 votes, 3 answers
How to create dummies from list with multiple values and predefined categories?
I'd like to transform this:
In [4]: df
Out[4]:
    label
0  (a, e)
1  (a, d)
2    (b,)
3  (d, e)
to this:
   a  b  c  d  e
0  1  0  0  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
As you can see, there are predefined…

Bilal Alauddin
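The key point in this question is that the output columns come from a fixed list of categories rather than from the values observed, so a category that never occurs (here `c`) still gets an all-zero column. A minimal plain-Python sketch, using the data and target layout shown in the question; `dummies_with_categories` is a hypothetical name:

```python
CATEGORIES = ["a", "b", "c", "d", "e"]  # the predefined categories from the question


def dummies_with_categories(labels, categories):
    """One 0/1 column per predefined category, even if a category never occurs."""
    # Convert each label tuple to a set once, then test membership per category
    return [[int(cat in s) for cat in categories] for s in map(set, labels)]


labels = [("a", "e"), ("a", "d"), ("b",), ("d", "e")]
for i, row in enumerate(dummies_with_categories(labels, CATEGORIES)):
    print(i, *row)
# 0 1 0 0 0 1
# 1 1 0 0 1 0
# 2 0 1 0 0 0
# 3 0 0 0 1 1
```

In a pandas/scikit-learn setting, `sklearn.preprocessing.MultiLabelBinarizer(classes=...)` provides the same fixed-column behavior.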
4 votes, 2 answers
Create dummy variable of multiple columns with python
I am working with a dataframe containing two columns with ID numbers. For further research I want to create a set of dummy variables from these ID numbers (using both ID columns). My code, however, does not merge the columns from the two dataframes.…

Tox
4 votes, 1 answer
dummy_cols Error: vector memory exhausted (limit reached?)
I am attempting to create dummy variables based on a factor variable with more than 200 factor levels. The data has more than 15 million observations. I am using the "dummy_cols" function from the "fastDummies" package to convert the factor…

Abe
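As a rough worked estimate (taking exactly 15 million rows and 200 levels, the lower bounds given in the question), a dense indicator matrix has 15,000,000 × 200 = 3 × 10^9 cells: about 12 GB at 4 bytes per integer cell, or 24 GB at 8 bytes per double, before counting any intermediate copies. Because each observation has at most one nonzero indicator, a sparse coordinate (row, column) representation grows with the number of rows rather than with rows × levels. The Python sketch below illustrates both points; `dense_bytes` and `sparse_indicators` are hypothetical helpers, not part of fastDummies. In R, `Matrix::sparse.model.matrix` is one existing route to a sparse design matrix.

```python
def dense_bytes(n_rows, n_levels, bytes_per_cell):
    """Memory needed for a dense n_rows x n_levels indicator matrix."""
    return n_rows * n_levels * bytes_per_cell


def sparse_indicators(values):
    """Coordinate-format (row, col) pairs: one nonzero per observation."""
    levels = sorted(set(values))
    col_of = {lvl: j for j, lvl in enumerate(levels)}  # level -> column index
    return levels, [(i, col_of[v]) for i, v in enumerate(values)]


print(dense_bytes(15_000_000, 200, 8) / 1e9)  # 24.0 (GB, 8-byte doubles)
levels, coords = sparse_indicators(["x", "z", "x", "y"])
print(levels, coords)  # ['x', 'y', 'z'] [(0, 0), (1, 2), (2, 0), (3, 1)]
```

The sparse form stores one pair per row, so its size does not depend on how many levels the factor has.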
4 votes, 3 answers
Reconstruct a categorical variable from dummies in R
Heyho,
I am a beginner in R and have a problem for which I haven't found a solution so far. I would like to transform dummy variables back to categorical variables.
|dummy1| dummy2|dummy3|
|------| ------|------|
| 0 | 1 |0 |
| 1 | 0 …

waterline
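Reversing one-hot encoding amounts to picking, for each row, the name of the column that holds the 1. A common base-R idiom is `names(df)[max.col(df)]`, which assumes exactly one 1 per row. The plain-Python sketch below makes the edge cases explicit: a row with no 1 maps to `None`, and a row with several 1s raises an error. The helper `undummify` is hypothetical; the first two rows follow the question's table, and the third row is made up because the table is truncated.

```python
def undummify(columns):
    """Recover one category label per row from 0/1 indicator columns.

    `columns` maps column name -> list of 0/1 values. Rows with no 1 get None;
    rows with more than one 1 raise ValueError, since they are ambiguous.
    """
    names = list(columns)
    n_rows = len(next(iter(columns.values()), []))
    labels = []
    for i in range(n_rows):
        hits = [name for name in names if columns[name][i] == 1]
        if len(hits) > 1:
            raise ValueError(f"row {i} has several indicators set: {hits}")
        labels.append(hits[0] if hits else None)
    return labels


# Third row is hypothetical (the question's table is cut off)
dummies = {"dummy1": [0, 1, 0], "dummy2": [1, 0, 0], "dummy3": [0, 0, 1]}
print(undummify(dummies))  # ['dummy2', 'dummy1', 'dummy3']
```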
4 votes, 2 answers
Regression of dummy variables in R
I am new to R and I am trying to perform a regression on my dataset, which includes, e.g., monthly sales data of a company in different countries over multiple years.
In other statistical programs, in order to control for quarterly cyclical movement of…

Trgovec
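Controlling for quarterly seasonality in monthly data usually means mapping each month to its quarter and adding indicators for three of the four quarters, with the omitted quarter as the reference. In R, putting `factor(quarter)` in an `lm()` formula does this automatically, since unordered factors use treatment contrasts by default. As a plain-Python sketch of what those columns contain (the function name `quarter_dummies` and the choice of Q1 as reference are assumptions for illustration):

```python
def quarter_dummies(months):
    """Q2-Q4 indicators from calendar months 1-12; Q1 is the reference level."""
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    quarters = [(m - 1) // 3 + 1 for m in months]
    return {f"Q{q}": [int(x == q) for x in quarters] for q in (2, 3, 4)}


months = [1, 4, 7, 10, 12]
print(quarter_dummies(months))
# {'Q2': [0, 1, 0, 0, 0], 'Q3': [0, 0, 1, 0, 0], 'Q4': [0, 0, 0, 1, 1]}
```

A Q1 month shows up as all zeros, and each quarter coefficient is then read as the difference from Q1.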
4 votes, 1 answer
Dask get_dummies Does Not Transform Variable(s)
I'm trying to use get_dummies via dask but it does not transform my variable, nor does it error out:
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> df_d = dd.read_csv('/datasets/dask_example/dask_get_dummies_example.csv')
>>>…

Frank B.
4 votes, 2 answers
Dealing with ties using rank (R)
I'm trying to create a dummy variable for whether a child is first-born, and one for whether the child is second-born. My data looks something like this:
ID MID CMOB CYRB
1 1 1 1991
2 1 7 1989
3 2 1 1985
4 …

Milhouse
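One way to handle ties (for example twins) is a dense rank of birth date within each mother. Children born in the same month and year share a rank, and the next child gets the next integer, so a child born after twins still counts as second-born. With "min" ranking (R's `rank(..., ties.method = "min")`), that child would get rank 3 instead. `data.table::frank` offers `ties.method = "dense"`. The plain-Python sketch below assumes MID is the mother's ID, CMOB the birth month, and CYRB the birth year; the function name is hypothetical, and the rows with IDs 4 and 5 are made up because the question's table is truncated.

```python
from collections import defaultdict


def birth_order_dummies(rows):
    """Add first_born / second_born dummies using dense ranks within mother.

    rows: list of dicts with keys ID, MID, CMOB (birth month), CYRB (birth year).
    Children born in the same month and year (e.g. twins) share a rank.
    """
    by_mother = defaultdict(list)
    for r in rows:
        by_mother[r["MID"]].append(r)
    out = []
    for kids in by_mother.values():
        # Distinct birth dates, earliest first: position + 1 is the dense rank
        dates = sorted({(k["CYRB"], k["CMOB"]) for k in kids})
        rank_of = {d: i + 1 for i, d in enumerate(dates)}
        for k in kids:
            rank = rank_of[(k["CYRB"], k["CMOB"])]
            out.append({**k, "first_born": int(rank == 1), "second_born": int(rank == 2)})
    return sorted(out, key=lambda r: r["ID"])


rows = [
    {"ID": 1, "MID": 1, "CMOB": 1, "CYRB": 1991},
    {"ID": 2, "MID": 1, "CMOB": 7, "CYRB": 1989},
    {"ID": 3, "MID": 2, "CMOB": 1, "CYRB": 1985},
    {"ID": 4, "MID": 2, "CMOB": 1, "CYRB": 1985},  # hypothetical twin of ID 3
    {"ID": 5, "MID": 2, "CMOB": 3, "CYRB": 1988},  # hypothetical younger sibling
]
for r in birth_order_dummies(rows):
    print(r["ID"], r["first_born"], r["second_born"])
# 1 0 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 0 1
```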
4 votes, 2 answers
Factor levels default to 1 and 2 in R | Dummy variable
I am transitioning from Stata to R. In Stata, if I label factor levels (say, 0 and 1) as M and F, the values 0 and 1 remain as they are. Moreover, this is required for dummy-variable linear regression in most software, including Excel and…

watchtower