Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.

868 questions
5 votes, 7 answers

Dummify character column and find unique values

I have a dataframe with the following structure: test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;')). Now I want to create a dataframe from this that contains a named column for each of the unique values in the test dataframe. A unique…
Michael
5 votes, 2 answers

Speed up this loop to create dummy columns with data.table and set in R

I have a data table and I want to create a new column for each unique day, and then assign a 1 in each row where the day matches the column name. I have done this using a for loop, but I was wondering if there was any way to optimise it using…
4 votes, 1 answer

How to include factors in a regression model using package "caret" in R?

I am trying to build different regression models using the R package caret. The data include both numerical values and factors. Question 1: What is the correct way to include both numerical values and factors in a regression model in…
Yang Yang
4 votes, 5 answers

Split variable into multiple factor variables

I have a dataset similar to this: df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T)). I'm looking for guidance on how to split variable x into multiple categorical variables with values 0 and 1. In the end it…
4 votes, 5 answers

Turn column containing list into dummies

I have a dataframe with a list of (space-separated) years that I would like to turn into dummies for each year. Consider the following toy data: raw <- data.frame(textcol = c("case1", "case2", "case3"), years=c('1996 1997 1998','1997 1999 2000',…
Ivo
4 votes, 2 answers

Formula with interaction terms in event-study designs using R

I am estimating what's often called the "event-study" specification of a difference-in-differences model in R. Basically, we observe treated and control units over time and estimate a two-way fixed effects model with parameters for the "effect" of…
ChrisP
4 votes, 2 answers

Does dummyVars predict really return a data frame?

The predict method for dummyVars from the caret library has documentation that clearly states: "The predict function produces a data frame." However, every example that I've produced appears to be only a matrix. The following code is an example of…
J. Mini
4 votes, 3 answers

How to create dummies from list with multiple values and predefined categories?

I'd like to transform this:

In [4]: df
Out[4]:
    label
0  (a, e)
1  (a, d)
2    (b,)
3  (d, e)

to this:

   a  b  c  d  e
0  1  0  0  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1

As you can see there are predefined…
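A minimal sketch of the transformation described in this excerpt, written in plain Python rather than pandas so that it runs without extra packages. The helper name to_dummies and the CATEGORIES list are hypothetical names chosen for illustration; the labels are copied from the excerpt, and the category list is assumed to be a, b, c, d, e because those are the columns shown in the desired output.

```python
# Predefined categories (assumed from the desired output in the excerpt).
# "c" never occurs in the data but still gets its own column.
CATEGORIES = ["a", "b", "c", "d", "e"]

# Labels copied from the excerpt: one tuple of labels per row.
labels = [("a", "e"), ("a", "d"), ("b",), ("d", "e")]


def to_dummies(rows, categories):
    """Return one 0/1 list per row, with one position per category.

    to_dummies is a hypothetical helper, not a pandas function.
    """
    return [[1 if cat in row else 0 for cat in categories] for row in rows]


dummies = to_dummies(labels, CATEGORIES)

# Print a header line followed by one indexed row per input row.
print(" ", *CATEGORIES)
for i, row in enumerate(dummies):
    print(i, *row)
```

Fixing the column list up front, instead of deriving it from the data, is what keeps the empty c column in the result.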
4 votes, 2 answers

Create dummy variable of multiple columns with python

I am working with a dataframe containing two columns with ID numbers. For further research I want to create a kind of dummy variable from these ID numbers (with the two ID numbers). My code, however, does not merge the columns from the two dataframes.…
Tox
4 votes, 1 answer

dummy_cols Error: vector memory exhausted (limit reached?)

I am attempting to create dummy variables based on a factor variable with more than 200 factor levels. The data has more than 15 million observations. Using the "fastDummies" package, I am using the "dummy_cols" command to convert the factor…
Abe
4 votes, 3 answers

Reconstruct a categorical variable from dummies in R

Heyho, I am a beginner in R and have a problem to which I couldn't find a solution so far. I would like to transform dummy variables back to categorical variables. |dummy1| dummy2|dummy3| |------| ------|------| | 0 | 1 |0 | | 1 | 0 …
waterline
4 votes, 2 answers

Regression of dummy variables in R

I am new to R and I am trying to perform a regression on my dataset, which includes e.g. monthly sales data of a company in different countries over multiple years. In other statistical programs, in order to control for quarterly cyclical movement of…
Trgovec
4 votes, 1 answer

Dask get_dummies Does Not Transform Variable(s)

I'm trying to use get_dummies via dask but it does not transform my variable, nor does it error out:

>>> import dask.dataframe as dd
>>> import pandas as pd
>>> df_d = dd.read_csv('/datasets/dask_example/dask_get_dummies_example.csv')
>>>…
Frank B.
4 votes, 2 answers

Dealing with ties using rank (R)

I'm trying to create a dummy variable for whether a child is first born, and one for whether the child is second born. My data looks something like this:

ID  MID  CMOB  CYRB
1   1    1     1991
2   1    7     1989
3   2    1     1985
4 …
Milhouse
4 votes, 2 answers

Factor levels default to 1 and 2 in R | Dummy variable

I am transitioning from Stata to R. In Stata, if I label factor levels (say, 0 and 1) as M and F, the underlying values 0 and 1 remain as they are. Moreover, this is required for dummy-variable linear regression in most software including Excel and…
watchtower