Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.

868 questions
9 votes, 3 answers

Creating categorical variables from mutually exclusive dummy variables

My question regards an elaboration on a previously answered question about combining multiple dummy variables into a single categorical variable. In the question previously asked, the categorical variable was created from dummy variables that were…
roody
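For the basic case, where exactly one indicator is 1 in each row, a pandas sketch of collapsing the dummies back into one categorical column is below. The column names and data are hypothetical. Because the indicators are mutually exclusive, the column holding each row's maximum names that row's category.

```python
import pandas as pd

# Hypothetical data: three mutually exclusive 0/1 indicator columns
dummies = pd.DataFrame({
    "red":   [1, 0, 0, 1],
    "green": [0, 1, 0, 0],
    "blue":  [0, 0, 1, 0],
})

# Guard the assumption: exactly one indicator set per row
assert (dummies.sum(axis=1) == 1).all(), "indicators must be mutually exclusive"

# idxmax(axis=1) returns the label of the column holding each row's 1
colour = dummies.idxmax(axis=1).astype("category")
print(colour.tolist())  # ['red', 'green', 'blue', 'red']
```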
8 votes, 1 answer

Warning message from dummy() in the dummies package

I am using the dummies package to generate dummy variables for categorical variables, some with more than two categories. testdf<- data.frame( "A" = as.factor(c(1,2,2,3,3,1)), "B" = c('A','B','A','B','C','C'), "C"=…
Max_IT
8 votes, 2 answers

Dummy-code categorical/ordinal variables in the tidyverse (R)

Let's say I have a tibble. library(tidyverse) tib <- as.tibble(list(record = c(1:10), gender = as.factor(sample(c("M", "F"), 10, replace = TRUE)), like_product = as.factor(sample(1:5, 10, replace =…
Jacob Nelson
7 votes, 2 answers

Keep other variables when executing get_dummies in Pandas

I have a DataFrame with an ID variable and another categorical variable. I want to create dummy variables out of the categorical variable with get_dummies. dum = pd.get_dummies(df) However, this makes the ID variable disappear. And I need this ID…
Bert Carremans
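When columns is not given, pd.get_dummies(df) converts every object, string, or category column. If the ID is stored as strings, it gets expanded into dummies as well and no longer appears as a column. A minimal sketch, assuming a hypothetical string id and one categorical column, limits the encoding with the columns parameter so the ID is passed through untouched:

```python
import pandas as pd

# Hypothetical frame: a string ID plus one categorical column
df = pd.DataFrame({
    "id": ["a1", "a2", "a3"],
    "colour": ["red", "blue", "red"],
})

# columns= restricts encoding to the listed columns; every other
# column (here "id") is carried through unchanged and placed first.
encoded = pd.get_dummies(df, columns=["colour"], dtype=int)
print(encoded)
```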
6 votes, 1 answer

Ordinal encoding or one-hot encoding?

If we are not sure about the nature of categorical features, i.e. whether they are nominal or ordinal, which encoding should we use: ordinal encoding or one-hot encoding? Is there a clearly defined rule on this topic? I see a lot of people using…
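The sketch below is not a rule for choosing between the two encodings. It only shows what each one produces for the same hypothetical column. Ordinal encoding needs an explicit category order, which the analyst has to supply. One-hot encoding produces one 0/1 column per level and implies no order.

```python
import pandas as pd

sizes = pd.Series(["small", "large", "medium", "small"])

# Ordinal encoding: a single integer column; the order is an
# assumption we state explicitly via categories=...
order = ["small", "medium", "large"]
ordinal = pd.Categorical(sizes, categories=order, ordered=True).codes
print(ordinal.tolist())  # [0, 2, 1, 0]

# One-hot encoding: one 0/1 column per category, no order implied
one_hot = pd.get_dummies(sizes, dtype=int)
print(one_hot.columns.tolist())  # ['large', 'medium', 'small']
```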
6 votes, 1 answer

How to create dummy variables using pandas with reference to one value?

test = {'ngrp' : ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
test = pd.DataFrame(test)
dummy = pd.get_dummies(test['ngrp'], drop_first = True)
This gives me:
   Brooklyn  Manhattan  Queens  Staten Island
0  0  1 …
John peter
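drop_first=True always drops the first level, which for plain strings is the alphabetically first one (Bronx here). To choose a different reference level, one option is to build all the dummies and drop the chosen column explicitly. In this sketch, Manhattan as the reference is an illustrative assumption.

```python
import pandas as pd

test = pd.DataFrame({"ngrp": ["Manhattan", "Brooklyn", "Queens", "Staten Island", "Bronx"]})

# Build every dummy, then drop the column chosen as reference
# (Manhattan is a hypothetical choice); its rows become all zeros.
dummy = pd.get_dummies(test["ngrp"], dtype=int).drop(columns="Manhattan")
print(dummy.columns.tolist())  # ['Bronx', 'Brooklyn', 'Queens', 'Staten Island']
```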
6 votes, 3 answers

Mutating dummy variables in dplyr

I want to create 7 dummy variables, one for each day, using dplyr. So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append it to the original df. #Sample…
Lefkios Paikousis
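The question asks about dplyr. The pandas analogue below is only a sketch of the same one-step idea, with hypothetical dates. It fixes the seven weekday categories so that every dummy column exists even when a day never occurs, then joins the dummies back onto the original frame.

```python
import pandas as pd

# Hypothetical data whose rows cover only three weekdays
df = pd.DataFrame({"date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-06"])})

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# A fixed categorical dtype makes get_dummies emit all seven columns,
# including days with no observations
day = df["date"].dt.day_name().astype(pd.CategoricalDtype(days))
df = df.join(pd.get_dummies(day, dtype=int))
print(df.columns.tolist())
```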
6 votes, 1 answer

Pandas DataFrame: How to convert binary columns into one categorical column?

Given a pandas DataFrame, how does one convert several binary columns (where 1 denotes the value exists, 0 denotes it doesn't) into a single categorical column? Another way to think of this is how to perform the "reverse pd.get_dummies()"? Here is…
ShanZhengYang
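One way to reverse the encoding is to take each row's idxmax. This is a sketch with hypothetical data. Plain idxmax labels a row with no 1 at all as the first column, so the sketch masks such rows to NaN. pandas 1.5 and later also ship pd.from_dummies for this task.

```python
import pandas as pd

binary = pd.DataFrame({
    "A": [1, 0, 0, 0],
    "B": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
})

# idxmax(axis=1) gives the column label of each row's maximum;
# rows with no 1 are masked to NaN instead of defaulting to "A"
category = binary.idxmax(axis=1).where(binary.eq(1).any(axis=1))
print(category.tolist())  # ['A', 'B', 'C', nan]
```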
5 votes, 1 answer

How to create a Dummy Variable in Python if Missing Values are included?

How to create a dummy variable if missing values are included? I have the following data and I want to create a Dummy variable based on several conditions. My problem is that it automatically converts my missing values to 0, but I want to keep them…
Lisa
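The question's actual conditions are cut off, so the sketch below assumes a single hypothetical condition (income > 500). A comparison against NaN evaluates to False, which is why missing values turn into 0. Masking the result with notna() afterwards restores them as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"income": [1200.0, np.nan, 300.0, 800.0]})

# Evaluate the condition, then re-mask rows where the input was NaN
df["high_income"] = (df["income"] > 500).astype(float).where(df["income"].notna())
print(df["high_income"].tolist())  # [1.0, nan, 0.0, 1.0]
```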
5 votes, 5 answers

How to create dummies based on two columns in R

Assume I have a dataframe where Gender can be F (female) or M (male) and Race can be A (Asian), W (White), B (Black), or H (Hispanic):
| id | Gender | Race |
| --- | ----- | ---- |
| 1 | F | W |
| 2 | F | B |
| 3 | M | A |
| 4 | F …
xxx
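The excerpt is cut off before the desired output, so "dummies based on two columns" could mean either of two things. The pandas sketch below (an analogue of the R question, using the first three rows shown) covers both readings: separate dummies for each column, or one dummy per Gender-Race combination.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "Gender": ["F", "F", "M"],
    "Race": ["W", "B", "A"],
})

# Reading 1: independent dummies for each column
separate = pd.get_dummies(df, columns=["Gender", "Race"], dtype=int)

# Reading 2: one dummy per Gender/Race combination (an interaction)
combo = df["Gender"] + "_" + df["Race"]
interaction = pd.get_dummies(combo, prefix="GR", dtype=int)

print(separate.columns.tolist())
print(interaction.columns.tolist())  # ['GR_F_B', 'GR_F_W', 'GR_M_A']
```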
5 votes, 4 answers

Pandas Group By And Get Dummies

I want to get dummy variables per unique value. The idea is to turn the data frame into a multi-label target. How can I do it? Data:
ID  L2
A   Firewall
A   Security …
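A common pandas pattern for this is to one-hot encode L2 and then group by ID, taking the maximum. A label is then marked 1 for an ID if any of that ID's rows carries it. The sketch below uses the two rows shown plus a hypothetical ID "B".

```python
import pandas as pd

# Rows from the excerpt, plus a hypothetical second ID
df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "L2": ["Firewall", "Security", "Firewall"],
})

# One-hot encode L2, then collapse to one row per ID (multi-label target)
target = pd.get_dummies(df["L2"], dtype=int).groupby(df["ID"]).max()
print(target)
```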
5 votes, 1 answer

Dummy variables: is it necessary to standardize them?

I have the following dataset represented as a NumPy array:
direccion_viento_pos
Out[32]:
array([['S'], ['S'], ['S'], ..., ['SO'], ['NO'], ['SO']], dtype=object)
The…
bgarcial
5 votes, 1 answer

Dummy Encoding using Pyspark

I am hoping to dummy-encode my categorical variables as numerical variables, as shown in the image below, using PySpark syntax. I read in the data like this: data = sqlContext.read.csv("data.txt", sep = ";", header = "true") In Python I am able to…
ALK
5 votes, 3 answers

Is it possible to add a third dummy variable using ifelse() in R?

I was using this code to create a new Group column based on partial strings found inside the column var for 2 groups, Sui and Swe. I had to add another group, TRD, and I've been trying to tweak the ifelse function to do this, but without success. Is this…
Adri
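In R, a third group is usually handled by nesting ifelse() calls or by using dplyr::case_when(). The Python sketch below shows the same first-match logic with np.select as an analogue. The values in var and the "Other" fallback are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values for the column `var`
df = pd.DataFrame({"var": ["x_Sui_1", "Swe_02", "TRD-3", "other"]})

# Conditions are checked in order; the first match wins,
# like a chain of nested ifelse() calls
conditions = [
    df["var"].str.contains("Sui"),
    df["var"].str.contains("Swe"),
    df["var"].str.contains("TRD"),
]
df["Group"] = np.select(conditions, ["Sui", "Swe", "TRD"], default="Other")
print(df["Group"].tolist())  # ['Sui', 'Swe', 'TRD', 'Other']
```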
5 votes, 2 answers

How to save mapping of data.frame-to-model.matrix and apply to new observations?

Some modeling functions, e.g. glmnet(), require (or just allow for) the data to be passed in as a predictor matrix and a response matrix (or vector) as opposed to using a formula. In these cases, it's typically the case that the predict() method,…
SamyIshak
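The question is about R's model.matrix(). The pandas sketch below is an analogue rather than an R answer. It records the dummy columns produced for hypothetical training data and then forces new observations into exactly that layout. Dummy columns missing from the new data are filled with 0, and levels never seen in training (here "purple") are dropped.

```python
import pandas as pd

train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "colour": ["red", "blue", "green"]})
new = pd.DataFrame({"x": [4.0, 5.0], "colour": ["blue", "purple"]})

# Encode the training data once and save the resulting column layout
X_train = pd.get_dummies(train, columns=["colour"], dtype=int)
train_columns = X_train.columns

# Encode new data the same way, then align it to the saved layout
X_new = pd.get_dummies(new, columns=["colour"], dtype=int)
X_new = X_new.reindex(columns=train_columns, fill_value=0)
print(X_new)
```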