Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.

868 questions
0
votes
1 answer

dummyVars producing NA values in output

I have used the dummyVars function from the caret package before to create dummy variables from character/factor columns that also contain missing values (NA), and it worked. This time, however, the output I get includes NA values. The default is that it…
0
votes
0 answers

1 hot encoding training and test data separately in R

I need to add about 100 extra columns to a data.frame based on the length of a previous data.frame. For example, I have two data.frames, Xtrain and Xtest. Xtrain has 1000 columns, but Xtest has only 900 columns. This difference is due to 1-hot encoding…
Sonu Mishra
  • 1,659
  • 4
  • 26
  • 45
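A column mismatch like the one described above is often handled by reindexing the test columns to the training columns. The question is about R; the sketch below uses Python with pandas only to illustrate the idea. The frames `train` and `test` and the column `color` are hypothetical.

```python
import pandas as pd

# Hypothetical data: the level "green" never appears in the test set.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["blue", "red"]})

# Encode each set separately, as in the question.
X_train = pd.get_dummies(train, columns=["color"], dtype=int)
X_test = pd.get_dummies(test, columns=["color"], dtype=int)

# Align the test columns to the training columns. Levels missing from
# the test set become all-zero columns; levels seen only in the test
# set would be dropped.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
```

After alignment both frames have the same columns in the same order, so a model fitted on `X_train` can score `X_test`.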
0
votes
1 answer

Caret RFE to deal with dummy variables that are levels of the same categorical variable

I have a classification problem and one of the predictors is a categorical variable X with four levels A,B,C,D that was transformed to three dummy variables A,B,C. I was trying to use Recursive Feature Elimination (RFE) in the caret package to…
ybeybe
  • 149
  • 1
  • 12
0
votes
1 answer

How to apply linear regression of sklearn for some string variable

I am going to predict the box office of a movie using logistic regression. I got some training data including the actors and directors. This is my data: Director1|Actor1|300 million Director2|Actor2|500 million I am going to encode the directors and…
0
votes
0 answers

Multiply 3-d matrix by a 2-d matrix with dummy variables

I have a 3D matrix, X, of size AxBxC and a 2D matrix, Y, of size CxD. I want to do a matrix multiply and end up with a 3D matrix, R, of size AxBxD: A = 30, B = 70, C = 300, D = 100. The 3D matrix is a dummy variable which takes the value: 1 - in…
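The excerpt does not say which language the asker uses, so the following is only a sketch of the shape arithmetic in Python with NumPy. The random 0/1 contents of `X` and the random `Y` are placeholders, not the asker's data.

```python
import numpy as np

A, B, C, D = 30, 70, 300, 100
rng = np.random.default_rng(0)

# Placeholder inputs: X is a 0/1 indicator array, Y is arbitrary.
X = rng.integers(0, 2, size=(A, B, C))
Y = rng.standard_normal((C, D))

# matmul treats X as a stack of A matrices of shape (B, C) and
# multiplies each one by Y, giving shape (A, B, D).
R = X @ Y

# The same contraction over the shared axis C, written explicitly.
R_einsum = np.einsum("abc,cd->abd", X, Y)
```

Both forms contract over the shared dimension C; `einsum` makes the index pattern explicit, which can help when the axes are not already in a convenient order.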
0
votes
2 answers

Dummy variable as slope shifter without intercept

This is my first time asking here. I have trouble generating only the slope dummy variables (without an intercept dummy). However, if I multiply the dummy variable by the independent variable as shown below, both slope dummy and intercept dummy results are…
yjkim
  • 1
  • 1
0
votes
1 answer

Estimate SE for all factor levels with zero-inflated model

I have a fairly complicated ZINB model. I have tried to replicate the basic structure of what I'm trying to do: MyDat<-cbind.data.frame(fac1 = rep(c("A","B","C","D"),10), fac2=c(rep("X",20),rep("Y",20)), offset=c(runif(20,…
CAS
  • 1
  • 3
0
votes
1 answer

Create a dummy if any of a number of conditions is met

I would like to create a dummy if an action happens in a capital city, and my dataset contains 34 countries. Also, sometimes the word is within a larger string (e.g. "Berlin, Germany, DE"). Let's say the column looks as…
Spl4t
  • 53
  • 8
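The "any of several conditions" idea in this question can be sketched with a substring check against a list of names. The excerpt does not show the asker's language or data, so this Python version uses a hypothetical `capitals` list and hypothetical `locations` values.

```python
# Hypothetical list of capitals; the question has 34 countries.
capitals = ["Berlin", "Paris", "Rome"]

# Hypothetical column values, including a capital inside a longer string.
locations = ["Berlin, Germany, DE", "Munich", "Paris", "Lyon, France"]

# 1 if any capital name occurs anywhere in the string, else 0.
is_capital = [int(any(c in loc for c in capitals)) for loc in locations]
```

A plain substring test can over-match (for example, a short capital name that appears inside another word), so a word-boundary regular expression may be safer on real data.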
0
votes
2 answers

One Hot Encoding of complex variables

I have a dataset where all my data is categorical and I would like to use one hot encoding for further analysis. Main issues I would like to resolve: Some cells contain a lot of text (an example will follow). Some numerical values need to…
Boro Dega
  • 393
  • 1
  • 3
  • 13
0
votes
1 answer

Rapidminer dummy coding mismatch

I'm trying to use a neural network by training it on trainData and then testing on testData, as anyone would do. However, the data requires dummy coding of some nominal features to numerical. When I do that, it trains the neural network but fails…
Dimebag
  • 833
  • 2
  • 9
  • 29
0
votes
3 answers

Create factor variables for year integers in r

I have a panel data set like the one below, but the actual data set has several thousand observations. I want to create 14 factors as a new column "Year_dum" for years 1984-1998 (15 years). I searched for creating dummy variables in R, but could not find a…
Doo
  • 29
  • 7
0
votes
2 answers

Iterate through and overwrite specific values in a pandas dataframe

I have a large dataframe collating a bunch of basketball data (screenshot below). Every column to the right of Opp Lineup is a dummy variable indicating if that player (indicated in the column name) is in the current lineup (the last part of the…
John
  • 83
  • 1
  • 2
  • 12
0
votes
1 answer

How to create dummy variables within a loop in python?

So I have a dataframe with a bunch of features, some of which I want to make into a dummy variable, some of which I want to leave alone, and I wanted to create a lazy/faster way to do this rather than just typing: dum_A =…
pakkunrob
  • 435
  • 1
  • 5
  • 9
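One way to avoid typing a separate line per feature, assuming pandas is in use as the excerpt suggests, is to pass the list of columns to `pd.get_dummies`. The frame `df` and the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical frame: A and B are categorical, C should be left alone.
df = pd.DataFrame({
    "A": ["x", "y", "x"],
    "B": ["u", "u", "v"],
    "C": [1.0, 2.0, 3.0],
})

# Only the listed columns are encoded; the others pass through unchanged.
to_encode = ["A", "B"]
encoded = pd.get_dummies(df, columns=to_encode, dtype=int)
```

The `columns` argument plays the role of the loop: each listed column is replaced by one indicator column per level, prefixed with the original column name.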
0
votes
3 answers

Subsetting by multiple aggregate conditions in dplyr

I was hoping someone knew of an easy/efficient way in dplyr to define an indicator variable that takes the value 1 if, on Date X, an IP address was present >50 times. The data is two columns, one of IP addresses and the other associated…
ew23
  • 3
  • 2
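The question asks about dplyr; purely as an illustration of the group-count-then-flag logic, here is a sketch in Python with pandas. The column names `ip` and `date`, the toy rows, and the small `THRESHOLD` are assumptions (the question uses 50).

```python
import pandas as pd

THRESHOLD = 2  # the question uses 50; kept small for the toy data

# Hypothetical two-column data: one IP address and one date per row.
df = pd.DataFrame({
    "ip":   ["a", "a", "a", "b", "a", "b"],
    "date": ["d1", "d1", "d1", "d1", "d2", "d2"],
})

# Count rows per (date, ip) pair and broadcast the count back to each row.
counts = df.groupby(["date", "ip"])["ip"].transform("count")

# Indicator: 1 if that IP appeared more than THRESHOLD times on that date.
df["flag"] = (counts > THRESHOLD).astype(int)
```

In dplyr terms this corresponds roughly to grouping by date and IP, counting rows per group, and then creating the indicator from that count.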
0
votes
1 answer

Recode variable to a dummy variable using weekdays

I have a variable that, starting from Monday, codes each day as 1-7. I want to change this to weekday vs. weekend, coded 0 and 1 respectively, to create a dummy variable. I know how to do one, but I can't figure out how to include 6 AND 7 in…
Greg
  • 3
  • 3
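Matching both 6 and 7 is a membership test rather than a single equality. The excerpt does not name the asker's software, so this is only a Python sketch, with a hypothetical `day_codes` list standing in for the variable.

```python
# Hypothetical day codes: 1 = Monday ... 7 = Sunday, as in the question.
day_codes = [1, 2, 3, 4, 5, 6, 7]

# Saturday (6) and Sunday (7) count as weekend.
WEEKEND_CODES = {6, 7}

# 0 for weekdays, 1 for weekend days.
is_weekend = [1 if d in WEEKEND_CODES else 0 for d in day_codes]
```

The same idea carries over to other tools as an "is in set" condition, or equivalently as `day >= 6` when the codes run from 1 to 7.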