Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
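As a minimal sketch of what this description means, in plain Python with made-up category values: each level of a categorical variable becomes its own 0/1 column. The helper `make_dummies` is hypothetical and only for illustration; a library routine such as `pandas.get_dummies` would normally be used instead.

```python
def make_dummies(values, prefix):
    """Return one 0/1 indicator column per distinct category level.

    Hypothetical helper for illustration only, not a library function.
    """
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Made-up example data: one categorical variable with three levels.
states = ["New York", "California", "Florida", "California"]
dummies = make_dummies(states, prefix="state")

for name, column in dummies.items():
    print(name, column)
```

Each row has exactly one 1 across the generated columns, which is what lets a regression treat the category as a set of numeric predictors.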
Questions tagged [dummy-variable]
868 questions
3 votes, 2 answers
How can I create a dataframe of dummies from a dict of lists of unequal length?
I have a dictionary where each key is a row index and each value is a list of dummy values. For example:
my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']}
Can I create a dataframe of dummies with the above in an efficient…

Joe B
3 votes, 6 answers
Dummy variable conditioned on repetitions in grouped observations
EDIT
Thank you for your replies. However, I still haven't managed to work out my problem, as my dataset contains 700,000 observations, and all the approaches below either result in an error or simply continue to run for hours without finishing (I can tell…

Lucas E
3 votes, 1 answer
Making a variable from detection of any multiple string pattern in one string
end_result_tbl
This end_result_tbl is an example from a different voter file in ideal format.
ID GEN_16 GEN_14 GEN_08 PP_16 PR_16 PR_15 PR_14
0001 1 1 1 1 0 0 0
0002 0 0 0 0 …

Jorge
3 votes, 1 answer
How to create dummy variable for specific range in R?
I have a dataset with daily observations from 1990 to 2017. The columns start and end (below) show the beginning and the end of a certain political demonstration. How can I create a dummy variable that takes the value of 1 for every day the event…

Clemens
3 votes, 2 answers
Dummy Variable Trap in Linear Regression
I have a dataset that contains a categorical attribute, state, which can take the values New York, California and Florida.
After encoding these values as dummy variables, why do we need to drop one variable?
Can someone explain to me what the dummy variable…
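A minimal sketch of the situation this question describes, in plain Python with made-up rows: when a model includes an intercept, the three state dummies always add up to the intercept column, so one of them carries no extra information. All variable names here are illustrative assumptions, not taken from the question.

```python
# Made-up observations of the categorical attribute from the question.
states = ["New York", "California", "Florida", "California", "New York"]
levels = ["New York", "California", "Florida"]

# Full dummy encoding: one 0/1 column per level.
full = {lvl: [int(s == lvl) for s in states] for lvl in levels}

# The intercept column of a regression design matrix is all ones.
intercept = [1] * len(states)

# Row by row, the dummies add up to exactly the intercept column, so the
# columns are perfectly collinear and the coefficients are not identified.
row_sums = [sum(col[i] for col in full.values()) for i in range(len(states))]
print(row_sums == intercept)

# Dropping one level (here the first, which becomes the baseline) removes
# this redundancy: the remaining dummies no longer sum to the intercept.
reduced = {lvl: col for lvl, col in full.items() if lvl != levels[0]}
reduced_sums = [sum(col[i] for col in reduced.values()) for i in range(len(states))]
print(reduced_sums == intercept)
```

In pandas, `pd.get_dummies(..., drop_first=True)` drops the first level in the same spirit; the dropped level then acts as the reference category against which the other coefficients are read.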

Chirag Jain
3 votes, 0 answers
Dummy Variables in Feature Importance Analysis
In my regression model, I have created dummy variables for all binary variables in my data set. When I extract the feature importances from my model (XGBoost regression model) and plot them, I have a feature importance for all dummy variables as…

Peter Lawrence
3 votes, 4 answers
Conditional dummy variables in Pandas
df.head()
Player Tourn Score
Tom a 65
Henry a 72
Johno a 69
Ingram a 79
Ben a 76
Harry a 66
Nick b 70
Ingram b 79
Johno b 69
I have a dataframe of player…

Tom Dry
3 votes, 1 answer
How does this binary encoder function work?
I'm trying to understand the logic behind this binary encoder.
It automatically takes categorical variables and dummy codes them (similar to one-hot encoding in sklearn), but reduces the number of output columns to the log2 of the length of…

Negative Correlation
3 votes, 2 answers
Solving pd.get_dummies dysfunction in python
I have
a={0: ['I3925'], 1: ['I3925'], 2: ['I3925'], 3: ['I2355'], 4: ['I2355'], 5: ['I2355'], 6: ['I111'], 7: ['I111'], 8: ['I111'], 9: ['I405'], 10: ['I405'], 11: ['I3878', 'I2864'], 12: ['I3878'], 13: ['I534'], 14: ['I534'], 15: ['I134',…

Ando Jurai
3 votes, 3 answers
How to add seasonal dummy variables?
I would like to add seasonality dummies in my R data.table based on quarters. I have looked at multiple examples but I haven't been able to solve this issue yet. My knowledge about R is limited so I was wondering if you could get me on the right…

MRJJ17
3 votes, 1 answer
Get dummy variables in Pandas where rows contain multiple variables as a list?
Consider a Pandas dataframe with a column 'id' whose rows each consist of a list of strings representing categories. What is an efficient way to obtain the dummy variables?
Example:
Input:
df1 = pd.DataFrame({'id': ['0,1', '24,25',…

Shree
3 votes, 1 answer
How to create dummy variables for prediction from user input (only one record)?
I am trying to create a web application for predicting airline delays. I have trained my model offline on my computer, and now I am trying to make a Flask app to make predictions based on user input. For simplicity, let's say my model has 3 categorical…

RRC
3 votes, 3 answers
Dask + Pandas: Returning a sequence of conditional dummies
In Pandas, if I want to create a column of conditional dummies (say 1 if a variable is equal to a string and 0 if it is not), my go-to is:
data["ebt_dummy"] = np.where((data["paymenttypeid"]=='ebt'), 1, 0)
Naively trying this in a dask…

sfortney
3 votes, 3 answers
Pandas sklearn one-hot encoding dataframe or numpy?
How can I transform a pandas data frame to sklearn one-hot-encoded (dataframe / numpy array) where some columns do not require encoding?
mydf = pd.DataFrame({'Target':[0,1,0,0,1, 1,1],
'GroupFoo':[1,1,2,2,3,1,2],
…

Georg Heiler
3 votes, 3 answers
Simple way of creating dummy variable in R
I want to know how dummy variables can be created simply. I found many similar questions about dummies, but they are either based on external packages or too technical.
I have data like this:
df <- data.frame(X=rnorm(10,0,1),…

Neeraj