Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.

868 questions
3
votes
2 answers

How can I create a dataframe of dummies from a dict of lists of unequal length?

I have a dictionary where each key is a row index and each value is a list of dummy values. For example: my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']} Can I create a dataframe of dummies with the above in an efficient…
Joe B
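
One possible approach to the question above, as a minimal sketch: join each list into a delimited string and let pandas expand it into indicator columns. This assumes pandas is available and that no list element contains the `|` character (the default separator of `Series.str.get_dummies`); the name `dummies` is illustrative only.

```python
import pandas as pd

my_dict = {'row1': ['a', 'b'], 'row2': ['a'], 'row3': ['b', 'c']}

# Build a Series of lists keyed by row label, join each list into one
# '|'-delimited string, then split that string into 0/1 indicator columns.
dummies = pd.Series(my_dict).str.join('|').str.get_dummies()

print(dummies)
```

Lists of unequal length need no padding here, because each row is encoded independently of the others.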
3
votes
6 answers

Dummy variable conditioned on repetitions in grouped observations

EDIT Thank you for your replies. However, I still haven't managed to work out my problem, as my dataset contains 700,000 observations, and all the approaches below either result in an error or simply continue to run for hours without finishing (I can tell…
Lucas E
3
votes
1 answer

Making a variable from detection of any multiple string pattern in one string

end_result_tbl This end_result_tbl is an example from a different voter file in ideal format. ID GEN_16 GEN_14 GEN_08 PP_16 PR_16 PR_15 PR_14 0001 1 1 1 1 0 0 0 0002 0 0 0 0 …
Jorge
3
votes
1 answer

How to create dummy variable for specific range in R?

I have a dataset with daily observations from 1990 to 2017. The columns start and end (below) show the beginning and the end of a certain political demonstration. How can I create a dummy variable that takes the value of 1 for every day the event…
Clemens
3
votes
2 answers

Dummy Variable Trap in Linear Regression

I have a dataset that contains a categorical attribute, state, which can take the values New York, California and Florida. After encoding these values as dummy variables, why do we need to drop one variable? Can someone explain to me what the dummy variable…
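
The issue raised in this question can be illustrated numerically. The sketch below uses a small made-up sample of the three states (the data and variable names are assumptions, not taken from the question) and shows that an intercept plus all three dummies gives a rank-deficient design matrix, while dropping one dummy restores full column rank.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the categorical attribute described in the question.
state = pd.Series(['New York', 'California', 'Florida',
                   'New York', 'Florida', 'California'])

# Intercept plus one dummy per category: the dummy columns sum to the
# intercept column, so the columns are linearly dependent.
full = pd.get_dummies(state, dtype=float)
X_full = np.column_stack([np.ones(len(state)), full.to_numpy()])

# Intercept plus all but one dummy: the dropped category becomes the baseline.
reduced = pd.get_dummies(state, drop_first=True, dtype=float)
X_reduced = np.column_stack([np.ones(len(state)), reduced.to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print(X_full.shape[1], rank_full)        # more columns than rank
print(X_reduced.shape[1], rank_reduced)  # columns equal rank
```

With the full set of dummies, ordinary least squares has no unique solution for the coefficients; with one category dropped, each remaining coefficient is read relative to the baseline category.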
3
votes
0 answers

Dummy Variables in Feature Importance Analysis

In my regression model, I have created dummy variables for all binary variables in my data set. When I extract the feature importances from my model (XGBoost regression model) and plot them, I have a feature importance for all dummy variables as…
Peter Lawrence
3
votes
4 answers

Conditional dummy variables in Pandas

df.head()
Player  Tourn  Score
Tom     a      65
Henry   a      72
Johno   a      69
Ingram  a      79
Ben     a      76
Harry   a      66
Nick    b      70
Ingram  b      79
Johno   b      69
I have a dataframe of player…
Tom Dry
3
votes
1 answer

How does this binary encoder function work?

I'm trying to understand the logic behind this binary encoder. It automatically takes categorical variables and dummy codes them (similar to one-hot encoding in sklearn), but reduces the number of output columns to the log2 of the length of…
3
votes
2 answers

Solving pd.get_dummies dysfunction in python

I have a={0: ['I3925'], 1: ['I3925'], 2: ['I3925'], 3: ['I2355'], 4: ['I2355'], 5: ['I2355'], 6: ['I111'], 7: ['I111'], 8: ['I111'], 9: ['I405'], 10: ['I405'], 11: ['I3878', 'I2864'], 12: ['I3878'], 13: ['I534'], 14: ['I534'], 15: ['I134',…
Ando Jurai
3
votes
3 answers

How to add seasonal dummy variables?

I would like to add seasonality dummies in my R data.table based on quarters. I have looked at multiple examples but I haven't been able to solve this issue yet. My knowledge about R is limited so I was wondering if you could get me on the right…
MRJJ17
3
votes
1 answer

Get dummy variables in Pandas where rows contain multiple variables as a list?

Consider a Pandas dataframe which has a column 'id', where each row of this column consists of a list of strings representing categories. What is an efficient way to obtain the dummy variables? Example: Input: df1 = pd.DataFrame({'id': ['0,1', '24,25',…
Shree
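
A minimal sketch of one way to handle this kind of column, assuming each cell is a comma-separated string of category labels as in the visible part of the example. The frame `df_example` and its values are hypothetical, since the question's own data is truncated.

```python
import pandas as pd

# Hypothetical data: each 'id' cell holds comma-separated category labels.
df_example = pd.DataFrame({'id': ['0,1', '24,25', '0,25']})

# Split each cell on ',' and create one 0/1 column per distinct label.
dummies = df_example['id'].str.get_dummies(sep=',')

print(dummies)
```

The resulting column labels are strings, so they are ordered as text rather than as numbers.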
3
votes
1 answer

How to create dummy variables for prediction from user input (only one record)?

I am trying to create a web application for predicting airline delays. I have trained my model offline on my computer, and am now trying to make a Flask app to make predictions based on user input. For simplicity, let's say my model has 3 categorical…
RRC
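
A common pattern for this situation, sketched under assumptions: store the dummy column names produced at training time, then reindex the single encoded record to those columns so that missing categories become 0. The column names (`carrier`, `origin`) and all values below are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical training data with two categorical features.
train = pd.DataFrame({
    'carrier': ['AA', 'DL', 'UA', 'AA'],
    'origin': ['JFK', 'LAX', 'JFK', 'ORD'],
})
train_dummies = pd.get_dummies(train, dtype=int)

# Keep the training-time column layout alongside the saved model.
train_columns = train_dummies.columns

# A single record, e.g. built from user input in the web app.
record = pd.DataFrame([{'carrier': 'DL', 'origin': 'JFK'}])

# Encode the record, then align it to the training columns;
# categories absent from the record are filled with 0.
record_dummies = pd.get_dummies(record, dtype=int).reindex(
    columns=train_columns, fill_value=0
)

print(record_dummies)
```

Note that `reindex` also drops any category that was never seen during training without raising an error, so input validation may be worth adding before this step.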
3
votes
3 answers

Dask + Pandas: Returning a sequence of conditional dummies

In Pandas, if I want to create a column of conditional dummies (say 1 if a variable is equal to a string and 0 if it is not), then my go-to is: data["ebt_dummy"] = np.where((data["paymenttypeid"]=='ebt'), 1, 0) Naively trying this in a dask…
sfortney
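
For reference, the pandas idiom quoted in the question can be written without `np.where` by casting the boolean comparison directly. The sketch below runs on pandas only, with made-up values for `paymenttypeid`; the column name `ebt_dummy_alt` is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's dataframe.
data = pd.DataFrame({'paymenttypeid': ['ebt', 'cash', 'ebt', 'credit']})

# The np.where form quoted in the question.
data['ebt_dummy'] = np.where(data['paymenttypeid'] == 'ebt', 1, 0)

# Equivalent form: compare, then cast the boolean result to integers.
data['ebt_dummy_alt'] = (data['paymenttypeid'] == 'ebt').astype(int)

print(data)
```

The second form relies only on an elementwise comparison and `astype`, so it may carry over to a Dask dataframe more directly than `np.where`; that is an assumption and is not verified here.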
3
votes
3 answers

Pandas sklearn one-hot encoding dataframe or numpy?

How can I transform a pandas data frame to sklearn one-hot-encoded (dataframe / numpy array) where some columns do not require encoding? mydf = pd.DataFrame({'Target':[0,1,0,0,1, 1,1], 'GroupFoo':[1,1,2,2,3,1,2], …
Georg Heiler
3
votes
3 answers

Simple way of creating dummy variable in R

I want to know how dummy variables can be created simply. I found many similar questions on dummies, but they are either based on external packages or too technical. I have data like this: df <- data.frame(X=rnorm(10,0,1),…
Neeraj