Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
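
As a rough illustration of the description above, here is a minimal pandas sketch of turning one categorical column into 0/1 indicator columns. The DataFrame, its column names (`color`, `price`) and its values are made up for the example; `drop_first=True` is one common way to keep a reference category out of the design matrix, not the only one.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Build one 0/1 indicator column per category. drop_first=True drops the
# first level in sorted order ("blue"), which then serves as the reference
# category when the columns are used in a regression.
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=int)

# Combine the numeric predictor with the indicator columns.
X = pd.concat([df[["price"]], dummies], axis=1)
print(X)
```

A row with zeros in both `color_green` and `color_red` belongs to the dropped level, `blue`.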

868 questions
2
votes
2 answers

Create dummy variables that are dependent on IDs following an ordered sequence

Here is my input: structure(list(date = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990,…
ZZ Top
  • 93
  • 5
2
votes
2 answers

Create dummy column and input value from other column

I have data containing a list of topics (topics 1-5; and 0 meaning no topic is assigned) and their value. I want to create a new column for each topic and fill the column with the value. Here's what the table looks like... reviewId topic value …
Dewani
  • 137
  • 6
2
votes
1 answer

How to apply dummy coding (pd.get_dummies()) only to categories whose share in a nominal variable is at least 40% in Python Pandas?

I have DataFrame like below: COL1 | COL2 | COL3 | ... | COLn -----|------|------|------|---- 111 | A | Y | ... | ... 222 | A | Y | ... | ... 333 | B | Z | ... | ... 444 | C | Z | ... | ... 555 | D | P | ... |…
dingaro
  • 2,156
  • 9
  • 29
2
votes
1 answer

How to add empty/dummy row with continuous datetime index in pandas?

This is my dataframe consumption hour start_time 2022-09-30 14:00:00+02:00 199.0 14.0 2022-09-30 15:00:00+02:00 173.0 15.0 2022-09-30 16:00:00+02:00 173.0 16.0 2022-09-30…
Naeem
  • 45
  • 4
2
votes
3 answers

Underscore variable with walrus operator in Python

In Python, the variable name _ (underscore) is often used for throwaway variables (variables that will never be used and hence do not need a proper name). With the walrus operator, :=, I see the need for a variable that is rather short-lived (used in…
DustByte
  • 651
  • 6
  • 16
2
votes
3 answers

Removing all binary variables from the data

I have data as follows: df <- data.frame(A=c(1,2,3), B=c(1,0,1), C=c(0.1, 0.011, 0.3), D=c(0, 0.5, 1)) A B C D 1 1 1 0.100 0.0 2 2 0 0.011 0.5 3 3 1 0.300 1.0 How can I remove all binary variables (= B) from this data.frame?
Tom
  • 2,173
  • 1
  • 17
  • 44
2
votes
1 answer

Pandas: pivot comma delimited column into multiple columns

I have the following Pandas DataFrame: import pandas as pd import numpy as np df = pd.DataFrame({'id': [1, 2, 3, 4], 'type': ['a,b,c,d', 'b,d', 'c,e', np.nan]}) I need to split the type column based on the comma delimiter and pivot the values…
Hui
  • 97
  • 7
2
votes
2 answers

How to use an existing dummy variable to create a new one that takes the value 1 for certain lead observations within a group

I have a dataset like the one below: dat <- data.frame (id = c(1,1,1,1,1,2,2,2,2,2), year = c(2015, 2016, 2017,2018, 2019, 2015, 2016, 2017, 2018, 2019), sp=c(1,0,0,0,0,0,1,0,0,0)) dat id year sp 1 1 2015 …
Teo
  • 33
  • 4
2
votes
3 answers

Separate rows to make dummy rows

Consider this dataframe: dat <- structure(list(col1 = c(1, 2, 0), col2 = c(0, 3, 2), col3 = c(1, 2, 3)), class = "data.frame", row.names = c(NA, -3L)) col1 col2 col3 1 1 0 1 2 2 3 2 3 0 2 3 How can one dummify rows?…
Maël
  • 45,206
  • 3
  • 29
  • 67
2
votes
3 answers

How to specify which column to remove in get_dummies in pandas

I have a DataFrame column with 3 values - Bart, Peg, Human. I need to one-hot encode them such that Bart and Peg stay as columns and human is represented as 0 0. Xi | Architecture 0 | Bart 1 | Bart 2 | Peg 3 | Human 4 | Human 5 | Peg .. . I…
Kiera.K
  • 317
  • 1
  • 13
2
votes
1 answer

How can I create a dummy variable based on text analysis and time sequence of events?

Coworkers Date A 2011-01-01 D 2011-01-02 B;;D 2011-01-03 E;;F 2011-01-04 D 2012-11-05 D;;G 2012-11-06 A 2012-11-09 Hello, I am trying to create a dummy variable based on text analysis (e.g., grepl). The unit of analysis is a…
Juno Oh
  • 23
  • 3
2
votes
3 answers

How do I create two new variables out of one variable, and attach dummy values to it in R?

I am completely new to any kind of coding, never mind R in particular, so my days of googling have not been very helpful. I would really appreciate any kind of help/insights! I would like to know how to get two new variables out of the original…
Bommby
  • 35
  • 4
2
votes
1 answer

Variable Importance Dummy Variables R

How can I determine variable importance (vip package in R) for categorical predictors when they have been one-hot encoded? It seems impossible for R to do this when the model is built on the dummy variables rather than the original categorical…
mapleleaf
  • 758
  • 3
  • 8
  • 14
2
votes
0 answers

One Hot Encoding: Avoiding the dummy variable trap and processing unseen data with scikit-learn

I'm building a model, pretty much similar to the well-known House Price Prediction. I got to the point that I need to encode my nominal categorical variables by using scikit-learn's OneHotEncoder. The so-called "Dummy Variable Trap" is clear to me…
Buggy
  • 43
  • 5
2
votes
2 answers

Is there a way to display the reference category in a regression output in R?

I am estimating a regression model with some factor/categorical variables and some numerical ones. Is it possible to display the reference category for each factor/categorical variable in the summary of the regression model? Ideally this would…
SKupek
  • 63
  • 6