Questions tagged [dummy-variable]

Dummy or indicator variables are used to include categorical or qualitative variables in a regression model.
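As a minimal sketch of what the tag description means in practice, the pandas snippet below turns one categorical column into 0/1 indicator columns that a regression routine could consume. The data frame, the column names `y` and `color`, and the choice to drop one level as the reference category are all illustrative assumptions, not taken from any question on this page.

```python
import pandas as pd

# Hypothetical data: a numeric outcome and one categorical predictor.
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "color": ["red", "green", "blue", "green"],
})

# One indicator column per level; drop_first=True omits the first level
# (alphabetically "blue") so it acts as the reference category and the
# indicators are not collinear with an intercept.
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=int)

# Design matrix: the outcome alongside the indicator columns.
design = pd.concat([df[["y"]], dummies], axis=1)
print(design)
```

Dropping one level is a common convention when the model has an intercept; keeping all levels is also reasonable for models that do not need a reference category.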

868 questions
4
votes
2 answers

R: Testing each level of a factor without creating new variables

Suppose I have a data frame with a binary grouping variable and a factor. An example of such a grouping variable could specify assignment to the treatment and control conditions of an experiment. In the example below, b is the grouping variable while a is…
socialscientist
4
votes
2 answers

Dummy variables in several regressions using Stargazer in R

I am trying to create a table of regressions using the Stargazer package in R. I have several regressions that differ only in the dummy variables. I want it to report the coefficient of the independent variable, the constant, etc., and to say "yes"…
ejn
3
votes
3 answers

Creating a variable depending on values of two different parameters

I have a question regarding how to craft a variable depending on two other variables. I need to create a dummy variable that will take the value of 1 if Parameter1 is either A or B (but not C) and Parameter2 has a positive value. The variable needs…
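The excerpt does not say which language is being used and is cut off before the remaining requirements, so the following is only a sketch of the stated rule in pandas: 1 when Parameter1 is A or B and Parameter2 is positive, 0 otherwise. The sample values are hypothetical; only the column names `Parameter1` and `Parameter2` come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Parameter1": ["A", "B", "C", "A"],
    "Parameter2": [2.5, -1.0, 4.0, 0.0],
})

# Both conditions must hold: Parameter1 in {A, B} and Parameter2 strictly > 0.
is_a_or_b = df["Parameter1"].isin(["A", "B"])
is_positive = df["Parameter2"] > 0

# Combine the boolean masks and cast True/False to 1/0.
df["dummy"] = (is_a_or_b & is_positive).astype(int)
print(df)
```

Note that a missing Parameter2 compares as False here and so yields 0; whether that is the desired treatment of missing values is not stated in the excerpt.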
Jens
3
votes
1 answer

Creating a conditional dummy variable column in R

I'm working with a cross-country panel dataset. One of my variables (cc_dummy) takes the values 1 and 0 (there are also missing values, indicated by NA). I want to create a new column such that if cc_dummy takes the value 1 for three consecutive…
3
votes
2 answers

Pivot long-form categorical data by group and dummy-code categorical variables

For the following dataframe, I am trying to pivot the categorical variable ('purchase_item') into wide format and dummy code them as 1/0 - based on whether or not a customer purchased it in each of the 4 quarters within 2016. I would like to…
3
votes
2 answers

pandas get_dummies() for multiple columns with a pre-defined list

I'm struggling with creating columns of dummies for my dataframe. This is my original dataframe: df = pd.DataFrame({'id': ['01', '02', '03'], 'Q1': ['a', 'b', 'a'], 'Q2': ['c', 'b', 'a']}) print(df) id Q1…
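One possible approach, sketched under assumptions: the excerpt is truncated before it shows the pre-defined list, so the list `['a', 'b', 'c']` below is hypothetical. Converting each column to a categorical with a fixed set of categories makes `pd.get_dummies` emit a column for every listed value, including values that never occur in that column.

```python
import pandas as pd

# Data frame as given in the question excerpt.
df = pd.DataFrame({'id': ['01', '02', '03'],
                   'Q1': ['a', 'b', 'a'],
                   'Q2': ['c', 'b', 'a']})

# Assumed pre-defined list of allowed values (not shown in the excerpt).
categories = ['a', 'b', 'c']
cols = ['Q1', 'Q2']

# Declare the full category set on each column so unobserved values
# (e.g. 'c' in Q1) still get their own all-zero dummy column.
typed = df.copy()
for col in cols:
    typed[col] = pd.Categorical(typed[col], categories=categories)

dummies = pd.get_dummies(typed, columns=cols, dtype=int)
print(dummies)
```

Values outside the pre-defined list would become missing in the categorical column and receive 0 in every dummy column; whether that matches the asker's intent is not clear from the excerpt.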
3
votes
2 answers

Convert dataframe column string values into dummy variable columns

I have the following dataframe (excluded rest of columns): | customer_id | department | | ----------- | ----------------------------- | | 11 | ['nail', 'men_skincare'] | | 23 | ['nail', 'fragrance'] …
3
votes
3 answers

Frequency of a small subset of values in a Large Pandas Dataframe

This question provided an example of how a frequency count for a given row can be obtained from a pandas dataframe using pd.get_dummies + aggregation. However, this doesn't scale if you want only a small subset of terms from a very large dataframe. For…
knowads
3
votes
1 answer

one-hot-encoding (dummy variables) with BigQuery

I would like to use BigQuery instead of Pandas to create dummy variables (one-hot encoding) for my categories. I will end up with about 200 columns, so I can't hard-code them manually. Test dataset (the actual one has many more…
Alex
3
votes
1 answer

Compare two dataframes and transpose each value into a column, filling in a binary value when there is a match?

I have two dataframes as follows: df1 SYMBOL seqnames start end SampleID SPATA21 1 16736303 16736303 eAPD114 E2F2 1 23836607 23836607 eAPD114 FCN3 1 27701288 27701288 eAPD120 MARCKSL 1 …
user2110417
3
votes
3 answers

R - Function to make a binary variable

I have some variables that take values between 1 and 5. I would like to code them as 0 if they take a value between 1 and 3 (inclusive) and as 1 if they take the value 4 or 5. My dataset looks like this: var1 var2 var3 1 1 NA 4 …
Emeline
3
votes
4 answers

split values of a dictionary into separate pandas dataframe columns -- make them dummy

Let's say we have a dataframe in this format: id properties 0 {"cat1":["p1","p2","p4"],"cat2":["p5", "p6"]} 1 {"cat1":["p3"],"cat2":["p7"]} How can we convert it to this format? id p1 p2 p3 p4 p5 p6 p7 0 True True …
MTT
3
votes
0 answers

Should multiple dummy variables start from different numbers when handling multiple categorical features in a data set?

Considering multiple independent categorical features in a data set, we want to encode multiple variables in each category. Should the dummy variables be different in each category? Or is it reasonable to start the dummies in each category from 0?…
3
votes
3 answers

Is there a way to create dummy variables for years that fall between two time points?

I am working with some time series data, where each row is an observation of a person, and I have two time periods, the start date and the end date. I am trying to create dummy variables for each year, such that if the year falls between the start…
Ryan
3
votes
2 answers

Re-categorize a column in a pandas dataframe

I am trying to build a simple classification model for my data stored in the pandas dataframe train. To make this model more efficient, I created a list of column names of columns I know to store categorical data, called category_cols. I categorize…