3

I am trying to create a dummy variable in R. My dataset of restaurants has a categorical variable 'type' with many categories. Among them, I want Vegan restaurants to have the value 1 and all the rest to be 0, so that when I run a summary of the regression I get only the intercept, b1 for reviews_number, and b2 for vegan restaurants. For example, a non-vegan restaurant would be y=b0+b1(reviews_number) and a vegan restaurant would be y=b0+b1(reviews_number)+b2(Vegan). The hint is to use the ifelse() command, but I can't seem to reduce the coefficients to just 3. Instead, I end up needing a value for each type of restaurant.

  • 2
    I feel like you have an [XY problem](https://en.wikipedia.org/wiki/XY_problem). Perhaps you need a factor to represent restaurant type. Then use lm with `y~reviews+type`. – mlt Sep 22 '18 at 22:38

2 Answers

3

Assuming your data frame is called df, you can create your dummy variable (Vegan) using:

df$Vegan <- ifelse(df$type == "Vegan", 1, 0) # where variable type is type of restaurants 
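To show how the dummy then enters the regression, here is a minimal sketch on made-up data. The toy values and the response column name `rating` are assumptions for illustration only; `type` and `reviews_number` follow the question.

```r
# Toy data, invented purely for illustration
df <- data.frame(
  type = c("Vegan", "Italian", "Vegan", "Thai", "Italian", "Thai", "Vegan", "Italian"),
  reviews_number = c(10, 25, 40, 15, 60, 35, 5, 50),
  rating = c(4.5, 3.9, 4.7, 4.0, 3.6, 4.2, 4.4, 3.8)  # hypothetical response y
)

# 1 for vegan restaurants, 0 for every other type
df$Vegan <- ifelse(df$type == "Vegan", 1, 0)

# Regress on the single dummy instead of the full 'type' variable
fit <- lm(rating ~ reviews_number + Vegan, data = df)
coef(fit)  # three coefficients: (Intercept), reviews_number, Vegan
```

Because only `Vegan` is in the formula, all non-vegan types are pooled into the baseline. Note that rows where `type` is NA would get NA in the dummy, since `NA == "Vegan"` is NA.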

However, note that if type is stored as a factor, you can also get a coefficient for each type of restaurant (relative to the reference level) using y=b0+b1(reviews_number)+b2(type), i.e. `y~reviews+type`, as pointed out by @mlt.
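A rough sketch of the factor approach, again on made-up data (the values, the response name `rating`, and the choice of "Italian" as reference level are assumptions for illustration):

```r
# Toy data, invented purely for illustration
df <- data.frame(
  type = c("Vegan", "Italian", "Vegan", "Thai", "Italian", "Thai", "Vegan", "Italian"),
  reviews_number = c(10, 25, 40, 15, 60, 35, 5, 50),
  rating = c(4.5, 3.9, 4.7, 4.0, 3.6, 4.2, 4.4, 3.8)  # hypothetical response y
)

# Store type as a factor and pick the reference level explicitly
df$type <- relevel(factor(df$type), ref = "Italian")

# lm() expands the factor into one dummy per non-reference level
fit <- lm(rating ~ reviews_number + type, data = df)
coef(fit)  # (Intercept), reviews_number, typeThai, typeVegan
```

Here the `typeVegan` coefficient compares vegan restaurants with the reference level only, not with all non-vegan restaurants pooled together, so it is not the same model as the single-dummy version.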

nghauran
0

If you need just one dummy variable, distinguishing vegan vs. non-vegan, then you can just do:

df$Vegan = as.integer(df$type == "Vegan")
vpekar