3

I want to know how a dummy variable can be created simply. I found many similar questions about dummy variables, but the answers either rely on external packages or are too technical.

I have data like this :

df <- data.frame(X=rnorm(10,0,1), Y=rnorm(10,0,1))
df$Z <- c(NA, diff(df$X)*diff(df$Y))

Z is a new variable in df: the product of the change in X and the change in Y. Now I want to create a dummy variable D in df such that D is 1 if Z < 0 and D is 0 if Z > 0.

I tried in this way :

df$D <- NA
for(i in 2:10) {
if(df$Z[i] <0 ) {
D[i] ==1
}
if(df$Z[i] >0 ) {
D[i] ==0
}}

This is not working. I would like to know why the code above does not work, what an easier way of doing this is, and how dummy variables can be created in R without external packages, with a little explanation.

Neeraj
  • Try `df$D <- 1L * (df$Z < 1)` – dickoa Nov 12 '15 at 08:09
  • Can you please explain this? – Neeraj Nov 12 '15 at 08:10
  • There is a typo in my first comment; it should be `df$D <- 1L * (df$Z < 0)`. `df$Z < 0` compares each value of the `df$Z` vector to 0, giving `TRUE` when the condition is met and `FALSE` otherwise. You end up with a vector of `TRUE` and `FALSE`, but in R (and in general) `TRUE == 1` and `FALSE == 0`, so multiplying the logical vector by `1L` is just a trick to turn the result into a vector of `1` and `0`. – dickoa Nov 12 '15 at 08:14
  • Thanks @dickoa. I learned something new from this. – Neeraj Nov 12 '15 at 08:23

3 Answers

7

Try :

df$D<-ifelse(df$Z<0,1,0)
df
            X           Y           Z  D
1  -0.1041896 -1.11731404          NA NA
2  -1.4286604  1.42523717 -3.36753491  1
3   0.3931643 -0.05525477 -2.69719691  1
4  -0.2236541  1.64531526 -1.04894297  1
5   1.1725167  0.80063291 -1.17932089  1
6   0.7571427  0.64072381  0.06642209  0
7   0.4929186  1.25125268 -0.16131645  1
8   0.9715885 -0.54755653 -0.86103574  1
9  -0.2962052 -1.37459521  1.04851438  0
10 -1.4838675 -0.85788632 -0.61367565  1

The ifelse function takes three arguments: the condition to evaluate (df$Z<0), the value to return where the condition is TRUE (1), and the value to return where it is FALSE (0). Where the condition is NA, as in row 1, the result is NA. The function is vectorized, so it works well in this case.
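For comparison, here is a minimal sketch of how the loop from the question could be repaired and checked against the vectorized call above. The two changes are assumptions about what the loop was meant to do: `<-` (assignment) replaces `==` (comparison), and `df$D[i]` replaces `D[i]`, because no standalone object `D` exists. The seed is arbitrary and only makes the sketch reproducible.

```r
set.seed(1)  # arbitrary seed, not part of the original question
df <- data.frame(X = rnorm(10, 0, 1), Y = rnorm(10, 0, 1))
df$Z <- c(NA, diff(df$X) * diff(df$Y))

# Repaired loop: assign with <- and index the column inside df
df$D <- NA
for (i in 2:10) {            # start at 2 because Z[1] is NA
  if (df$Z[i] < 0) {
    df$D[i] <- 1
  } else if (df$Z[i] > 0) {
    df$D[i] <- 0
  }
}

# Vectorized equivalent: one call, no loop
D_vec <- ifelse(df$Z < 0, 1, 0)

# Compare the two results element by element
print(identical(df$D, D_vec))
```

The loop and `ifelse` differ only if some Z is exactly 0: the loop leaves D as NA there, while `ifelse` returns 0.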

etienne
5

We can create a logical vector with df$Z < 0 and then coerce it to integer 0/1 by wrapping it in a unary +.

df$D <- +(df$Z < 0)

Or as @BenBolker mentioned, the canonical options would be

as.numeric(df$Z < 0)

or

as.integer(df$Z < 0)
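The three options give the same 0/1 values and differ only in storage type. A small sketch, using made-up values rather than the data from the question:

```r
# Hypothetical values chosen for illustration
z <- c(NA, -1.2, 0.4, -0.3)

d_plus <- +(z < 0)            # unary plus: logical -> integer
d_int  <- as.integer(z < 0)   # explicit conversion to integer
d_num  <- as.numeric(z < 0)   # explicit conversion to double

print(typeof(d_plus))
print(typeof(d_int))
print(typeof(d_num))

# Same values regardless of storage type; NA stays NA in all three
print(all.equal(d_int, d_num))
```

For a regression dummy either type should work; `as.integer` or `as.numeric` simply states the intent more explicitly than `+`.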

Benchmarks

set.seed(42)
Z <- rnorm(1e7)
library(microbenchmark)
microbenchmark(akrun= +(Z < 0), etienne = ifelse(Z < 0, 1, 0),
           times= 20L,  unit='relative')
#    Unit: relative
#    expr      min       lq     mean   median      uq      max neval
#   akrun  1.00000  1.00000 1.000000  1.00000 1.00000 1.000000    20
# etienne 12.20975 10.36044 9.926074 10.66976 9.32328 7.830117    20
akrun
1

You can try

df$D[df$Z<0]<-1
df$D[df$Z>0]<-0

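Note, however, that Z can be exactly 0. In that case neither assignment applies and D keeps its previous value (NA if D was initialized as in the question).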
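A minimal sketch of what happens at zero, using made-up values that include an exact 0 and a missing value. The `which()` wrapper and the `D2` column are additions for illustration, not part of the two assignments above.

```r
# Hypothetical values, including an exact zero and a missing value
df <- data.frame(Z = c(NA, -1.5, 0, 2.3, -0.2))

df$D <- NA                      # start with every row undefined
df$D[which(df$Z < 0)] <- 1      # which() drops NA from the index
df$D[which(df$Z > 0)] <- 0

# One possible convention (an assumption): count 0 as "not negative"
df$D2 <- as.integer(df$Z < 0)

print(df)                       # rows with Z == 0 or NA keep D = NA
```

Whether a zero should map to 0, to 1, or stay missing depends on what the dummy is meant to capture, so it is worth deciding explicitly.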

søren