
Which apply function can replace the for loop in the code below? The variable labels are simple placeholders for now. The goal is to set up a codebook in R that exports to SPSS with the variable labels already in place when the file is opened there. Ideally, this lets me do my own work in R while staying compatible with coworkers who use SPSS.

data1 <- read.table(header = TRUE, sep=",",
                   text = "
SubjectID,Age,WeightPRE,WeightPOST,Height,SES,GenderSTR,GenderCoded
1,45,150,145,5.6,2,m,1
2,50,167,166,5.4,2,f,2
3,35,143,135,5.6,2,F,2
4,44,216,201,5.6,2,m,1
5,32,243,223,6,2,m,1
6,48,165,145,5.2,2,f,2
7,50,132,132,5.3,2,m,1
8,51,110,108,5.1,3,f,2
9,46,167,158,5.5,2,,
10,35,190,200,5.8,1,Male,1
11,36,230,210,6.2,1,m,1
12,40,200,195,6.1,1,f,2
13,45,180,185,5.9,3,f,2
14,52,240,220,6.5,2,m,1
15,24,250,240,6.4,2,M,1
16,35,175,174,5.8,2,F,2
17,51,220,221,6.3,2,m,1
18,43,230,215,2.6,2,m,1
19,36,190,180,5.7,1,female,2
20,44,260,240,6.4,3,male,1
")

var.labels = c(SubjectID="aaa",
               Age="Age in Years", 
               WeightPRE="bbb",
               WeightPOST="ccc",
               Height="ddd",
               SES="eee",
               GenderSTR="fff",
               GenderCoded="ggg")

# attach each label to its column as a "label" attribute
for(i in 1:8){
  attr(data1[[names(var.labels)[i]]],"label") <- var.labels[names(var.labels)[i]]
}

# using the haven package
# this creates SPSS datafile with variable labels
library(haven)
write_sav(data1,"out1.sav")
  • A for loop is totally acceptable in this case, but you should change the iterable from `1:8` to `1:length(var.labels)` to keep things general. Is there a reason you don't want to use a for loop? – vincentmajor Jul 27 '16 at 21:55
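A minimal sketch of the generalization suggested in this comment, using `seq_along()` as a common stand-in for `1:length(...)`. The small data frame and the "Subject identifier" label are illustrative assumptions, not the question's real data.

```r
# Small stand-in for the question's data1 and var.labels (illustrative values)
data1 <- data.frame(SubjectID = 1:3, Age = c(45, 50, 35))
var.labels <- c(SubjectID = "Subject identifier", Age = "Age in Years")

# seq_along() gives 1..length(var.labels), and an empty sequence when
# var.labels is empty (1:length(x) would give c(1, 0) in that case)
for (i in seq_along(var.labels)) {
  var <- names(var.labels)[i]
  # [[ ]] extracts the label text without carrying the element name along
  attr(data1[[var]], "label") <- var.labels[[var]]
}

attr(data1$Age, "label")
```

The loop body is otherwise the same as in the question; only the iteration range changes.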

3 Answers

3

Thank you to lmo for your suggestion.

Mostly I wanted to avoid for loops as much as possible, but I think you're right that the for loop is fine in this case. I ran a timing comparison using the microbenchmark package and got the following:

library(microbenchmark)
microbenchmark(
  data1[] <- lapply(1:8, function(i) {
    # assign the label
    attr(data1[[names(var.labels)[i]]], "label") <-
      var.labels[names(var.labels)[i]]
    # return the vector
    data1[[names(var.labels)[i]]]
  })
)

which resulted in:

Unit: microseconds
expr: data1[] <- lapply(1:8, function(i) { attr(data1[[names(var.labels)[i]]], "label") <- var.labels[names(var.labels)[i]] data1[[names(var.labels)[i]]] })
    min      lq     mean   median      uq      max neval
380.291 412.986 638.6603 445.2185 661.102 2863.767   100

The for loop ran faster, at roughly half the median time:

microbenchmark(
  for(i in 1:8){
    attr(data1[[names(var.labels)[i]]],"label") <- var.labels[names(var.labels)[i]]
  }
)

which gave this time analysis:

Unit: microseconds
expr: for (i in 1:8) { attr(data1[[names(var.labels)[i]]], "label") <- var.labels[names(var.labels)[i]] }
    min      lq     mean  median       uq      max neval
179.015 197.798 289.9299 209.624 278.7255 1186.783   100

Thank you for your feedback and consideration of my question.

0

In this instance, a for loop is probably the best way to add the labels. As an example, here is roughly the "best" way to add them using lapply:

data1[] <- lapply(1:8, function(i) {
  # assign the label
  attr(data1[[names(var.labels)[i]]], "label") <-
    var.labels[names(var.labels)[i]]
  # return the vector
  data1[[names(var.labels)[i]]]
})

To my eyes, this is harder to read than your for loop and takes some unnecessary contortions to get the same result.
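For comparison, a possible apply-style alternative is base R's `Map`, which walks the columns and their labels in parallel. This is only a sketch: the helper name `set_label`, the small data frame, and the "Subject identifier" label are assumptions made for illustration.

```r
# Small stand-in for the question's data (illustrative values)
data1 <- data.frame(SubjectID = 1:3,
                    Age = c(45, 50, 35),
                    Height = c(5.6, 5.4, 5.6))
var.labels <- c(SubjectID = "Subject identifier", Age = "Age in Years")

# Hypothetical helper: return x with a "label" attribute attached
set_label <- function(x, label) {
  attr(x, "label") <- label
  x
}

# Pair each labelled column with its label and write the results back;
# columns without an entry in var.labels (here Height) are left untouched
vars <- names(var.labels)
data1[vars] <- Map(set_label, data1[vars], var.labels)

attr(data1$Age, "label")
```

Whether this reads better than the for loop is a matter of taste; it does avoid indexing by position.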

lmo
0

Since the for loop is considered a good option, here's a minor improvement to the loop itself:

for(var in names(var.labels)) {
  attr(data1[[var]], "label") <- var.labels[var]
}

Improvements include:

  • Generalizable to as many labels as needed
  • Won't run into problems if you neglect to create a label for any variables
  • Shorter code that is easier to read (for me at least)
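To illustrate the second point, here is a small sketch in which only one column has a label. The data frame is an illustrative stand-in, and the `vapply` check at the end is an added assumption about how one might verify the result.

```r
# Stand-in data; only Age gets a label, SubjectID and SES are left out on purpose
data1 <- data.frame(SubjectID = 1:3, Age = c(45, 50, 35), SES = c(2, 2, 3))
var.labels <- c(Age = "Age in Years")

# Looping over the label names touches only the columns that have a label
for (var in names(var.labels)) {
  attr(data1[[var]], "label") <- var.labels[[var]]
}

# Collect the label of every column; unlabelled columns come back as NA
labels <- vapply(data1, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))

labels
```

A positional loop such as `1:8` would instead index past the end of `var.labels` here.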
Simon Jackson