
I am struggling to write efficient code for importing SAS data files.

My code is as follows:

library(haven)  # provides read_sas()

# Build the full path of every yearly file (2002 to 2013)
f <- file.path("E:/Cohortdata/Raw cohort/Nationalscreeningcohort/01.jk",
               c("nhis_heals_jk_2002.sas7bdat", "nhis_heals_jk_2003.sas7bdat",
                 "nhis_heals_jk_2004.sas7bdat", "nhis_heals_jk_2005.sas7bdat",
                 "nhis_heals_jk_2006.sas7bdat", "nhis_heals_jk_2007.sas7bdat",
                 "nhis_heals_jk_2008.sas7bdat", "nhis_heals_jk_2009.sas7bdat",
                 "nhis_heals_jk_2010.sas7bdat", "nhis_heals_jk_2011.sas7bdat",
                 "nhis_heals_jk_2012.sas7bdat", "nhis_heals_jk_2013.sas7bdat"))
d <- lapply(f, read_sas)  # list with one data frame per file

I think rewriting it with a for loop would be much more efficient, but I don't know what the code should look like.
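For reference, a plain for loop version might look like the sketch below. It assumes the file names follow the `nhis_heals_jk_<year>.sas7bdat` pattern shown above, so they can be generated with `sprintf()` instead of typed out. The helper `read_years()` and its `reader`, `prefix` and `ext` arguments are hypothetical, introduced only so the loop can be tried with any reader function; with the real data you would pass `haven::read_sas`. A preallocated loop like this is unlikely to be noticeably faster than `lapply()`, since nearly all the time goes into reading the files; the real saving is not having to type each name.

```r
# Hypothetical helper: read one file per year with a plain for loop.
# Assumes names follow "<prefix><year><ext>", as in the question.
read_years <- function(dir, years, reader,
                       prefix = "nhis_heals_jk_", ext = ".sas7bdat") {
  # Generate the file names instead of typing each one out
  files <- file.path(dir, sprintf("%s%d%s", prefix, years, ext))
  out <- vector("list", length(files))  # preallocate the result list
  for (i in seq_along(files)) {
    out[[i]] <- reader(files[i])        # read the i-th file
  }
  names(out) <- years                   # label each element by its year
  out
}

# Usage with the real data (requires the haven package):
# d <- read_years("E:/Cohortdata/Raw cohort/Nationalscreeningcohort/01.jk",
#                 2002:2013, haven::read_sas)
```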

I would be very thankful for any help.

Anchal Singh
  • Are you trying to read *every* `sas7bdat` file in the folder? That would make it easier to simplify the code. – Marius Mar 18 '19 at 06:05
  • Yes, that's exactly what I am trying to do. My sas7bdat files are stored in a folder. What I don't like about my code is that I have to write out the name of every SAS file; I would rather use a for loop. – Sangwon Steve Lee Mar 18 '19 at 06:10

1 Answer


This is a variation of code that I posted here, but you can use it for SAS files too.

Please note that instead of using `file.path()` I used `list.files()`. That lets you find all the files under "E:/Cohortdata/Raw cohort/Nationalscreeningcohort", which is where I assumed your files are; `recursive = TRUE` also searches subfolders such as 01.jk. In addition, I used the `pattern` argument, which takes a regular expression, to keep only sas7bdat files.
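If you prefer to stay in base R, the same idea can be sketched with `list.files()` plus `lapply()`. Because `pattern` is a regular expression rather than a shell wildcard, an anchored form such as `"\\.sas7bdat$"` (a literal dot, then the extension at the very end of the name) avoids also picking up names like `old.sas7bdat.bak`; base R's `glob2rx()` converts a wildcard into the equivalent regex. The helper name `read_all()` and its `reader` argument are hypothetical; with your data you would call it with the folder above and `haven::read_sas`.

```r
# Hypothetical helper: list every .sas7bdat file under `dir` and read each one.
read_all <- function(dir, reader, pattern = "\\.sas7bdat$") {
  files <- list.files(path = dir,
                      pattern = pattern,   # regex: name ends in ".sas7bdat"
                      full.names = TRUE,   # keep the folder in each path
                      recursive = TRUE)    # also look inside subfolders
  d <- lapply(files, reader)               # one data frame per file
  names(d) <- basename(files)              # label elements by file name
  d
}

# glob2rx() translates a wildcard into the equivalent regular expression
glob2rx("*.sas7bdat")  # "^.*\\.sas7bdat$"

# Usage with the real data (requires the haven package):
# d <- read_all("E:/Cohortdata/Raw cohort/Nationalscreeningcohort", haven::read_sas)
```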

`list.files()` returns a character vector, so you can use whichever *apply function you like on it. However, I prefer turning the vector into a tibble and using the tidyverse approach: read every file with `purrr::map()` (part of the tidyverse) and then combine them into one big tibble.
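For comparison, the "one big table" step can also be sketched in base R: read each file, tag its rows with the file they came from (the tidyverse version keeps this information in the path column), then stack the pieces with `do.call(rbind, ...)`. The helper `stack_files()` and the `source_file` column are hypothetical names. Like `unnest()`, this assumes every file has the same columns with compatible types; `rbind()` stops with an error if the column names differ.

```r
# Hypothetical helper: read several files and stack them into one data frame,
# adding a column that records each row's source file.
stack_files <- function(files, reader) {
  pieces <- lapply(files, function(f) {
    df <- reader(f)
    # rep() also works when a file has zero rows
    df$source_file <- rep(basename(f), nrow(df))
    df
  })
  do.call(rbind, pieces)  # rbind() matches columns by name
}

# Usage with the real data (requires the haven package):
# files <- list.files("E:/Cohortdata/Raw cohort/Nationalscreeningcohort",
#                     pattern = "\\.sas7bdat$", full.names = TRUE, recursive = TRUE)
# all_years <- stack_files(files, haven::read_sas)
```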

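library(tidyverse)  # dplyr, purrr, tidyr, tibble and the %>% pipe
library(haven)      # read_sas()

df <- tibble(
  file = list.files(path = "E:/Cohortdata/Raw cohort/Nationalscreeningcohort",
                    full.names = TRUE,          # full paths, so read_sas() can open them
                    recursive = TRUE,           # also search subfolders such as 01.jk
                    pattern = "\\.sas7bdat$")   # regex: names ending in .sas7bdat
) %>%
  mutate(data = map(file, read_sas)) %>%  # list-column: one data frame per file
  unnest(data)                            # stack all files into one tibble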
DJV