1

I would like to import a file in .sas7bdat into R. The file is very big, so I need to import only the rows that match given criteria.

With a file .csv, the code would be

library(sqldf)
read.csv.sql("mtcars.csv", sql = "select * from file where carb in (1,2)" )

How can I do the same thing, but for a .sas7bdat file ?

I already tried with sqldf package :

library(sqldf)
data <- file("file.sas7bdat")
sqldf("select * from data where CONDITION1 eq 'CRITERIA1'")

However, it does not work and it says:

Error in utils::read.table(value, sep = sep, header = header, skip = skip, : duplicate 'row.names' are not allowed

Siva
  • 1,481
  • 1
  • 18
  • 29
John
  • 11
  • 3
  • 1
    Is it impossible to import the whole thing or you just don't want too? Have you looked into the package `haven`? – Mike May 03 '19 at 14:46
  • I want to import only the rows that match input criteria (and, by the way, since the database is huge, I think it is also lighter to do so). I looked into `haven` but I didn't find the solution. I know how to do it in csv, but not with a file in .sas7bdat. Is there anyone who can help me? – John May 14 '19 at 14:28
  • Do you have access to SAS on your machine or just have a large SAS file with no version of SAS on your computer? – Mike May 14 '19 at 16:02
  • I have access to SAS on my machine but I do not want to filter the dataset using SAS. I would like to do it only by using R. I know how to do it for a csv and for a connection with a SAS server. However, I do not know how to do it when I am importing a sas7bdat file. – John May 15 '19 at 12:51
  • I looked around and couldn't find a good solution to subset rows. However `haven::read_sas()` has a `cols_only` input argument that will only read in select columns. This would help with speed. You could also look into `foreign` or `SASxport`. example using haven - `cnames <- colnames(mtcars)[1:5] haven::read_sas("G:/mtcars.sas7bdat", cols_only = cnames)` – Mike May 15 '19 at 13:49

0 Answers0