7

I am trying to read a .sas7bdat file in R. When I use the command

library(sas7bdat)
read.sas7bdat("filename")

I get the following error:

Error in read.sas7bdat("county2.sas7bdat") : file contains compressed data

I do not have experience with SAS, so any help will be highly appreciated.

Thanks!

Michael Ohlrogge
user3641630

5 Answers

8

According to the sas7bdat vignette [vignette('sas7bdat')], COMPRESS=BINARY (or COMPRESS=YES) is not currently supported as of 2013 (and this was the vignette active on 6/16/2014 when I wrote this). COMPRESS=CHAR is supported.

These are basically internal compression routines, intended to make file sizes smaller. They're not nearly as good as gz or similar, but SAS supports them transparently when you write SAS programs. They change the file format significantly, which is why they have not been implemented yet.
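If you want to know which of the two routines a given file uses before picking a tool, a rough check is possible without SAS. The sketch below is a heuristic, not a parser: it assumes the signature strings that open-source readers look for in the file's metadata (SASYZCRL for COMPRESS=CHAR, SASYZCR2 for COMPRESS=BINARY) show up within the first megabyte of the file. The function name guess_sas_compression and the scan_bytes parameter are made up for illustration.

```python
# Signature strings that open-source sas7bdat readers look for in the metadata.
RLE_SIGNATURE = b"SASYZCRL"  # COMPRESS=CHAR (run-length encoding)
RDC_SIGNATURE = b"SASYZCR2"  # COMPRESS=BINARY (Ross Data Compression)


def guess_sas_compression(path, scan_bytes=1 << 20):
    """Guess the compression of a .sas7bdat file from its leading bytes.

    Heuristic only: searches the first `scan_bytes` bytes for a signature
    string and does not parse the file structure, so a false positive is
    possible if the data itself happens to contain one of the strings.
    """
    with open(path, "rb") as fh:
        head = fh.read(scan_bytes)
    if RDC_SIGNATURE in head:
        return "binary"
    if RLE_SIGNATURE in head:
        return "char"
    return "none detected"
```

Calling guess_sas_compression("county2.sas7bdat") on the file from the question should return "char", "binary" or "none detected". Treat the answer as a hint about which reader to try, not as proof.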

If you have SAS, you need to write these to an uncompressed dataset.

options compress=no;
libname lib '//drive/path/to/files';
data lib.want;
set lib.have;
run;

That's the simplest way (of many). It assumes you have a libname defined as lib, as above, and that you change have and want to the correct names. In most cases have should be the file's name without its extension; want can be any sensible name of 32 or fewer characters made up of letters, digits, and underscores, and not starting with a digit.

If you don't have SAS, you'll have to ask your data provider to make the data available uncompressed, or in a different format. If you're getting this from a PUDS somewhere on the web, you might post where you're getting it from, and there might be a way to help you identify an uncompressed source.

Joe
  • I do not have SAS but good to know. I might have to give it a shot. Thanks very much! – user3641630 Jun 16 '14 at 16:13
  • For future reference, dsread and DsShell (both available at http://www.oview.co.uk) support compressed data sets using either BIN or CHAR compression. – Chris Long Jan 05 '15 at 12:09
7

This admittedly is not a pure R solution, but in many situations (e.g. if you aren't on a pc and don't have the ability to write the SAS file yourself) the other solutions posted are not workable.

Fortunately, Python has a module (https://pypi.python.org/pypi/sas7bdat) which supports reading compressed SAS data sets - it's certainly better using this than needing to acquire SAS if you don't already have it. Once you extract the file and save it to text via Python, you can then access it in R.

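from sas7bdat import SAS7BDAT
import pandas as pd  # to_data_frame() below requires pandas to be installed

InFileName = "myfile.sas7bdat"
OutFileName = "myfile.txt"

# Read the (possibly compressed) SAS data set into a pandas DataFrame
with SAS7BDAT(InFileName) as f:
    df = f.to_data_frame()

# Write a tab-delimited text file that R can read with read.delim()
df.to_csv(path_or_buf = OutFileName, sep = "\t", encoding = 'utf-8', index = False)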
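For files too large to hold in memory at once, the same conversion can be done in chunks. The sketch below swaps in a different reader from the one above: pandas.read_sas with its chunksize argument, on the assumption that the pandas reader can decode your particular file and that its text columns are UTF-8. The names sas_to_tsv and open_reader are hypothetical; open_reader exists only so the reader can be replaced, and pandas is imported lazily when no replacement is given.

```python
def sas_to_tsv(in_path, out_path, chunksize=100_000, encoding="utf-8",
               open_reader=None):
    """Convert a .sas7bdat file to tab-delimited text, chunk by chunk.

    Returns the number of rows written. `open_reader` must return an
    iterable of DataFrame-like chunks; by default it is pandas.read_sas.
    """
    if open_reader is None:
        import pandas as pd  # imported lazily; only the default reader needs it

        def open_reader(path, n, enc):
            # With chunksize set, read_sas returns an iterator of DataFrames
            return pd.read_sas(path, format="sas7bdat", encoding=enc, chunksize=n)

    rows = 0
    for i, chunk in enumerate(open_reader(in_path, chunksize, encoding)):
        first = i == 0
        # The first chunk creates the file and writes the header; later ones append
        chunk.to_csv(out_path, sep="\t", index=False, header=first,
                     mode="w" if first else "a", encoding="utf-8")
        rows += len(chunk)
    return rows
```

A call such as sas_to_tsv("myfile.sas7bdat", "myfile.txt") would then produce the same kind of tab-delimited file as the code above, which R can load with read.delim("myfile.txt").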
Michael Ohlrogge
6

The haven package can read compressed SAS-files:

library(haven)
df <- read_sas("sasfile.sas7bdat")

However, it reads only SAS files compressed with compress=char, not those compressed with compress=binary.

So haven will be able to read this SAS-file:

data output.compressed_data_char (compress=char);
set inputdata;
run;

But not this SAS-file:

data output.compressed_data_binary (compress=binary);
set inputdata;
run;

https://cran.r-project.org/package=haven

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001002773.htm

Rasmus Larsen
4

"RevoScaleR" is a good package for reading SAS data sets (compressed or uncompressed). You can use the rxImport function of this package. Below is an example.

Importing library

library(RevoScaleR)

Reading data

R_df_name <- rxImport("fake_path/file_name.sas7bdat")

The speed of this function is far better than haven/sas7bdat/sas7bdat.parso. I hope this helps anyone who struggles to read SAS data sets in R.

Cheers!

Atendra Gautam
0

I found R to be the easiest for this kind of challenge, especially with compressed sas7bdat files. It takes three simple lines:

library(haven)
data <- read_sas("yourfile.sas7bdat")

and then transform it to csv

write.csv(data,"data.csv")
Hamad Alibrahim