Converting large SAS dataset to HDF5

Question

I have multiple large (>10GB) SAS datasets that I want to convert for use in pandas, preferably in HDF5. There are many different data types (dates, numerical, text), and some numerical fields also have different error codes for missing values (e.g. values can be ., .E, .C). I'm hoping to keep the column names and label metadata as well. Has anyone found an efficient way to do this?

I tried using MySQL as a bridge between the two, but I got some out-of-range errors when transferring, and it was incredibly slow. I also tried exporting from SAS to Stata .dta format, but SAS (9.3) exports an old Stata format that is not compatible with read_stata() in pandas. I also tried the sas7bdat package, but according to its description it has not been widely tested, so I'd like to load the datasets another way and compare the results to make sure everything is working properly.

Extra details: the datasets I'm looking to convert are those from CRSP, Compustat, IBES and TFN from WRDS.
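For the Stata route mentioned above, a minimal sketch of reading a .dta file while keeping both the variable labels and Stata's distinct missing-value codes. The pandas function is `pd.read_stata`; its `iterator=True` and `convert_missing=True` options and `StataReader.variable_labels()` are real pandas API, but the helper name `read_dta_with_labels` is hypothetical. Current pandas reads more .dta versions than it did in 2014, though whether a given SAS 9.3 export is readable is not guaranteed, and it is an assumption that SAS maps .E/.C to Stata's .e/.c.

```python
import pandas as pd


def read_dta_with_labels(path):
    """Read a Stata .dta file and return (DataFrame, {column: variable label}).

    convert_missing=True keeps Stata's distinct missing values (., .a ... .z)
    as StataMissingValue objects instead of collapsing them all to NaN; the
    affected columns come back with object dtype.
    """
    # iterator=True returns a StataReader rather than a DataFrame, which
    # exposes the header metadata (variable labels) alongside the data.
    with pd.read_stata(path, iterator=True, convert_missing=True) as reader:
        labels = reader.variable_labels()
        df = reader.read()
    return df, labels
```

Object-dtype columns do not store well in an HDF5 table, so before writing you would probably turn the StataMissingValue entries into NaN plus a separate code column.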

score 1 · Accepted Answer · answered Feb 10 '14 at 02:32


I haven't had much luck with this in the past. Where I work, we just use tab-separated files to move data between SAS and Python, and we do it a lot.

That said, if you are on Windows, you can attempt to set up an ODBC connection and write the file that way.


DomPazz
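The answer above does not spell out the ODBC route. One hedged reading, from the Python side: point an ODBC data source (DSN) at the SAS data on Windows, connect with the third-party pyodbc package, and pull the table in chunks so a >10GB dataset never has to fit in memory at once. The DSN name `SAS_DSN`, the table `msf`, and the helper `iter_query_chunks` are hypothetical; the helper accepts any DB-API 2.0 connection, so it can be tried against `sqlite3` first. ODBC has a single NULL, so SAS special missing codes such as .E are probably lost on this path; that is an assumption worth checking.

```python
import pandas as pd


def iter_query_chunks(conn, sql, chunksize=100_000):
    """Yield the result of `sql` as DataFrames of at most `chunksize` rows.

    `conn` can be any DB-API 2.0 connection (sqlite3, pyodbc, ...). Recent
    pandas versions warn about non-SQLAlchemy connections other than sqlite3,
    but still run the query.
    """
    # With chunksize set, read_sql_query returns an iterator of DataFrames
    # instead of loading the whole result into memory.
    yield from pd.read_sql_query(sql, conn, chunksize=chunksize)


# Hypothetical usage on Windows (requires pyodbc, PyTables and a configured DSN):
# import pyodbc
# conn = pyodbc.connect("DSN=SAS_DSN")
# with pd.HDFStore("crsp.h5", mode="w") as store:
#     for chunk in iter_query_chunks(conn, "SELECT * FROM msf"):
#         store.append("msf", chunk)
```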


Exporting to CSV or some other delimited format is probably the best bet. SAS have a vested interest in preventing interoperability, so I don't like your chances of getting a good and efficient transfer. Though it is a commercial product, I have heard good reports of people using this program: https://www.stattransfer.com/stattransfer/formats.html – thelatemail Feb 10 '14 at 04:24
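A minimal sketch of the tab-separated route on the pandas side, assuming the SAS export writes missing numerics as the literal tokens `.`, `.E`, `.C`, and so on (check a few rows of a real export, since this depends on how the file is written). It keeps each missing-value code in a companion `<col>_miss` column so the distinction asked about in the question is not lost, and appends chunks to one HDF5 table. The helper names, the `_miss` suffix, and the `text_widths` parameter are hypothetical; `tab_to_hdf5` needs the PyTables package.

```python
import pandas as pd

# Assumption: missing numerics appear as ".", ".A" ... ".Z" or "._" in the export.
SAS_MISSING_PATTERN = r"\.[A-Z_]?"


def iter_tab_chunks(src, numeric_cols, text_cols=(), date_cols=(), chunksize=500_000):
    """Yield DataFrame chunks from a tab-separated SAS export.

    Every column in `numeric_cols` becomes a float column (NaN for any SAS
    missing value) plus a `<col>_miss` column holding the original code
    (".", ".E", ".C", ...) or "" when a real value is present.
    Columns in `text_cols` are kept as strings, so IDs keep leading zeros.
    """
    reader = pd.read_csv(
        src,
        sep="\t",
        chunksize=chunksize,
        # Read flagged columns as text so tokens like ".E" survive parsing
        dtype={col: str for col in [*numeric_cols, *text_cols]},
        parse_dates=list(date_cols) or False,
    )
    for chunk in reader:
        for col in text_cols:
            chunk[col] = chunk[col].fillna("")
        for col in numeric_cols:
            raw = chunk[col].str.strip()
            is_code = raw.str.fullmatch(SAS_MISSING_PATTERN, na=False)
            chunk[col + "_miss"] = raw.where(is_code, "")
            # Default errors="raise": an unexpected token stops the load loudly
            chunk[col] = pd.to_numeric(raw.mask(is_code))
        yield chunk


def tab_to_hdf5(src, h5_path, key, numeric_cols, text_widths, date_cols=()):
    """Append all chunks to a single HDF5 table (needs the PyTables package).

    HDF5 table columns are fixed-width, so `text_widths` must give the maximum
    length of each text column; otherwise a later chunk with a longer string
    fails to append.
    """
    widths = dict(text_widths)
    widths.update({col + "_miss": 2 for col in numeric_cols})
    with pd.HDFStore(h5_path, mode="w", complevel=5, complib="zlib") as store:
        for chunk in iter_tab_chunks(src, numeric_cols, list(text_widths), date_cols):
            store.append(key, chunk, format="table", min_itemsize=widths)
```

Columns not listed in any parameter are left to pandas' type inference, which can differ between chunks (an integer column that gains a blank in one chunk becomes float), so in practice it is safer to list every column explicitly.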

score 0 · Answer 2 · answered Jul 02 '14 at 15:54

You might be interested in the dirty hack used in a fork of sas7bdat. It provides a read_sas method that reads SAS files into a pandas DataFrame.

Original sas7bdat: http://git.pyhacker.com/sas7bdat

Fork with read_sas: https://github.com/openfisca/sas7bdat

Improvements are welcome!
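To follow the asker's plan of loading the data by two routes (for example a sas7bdat-based reader and a tab-separated export) and comparing the results, a helper along these lines could be used. `compare_loads` and `_normalize` are hypothetical names. The sketch assumes one load may return text as raw bytes (pandas' later `read_sas` does this when `encoding` is not given) and that column names may differ only in case, since SAS variable names are case-insensitive.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def _normalize(df, encoding):
    out = df.copy()
    # SAS variable names are case-insensitive, so compare them lower-cased
    out.columns = [str(c).lower() for c in out.columns]
    out = out[sorted(out.columns)].reset_index(drop=True)
    for col in out.columns:
        s = out[col]
        if not (pd.api.types.is_numeric_dtype(s) or pd.api.types.is_datetime64_any_dtype(s)):
            # Decode raw bytes so b"IBM" compares equal to "IBM"
            out[col] = s.map(lambda v: v.decode(encoding) if isinstance(v, bytes) else v).astype(object)
    return out


def compare_loads(left, right, encoding="latin-1", rtol=1e-9):
    """Raise AssertionError describing the first mismatch between two loads."""
    assert_frame_equal(
        _normalize(left, encoding),
        _normalize(right, encoding),
        check_dtype=False,  # e.g. int64 from one reader vs float64 from another
        rtol=rtol,
    )
```

For example, `compare_loads(pd.read_sas("msf.sas7bdat"), df_from_tab_export)`, where the file name and the second DataFrame are hypothetical; `pd.read_sas` only exists in pandas releases newer than this thread. The `latin-1` default is also an assumption; use the encoding the SAS data was actually written in.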
