I have several large (>10 GB) SAS datasets that I want to convert for use in pandas, preferably stored as HDF5. They contain many different data types (dates, numeric, text), and some numeric fields use different error codes for missing values (e.g. ., .E, .C). I'd also like to keep the column names and label metadata. Has anyone found an efficient way to do this?
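To make the missing-value requirement concrete, this is roughly the representation I'd like to end up with: each numeric field split into a float (NaN when missing) plus a separate code that records which kind of missing it was. It is only a sketch. The helper name `split_sas_missing` is made up, and it assumes the values arrive as text tokens written the way I listed them above (".", ".E", ".C"), which may not match what a particular export actually produces.

```python
import math

# Hypothetical helper, for illustration only.
# Assumption: each value arrives as a text token such as "12.5", ".", ".E".
def split_sas_missing(token):
    """Return (value, code).

    value is a float, NaN when the token is a SAS missing value.
    code is the missing-value tag (".", ".E", ...) or None for a real number.
    """
    text = token.strip().upper()
    # "." is the ordinary missing value; ".A" to ".Z" and "._" are the
    # special missing values SAS allows.
    is_plain_missing = text == "."
    is_special_missing = (
        len(text) == 2
        and text[0] == "."
        and ("A" <= text[1] <= "Z" or text[1] == "_")
    )
    if is_plain_missing or is_special_missing:
        return math.nan, text
    return float(text), None

raw = ["12.5", ".", ".E", "-3", ".c"]
values, codes = zip(*(split_sas_missing(t) for t in raw))
```

Keeping the code in its own column means the numeric column stays a plain float column, while the distinction between ., .E and .C is not lost.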
I tried using MySQL as a bridge between the two, but I got some "Out of range" errors during the transfer, and it was incredibly slow. I also tried exporting from SAS in Stata .dta format, but SAS (9.3) writes an old Stata format that read_stata() in pandas cannot read. Finally, I tried the sas7bdat package, but according to its description it has not been widely tested, so I'd like to load the datasets another way as well and compare the results to make sure everything is working properly.
Extra details: the datasets I'm looking to convert are those from CRSP, Compustat, IBES and TFN from WRDS.