The SASData object has to_csv
and to_df_CSV
methods, but both of these write to locations on the host machine (where the SAS session is running). Is there a way to write a large SAS data table to .CSV on my local machine from a remote connection? The option on the saspy Github page was to get the DataFrame, via to_df
, and then write that to .CSV, but this won't work in my case because the SAS table is bigger than memory. Must I read and write in chunks?
Asked
Active
Viewed 1,656 times
2

kingfischer
- 408
- 2
- 13
2 Answers
1
Here's an example of what I'm suggesting.
tom64-3> python3.5
Python 3.5.5 (default, Feb 6 2018, 10:56:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import saspy
>>> sas = saspy.SASsession(cfgname='iomjwin')
SAS Connection established. Subprocess id is 3681
No encoding value provided. Will try to determine the correct encoding.
Setting encoding to cp1252 based upon the SAS session encoding value of wlatin1.
>>> sas
Access Method = IOM
SAS Config name = iomjwin
WORK Path = C:\Users\sastpw\AppData\Local\Temp\SAS Temporary Files\_TD20052_d10a626_\Prc4\
SAS Version = 9.04.01M4P11142016
SASPy Version = 2.4.3
Teach me SAS = False
Batch = False
Results = Pandas
SAS Session Encoding = wlatin1
Python Encoding value = cp1252
SAS process Pid value = 20052
>>> cars = sas.sasdata('cars', 'sashelp')
>>> cars.head()
Make Model Type Origin DriveTrain MSRP Invoice EngineSize \
0 Acura MDX SUV Asia All 36945 33337 3.5
1 Acura RSX Type S 2dr Sedan Asia Front 23820 21761 2.0
2 Acura TSX 4dr Sedan Asia Front 26990 24647 2.4
3 Acura TL 4dr Sedan Asia Front 33195 30299 3.2
4 Acura 3.5 RL 4dr Sedan Asia Front 43755 39014 3.5
Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
0 6 265 17 23 4451 106 189
1 4 200 24 31 2778 101 172
2 4 200 22 29 3230 105 183
3 6 270 20 28 3575 108 186
4 6 225 18 24 3880 115 197
>>> cars.to_csv(sas.workpath+'\cars.csv')
11 The SAS System 11:15 Tuesday, February 19, 2019
99
100 options nosource;
NOTE: The file X is:
Filename=C:\Users\sastpw\AppData\Local\Temp\SAS Temporary Files\_TD20052_d10a626_\Prc4\cars.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=19Feb2019:11:16:58,
Create Time=19Feb2019:11:16:58
NOTE: 429 records were written to the file X.
The minimum record length was 68.
The maximum record length was 123.
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
428 records created in X from SASHELP.CARS.
NOTE: "X" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 1.49 seconds
cpu time 0.21 seconds
>>> res = sas.download('/u/sastpw', r'C:\Users\sastpw\AppData\Local\Temp\SAS Temporary Files\_TD20052_d10a626_\Prc4\cars.csv')
>>> print(res['LOG'])
19 The SAS System 11:15 Tuesday, February 19, 2019
278
279 filename _sp_updn 'C:\Users\sastpw\AppData\Local\Temp\SAS Temporary Files\_TD20052_d10a626_\Prc4\cars.csv' recfm=F encoding=binary lrecl=4096
280
281
20 The SAS System 11:15 Tuesday, February 19, 2019
284
21 The SAS System 11:15 Tuesday, February 19, 2019
287
288 filename _sp_updn;
NOTE: Fileref _SP_UPDN has been deassigned.
289
290
>>>
SAS Connection terminated. Subprocess id was 3681
tom64-3> ll /u/sastpw/cars.csv
-rw-r--r-- 1 sastpw r&d 38142 Feb 19 11:18 /u/sastpw/cars.csv
tom64-3>
tom64-3> head /u/sastpw/cars.csv
Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length
Acura,MDX,SUV,Asia,All,"$36,945","$33,337",3.5,6,265,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820","$21,761",2,4,200,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990","$24,647",2.4,4,200,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195","$30,299",3.2,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755","$39,014",3.5,6,225,18,24,3880,115,197
Acura,3.5 RL w/Navigation 4dr,Sedan,Asia,Front,"$46,100","$41,100",3.5,6,225,18,24,3893,115,197
Acura,NSX coupe 2dr manual S,Sports,Asia,Rear,"$89,765","$79,978",3.2,6,290,17,24,3153,100,174
Audi,A4 1.8T 4dr,Sedan,Europe,Front,"$25,940","$23,508",1.8,4,170,22,31,3252,104,179
Audi,A41.8T convertible 2dr,Sedan,Europe,Front,"$35,940","$32,506",1.8,4,170,23,30,3638,105,180
tom64-3>

Tom
- 68
- 6
-
And, if for some reason, you couldn't use the latest version w/ the download() method,you could still accomplish this with the existing to_df_CSV() method, though it's a kludge and I wouldn't really suggest this, to_df_CSV takes 2 optional parameters; the local file to create (the .csv file copied over from the server), and whether to keep or delete that local csv file. – Tom Feb 19 '19 at 19:47
-
Again, you can't hit enter :( here's the signature to_df_CSV(tempfile: str = None, tempkeep: bool = False, **kwargs) → pandas.core.frame.DataFrame – Tom Feb 19 '19 at 19:48
-
Arrg, so you can do this with your current version try: cars.to_df_CSV('/u/sastpw/cars.csv', keeptemp=True) except: pass That will get the same first 2 steps as to_csv and download and catch the abend trying to create the data frame, leaving the local csv file. Of course this is a hack, but it would work. I still suggest getting 2.4.3 and using download() though. Tom – Tom Feb 19 '19 at 19:51
-
sorry, new to this. I'll attempt to format this and see if it works: ``` try: cars.to_df_CSV('/u/sastpw/cars.csv', keeptemp=True) except: pass ``` hope that works – Tom Feb 19 '19 at 19:53
-
Thanks @Tom, download() is exactly what we're looking for. I had considered the try-except using to_df_CSV(), but it would try to fill up memory before failing, and this code is to be used by others and not only myself. I'll see if I can get 2.4.3, otherwise I can copy the code into a local function for the time being. Cheers! – kingfischer Feb 21 '19 at 09:36
0
With saspy V2.4.3, you could try to_csv() to create the csv file on the SAS server, then download(), which is new in 2.4.3, to pull the csv over to your local filesystem.

Tom
- 68
- 6
-
Thanks @Tom. I've been looking into the saspy code. to_csv() let's you create a csv file on the remote machine, but to_df_CSV() (or its helper function sasdata2dataframeCSV()) actually seem to write the csv to my local machine from the remote session. The issue with this function is that it crashes when it executes its final portion, pd.read_csv(), which then causes an out-of-memory error on large datasets. I was looking for a way to stop building the pd.DataFrame, or just move the csv writing to a new function, but I don't completely understand how it works. – kingfischer Feb 19 '19 at 12:26
-
Yes, to_CSV() runs on the SAS server to write the SAS Data Set to the Servers filesystem as a .csv file. The to_df_CSV is to pull the SAS data Set over as a data frame into python, as you see. to_df() and to_df_CSV() are just 2 different mechanisms to move SAS data to Python as a data frame. – Tom Feb 19 '19 at 16:11
-
Whoops, can't hit enter here apparently. In the current version of saspy 2.4.3, there's a new method that you can try to do what you want. Use to_csv() to write the SAS Data Set out as a .csv file on the server, then use the download() method (off the SASsession object) to download that csv file from the server file system to your local filesystem (where saspy is running). It's just a file copy, not pandas or anything in python. It sounded like that's what you were trying to do. – Tom Feb 19 '19 at 16:14