0

I am working on python on some data get from a SAS server. I am currently using SASPY to_df() function to bring it from SAS to local pandas.

I would like to know if its possible to filter/query the data that is being transferred so I could avoid bringing unneeded that and speeding up my download.

I couldn't find anything on saspy documentation, it only offers the possibility of using "**kwargs" but I couldn't figure out how to do it.

Thanks.

FábioRB
  • 335
  • 1
  • 12
  • 2
    Wouldn't that be part of the definition of the dataset object that you want to run for to_df() method on? Use the WHERE= dataset option to "filter" observations. Use the KEEP= or DROP= option to limit the variables ("columns") used. – Tom Feb 03 '23 at 17:52
  • Did you mean using drop or keep on pandas o my local machine? My focus is to reduced the amount of data to speed up the download. – FábioRB Feb 04 '23 at 20:14
  • 1
    SAS datasets are stored in files. So you will have to read all of the file. But you do not have to transfer all of the contents of the file. By defining the dataset object to only use some of the variables or some of the observations you will limit the amount of information that has to move from the SAS server to your python session. – Tom Feb 04 '23 at 21:37

1 Answers1

1

You need to define the sasdata object using the WHERE= dataset option to limit the observations pulled.

https://sassoftware.github.io/saspy/api.html#saspy.sasdata.SASdata

Then when you use the to_df() method only the selected data will be transferred.

You can also use the KEEP= or DROP= dataset option to limit the variables that are transferred. Remember that in order to reference any variables in the WHERE= option they have to be kept.

The "**kwargs" looks to be about changing how you connect to the SAS server, so that is not important for what you want.

Tom
  • 47,574
  • 2
  • 16
  • 29