I have a raw SAS file that is around 16 GB, and even after keeping only the columns relevant to my problem, the file is still around 8 GB. It looks roughly like this:
CUST_ID  FIELD_1  FIELD_2  FIELD_3  ...  FIELD_7
1        65       786      ABC      ...  Y
2        87       785      GHI      ...  N
3        88       877      YUI      ...  Y
...
9999999  92       767      XYS      ...  Y
When I tried to import it into Python using the code:
df = pd.read_sas(path, format='SAS7BDAT')
my screen turned black, and after multiple attempts I finally got a MemoryError.
Since I need the entire set of CUST_ID for my problem, selecting only a sample and deleting the other rows is out of the question.
I thought maybe I could split this file into multiple sub-files, carry out all the required calculations on each piece, and then finally recombine everything into a single large file once the work is done, along the lines of the sketch below.
Is there any way to solve this issue? I really appreciate all the help that I can get!
Edit:
I've tried this:
import pandas as pd

# read the SAS file in chunks rather than loading it all at once
# (the chunk size here is arbitrary)
df_chunk = pd.read_sas(path, format='SAS7BDAT', chunksize=500000)
chunk_list = []
for chunk in df_chunk:
    chunk_list.append(chunk)
df_concat = pd.concat(chunk_list)
But I'm still getting a MemoryError. Any help??