I am trying to read a 34 GB Stata file with pyreadstat, but I get an error. To rule out a problem in my code, I first ran the same code on an 11 MB file.
The code is:
import pyreadstat
dtafile = 'E:/Work/test file.dta'
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_dta, dtafile, chunksize=5, limit=1)
for df, meta in reader:
    print(df)
This produced the expected output:
app_id inventor_id ... lagged_generality_FYnormalized _merge
0 101985 ... 1.038381 3
1 102019 SCHOTTEK 2827 ... 0.830110 3
2 102019 KUELLMER 2827 ... 0.830110 3
3 102019 DICKNER 2827 ... 0.830110 3
4 102562 VINEGAR 986 ... 0.825088 3
[5 rows x 1448 columns]
Process finished with exit code 0
But when I run the same code on the 34 GB file, I get the following error:
Traceback (most recent call last):
  File "C:\Users\Gaju\PycharmProjects\first project\work.py", line 77, in <module>
    for df,meta in reader:
  File "pyreadstat\pyreadstat.pyx", line 661, in read_file_in_chunks
  File "pyreadstat\pyreadstat.pyx", line 276, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat\_readstat_parser.pyx", line 1080, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat\_readstat_parser.pyx", line 864, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat\_readstat_parser.pyx", line 794, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Invalid file, or file has unsupported features
Process finished with exit code 1
Both files (the test file and the 34 GB file) were created in Stata and have a similar structure, so I cannot work out what is going wrong. Why does the larger file fail to load?
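One thing that might be worth checking is whether the two files really use the same .dta format release, since the error mentions unsupported features. The sketch below is only a rough diagnostic, not part of pyreadstat: `dta_release` is a hypothetical helper name, and it assumes the usual .dta header layout as I understand it (releases 117 and later start with an XML-like header containing a `<release>` tag, while older releases store the release number in the first byte).

```python
import re


def dta_release(path):
    """Return the .dta format release found in the file header, or None.

    Hypothetical helper for diagnosis only; it reads just the first
    64 bytes, so it is cheap even on a very large file.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    # Assumption: releases 117+ begin with
    # <stata_dta><header><release>NNN</release>...
    match = re.search(rb"<release>(\d+)</release>", head)
    if match:
        return int(match.group(1))
    # Assumption: older releases keep the release number in byte 0.
    return head[0] if head else None
```

Calling `dta_release(dtafile)` on the small file and on the large one and comparing the two numbers would show whether they were saved in different format releases.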