Your main problem is that after running:
decompressed = dctx.decompress(data)
The variable decompressed now contains the whole uncompressed data as bytes (that is, the content of the csv.zst file itself).
And then when you do:
with open(decompressed, 'rb') as f:
You are trying to open a file whose name is "{content of your csv}".
What you actually want is an input stream over the decompressed data. The io module's StringIO is what you are looking for: you pass it text (so decode the bytes returned by decompress() first) and you get a file-like object that behaves as if it came from a file opened with open():
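import io
# decompress() returns bytes; decoding as UTF-8 is an assumption about the CSV's encoding
with io.StringIO(decompressed.decode("utf-8")) as f:
    csv_data = f.read()
    csv = pd.read_csv(csv_data)  # crashes here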
Except that this will crash too, because read_csv() treats a string argument as a path, so again it will look for a file whose name is "{content of your csv}".
If you want to pass a block of text to read_csv(), you need to pass the f object itself:
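import io
# decompress() returns bytes; decoding as UTF-8 is an assumption about the CSV's encoding
with io.StringIO(decompressed.decode("utf-8")) as f:
    csv = pd.read_csv(f)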
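Since the decompressed data is bytes rather than text, io.BytesIO is a possible alternative that skips the decode step. A minimal sketch, where the sample bytes are made up and only stand in for whatever dctx.decompress(data) returned:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the output of dctx.decompress(data)
decompressed = b"id,name\n1,alice\n2,bob\n"

# BytesIO wraps the raw bytes in a file-like object;
# read_csv reads from it as it would from an opened file
df = pd.read_csv(io.BytesIO(decompressed))
print(df)
```
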
This will work, except that read_csv() can also decompress files itself.
So with recent pandas (zstd support was added in 1.4) you can skip the whole "decompression" part and give it the file name directly. Pandas will take care of decompressing:
csv = pd.read_csv(zst_datapath)
Note that different compression schemes require different dependencies to be installed (for .zst files, the zstandard package).
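To illustrate how pandas picks the compression from the file extension, here is a small round-trip sketch. It uses .gz instead of .zst only because gzip needs no extra package; the file name and sample data are made up. With zstandard installed, a .zst path should behave the same way.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, only for illustration
original = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")
    # compression defaults to "infer", so the ".gz" suffix selects gzip
    original.to_csv(path, index=False)
    # the same inference happens on read, no manual decompression needed
    restored = pd.read_csv(path)

print(restored)
```

If the extension does not match the actual format, you can pass compression explicitly, for example compression="zstd".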
Hope that this helps.