How does one read a zipped CSV file into a Python polars DataFrame?

The only solution I have so far is to load the entire file into memory and then pass it to `pl.read_csv`.
From the documentation:

> Path to a file or a file-like object. By file-like object, we refer to objects with a `read()` method, such as a file handler (e.g. via builtin `open` function) or `StringIO` or `BytesIO`. If `fsspec` is installed, it will be used to open remote files.
So, to read `my_file.csv` from inside `something.zip`:

```
/something.zip
    /my_file.csv
```
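```python
from zipfile import ZipFile

import polars as pl

zip_file = "something.zip"

# ZipFile.read() returns the decompressed contents of the member as bytes,
# which pl.read_csv accepts directly.
df = pl.read_csv(ZipFile(zip_file).read("my_file.csv"))
```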
Passing the result of `.open` instead of `.read` to `pl.read_csv` raises a `FileNotFound` error.
However, it is still possible to use `.open`; we just need to call `.read()` on its result:
```python
# .open() returns a file-like ZipExtFile; calling .read() on it yields bytes.
df = pl.read_csv(
    ZipFile("something.zip").open("my_file.csv", mode="r").read()
)
```
The difference lies in what `read` and `open` return. `read` returns the "file bytes for name", that is, the member's contents with `.read()` already called. `open`, on the other hand, returns a "file-like object for 'name'", an instance of `ZipExtFile`. That object does have a `.read()` method, but `.open()` does not call it, so we have to call it ourselves, as done above.
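The difference between the two return values can be checked with the standard library alone. This sketch builds a throwaway archive in memory (the CSV contents are illustrative) and compares what `ZipFile.read` and `ZipFile.open` hand back.

```python
import io
import zipfile

# Build an in-memory archive so the example is self-contained.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("my_file.csv", "a,b\n1,2\n")

with zipfile.ZipFile(archive) as zf:
    # .read(name) opens the member, reads it fully and returns bytes.
    via_read = zf.read("my_file.csv")

    # .open(name) returns a ZipExtFile; nothing has been read yet.
    with zf.open("my_file.csv") as member:
        print(type(member).__name__)  # ZipExtFile
        via_open = member.read()      # explicit .read() gives bytes

print(type(via_read).__name__)  # bytes
print(via_read == via_open)     # True
```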