How does one read a zipped csv file into a python polars DataFrame?

The only current solution is writing the entire thing into memory and then passing it into pl.read_csv.

Test
  • I think you could use a reader, i.e. something like this? ```with file.open(csv_path) as fpz: with gzip.open(fpz) as fp: df = pl.read_csv(fp)``` – jvz Mar 27 '22 at 08:23
  • Apologies, it's zipped, not gzipped – Test Mar 27 '22 at 17:14

1 Answer

Read a zipped CSV file into a Polars DataFrame without extracting the file

From the documentation:

Path to a file or a file-like object. By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO or BytesIO. If fsspec is installed, it will be used to open remote files.

So, to read "my_file.csv" that is inside a "something.zip":

/something.zip
    /my_file.csv

from zipfile import ZipFile
import polars as pl

zip_file = "something.zip"

# ZipFile.read returns the member's contents as bytes, which pl.read_csv accepts.
pl.read_csv(
    ZipFile(zip_file).read("my_file.csv")
)

Here, using .open instead of .read throws a FileNotFound error. It is still possible to use .open, though: we just need to call .read() on the object it returns, as follows:

pl.read_csv(
    ZipFile("something.zip").open("my_file.csv", mode='r').read()
)

The difference lies in what .read and .open return. .read returns the "file bytes for name", that is, the member's contents already read into memory. .open returns a "file-like object for 'name'", an instance of the class ZipExtFile. That object has a .read() method, but nothing has called it yet, so to get the bytes we have to call it ourselves, as I do above.
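To make the difference concrete, here is a minimal sketch that uses only the standard library. It builds a tiny archive in memory (the CSV contents are made up for illustration, and the member name simply mirrors the one used above) and compares what the two methods hand back.

```python
import io
from zipfile import ZipFile, ZipExtFile

# Build a small archive in memory so the sketch is self-contained.
# The CSV contents here are invented purely for illustration.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("my_file.csv", "a,b\n1,2\n")

with ZipFile(buffer) as zf:
    # .read returns the member's contents as bytes, already read.
    raw = zf.read("my_file.csv")

    # .open returns a file-like ZipExtFile; nothing is read
    # until .read() is called on it.
    with zf.open("my_file.csv", mode="r") as member:
        is_file_like = isinstance(member, ZipExtFile)
        raw_via_open = member.read()

# Both routes end up with the same bytes.
same_bytes = raw == raw_via_open
```

Either bytes object could then be passed to pl.read_csv, as in the snippets above. Using with blocks also closes the archive once the bytes have been read, which the one-liners leave to garbage collection.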