I have a zip file that contains a CSV compressed with xz. I want to unzip it into memory and read it with pandas' read_csv method. Pandas supports xz compression:

    data = pd.read_csv(filepath_or_buffer=file, index_col=0, compression='xz', engine='c')

I know how to open a zip file and list its contents:

    input_zip = ZipFile(zip_file)

    for file in input_zip.namelist():
        ...

But I do not know how to combine the two pieces of code.

Solution:

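    import io
    from zipfile import ZipFile
    import pandas as pd

    input_zip = ZipFile(zip_file)

    for filename in input_zip.namelist():
        # xz-compressed bytes of the member, kept in memory
        raw = input_zip.read(filename)
        data = pd.read_csv(io.BytesIO(raw), index_col=0, compression='xz', engine='c')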
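
To try the solution without a real archive, here is a minimal self-contained sketch. It builds a small zip in memory holding one xz-compressed CSV and reads it back the same way. The function name `read_xz_csvs_from_zip`, the member name `sample.csv.xz`, and the sample data are made up for illustration, and the sketch assumes every member of the zip is an xz-compressed CSV.

```python
import io
import lzma
import zipfile

import pandas as pd


def read_xz_csvs_from_zip(zip_source):
    """Return {member name: DataFrame}, assuming every member is an xz-compressed CSV."""
    frames = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            raw = archive.read(name)  # still xz-compressed, held in memory
            # pandas decompresses the buffer itself when compression="xz" is given
            frames[name] = pd.read_csv(io.BytesIO(raw), index_col=0, compression="xz")
    return frames


# Build a throwaway archive in memory (hypothetical data, for illustration only).
csv_text = "id,value\n1,10\n2,20\n"
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("sample.csv.xz", lzma.compress(csv_text.encode("utf-8")))
zip_buffer.seek(0)

frames = read_xz_csvs_from_zip(zip_buffer)
print(frames["sample.csv.xz"])
```

`zipfile.ZipFile` accepts either a path or a file-like object, so the same function should also work with the path to a zip file on disk. Nothing is written to disk in either case.
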
Paolo
  • `libarchive` supports reading from an archive without extracting on disk. You can use a `StringIO` object to pass it to pandas from there – Marat Dec 13 '21 at 18:31
  • What does the zipped file contain, i.e. what is the result of `input_zip.namelist()`? – Rodalm Dec 13 '21 at 18:32
  • the zipped file contains an archive, compressed with xz. `input_zip.namelist()` gives back the name of this file. – Paolo Dec 13 '21 at 18:53
  • @Marat could you please show an example? – Paolo Dec 13 '21 at 19:01
  • Sorry, I misunderstood the question and thought you wanted to extract the content of an `.xz` file into memory. I see you worked out a solution already – Marat Dec 13 '21 at 19:49
