
I have a zip archive, path_to_zip_file, on a read-only system. I need to open a CSV file, testfile.csv, that is stored inside the archive. Note that the archive contains many other files, but I only need this one CSV file. My goal is to load its contents into a pandas DataFrame, df.

My code is shown below. Is there a way to change it so that it runs on a read-only system? In other words, how can I read the file in memory without writing anything to disk?

import zipfile
import pandas as pd

path_to_zip_file = "data/test.zip"
directory_to_extract_to = "result"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

csv_file_name = "testfile.csv"
df = pd.read_csv("{}/{}".format(directory_to_extract_to,csv_file_name), index_col=False)
Fluxy
  • Did you have a look at [this post](https://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe)? I don't know whether it works when there are several CSV files in one zip file, though. – Ben.T Oct 15 '21 at 20:39
  • @Ben.T: I have a single CSV file in the ZIP archive, but there are also many non-CSV files in it. – Fluxy Oct 15 '21 at 20:46
  • @Ben.T: Thanks, let me try this solution. – Fluxy Oct 15 '21 at 20:46

2 Answers


Using ZipFile.open on the already opened archive, we can do just that:

import zipfile
import pandas as pd

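with zipfile.ZipFile("archive.zip") as archive:
    # Open the member as a file-like object; nothing is extracted to disk.
    # Pass the name of your CSV inside the archive, e.g. "testfile.csv".
    with archive.open("testing.txt") as csv_file:
        df = pd.read_csv(csv_file)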

print(df)
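
If the member name is not known in advance, the same idea can be combined with ZipFile.namelist to pick out the one CSV among the other files. The following is only a sketch: the helper name read_single_csv_from_zip is hypothetical, and the demo archive is built in memory with made-up contents so the example runs without touching disk.

```python
import io
import zipfile

import pandas as pd


def read_single_csv_from_zip(zip_source, **read_csv_kwargs):
    """Read the only .csv member of a zip archive into a DataFrame.

    zip_source may be a path or a binary file-like object.
    Hypothetical helper: nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Keep only members whose name ends in .csv (case-insensitive).
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected exactly one CSV member, found {csv_names}")
        # archive.open returns a file-like object that pandas can read directly.
        with archive.open(csv_names[0]) as member:
            return pd.read_csv(member, **read_csv_kwargs)


# Demo with an in-memory archive (assumed contents, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "not a CSV")
    archive.writestr("testfile.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

df = read_single_csv_from_zip(buffer)
print(df)
```

With a real archive, the call would be read_single_csv_from_zip("data/test.zip"); extra keyword arguments such as index_col=False are passed through to pd.read_csv.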
Hampus Larsson

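An easy way to do it is to extract the file to /tmp, which on many Linux systems is backed by RAM (tmpfs), although that is not guaranteed and /tmp must still be writable. You could also use Python's tempfile module to create a temporary directory and extract the file there (by default it will usually create the directory under /tmp).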
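
A minimal sketch of the tempfile approach, assuming the system's temporary directory is writable (which may not hold on a fully read-only system). The helper name read_csv_via_tempdir is hypothetical, and the demo archive is created only so the example runs on its own. Only the requested member is extracted, not the whole archive.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def read_csv_via_tempdir(zip_path, member_name):
    """Extract one member to a temporary directory and load it with pandas.

    Hypothetical helper: the temporary directory is deleted on exit,
    after the DataFrame has already been read into memory.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as archive:
            # extract() returns the path of the file it created.
            extracted_path = archive.extract(member_name, path=tmp_dir)
        return pd.read_csv(extracted_path)


# Demo: build a small archive in its own temporary directory (made-up data).
with tempfile.TemporaryDirectory() as demo_dir:
    zip_path = Path(demo_dir) / "test.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("notes.txt", "ignore me")
        archive.writestr("testfile.csv", "x,y\n1,2\n")
    df = read_csv_via_tempdir(zip_path, "testfile.csv")

print(df)
```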

Thomas Q