The following reproducible code raises `BadZipFile("File is not a zip file")`, and I don't know how to fix it.


import pandas as pd
from io import StringIO, BytesIO, TextIOWrapper
from zipfile import ZipFile
import urllib.request  # `import urllib` alone does not reliably expose urllib.request

url = 'https://drive.google.com/file/d/1gQGYF8TaznCfdevo7-MsHW5IcwftZWP_/view?usp=sharing'
zip_file_name = 'high_dimensional_datasets.zip'
resp = urllib.request.urlopen(url +  urllib.request.quote(zip_file_name))

zipfile = ZipFile(BytesIO(resp.read()))

data = TextIOWrapper(zipfile.open('high42.csv'), encoding='utf-8')

df = pd.read_csv(data)
Pranav Hosangadi
Georg_Z
  • Well, have you checked that `url + urllib.request.quote(zip_file_name)` gives you the url to a valid zip file? Write the output of `resp.read()` to a file. Can you open it using a zip program? If not, then the file is not a zip file, and we don't know how to fix it either. – Pranav Hosangadi Oct 26 '21 at 22:23
  • @PranavHosangadi Yes, the link leads to a zip file, which is downloaded when I click "download". It seems, however, that the URL is wrong. That is why I posted it, so you can try it. It is a valid zip file, because I have downloaded and extracted it manually. – Georg_Z Oct 27 '21 at 16:56
  • The link leads to a zip file _in your browser_. Did you inspect the output of `resp.read()`? It is most definitely **not** a zip file. – Pranav Hosangadi Oct 27 '21 at 17:00
  • This is the output: `resp.read()` gives `Out[19]: b''`. I don't know how to write it to a file, but I get the same output with a working example. – Georg_Z Oct 27 '21 at 17:06
  • You can only read the response _once_; a second `resp.read()` returns `b''`. The first time you read it, you get an HTML file with a whole bunch of JavaScript. I expect the JavaScript renders the page and redirects you to download the zip file, but the response itself is not a zip file, which is why you get the error. Have you tried using the API instead of accessing web pages? https://pypi.org/project/PyDrive/ – Pranav Hosangadi Oct 27 '21 at 17:13
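
The checks suggested in the comments can be sketched as below: read the response once, keep the bytes, and confirm they form a zip archive before passing them to `ZipFile`. This is only a sketch. `read_zip_member` and `drive_direct_url` are hypothetical helper names, and the `uc?export=download&id=...` URL form is an assumption about Google Drive, not something confirmed in this thread; Drive may still answer with an HTML page (for example, for large files), in which case the check raises instead of failing inside `ZipFile`.

```python
import re
from io import BytesIO
from zipfile import ZipFile, is_zipfile


def drive_direct_url(share_url):
    """Hypothetical helper: build a direct-download URL from a Drive share link.

    Assumes the 'uc?export=download&id=<file id>' form, which is not confirmed
    by the thread above.
    """
    match = re.search(r"/file/d/([^/?]+)", share_url)
    if match is None:
        raise ValueError("no file id found in " + share_url)
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)


def read_zip_member(payload, member):
    """Return the bytes of `member`, checking first that `payload` is a zip."""
    if not is_zipfile(BytesIO(payload)):
        # An HTML page usually starts with '<', so show the first bytes.
        raise ValueError("not a zip file; body starts with %r" % payload[:60])
    with ZipFile(BytesIO(payload)) as archive:
        return archive.read(member)


# Usage sketch (needs network access, so it is left commented out):
#
#   import urllib.request
#   from pathlib import Path
#   import pandas as pd
#
#   url = drive_direct_url(
#       "https://drive.google.com/file/d/1gQGYF8TaznCfdevo7-MsHW5IcwftZWP_/view?usp=sharing"
#   )
#   payload = urllib.request.urlopen(url).read()   # read once, keep the bytes
#   Path("response.bin").write_bytes(payload)      # save a copy to inspect by hand
#   df = pd.read_csv(BytesIO(read_zip_member(payload, "high42.csv")))
```

Keeping the bytes in `payload` avoids the empty `b''` from a second `resp.read()`, and the saved `response.bin` can be opened in a text editor or zip program to see what the server actually sent.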

0 Answers