I found this piece of code here, which allows me to download a single file from an online zip file. It works miraculously, but I don't understand how it works, especially how the class works here (I only have some basic knowledge of classes). I simplified the original code a bit to get the MWE below.
# Python 2 code (urllib2, print statement)
import zipfile
import urllib2

DEBUG = True


def HTTPGetFileSize(url):
    # Ask the server for the total size of the remote file.
    request = urllib2.Request(url)
    page = urllib2.urlopen(request)
    size = page.headers['content-length']
    page.close()
    return int(size)


def HTTPGetPartialData(url, f, t):
    # Download only bytes f..t (inclusive) using an HTTP Range header.
    request = urllib2.Request(url)
    request.headers['range'] = 'bytes=%u-%u' % (f, t)
    partial_page = urllib2.urlopen(request)
    partial_data = partial_page.read()
    partial_page.close()
    return partial_data


class MyFileWrapper:
    # File-like object: zipfile calls seek/tell/read on it as if it were a local file.
    def __init__(self, url):
        self.url = url
        self.position = 0
        self.total_size = HTTPGetFileSize(url)

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end
        if whence == 0:
            self.position = offset
        elif whence == 1:
            self.position += offset
        elif whence == 2:
            self.position = self.total_size + offset
        if DEBUG:
            print "seek: (%u) %u -> %u" % (whence, offset, self.position)

    def tell(self):
        if DEBUG:
            print "tell: -> %u" % self.position
        return self.position

    def read(self, amount=-1):
        # amount == -1 means "read everything up to the end of the file"
        if amount == -1:
            amount = self.total_size - self.position
        d = HTTPGetPartialData(self.url, self.position, self.position + amount - 1)
        self.position += len(d)
        if DEBUG:
            print "read: %u %u -> %u" % (self.position - len(d), amount, self.position)
        return d


url = 'http://the.url.that/contains/the/zipfiles.zip'
filename = 'the_name_of_the_file_I_need.csv'

f = MyFileWrapper(url)
print "class like object f is constructed"
z = zipfile.ZipFile(f)
print "f is read by zipfile and passed to z"
content = z.open(filename)
print "open filename, pass to content"
print content.read()
I have a lot of questions, but I am mainly confused by:
- How does my input filename ever get into all the functions?
- What is the flow/order of the functions in this piece of code? It seems that after running the tell function, the code goes back to the seek function again.
- How are offset and whence initialized and updated?
Any help is appreciated.
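A minimal sketch that may help trace these three questions locally, without the network. It assumes Python 3 (the code above is Python 2), and the names LoggingFile and data.csv are hypothetical. The wrapper subclasses io.BytesIO and records each call, which suggests that zipfile itself calls seek, tell and read and supplies offset and whence, while the file name only enters at z.open:

```python
import io
import zipfile

# Build a small zip archive in memory (a stand-in for the remote file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "a,b\n1,2\n")
raw = buffer.getvalue()


class LoggingFile(io.BytesIO):
    """Hypothetical file-like object that records the calls made on it."""

    def __init__(self, data):
        super().__init__(data)
        self.calls = []

    def seek(self, offset, whence=0):
        # offset and whence are chosen by the caller (here: zipfile).
        self.calls.append(("seek", offset, whence))
        return super().seek(offset, whence)

    def tell(self):
        self.calls.append(("tell",))
        return super().tell()

    def read(self, amount=-1):
        self.calls.append(("read", amount))
        return super().read(amount)


f = LoggingFile(raw)
z = zipfile.ZipFile(f)                # zipfile locates the directory via seek/tell/read
calls_after_init = len(f.calls)
content = z.open("data.csv").read()   # the file name is only used here

for call in f.calls:
    print(call)
```

The exact sequence of recorded calls can differ between Python versions, so it will not necessarily match the debug output of the Python 2 code.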
EDIT: I have included the debug version of the code; below is the output of a sample test:
class like object f is constructed
seek: (2) 0 -> 34632410
tell: -> 34632410
seek: (2) -22 -> 34632388
read: 34632388 22 -> 34632410
seek: (2) -42 -> 34632368
read: 34632368 20 -> 34632388
seek: (0) 34622294 -> 34622294
read: 34622294 10094 -> 34632388
f is read by zipfile and passed to z
seek: (0) 34621363 -> 34621363
read: 34621363 30 -> 34621393
read: 34621393 41 -> 34621434
open filename, pass to content
read: 34621434 860 -> 34622294
....content of the filename.....
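The trace above appears consistent with the ZIP file layout: the 22-byte read at the very end matches the size of the end-of-central-directory record, and 34622294 + 10094 + 22 = 34632410, the total size reported by the first seek. A minimal sketch, assuming Python 3 and a small in-memory archive (data.csv is a hypothetical name), that decodes that record and the 30-byte local file header with struct:

```python
import io
import struct
import zipfile

# Build a small zip archive in memory (a stand-in for the remote file).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "a,b\n1,2\n")
raw = buffer.getvalue()

# End-of-central-directory record: 22 bytes when the archive has no comment.
EOCD_FORMAT = "<4s4H2LH"
eocd_size = struct.calcsize(EOCD_FORMAT)
eocd = struct.unpack(EOCD_FORMAT, raw[-eocd_size:])
signature = eocd[0]       # b"PK\x05\x06"
entries_total = eocd[4]   # number of files in the archive
cd_size = eocd[5]         # size of the central directory in bytes
cd_offset = eocd[6]       # where the central directory starts

# Local file header of the first entry: 30 fixed bytes, then name and extra field.
LOCAL_FORMAT = "<4s2B4HL2L2H"
local_size = struct.calcsize(LOCAL_FORMAT)
local = struct.unpack(LOCAL_FORMAT, raw[:local_size])
name_len, extra_len = local[10], local[11]
name = raw[local_size:local_size + name_len].decode("ascii")

print("EOCD size:", eocd_size)
print("central directory: offset", cd_offset, "size", cd_size)
print("local header size:", local_size, "name:", name, "extra bytes:", extra_len)
```

Read this way, the 10094-byte read in the trace would be the central directory, and the 30-byte and 41-byte reads would be the fixed local header followed by the file name and extra field.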