Retrieving a file from a remote zip, HTTP, Python, sockets

Question

I want to retrieve a list of file names inside a zip file on a remote server, and a specific file (from this zip) without downloading the entire zip content, using the HTTP range method as described at http://www.codeproject.com/KB/cs/remotezip.aspx.

I know how to use the Range: ... header. For instance, I know how to download an image in parts and concatenate the parts to rebuild the image:

1. create socket
2. connect
3. send request: HEAD %s HTTP/1.1\r\nHOST: %s\r\n\r\n, receive the headers only and read Content-Length from them (I know it's not always the best option, but my server supports it) in order to know the size of the image we need to download
4. send GET with Range: ..., and receive response
5. repeat step 4 until you download the whole image

However, I do not know how I can accomplish the task I described (the one with zips). I saw this topic: "Is there a library for retrieving a file from a remote zip?", but people there use libraries. That is not my goal. I would like to know how raw HTTP operates on zips.

Steffen Ullrich · Answer 1 · 2017-04-15T05:21:53.800

There is nothing special about ZIP in HTTP. A ZIP file is just data, like anything else. HTTP Range with ZIP is the same as HTTP Range with images, i.e. you just retrieve a byte range from some remote data.
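To make the "just a byte range" point concrete, here is a minimal sketch of a single range request over a raw socket. The function names are made up for illustration, and it assumes plain HTTP on port 80 and a server that answers with 206 Partial Content and does not use chunked transfer encoding.

```python
import socket

def build_range_request(host, path, start, end):
    """Build a raw HTTP/1.1 GET request for bytes start..end (inclusive)."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Range: bytes={start}-{end}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def split_response(raw):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body

def fetch_range(host, path, start, end, port=80):
    """Fetch one byte range; the payload may be ZIP, image or anything else."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_range_request(host, path, start, end))
        chunks = []
        while True:  # "Connection: close" lets us simply read until EOF
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    status, _headers, body = split_response(b"".join(chunks))
    if status != 206:
        raise RuntimeError(f"server did not honour Range (status {status})")
    return body
```

Nothing in this code knows or cares that the remote resource is a ZIP archive.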

Extracting only a single file from a remote ZIP without downloading the whole file can be done the same way as with a local file or with a ZIP archive spread over multiple media: first you have to get the central directory from the end of the file, and then you can get the bytes for the specific file, because the necessary offsets are defined in the central directory. Once you have the compressed file you can decompress it. There is nothing specific to HTTP here except the way you get a range of bytes, but you've already realized that you can do this with the HTTP Range header.
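As a rough illustration of the central-directory step, the sketch below parses the End Of Central Directory record and the central directory entries from bytes you have already fetched. The function names and the returned dictionary layout are illustrative choices; the sketch assumes a single-disk archive without ZIP64 extensions and without an archive comment that happens to contain the EOCD signature.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # End Of Central Directory record
CDH_SIG = b"PK\x01\x02"   # central directory file header

def parse_eocd(tail):
    """Find the EOCD in the last bytes of the archive.

    `tail` should cover at least the last 22 bytes, or up to 65557 bytes
    if the archive may carry a comment. Returns (entry_count, cd_size, cd_offset).
    """
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < 22:
        raise ValueError("End Of Central Directory record not found")
    fields = struct.unpack_from("<4s4H2LH", tail, pos)
    total, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return total, cd_size, cd_offset

def parse_central_directory(cd, count):
    """Map each file name to its method, sizes and local header offset."""
    entries = {}
    pos = 0
    for _ in range(count):
        f = struct.unpack_from("<4s6H3L5H2L", cd, pos)  # 46 fixed bytes
        if f[0] != CDH_SIG:
            raise ValueError("bad central directory signature")
        flags, method = f[3], f[4]
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        raw_name = cd[pos + 46:pos + 46 + name_len]
        # Flag bit 11 marks UTF-8 names; otherwise the format uses CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries[name] = {
            "method": method,            # 0 = stored, 8 = deflate
            "compressed_size": f[8],
            "uncompressed_size": f[9],
            "offset": f[16],             # offset of the local file header
        }
        pos += 46 + name_len + extra_len + comment_len
    return entries
```

Over HTTP this takes two range requests: one for the tail of the archive, and one for `cd_offset` to `cd_offset + cd_size - 1`. The keys of the returned dictionary are the file list.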

Thus just follow the implementation in the article you've linked, and replace every part where you don't want to use a library with your own re-implementation, i.e. re-implement the zlib/deflate decompression if needed and re-implement HTTP if needed. Using HTTP Range to get parts of the ZIP file is no different from using HTTP Range to get part of an image, and you already know how to do the latter.
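The last step, turning a central directory entry into file contents, could be sketched as follows. `read_range(start, end)` is a hypothetical callable returning the bytes from `start` to `end` inclusive (for example one HTTP Range request); `offset`, `compressed_size` and `method` are the values read from the central directory. Python's `zlib` module stands in for the deflate decoder here, so a hand-written RFC 1951 implementation would take its place if no library is wanted. Encrypted entries and methods other than stored/deflate are not handled.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"  # local file header signature

def extract_entry(read_range, offset, compressed_size, method):
    """Fetch and decompress one archive member using only byte-range reads."""
    # The local file header has 30 fixed bytes, followed by the file name
    # and an extra field whose lengths can differ from the central directory.
    local = read_range(offset, offset + 29)
    if len(local) < 30 or local[:4] != LOCAL_SIG:
        raise ValueError(f"no local file header at offset {offset}")
    name_len, extra_len = struct.unpack_from("<2H", local, 26)
    data_start = offset + 30 + name_len + extra_len

    if compressed_size == 0:
        return b""
    raw = read_range(data_start, data_start + compressed_size - 1)

    if method == 0:   # stored: the bytes are the file
        return raw
    if method == 8:   # deflate: raw stream, hence the negative window size
        return zlib.decompress(raw, -15)
    raise NotImplementedError(f"compression method {method}")
```

For a typical small file this costs two more range requests: one for the 30-byte local header and one for the compressed data.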

edited Apr 15 '17 at 05:21

answered Apr 15 '17 at 05:16

Ok, it seems easy. Do you perhaps know how I can extract and parse a zip file without any library? – Brian Brown Apr 15 '17 at 10:33
@BrianBrown: I'm not sure I understand you. Do you have problems finding the ZIP format description (see the article you've linked to) or the description of deflate compression (see [RFC 1951](https://www.ietf.org/rfc/rfc1951.txt)), or do you expect me to write the code? The latter would not only be off-topic but also way too long. It's definitely not a simple task to do all of this without libraries. – Steffen Ullrich Apr 15 '17 at 11:00
I found the zip format on the wiki page. No, I am not asking you to write the code; I want to write it myself. But I have trouble understanding what I need to know and how to start. At least some example pseudocode would be very helpful. Thank you. – Brian Brown Apr 15 '17 at 11:07
@BrianBrown: Even the pseudocode could easily span hundreds of lines (i.e. too broad) unless you can narrow down exactly which part you have trouble understanding. There are basically three major parts: determining the byte range needed (see ZIP format), getting the byte range (HTTP range request) and decompressing (deflate/zlib algorithm). – Steffen Ullrich Apr 15 '17 at 11:29
