Create Zipfile in Python in memory on Web Server

Question

I'm working on a HTML WYSIWYG editor, and I am currently working on a 'Download' feature where the user can press a Download button to download a zip file of their theme. I am using a Python CGI script to achieve this feature. Currently, my script makes a zip file and prompts the user to Download it, but when I try to decompress the zip file, it only creates another zip file with the extension '.cpgz'. I believe that my script did not create the zip properly.

I am using the 'zipfile' module to create a zipfile object in memory instead of on Disk, the 'StringIO' module to create a filelike object in memory, and the 'cgi' module to accept POST data from an Ajax request.

My problem is in my for-loop. The 'zf' zip file isn't adding the files and sub-directories from the 'layoutDir' parameter I passed into os.walk(). The script will prompt the browser to download the zip file, but I am unable to decompress it.

#!/usr/bin/python

import sys
import os
import zipfile
import StringIO
import cgitb

cgitb.enable()

layoutDir = 'http://localhost:8888/funWYSIWYG/public/views/layouts/Marketing'

tmpZip = StringIO.StringIO() 
zf = zipfile.ZipFile(tmpZip, 'w', zipfile.ZIP_DEFLATED)

for root, dirs, files in os.walk(layoutDir):
    for name in files:
        absfn = os.path.join(root, name)
        relfn = absfn[len(layoutDir) + len(os.sep):]
        zf.write(absfn, relfn)

zf.close()

sys.stdout.write("Content-Type: application/octet-stream\r\n")
sys.stdout.write("Content-Disposition: attachment; filename=\"funWYSIWYG-Marketing.zip\"\r\n\r\n")

sys.stdout.write(tmpZip.getvalue())

# Close opened file
tmpZip.close()

UPDATE 1: I got rid of some of the irrelevant stuff that was in my code. I also correct the typo with 'absfn' and 'adsfn'. The code above now represents exactly what I have in my local code editor. I am still having the same problem with being unable to decompress the zip file that is made.

UPDATE 2: Here is what the 'Marketing' directory looks like on my computer.

|---- Marketing
      |---- css
      |    |---- default.css
      | 
      |---- img
      |
      |---- index.html

As a side note, [`os.path.relpath`](http://docs.python.org/2/library/os.path.html#os.path.relpath) is a much better way to get a relative path than trying to work out the length of the prefix and slicing it off manually. — abarnert, Jan 06 '14 at 05:42
Also, `\r\n\n` is almost never what you want in any context. If you need to send CRLF, you need `\r\n\r\n`. If you don't, you just want `\n\n`. I can't think of any case where you want some CRLFs and some plain LFs. — abarnert, Jan 06 '14 at 05:44
Finally, is this your actual code? If so, your problem is a simple typo: `adsfn` will raise a `NameError` because there is no such variable. If not, then please give us an [SSCCE](http://sscce.org) that actually runs and demonstrates the problem. (Another part of it being an SSCCE is stripping out the irrelevant stuff. Unless you think the CGI stuff is related to the problem, give us a simpler program that just creates the zipfile and, say, saves it to a local file.) — abarnert, Jan 06 '14 at 05:48

abarnert · Answer 1 · 2014-01-06T20:25:57.733

1

If this is your actual code, your problem is a simple typo:

zf.write(adsfn, relfn)

You don't have a variable named adsfn, you have one named absfn. So this will raise a NameError and fail to return anything.

If I fix that, then run this code with layoutDir set to a reasonable relative path with some kind of hierarchy in it, and store the resulting in-memory zipfile to disk like this:

with open('foo.zip', 'wb') as f:
    f.write(tmpZip.getvalue())

… then I end up with a zipfile with all of the files stored in it properly, meaning there is no problem at all.

So, if that typo isn't your problem, then whatever is your problem appears to be some other difference between the code you posted and the actual code we can't see.

It seems like your actual problem is that in your real code, you're trying to use an absolute or relative URL, like http://localhost:8888/path-to/Marketing, as the layoutDir. URLs and paths aren't the same thing. If you try to use that as a path one, it will be effectively the same as ./http:/localhost:8888/path-to-Marketing. You almost certainly don't have a directory named http: in the current working directory, so os.walk will just yield nothing, meaning you'll end up creating an empty zip file.

If the files you're trying to add are actually available at some (relative or absolute) path, use that path in place of the URL, and your problem will go away.

If they're only available via HTTP, then what you're trying to do is impossible; there is no way to walk the "subdirectories" of an HTTP URL; the concept isn't even meaningful. In many cases, of course, web servers map parts of the URL path to some filesystem path and provide some way to navigate that filesystem indirectly (e.g., by having an option to auto-generate an index.html page full of links), but to take advantage of that you need to know exactly how the server in question is exposing that information, then writing scraping code to take advantage of it. And then, even once you have all the links, you can't pass a URL to zipfile.write, only a file. Which means you have to download each URL (e.g., read it into memory and then writestr the result).

edited Jan 06 '14 at 20:25

answered Jan 06 '14 at 05:51

abarnert

354,177
51
601
671

Could you show me what relative path you used? I appended the code that you just wrote to the bottom my script, but it created a zip file on server that I am still unable to decompress. – Rashad Jan 06 '14 at 06:46
The relative path I used was `foo`. I created a directory named `foo` alongside the script, added a directory named `bar` inside it, and added 1-line text files named `spam` and `eggs` to both `foo` and `foo/bar`. The resulting `foo.zip` is a zip file that reproduces that structure (when expanded with either Python, or standard InfoZip `unzip` 5.52). – abarnert Jan 06 '14 at 07:12
As I was experimenting with your method, and I think I discovered what the actual problem is. I changed layoutDir to a relative path reflecting what's on my local disk, and the script works. It makes a zipfile of the content in the Marketing folder, and I am able to decompress. However, I am trying to mimic this same action on a web server. When I change layoutDir to a relative path on my webserver or to an absolute path at `http://localhost:8888/path-to/Marketing` it doesn't work. – Rashad Jan 06 '14 at 20:12
@Rashad: `http://localhost:8888/path-to/Marketing` isn't a path, it's a URL. So, if that's what you're doing, that's your problem. Let me edit the answer. – abarnert Jan 06 '14 at 20:16
I don't think the zipfile.write(absfn, relfn) works if absfn is an HTTP absolute path or a relative path on the web server to the Marketing directory. – Rashad Jan 06 '14 at 20:18
The relative path from my script to the Marketing directory on my web server is: `../funWYSIWYG/public/views/layouts/Marketing/`. On my local disk it is: `../htdocs/funWYSIWYG/public/views/layouts/Marketing/`. – Rashad Jan 06 '14 at 20:21
@Rashad: That's exactly what I just got done explaining. `zipfile.write` will not work with a URL, and neither will `os.walk` in the first place. You need filesystem paths, not HTTP URLs. – abarnert Jan 06 '14 at 20:28
You are right. I started experimenting with urllib, and I am now about to grab the files from my web server. I am using zipfile.writestr() instead of zipfile.write, to write the contents of the response. That fixed the problem. My WYSIWYIG is now able to edit a HTML theme, and prompt the browser to download it into a zip using HTTP Headers. Although I am not finished yet, I still need to figure out how to traverse though the directories on my web server just like I did on my filesystem os.walk(). I may need to used something like wget or BeautifulSoup. Thank you for your help. – Rashad Jan 06 '14 at 22:33
@Rashad: As I explained in my answer, in general there is no way to "traverse the directories on a web server". Some web sites may have a way to do it, but it will be specific to that site, and you will have to figure out how to scrape it. And if it's your web site that you control, this is silly—just expose the information directly so you don't have to scrape it. – abarnert Jan 06 '14 at 23:17

Create Zipfile in Python in memory on Web Server

1 Answers1