
I have a zip archive, archive.zip, containing a __main__.py file.

I can execute it with

python archive.zip
=> OK !

but not with

cat archive.zip | python
=> File "<stdin>", line 1
SyntaxError: Non-ASCII character '\x9e' in file <stdin> on line 2,
but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Why is there a difference between the two modes, and is there a way to make the pipe work without unzipping outside of Python?

I receive this archive over the network and want to execute it as soon as I receive it, and as fast as possible, so I thought that piping the zip into Python would work!

Multimedia Mike
Jerome WAGNER
  • I guess that Python determines how to proceed based on the filename but when you send it via stdin, there is no filename. (I didn't see the interpreter code.) – erny Nov 29 '13 at 00:08
  • No, removing the .zip extension gives the same result. – Jerome WAGNER Nov 29 '13 at 00:14

5 Answers


The reason you can run 'python file.zip' but not 'cat file.zip | python' is that Python has 'zipimport' built in. When you give the interpreter a file path, it checks whether that path is something an importer can handle, such as a zip archive; if so, it puts the archive on sys.path and runs the archive's __main__ module through zipimport, the same machinery used when you import from a zip file. (See the zipimport module documentation for details.)

With stdin, however, Python makes no attempt to inspect the streamed data, because the stream could be anything: user input handled by code, or code itself. There is no way to know, so Python doesn't try; it simply reads stdin as Python source text.
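The difference can be reproduced from inside Python. This is a minimal Python 3 sketch built around a throwaway archive created on the fly. runpy.run_path() takes a path-based route that mirrors 'python archive.zip', while calling compile() on the raw bytes mimics what happens when stdin is read as source.

```python
import os
import runpy
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "archive.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("__main__.py", "result = 'ran from zip'\n")

    # Path route (like `python archive.zip`): runpy sees a zip on disk, puts it
    # on sys.path and executes its __main__ module through zipimport.
    namespace = runpy.run_path(archive)
    print(namespace["result"])  # prints: ran from zip

    # Stream route (like `cat archive.zip | python`): the same bytes are handed
    # to the compiler as if they were Python source text.
    with open(archive, "rb") as f:
        data = f.read()
    try:
        compile(data, "<stdin>", "exec")
        stream_ok = True
    except (SyntaxError, ValueError):  # older Pythons raise ValueError for null bytes
        stream_ok = False
    print(stream_ok)  # prints: False
```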

Edit:

Occasionally, when you're answering questions, you think 'I really shouldn't tell someone the answer', not because you wish to be secretive or hold some power over them, but simply because the path they're going down isn't the right one and you want to help them out of the hole they're digging. This is one of those situations. However, against my better judgement, here's an extremely hacky way of accomplishing something similar to what you want. It's not the best way; in fact, it's probably the worst way to do it.

I just played around with zipimporter for a while and tried all the tricks I could think of. I looked at 'imp' and 'compile' as well. As far as I can see, nothing can import a zipped module (or egg) from memory, so an interim step is needed.

I'll say this up front: I'm embarrassed to even be posting this. Don't show this to people you work with or people you respect, because they will laugh at this terrible solution.

Here's what I did:

mkdir foo
echo "print 'this is foo!'" >>foo/__init__.py
zip foo.zip -r foo
rm -rf foo                   # to ensure it doesn't get loaded from the filesystem
mv foo.zip somethingelse.zip # To ensure it doesn't get zipimported from the filesystem

And then, I ran this program using

cat somethingelse.zip | python script.py

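#!/usr/bin/python
# Python 2. zipimporter only accepts a filesystem path, so the archive read
# from stdin is spilled to a temporary file before importing from it.

import os
import sys
import tempfile
import zipimport

class SinEater(object):
    def __init__(self):
        fd, tmp = tempfile.mkstemp(suffix='.zip')
        try:
            with os.fdopen(fd, 'wb') as f:   # binary mode: zip data is not text
                f.write(sys.stdin.read())    # read the whole stream (no 64kb cap)
            z = zipimport.zipimporter(tmp)
            z.load_module('foo')             # runs foo/__init__.py from the archive
        finally:
            os.remove(tmp)                   # don't leave the spilled archive behind

if __name__ == '__main__':
    print 'herp derp'
    s = SinEater()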

Produces:

herp derp
this is foo!

A solution about a million times better than this would be a filesystem notification (inotify, kevent, whatever Windows uses) that watches a directory for new zip files. When a new zip file is dropped into that directory, you could automatically zipimport it. But I cannot stress enough that even that solution is terrible. I don't know much about Ansible (anything, really), but I cannot imagine any engineer thinking that it would be a good way to handle code updates or remote control.
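The directory-watching idea can be sketched with plain polling standing in for inotify/kevent. That substitution is a deliberate simplification, not what the answer describes. This is a Python 3, standard-library-only sketch: poll_once and watch are hypothetical names, and it assumes each archive is completely written before it appears in the directory (for example, written elsewhere and then renamed into place).

```python
import importlib
import os
import sys
import time


def poll_once(directory, seen, module_name):
    """Import `module_name` afresh from every new .zip file in `directory`."""
    loaded = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not entry.endswith(".zip") or path in seen:
            continue
        seen.add(path)
        sys.path.insert(0, path)                # a .zip on sys.path is handled by zipimport
        try:
            importlib.invalidate_caches()
            sys.modules.pop(module_name, None)  # don't reuse a copy from an earlier archive
            loaded.append(importlib.import_module(module_name))
        finally:
            sys.path.remove(path)
    return loaded


def watch(directory, module_name, interval=1.0):
    """Poll `directory` forever, importing code from each newly dropped zip."""
    seen = set()
    while True:
        poll_once(directory, seen, module_name)
        time.sleep(interval)
```

Polling trades latency for simplicity. A real notification API would avoid rescanning the directory on every tick.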

synthesizerpatel
  • I changed the accepted answer because, from your answer, I do not understand why "cat myfile.py | python" works. It could also be any kind of data, but Python recognizes it and executes it. Even adding a Python shebang at the start of the zip file does not do the trick. – Jerome WAGNER Nov 30 '13 at 09:58
  • Because "cat myfile.py | python" is exactly the same as running the interactive Python CLI and typing the contents of myfile.py in line by line. The Python interpreter doesn't understand what a '#!' is (beyond thinking it's a comment); your shell is what interprets the #! as a hint to run a program. – synthesizerpatel Nov 30 '13 at 10:03
  • But how does Python decide that each line should be interpreted as Python in the first place? Reading your comment, I understand there could be some lines that I can prepend to the zip file in order to make it work, like "cat prepend.py zipfile.zip | python"? – Jerome WAGNER Nov 30 '13 at 10:08
  • or maybe a "python -c" trick could help ? would appreciate if you have an insight on this – Jerome WAGNER Nov 30 '13 at 10:18
  • I'm sure it's **possible**, but not without writing some code. It wouldn't be difficult code. You could just do something that checks stdin for the bytes in a ZIP file, read it, then import it. I really tried to imagine a situation where you would need or want to do this and just couldn't come up with one. It's highly likely what you're wanting to do is a bad idea, and I don't mean that to offend you. I'm just saying. – synthesizerpatel Nov 30 '13 at 10:35
  • I'm working on an Ansible project that needs to send snippets of Python code over the network and get them executed there. Right now, one file is scp'ed and then executed. I have a working patch that skips the scp by doing "cat module.py | ssh .. python", but I would like to find a way to send a module archive.zip the same way. This is only for optimization purposes and to avoid accessing the filesystem. Do you have any pointers to help me write this "mixed" stdin thing? – Jerome WAGNER Nov 30 '13 at 10:49
  • Ok, you're definitely trying to do things the wrong way. Please clarify what your goal is - are you trying to (1) run a script that uses Ansible's Python API? (2) Add a plugin into Ansible's runtime instance? (3) other? – synthesizerpatel Nov 30 '13 at 11:22
  • (3) other: I am currently trying to modify the way Ansible sends modules to remote hosts to minimize SSH round trips (cf https://groups.google.com/forum/#!topic/ansible-devel/67ltJ2OGlLc). There is no need for zip files at this stage, but I am looking into how, in the future, modules could be a package, a bit like egg files, rather than a monolithic Python file. Currently, Ansible modules need to be "templated" with data before being sent because only one .py file is sent. Anyway, this "zip" thing is not as straightforward as I thought it would be. – Jerome WAGNER Nov 30 '13 at 13:00
  • Check my original answer - I posted a terrible hack that kind of does what you want. Last idea - you could compile a different version of zipimport.c (Python-X.X/Modules/zipimport.c) which allows you to pass in a file handle rather than a file name. In that case you could read data into a StringIO object and then zipimport directly from that. – synthesizerpatel Nov 30 '13 at 14:45
  • +1 for the time you spent on this and believing that I am looking for an overall elegant solution to this. I agree we're not quite there yet mainly to me because of the interim tmp file needed for zipimporter to work. thanks for pushing the limits. – Jerome WAGNER Nov 30 '13 at 19:25
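The comments suggest checking stdin for zip bytes, reading them, and then importing from them. Below is a minimal Python 3 sketch of that approach which avoids the temporary file. It swaps zipimport (which wants a file name) for the zipfile module reading from an in-memory io.BytesIO buffer, plus a small import hook. The names InMemoryZipFinder, run_zip_bytes and main are hypothetical, and the sketch assumes the archive holds only UTF-8 .py sources with a top-level __main__.py.

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical import hook serving plain .py modules from zip bytes in memory."""

    def __init__(self, data):
        self._zip = zipfile.ZipFile(io.BytesIO(data))  # BytesIO is seekable
        self._names = set(self._zip.namelist())

    def read_source(self, member):
        return self._zip.read(member).decode("utf-8")

    def _member_for(self, fullname):
        base = fullname.replace(".", "/")
        if base + "/__init__.py" in self._names:
            return base + "/__init__.py", True   # package
        if base + ".py" in self._names:
            return base + ".py", False           # plain module
        return None, False

    def find_spec(self, fullname, path=None, target=None):
        member, is_pkg = self._member_for(fullname)
        if member is None:
            return None  # let the normal import machinery handle it
        return importlib.util.spec_from_loader(fullname, self, is_package=is_pkg)

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        member, _ = self._member_for(module.__name__)
        exec(compile(self.read_source(member), member, "exec"), module.__dict__)


def run_zip_bytes(data):
    """Install the hook, then run the archive's __main__.py; returns its globals."""
    finder = InMemoryZipFinder(data)
    sys.meta_path.insert(0, finder)
    namespace = {"__name__": "__main__"}
    exec(compile(finder.read_source("__main__.py"), "__main__.py", "exec"), namespace)
    return namespace


def main():
    # Nothing is written to disk: the whole archive comes from the pipe.
    run_zip_bytes(sys.stdin.buffer.read())
```

Calling main() consumes the entire stream before anything runs. Because zipfile needs random access, the archive still has to be fully received first; only the temporary file is avoided.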

A .zip file consists of a series of files where each is a local header and the compressed data, followed by a central directory which has the local header information repeated, offsets to the local headers, and some other data to allow random access to the files.

The usual way to access a .zip file is to find the central directory at the end of the file and read that in, and then use that information to access the local entries. That requires seeking.
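To make the seeking concrete, here is a minimal Python 3 sketch of that first step: jump to the end, find the End of Central Directory record, and read where the central directory starts. read_central_directory_info is a hypothetical name. The sketch assumes a non-ZIP64 archive and trusts the last End of Central Directory signature it finds.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_LEN = 22  # fixed part of the End of Central Directory record


def read_central_directory_info(f):
    """Return (entry_count, cd_size, cd_offset) from a seekable zip file object."""
    f.seek(0, 2)                                  # jump to the end: needs seek()
    file_size = f.tell()
    tail_len = min(file_size, EOCD_LEN + 0xFFFF)  # record + longest possible comment
    f.seek(file_size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = struct.unpack("<4sHHHHIIH", tail[pos:pos + EOCD_LEN])
    return entries_total, cd_size, cd_offset
```

Only after this can a reader seek back to cd_offset, walk the central directory, and seek again to each local header. A pipe allows none of that.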

It is possible to write an unzip that reads a zip file from a pipe. (In fact I did that once.) However that is not the kind of code that Python is using to read zip files.
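A forward-only reader of this kind can be sketched in Python 3 by walking the local headers in order and never seeking. iter_local_entries is a hypothetical helper, not Python's own zip code. It assumes stored or deflated entries, no ZIP64, and sizes recorded in the local header. Archives written to a non-seekable output usually set general-purpose flag bit 3 and put the sizes in a data descriptor after the data; this sketch refuses those.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"


def iter_local_entries(stream):
    """Yield (name, data) for each entry, reading the stream strictly forward."""
    while True:
        if stream.read(4) != LOCAL_SIG:
            return  # reached the central directory (PK\x01\x02) or end of data
        (_version, flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = struct.unpack("<HHHHHIIIHH", stream.read(26))
        if flags & 0x08:
            raise ValueError("entries with a data descriptor are not supported")
        name = stream.read(name_len).decode("utf-8")
        stream.read(extra_len)                    # skip the extra field
        payload = stream.read(comp_size)
        if method == 0:                           # stored
            data = payload
        elif method == 8:                         # deflated: raw stream, so wbits=-15
            data = zlib.decompress(payload, -15)
        else:
            raise ValueError("unsupported compression method %d" % method)
        yield name, data
```

Passed sys.stdin.buffer, it would walk an archive piped in with cat, and the __main__.py entry could then be compiled and run.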

Mark Adler

Interesting. I had no idea this was possible. But I'll take your word for it.

If I were to guess why it doesn't work when streaming in from the STDIN, I would say it's because processing a ZIP archive often requires backwards seeking. A ZIP archive consists of a bunch of compressed files concatenated together (with enough header data to decompress independently), and then an index at the end. In my experience, decompressors tend to seek straight to the end to grab the index and then seek earlier in the file to fetch and decompress payload data (even though it's possible to iterate through the compressed files individually).

Since in this case the data comes from STDIN, the decompressor can't seek backwards. The same applies to a naïve network stream.
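The seeking requirement is easy to observe with Python's own zipfile module. In this minimal Python 3 sketch, ForwardOnlyStream is a hypothetical wrapper that behaves like a pipe: bytes can be read forward but never sought. The same archive bytes open fine from a seekable io.BytesIO and fail from the wrapper.

```python
import io
import zipfile


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical pipe stand-in: readable, but seekable() is False."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)
    # seek() is inherited from io.RawIOBase and raises io.UnsupportedOperation.


def list_members(stream):
    try:
        with zipfile.ZipFile(stream) as zf:
            return zf.namelist()
    except (zipfile.BadZipFile, OSError) as exc:
        return "cannot open: %s" % type(exc).__name__


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "print('hello')\n")
data = buf.getvalue()

print(list_members(io.BytesIO(data)))         # prints: ['__main__.py']
print(list_members(ForwardOnlyStream(data)))  # reports that it cannot open the archive
```

Buffering the whole stream into io.BytesIO first restores seekability, at the cost of waiting for the last byte.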

Multimedia Mike
    FWIW, 'egg' files are zips; that's why zipimport is built into Python by default (see Python-x.x/Modules/Setup in the source if you're curious). – synthesizerpatel Nov 29 '13 at 02:20

It is possible, but it requires some coding. The main idea is to put the archive into a spooled temporary file and redirect it into the stdin of a child Python process.

run_zipped_project.py

#!/usr/bin/env python
# encoding: utf-8
# Python 2
import os
import subprocess
from tempfile import SpooledTemporaryFile

if __name__ == '__main__':
    filename = "test.zip"  # your zipped project
    size = os.path.getsize(filename)

    # The archive is loaded from disk here, but the bytes could come from any source.
    with open(filename, "rb") as test:
        code = test.read()

    print "loaded {file} with size {size}".format(file=filename, size=size)

    # max_size above the payload size keeps the data in memory while writing.
    f = SpooledTemporaryFile(max_size=size + 1)
    f.write(code)
    f.seek(0)

    # Popen needs a real file descriptor: f.fileno() rolls the spooled data over
    # to an on-disk temporary file, which the child then sees as a seekable stdin.
    process = subprocess.Popen(["python2", "loader.py"],
                               stdin=f,
                               stdout=subprocess.PIPE)
    print process.communicate()[0]
    f.close()
    print "closed"

loader.py

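#!/usr/bin/env python
# encoding: utf-8
from zipimport import zipimporter

if __name__ == '__main__':
    # zipimporter needs a seekable file. This works because stdin here is the
    # temporary file passed in by run_zipped_project.py, not a pipe.
    importer = zipimporter('/dev/stdin')
    importer.load_module('__main__')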

If you don't need the cat command, you can do something like:

unzip -p archive.zip | python3

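Basically, you decompress to stdout (the -p option) before sending the data to Python. Note that -p writes every member of the archive to stdout, so this only behaves as expected when the code is a single file; you can name it explicitly, e.g. unzip -p archive.zip __main__.py | python3. This still unzips outside of Python, and unzip needs archive.zip as a file rather than a stream.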

rkachach