
I want to parse log files from Rackspace using the official Python SDK. Previously I saved the file to disk and then read it from there with gzip.open.

Now I'm on Heroku and can't (and don't want to) save the file to disk; I'd like to do the unzipping in memory instead.

However, I can't manage to download the object as a string or a pseudo file object that I can work with.

Does anyone have an idea?

logString = ''
buffer = logfile.stream()

while True:
    try:
        logString += buffer.next()
    except StopIteration:
        break

# logString is always empty here

# I'd like to have something that lets me do this:
for line in zlib.decompress(logString).splitlines():
    pass  # handle each line of the log here

Update

I've noticed that the string is not always empty. This code runs inside a loop, and only the first occurrence is empty. On the following occurrences I do get data (which looks gzipped), but I get this zlib error:

zlib.error: Error -3 while decompressing data: incorrect header check
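
A note on this error: zlib.decompress expects a zlib header by default, while a gzipped file starts with a gzip header, which is a common cause of "incorrect header check". Passing 16 + zlib.MAX_WBITS as the wbits argument tells zlib to expect gzip framing. Below is a minimal sketch in Python 3, assuming the payload really is gzip data; the sample bytes in `raw` are made up and stand in for the downloaded object.

```python
import gzip
import zlib

# Hypothetical sample standing in for the downloaded (gzipped) log object.
raw = gzip.compress(b"GET /a 200\nGET /b 404\n")

try:
    # Default wbits expects a zlib header, so gzip data is rejected.
    zlib.decompress(raw)
    default_worked = True
except zlib.error:
    default_worked = False

# 16 + MAX_WBITS makes zlib accept the gzip header and trailer.
text = zlib.decompress(raw, 16 + zlib.MAX_WBITS).decode("utf-8")
lines = text.splitlines()
```

Iterating over `lines` then gives one log line per step, whereas iterating over the decompressed string directly would give single characters.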

Update II

As suggested, I implemented cStringIO, with the same result:

buffer = logfile.stream()
output = cStringIO.StringIO()

while True:
    try:
        output.write(buffer.next())
    except StopIteration:
        break

print(output.getvalue())

Update III

This works now:

output = cStringIO.StringIO()

# The for loop stops on its own once the stream is exhausted,
# so no StopIteration handling is needed.
for buffer in logfile.stream():
    output.write(buffer)

At least nothing crashes here, but I don't seem to get any actual lines:

for line in gzip.GzipFile(fileobj=output).readlines():
    pass  # this is never reached

How should I proceed from here? Is there an easy way to view the incoming data as a normal string, so I can tell whether I'm on the right track?
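
One likely reason the loop body is never reached: after the chunks are written, the buffer's position sits at the end of the data, so GzipFile has nothing left to read until the buffer is rewound with seek(0). The sketch below shows this in Python 3 with io.BytesIO (the counterpart of cStringIO for bytes). The function name `read_gzip_lines` and the `chunks` argument are illustrative; `chunks` stands in for whatever logfile.stream() yields. It also checks the first two bytes against the gzip magic number, which is a quick way to tell whether the incoming data is what you expect.

```python
import gzip
import io


def read_gzip_lines(chunks):
    """Collect byte chunks in memory and return the decompressed lines."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)

    # Sanity check: every gzip stream starts with the bytes 1f 8b.
    if buffer.getvalue()[:2] != b"\x1f\x8b":
        raise ValueError("data does not start with the gzip magic number")

    # Writing left the position at the end; rewind before reading.
    buffer.seek(0)
    with gzip.GzipFile(fileobj=buffer) as gz:
        return [line.decode("utf-8").rstrip("\n") for line in gz]
```

For example, `read_gzip_lines(logfile.stream())` would be the call in the setup above, assuming the stream yields byte strings.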

shredding
  • `logfile.save()` does work, but `logfile.stream().next()` calls return empty strings? – Martijn Pieters Nov 30 '12 at 12:51
  • To turn a python string into a file-like in-memory object, use the `StringIO` module (or its optimized C companion `cStringIO`). – Martijn Pieters Nov 30 '12 at 12:52
  • @MartijnPieters: There's no `save()` method, there is only `save_to_filename()`, and yes that works. Please look at my update! I've tried StringIO but failed to get it to work. – shredding Nov 30 '12 at 14:24

1 Answer


I found out that read() is also an option, which led to an easy solution like this:

import cStringIO
from gzip import GzipFile

io = cStringIO.StringIO(logfile.read())
for line in GzipFile(fileobj=io).readlines():
    impression = LogParser._parseLine(line)
    if impression is not None:
        impressions.append(impression)
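
If the log is too large to hold the whole compressed payload in memory, the chunks could also be decompressed as they arrive. This is an alternative to the read() approach above, not something taken from the SDK: a minimal Python 3 sketch using zlib.decompressobj, where `iter_gzip_lines` and `chunks` are hypothetical names and `chunks` stands in for the output of logfile.stream(). It assumes UTF-8 encoded log lines.

```python
import gzip
import zlib


def iter_gzip_lines(chunks):
    """Yield decoded lines from an iterable of gzip-compressed byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("utf-8")
    # Emit whatever remains once the stream is exhausted.
    pending += decompressor.flush()
    for line in pending.splitlines():
        yield line.decode("utf-8")
```

An empty first chunk, as described in the question, is harmless here because decompressing an empty byte string simply produces no output.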
shredding