How to compress Twitter streaming using LZO in Linux/Python/Tweepy environment?

Question

I'm receiving huge amounts of data streaming from Twitter using Tweepy (a Python Twitter API library). I want to compress the stream of received tweets and store it in a file.

The compression must be LZO and I don't want to use Linux pipes for compression. I want to use LZO directly from the Python code. Using Linux pipes I could do:

python downloader.py | lzop -c > output.json.lzo

But I don't want to use pipes and want to compress the stream within the Python script downloader.

I couldn't find any Python library or sample code to compress streaming data using LZO.

I wrote a piece of code here that does the job: https://github.com/afshinrahimi/twitter-fetcher/blob/master/fetcher.py — Ash, Nov 03 '16 at 01:51

score 2 · Accepted Answer · answered May 07 '16 at 03:14

Two options:

Use the library.

If for some reason you cannot use the library, the following code is equivalent to the pipeline you wrote:

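import json
from subprocess import Popen, PIPE, STDOUT

# Start lzop; -c makes it write the compressed data to stdout.
p = Popen(['lzop', '-c'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
# json.dumps returns a string (json.dump would need a file object); the pipe expects bytes.
result_stdout = p.communicate(input=json.dumps(results).encode('utf-8'))[0]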

answered May 07 '16 at 03:14

lesingerouge

The final code is here for users facing the same problem in the future: http://pastebin.com/npzW5fh7 – Ash May 08 '16 at 01:55
Another note: if you're streaming, you shouldn't use p.communicate, because it closes the pipe afterwards. Use p.stdin.write(data) instead to keep the pipe open. To write the output to a file, replace stdout=PIPE with a file object, as in stdout=open('output.json.lzo', 'wb'). – Ash May 08 '16 at 01:58
The code is here: https://github.com/afshinrahimi/twitter-fetcher/blob/master/fetcher.py – Ash May 09 '16 at 01:30
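
The streaming approach described in the comments above (keep the pipe open with p.stdin.write and send lzop's output straight to a file) could look roughly like the sketch below. It is not the code behind the links; the helper names open_lzo_writer, write_tweet and close_lzo_writer are hypothetical, each tweet is assumed to be a JSON-serializable dict, and the lzop command-line tool is assumed to be installed and on PATH.

```python
import json
import subprocess


def open_lzo_writer(path, command=("lzop", "-c")):
    """Start the compressor with its stdout redirected to `path`.

    `command` defaults to the lzop CLI (assumed to be on PATH).
    Hypothetical helper, for illustration only.
    """
    out_file = open(path, "wb")
    proc = subprocess.Popen(list(command), stdin=subprocess.PIPE, stdout=out_file)
    return proc, out_file


def write_tweet(proc, tweet):
    """Send one tweet as a JSON line; the pipe stays open for the next one."""
    line = json.dumps(tweet) + "\n"
    proc.stdin.write(line.encode("utf-8"))  # the pipe expects bytes


def close_lzo_writer(proc, out_file):
    """Close stdin so the compressor finishes its output and exits."""
    proc.stdin.close()
    returncode = proc.wait()
    out_file.close()
    return returncode
```

In a downloader you would call open_lzo_writer('output.json.lzo') once at startup, call write_tweet from whatever callback receives each tweet, and call close_lzo_writer when the stream ends. Unlike p.communicate, nothing is closed between tweets, so the compressor sees one continuous stream.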
