
I'm receiving huge amounts of data streaming from Twitter using Tweepy (a Python Twitter API library). I want to compress the stream of received tweets and store them in a file.

The compression must be LZO and I don't want to use Linux pipes for compression. I want to use LZO directly from the Python code. Using Linux pipes I could do:

python downloader.py | lzop -c > output.json.lzo

But I don't want to use pipes and want to compress the stream within the Python script downloader.

I couldn't find any Python library or sample code to compress streaming data using LZO.

Ash
  • I wrote a piece of code here that does the job: https://github.com/afshinrahimi/twitter-fetcher/blob/master/fetcher.py – Ash Nov 03 '16 at 01:51

1 Answer


Two options:

  1. use the library.

  2. if for some reason you cannot use the library, the following code is equivalent to the pipeline you wrote:

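    import json
    from subprocess import Popen, PIPE, STDOUT
    
    # lzop -c reads from stdin and writes the compressed data to stdout
    p = Popen(['lzop', '-c'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
    # json.dumps returns a str (json.dump needs a file object); communicate() expects bytes
    result_stdout = p.communicate(input=json.dumps(results).encode('utf-8'))[0]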
    
lesingerouge
  • The final code is here for users facing the same problem in future: http://pastebin.com/npzW5fh7 – Ash May 08 '16 at 01:55
  • Another note: if you are streaming, you shouldn't use p.communicate, as it closes the pipe afterwards. Use p.stdin.write(data) instead to keep the pipe open. To dump the output to a file, replace stdout=PIPE with a file object, as in stdout=open('output.json.lzo', 'wb'). – Ash May 08 '16 at 01:58
  • The code is here: https://github.com/afshinrahimi/twitter-fetcher/blob/master/fetcher.py – Ash May 09 '16 at 01:30
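
For readers who want the streaming variant from the comments in one place, here is a minimal sketch, not the code from the linked repository. It assumes the `lzop` binary is on the PATH; the function name `stream_to_compressor` and the parameters `records`, `out_path`, and `cmd` are hypothetical names chosen for illustration. It writes each record to the compressor's stdin with `p.stdin.write` style calls and lets the child process write directly into the output file.

```python
import json
import subprocess


def stream_to_compressor(records, out_path, cmd=("lzop", "-c")):
    """Write each record as one JSON line into a compressor's stdin.

    The function name, `records`, `out_path`, and `cmd` are illustrative;
    `cmd` defaults to the `lzop -c` invocation from the question.
    """
    with open(out_path, "wb") as out_file:
        # The child process writes its output straight into out_file.
        proc = subprocess.Popen(list(cmd), stdin=subprocess.PIPE, stdout=out_file)
        try:
            for record in records:
                line = json.dumps(record) + "\n"
                # The pipe stays open between writes, unlike communicate().
                proc.stdin.write(line.encode("utf-8"))
        finally:
            # Closing stdin signals end of input so the compressor can finish.
            proc.stdin.close()
            returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("compressor exited with status %d" % returncode)
```

With Tweepy, `records` could be any iterable or generator that yields decoded tweets as they arrive, for example `stream_to_compressor(tweets, "output.json.lzo")`. Passing a different `cmd` swaps in another compressor that reads stdin and writes stdout.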