Collecting and Processing data with PHP (Twitter Streaming API)

Question

After reading through all of the Twitter Streaming API and Phirehose PHP documentation, I've come across something I have yet to do: collect and process data separately.

The logic behind it, if I understand correctly, is to prevent a logjam in the processing phase from backing up the collection process. I've seen examples before, but they basically write straight to a MySQL database right after collection, which seems to go against what Twitter recommends.

What I'd like some advice on is the best way to handle this, and how to implement it. People seem to recommend writing all the data directly to a text file and then parsing/processing it with a separate function, but I'd assume this method could be a memory hog.

Here's the catch: it's all going to run as a daemon/background process. Does anyone have experience solving a problem like this, or, more specifically, with the Twitter Phirehose library? Thanks!

Some notes:

* The connection will be through a socket, so my guess is that the file will constantly be appended to. Does anyone have feedback on that?

score 1 · Answer 1 · answered May 17 '12 at 23:32

The Phirehose library comes with an example of how to do this. See:

This uses a flat file, which is very scalable and fast: an average hard disk can write sequentially at 40 MB/s or more, and a flat file scales linearly (i.e., unlike a database, it doesn't slow down as it gets bigger).
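A minimal sketch of the collecting side, assuming each status arrives as a raw JSON string (as it does in Phirehose's `enqueueStatus()` hook). `QueueFileWriter` is a hypothetical name, not part of Phirehose; it only appends one status per line, so the collector never waits on parsing or a database.

```php
<?php
// Hypothetical collector-side helper (not part of Phirehose): appends one raw
// JSON status per line and does nothing else, so writes stay fast and sequential.
final class QueueFileWriter
{
    /** @var resource */
    private $fh;

    public function __construct(string $path)
    {
        // 'ab' opens for appending: every write lands at the end of the file.
        $fh = fopen($path, 'ab');
        if ($fh === false) {
            throw new RuntimeException("Cannot open $path for appending");
        }
        $this->fh = $fh;
    }

    // Intended to be called from an enqueueStatus() override in a Phirehose subclass.
    public function append(string $rawStatus): void
    {
        // JSON escapes embedded newlines, so "\n" safely delimits statuses.
        fwrite($this->fh, rtrim($rawStatus, "\r\n") . "\n");
    }

    public function close(): void
    {
        fclose($this->fh); // flushes anything still buffered
    }
}
```

In a daemon you would create the writer once at startup and call `append($status)` for every incoming status; all parsing and database work happens in a separate process.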

You don't need any database functionality to consume a stream (i.e., you just want the next tweet; there's no "querying" involved).
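On the consuming side, reading the file one line at a time with `fgets()` keeps memory use flat no matter how large the file grows, which also addresses the memory-hog concern in the question. A sketch, where `consumeQueueFile()` and its handler callback are hypothetical names:

```php
<?php
// Process a queue file one status (one line) at a time. Unlike file() or
// file_get_contents(), this never holds the whole file in memory.
function consumeQueueFile(string $path, callable $handler): int
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $processed = 0;
    try {
        while (($line = fgets($fh)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // skip blank lines
            }
            $status = json_decode($line, true);
            if (!is_array($status)) {
                continue; // skip a truncated or malformed line instead of aborting
            }
            $handler($status); // the slow work (e.g. a MySQL insert) goes here
            $processed++;
        }
    } finally {
        fclose($fh);
    }
    return $processed;
}
```

Because this runs in its own process, a slow handler only delays processing; the collector keeps appending at full speed.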

If you rotate the file fairly often (i.e., close the current file and start a new one, so the consumer can process the finished one), you will get near-real-time processing, if desired.
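Rotation can be sketched as follows, assuming the collector and consumer share a directory on a single filesystem. `RotatingQueue` and `readyQueueFiles()` are hypothetical names, and the Phirehose example may structure this differently: the collector closes its current file, renames it to a unique `.queue` name, and reopens a fresh one, while the consumer only ever touches renamed files.

```php
<?php
// Hypothetical rotating queue (names are illustrative, not Phirehose's API).
// The collector appends to current.tmp; rotate() hands that file to the consumer.
final class RotatingQueue
{
    private $dir;
    private $current;
    private $fh;
    private $seq = 0;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
        $this->current = $this->dir . '/current.tmp';
        $this->open();
    }

    public function append(string $rawStatus): void
    {
        fwrite($this->fh, rtrim($rawStatus, "\r\n") . "\n");
    }

    // Call periodically (e.g. every few seconds) from the collector process.
    // Returns the path handed to the consumer, or null if nothing was collected.
    public function rotate(): ?string
    {
        fclose($this->fh); // flush and release the current file
        clearstatcache(true, $this->current);
        $ready = null;
        $renamed = true;
        if (filesize($this->current) > 0) {
            // Timestamp + sequence number keeps names unique and sortable by age.
            $ready = sprintf('%s/tweets_%.6F_%06d.queue', $this->dir, microtime(true), $this->seq++);
            // On POSIX, rename() within one filesystem is atomic, so the consumer
            // never sees a partially moved .queue file.
            $renamed = rename($this->current, $ready);
        }
        $this->open(); // always reopen so the collector can keep appending
        if (!$renamed) {
            throw new RuntimeException("Cannot rotate {$this->current}");
        }
        return $ready;
    }

    private function open(): void
    {
        $fh = fopen($this->current, 'ab');
        if ($fh === false) {
            throw new RuntimeException("Cannot open {$this->current}");
        }
        $this->fh = $fh;
    }
}

// Consumer side: finished files, oldest first. Only renamed files match the
// pattern, so the file still being written is never picked up.
function readyQueueFiles(string $dir): array
{
    $files = glob(rtrim($dir, '/') . '/tweets_*.queue') ?: [];
    sort($files);
    return $files;
}
```

The consumer would loop over `readyQueueFiles()`, process each file line by line, and delete it when done. A shorter rotation interval lowers latency at the cost of more, smaller files.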
