18

I'm using Flask and Tweepy to search for live tweets. On the front end I have a text input and a button called "Search". Ideally, when a user enters a search term and clicks the "Search" button, Tweepy should start listening for the new search term and stop the stream for the previous one. Clicking the "Search" button executes this function:

@app.route('/search', methods=['POST'])
# gets search-keyword and starts stream
def streamTweets():
    search_term = request.form['tweet']
    search_term_hashtag = '#' + search_term
    # instantiate listener
    listener = StdOutListener()
    # stream object uses listener we instantiated above to listen for data
    stream = tweepy.Stream(auth, listener)

    if stream is not None:
        print "Stream disconnected..."
        stream.disconnect()

    stream.filter(track=[search_term or search_term_hashtag], async=True)
    redirect('/stream') # execute '/stream' sse
    return render_template('index.html')

The /stream route that the second-to-last line of the code above redirects to is as follows:

@app.route('/stream')
def stream():
    # we will use Pub/Sub process to send real-time tweets to client
    def event_stream():
        # instantiate pubsub
        pubsub = red.pubsub()
        # subscribe to tweet_stream channel
        pubsub.subscribe('tweet_stream')
        # initiate server-sent events on messages pushed to channel
        for message in pubsub.listen():
            yield 'data: %s\n\n' % message['data']
    return Response(stream_with_context(event_stream()), mimetype="text/event-stream")

My code works fine, in the sense that it starts a new stream and searches for a given term whenever the "Search" button is clicked, but it does not stop the previous search. For example, if my first search term was "NYC" and then I wanted to search for a different term, say "Los Angeles", it will give me results for both "NYC" and "Los Angeles", which is not what I want. I want just "Los Angeles" to be searched. How do I fix this? In other words, how do I stop the previous stream? I looked through other previous threads, and I know I have to use stream.disconnect(), but I'm not sure how to implement this in my code. Any help or input would be greatly appreciated. Thanks so much!!

stthomas
  • For the actual project, take a look at https://github.com/kimasx/twtr-search-map – Raja Simon Apr 23 '15 at 17:17
  • Have you tried keeping a reference to the stream object that you created (`stream`) in the app, outside the function, so that you can call `.disconnect` on it as the first action in `streamTweets()` (where `/search` is routed) before you create the new one? I haven't used Flask, so I don't know if this is a standard pattern, but it seems like it should work. – J Richard Snape Apr 24 '15 at 15:14
  • @JRichardSnape Reference to the stream `object` ? How to do that.? Can you make it answer so we can discuss further ... – Raja Simon Apr 25 '15 at 01:34
  • @RajaSimon What I meant was - if you've got multiple users, all inputting their own search and you want to display their stream to them until they input another search, in which case you want to disconnect and show them a different one, then you need some way of keeping a reference to the stream object that each user is "listening" to and associating it with that user. Somewhat like the [first answer here](http://stackoverflow.com/a/29859195/838992). I'm not an expert on the `redis.pubsub()` model, so it would take a while to figure out the best / intended way to do this. – J Richard Snape Apr 27 '15 at 08:38
  • @rajasimon I haven't published an answer because I'm not sure I've got the know-how to give a good answer that will work at scale. – J Richard Snape Apr 27 '15 at 08:38
  • @JRichardSnape Thanks for the input. Actually, I found a way to close the redis connection. Bug: with my method, if a user refreshes or closes the page, the background stream keeps running. What should I do in this case? This is what I need now. – Raja Simon Apr 27 '15 at 10:03
  • @Raja: you may want to do what you wrote here on top of other things. 1) Send a stopSearch command on exit (JavaScript). This may not get called in extreme cases, so you need a fallback. 2) Use your signal code below or add a timeout. Make sure this doesn't break normal user behavior. 3) The rest of the stopSearch logic (on a new call). – Pierre-Francoys Brousseau Apr 27 '15 at 18:51
  • @Pierre-FrancoysBrousseau Your feedback nice..! I will try. – Raja Simon Apr 28 '15 at 02:26

3 Answers

4

Below is some code that will cancel old streams when a new stream is created. It works by adding new streams to a global list, and then calling stream.disconnect() on all streams in the list whenever a new stream is created.

diff --git a/app.py b/app.py
index 1e3ed10..f416ddc 100755
--- a/app.py
+++ b/app.py
@@ -23,6 +23,8 @@ auth.set_access_token(access_token, access_token_secret)
 app = Flask(__name__)
 red = redis.StrictRedis()

+# Add a place to keep track of current streams
+streams = []

 @app.route('/')
 def index():
@@ -32,12 +34,18 @@ def index():
 @app.route('/search', methods=['POST'])
 # gets search-keyword and starts stream
 def streamTweets():
+        # cancel old streams
+        for stream in streams:
+            stream.disconnect()
+
        search_term = request.form['tweet']
        search_term_hashtag = '#' + search_term
        # instantiate listener
        listener = StdOutListener()
        # stream object uses listener we instantiated above to listen for data
        stream = tweepy.Stream(auth, listener)
+        # add this stream to the global list
+        streams.append(stream)
        stream.filter(track=[search_term or search_term_hashtag],
                async=True) # make sure stream is non-blocking
        redirect('/stream') # execute '/stream' sse

What this does not solve is the problem of session management. With your current setup a search by one user will affect the searches of all users. This can be avoided by giving your users some identifier and storing their streams along with their identifier. The easiest way to do this is likely to use Flask's session support. You could also do this with a requestId as Pierre suggested. In either case you will also need code to notice when a user has closed the page and close their stream.
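
A minimal sketch of the per-user bookkeeping described above, assuming each user already has some identifier (for example one kept in Flask's session). `StreamRegistry`, `replace`, and `close` are hypothetical names, not part of Flask or Tweepy; the only thing assumed of a stream object is the `disconnect()` method used in the diff.

```python
import threading


class StreamRegistry:
    """Keeps at most one live stream per user id (hypothetical helper)."""

    def __init__(self):
        self._streams = {}
        # Flask may serve requests from several threads, so guard the dict.
        self._lock = threading.Lock()

    def replace(self, user_id, new_stream):
        """Store new_stream for user_id and disconnect the one it replaces."""
        with self._lock:
            old_stream = self._streams.get(user_id)
            self._streams[user_id] = new_stream
        if old_stream is not None:
            old_stream.disconnect()
        return old_stream

    def close(self, user_id):
        """Disconnect and forget the stream for user_id, if there is one."""
        with self._lock:
            old_stream = self._streams.pop(user_id, None)
        if old_stream is not None:
            old_stream.disconnect()
        return old_stream is not None
```

In `streamTweets()` the new stream would be passed to `replace()` in place of the `streams.append(stream)` call, so that only that user's previous stream is disconnected. `close()` is the hook for whatever code notices that a user has left the page.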

MattL
  • I also submitted a [pull request](https://github.com/kimasx/twtr-search-map/pull/1) – MattL Apr 29 '15 at 18:11
  • If the user closes the browser or something happens in the front end, the stream keeps running in the background forever, right? – Raja Simon Apr 30 '15 at 06:58
  • That's right. With this code a stream will stay open until a new stream is started or the server shuts down. – MattL Apr 30 '15 at 12:19
1

Disclaimer: I know nothing about Tweepy, but this appears to be a design issue.

Are you trying to add state to a RESTful API? You may have a design problem. As JRichardSnape commented, your API shouldn't be the one taking care of canceling a request; that should be done in the front end. What I mean is that the JavaScript/AJAX code calling this function should make another call, to a new function:

@app.route('/cancelSearch', methods=['POST'])

with a POST body that carries the search terms. As long as you don't have state, you can't really do this safely in an async call: imagine someone else makes the same search at the same time; then canceling one will cancel both (remember, you don't have state, so you don't know whose search you're canceling). Perhaps you do need state with your design.

If you must keep using this and don't mind breaking the "stateless" rule, then add a "state" to your request. In this case it's not so bad, because you could launch a thread, name it with the userId, and then kill that thread on every new search:

def streamTweets():
    search_term = request.form['tweet']
    # Assumes a limit of one request per user at a time. If multiple windows
    # can be open and you still want this limit, store userId in a cookie.
    userId = request.form['userId']
    # Look for any request currently running with this ID, and cancel it

Alternatively, you could return a requestId, which the front end would keep and then pass in a call to cancelSearch?requestId=$requestId. In cancelSearch, you would have to find the pending request (it sounds like that lives in tweepy, since you're not using your own threads) and disconnect it.
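
A rough sketch of the requestId idea, under the assumption that the app runs as a single process so a module-level dict is visible to every request. `pending`, `start_search`, and `cancel_search` are made-up names for illustration; the stream is whatever object exposes `disconnect()`.

```python
import uuid

# request id -> stream object (hypothetical module-level bookkeeping)
pending = {}


def start_search(stream):
    """Register a stream and return the id the front end should keep."""
    request_id = uuid.uuid4().hex
    pending[request_id] = stream
    return request_id


def cancel_search(request_id):
    """Disconnect the stream for request_id; return False for unknown ids."""
    stream = pending.pop(request_id, None)
    if stream is None:
        return False
    stream.disconnect()
    return True
```

The search route would send the value from `start_search()` back to the browser, and the cancelSearch route would pass the received id to `cancel_search()`. Because every search gets its own id, two users searching for the same term cannot cancel each other.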

Out of curiosity, I just watched what happens when you search on Google, and it uses a GET request. Have a look (debug tools -> Network; then enter some text and watch the autofill requests). Google sends a token with every request (every time you type something). That doesn't mean the token is used for this, but it's basically what I described. If you don't want a session, then use a unique identifier.

  • Can you elaborate more about this line ? `#Look for any request currently running with this ID, and cancel them` ?? How to find the running with ID ? – Raja Simon Apr 25 '15 at 01:38
  • Since I don't know tweepy, i'd do it the ugly way: a global dictionary of ``streamDict = {userId : stream}`` . The problem with a signal is you may kill too early or too late and waste resources. If this were my project i'd probably modify tweepy to have a lookup function and each stream to have tags (or known thread-names). PS: globals are usually a sign that you don't have the right tools. It's true in this case: I don't know the right tweepy tool or it doesn't exist. – Pierre-Francoys Brousseau Apr 27 '15 at 18:41
1

Well, I solved it by using a timer method, but I'm still looking for a more Pythonic way.

from streamer import StreamListener
def stream():
    hashtag = input
    #assign each user an ID ( for pubsub )
    StreamListener.userid = random_user_id
    def handler(signum, frame):
        print("Forever is over")
        raise Exception("end of time")

    def main_stream():
        stream = tweepy.Stream(auth, StreamListener())
        stream.filter(track=track,async=True)
        redirect(url_for('map_stream'))

    def close_stream():
        # this is for closing client list in redis but don't know it's working
        obj = redis.client_list(tweet_stream)
        redis_client_list = obj[0]['addr']
        redis.client_kill(redis_client_list)
        stream = tweepy.Stream(auth, StreamListener())
        stream.disconnect()

    import signal
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(300)
    try:
        main_stream()
    except Exception:
        close_stream()
        print("function terminate")
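
One possible alternative to the signal-based timeout, offered as a sketch and not as a drop-in replacement for the code above: `threading.Timer` from the standard library can call `disconnect()` on the stream that was actually started. Unlike `signal.alarm`, it does not depend on `SIGALRM` (which is only available on Unix) or on running in the main thread. `disconnect_after` is a hypothetical helper name.

```python
import threading


def disconnect_after(stream, seconds):
    """Schedule stream.disconnect() to run once, `seconds` from now."""
    timer = threading.Timer(seconds, stream.disconnect)
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()
    return timer
```

Keeping the returned timer around lets the caller invoke `timer.cancel()` when the same user starts a new search before the timeout expires.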
Raja Simon