6

I am using the tweepy Streaming API to get tweets containing a particular hashtag. The problem is that I cannot extract the full text of a tweet from the Streaming API: only 140 characters are available, and the rest is truncated.

Here is the code:

from datetime import datetime
from time import sleep

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


def analyze_status(text):
    # True when the text looks like a retweet ("RT" within the first three characters)
    return 'RT' in text[0:3]


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Skip retweets; append everything else to a file and echo it
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + '\n\n')
            print(status.text)

    def on_error(self, status):
        # status is an integer HTTP status code, so convert it before concatenating
        print("Error Code : " + str(status))


def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Assumes tweepy 3.x, where last_response is a requests.Response
    headers = api.last_response.headers
    # Get the number of remaining requests
    remaining = int(headers['x-rate-limit-remaining'])
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(headers['x-rate-limit-limit'])
        reset = int(headers['x-rate-limit-reset'])
        # Convert the epoch timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True


myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')

# `async` is a reserved word from Python 3.7 on; tweepy 3.7+ renamed the argument to `is_async`
myStream.filter(track=['#bitcoin'], is_async=True)

Does anyone have a solution?

stuckoverflow
Varad Bhatnagar

8 Answers

8

tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.

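In that case, you'll want something like print(status.extended_tweet['full_text']) to extract the longer text. In tweepy, extended_tweet is a plain dictionary, so attribute access such as status.extended_tweet.full_text fails.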
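As a rough illustration of the structure described above, the sketch below works on plain dictionaries rather than a live stream. The sample payloads are hand-written and heavily abbreviated (they are assumptions, not real API output), and `extract_text` is a hypothetical helper name.

```python
import json


def extract_text(payload):
    """Return the longest text available in a decoded streaming payload."""
    # extended_tweet is only present when the Tweet was truncated.
    extended = payload.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    return payload["text"]


# Hand-written, abbreviated sample payloads (assumptions, not real API output).
short_raw = '{"text": "short tweet", "truncated": false}'
long_raw = json.dumps({
    "text": "the first 140 characters...",
    "truncated": True,
    "extended_tweet": {"full_text": "the complete text of a long tweet"},
})

print(extract_text(json.loads(short_raw)))
print(extract_text(json.loads(long_raw)))
```

The same function could be applied to the raw JSON string that a listener receives, after decoding it with json.loads.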

Andy Piper
  • Still not working. Getting error: AttributeError: 'Status' object has no attribute 'extended_tweet' – Varad Bhatnagar Jan 19 '18 at 07:02
  • That would only be there for Tweets that are longer than 140 characters. Have you tried tracing out the complete JSON object you're getting back from the API, to check the exact structure? – Andy Piper Jan 19 '18 at 14:40
  • I accomplished the task using the on_data() functionality of tweepy, which returns a JSON-like object. – Varad Bhatnagar Jan 26 '18 at 14:30
7

There is a Boolean available in the Twitter stream: 'status.truncated' is True when the message contains more than 140 characters. Only then is the 'extended_tweet' object available:

if not status.truncated:
    text = status.text
else:
    text = status.extended_tweet['full_text']

This works only when you are streaming tweets. When you are collecting older tweets using the API method you can use something like this:

tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)

This full_text field contains the text of all tweets, truncated or not.

2

You have to enable extended tweet mode like so:

s = tweepy.Stream(auth, l, tweet_mode='extended')

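Then you can print the extended tweet, but remember that, because of how the Twitter API works, you have to check that the extended tweet exists; otherwise it will throw an error:

from tweepy import StreamListener

class listener(StreamListener):
    def on_status(self, status):
        try:
            # extended_tweet exists only on tweets longer than 140 characters
            print(status.extended_tweet['full_text'])
        except AttributeError:
            # Short tweet: fall back to the regular text field
            print(status.text)
        return True

    def on_error(self, status_code):
        # Returning False on a 420 (rate limited) disconnects the stream
        if status_code == 420:
            return False

l = listener()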

Worked for me.

1

In addition to the previous answer: in my case it worked only as status.extended_tweet['full_text'], because the status.extended_tweet is nothing but a dictionary.

Budi Mulyo
Ivan Klimuk
1

Building upon @AndyPiper's answer, you can check whether the extended tweet is there with either a try/except:

  def get_tweet_text(tweet):
    try:
      return tweet.extended_tweet['full_text']
    except AttributeError as e:
      return tweet.text

OR check against the inner json:

  def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
      return tweet.extended_tweet['full_text']
    else:
      return tweet.text

Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" doesn't actually work and will throw an error.
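To make the dictionary point concrete, here is a minimal sketch of a third variant that uses getattr with a default instead of try/except. The SimpleNamespace objects are stand-ins that mimic only the attributes used here; they are assumptions for illustration, not real tweepy Status objects.

```python
from types import SimpleNamespace


def get_tweet_text(tweet):
    """Prefer extended_tweet['full_text'], falling back to tweet.text."""
    # getattr with a default avoids the AttributeError raised for short tweets.
    extended = getattr(tweet, "extended_tweet", None) or {}
    # extended is a plain dict, so use key lookup rather than attribute access.
    return extended.get("full_text", tweet.text)


# Stand-in objects (assumption: they mimic only the attributes used above).
short = SimpleNamespace(text="short tweet")
long_ = SimpleNamespace(
    text="cut off...",
    extended_tweet={"full_text": "the whole long tweet"},
)

print(get_tweet_text(short))
print(get_tweet_text(long_))
```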

AndersonHappens
0

This is what worked for me:

status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)

I implemented this from https://github.com/tweepy/tweepy/issues/935. I needed to change what they suggest, but the idea stays the same.

Dina Bavli
0

I use the following function:

def full_text_tweeet(id_):
    status = api.get_status(id_, tweet_mode="extended")
    try:
        return status.retweeted_status.full_text
    except AttributeError:  
        return status.full_text

and then call it while building my list:

tweets_list = []
# Loop through all tweets pulled
for tweet in tweets:
    # Store the tweet id together with its full text
    tweet_list = [str(tweet.id), str(full_text_tweeet(tweet.id))]
    tweets_list.append(tweet_list)
  • Welcome to SO. This is an old question with an already accepted answer. You might want to explain a bit more how your answer differs from the others and why someone should use it. – Xerillio Jan 31 '21 at 10:37
0

Try this; it is the simplest and fastest way.

def on_status(self, status):
    if hasattr(status, "retweeted_status"):  # Check if Retweet
        try:
            print(status.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            print(status.retweeted_status.text)
    else:
        try:
            print(status.extended_tweet["full_text"])
        except AttributeError:
            print(status.text)

Visit the link; it explains how the extended tweet can be retrieved.
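The retweet branching in this answer can also be folded into a helper that returns the text instead of printing it, which makes it easier to reuse and to check. This is only a sketch: `full_text` and `_text_of` are hypothetical names, and the SimpleNamespace objects are stand-ins for tweepy Status objects that carry only the attributes used here.

```python
from types import SimpleNamespace


def _text_of(status):
    """Return extended_tweet['full_text'] when present, else status.text."""
    extended = getattr(status, "extended_tweet", None)
    if extended:
        return extended["full_text"]
    return status.text


def full_text(status):
    """Use the original tweet's text when the status is a retweet."""
    original = getattr(status, "retweeted_status", None)
    return _text_of(original if original is not None else status)


# Stand-in objects (assumptions, not real tweepy Status instances).
plain = SimpleNamespace(text="a short tweet")
long_original = SimpleNamespace(
    text="a long tweet that was cut...",
    extended_tweet={"full_text": "a long tweet that was cut off by the stream"},
)
retweet = SimpleNamespace(text="RT @someone: a long tweet...",
                          retweeted_status=long_original)

print(full_text(plain))
print(full_text(retweet))
```

Inside on_status, the body would then reduce to a single print(full_text(status)) call.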