Creating DataFrame with Pandas from Tweepy API V2 Output

Question

I am trying to use tweepy.Paginator to search for 100+ tweets and then load them into a pandas DataFrame for further analysis. However, every row of the DataFrame shows the same wrong username instead of the username of each tweet's author.

Below is an example of my code. Can anyone help me fix it?

# create a list of records
query = 'Wizkid'

tweet_info_ls = []
# iterate over each tweet and corresponding user details
for tweet in tweepy.Paginator(client.search_recent_tweets,query=query,tweet_fields=['context_annotations', 'created_at','author_id', 'public_metrics'], 
                                     expansions=['author_id','referenced_tweets.id'], max_results=100, user_fields=['username', 'name','public_metrics']).flatten(limit = 100):
    tweet_info = {
        'created_at': tweet.created_at,
        'text': tweet.text,
        #'source': tweet.source,
        'name': user.name,
        'username': user.username,
        #'location': user.location,
        #'verified': user.verified,
        #'description': user.description,
        'followers': user.public_metrics['followers_count'],
        'repost':tweet.public_metrics['retweet_count']
    }
    tweet_info_ls.append(tweet_info)
# create dataframe from the extracted records
tweets_df = pd.DataFrame(tweet_info_ls)
# display the dataframe
tweets_df.head(20)

And here is the head of the output:

[Image: head of the tweets_df output]

score 0 · Answer 1 · answered Sep 14 '22 at 21:06

This is untested, but it looks more logical:

import tweepy
import pandas as pd
import numpy as np
import json

consumer_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)

# Note: this is the v1.1 search endpoint (api.search_tweets in Tweepy 4.x, api.search in 3.x).
# The v2-only arguments (tweet_fields, expansions, user_fields) do not apply to it.
for tweet in tweepy.Cursor(api.search_tweets, q='trump', count=100, lang="en").items(100):
    # in v1.1 the author is embedded in each status, so the username is per tweet
    print(tweet.created_at, tweet.user.screen_name, tweet.text)

# dump the raw JSON of the last tweet fetched
tweet_info = tweet._json
print(json.dumps(tweet_info, indent=4, sort_keys=True))

Do you think there is a way to extract the tweets by calling Twitter API V2 and using client.search_recent_tweets() endpoint? I am just trying to get more comfortable with API V2 to be ready for the moment when v1.1 will be archived. — Eugene, Sep 15 '22 at 05:35
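
A possible way to do this with API v2, offered as a sketch rather than a tested fix: in the question's loop, `user` is never assigned per tweet, so it presumably holds a value left over from earlier code, which would explain the repeated username. Also, `.flatten()` yields only the tweets and drops the `includes` part of each response, where the expanded authors live. The sketch below iterates over the pages instead and matches each tweet's `author_id` to a user from `includes['users']`. The helper names `build_records` and `search_to_dataframe` are made up for illustration, and `client` is assumed to be the `tweepy.Client` from the question.

```python
def build_records(pages):
    """Flatten response pages into one dict per tweet, joined to its author.

    Each page is assumed to expose `.data` (a list of tweets, or None) and
    `.includes` (a dict whose "users" entry lists the expanded authors),
    which is how a tweepy Response is laid out.
    """
    records = []
    for page in pages:
        # Map user id -> user object for the authors expanded on this page.
        users = {user.id: user for user in (page.includes or {}).get("users", [])}
        for tweet in page.data or []:
            author = users.get(tweet.author_id)  # None if the author was not expanded
            records.append({
                "created_at": tweet.created_at,
                "text": tweet.text,
                "name": author.name if author else None,
                "username": author.username if author else None,
                "followers": author.public_metrics["followers_count"] if author else None,
                "repost": tweet.public_metrics["retweet_count"],
            })
    return records


def search_to_dataframe(client, query, pages=1):
    """Hypothetical helper: run the v2 recent search and return a DataFrame."""
    # Imported here so build_records stays usable without these packages.
    import pandas as pd
    import tweepy

    paginator = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at", "author_id", "public_metrics"],
        expansions=["author_id"],
        user_fields=["username", "name", "public_metrics"],
        max_results=100,
        limit=pages,  # number of pages (requests), not number of tweets
    )
    return pd.DataFrame(build_records(paginator))
```

With the question's setup this would be called as `tweets_df = search_to_dataframe(client, 'Wizkid')`, followed by `tweets_df.head(20)`. Keeping the join in `build_records` separate from the network call makes it possible to check the matching logic against stand-in objects without hitting the API.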
