I'm using PRAW to grab results from several subreddits, and I'm trying to combine the results from every subreddit into one DataFrame.
It works fine with one subreddit, but once I pass in a list of subreddits, each subreddit's results overwrite the previous one's and I end up with only the most recently processed subreddit. I feel like this should be trivial, but I'm drawing a complete blank.
This is the code I have now:
import pandas as pd

# protest_sniffer is assumed to be a praw.Reddit instance created earlier.
sub_list = ['5 different subreddits']
for sub in sub_list:
    print('Working on this sub right now: \n', sub)
    subreddit = protest_sniffer.subreddit(sub)
    cont_subreddit = subreddit.controversial(limit=1000)
    topics_dict = {"title": [],
                   "score": [],
                   "id": [], "url": [],
                   "comms_num": [],
                   "created": [],
                   "body": []}
    count = 0
    for submission in cont_subreddit:
        topics_dict["title"].append(submission.title)
        topics_dict["score"].append(submission.score)
        topics_dict["id"].append(submission.id)
        topics_dict["url"].append(submission.url)
        topics_dict["comms_num"].append(submission.num_comments)
        topics_dict["created"].append(submission.created)
        topics_dict["body"].append(submission.selftext)
        count += 1
        progress = round(100 * (count/1000), 1)
        print('%s percent finished' % progress)
topics_data = pd.DataFrame(topics_dict)
print(topics_data.describe())
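
For comparison, here is a minimal sketch of one way to keep every subreddit's results. It assumes the overwrite happens because topics_dict is re-created on each pass of the outer loop. The function name collect_rows and the extra "subreddit" field are hypothetical additions, and the reddit argument stands in for the protest_sniffer instance:

```python
def collect_rows(reddit, sub_list, limit=1000):
    """Gather controversial submissions from every subreddit into one list.

    `reddit` is assumed to behave like a praw.Reddit instance
    (for example, protest_sniffer from the code above).
    """
    rows = []  # created once, before the loop, so earlier results are kept
    for sub in sub_list:
        for submission in reddit.subreddit(sub).controversial(limit=limit):
            rows.append({
                "subreddit": sub,  # hypothetical extra field: where the row came from
                "title": submission.title,
                "score": submission.score,
                "id": submission.id,
                "url": submission.url,
                "comms_num": submission.num_comments,
                "created": submission.created,
                "body": submission.selftext,
            })
    return rows
```

Passing the returned list to pd.DataFrame(rows) should then give a single frame covering all of the subreddits, with one row per submission.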