
I'm using PRAW to grab results from several subreddits, and I'm trying to combine the results from all of them into one dataframe.

It works fine with one subreddit. However, once I pass in a list of subreddits, each subreddit's results overwrite the previous one's, and I end up with only the results from the last subreddit processed. I feel like this should be trivial, but I'm drawing a complete blank.

This is the code I have now:

sub_list = ['5 different subreddits']

for sub in sub_list:

    print('Working on this sub right now: \n', sub)

    subreddit = protest_sniffer.subreddit(sub)

    cont_subreddit = subreddit.controversial(limit=1000)

    topics_dict = { "title":[], \
                    "score":[], \
                    "id":[], "url":[], \
                    "comms_num": [], \
                    "created": [], \
                    "body":[]}
    count = 0

    for submission in cont_subreddit:
        topics_dict["title"].append(submission.title)
        topics_dict["score"].append(submission.score)
        topics_dict["id"].append(submission.id)
        topics_dict["url"].append(submission.url)
        topics_dict["comms_num"].append(submission.num_comments)
        topics_dict["created"].append(submission.created)
        topics_dict["body"].append(submission.selftext)

        count += 1
        progress = round(100 * (count/1000), 1)
        print('%s percent finished' % progress)

topics_data = pd.DataFrame(topics_dict)
print(topics_data.describe())
Sebastian Goslin
  • Will a multireddit work for you? You can set the subreddit to be something like `"pics+gifs+funny"` and it will collect posts from all 3 of those at the same time. – DenverCoder1 Jul 23 '19 at 12:20

1 Answer


Because topics_dict is defined inside the outer loop, each new subreddit iteration replaces the previous dictionary with a fresh one containing empty lists, so only the last subreddit's rows reach the dataframe. Try moving the definition above the loop so it is created only once:

sub_list = ['5 different subreddits']

topics_dict = {"title": [],
               "score": [],
               "id": [], "url": [],
               "comms_num": [],
               "created": [],
               "body": []}

for sub in sub_list:
    ...
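
To see the effect without hitting the Reddit API, here is a minimal sketch of the same idea using stand-in objects. The names `collect_submissions`, `fake_submission`, and `FIELDS`, along with the extra `subreddit` column, are assumptions made for illustration and are not part of the original code. The fake submissions only mimic the attributes the question reads from real PRAW submissions.

```python
from types import SimpleNamespace

import pandas as pd

# Column name -> submission attribute, mirroring the question's topics_dict.
FIELDS = {
    "title": "title",
    "score": "score",
    "id": "id",
    "url": "url",
    "comms_num": "num_comments",
    "created": "created",
    "body": "selftext",
}


def collect_submissions(listings):
    """Build one DataFrame from {subreddit_name: iterable_of_submissions}."""
    # Created once, before any loop, so rows from every subreddit accumulate.
    topics_dict = {column: [] for column in FIELDS}
    topics_dict["subreddit"] = []  # extra column (assumption): records the source sub

    for sub, submissions in listings.items():
        for submission in submissions:
            for column, attribute in FIELDS.items():
                topics_dict[column].append(getattr(submission, attribute))
            topics_dict["subreddit"].append(sub)

    return pd.DataFrame(topics_dict)


def fake_submission(n):
    """Hypothetical stand-in for a PRAW submission, for offline testing only."""
    return SimpleNamespace(
        title=f"post {n}",
        score=n,
        id=f"id{n}",
        url=f"https://example.com/{n}",
        num_comments=n,
        created=1563884400.0 + n,
        selftext="",
    )


fake_listings = {
    "sub_a": [fake_submission(1), fake_submission(2)],
    "sub_b": [fake_submission(3)],
}
topics_data = collect_submissions(fake_listings)
print(topics_data[["subreddit", "title", "score"]])
```

With real data, the mapping could presumably be built from the calls already used in the question, for example `{sub: protest_sniffer.subreddit(sub).controversial(limit=1000) for sub in sub_list}`.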