
I'm trying to scrape all comments from a subreddit. I've found a library called PRAW, and its documentation gives this example:

import praw
r = praw.Reddit('Comment parser example by u/_Daimon_')
subreddit = r.get_subreddit("python")
comments = subreddit.get_comments()

However, this returns only the most recent 25 comments. How can I parse all comments in the subreddit? On the Reddit interface, there's a next button, so it should be possible to go back in history page by page.

siamii

1 Answer


From the docs:

See UnauthenticatedReddit.get_comments() for complete usage.

That function has *args and **kwargs, and the function notes:

The additional parameters are passed directly into get_content(). Note: the url parameter cannot be altered.

Therefore, I looked at that function (find it here). One of the arguments for get_content is limit.

limit – the number of content entries to fetch. If limit <= 0, fetch the default for your account (25 for unauthenticated users). If limit is None, then fetch as many entries as possible (reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary).

The relevant part of that description is the case where limit is None. So my test was:

comments = subreddit.get_comments(limit=None)

That returned more than 30 comments. I was checking them by hand and stopped at 30, so the real total was probably higher, possibly the 100 that a single request returns.
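Counting by hand isn't necessary. Here is a minimal sketch, assuming the result of get_comments() is a lazy iterable (a generator) as in the PRAW version quoted above. The names take, count_entries and fake_comments are made up for illustration, and fake_comments only stands in for the real listing so the snippet runs without network access.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(entries: Iterable[T], n: int) -> List[T]:
    """Return at most the first n entries without consuming the rest."""
    return list(islice(entries, n))


def count_entries(entries: Iterable[T]) -> int:
    """Count entries one at a time, so nothing is held in memory."""
    return sum(1 for _ in entries)


def fake_comments(total: int) -> Iterator[str]:
    """Hypothetical stand-in for the generator returned by get_comments()."""
    for i in range(total):
        yield f"comment {i}"


# Peek at the first 30 entries, then count a fresh listing in full.
first_batch = take(fake_comments(250), 30)
total = count_entries(fake_comments(250))
print(len(first_batch), total)
```

With the real library, the same helper would be called as count_entries(subreddit.get_comments(limit=None)). Iterating consumes a generator, so collect the comments into a list first if you need both the count and the data.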

IronManMark20
  • OK, I see. However, I need to get all comments from a subreddit. I'm doing data analysis and need at least 10,000 comments. – siamii Jun 28 '15 at 18:46
  • As the docs note, `(reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary)`. So if you keep asking for more data, it will fetch more. – IronManMark20 Jun 28 '15 at 18:57
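To picture what "automatically make additional requests as necessary" means, here is a rough sketch of cursor-based paging. It is not PRAW's actual implementation: get_content_sketch and fake_fetch_page are hypothetical names, the fake page function replaces the real HTTP request, and the only figures taken from the docs quoted above are the 100-per-request cap and the default of 25.

```python
from typing import Callable, Iterator, List, Optional, Tuple

PAGE_SIZE = 100      # per-request cap quoted in the docs
DEFAULT_LIMIT = 25   # default for unauthenticated users, per the docs

Page = Tuple[List[int], Optional[int]]


def fake_fetch_page(after: Optional[int], count: int, total: int = 250) -> Page:
    """Hypothetical stand-in for one request: returns (items, next_cursor)."""
    start = 0 if after is None else after
    end = min(start + min(count, PAGE_SIZE), total)
    next_cursor = end if end < total else None  # None means no more pages
    return list(range(start, end)), next_cursor


def get_content_sketch(
    fetch_page: Callable[[Optional[int], int], Page],
    limit: Optional[int] = None,
) -> Iterator[int]:
    """Yield entries lazily, requesting another page only when needed."""
    if limit is not None and limit <= 0:
        limit = DEFAULT_LIMIT
    after: Optional[int] = None
    yielded = 0
    while limit is None or yielded < limit:
        want = PAGE_SIZE if limit is None else min(PAGE_SIZE, limit - yielded)
        items, after = fetch_page(after, want)
        for item in items:
            yield item
            yielded += 1
        if after is None or not items:
            break  # the listing is exhausted


everything = list(get_content_sketch(fake_fetch_page, limit=None))
print(len(everything))
```

The point of the sketch is that the caller only iterates. Each time a page runs out, the generator asks for the next one, and with limit=None it keeps going until the listing itself ends.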