You can use the Python Pushshift.io API Wrapper (PSAW) to get all the most recent submissions and comments from a specific subreddit, and can even do more complex queries (such as searching for specific text inside a comment). The docs are available here.
For example, you can use the get_submissions()
function to get the top 1000 submissions from r/dogs from 2015:
import datetime as dt
import praw
from psaw import PushshiftAPI
r = praw.Reddit(...)
api = PushshiftAPI(r)
start_epoch=int(dt.datetime(2015, 1, 1).timestamp()) # Could be any date
submissions_generator = api.search_submissions(after=start_epoch, subreddit='dogs', limit=1000) # Returns a generator object
submissions = list(submissions_generator) # You can then use this, store it in mongoDB, etc.
Alternatively, to get the first 1000 comments from r/dogs in 2015, you can use the search_comments()
function:
start_epoch=int(dt.datetime(2015, 1, 1).timestamp()) # Could be any date
comments_generator = api.search_comments(after=start_epoch, subreddit='dogs', limit=1000) # Returns a generator object
comments = list(comments_generator)
As you can see, PSAW still uses PRAW, and so returns PRAW objects for submissions and comments, which may be handy.