4

I'm using PRAW to view a large number of Reddit search results (both submissions and comments), and the method I'm using to collect the data is frequently generating a 503 error:

prawcore.exceptions.ServerError: received 503 HTTP response

As I understand it, if it were a rate limit issue, PRAW would throw a praw.errors.RateLimitExceeded error.

The function in which the error is produced is the following:

def search_subreddit(subreddit_name, last_post=None):
    params = {'sort': 'new', 'time_filter': 'year',
              'limit': 100, 'syntax': 'cloudsearch'}

    if last_post:
        start_time = 0 
        end_time = int(last_post.created) + 1
        query = 'timestamp:%s..%s' % (start_time, end_time)
    else: 
        query = ''

    return reddit.subreddit(subreddit_name).search(query, **params)

That's being called within a loop. Any idea as to why the 503 error is being generated, and how to prevent it from happening?
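
For reference, here's roughly what that surrounding loop looks like (the subreddit name is a placeholder, and process() stands in for my actual data handling):

last_post = None
while True:
    results = list(search_subreddit('some_subreddit', last_post))
    if not results:
        break
    for post in results:
        process(post)  # placeholder for whatever I do with each result
    last_post = results[-1]  # oldest post in this batch, since sort is 'new'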

Dreadnaught

4 Answers

7

Why is it being generated?

503 is an HTTP status code indicating that the server is temporarily unavailable. In almost all cases it means the server is overloaded and, at the moment of the request, doesn't have the resources to generate a response.
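
If you want to confirm that it really is a 503 and not some other server error, the exception carries the underlying HTTP response. A minimal sketch, assuming a praw.Reddit instance named reddit and a placeholder subreddit:

import prawcore

try:
    posts = list(reddit.subreddit('some_subreddit').search('test'))
except prawcore.exceptions.ServerError as e:
    # ServerError subclasses ResponseException, which exposes the raw response.
    print(e.response.status_code)  # e.g. 503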

How to prevent it from happening?

Since this is a server-side issue, and I'll assume here that you are not part of the Reddit networking team, there is nothing you can do directly to fix it. I'll try to list your possible options here:

  • Complain on social media that Reddit's servers suck. (Probably ineffective.)
  • Try to reach the Reddit networking team and inform them about the issue. (Still ineffective, but it might do some good in the long term.)
  • Suggest a feature to PRAW: keyword arguments repeat_in_case_of_server_overload and repeat_in_case_of_server_overload_timeout, where setting the first to True (default False) would make PRAW retry failed requests for some customizable amount of time. (It would be interesting to see, but it is unlikely to be accepted in this form, and it would take some time to process.)
  • Modify PRAW to do the thing described above yourself, then open a pull request on GitHub. (You would have it immediately, but it still might not be accepted, and it requires a bit of work.)
  • Try to run your script when Reddit's servers are less busy. (That honestly might work if you run it manually and only need data occasionally.)
  • Add a simple mechanism that retries the search until it succeeds or a timeout expires, as sketched below. (This is probably the recommended option.)

Something like:

import time

import prawcore


def search_with_retry(reddit, subreddit_name, query, params, timeout=900):
    """Retry the search until it succeeds or `timeout` seconds (15 minutes) elapse."""
    result = None
    last_exception = None
    time_start = int(time.time())
    while result is None and int(time.time()) < time_start + timeout:
        try:
            # search() returns a lazy ListingGenerator, so force it with
            # list() to make the actual HTTP requests happen inside the try.
            result = list(reddit.subreddit(subreddit_name).search(query, **params))
        except prawcore.exceptions.ServerError as e:
            # Wait 30 seconds; hammering an overloaded server won't help.
            last_exception = e
            time.sleep(30)
    if result is None:
        raise last_exception
    return result

Also, the code above is still more of a pseudocode sketch, since I haven't tested it in any way and it may not work verbatim, but hopefully it conveys the idea clearly. One detail worth noting: search() returns a lazy ListingGenerator, which is why the sketch wraps it in list(); otherwise the ServerError would only surface later, when you iterate over the results.
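
For example, a hypothetical call site reusing the params from the question (the subreddit name and query are placeholders):

params = {'sort': 'new', 'time_filter': 'year', 'limit': 100, 'syntax': 'cloudsearch'}
posts = search_with_retry(reddit, 'some_subreddit', 'timestamp:0..1490000000', params)
for post in posts:
    print(post.id, post.title)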

Tomasz Plaskota
    PRAW retries requests that fail with a 503 response twice, with an approximately 2-second delay before the first retry and an approximately 4-second delay before the second. Only after three failures should you actually see such an exception, in which case you must manually decide how to proceed. – bboe Mar 27 '17 at 07:05
4

You might receive this error if you use Subreddit.submissions, since it was deprecated in PRAW: https://github.com/praw-dev/praw/pull/916
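
If that's what you were doing, one possible workaround (a sketch under the assumption that you only need recent history, since Reddit caps listings at roughly the newest 1,000 items) is to page through Subreddit.new and filter by timestamp yourself; the cutoff value and subreddit name below are placeholders:

CUTOFF_EPOCH = 1483228800  # hypothetical start of the window (2017-01-01 UTC)

# new() yields submissions newest-first, so stop once we pass the cutoff.
for submission in reddit.subreddit('some_subreddit').new(limit=None):
    if submission.created_utc < CUTOFF_EPOCH:
        break
    print(submission.id, submission.title)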

0

If the above solution gives you a "name 'time' is not defined" error, add the missing import at the top of your script:

import time

The rest of the code stays the same as in the answer above.
0

The other answers give the right idea: you want some sort of retry system for when Reddit doesn't respond for whatever reason. But if you have an app that uses PRAW in a bunch of places, wrapping each call in retry logic is a real pain. PRAW actually has a retry mechanism built in, called FiniteRetryStrategy, but it only retries a few times and then gives up. It's undocumented, but you can write your own strategy and slot it in:

import random

import praw
from prawcore.sessions import RetryStrategy


class InfiniteRetryStrategy(RetryStrategy):
    """Retries requests forever using capped exponential backoff with jitter."""

    def __init__(self, _base=2, _cap=60, _attempts=0):
        self._base = _base
        self._cap = _cap
        self._attempts = _attempts

    def _sleep_seconds(self):
        # Don't sleep before the very first attempt.
        if self._attempts == 0:
            return None
        # Sleep a random number of seconds, doubling the range with each
        # attempt and capping it at _cap (exponential backoff with jitter).
        return random.randrange(0, min(self._cap, self._base * 2 ** self._attempts))

    def consume_available_retry(self):
        # Hand back a fresh strategy object with the attempt counter bumped.
        return type(self)(_base=self._base, _cap=self._cap, _attempts=self._attempts + 1)

    def should_retry_on_failure(self):
        # Always retry, no matter how many attempts have failed.
        return True


# To use it, slot it in right after you initialize your praw.Reddit object:
reddit = praw.Reddit(client_id=...,
                     client_secret=...,
                     username=...,
                     password=...,
                     user_agent=...)
reddit._core._retry_strategy_class = InfiniteRetryStrategy

This will make PRAW infinitely retry every request it makes instead of giving up after a few.
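
To make the backoff schedule concrete, here is a small illustration of the upper bound on the randomized sleep per attempt with the defaults (_base=2, _cap=60); the actual sleep is a random number of seconds below that bound:

# Sleep upper bound for attempts 1 through 6 with _base=2 and _cap=60.
for attempts in range(1, 7):
    print(attempts, min(60, 2 * 2 ** attempts))
# Prints: 1 4, 2 8, 3 16, 4 32, 5 60, 6 60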

c0d3rman