Okay, here is my setup. I'm on Heroku running a scrapyd daemon using the scrapy-heroku package https://github.com/dmclain/scrapy-heroku.

I'm running out of database connections, so I decided to pool them using pgbouncer. I'm using this buildpack: https://github.com/heroku/heroku-buildpack-pgbouncer

My Procfile was: web: scrapyd

And I changed it to: web: bin/start-pgbouncer-stunnel scrapyd

The buildpack is supposed to rewrite DATABASE_URL when it initializes, so that whatever child process it runs can use DATABASE_URL as normal but will connect to pgbouncer instead of directly to the database.

Within Scrapy I'm using adbapi to create a pool for each spider, like this:

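from twisted.enterprise import adbapi

# Method of the middleware class (class body omitted).
@classmethod
def from_settings(cls, settings):
    # Connection arguments come from the settings module; note that no port is passed.
    dbargs = dict(
        host=settings['MYSQL_HOST'],
        database=settings['MYSQL_DBNAME'],
        user=settings['MYSQL_USER'],
        password=settings['MYSQL_PASSWD'],
        #charset='utf8',
        #use_unicode=True,
    )
    # One small connection pool per spider.
    dbpool = adbapi.ConnectionPool('psycopg2', cp_max=2, cp_min=1, **dbargs)
    return cls(dbpool)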

And in my settings this is how I'm getting the DATABASE_URL info:

import os
import urlparse

# Let urlparse split the netloc of postgres:// URLs.
urlparse.uses_netloc.append("postgres")
url = urlparse.urlparse(os.environ["DATABASE_URL"])
MYSQL_HOST = url.hostname
MYSQL_DBNAME = url.path[1:]
MYSQL_USER = url.username
MYSQL_PASSWD = url.password
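
One thing worth checking: the settings above read the host, database name, user and password from DATABASE_URL but never read the port, so psycopg2 will use its default of 5432. If the rewritten URL points at a different port, that part of it is silently dropped. Below is a minimal sketch of the same parsing that also keeps the port. The helper name db_settings_from_url, the sample URL and the port 6000 are made up for illustration and are not taken from my dyno or confirmed against the buildpack.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as in the settings above


def db_settings_from_url(database_url, default_port=5432):
    """Split a postgres:// URL into connection keyword arguments, keeping the port.

    Hypothetical helper for illustration; falls back to default_port
    when the URL does not specify one.
    """
    url = urlparse(database_url)
    return dict(
        host=url.hostname,
        port=url.port or default_port,
        database=url.path[1:],  # strip the leading "/"
        user=url.username,
        password=url.password,
    )


# Hypothetical URL: the credentials and the port are illustrative only.
example = db_settings_from_url("postgres://user:secret@127.0.0.1:6000/mydb")
```

The resulting dict could be passed to adbapi.ConnectionPool('psycopg2', ...) in place of dbargs, since psycopg2.connect accepts a port keyword.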

This was working fine before I added the pgbouncer buildpack. Now I get connection errors:

Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 57, in robustApply
    return receiver(*arguments, **named)
  File "/tmp/etc/etc/etc/middlewares.py", line 92, in spider_opened
  File "/app/.heroku/python/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: could not connect to server: Connection refused
    Is the server running on host "127.0.0.1" and accepting
    TCP/IP connections on port 5432?

Does anyone have an idea what the issue may be?

jeffjv