52

It's not clear to me how connection pools work, or how to use them properly, and I was hoping someone could elaborate. I've sketched out my use case below:

settings.py:

import redis

def get_redis_connection():
    return redis.StrictRedis(host='localhost', port=6379, db=0)

task1.py:

import settings

connection = settings.get_redis_connection()

def do_something1():
    return connection.hgetall(...)

task2.py:

import settings

connection = settings.get_redis_connection()

def do_something2():
    return connection.hgetall(...)

etc.


Basically I have a settings.py file that returns redis connections, and several different task files that each get a redis connection and then run operations. So each task file has its own redis instance (which is presumably very expensive). What's the best way of optimizing this process? Is it possible to use connection pools for this example? Is there a more efficient way of setting up this pattern?

For our system, we have over a dozen task files following this same pattern, and I've noticed our requests slowing down.

Thanks

vgoklani

3 Answers

35

Redis-py provides a connection pool from which you can retrieve connections. A connection pool maintains a set of connections that you can use as needed (and when you're done, the connection is returned to the pool for reuse). Trying to create connections on the fly without a pool (or without using the pool correctly) will leave you with far too many connections to redis (until you hit the connection limit).

You could choose to set up the connection pool in the init method and make the pool global (you can look at other options if you're uncomfortable with globals).

import os
import redis

redis_pool = None

def init():
    global redis_pool
    print("PID %d: initializing redis pool..." % os.getpid())
    redis_pool = redis.ConnectionPool(host='10.0.0.1', port=6379, db=0)

You can then retrieve the connection from a pool like this:

redis_conn = redis.Redis(connection_pool=redis_pool)
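
For instance, a task module might then look like this (a minimal sketch; it assumes the same module context as the code above, that init() has been called once at application startup, and the key name is hypothetical):

def do_something1():
    # Cheap to construct: each command borrows a connection from the
    # shared pool and returns it when the command completes.
    redis_conn = redis.Redis(connection_pool=redis_pool)
    return redis_conn.hgetall('some_hash')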

Also, I am assuming you are using hiredis along with redis-py, as it should improve performance in certain cases. Have you also checked the number of connections open to the redis server with your existing setup? It is most likely quite high. You can use the INFO command to get that information:

redis-cli info

Check the Clients section, where the "connected_clients" field will tell you how many connections are open to the redis server at that instant.
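
You can also read the same field from Python (a minimal sketch, assuming a local server on the default port):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# INFO's "clients" section includes the connected_clients field.
print(r.info('clients')['connected_clients'])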

ali haider
  • Completely agree with your comment. In my case, each task imports its own settings.py file, so "global" is ambiguous. I'm more than happy to refactor our approach, but I'm just wondering if there's a simpler solution. If we have 12 task files running, each with its own pool, then redis will be quite slow. – vgoklani Jul 31 '15 at 21:11
  • You could look at combining the factory and singleton patterns, though you may find the global route simpler to implement in Python. The global route will prevent you from spawning a new connection pool per module - you will be drawing connections from a single pool. – ali haider Aug 01 '15 at 18:31
  • perhaps I've misunderstood: if each instance of the tasks imports settings.py (so there are two settings.py modules running simultaneously in memory), won't each have its own global? – vgoklani Aug 01 '15 at 18:41
  • how many connections do you have open (did you try INFO)? If there are far more connections than there should be, can you confirm that you are releasing them (you can call release, not disconnect, when using the connection pool - mentioned earlier)? You could try using the builtin modules, but I think this approach would be easiest to implement (i.e. keep the code the same, explicitly use the connection pool so it's clear when reading the code, acquire a connection from the pool, and release the connection when it's no longer needed so it goes back to the pool). – ali haider Aug 01 '15 at 18:57
  • yes, to your earlier comment, each module does have its own global – ali haider Aug 01 '15 at 19:00
  • I don't get why some people create a pool and a connection in one method, line after line. Init should create a pool and then a method can use a connection from `self`. Thanks for explaining that `redis.Redis` actually retrieves a connection. Many examples around the web are wrong, imo. – Tom Wojcik May 14 '18 at 07:35
  • I'm confused by your reference to "the init method". Is this some feature of Python I'm not aware of, or are you assuming some framework is used? – Hubro Oct 06 '18 at 22:46
13

You can use a singleton (Borg pattern) wrapper around redis-py, which will provide a common connection pool to all your files. Whenever you use an instance of this wrapper class, it will use the same connection pool.

# settings.py
REDIS_SERVER_CONF = {
    'servers': {
        'main_server': {
            'HOST': 'X.X.X.X',
            'PORT': 6379,
            'DATABASE': 0,
        }
    }
}

import redis
import settings

class RedisWrapper(object):
    shared_state = {}

    def __init__(self):
        # Borg pattern: every instance shares the same state dict.
        self.__dict__ = self.shared_state

    def redis_connect(self, server_key):
        # Create each server's pool only once; later calls reuse it.
        pools = self.shared_state.setdefault('pools', {})
        if server_key not in pools:
            conf = settings.REDIS_SERVER_CONF['servers'][server_key]
            pools[server_key] = redis.ConnectionPool(host=conf['HOST'],
                                                     port=conf['PORT'],
                                                     db=conf['DATABASE'])
        return redis.StrictRedis(connection_pool=pools[server_key])

Usage:

r_server = RedisWrapper().redis_connect(server_key='main_server')
r_server.ping()
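
A quick way to convince yourself that the pool really is shared (a sketch that continues the code above and relies on the pool caching in redis_connect):

r1 = RedisWrapper().redis_connect(server_key='main_server')
r2 = RedisWrapper().redis_connect(server_key='main_server')
# Both clients are backed by the very same pool object.
assert r1.connection_pool is r2.connection_pool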

UPDATE

In case your files run as different processes, you will have to use a redis proxy that pools the connections for you; instead of connecting to redis directly, you connect to the proxy. A very stable redis (and memcached) proxy is twemproxy, created by Twitter, whose main purpose is to reduce the number of open connections.
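
On the client side, the only change is the address you connect to (a minimal sketch; the 127.0.0.1:22121 listen address is hypothetical and depends on your twemproxy configuration):

import redis

# Connect to twemproxy instead of Redis itself; the proxy multiplexes many
# clients over a small number of upstream connections. Note that twemproxy
# supports only a subset of commands (no MULTI/EXEC, pub/sub, or blocking
# commands).
r = redis.StrictRedis(host='127.0.0.1', port=22121)
r.set('foo', 'bar')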

DhruvPathak
  • I agree that the singleton pattern is correct, but it's not clear to me how each task file could share the same RedisWrapper. For example, if each task file imports redis_wrapper, then they won't share the same object. It would make sense if the wrappers ran the instances in separate threads, but that's another can of worms – vgoklani Aug 04 '15 at 15:00
  • @vgoklani I was under the assumption that all those modules are part of the same process. If they run as separate processes, you can use a connection pooling proxy. I have updated my answer accordingly. – DhruvPathak Aug 04 '15 at 16:46
  • Seems like we will assimilate this example. We will add your creative singleton nickname to our own. Resistance is futile. – DarkCygnus Nov 13 '18 at 21:09
  • What is `shared_state` for here? – ruslan_krivoshein Nov 30 '19 at 12:15
  • The code for RedisWrapper should assign the connection pool to an instance variable if you want the Borg pattern not to instantiate a new connection pool, i.e. `self.pool = redis.ConnectionPool(....)` – canadadry Sep 16 '20 at 23:37
9

Here's a quote right from the Cheese Shop page.

Behind the scenes, redis-py uses a connection pool to manage connections to a Redis server. By default, each Redis instance you create will in turn create its own connection pool. You can override this behavior and use an existing connection pool by passing an already created connection pool instance to the connection_pool argument of the Redis class. You may choose to do this in order to implement client side sharding or have finer grain control of how connections are managed.

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

Moreover, instances are thread-safe:

Redis client instances can safely be shared between threads. Internally, connection instances are only retrieved from the connection pool during command execution, and returned to the pool directly after. Command execution never modifies state on the client instance.
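
For example, a single client instance can be shared across threads (a minimal sketch, assuming a local server on the default port):

import threading
import redis

# One shared client; each command borrows a connection from the client's
# internal pool and returns it when the command finishes.
client = redis.StrictRedis(host='localhost', port=6379, db=0)

def worker():
    client.incr('counter')

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(client.get('counter'))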

You say:

So each task file has its own redis instance (which presumably is very expensive). ... For our system, we have over a dozen task files following this same pattern, and I've noticed our requests slowing down.

It's quite unlikely that a few dozen connections can slow down the Redis server. And because your code uses connection pools behind the scenes, the problem lies somewhere other than the connections per se. Redis is an in-memory store and thus very fast in most imaginable cases, so I would rather look for the problem in the tasks.

Update

From the comment by @user3813256: yes, the OP does use connection pools, but one per task. The normal way to utilize the built-in connection pool of the redis package is simply to share the client instance. In the simplest case, your settings.py may look like this:

import redis

connection = None

def connect_to_redis():
    # One shared client for the whole process; redis.StrictRedis manages
    # its own connection pool internally.
    global connection
    connection = redis.StrictRedis(host='localhost', port=6379, db=0)

Then, somewhere in the bootstrapping of your application, call connect_to_redis. Task modules can then use import settings and refer to settings.connection (a plain from settings import connection would capture the initial None), as sketched below.
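
A task module would then look something like this (a sketch mirroring the task1.py from the question; the key name is hypothetical):

# task1.py
import settings

def do_something1():
    # settings.connection is looked up at call time, so it sees the client
    # created by connect_to_redis() during bootstrap.
    return settings.connection.hgetall('some_hash')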

saaj
  • OP is not using a connection pool! He is spawning a new connection from every module (and he does not show whether he is closing the connection anywhere). Yes, the issue could be elsewhere, but the connections (based on the code shared) are not coming from a pool, nor are they being closed. –  Jul 31 '15 at 13:14
  • @user3813256 Strictly speaking, from what I emphasised in the quote, you're wrong. Instantiating `redis.StrictRedis` creates a pool, which OP is indeed using. What you're right about is that he does it per task, which is likely suboptimal. I've updated the answer. – saaj Jul 31 '15 at 14:01
  • the tricky thing is that each "task" imports its own settings.py file, so there is no global state. – vgoklani Jul 31 '15 at 21:09