
I'm looking into adding an in-memory cache, like Redis, to my application, and I'm having some trouble understanding how it all fits together.

const postLoader = new DataLoader(async keys => {
    // ... sql code to get multiple posts by ids
})

const postRepository = {
    async get(id) {
        let post = await cachingImplementation.get("post:" + id)

        if (!post) {
            post = await postLoader.load(id)
        }

        return post
    }
}

I understand the need to batch queries to the database, but does the same principle apply to queries to a Redis server?

In this scenario, if I call the postRepository.get method 10 times within the same tick, I would make 10 separate requests to the Redis server.

Is this a problem? Should I move the actual fetching (cache or database) inside the DataLoader resolver, so that instead of executing the SQL directly, it would first look in the cache and then fall back to the database?

For example

cache = {
  1: ...,
  2: ...
}

If I ask for posts with ids 1, 2, 3, the cache has only two of them. So I would have to either filter out the ids that are already cached and query the database only for the remaining ones, or check whether the number of returned rows matches the number of requested ids and, if it doesn't, query the database for everything.
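The first option, filtering, can be sketched as a small helper. The names here are hypothetical, and a plain object stands in for the Redis lookup:

```javascript
// Split requested ids into cache hits and misses.
// `cache` is a plain object standing in for the Redis lookup.
function partitionKeys(cache, keys) {
  const hits = {};
  const misses = [];
  for (const key of keys) {
    if (cache[key] !== undefined) {
      hits[key] = cache[key];
    } else {
      misses.push(key);
    }
  }
  return { hits, misses };
}

// With the cache above holding posts 1 and 2:
const cache = { 1: { id: 1 }, 2: { id: 2 } };
const { hits, misses } = partitionKeys(cache, [1, 2, 3]);
// hits contains posts 1 and 2; misses is [3], so only id 3 needs SQL
```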

What are the downsides of both approaches? Is there a preferred solution?

Diyan Slavov

1 Answer

  1. The main cost of making multiple requests to any database, versus aggregating those requests into one, is networking. You don't eliminate this cost by using an in-memory database, so I suggest you use the cache inside the data loader.
  2. Gradually populating the cache is the way to go, but don't forget to put an expiration time (TTL) on your Redis keys, because you may run out of memory pretty soon if you cache every entity.
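To illustrate point 2: with a real client, expiry is just `SET key value EX seconds` on the Redis side. Here is a minimal in-memory sketch of the same TTL idea (not a Redis client, purely illustrative):

```javascript
// Minimal in-memory sketch of TTL-based expiry, mirroring what
// Redis does with `SET key value EX seconds`. Illustrates why every
// cached entity should carry a deadline.
class TtlCache {
  constructor() {
    this.entries = new Map();
  }
  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazy eviction, as Redis also does
      return undefined;
    }
    return entry.value;
  }
}

const postCache = new TtlCache();
postCache.set('post:1', { id: 1 }, 60_000); // expire after one minute
```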

Depending on your application, an API-layer cache may be a better option for you. Check out the Apollo Server caching docs if you are using GraphQL.

In our system, we call the data-loader layer the API aggregation layer. What you want to achieve here is an API aggregation layer cache. I suggest that you generalize this cache regardless of the data model and use a higher-order function whenever you want to cache a data loader.

const memo = (type, loadData) => {
  return async (keys) => {
    const { cacheData, notFoundKeys } = await loadFromRedis(type, keys);
    if (notFoundKeys.length > 0) {
      const loadedData = await loadData(notFoundKeys);
      // write back the freshly loaded rows, not the cache hits
      await populateCache(type, notFoundKeys, loadedData);
      notFoundKeys.forEach((key, i) => {
        cacheData[key] = loadedData[i];
      });
    }
    // DataLoader expects results in the same order as the requested keys
    return keys.map(key => cacheData[key]);
  }
}

const postLoaderMemoized = memo('post', async keys => {
  // ... sql code to get multiple posts by ids
})

const postLoader = new DataLoader(postLoaderMemoized)
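The `loadFromRedis` and `populateCache` helpers above are left undefined; here is one way they might look, sketched against a plain `Map` standing in for a real Redis connection (with a real client you would swap in batched `mget`/`mset` or a pipeline):

```javascript
// Stand-in for a Redis connection; replace with a real client
// (e.g. ioredis) and batched mget/mset calls in production.
const redisStandIn = new Map();

// Fetch many keys at once and report which ones were missing.
async function loadFromRedis(type, keys) {
  const cacheData = {};
  const notFoundKeys = [];
  for (const key of keys) {
    const value = redisStandIn.get(`${type}:${key}`);
    if (value !== undefined) cacheData[key] = value;
    else notFoundKeys.push(key);
  }
  return { cacheData, notFoundKeys };
}

// Write freshly loaded rows back so the next request hits the cache.
async function populateCache(type, keys, loadedData) {
  keys.forEach((key, i) => redisStandIn.set(`${type}:${key}`, loadedData[i]));
}
```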
HosseinAgha
  • I did go through the Apollo docs for caching, but they don't seem to provide an expiration mechanism other than timeouts. The application that I am working on allows users to create posts and other entities and edit them, so time-based expiry is not a good fit, at least not on first sight. I have the necessity to manually invalidate the cache from some mutations. Do they happen to provide an API for this that I have missed? – Diyan Slavov Jul 25 '20 at 09:04
  • So a Redis (data-loader) cache may be a good option for you (my solution above is for a Redis cache). I just mentioned the Apollo cache as it is sometimes a better fit for some applications. I still think you need to use Redis expirations in addition to cache invalidation inside your mutations: storing all the viewed posts in Redis may fill your machine's memory pretty soon unless you have a limited number of posts. – HosseinAgha Jul 25 '20 at 09:19
  • Yes, I will use expiration timeouts, but they will generally be longer than what I would consider with Apollo caching. Do you happen to know any patterns that would allow the "concerns" to be separated a bit? I'm not especially fond of mixing cache queries with database queries; it seems too tightly coupled. – Diyan Slavov Jul 25 '20 at 09:26
  • We make database requests in the _data access layer_; our _platform (application) layer_ functions then call the data layer to access the database; we call platform-layer functions from the _API aggregation layer_ (data loaders); and finally the _API layer_ calls the data loaders (and sometimes the platform layer directly). You can put a Redis cache before any of these layers, but I am fond of the API aggregation layer cache using higher-order functions (in your case). I also suggest you take a look at [prisma](https://www.prisma.io/) if you are using SQL and need a more organized data access layer. – HosseinAgha Jul 25 '20 at 09:37