Strategy for handling and invalidating cached data on subscriptions in a moderately complex use case

Question

Let's take a chat application as an example. A user has access to multiple chat threads, each with multiple messages. The user interface consists of a list of threads on the left side (listThreads), showing for each thread the name of the other party, the last message, the unread message count and the date and time of the last message, and, on the right-hand side, the actual messages (viewThread) with a reply box (think Facebook Messenger).

When the user selects a message thread, the viewThread component subscribes to a query something along the lines of:

  query thread {
    threads(id: "xxxx") {
      id
      other_party { id name }
      unread_messages
      messages {
        sent_by { id }
        timestamp
        text
      }
    }
  }

To make updates live, it calls q.subscribeToMore with a subscription along the lines of:

  subscription newMessages {
    newMessage(thread_id: "xxx") {
      sent_by { id }
      timestamp
      text
    }
  }

This works perfectly, the new messages show up as they should.

To list the available message threads, a less detailed view of all threads is queried:

  query listThreads {
    threads {
      id
      other_party { id name }
      unread_messages
      last_updated_at
    }
  }

To keep the list in sync, the same subscription is used without filtering on thread_id, and the thread list data is updated manually.

This also works fine.
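The manual update of the thread list could look roughly like the sketch below. It is a pure reducer of the kind that might be called from an `updateQuery` callback; the field names follow the queries in the question, while the function name `applyNewMessage`, the `currentUserId` parameter and the presence of `thread_id` in the subscription payload are assumptions made for illustration.

```typescript
// Shape of one entry returned by the listThreads query in the question.
interface ThreadSummary {
  id: string;
  other_party: { id: string; name: string };
  unread_messages: number;
  last_updated_at: string;
}

// Assumption: the unfiltered subscription also selects thread_id,
// otherwise the client cannot tell which thread a message belongs to.
interface NewMessage {
  thread_id: string;
  sent_by: { id: string };
  timestamp: string;
  text: string;
}

// Hypothetical helper: returns a new list and leaves the input untouched,
// since cached results should not be mutated in place.
function applyNewMessage(
  threads: ThreadSummary[],
  msg: NewMessage,
  currentUserId: string,
): ThreadSummary[] {
  return threads.map((t) =>
    t.id !== msg.thread_id
      ? t
      : {
          ...t,
          last_updated_at: msg.timestamp,
          // Only messages sent by someone else count as unread.
          unread_messages:
            msg.sent_by.id === currentUserId
              ? t.unread_messages
              : t.unread_messages + 1,
        },
  );
}
```

Keeping this logic in a standalone function means the same reducer could be reused by any component or service that receives newMessage events, not only by the thread list.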

However, if thread A is selected, the messages of thread A are cached. If thread B is selected afterwards, the subscription that kept the detailed query for thread A up to date is torn down, since the observable is destroyed when the router exchanges the viewThread component.

If a message then arrives in thread A while the user is viewing thread B, the thread list is updated (since that subscription is live), but if the user switches back to thread A, the messages are loaded from the cache, which is now outdated, because no subscription for that particular thread was active to update or invalidate it.

When the user navigates to an entirely different page, where the thread list is not in view, the problem is even more obvious: nothing related to the chat messages is actively subscribed to, so nothing invalidates the cached data when a new message arrives, although the server theoretically provides a way to do that by offering new message subscription events.

My question is the following:

What are the best practices for keeping in sync, or invalidating, data that has been cached by Apollo but is not actively "in use"? What are the best practices for keeping nested data in sync (messages of threads of an event [see below])? I don't feel that implementing the logic for subscribing to and updating message data in the event query is a clean solution.

Using .subscribeToMore works for keeping actively used data in sync, but once that query is no longer in use the data remains in the cache and may become outdated over time. Is there a way to remove cached data when an observable goes out of scope? That is, keep the data cached only as long as at least one query is using it, because I trust that such a query also implements the logic that keeps it in sync based on the server push events.

Should a service be used that subscribes, for the whole lifecycle of the SPA, to all subscription events and knows how to update each type of cached data if it is present in the cache (for example, a service that subscribes to all newMessage events and pokes the cache accordingly)? This service could be told which data needs to be kept in sync, to avoid using more resources than necessary. Would that automatically emit new values for queries that returned objects referencing such data (would updating message:1 make a thread query that returned the same message:1 in its messages field emit a new value automatically)? Or do those queries also have to be updated manually?
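To make the idea of such an app-wide service concrete, here is a minimal, self-contained sketch. `ToyThreadCache` is a toy stand-in written for this example and is not Apollo's cache API; `MessageSyncService`, `onNewMessage` and the `thread_id` field on the message are likewise hypothetical names. The sketch only shows the intended behaviour: update an entry if it is cached, skip it otherwise, and let any active watcher be notified by the write.

```typescript
// Assumption: each newMessage event says which thread it belongs to.
interface Message {
  thread_id: string;
  sent_by: { id: string };
  timestamp: string;
  text: string;
}

interface CachedThread {
  id: string;
  messages: Message[];
}

// Toy stand-in for a normalized cache. This is NOT Apollo's cache API.
class ToyThreadCache {
  private threads = new Map<string, CachedThread>();
  private watchers = new Set<() => void>();

  read(id: string): CachedThread | undefined {
    return this.threads.get(id);
  }

  write(thread: CachedThread): void {
    this.threads.set(thread.id, thread);
    // Every write notifies all watchers, whoever performed the write.
    this.watchers.forEach((notify) => notify());
  }

  // Registers a watcher and returns a function that removes it again.
  watch(notify: () => void): () => void {
    this.watchers.add(notify);
    return () => {
      this.watchers.delete(notify);
    };
  }
}

// Hypothetical service that lives as long as the SPA and is fed by an
// unfiltered newMessage subscription.
class MessageSyncService {
  private readonly cache: ToyThreadCache;

  constructor(cache: ToyThreadCache) {
    this.cache = cache;
  }

  onNewMessage(msg: Message): void {
    const cached = this.cache.read(msg.thread_id);
    if (!cached) {
      return; // thread was never loaded, so there is nothing to go stale
    }
    this.cache.write({ ...cached, messages: [...cached.messages, msg] });
  }
}
```

In this toy model a component that watches thread A is notified even though the write came from the service rather than from its own subscription. Whether a given Apollo Client version re-emits watched queries in the same way after a direct cache write is something to verify against its documentation rather than take from this sketch.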

This becomes very cumbersome when the model is extended with, say, Events that also have their own chat thread: querying event { thread { messages { ... } } } now needs to subscribe to the newMessage subscription, which breaks encapsulation and the single responsibility principle. It is also problematic that subscribing to newMessage requires the id of the message thread associated with the event, which is not known before the query returns. Because of this, .subscribeToMore cannot be used, since at that point the thread_id is not available yet.

Please properly [format](https://stackoverflow.com/editing-help) your code blocks and inline code to improve the readability of this post. — Daniel Rearden, May 20 '20 at 11:41
@DanielRearden sorry about that, not sure why my formatting got removed. — Ákos Vandra-Meyer, May 20 '20 at 17:53
@ÁkosVandra - did you ever find a solution to this? If so, can you post it here? A year and seven months later I'm facing this exact same thing. — Justin Handley, Dec 21 '21 at 15:54
@JustinHandley I temporarily halted the personal project this was necessary for - and the pause took longer than expected haha, but I don't remember finding a real solution for this unfortunately. I think I experimented with the new graphql api and found a hacky solution there, but I'm not sure, sorry. If you end up finding a solution, I'd appreciate if you'd post it, so that I can use it when I finally can get back to my project. Becoming a dad shifts priorities :) — Ákos Vandra-Meyer, Dec 24 '21 at 08:42

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

If the intended behavior is "every time I open a thread, show the latest messages and not just what's cached", then you just need to set the fetchPolicy for your thread query to network-only, which will ensure that the request is always sent to the server rather than being fulfilled from the cache. The docs for apollo-angular are missing information about this option, but here's the description from the React docs:

Valid fetchPolicy values are:

cache-first: This is the default value where we always try reading data from your cache first. If all the data needed to fulfill your query is in the cache then that data will be returned. Apollo will only fetch from the network if a cached result is not available. This fetch policy aims to minimize the number of network requests sent when rendering your component.

cache-and-network: This fetch policy will have Apollo first trying to read data from your cache. If all the data needed to fulfill your query is in the cache then that data will be returned. However, regardless of whether or not the full data is in your cache this fetchPolicy will always execute query with the network interface unlike cache-first which will only execute your query if the query data is not in your cache. This fetch policy optimizes for users getting a quick response while also trying to keep cached data consistent with your server data at the cost of extra network requests.

network-only: This fetch policy will never return you initial data from the cache. Instead it will always make a request using your network interface to the server. This fetch policy optimizes for data consistency with the server, but at the cost of an instant response to the user when one is available.

cache-only: This fetch policy will never execute a query using your network interface. Instead it will always try reading from the cache. If the data for your query does not exist in the cache then an error will be thrown. This fetch policy allows you to only interact with data in your local client cache without making any network requests which keeps your component fast, but means your local data might not be consistent with what is on the server. If you are interested in only interacting with data in your Apollo Client cache also be sure to look at the readQuery() and readFragment() methods available to you on your ApolloClient instance.

no-cache: This fetch policy will never return your initial data from the cache. Instead it will always make a request using your network interface to the server. Unlike the network-only policy, it also will not write any data to the cache after the query completes.
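As a compact restatement of the list above, the following sketch encodes each policy's behaviour as data. The type and property names (`PolicyBehaviour`, `readsCacheFirst`, `network`, `writesResultToCache`) and the helper `policiesThatAlwaysRefetch` are invented for this summary and are not part of any Apollo API; only the policy names themselves come from the docs quoted above.

```typescript
type FetchPolicy =
  | "cache-first"
  | "cache-and-network"
  | "network-only"
  | "cache-only"
  | "no-cache";

// Hypothetical summary shape, derived from the descriptions quoted above.
interface PolicyBehaviour {
  readsCacheFirst: boolean; // may return cached data initially
  network: "on-miss" | "always" | "never"; // when a request is sent
  writesResultToCache: boolean; // whether the response is stored
}

const FETCH_POLICIES: Record<FetchPolicy, PolicyBehaviour> = {
  "cache-first": { readsCacheFirst: true, network: "on-miss", writesResultToCache: true },
  "cache-and-network": { readsCacheFirst: true, network: "always", writesResultToCache: true },
  "network-only": { readsCacheFirst: false, network: "always", writesResultToCache: true },
  "cache-only": { readsCacheFirst: true, network: "never", writesResultToCache: false },
  "no-cache": { readsCacheFirst: false, network: "always", writesResultToCache: false },
};

// Policies that send a request every time the query is executed.
function policiesThatAlwaysRefetch(): FetchPolicy[] {
  return (Object.keys(FETCH_POLICIES) as FetchPolicy[]).filter(
    (policy) => FETCH_POLICIES[policy].network === "always",
  );
}
```

Read this way, cache-and-network is the only policy that both returns cached data immediately and always refetches, which may be a reasonable middle ground when stale messages are a concern but an instant first render is still wanted.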

While this works around the problem, it is not a solution for it. Turning off the cache and fetching the data from the server each time is suboptimal; the cache's purpose should be exactly to avoid this. It would also mean that each query has to know whether there may be outdated data in the cache, unless the cache is turned off entirely. I'm looking for a way to use subscriptions for their original purpose: to keep cached data in sync, or to invalidate it. IMHO if the server provides a way, the cache should NEVER store outdated info. — Ákos Vandra-Meyer, May 20 '20 at 17:55
