
I have a design problem. I have an application which subscribes to a realtime system to show data. Essentially what happens is the client connects to a server, downloads a snapshot of the data at the current time, and subscribes for live updates, which are immediately displayed on the UI.

One problem we have is that we can open multiple realtime reports, which means we have multiple connections and unnecessary duplication of data. So we want to make a central data repository to hold all of the data and serve it to the reports, so that we only use one socket connection and only one set of data crosses the wire.

The problem I have is this. When a report subscribes to my data repository, it retrieves the snapshot at the present time, and then receives live updates afterward. That means my repository is updating its internal cache with the live updates from the server, and sending those updates to subscribed reports.

When another report connects to the repository, it also needs to download the current data and subscribe to updates. However, if updates come in while the snapshot is being downloaded, they will be missed by the report. I also can't lock the cache while the snapshot is being downloaded, because that would cause report 1 to stop updating while report 2 gets its snapshot.

How can I ensure that report 1 continues to get its updates, while report 2 downloads an unmolested snapshot and then begins to receive all the updates that it missed in the meantime, as well as future updates?

Sorry if this isn't clear. I am not always good at describing my problem :) The data that comes in is essentially rows in a table, which I then summarize into a tree. The rows can be identified by key fields, and my cache stores the latest copy of each row.
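
To make that concrete, here is a very rough sketch of the shape I mean (all the names here are invented for illustration, not our real code):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // One "row" from the feed, identified by its key fields.
    final class Row {
        final String key;      // the identifying key fields, combined into one string
        final Object[] values; // the latest values for this row
        Row(String key, Object[] values) { this.key = key; this.values = values; }
    }

    // What each report expects from the repository.
    interface ReportSubscriber {
        void onSnapshot(List<Row> snapshot); // current state at subscribe time
        void onUpdate(Row latest);           // every change after the snapshot
    }

    // The shared repository holding the latest copy of each row.
    class DataRepository {
        private final Map<String, Row> cache = new ConcurrentHashMap<>();
        // subscribe(ReportSubscriber) is the part I can't get right:
        // delivering the snapshot without missing updates or blocking other reports.
    }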

Thanks in advance!

Greg Olmstead

2 Answers


If I understand you correctly, you have 3 parts in your system:

  1. A realtime system that writes the information about reports
  2. A cache server where the information is stored
  3. Clients that get this information

Right?

If so, if I were you, I would develop a manager for the cache server and make two APIs, one for the realtime system and one for the clients, that they would use to work with the cache server. I would stick to the rule: one writer at a time with no readers, or many readers with no writer. I would make queues, one for client requests and one for the realtime system, and we need a synchronization mechanism for those queues (a rough sketch follows after the outline below).

I see the following way to make it work:

When the realtime system writes new information:

  1. There are client readers for reports that are currently updating

    1.1 The cache manager writes all new info to a second store for these reports.

    If the manager sees that there is new info, it stops new reader requests, puts them in the queue, waits until all threads that had already started reading are finished, and then merges the updates from the second store into the first.

  2. There are no readers

    2.1 Put the info into the main store and block readers on the reports that are being modified.

If your realtime system is really real-time (runs on a realtime processor) and writes all the time, you should add timeouts for merging the two stores and stop readers for that duration.
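
Here is a minimal sketch of the "one writer, or many readers" rule from the outline, with a ReadWriteLock standing in for the hand-rolled queues and the two-store merge collapsed into a single store (all names are invented for illustration):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class CacheManager {
        private final Map<String, Object[]> mainStore = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Called by the realtime feed: exclusive access, no readers while writing.
        void write(String key, Object[] row) {
            lock.writeLock().lock();
            try {
                mainStore.put(key, row);
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Called by a client that needs the current state: many readers may run
        // concurrently, but the writer waits until they finish (and vice versa).
        Map<String, Object[]> readSnapshot() {
            lock.readLock().lock();
            try {
                return new HashMap<>(mainStore);
            } finally {
                lock.readLock().unlock();
            }
        }
    }

The queue-based version in the outline does the same thing: acquiring the write lock corresponds to draining the reader queue before merging the second store into the first.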

Eugene Petrov
  • Hi Eugene, thanks for your response. One issue is that a report could be opened at any time, and must catch up with all of the data issued so far, but also subscribe to further updates, without stopping updates to the other subscribed reports. Your solution would work, but it would cause the existing reports to stop updating while the newly connected report "catches up", correct? This is part of my initial solution, but now I'm looking for a way to keep the existing reports going while I catch the new report up, and then have it be an observer like the rest of them – Greg Olmstead May 24 '12 at 13:35
  • Yes, you understood my thoughts correctly. I think you will have to find the point where you block the report from updating, because you can't write and read the same report concurrently; otherwise your users will get corrupted information. – Eugene Petrov May 25 '12 at 03:28
  • What if we put the responsibility for getting new info not on the client program but on the cache service? When your realtime system produces new information, it pushes it to the cache service through the API. The cache service updates the report in the cache, makes a packet (you can invent your own protocol) with the report's new info, and puts it on a queue; then a mechanism takes this info and sends it to the client. So we get a system where the client doesn't ask for updates but just connects and listens for new info. In this case we don't have to block anything. – Eugene Petrov May 25 '12 at 03:40
  • Except when your client asks for a report for the first time, of course. – Eugene Petrov May 25 '12 at 03:59
  • I like your idea. If the real-time requirement you specified allows some shift into the past, you can always include enough catch-up data in the "download" package. That way a new protocol won't be necessary, as long as you can guarantee the download duration. – Bamboo Nov 07 '12 at 08:44

Why not give every cache state a hash? All subscribers check with the manager for the current hash; if theirs is different, that triggers a download of only the updates.

I suggest you save the updates and allow clients to bring themselves up to the latest version before subscribing to updates. You can tweak how far back to store the updates.
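
A small sketch of that idea, assuming a plain monotonically increasing version number in place of a hash (all names are illustrative, not from any specific library):

    import java.util.ArrayList;
    import java.util.List;

    class VersionedRepository {
        private final List<Object> updateLog = new ArrayList<>(); // update i produces version i + 1
        private long version = 0;                                  // current cache version

        // The realtime feed pushes an update; the version advances by one.
        synchronized void applyUpdate(Object update) {
            updateLog.add(update);
            version++;
            // In a real system the log would be trimmed to however far back you want to keep.
        }

        synchronized long currentVersion() {
            return version;
        }

        // A client at clientVersion asks only for the updates it has missed,
        // then subscribes for live updates from currentVersion() onward.
        synchronized List<Object> updatesSince(long clientVersion) {
            return new ArrayList<>(updateLog.subList((int) clientVersion, updateLog.size()));
        }
    }

A new report would take the snapshot together with its version number, then call updatesSince with that version before switching over to the live update stream.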

Acute X