I am currently developing a tool where one client (A) sends a continuous data stream (plain text) to a server, and another client (B) should be able to watch that stream in real time, i.e. fetch the same data back from the server. Of course the server should not send all of the available data to client B on every request, since the stream can grow to a lot of text, so I am trying to work out how to design it so that client B only fetches the newest data.
My first approach was something similar to pagination: client B sends an extra attribute like `client_lines = 10` to the server, indicating how many lines of data it already possesses, and the server then queries the database with `where lines > client_lines`.
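To make this more concrete, here is a rough sketch of what I have in mind, using sqlite3 just for illustration (the table and column names `stream_lines`, `user_id`, `line_number`, `content` are placeholders, not my real schema):

```python
import sqlite3

# Placeholder schema: one row per line of streamed text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stream_lines (
        user_id     TEXT,
        line_number INTEGER,
        content     TEXT
    )
""")

# Client A appends lines to the table as they arrive.
conn.executemany(
    "INSERT INTO stream_lines (user_id, line_number, content) VALUES (?, ?, ?)",
    [("user_a", i, f"line {i}") for i in range(1, 21)],
)

def fetch_new_lines(user_id, client_lines):
    """Return only the lines client B does not have yet (line_number > client_lines)."""
    cur = conn.execute(
        "SELECT line_number, content FROM stream_lines "
        "WHERE user_id = ? AND line_number > ? ORDER BY line_number",
        (user_id, client_lines),
    )
    return cur.fetchall()

# Client B already has 10 lines, so it only receives lines 11..20.
print(fetch_new_lines("user_a", 10))
```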
But the database can grow quite large, since it holds data for many users, each of whom sends streams with many lines of text. Querying the complete table, which mixes data from different users, does not seem like a very efficient solution.
Is there a smarter approach? Maybe using a NoSQL database like MongoDB?