This question may be a bit open-ended: I am trying to gather ideas on how to implement a BGP pipeline.
I am receiving 100-1000 messages (BGP updates) per second, a few kilobytes per update, over Kafka.
I need to archive them in a binary format with some metadata for fast lookup: I periodically build a "state" of the BGP table that merges all the updates received over a certain time window, hence the need for a database.
What I have been doing until now: grouping the updates into 5-minute files (messages concatenated end-to-end), as is common for BGP collection tools, and adding a link to each file in a database (a rough sketch is below). I see some disadvantages: it is complicated (having to group by key, manage Kafka offset commits) and there is no fine-grained control over where a state computation starts or ends.
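To make the current setup concrete, here is a minimal sketch of the windowed-file approach, assuming kafka-python, a hypothetical topic name `bgp-updates`, and ignoring the per-key grouping for brevity; offsets are committed only after a file is flushed, which is the bookkeeping I find error-prone:

```python
# Sketch only: batch Kafka messages into 5-minute binary files,
# committing offsets manually after each flush.
import time
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "bgp-updates",                  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="bgp-archiver",
    enable_auto_commit=False,       # commit manually, only after a file is written
)

WINDOW = 5 * 60                     # 5-minute files
buffer, window_start = [], time.time()

for msg in consumer:
    buffer.append(msg.value)        # raw BGP update bytes
    if time.time() - window_start >= WINDOW:
        # write all buffered updates end-to-end into one binary file
        path = f"updates-{int(window_start)}.bin"
        with open(path, "wb") as f:
            f.writelines(buffer)
        # only now is it safe to commit the Kafka offsets for this window
        consumer.commit()
        # ... insert `path` plus window metadata into the database here ...
        buffer, window_start = [], time.time()
```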
What I am thinking of instead: using a database (ClickHouse/Google Bigtable/Amazon Redshift) and inserting every single update with its metadata plus a link to the individual message stored on S3/Google Cloud Storage/a local file (sketched below).
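Something like the following, assuming ClickHouse via clickhouse-driver and S3 via boto3; the table layout, bucket name, and metadata columns are illustrative, not a fixed design:

```python
# Sketch only: one row per BGP update in ClickHouse, raw bytes in S3.
import uuid
import boto3                              # pip install boto3
from clickhouse_driver import Client      # pip install clickhouse-driver

ch = Client("localhost")
s3 = boto3.client("s3")
BUCKET = "bgp-archive"                    # hypothetical bucket

ch.execute("""
    CREATE TABLE IF NOT EXISTS bgp_updates (
        ts        DateTime,
        peer_asn  UInt32,
        prefix    String,
        s3_key    String
    ) ENGINE = MergeTree ORDER BY (ts, peer_asn)
""")

def archive_update(ts, peer_asn, prefix, raw_bytes):
    # ts: datetime.datetime of the update; raw_bytes: the binary BGP message.
    # Store the raw update as its own object, keep only metadata + link in the DB.
    key = f"updates/{ts:%Y/%m/%d}/{uuid.uuid4()}.bin"
    s3.put_object(Bucket=BUCKET, Key=key, Body=raw_bytes)
    ch.execute(
        "INSERT INTO bgp_updates (ts, peer_asn, prefix, s3_key) VALUES",
        [(ts, peer_asn, prefix, key)],
    )
```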
I am worried about download performance (most likely over HTTP), since compiling all the updates into a state may require fetching a few thousand of those messages (see the batch-download sketch below). Does anyone have experience batch-downloading at this scale? I also do not think storing the updates directly in the database would be optimal.
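For reference, this is roughly how I imagine the fetch step, assuming boto3 and a thread pool; the bucket name and key list are placeholders. My concern is whether the per-object HTTP overhead makes this noticeably slower than reading a handful of 5-minute files:

```python
# Sketch only: fetch a few thousand small S3 objects in parallel.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
BUCKET = "bgp-archive"                    # hypothetical bucket

def fetch(key):
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

def fetch_all(keys, workers=32):
    # Many small GETs are latency-bound, so parallelism matters more than
    # bandwidth; pool.map preserves key order, keeping the state merge deterministic.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```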
Any opinions, ideas, or suggestions? Thank you.