
This question may be somewhat open-ended; I am trying to gather ideas on how to implement a BGP pipeline.

I am receiving 100-1000 messages (BGP updates) per second, a few kilobytes per update, over Kafka.

I need to archive them in a binary format with some metadata for fast lookup: I periodically build a "state" of the BGP table that merges all the updates received over a certain time window, hence the need for a database.
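Roughly, the merge I have in mind looks like this (a minimal sketch; BgpUpdate here is a hypothetical, stripped-down model of an update, the real messages carry more fields):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StateBuilder {

    // Hypothetical, stripped-down model of a BGP update:
    // a prefix plus either an announcement (with attributes) or a withdrawal.
    record BgpUpdate(String prefix, boolean withdrawal, String attributes) {}

    // Replay updates in arrival order to rebuild the table state at a point in time.
    static Map<String, String> buildState(List<BgpUpdate> updatesInOrder) {
        Map<String, String> rib = new HashMap<>(); // prefix -> latest attributes
        for (BgpUpdate u : updatesInOrder) {
            if (u.withdrawal()) {
                rib.remove(u.prefix());              // a withdrawal deletes the route
            } else {
                rib.put(u.prefix(), u.attributes()); // an announcement overwrites it
            }
        }
        return rib;
    }
}
```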

What I was doing until now: grouping them into "5 minute" files (messages stored end-to-end), as is common for BGP collection tools, and adding a link to each file in a database. I have realized this has some disadvantages: it is complicated (having to group by key, manage Kafka offset commits), and there is no fine-grained selection of where to start/end.

What I am thinking: using a database (ClickHouse/Google Bigtable/Amazon Redshift) and inserting every single entry with its metadata and a link to the individual update stored on S3/Google Cloud Storage/a local file.
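A minimal sketch of that write path, assuming S3 as the object store and a generic SQL database reachable over JDBC (the bucket, table, column names, and key scheme are made up):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class UpdateWriter {

    // Store the raw update in object storage and its metadata plus a link in the database.
    public static void archive(S3Client s3, Connection db, byte[] rawUpdate,
                               String peer, Instant receivedAt) throws Exception {
        String key = "bgp/" + receivedAt.toEpochMilli() + "-" + peer;   // hypothetical key scheme

        s3.putObject(PutObjectRequest.builder()
                        .bucket("bgp-archive")                          // hypothetical bucket
                        .key(key)
                        .build(),
                RequestBody.fromBytes(rawUpdate));

        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO bgp_updates (ts, peer, s3_key) VALUES (?, ?, ?)")) {
            ps.setTimestamp(1, Timestamp.from(receivedAt));
            ps.setString(2, peer);
            ps.setString(3, key);
            ps.executeUpdate();
        }
    }
}
```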

I am worried about download performance (most likely over HTTP), since compiling all the updates into a state may require a few thousand of those messages. Does anyone have experience with batch downloading like this? I also do not think storing the updates directly in the database would be optimal.
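For reference, the kind of batch download I am worried about would look roughly like this (a sketch using the AWS SDK v2 and a fixed thread pool; the bucket name and key list are placeholders):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchDownloader {

    // Fetch a few thousand small objects concurrently instead of one by one.
    public static List<byte[]> download(List<String> keys) throws Exception {
        S3Client s3 = S3Client.create();                          // thread-safe client
        ExecutorService pool = Executors.newFixedThreadPool(32);  // tune for latency vs. cost

        List<Future<byte[]>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(pool.submit(() -> s3.getObjectAsBytes(
                    GetObjectRequest.builder()
                            .bucket("bgp-archive")                // hypothetical bucket
                            .key(key)
                            .build())
                    .asByteArray()));
        }

        List<byte[]> updates = new ArrayList<>();
        for (Future<byte[]> f : futures) {
            updates.add(f.get());                                 // propagate download errors
        }
        pool.shutdown();
        return updates;
    }
}
```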

Any opinions, ideas, or suggestions? Thank you.

2 Answers


What I was doing until now: grouping them into "5 minute" files (messages stored end-to-end), as is common for BGP collection tools, and adding a link to each file in a database. I have realized this has some disadvantages: it is complicated (having to group by key, manage Kafka offset commits), and there is no fine-grained selection of where to start/end.

Why don't you try Kafka Streams, which gives you windowing, and then just group by key and dump the result into the database? With Kafka Streams you won't have to worry about grouping by key and many of the other issues you mentioned.
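A minimal sketch of that windowed grouping, assuming a recent Kafka Streams version, string keys and values, and 5-minute tumbling windows (topic name, application id, and the final write step are placeholders):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class WindowedWriter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("bgp-updates", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()                                                    // e.g. key = peer/collector
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .aggregate(
                        () -> "",
                        (key, update, agg) -> agg.isEmpty() ? update : agg + "\n" + update,
                        Materialized.with(Serdes.String(), Serdes.String()))
                .toStream()
                .foreach((windowedKey, batch) -> {
                    // write the per-key, per-window batch to the database here
                });

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bgp-window-writer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```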

If Kafka Streams is not an option for you, then just store each update message in the database one at a time, and the database reader can group by time window and key.
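That reader can be quite simple; the following is a sketch assuming a hypothetical bgp_updates table with ts, peer_key, and payload columns:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

public class WindowReader {

    // Read all updates for one time window, ordered by key, and hand them to the state builder.
    public static void readWindow(Connection db, Instant start, Instant end) throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "SELECT peer_key, payload FROM bgp_updates " +
                "WHERE ts >= ? AND ts < ? ORDER BY peer_key, ts")) {
            ps.setTimestamp(1, Timestamp.from(start));
            ps.setTimestamp(2, Timestamp.from(end));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String key = rs.getString("peer_key");
                    byte[] payload = rs.getBytes("payload");
                    // feed (key, payload) into the merge/state-building step
                }
            }
        }
    }
}
```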

mrnakumar

Cloud Bigtable is capable of 10,000 requests per second per node and costs $0.65 per node per hour. The smallest production cluster is 3 nodes, for a total of 30,000 requests per second. Your application calls for a maximum of 1,000 requests per second. While Cloud Bigtable can handle your workload, I would suggest that you consider Firestore.

At a couple of kilobytes per message, I would also consider putting the entire value in the database rather than just the metadata, for ease of use.
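A sketch of what that could look like with the Cloud Bigtable Java client (project, instance, table, column family names, and the row-key scheme are placeholders):

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;
import com.google.protobuf.ByteString;

public class BigtableWriter {

    // Store the full raw update next to its metadata in a single row.
    public static void store(byte[] rawUpdate, String peer, long receivedAtMillis) throws Exception {
        try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
            String rowKey = peer + "#" + receivedAtMillis;      // hypothetical row-key scheme
            RowMutation mutation = RowMutation.create("bgp_updates", rowKey)
                    .setCell("meta", "peer", peer)
                    .setCell("meta", "ts", Long.toString(receivedAtMillis))
                    .setCell("raw", ByteString.copyFromUtf8("update"), ByteString.copyFrom(rawUpdate));
            client.mutateRow(mutation);
        }
    }
}
```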

Solomon Duskis