
I am working on an application that uses Amazon Kinesis, and I was wondering how you can roll over an application during an upgrade without losing data on its streams. I have heard of approaches such as blue/green deployments, but what is the best practice for upgrading a data streaming service so that you don't lose data from your streams?

For example, my application has an HTTP endpoint that ingests data as a series of POST operations. If I want to replace the service with a newer version, how do I handle the existing clients that are streaming to my endpoint?

Eric Kolotyluk

1 Answer


One common method is to put a software load balancer (LB) with a virtual IP in front of the service; during normal operation, at least two HTTP ingestion endpoints sit behind this LB. During an upgrade, each endpoint in turn is announced out (taken out of the LB's rotation), upgraded, and announced back in. The LB ensures that no traffic is forwarded to an endpoint that has been announced out.

(The endpoints themselves can be on separate VMs, Docker containers or physical nodes).
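The LB behaviour described above can be sketched as a toy round-robin pool. This is an illustration only, not how any particular load balancer is implemented; the `LoadBalancer` class and its `announce_out`, `announce_in`, and `pick` methods are hypothetical names. Announcing an endpoint out keeps it registered but excludes it when new streams are assigned.

```python
import itertools
import threading


class LoadBalancer:
    """Hypothetical toy round-robin LB over a fixed set of endpoints.

    An endpoint that has been "announced out" stays registered but is
    skipped when a new stream is assigned.
    """

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._in_service = set(self._endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        self._lock = threading.Lock()

    def announce_out(self, endpoint):
        # Stop routing *new* streams to this endpoint.
        with self._lock:
            self._in_service.discard(endpoint)

    def announce_in(self, endpoint):
        # Put a registered endpoint back into rotation.
        with self._lock:
            if endpoint in self._endpoints:
                self._in_service.add(endpoint)

    def pick(self):
        """Return the next in-service endpoint in round-robin order."""
        with self._lock:
            # One full lap is enough to find an in-service endpoint if any exists.
            for _ in range(len(self._endpoints)):
                candidate = next(self._cycle)
                if candidate in self._in_service:
                    return candidate
            raise RuntimeError("no endpoint in service")


lb = LoadBalancer(["A", "B"])
lb.announce_out("A")
print([lb.pick() for _ in range(3)])  # only "B" while A is announced out
lb.announce_in("A")
```

A real LB would also health-check its endpoints; here, being "in service" is purely an administrative flag.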

Of course, the stream needs to be finite, since the TCP socket/HTTP stream is owned by one particular endpoint. However, as long as the stream can be stopped gracefully, the following flow works, assuming endpoint A owns the current ingestion:

  1. Tell endpoint A not to accept new streams. All new streams will be redirected only to endpoint B by the LB.
  2. Gracefully stop existing streams on endpoint A.
  3. Upgrade A.
  4. Announce A back in.
  5. Rinse and repeat with endpoint B.
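The five steps can be sketched as a rolling-upgrade loop. This is a simulation under stated assumptions: `Endpoint`, `rolling_upgrade`, and the `in_service` set (a stand-in for the LB's backend pool) are hypothetical, and "upgrading" is just reassigning a version string where a real deployment would redeploy the VM, container, or node. The drain step refuses new streams, then waits, with a timeout, for the open ones to finish.

```python
import threading


class Endpoint:
    """Hypothetical ingestion endpoint that tracks its open streams."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self._accepting = True
        self._active = 0
        self._cond = threading.Condition()

    def open_stream(self):
        with self._cond:
            if not self._accepting:
                raise RuntimeError(f"{self.name} is not accepting new streams")
            self._active += 1

    def close_stream(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()  # wake a pending drain()

    def drain(self, timeout=None):
        """Refuse new streams, then wait for open ones to end.

        Returns True if every stream finished before the timeout.
        """
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._active == 0, timeout)

    def reopen(self):
        with self._cond:
            self._accepting = True


def rolling_upgrade(endpoints, in_service, new_version, drain_timeout=30.0):
    """Upgrade endpoints one at a time (hypothetical helper)."""
    for ep in endpoints:
        in_service.discard(ep.name)       # 1. LB sends new streams elsewhere
        if not ep.drain(drain_timeout):   # 2. let existing streams finish
            # Abort: return ep to service on its old version instead of
            # cutting off live streams.
            ep.reopen()
            in_service.add(ep.name)
            raise TimeoutError(f"{ep.name} still has open streams")
        ep.version = new_version          # 3. stand-in for the real redeploy
        ep.reopen()
        in_service.add(ep.name)           # 4. announce back in
        # 5. the loop repeats with the next endpoint
```

Whether to abort or force-close streams that do not drain in time is a policy choice; this sketch aborts so that no in-flight data is cut off.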

As a side point, you need at least two endpoints in a load-balanced (or master/slave) set-up if you require any reasonable uptime and reliability guarantees.

There are more bespoke methods that allow a hot code swap on the same endpoint, but they rely on a specific internal design (e.g. separating the networking and processing stacks into different processes connected by IPC).
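A minimal sketch of the hot-swap idea, with loud simplifications: threads stand in for the separate networking and processing processes, and a `queue.Queue` stands in for the IPC channel (a real set-up might use a pipe, a socket, or a message broker instead). The names `processor`, `swap_processor`, and `STOP` are hypothetical. The networking side keeps writing to the queue while the old processing stage is stopped and a new version is started, so records that arrive during the swap are buffered rather than dropped.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel asking the current processor to exit


def processor(version, inbox, results):
    """Processing stage: consume records until the STOP sentinel arrives."""
    while True:
        record = inbox.get()
        if record is STOP:
            return
        results.append((version, record))


def swap_processor(old_thread, inbox, new_version, results):
    """Replace the processing stage without touching the networking stage.

    STOP is queued behind any records already waiting, so the old version
    finishes those first; anything queued later goes to the new version.
    """
    inbox.put(STOP)
    old_thread.join()
    new_thread = threading.Thread(
        target=processor, args=(new_version, inbox, results)
    )
    new_thread.start()
    return new_thread


inbox = queue.Queue()  # stand-in for the IPC channel
results = []
worker = threading.Thread(target=processor, args=("v1", inbox, results))
worker.start()
for i in range(3):      # "networking stage" forwarding records
    inbox.put(i)
worker = swap_processor(worker, inbox, "v2", results)
for i in range(3, 6):
    inbox.put(i)
inbox.put(STOP)
worker.join()
print(results)
```

Because the queue is FIFO and only one processor runs at a time, every record is handled exactly once and in arrival order in this sketch.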

RomanK