
I want to create a custom Apache Flink sink to the AWS SageMaker Feature Store, but there is no documentation on Flink's website for how to create custom sinks. There are also multiple base classes that I could potentially extend (e.g. AsyncSinkBase, RichSinkFunction), so I'm not sure which to use.

I am looking for guidelines on how to implement a custom sink (both in general and for my specific use case). For my specific use case: SageMaker Feature Store has a synchronous client with a putRecord call for sending records to AWS SageMaker FS, so I am ideally looking for a way to create a custom sink that works well with this client. Note: I require at-least-once processing guarantees, as SageMaker FS is DynamoDB (a key-value store) under the hood, so a retried write simply overwrites the same record.
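For reference, this is roughly the client call I mean (a sketch assuming the AWS SDK for Java v2 sagemakerfeaturestoreruntime module; the feature group and feature names are made up):

```java
import java.util.List;

import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.SageMakerFeatureStoreRuntimeClient;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.FeatureValue;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.PutRecordRequest;

public class PutRecordExample {
    public static void main(String[] args) {
        // Synchronous client: putRecord blocks until the feature store acknowledges the write.
        try (SageMakerFeatureStoreRuntimeClient client = SageMakerFeatureStoreRuntimeClient.create()) {
            client.putRecord(PutRecordRequest.builder()
                    .featureGroupName("my-feature-group") // made-up group name
                    .record(List.of(
                            FeatureValue.builder().featureName("id").valueAsString("42").build()))
                    .build());
        }
    }
}
```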

What I've Found So Far

  • Some older articles that say to use org.apache.flink.streaming.api.functions.sink.RichSinkFunction and SinkFunction (a rough sketch of this approach is below the list)
  • Some connectors using classes in org.apache.flink.connector.base.sink.writer (e.g. AsyncSinkWriter, AsyncSinkBase)
  • This section of the Flink docs says to use SourceReaderBase from org.apache.flink.connector.base.source.reader when creating custom sources; SourceReaderBase seems to be the source-side equivalent of the sink classes in the bullet above
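Here is my rough understanding of what the RichSinkFunction approach from the first bullet would look like for my case (a sketch only, with the same assumed AWS SDK v2 client as above; String stands in for my real record type):

```java
import java.util.List;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.SageMakerFeatureStoreRuntimeClient;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.FeatureValue;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.PutRecordRequest;

public class FeatureStoreSink extends RichSinkFunction<String> {
    private transient SageMakerFeatureStoreRuntimeClient client;

    @Override
    public void open(Configuration parameters) {
        client = SageMakerFeatureStoreRuntimeClient.create();
    }

    @Override
    public void invoke(String id, Context context) {
        // One blocking putRecord per element. Throwing here fails the task;
        // Flink then restarts from the last checkpoint and re-sends some
        // records, which gives at-least-once.
        client.putRecord(PutRecordRequest.builder()
                .featureGroupName("my-feature-group") // made-up group name
                .record(List.of(FeatureValue.builder()
                        .featureName("id").valueAsString(id).build()))
                .build());
    }

    @Override
    public void close() {
        if (client != null) client.close();
    }
}
```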

Any help/guidance/insights are much appreciated, thanks.

r_g_s_

1 Answer


How about extending RichAsyncFunction?

You can find a similar example here - https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api
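Since the Feature Store client you mentioned is synchronous, a common pattern is to run the blocking call on your own thread pool inside asyncInvoke so the task thread stays free. A minimal sketch, assuming the AWS SDK for Java v2 client (pool size, feature group, and feature names are made up):

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.SageMakerFeatureStoreRuntimeClient;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.FeatureValue;
import software.amazon.awssdk.services.sagemakerfeaturestoreruntime.model.PutRecordRequest;

// Sketch: String stands in for the real record type.
public class FeatureStorePutFunction extends RichAsyncFunction<String, Void> {
    private transient SageMakerFeatureStoreRuntimeClient client;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) {
        client = SageMakerFeatureStoreRuntimeClient.create();
        executor = Executors.newFixedThreadPool(16); // made-up size; tune to your throughput
    }

    @Override
    public void asyncInvoke(String id, ResultFuture<Void> resultFuture) {
        // Hand the blocking putRecord to our own pool so the task thread is not blocked.
        CompletableFuture
                .runAsync(() -> client.putRecord(PutRecordRequest.builder()
                        .featureGroupName("my-feature-group") // made-up group name
                        .record(FeatureValue.builder()
                                .featureName("id").valueAsString(id).build())
                        .build()), executor)
                .whenComplete((ignored, error) -> {
                    if (error != null) {
                        // Failing the future fails the task; Flink restarts from the
                        // last checkpoint, which gives you at-least-once.
                        resultFuture.completeExceptionally(error);
                    } else {
                        resultFuture.complete(Collections.emptyList());
                    }
                });
    }

    @Override
    public void close() {
        if (executor != null) executor.shutdown();
        if (client != null) client.close();
    }
}
```

You would wire it in with something like AsyncDataStream.unorderedWait(stream, new FeatureStorePutFunction(), 30, TimeUnit.SECONDS, 100), where the timeout and capacity are made-up values; the capacity bounds the number of in-flight requests and provides backpressure.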

  • What Sucheth said. For most endpoints with a client making HTTP requests, using Flink's Async IO is going to be the easiest and most performant way to integrate. I don't know whether SageMaker supports idempotent writes, so you might not be able to support exactly once. – kkrugler Feb 09 '23 at 17:27
  • Does Async IO work given that the Sagemaker client is synchronous? – r_g_s_ Feb 09 '23 at 19:48