I'm new to working with real-time applications. Currently, I'm using AWS Kinesis/Flink and Scala, and I have the following architecture:

old architecture

As you can see, I consume a CSV file using CSVTableSource. Unfortunately, the CSV file has become too big for the Flink job. The file is updated daily with new rows. So I am now working on a new architecture, where I want to replace the CSV file with a DynamoDB table.

new architecture

My question is: what approach do you recommend for consuming the DynamoDB table from Flink?

P.S.: I need to do a left outer join between the Kinesis Data Stream data and the DynamoDB table.

1 Answer

You could use a RichFlatMapFunction that opens a DynamoDB client in open() and looks up data from DynamoDB for each incoming record. Sample code is given below.

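import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// IN and OUT are placeholders for your own input and output record types.
public static class DynamoDBMapper extends RichFlatMapFunction<IN, OUT> {

    // DynamoDB table handle; transient because it is created in open()
    // on each task rather than serialized with the function
    private transient Table table;
    private String tableName = ""; // set this to your table name

    @Override
    public void open(Configuration parameters) throws Exception {
        // Initialize the DynamoDB client once per parallel task
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
            .withRegion(Regions.US_EAST_1)
            .build();
        DynamoDB dynamoDB = new DynamoDB(client);
        this.table = dynamoDB.getTable(tableName);
    }

    @Override
    public void flatMap(IN value, Collector<OUT> out) throws Exception {
        // Execute GetItem on `table` using the key carried by `value`,
        // build the OUT record from the result, then emit it:
        // out.collect(result);
    }
}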
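
For the left outer join mentioned in the question, the body of flatMap would need to emit a record for every incoming event, whether or not the lookup finds a match. Below is a minimal sketch of just that join logic in plain Java, so it runs without Flink or AWS dependencies. The names LeftJoinSketch, Joined, and join are hypothetical, and the lookup function is a stand-in for the DynamoDB GetItem call; it is assumed to return null when the key is not found.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout: LeftJoinSketch, Joined and `lookup` are
// illustrative stand-ins, not part of Flink or the AWS SDK.
class LeftJoinSketch {

    // One output record: the stream event plus the matched table value.
    static final class Joined {
        final String key;
        final String event;
        final String tableValue; // null when the lookup found nothing

        Joined(String key, String event, String tableValue) {
            this.key = key;
            this.event = event;
            this.tableValue = tableValue;
        }

        @Override
        public String toString() {
            return key + "|" + event + "|" + tableValue;
        }
    }

    // Left outer join of a single event against a lookup. In the Flink job
    // this logic would sit in flatMap, and `lookup` would wrap GetItem.
    static Joined join(String key, String event, Function<String, String> lookup) {
        String match = lookup.apply(key); // assumption: null on a miss
        return new Joined(key, event, match); // always emit, matched or not
    }

    public static void main(String[] args) {
        // In-memory map standing in for the DynamoDB table
        Map<String, String> table = new HashMap<>();
        table.put("a", "alpha");

        System.out.println(join("a", "e1", table::get)); // prints a|e1|alpha
        System.out.println(join("b", "e2", table::get)); // prints a row with a null right side: b|e2|null
    }
}
```

The point of the sketch is that the event is never dropped on a miss: an inner join would skip the out.collect call when the lookup returns nothing, while a left outer join emits the event with an empty right side.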
naba