Flink - DynamoDB source

Question

I'm new to working with real-time applications. Currently, I'm using AWS Kinesis/Flink with Scala, and I have the following architecture:

[Image: old architecture diagram]

As you can see, I consume a CSV file using CSVTableSource. Unfortunately, the CSV file has become too big for the Flink job. The file is updated daily with new rows. So I am now working on a new architecture in which I want to replace the CSV file with a DynamoDB table.

[Image: new architecture diagram]

My question is: what do you recommend for consuming the DynamoDB table?

PS: I need to do a left outer join between the Kinesis Data Stream data and the DynamoDB table.
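To make the join requirement concrete: in a left outer join, every Kinesis record is emitted exactly once, enriched with the matching DynamoDB row when one exists and left unenriched otherwise. The following is a minimal plain-Java sketch of those semantics only; the `Event`/`Enriched` types, the `id` key, and the in-memory `Map` standing in for the DynamoDB table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stream record, keyed by "id".
record Event(String id, String payload) {}

// Hypothetical output: the original event plus the looked-up row, if any.
record Enriched(String id, String payload, Optional<String> reference) {}

class LeftOuterJoinSketch {
    // Left outer join: every event is emitted; events whose key is missing
    // from the reference table get Optional.empty() instead of being dropped.
    static List<Enriched> leftOuterJoin(List<Event> events, Map<String, String> table) {
        List<Enriched> out = new ArrayList<>();
        for (Event e : events) {
            out.add(new Enriched(e.id(), e.payload(), Optional.ofNullable(table.get(e.id()))));
        }
        return out;
    }

    public static void main(String[] args) {
        // Map stands in for the DynamoDB table in this sketch.
        Map<String, String> table = Map.of("a", "row-a");
        List<Event> events = List.of(new Event("a", "x"), new Event("b", "y"));
        leftOuterJoin(events, table).forEach(System.out::println);
    }
}
```

An inner join would instead drop the event with key `b`; keeping it with an empty reference is what distinguishes the left outer join.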

You may want to look at this post: https://stackoverflow.com/questions/45436414/consume-dynamodb-streams-in-apache-flink — Dominik Wosiński, Jan 07 '22 at 00:09
Hi Dominik, I read that post a few days ago, but I don't need to read the DynamoDB table as a data stream. — Felipe Jorquera Uribe, Jan 07 '22 at 01:29
So what's the idea here? Since the DynamoDB table is updated daily, how do you want to propagate changes? — Dominik Wosiński, Jan 07 '22 at 02:26
Maybe I didn't explain myself very well, but I would like to work with the DynamoDB table in a similar way to CSVTableSource. Sorry if I am misunderstanding some Flink concepts. — Felipe Jorquera Uribe, Jan 07 '22 at 02:59

score 0 · Answer 1 · answered Nov 21 '22 at 19:05

You could use a RichFlatMapFunction that opens a DynamoDB client in open() and looks up the matching item for each incoming record in flatMap(). Sample code is given below.

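import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Looks up each incoming key in DynamoDB (AWS SDK for Java v1, Document API).
// Input: the join key of a Kinesis record; output: "key|itemJson", or "key|" when no item matches.
public class DynamoDBMapper extends RichFlatMapFunction<String, String> {
    private final String tableName;
    // SDK objects are not serializable: build them in open(), not in the constructor
    private transient AmazonDynamoDB client;
    private transient Table table;

    public DynamoDBMapper(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Initialize the DynamoDB client once per parallel task instance
        client = AmazonDynamoDBClientBuilder.standard()
            .withRegion(Regions.US_EAST_1)
            .build();
        table = new DynamoDB(client).getTable(tableName);
    }

    @Override
    public void flatMap(String key, Collector<String> out) throws Exception {
        // Execute GetItem on the partition key ("id" is a placeholder attribute name)
        Item item = table.getItem("id", key);
        // Left outer join: emit the record whether or not a matching item exists
        out.collect(key + "|" + (item == null ? "" : item.toJSON()));
    }

    @Override
    public void close() throws Exception {
        if (client != null) client.shutdown();
    }
}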
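The answer issues one GetItem per Kinesis record. Because the table only changes once a day (see the comments on the question), one option, not part of the original answer, is to cache lookup results per key for a bounded time so that repeated keys do not each cost a DynamoDB read. Below is a minimal plain-Java sketch under that assumption: `CachedLookup` is a hypothetical helper, the `Function` passed in stands in for the `table.getItem(...)` call, and the clock is injectable so expiry can be tested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper: wraps a point lookup (standing in for a DynamoDB GetItem)
// with a per-key cache whose entries expire after ttlMillis.
class CachedLookup {
    // A cached result, including a cached miss (Optional.empty()), and when it was loaded.
    private record Entry(Optional<String> value, long loadedAt) {}

    private final Function<String, Optional<String>> lookup;
    private final long ttlMillis;
    private final LongSupplier clock;
    // Unbounded for brevity; a real job would also cap the number of entries.
    private final Map<String, Entry> cache = new HashMap<>();

    CachedLookup(Function<String, Optional<String>> lookup, long ttlMillis, LongSupplier clock) {
        this.lookup = lookup;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    Optional<String> get(String key) {
        long now = clock.getAsLong();
        Entry e = cache.get(key);
        // Reuse the cached result until it is at least ttlMillis old.
        if (e != null && now - e.loadedAt() < ttlMillis) {
            return e.value();
        }
        // Expired or never loaded: hit the backing store and refresh the entry.
        Optional<String> fresh = lookup.apply(key);
        cache.put(key, new Entry(fresh, now));
        return fresh;
    }
}
```

In the Flink function, such a cache would be created in open() with `System::currentTimeMillis` as the clock. Each parallel instance then holds its own cache, and because Flink calls flatMap() from a single thread per instance, a plain HashMap suffices. The TTL trades freshness for fewer reads: a row changed by the daily update becomes visible at most one TTL after the change.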
