I have a use case where I run calculations on parts of the data, and these calculations depend on a context (intermediate state). For example: I receive orders and perform calculations on them, with the orders grouped by their symbol field.
class Order {
    LocalDateTime ts;
    String symbol;
    // ...
}
So I decided to key the stream by the symbol field and keep separate state for each group:
DataStream<Order> orders = tableEnv.toDataStream(selectStatement, Order.class);
orders.keyBy(Order::getSymbol)
      .flatMap(new SymbolExecutionContext())
      .addSink(jdbcSink);
The function with state:
public class SymbolExecutionContext extends RichFlatMapFunction<Order, OrderBookRow> {

    private transient ValueState<OrderBook> orderBookState;

    @Override
    public void flatMap(Order input, Collector<OrderBookRow> out) throws Exception {
        OrderBook orderBook = orderBookState.value();
        if (orderBook == null) {
            // First order for this key: create a fresh per-symbol context.
            orderBook = new OrderBook(input.getSymbol());
        }
        for (OrderBookRow orderBookRow : OrderBookService.execute(input, orderBook)) {
            out.collect(orderBookRow);
        }
        // Persist the (possibly mutated) order book back into keyed state.
        orderBookState.update(orderBook);
    }

    @Override
    public void open(Configuration config) {
        ValueStateDescriptor<OrderBook> descriptor =
                new ValueStateDescriptor<>("orderbook", OrderBook.class);
        orderBookState = getRuntimeContext().getState(descriptor);
    }
}
So I create a new OrderBook for each unique symbol. The OrderBook represents the per-symbol context in which the calculations are performed for every Order with that symbol.
However, it doesn't work as expected. It works fine when there is only one symbol, but with more than one symbol it produces invalid results (rows are missing or inaccurate), and the results are more or less unpredictable.
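For reference, the behaviour I expect from keyed state is equivalent to this plain-Java sketch (the classes here are simplified stand-ins I wrote for illustration, not my real OrderBook/OrderBookService): each symbol gets its own context, and orders for different symbols never see each other's state.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateAnalogue {

    // Simplified stand-in for the real per-symbol OrderBook context.
    static class SimpleOrderBook {
        final String symbol;
        int ordersSeen = 0; // intermediate per-symbol state

        SimpleOrderBook(String symbol) { this.symbol = symbol; }
    }

    // One context per key, as keyBy(...).flatMap(...) with ValueState should behave.
    private final Map<String, SimpleOrderBook> books = new HashMap<>();

    // Processes one order; returns how many orders this symbol has seen so far.
    public int process(String symbol) {
        SimpleOrderBook book = books.computeIfAbsent(symbol, SimpleOrderBook::new);
        return ++book.ordersSeen;
    }

    public static void main(String[] args) {
        KeyedStateAnalogue ctx = new KeyedStateAnalogue();
        // Orders for different symbols must not share state:
        System.out.println(ctx.process("AAPL")); // 1
        System.out.println(ctx.process("MSFT")); // 1
        System.out.println(ctx.process("AAPL")); // 2
    }
}
```

With the Flink job, the interleaved AAPL/MSFT case is exactly where the results become wrong.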
The Flink job is executed in batch mode.
Is there a better way to handle this use case?