I am trying to use Flink for data enrichment on multiple streams of data.
I have some data in account_stream and status_stream that I want to add to all the other streams coming from several different sources. All the streams share one common field: "account_id".
This is the approach I took:
account_stream.connect(status_stream)
.flat_map(EnrichmentFunction())
.filter(lambda x: x['name'] != "-" and x['date'] != "0000-00-00 00:00:00")
.key_by(lambda row: row['account_id'])
.connect(stream1)
.flat_map(function_2())
.filter(lambda x: x!="2")
.key_by(lambda row: row['account_id'])
.connect(stream2)
.flat_map(function_2())
.key_by(lambda row: row['account_id'])
.connect(stream3)
.flat_map(function_3())
.key_by(lambda row: row['account_id'])
.connect(stream4)
.flat_map(function_4())
.key_by(lambda row: row['account_id'])
.connect(stream5)
.flat_map(function_5())
.key_by(lambda row: row['account_id'])
.connect(stream6)
.flat_map(function_6())
.key_by(lambda row: row['account_id'])
.connect(stream7)
.flat_map(function_7())
.key_by(lambda row: row['account_id'])
.connect(stream_8)
.flat_map(function_8())
.map(lambda a: str(a), Types.STRING())
.add_sink(kafka_producer)
I save the necessary data in state and append it to the other streams inside the flat_map functions, then add a single Kafka sink at the end to emit all the streams enriched with that state.
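For context, the enrichment logic inside EnrichmentFunction looks roughly like this. This is a framework-free sketch of the idea only, not the actual PyFlink CoFlatMapFunction I run (field names and defaults are illustrative; in the real job the dict is Flink keyed state):

```python
class EnrichmentFunction:
    """Mimics a keyed CoFlatMapFunction: the reference stream updates
    per-account state, the data stream reads it and gets enriched."""

    def __init__(self):
        # stands in for Flink keyed state (one entry per account_id)
        self.state = {}

    def flat_map1(self, record):
        # reference stream: remember the enrichment fields for this account
        self.state[record["account_id"]] = {
            "name": record["name"],
            "date": record["date"],
        }
        return []  # nothing emitted downstream from the reference side

    def flat_map2(self, record):
        # data stream: append the stored fields, or placeholder defaults
        # if this account has not been seen yet (filtered out later)
        extra = self.state.get(
            record["account_id"],
            {"name": "-", "date": "0000-00-00 00:00:00"},
        )
        return [{**record, **extra}]
```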
Now, when I execute this, I get the following error:

java.io.IOException: Insufficient number of network buffers: required 17, but only 8 available. The total number of network buffers is currently set to 2048 of 32768 bytes each.
I tried changing taskmanager.memory.network.fraction to 0.5, taskmanager.memory.network.max to 15gb, and taskmanager.memory.process.size to 10gb in the Flink config file, but it still gives the same error. Do I have to do something other than just saving the file for the changes to be reflected in the Flink job, or is the problem something else?
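For reference, the relevant entries in my flink-conf.yaml currently look like this (values as I set them; only these three keys were changed):

```yaml
taskmanager.memory.process.size: 10gb
taskmanager.memory.network.fraction: 0.5
taskmanager.memory.network.max: 15gb
```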
Also, let me know if this approach is inefficient for the task and whether there is something else I should try.
I am running this in Python with the PyFlink library on a single 32 GB RAM, 8-core server, with Kafka and Elasticsearch running on the same server.
Thank you.