I would like to know, how JdbcIO would execute a query in parallel if my query returns millions of rows. I have referred https://issues.apache.org/jira/browse/BEAM-2803 and the related pull requests. I couldn't understand it completely.
ReadAll
expand
method uses a ParDo
. Hence would it create multiple connections to the database to read the data in parallel? If I restrict the number of connections that can be created to a DB in the datasource, will it stick to the connection limit?
Can anyone please help me to understand how this would handled in JdbcIO
? I am using 2.2.0
Update :
.apply(
ParDo.of(
new ReadFn<>(
getDataSourceConfiguration(),
getQuery(),
getParameterSetter(),
getRowMapper())))
The above code shows that ReadFn is applied with a ParDo. I think, the ReadFn will run in parallel. If my assumption is correct, how would I use the readAll()
method to read from a DB where I can establish only a limited number of connections at a time?
Thanks Balu