I am trying to read a Cloud SQL table in Java Beam using JdbcIO.Read. I want to convert each row of the ResultSet into a GenericData.Record using the .withRowMapper(ResultSet resultSet) method. Is there a way to pass a JSON schema string as an input to .withRowMapper, the way ParDo accepts side inputs as a PCollectionView?
I have tried doing both read operations (reading from information_schema.columns and from my table) in the same JdbcIO.Read transform. However, I would like to generate the schema PCollection first and then read the table with JdbcIO.Read.
I am generating the Avro schema of the table on the fly like this:
PCollection<String> avroSchema = pipeline.apply(JdbcIO.<String>read()
    .withDataSourceConfiguration(config)
    .withCoder(StringUtf8Coder.of())
    .withQuery("SELECT DISTINCT column_name, data_type "
        + "FROM information_schema.columns "
        + "WHERE table_name = '" + tableName + "'")
    .withRowMapper((JdbcIO.RowMapper<String>) resultSet -> {
        // code here to generate the Avro schema string
        // this works fine for me
    }));
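For reference, a minimal sketch of what that row mapper might look like, assuming org.apache.avro.SchemaBuilder and a very simplified SQL-to-Avro type mapping. It relies on JdbcIO calling the mapper with the cursor already on the first row, so the do/while consumes the whole ResultSet and the transform emits exactly one schema string (which View.asSingleton() below requires):

.withRowMapper((JdbcIO.RowMapper<String>) resultSet -> {
    // Sketch only: build one Avro record schema from all
    // (column_name, data_type) rows of information_schema.columns.
    SchemaBuilder.FieldAssembler<Schema> fields =
        SchemaBuilder.record(tableName).fields();
    do {
        String columnName = resultSet.getString("column_name");
        String dataType = resultSet.getString("data_type");
        switch (dataType) {
            case "integer":
            case "bigint":
                fields = fields.optionalLong(columnName);
                break;
            case "numeric":
            case "double precision":
                fields = fields.optionalDouble(columnName);
                break;
            default:
                fields = fields.optionalString(columnName);
        }
    } while (resultSet.next()); // consume every row -> single output element
    return fields.endRecord().toString();
})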
Next, I create a PCollectionView that will hold my JSON schema for each table:
PCollectionView<String> schemaView = avroSchema.apply(View.<String>asSingleton());
I want to access this view as a side input in the next JdbcIO.Read operation, something like this:
pipeline.apply(JdbcIO.<String>read()
    .withDataSourceConfiguration(config)
    .withCoder(StringUtf8Coder.of())
    .withQuery(queryString)
    .withRowMapper(new JdbcIO.RowMapper<String>() {
        @Override
        public String mapRow(ResultSet resultSet) throws Exception {
            // access the schema here and use it to parse the
            // ResultSet fields into a GenericData.Record per the schema
            return null;
        }
    })
    .withSideInputs(schemaView)); // this option does not exist on JdbcIO.read() right now
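Since JdbcIO.read() has no withSideInputs, the closest working alternative I can think of is to drop JdbcIO for the second read and do the JDBC work inside a plain DoFn, which does accept side inputs. A rough, untested sketch; jdbcUrl, user, and password are placeholders for the values behind config, and I emit the records as JSON strings to sidestep coder issues with GenericRecord:

PCollection<String> records = pipeline
    .apply(Create.of(queryString))
    .apply(ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) throws Exception {
            // The schema arrives here as a regular side input.
            Schema schema = new Schema.Parser().parse(c.sideInput(schemaView));
            try (Connection conn =
                     DriverManager.getConnection(jdbcUrl, user, password);
                 PreparedStatement stmt = conn.prepareStatement(c.element());
                 ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    GenericData.Record record = new GenericData.Record(schema);
                    for (Schema.Field field : schema.getFields()) {
                        record.put(field.name(), rs.getObject(field.name()));
                    }
                    c.output(record.toString());
                }
            }
        }
    }).withSideInputs(schemaView));

A real version would open the connection in @Setup/@Teardown rather than once per element.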
Is there any better way to approach this problem?