
I am trying to read a Cloud SQL table in Java Beam using JdbcIO.Read. I want to convert each row of the ResultSet into a GenericData.Record inside the row mapper passed to .withRowMapper(...). Is there a way to pass a JSON schema string into the row mapper as an input, the way a ParDo accepts side inputs as a PCollectionView?

I have tried doing both read operations (reading from information_schema.columns and from my table) in the same JdbcIO.Read transform. However, I would like the schema PCollection to be generated first and then used when reading the table with JdbcIO.Read.

I am generating the Avro schema of the table on the fly like this:

PCollection<String> avroSchema = pipeline.apply(JdbcIO.<String>read()
        .withDataSourceConfiguration(config)
        .withCoder(StringUtf8Coder.of())
        .withQuery("SELECT DISTINCT column_name, data_type \n" +
                "FROM information_schema.columns\n" +
                "WHERE table_name = " + "'" + tableName + "'")
        .withRowMapper((JdbcIO.RowMapper<String>) resultSet -> {
            // code here to generate avro schema string
            // this works fine for me

        }));
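
For illustration, the elided schema-generation step could assemble the Avro schema JSON from the (column_name, data_type) pairs with plain string building. This is only a minimal sketch, not the code from the question: the class name `AvroSchemaSketch`, the helper names, and the SQL-to-Avro type mapping are all assumptions. Every field is declared as a nullable union so that SQL NULL values can be represented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AvroSchemaSketch {

    // Hypothetical mapping from information_schema data_type values to Avro
    // primitive types; anything unrecognized falls back to "string".
    static String avroType(String sqlType) {
        switch (sqlType.toLowerCase()) {
            case "int":
            case "integer":
            case "smallint":
            case "tinyint":
                return "int";
            case "bigint":
                return "long";
            case "float":
                return "float";
            case "double":
            case "double precision":
                return "double";
            case "boolean":
                return "boolean";
            default:
                return "string";
        }
    }

    // Builds an Avro record schema as a JSON string. Column names are inserted
    // as-is, so this assumes they are already valid Avro names (letters,
    // digits, underscores) and need no JSON escaping.
    static String buildSchema(String recordName, Map<String, String> columns) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, String> col : columns.entrySet()) {
            fields.add("{\"name\":\"" + col.getKey() + "\","
                    + "\"type\":[\"null\",\"" + avroType(col.getValue()) + "\"],"
                    + "\"default\":null}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\","
                + "\"fields\":[" + String.join(",", fields) + "]}";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "bigint");
        columns.put("name", "varchar");
        System.out.println(buildSchema("my_table", columns));
    }
}
```

The resulting string can be parsed with Avro's `Schema.Parser` wherever the schema is needed.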

Then I create a PCollectionView that holds the JSON schema for the table:

 PCollectionView<String> s = avroSchema.apply(View.<String>asSingleton());

// I want to access this view as a side input in the next JdbcIO.Read operation,
// something like this:

pipeline.apply(JdbcIO.<String>read()
        .withDataSourceConfiguration(config)
        .withCoder(StringUtf8Coder.of())
        .withQuery(queryString)
        .withRowMapper(new JdbcIO.RowMapper<String>() {

            @Override
            public String mapRow(ResultSet resultSet) throws Exception {
                // access schema here and use it to parse and create
                // GenericData.Record from ResultSet fields as per schema

                return null;
            }
        }))
        .withSideInputs(My PCollectionView here); // this option is not there right now.

Is there any better way to approach this problem?

Pablo
Onkar

1 Answer


At this point, the Beam IO APIs (JdbcIO.Read included) do not accept side inputs.

It should be feasible to add a ParDo right after the read and do the mapping there. That ParDo can accept side inputs.
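
A rough sketch of that approach follows, under several assumptions: the class and method names are hypothetical, the row mapper only copies each row into a `Map` of column label to string value, and the ParDo emits the record as a JSON string rather than a `GenericRecord` (a coder for `GenericRecord` needs the schema at pipeline construction time, which is not available here). `schemaView` stands for the `PCollectionView<String>` from the question.

```java
import java.sql.ResultSetMetaData;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.NullableCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class ReadWithSchemaSideInput {

    // Hypothetical helper: reads raw rows with JdbcIO, then applies the schema in a ParDo.
    static PCollection<String> readRecords(
            Pipeline pipeline,
            JdbcIO.DataSourceConfiguration config,
            String queryString,
            PCollectionView<String> schemaView) {

        // Step 1: the row mapper needs no schema; it copies the row generically.
        PCollection<Map<String, String>> rawRows = pipeline.apply("ReadRows",
                JdbcIO.<Map<String, String>>read()
                        .withDataSourceConfiguration(config)
                        .withCoder(MapCoder.of(
                                StringUtf8Coder.of(),
                                NullableCoder.of(StringUtf8Coder.of())))
                        .withQuery(queryString)
                        .withRowMapper(resultSet -> {
                            ResultSetMetaData meta = resultSet.getMetaData();
                            Map<String, String> row = new HashMap<>();
                            for (int i = 1; i <= meta.getColumnCount(); i++) {
                                row.put(meta.getColumnLabel(i), resultSet.getString(i));
                            }
                            return row;
                        }));

        // Step 2: the ParDo receives the schema as a side input and does the mapping.
        return rawRows.apply("MapWithSchema",
                ParDo.of(new DoFn<Map<String, String>, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        // Parsed per element for brevity; caching the parsed schema would be cheaper.
                        Schema schema = new Schema.Parser().parse(c.sideInput(schemaView));
                        GenericData.Record record = new GenericData.Record(schema);
                        for (Schema.Field field : schema.getFields()) {
                            // Values are left as strings here; convert per field type as needed.
                            record.put(field.name(), c.element().get(field.name()));
                        }
                        c.output(record.toString());
                    }
                }).withSideInputs(schemaView));
    }
}
```

Keying the intermediate row by column label, rather than by position, avoids depending on the query's column order matching the schema's field order.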