0

I need to load a file into my database, but before that I have to verify data is present in the database based on some file data. For instance, suppose I have 5 records in a file then I have to check 5 times in the database for separate records.

So how can I get this value dynamically? We have to pass dynamic value instead of 2 in line (preparedStatement.setString(1, "2");)

Here we are creating a Dataflow pipeline which loads data into the database using Apache Beam. Now we create a pipeline object and create a pipeline. Using a PCollection we are storing into database.

Pipeline p = Pipeline.create(options);
p.apply("Reading Text", TextIO.read().from(options.getInputFile()))
    .apply(ParDo.of(new FilterHeaderFn(csvHeader)))
    .apply(ParDo.of(new GetRatePlanID()))
    .apply("Format Result", MapElements.into(
        TypeDescriptors.strings()).via(
        (KV < String, Integer > ABC) - >
        ABC.getKey() + "," + ABC.getValue()))
    .apply("Write File", TextIO.write()
        .to(options.getOutputFile())
        .withoutSharding());

// Retrieving data from database
PCollection < String > data =
    p.apply(JdbcIO. < String > read()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                "com.mysql.cj.jdbc.Driver", "jdbc:mysql://localhost:3306/XYZ")
            .withUsername("root")
            .withPassword("root1234"))

        .withQuery("select * from xyz where z = ?")
        .withCoder(StringUtf8Coder.of())
        .withStatementPreparator(new JdbcIO.StatementPreparator() {
            private static final long serialVersionUID = 1 L;
            @Override
            public void setParameters(PreparedStatement preparedStatement) throws Exception {
                preparedStatement.setString(1, "2");

            }
        })
        .withRowMapper(new JdbcIO.RowMapper < String > () {
            private static final long serialVersionUID = 1 L;
            public String mapRow(ResultSet resultSet) throws Exception {
                return "Symbol: " + resultSet.getInt(1) + "\nPrice: " + resultSet.getString(2) +

                    "\nCompany: " + resultSet.getInt(3);
            }
        }));
Cubez
  • 878
  • 5
  • 11
  • Shouldn't be more efficient if you load the full file in your database in a temporary table and then perform a query to merge only the field that you want need? And then delete the temporary table. – guillaume blaquiere Mar 12 '21 at 13:33
  • Do we have any alternate option for same. As we have restriction to create table on production. How we can mange it into code? do we have any code snippet for same. – Gaurav Khandelwal Mar 12 '21 at 14:55

1 Answers1

0

As suggested, the most efficient would probably be loading the whole file into a temporary table and then doing a query to update the requisite rows.

If that can't be done, you could instead read the table into Dataflow (i.e. "select * from xyz") and then do a join/CoGroupByKey to match records with those found in your file. If you expect the existing database to be very large compared to the files you're hoping to upload into it, you could have a DoFn that makes queries to your database directly using JDBC (possibly caching the connection in the DoFn's setUp method) rather than using JdbcIO.

robertwb
  • 4,891
  • 18
  • 21