I am trying to read records from a source, capped at a maximum number of records to process that is supplied by the user.

E.g.: the source table has 1 million records in total, and the maximum number of records to process is 100K.

I need to process only those 100K records from the source. I have gone through the JdbcIO library classes looking for a way to implement this, such as an option to set the batch size, but I have found none.

PS: I want to implement this at the IO level, not by adding a LIMIT to the query.

Vadim Kotov

2 Answers

1

I was able to do it using setMaxRows, with auto-commit turned off for JdbcIO.
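
For readers wondering what this might look like: JdbcIO.Read has no setMaxRows method of its own (a comment below points this out), so the call was presumably made on the underlying java.sql.PreparedStatement, where setMaxRows is a standard JDBC method. The sketch below is an assumption about that approach, not the poster's actual code; RowCap and apply are hypothetical names introduced only for illustration.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper (not part of Beam or JDBC): caps the rows a statement returns. */
final class RowCap {
    private RowCap() {}

    /**
     * Sets an upper bound on the number of rows any ResultSet produced by
     * this statement can contain. Rows beyond the cap are dropped silently.
     */
    static void apply(PreparedStatement statement, int maxRows) throws SQLException {
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must be >= 0, got " + maxRows);
        }
        // Standard JDBC call; a value of 0 means "no limit".
        statement.setMaxRows(maxRows);
    }
}
```

In a Beam pipeline, one place such a call could plausibly go is the setParameters(PreparedStatement) callback of JdbcIO.StatementPreparator, for example .withStatementPreparator(ps -> RowCap.apply(ps, 100_000)). Whether the cap is enforced by the database or by the driver varies, so it is worth verifying against your own source.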

  • Can you please share a code snippet showing how you used setMaxRows? – Tanveer Uddin Dec 18 '18 at 07:45
  • That makes sense. You said you used setMaxRows, which does not exist. In that case, you could have accepted my answer, since setFetchSize worked for you as per my suggested answer. – Tanveer Uddin Dec 18 '18 at 10:53
  • Oh, sorry. Actually, I tried setFetchSize but it did not work for me; setMaxRows worked for me! That was mistakenly posted. setMaxRows(100) fetches 100 records. – Poornima Jasti Dec 19 '18 at 06:21
  • You are unwilling to use LIMIT 100 but happy to use setMaxRows(100). That's strange. Glad that it works for you. – Tanveer Uddin Dec 19 '18 at 07:07
  • My requirement is such!! Implementing limit in the query is different from reading from IO. – Poornima Jasti Dec 19 '18 at 08:44
0

You can use withQuery to specify a query that limits the number of records to read, e.g. .withQuery("select id,name from Person limit 1000"). You can also parameterize the number of records using JdbcIO.StatementPreparator. The example in the doc may help.

EDIT: Another option is to use withFetchSize.
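
A minimal sketch of the two options above at the plain JDBC level, using only java.sql. The class and method names (LimitedQuery, bindLimit, hintFetchSize) are hypothetical, and the query assumes the database accepts a bound parameter after LIMIT, which not every database does. Note that in standard JDBC, setFetchSize is only a hint about how many rows to pull per round trip; it does not cap the total number of rows returned.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of Beam or JDBC. */
final class LimitedQuery {
    private LimitedQuery() {}

    // Table and column names come from the example query in the answer;
    // the "limit ?" placeholder is an assumption about the target database.
    static final String SQL = "select id, name from Person limit ?";

    /** Binds the user-supplied record cap to the single placeholder in SQL. */
    static void bindLimit(PreparedStatement statement, int limit) throws SQLException {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0, got " + limit);
        }
        statement.setInt(1, limit); // JDBC parameter indexes start at 1
    }

    /** Suggests a per-round-trip batch size; this does not limit total rows. */
    static void hintFetchSize(PreparedStatement statement, int rowsPerFetch) throws SQLException {
        statement.setFetchSize(rowsPerFetch);
    }
}
```

With JdbcIO, the query string would be passed to withQuery and bindLimit could be called from a JdbcIO.StatementPreparator, while withFetchSize plays the role of hintFetchSize. This wiring is a suggestion, not something confirmed in the thread.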

Tanveer Uddin
  • I don't want to implement it with the query, as I mentioned in the question, because I will get memory issues while running multiple pipelines at once. I want to handle it at the IO level @Tanveer Uddin – Poornima Jasti Dec 17 '18 at 10:57
  • Consider playing with withFetchSize https://beam.apache.org/releases/javadoc/2.6.0/org/apache/beam/sdk/io/jdbc/JdbcIO.Read.html#withFetchSize-int- – Tanveer Uddin Dec 17 '18 at 20:33
  • Yep, I explored those and setMaxRows worked for me @Tanveer Uddin – Poornima Jasti Dec 18 '18 at 06:08