
I have a Spring Batch application that reads from and writes to the same table. Because the data volume is quite high, I read the items from the table with pagination. When I set the chunk size to more than 1, the page number is updated incorrectly and some items are never read from the table. Any idea why?

@Bean
public Step fooStep1() {
    return stepBuilderFactory.get("step1")
        .<foo, foo>chunk(chunkSize)
        .reader(fooTableReader())
        .writer(fooTableWriter())
        .listener(fooStepListener())
        .listener(chunkListener())
        .build();
}

Reader

@Bean
@StepScope
public ItemReader<foo> fooBatchReader() {
    NonSortingRepositoryItemReader<foo> reader = new NonSortingRepositoryItemReader<>();
    reader.setRepository(service.getRepository());
    reader.setPageSize(chunkSize);
    reader.setMethodName("findAllByStatusCode");
    // Fill the argument list before handing it to the reader.
    List<Object> arguments = new ArrayList<>();
    arguments.add(statusCode);
    reader.setArguments(arguments);
    return reader;
}
DarkCrow

1 Answer


Don't use a paging reader. The problem is that this reader executes a new query for every page (here, for every chunk, since the page size equals the chunk size). Therefore, if you add or change items in the same table during writing, successive queries will not return the same result set.

If you look into the code of the paging reader, this is obvious.

If you modify the same table you are reading from, then you have to ensure that your result set doesn't change during the processing of the whole step, otherwise, your results may not be predictable and very likely not what you wanted.
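
To make the effect concrete, here is a small plain-Java simulation (no Spring involved; the class and method names are made up for illustration). It assumes the reader selects rows with status NEW page by page and the writer flips every row it receives to DONE, which is my reading of the question. With 10 rows and a page size of 2, the paged variant skips rows 3, 4, 7 and 8, while a variant that fixes its result set once processes all ten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class PagingOverMutatingTable {

    /** Stand-in for a table row: an id plus the status the query filters on. */
    static final class Row {
        final int id;
        String status = "NEW";
        Row(int id) { this.id = id; }
    }

    /** Builds an in-memory "table" with ids 1..size, all in status NEW. */
    public static List<Row> newTable(int size) {
        List<Row> table = new ArrayList<>();
        for (int id = 1; id <= size; id++) {
            table.add(new Row(id));
        }
        return table;
    }

    /** Paging: re-runs "status = NEW" for every page, then "writes" the page as DONE. */
    public static List<Integer> processWithPaging(List<Row> table, int pageSize) {
        List<Integer> processed = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<Row> chunk = table.stream()
                    .filter(r -> r.status.equals("NEW"))   // fresh query per page
                    .skip((long) page * pageSize)          // offset into the *current* result
                    .limit(pageSize)
                    .collect(Collectors.toList());
            if (chunk.isEmpty()) {
                return processed;
            }
            for (Row r : chunk) {
                r.status = "DONE";                         // the write changes the next query
                processed.add(r.id);
            }
        }
    }

    /** Cursor-like: the result set is fixed once, before any row is written. */
    public static List<Integer> processWithSnapshot(List<Row> table) {
        List<Row> snapshot = table.stream()
                .filter(r -> r.status.equals("NEW"))
                .collect(Collectors.toList());
        List<Integer> processed = new ArrayList<>();
        for (Row r : snapshot) {
            r.status = "DONE";
            processed.add(r.id);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processWithPaging(newTable(10), 2));  // [1, 2, 5, 6, 9, 10]
        System.out.println(processWithSnapshot(newTable(10)));   // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

Every written page removes two rows from the front of the next query's result, while the offset still advances by two, so two rows are jumped over on each page.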

Try a JdbcCursorItemReader. It executes the query once at the beginning of your step, so the result set is defined at that point and does not change while the step is processed.
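
A minimal sketch of what such a reader could look like. The table name foo, the columns id and status_code, and the Foo class are assumptions, since the question does not show the schema; adjust the SQL and the row mapper to the real table.

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

public class FooCursorReaderConfig {

    /** Hypothetical row type standing in for the question's entity class. */
    public static class Foo {
        private Long id;
        private String statusCode;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getStatusCode() { return statusCode; }
        public void setStatusCode(String statusCode) { this.statusCode = statusCode; }
    }

    /** Builds a cursor-based reader; the query runs once when the reader is opened. */
    public JdbcCursorItemReader<Foo> fooCursorReader(DataSource dataSource, String statusCode) {
        JdbcCursorItemReader<Foo> reader = new JdbcCursorItemReader<>();
        reader.setName("fooCursorReader");
        reader.setDataSource(dataSource);
        // Assumed table and column names; ORDER BY keeps the read order deterministic.
        reader.setSql("SELECT id, status_code FROM foo WHERE status_code = ? ORDER BY id");
        reader.setPreparedStatementSetter(ps -> ps.setString(1, statusCode));
        // Maps the columns id and status_code onto the Foo properties id and statusCode.
        reader.setRowMapper(new BeanPropertyRowMapper<>(Foo.class));
        return reader;
    }
}
```

In the step configuration this method would be exposed as a @Bean with @StepScope, like the reader in the question, with statusCode supplied the same way.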

Edit

Based on the reader configuration you added, I assume a couple of things:

  1. This is not a standard Spring Batch item reader.

  2. You are using a method called "findAllByStatusCode". I assume that this is the status field that gets updated during writing.

  3. Your reader class is named "NonSortingRepositoryItemReader", so I assume that there is no guaranteed ordering in your result list.

If 3 is correct, then this is very likely the problem. If the order of the elements is not guaranteed, then using a paging reader will definitely not work. Every page executes its own select and then moves the pointer to the appropriate position in the result.

E.g., if you have a page size of 5, the first call will return elements 1-5 of its select, and the second call will return elements 6-10 of its select. But since the order is not guaranteed, the element at position 1 in the first call could be at position 6 in the second call and therefore be processed twice, whilst the element at position 6 in the first call could be at position 2 in the second call and therefore never be processed.

Hansjoerg Wingeier
  • You're assuming that the new data meets the requirement of the query. That may not be the case. In fact, you may want to use the `JdbcPagingItemReader` if you are attempting to partition the work. A process indicator field is a better option for addressing the use case you're describing than changing readers IMHO. – Michael Minella Oct 11 '16 at 18:51
  • Yes, that is my assumption based on what @Deepak wrote. Generally, I would also prefer to have an additional state field. However, this may not always be possible. What I also understand, especially from your answer here http://stackoverflow.com/questions/20386642/spring-batch-which-itemreader-implementation-to-use-for-high-volume-low-laten, is that the PagingItemReader is threadsafe whilst the CursorItemReader is not and this is an important point to consider. Especially when using a partitioning approach or doing parallel chunk processing. – Hansjoerg Wingeier Oct 12 '16 at 06:00
  • I am not sure if I can use JdbcCursorItemReader, as I am using a JPA repository and I need to set the method name and repository in my reader. JdbcCursorItemReader doesn't have methods to set the method name and repository, hence I am using ItemReader. Please see the reader code updated above. – DarkCrow Oct 12 '16 at 10:49
  • I can clearly see that's the problem. What is the solution you are proposing? – DarkCrow Oct 12 '16 at 12:40
  • Option 1: use a JdbcCursorItemReader and access the table directly. Option 2: use a HibernateCursorItemReader and define the query in HQL (Hibernate Query Language). Option 3: use a JpaPagingItemReader and define the query in JPQL. Option 4: implement your own SortingRepositoryItemReader and add a method to your repository that returns the entries in a specific order ("findAllByStatusCodeSorted"). – Hansjoerg Wingeier Oct 12 '16 at 12:48
  • Options 1 and 2 are ruled out, as I am using JPA and I would need to map the JDBC result set to my JPA entity object. That's reinventing the whole code. For option 3, will the JpaPagingItemReader map to the JPA entity object directly? – DarkCrow Oct 12 '16 at 12:52
  • I am looking for an option that uses my JPA repository method and maps to my JPA entity object. Is that possible? – DarkCrow Oct 12 '16 at 12:57
  • Yes, it will if you provide the correct queryString or an appropriate QueryProvider. – Hansjoerg Wingeier Oct 12 '16 at 12:58
  • That's the problem. I don't want to write the queries in my reader. I want to use my JPA repository methods in my reader. – DarkCrow Oct 12 '16 at 12:59
  • You could add a method to your repository which creates the query string or the QueryProvider. That is probably the simplest thing to do. This way, you would have your query definitions in one place. – Hansjoerg Wingeier Oct 12 '16 at 13:06
  • Thanks for that. Last question: will RepositoryItemReader help in my case? – DarkCrow Oct 12 '16 at 13:30
  • I don't have any experience with it. When I look at the Java API, it seems that this is what you are looking for. But your repository will need to implement the interface PagingAndSortingRepository from Spring Data. – Hansjoerg Wingeier Oct 12 '16 at 14:02
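
For reference, a sketch of how the stock RepositoryItemReader could be configured with an explicit sort, which covers the ordering point (3) in the answer. It assumes the repository extends PagingAndSortingRepository, that findAllByStatusCode takes a trailing Pageable parameter and returns a Page, and that the entity has an id property to sort on; none of this is confirmed by the question, and the class and method names of the sketch itself are made up.

```java
import java.util.Collections;

import org.springframework.batch.item.data.RepositoryItemReader;
import org.springframework.data.domain.Sort;
import org.springframework.data.repository.PagingAndSortingRepository;

public class FooRepositoryReaderConfig {

    /**
     * Builds a paging reader on top of a Spring Data repository.
     * Assumed repository method: Page<T> findAllByStatusCode(String statusCode, Pageable pageable)
     */
    public static <T> RepositoryItemReader<T> sortedReader(
            PagingAndSortingRepository<T, ?> repository, String statusCode, int pageSize) {
        RepositoryItemReader<T> reader = new RepositoryItemReader<>();
        reader.setRepository(repository);
        reader.setMethodName("findAllByStatusCode");
        // The reader appends the Pageable itself; only the leading arguments go here.
        reader.setArguments(Collections.singletonList(statusCode));
        reader.setPageSize(pageSize);
        // Assumed sort property "id"; an explicit sort gives every page query the same order.
        reader.setSort(Collections.singletonMap("id", Sort.Direction.ASC));
        return reader;
    }
}
```

A stable order only addresses the ordering issue. If the writer changes the status code so that written rows drop out of the query, the pages still shift between reads, as described in the answer.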