
I am using Spring Batch to read data from Postgres and write it into MongoDB. In my case, an Employee has 3 different types of email address in the Employee Email table: 1) Home Address 2) Office Address 3) Social Address.

Since we have almost a million (10 lakh) employees in the DB, I am using custom partitions to pull the data from Postgres, joining with employee_email (and later employee_phone as well), so that the Processor can map to the Mongo POJO and save it into MongoDB.

Now the issue is that I need to embed the Employee email records into the Contact as an array, but with the current logic they are being saved as a separate collection.

How can we solve this issue?

select * from root.employees c
full outer join root.employee_email ce
on c.employee_id = ce.employee_id
order by c.employee_id limit 1000 offset 0;
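
Note that this full outer join returns one row per email, so an employee with 3 email rows comes back as 3 result rows. Illustratively (the column names and values here are made up, not taken from the actual schema):

employee_id | email_type | email_address
----------- | ---------- | ----------------
1           | HOME       | a@home.example
1           | OFFICE     | a@office.example
1           | SOCIAL     | a@social.example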

Now when the data is saved into the DB, only one email is getting saved; it appears that the other 2 are getting overridden.

How would I need to handle that? It looks like EmployeeRowMapper is overwriting all the other email addresses. How can I solve this?

EmployeeJob.java

@Configuration
public class EmployeeJob {
    private static Logger logger = LoggerFactory.getLogger(EmployeeJob.class);

    private static final Integer CHUNK_SIZE = 1000;

    @Autowired
    private DataSource dataSource;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public EmployeesPartitions EmployeesPartition() {
        return new EmployeesPartitions();
    }

    @Bean
    public EmployeesJobListener EmployeesJobListener() {
        return new EmployeesJobListener();
    }

    @Bean("readEmployeeJob")
    public Job readEmployeeJob() throws Exception {
        return jobBuilderFactory.get("readEmployeeJob")
                .incrementer(new RunIdIncrementer())
                .start(EmployeeStepOne())
                .listener(EmployeesJobListener())
                .build();
    }

    @Bean
    public SimpleAsyncTaskExecutor simpleAsyncTaskExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor("fac-thrd-");
        taskExecutor.setConcurrencyLimit(Runtime.getRuntime().availableProcessors());
        taskExecutor.setThreadGroupName("Employees-Thread");
        taskExecutor.setDaemon(false);
        taskExecutor.setThreadPriority(5);
        return taskExecutor;
    }

    @Bean
    public Step EmployeeStepOne() throws Exception {            
        return stepBuilderFactory.get("EmployeeStepOne")
                .partitioner(slaveStep().getName(), EmployeesPartition())
                .step(slaveStep())
                .gridSize(10)
                .taskExecutor(simpleAsyncTaskExecutor())
                .build();
    }

    // slave step
    @Bean
    public Step slaveStep() throws Exception {
        return stepBuilderFactory.get("slaveStep")
                .<EmployeesDTO, EmployeesDTO>chunk(CHUNK_SIZE)
                .reader(EmployeeReader(null, null))
                .processor(EmployeeProcessor())
                .writer(EmployeeWriter())
                .build();
    }


    // Readers
    @Bean(destroyMethod = "")
    @StepScope
    public JdbcCursorItemReader<EmployeesDTO> EmployeeReader(
            @Value("#{stepExecutionContext['limit']}") Long limit,
            @Value("#{stepExecutionContext['offset']}") Long offset) throws Exception {

        String sql = "select * from root.Employees c "
                + "full outer join root.Employee_email ce "
                + "on c.Employee_id = ce.Employee_id "
                + "order by c.Employee_id limit " + limit +" offset "+ offset;
        logger.info("Employees SQL = {} ", sql);
        JdbcCursorItemReader<EmployeesDTO> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(this.dataSource);
        reader.setSql(sql);
        reader.setRowMapper(new EmployeeRowMapper());
        reader.afterPropertiesSet();
        return reader;
    }

    // Processors
    @Bean
    public ItemProcessor<EmployeesDTO, EmployeesDTO> EmployeeProcessor() {
        return new EmployeesProcessor();
    }

    // Writers
    @Bean
    public ItemWriter<EmployeesDTO> EmployeeWriter() {
        return new EmployeeWriter();
    }
}
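
For context, the EmployeesPartitions class referenced above is not shown in the question. A hypothetical sketch of a partitioner that produces the limit/offset keys consumed by EmployeeReader might look like this (the total row count, key names, and partition naming are assumptions):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical partitioner: splits the id-ordered table into gridSize
// ranges and exposes each range as 'limit'/'offset' in the step context.
public class EmployeesPartitions implements Partitioner {

    private static final long TOTAL_ROWS = 1_000_000L; // assumed total row count

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long range = TOTAL_ROWS / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("limit", range);      // read by EmployeeReader
            context.putLong("offset", i * range); // read by EmployeeReader
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}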

EmployeeRowMapper.java

public class EmployeeRowMapper implements RowMapper<EmployeesDTO> {

    @Override
    public EmployeesDTO mapRow(ResultSet rs, int rowNum) throws SQLException {
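        // Note: mapRow is invoked once per result-set row, so each object
        // built here carries only the single email present in that row.
        // With the join above, 3 email rows for one employee become 3 objects.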
        // EmployeeEmail email = new EmployeeEmail();
        ....
        ....
        ....
        ....
        ....
        List<EmployeeEmail> employeeEmails = new ArrayList<>();
        employeeEmails.add(email);

        EmployeesDTO dto = EmployeesDTO.builder()
                .businessTitle(rs.getString(""))
                ...........
                ...........
                ...........
                ...........
                ...........
                ...........
                ...........
                .employeeEmails(employeeEmails)
                .build();

        return dto;
    }
}
  • Your mapper sets only one email. Does your query return all emails? What is the structure of a database record returned by your jdbc item reader? The `RowMapper` is intended to map a database row to a pojo, so it is expected that the result set contains all emails (in a flat structure, like a column for each email). Is this the case for you? – Mahmoud Ben Hassine May 18 '20 at 11:06
  • Right, the mapper will only return a single object. I have a SQL join between the Employee & email tables which yields 3 email records for a single employee, so somehow it is only setting one email per employee in the mapper. How would I convert the 3 Employee objects into a single one and embed the email objects into the Employee? –  May 18 '20 at 13:50

1 Answer

A RowMapper is used to map a single database row to a POJO. So unless each row contains all emails in different columns, for example id, name, email1, email2, email3, what you are trying to do won't work.

If for each item you have 3 rows for emails, you need to make your query return only id and name, and use an additional query to grab the emails. This additional query can be done either in the mapper itself or in an item processor, as described in the driving query pattern; a sketch of the processor variant follows.
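
For illustration, here is a minimal sketch of the item-processor variant. It assumes a JdbcTemplate bean is available, that employee_email has email_type and email_address columns, and that the POJOs expose the getters/setters used below; none of that is confirmed by the question, so adapt the query and accessor names to the actual schema:

import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.jdbc.core.JdbcTemplate;

// Driving query pattern: the reader's query returns employees only (no join),
// and this processor runs one additional query per item to fetch its emails.
public class EmailEnrichmentProcessor implements ItemProcessor<EmployeesDTO, EmployeesDTO> {

    private final JdbcTemplate jdbcTemplate;

    public EmailEnrichmentProcessor(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public EmployeesDTO process(EmployeesDTO employee) {
        // Fetch all email rows for this employee in one additional query.
        List<EmployeeEmail> emails = jdbcTemplate.query(
                "select email_type, email_address from root.employee_email where employee_id = ?",
                (rs, rowNum) -> {
                    EmployeeEmail email = new EmployeeEmail();
                    email.setType(rs.getString("email_type"));       // assumed setter
                    email.setAddress(rs.getString("email_address")); // assumed setter
                    return email;
                },
                employee.getEmployeeId());                           // assumed getter
        employee.setEmployeeEmails(emails); // embedded as an array in the Mongo document
        return employee;
    }
}

The mapper-based variant would issue the same per-id query inside EmployeeRowMapper.mapRow instead; doing it in a processor keeps the reader a plain row-to-object mapping.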

Mahmoud Ben Hassine
  • Hey, what if I load all the employee objects and save them into the context, then load all the email objects, set them on the employees, and put them back into the context, then load all the departments, set them on the respective employee objects, and then finally save the objects? Do I lose the partitions with this approach? –  May 19 '20 at 08:44
  • No, but it is not recommended to load a lot of data in the execution context as it is persisted between steps. If you want to avoid additional queries, you can try to use a cache, see example here https://stackoverflow.com/a/52644962/5019386. – Mahmoud Ben Hassine May 19 '20 at 09:03
  • Can you share a complete example if you have one in your repo? –  May 19 '20 at 09:31
  • Sure, will do that once I understand how to implement it in my case. Thanks –  May 19 '20 at 09:45
  • Can we access this cache in the processor and writers? –  May 19 '20 at 10:10
  • Yes, the cache is defined as a bean. The example shows how to inject it in a tasklet, but you can inject it in any other spring bean (like an item processor or item writer if defined as beans). – Mahmoud Ben Hassine May 19 '20 at 11:01
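
For reference, a minimal sketch of the cache variant from the linked answer, assuming a plain in-memory map keyed by employee_id that an earlier step (not shown here) has populated with all emails; the bean wiring and accessor names are assumptions:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class EmailCacheConfig {

    // Shared cache bean: a preliminary step loads employee_id -> emails into
    // it, and it can be injected into any other bean (processor, writer, ...).
    @Bean
    public Map<Long, List<EmployeeEmail>> emailCache() {
        return new ConcurrentHashMap<>();
    }
}

// Processor that looks emails up in the preloaded cache instead of querying.
public class CachingEmailProcessor implements ItemProcessor<EmployeesDTO, EmployeesDTO> {

    private final Map<Long, List<EmployeeEmail>> emailCache;

    public CachingEmailProcessor(Map<Long, List<EmployeeEmail>> emailCache) {
        this.emailCache = emailCache;
    }

    @Override
    public EmployeesDTO process(EmployeesDTO employee) {
        // No per-item SQL here: emails were preloaded by the earlier step.
        employee.setEmployeeEmails(                                  // assumed setter
                emailCache.getOrDefault(employee.getEmployeeId(), Collections.emptyList()));
        return employee;
    }
}

Unlike data stashed in the execution context, this map is never persisted to the job repository, which is why it is the safer place for a large lookup structure.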