
I have a CSV with columns [id, info], and I have access to a service that returns a list of ids (say, activeIds). I would like to read from the CSV only those rows whose ids are present in activeIds, and then access the info column for those selected ids. What are some optimised approaches for this?

Chaithra Rai

1 Answer


First, create the following beans for reading, processing, and writing the data to a new CSV file:
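These beans assume a simple MyObject class mapping the two CSV columns; a minimal sketch (the field names must match the names configured on the tokenizer and extractor):

```java
// Minimal POJO assumed by the reader and writer below; the getters and
// setters are required by BeanWrapperFieldSetMapper / BeanWrapperFieldExtractor.
public class MyObject {
    private String id;
    private String info;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getInfo() { return info; }
    public void setInfo(String info) { this.info = info; }
}
```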

@Bean
public FlatFileItemReader<MyObject> reader() {
    FlatFileItemReader<MyObject> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("data.csv"));
    reader.setLineMapper(new DefaultLineMapper<MyObject>() {{
        setLineTokenizer(new DelimitedLineTokenizer() {{
            setNames("id", "info");
        }});
        setFieldSetMapper(new BeanWrapperFieldSetMapper<MyObject>() {{
            setTargetType(MyObject.class);
        }});
    }});
    return reader;
}

@Bean
public ItemProcessor<MyObject, MyObject> processor(List<String> activeIds) {
    // Copy the ids into a HashSet so each lookup is O(1) instead of
    // O(n) with List.contains — this matters when the list is large.
    Set<String> activeIdSet = new HashSet<>(activeIds);
    // Returning null tells Spring Batch to filter the item out.
    return item -> activeIdSet.contains(item.getId()) ? item : null;
}

@Bean
public FlatFileItemWriter<MyObject> writer() {
    FlatFileItemWriter<MyObject> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("filtered_data.csv"));
    writer.setLineAggregator(new DelimitedLineAggregator<MyObject>() {{
        setDelimiter(",");
        setFieldExtractor(new BeanWrapperFieldExtractor<MyObject>() {{
            setNames(new String[]{"id", "info"});
        }});
    }});
    return writer;
}

@Bean
public Job filterJob(JobBuilderFactory jobs, StepBuilderFactory steps,
                     FlatFileItemReader<MyObject> reader, ItemProcessor<MyObject, MyObject> processor,
                     FlatFileItemWriter<MyObject> writer) {
    Step step = steps.get("filterStep")
            .<MyObject, MyObject>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();

    return jobs.get("filterJob")
            .incrementer(new RunIdIncrementer())
            .flow(step)
            .end()
            .build();
}

Now create a runJob method as below:

@Autowired
private JobLauncher jobLauncher;

@Autowired
private Job filterJob;

public void runJob(List<String> activeIds) throws Exception { // JobLauncher.run throws checked JobExecution exceptions
    JobParameters jobParameters = new JobParametersBuilder()
            .addString("ids", String.join(",", activeIds))
            .toJobParameters();
    jobLauncher.run(filterJob, jobParameters);
}

Now, this runJob method reads the data from the CSV file, filters out any rows whose id is not present in the list of active ids, and writes the filtered data to a new CSV file when you pass activeIds. Note that the ids job parameter is not actually consumed by the processor as defined above (the processor receives activeIds when the bean is created); as written, the parameter mainly distinguishes one run's JobParameters from another's.

runJob(activeIds);
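Since you asked for multiple approaches: if you don't need Spring Batch features such as restartability or chunked writing, the same filtering can be done with plain Java and a HashSet. This is only a sketch; the file names and the assumption that the id is the first comma-delimited field (with no embedded commas or quotes) mirror the batch example above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CsvFilter {

    // Keeps only the lines whose first (id) column is in activeIds.
    public static List<String> filter(List<String> lines, Collection<String> activeIds) {
        Set<String> active = new HashSet<>(activeIds); // O(1) membership checks
        return lines.stream()
                .filter(line -> active.contains(line.split(",", 2)[0]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("data.csv"));
        List<String> kept = filter(lines, List.of("1", "3")); // activeIds from your service
        Files.write(Paths.get("filtered_data.csv"), kept);
    }
}
```

This loads the whole file through a stream pipeline and keeps the filtered rows in memory, so it suits small-to-medium files; for very large files you would stream line by line instead.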
  • The read, process and write would be executed for every chunk of 10 items, right? Is there any way I could do it with a tasklet instead, fetching the list of ids and using that as the source for the read? When I thought about this approach, I could not find a way to access info for those ids. I am not sure if I am making much sense; do let me know. Basically, I had already implemented this approach, but I would like to optimise it further. – Chaithra Rai Apr 05 '23 at 12:57