As you can see in the documentation (http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#faultTolerant):
"When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (uses skip or retry processing typically), any ItemProcessor used should be implemented in a way that is idempotent."
This means that in Michael's example, the first time a user is processed it is cached in the Set. If writing the item then fails and the step is fault tolerant, the processor is executed again for the same user, and the filter will incorrectly filter that user out.
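
For comparison, here is a minimal sketch of that kind of Set-only filter (a reconstruction for illustration, not Michael's exact code). Note how, after a rollback, a reprocessed user is indistinguishable from a genuine duplicate:

import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

// Naive filter: works on the first pass, but after a rollback the cached
// user is already in the Set, so the retry pass incorrectly filters it out.
public class NaiveUserFilterItemProcessor implements ItemProcessor<User, User> {

    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        if (seenUsers.contains(user)) {
            return null; // filtered, even when the item is just being reprocessed
        }
        seenUsers.add(user);
        return user;
    }
}
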
Improved code:
import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store the duplicate
 * users. Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() identifies the duplicates
    private final Set<User> seenUsers = new HashSet<User>();

    @Override
    public User process(User user) {
        // A true duplicate: the user was already seen in an earlier read
        // and this instance has not yet been marked as processed.
        if (seenUsers.contains(user) && !user.hasBeenProcessed()) {
            return null;
        }
        // Either a new user, or the same cached instance being reprocessed after a
        // rollback (its processed flag is already set), so let it pass through.
        seenUsers.add(user);
        user.setProcessed(true);
        return user;
    }
}
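
For completeness, here is a sketch of how such a processor could be wired into a fault-tolerant step with Java config. The bean names, reader/writer beans, chunk size and skip policy are assumptions for illustration only:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DeduplicationStepConfig {

    // userReader and userWriter are placeholder beans assumed to exist elsewhere.
    @Bean
    public Step deduplicateUsersStep(StepBuilderFactory stepBuilderFactory,
                                     ItemReader<User> userReader,
                                     ItemWriter<User> userWriter) {
        return stepBuilderFactory.get("deduplicateUsersStep")
                .<User, User>chunk(10)
                .reader(userReader)
                .processor(new UserFilterItemProcessor())
                .writer(userWriter)
                .faultTolerant()          // chunks may be rolled back and reprocessed
                .skip(Exception.class)    // illustrative skip policy only
                .skipLimit(5)
                .build();
    }
}

The faultTolerant() call is what enables the skip/retry behavior that causes a rolled-back chunk to be reprocessed, which is exactly why the processor has to be idempotent.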