Context: A microservice that exposes a REST API and handles incoming JSON requests sequentially. Each request is deserialized, validated, and put into a Kafka topic for further processing.


An example of a class to be validated:

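import java.util.List;
import javax.validation.Valid;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

// Autogenerated from the OpenAPI schema using custom templates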
public class DataPayload {

  @NotNull
  @Size(min=1, max=100)
  private String description;

  @Valid
  @Size(max=1024)
  private List<DataLine> dataLines; 

  // getters, setters, etc.

  public static class DataLine {
    // lots of fields to be validated..
  }
}

We run validation using JSR 303/JSR 380 Bean Validation:

public static void main(String[] args) {
  var validator = Validation.buildDefaultValidatorFactory().getValidator();
  var violations = validator.validate(getDataPayload());
}

Does anybody have an idea how validation of List<DataLine> dataLines could be parallelized with minimal effort?


Several (obvious) options I have so far:

  1. Manually run validator.validate(dataLine) in parallel for each element of the list, alongside validating DataPayload without its dataLines: validator.validate(withoutDataLines(dataPayload)).
  2. Similar to option 1, but with some tricks around validation groups.
  3. (Not sure whether this is possible.) A custom ConstraintValidator that validates container elements in parallel. Open question: how to delegate nested/cascaded validation to the default mechanism?
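
To make option 1 concrete, here is a minimal sketch of the fan-out step only. It is deliberately library-free so it runs on its own: the names ParallelValidationSketch, validateInParallel and checkNotBlank are hypothetical, and checkNotBlank is just a stand-in for validator.validate(dataLine). In real code one would presumably pass validator::validate instead, relying on the Bean Validation requirement that Validator implementations be thread-safe. Results are keyed by list index because a DataLine validated on its own no longer carries the dataLines[i] prefix in its property path.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelValidationSketch {

    /**
     * Applies validateOne to every element in parallel and returns the
     * non-empty violation sets, keyed by the element's index in the list.
     */
    public static <T, V> Map<Integer, Set<V>> validateInParallel(
            List<T> items, Function<T, Set<V>> validateOne) {
        return IntStream.range(0, items.size())
                .parallel() // parallel streams run on the common ForkJoinPool by default
                .boxed()
                .map(i -> Map.entry(i, validateOne.apply(items.get(i))))
                .filter(e -> !e.getValue().isEmpty())
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, TreeMap::new));
    }

    // Hypothetical stand-in for validator.validate(dataLine).
    public static Set<String> checkNotBlank(String line) {
        return line.trim().isEmpty() ? Set.of("must not be blank") : Set.of();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "", "b", "   ");
        // Indices 1 and 3 hold blank lines, so only those keys appear.
        Map<Integer, Set<String>> byIndex =
                validateInParallel(lines, ParallelValidationSketch::checkNotBlank);
        System.out.println(byIndex);
    }
}
```

The top-level DataPayload constraints would still have to be checked separately, and since the parallel stream borrows the shared common pool, this only seems reasonable while a single request is processed at a time.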

Although these options are viable, I am wondering whether there is a smarter, more elegant way to solve this problem.

etric
  • I kind of feel like the custom validator is your way to go about this. But I don't really see how making validations in parallel will help. 1024 items is not that much. – M. Prokhorov Feb 19 '20 at 15:17
  • We have already tried the 1st option and it works **~2.5 times faster** in our case. We did not investigate in depth, but it is probably because of deep cascading. – etric Feb 19 '20 at 16:07
  • Did you try the 1st option in production code under similar load? Because while it may run faster for a single input request, it also saturates the JVM with threads, which leaves concurrent processes with fewer resources. – M. Prokhorov Feb 19 '20 at 16:09
  • Yes, we tried it in a non-prod environment, and it showed higher throughput, as expected. We have some constraints, and for now the **processing of input REST API requests is sequential**, so only a single request is being processed at any given time. – etric Feb 24 '20 at 16:27
