I am using Univocity parser version 2.7.3. I have a CSV file that has 1 Million records and might grow in future. I am reading only a few specific columns from the file and below are my requirements:
DO NOT store the CSV contents into memory at any point
Ignore/skip bean creation if either of latitude or longitude columns in CSV are null/blank
To meet these requirements, I tried implementing CsvRoutines so that the CSV data is not copied over to memory. I am using @Validate annotation on both "Latitude" and "Longitude" fields and have used error handler to not throw back any exception so that the record will be skipped on validation failure.
Sample CSV:
#version:1.0
#timestamp:2017-05-29T23:22:22.320Z
#brand:test report
network_name,location_name,location_category,location_address,location_zipcode,location_phone_number,location_latitude,location_longitude,location_city,location_state_name,location_state_abbreviation,location_country,location_country_code,pricing_type,wep_key
"1 Free WiFi","Test Restaurant","Cafe / Restaurant","Marktplatz 18","1233","+41 263 34 05","1212.15","7.51","Basel","test","BE","India","DE","premium",""
"2 Free WiFi","Test Restaurant","Cafe / Restaurant","Zufikerstrasse 1","1111","+41 631 60 00","11.354","8.12","Bremgarten","test","AG","China","CH","premium",""
"3 Free WiFi","Test Restaurant","Cafe / Restaurant","Chemin de la Fontaine 10","1260","+41 22 361 69","12.34","11.23","Nyon","Vaud","VD","Switzerland","CH","premium",""
"!.oist*~","HoistGroup Office","Office","Chemin de I Etang","CH-1211","","","","test","test","GE","Switzerland","CH","premium",""
"test","tess's Takashiro","Cafe / Restaurant","Test 1-10","870-01","097-55-1808","","","Oita","Oita","OITA","Japan","JP","premium","1234B"
TestDTO.java
@Data
@NoArgsConstructor
@AllArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class TestDTO implements Serializable {
@Parsed(field = "location_name")
private String name;
@Parsed(field = "location_address")
private String addressLine1;
@Parsed(field = "location_city")
private String city;
@Parsed(field = "location_state_abbreviation")
private String state;
@Parsed(field = "location_country_code")
private String country;
@Parsed(field = "location_zipcode")
private String postalCode;
@Parsed(field = "location_latitude")
@Validate
private Double latitude;
@Parsed(field = "location_longitude")
@Validate
private Double longitude;
@Parsed(field = "network_name")
private String ssid;
}
Main.java
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.detectFormatAutomatically();
parserSettings.setLineSeparatorDetectionEnabled(true);
parserSettings.setHeaderExtractionEnabled(true);
parserSettings.setSkipEmptyLines(true);
parserSettings.selectFields("network_name", "location_name","location_address", "location_zipcode",
"location_latitude", "location_longitude", "location_city","location_state_abbreviation", "location_country_code");
parserSettings.setProcessorErrorHandler(new RowProcessorErrorHandler() {
@Override
public void handleError(DataProcessingException error, Object[] inputRow, ParsingContext context) {
//do nothing
}
});
CsvRoutines parser = new CsvRoutines(parserSettings);
ResultIterator<TestDTO, ParsingContext> iterator = parser.iterate(TestDTO.class, new FileReader("c:\\users\\...\\test.csv")).iterator();
int i=0;
while(iterator.hasNext()) {
TestDTO dto = iterator.next();
if(dto.getLongitude() == null || dto.getLatitude() == null)
i++;
}
System.out.println("count=="+i);
Problem:
I actually expected the count to be zero since I have added error handler and haven't thrown back the data validation exception but seems thats not the case. I thought @Validate will throw back an exception when it encounters a record with either Latitude or Longitude as null (both the columns may be null in same record as well) which is handled and ignored/skipped at error handler.
Basically I do not want UniVocity to create and map unnecessary DTO objects in heap (and lead to out of memory) since there are chances that the incoming CSV file might have more than 200 or 300k records with either Latitude/Longitude as null.
I even tried adding custom validator in @Validate as well but in vain.
Could someone please let me know what am I missing here?