My attempt uses two libraries:
- OpenCSV
- Apache Commons Collections 4 (org.apache.commons.collections4: MultiValuedMap and ArrayListValuedHashMap)
To learn OpenCSV quickly, I recommend reading the official documentation. It took me a day to read about half of it, which was enough to learn how to read from a file:
https://opencsv.sourceforge.net/index.html#developer_documentation
collections4 is needed because OpenCSV's join annotations bind the matching columns into a MultiValuedMap.
The first step is to read the CSV file using OpenCSV. In my attempt I use annotations to read each row of the file directly into a bean:
@CsvBindByName(column="Country/Region",required=true)
private String country;
@CsvBindAndJoinByName(column="[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4}", elementType = String.class, mapType = ArrayListValuedHashMap.class)
private MultiValuedMap<String,String> casesByDate;
OpenCSV does not return the data in sorted order (as far as I know, a MultiValuedMap has no automatic sorting), so neither the date columns nor the country rows are sorted.
My solution is to add a field that stores the cases sorted by date:
private TreeMap<LocalDate, Integer> sortedCasesByDate = new TreeMap<>();
Below is the method used to populate sortedCasesByDate:
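public CasesByCountry addToSortedCasesByDate(MultiValuedMap<String,String> map) {
    DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("M/d/yy");
    // keySet() visits each date header once; keys() would repeat a key once per stored value.
    for (String key : map.keySet()) {
        // Each date column holds one value, so take the first element rather than stripping brackets from toString().
        String value = map.get(key).iterator().next();
        sortedCasesByDate.put(LocalDate.parse(key, dateFormat), Integer.valueOf(value.trim()));
    }
    return this;
}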
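As a rough, library-free illustration of why the TreeMap helps, the sketch below parses date headers with the same "M/d/yy" pattern and lets the TreeMap put them in ascending order. The class name DateSortSketch, the helper sortByDate and the sample values are made up for this illustration, and a plain Map stands in for the MultiValuedMap.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class DateSortSketch {
    // Same pattern as in the answer: month/day/two-digit year, e.g. "1/22/20".
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("M/d/yy");

    // Hypothetical helper: parses each header/value pair into a TreeMap,
    // which keeps its keys (the dates) in ascending order.
    static TreeMap<LocalDate, Integer> sortByDate(Map<String, String> raw) {
        TreeMap<LocalDate, Integer> sorted = new TreeMap<>();
        raw.forEach((header, value) ->
                sorted.put(LocalDate.parse(header, DATE_FORMAT), Integer.valueOf(value.trim())));
        return sorted;
    }

    public static void main(String[] args) {
        // Deliberately inserted out of order; the numbers are invented sample data.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("2/1/20", "12");
        raw.put("1/22/20", "1");
        raw.put("1/23/20", "3");
        System.out.println(sortByDate(raw)); // {2020-01-22=1, 2020-01-23=3, 2020-02-01=12}
    }
}
```

Sorting as strings would put "2/1/20" before "1/22/20" only by accident of the first character, so converting to LocalDate first is what makes the order reliable.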
Full code of the class file (annotated bean, one instance per country row):
imports ...
public class CasesByCountry {
    @CsvBindByName(column="Country/Region",required=true)
    private String country;
    @CsvBindAndJoinByName(column="[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4}", elementType = String.class, mapType = ArrayListValuedHashMap.class)
    private MultiValuedMap<String,String> casesByDate;
    private TreeMap<LocalDate, Integer> sortedCasesByDate = new TreeMap<>();
    public CasesByCountry(){}
    public String getCountry() {
        return country;
    }
    public MultiValuedMap<String, String> getCasesByDate() {
        return casesByDate;
    }
    public TreeMap<LocalDate, Integer> getSortedCasesByDate() {
        return sortedCasesByDate;
    }
    public CasesByCountry addToSortedCasesByDate(MultiValuedMap<String,String> map) {
        DateTimeFormatter dateFormat = DateTimeFormatter.ofPattern("M/d/yy");
        // keySet() visits each date header once; keys() would repeat a key once per stored value.
        for (String key : map.keySet()) {
            // Each date column holds one value, so take the first element rather than stripping brackets from toString().
            String value = map.get(key).iterator().next();
            sortedCasesByDate.put(LocalDate.parse(key, dateFormat), Integer.valueOf(value.trim()));
        }
        return this;
    }
    // Merges the sortedCasesByDate of two CasesByCountry entries for the same country.
    // Used in reduce() by Reader to merge the sortedCasesByDate of 2 provinces.
    public BinaryOperator<CasesByCountry> setSortedCasesByDate = (country1, country2) -> {
        // merge() sums the counts per date and avoids a NullPointerException
        // when a date is present in only one of the two maps.
        country2.getSortedCasesByDate().forEach(
            (date, numOfCases) ->
                country1.getSortedCasesByDate().merge(date, numOfCases, Integer::sum)
        );
        return country1;
    };
}
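The province merge boils down to summing two date-keyed maps. A minimal standalone sketch of that idea, using Map.merge from the JDK, could look like the following. MergeSketch and mergeInto are hypothetical names for this illustration and the counts are invented.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {
    // Hypothetical helper: adds every entry of 'source' into 'target'.
    // Dates present in both maps are summed; dates only in 'source' are inserted as-is.
    static TreeMap<LocalDate, Integer> mergeInto(TreeMap<LocalDate, Integer> target,
                                                 Map<LocalDate, Integer> source) {
        source.forEach((date, cases) -> target.merge(date, cases, Integer::sum));
        return target;
    }

    public static void main(String[] args) {
        TreeMap<LocalDate, Integer> provinceA = new TreeMap<>();
        provinceA.put(LocalDate.of(2020, 1, 22), 1);
        provinceA.put(LocalDate.of(2020, 1, 23), 3);

        TreeMap<LocalDate, Integer> provinceB = new TreeMap<>();
        provinceB.put(LocalDate.of(2020, 1, 23), 2);
        provinceB.put(LocalDate.of(2020, 1, 24), 5);

        System.out.println(mergeInto(provinceA, provinceB));
        // {2020-01-22=1, 2020-01-23=5, 2020-01-24=5}
    }
}
```

Compared with calling get(date) on the second map and adding the result, merge does not fail when a date exists in only one of the two maps.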
Once the annotated class is complete, read the file using the code shared in the OpenCSV documentation. A call to processInput is added to process the data afterwards:
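public static Function<String, List<CasesByCountry>> readFile = (path) -> {
    // try-with-resources closes the reader once parsing is done.
    try (FileReader reader = new FileReader(path)) {
        // The type argument avoids the raw-type warning of a bare CsvToBeanBuilder.
        List<CasesByCountry> l = new CsvToBeanBuilder<CasesByCountry>(reader)
            .withType(CasesByCountry.class)
            .build()
            .parse();
        l = processInput.apply(l);
        l.forEach(System.out::println);
        return l;
    } catch (IOException e) {
        // Covers FileNotFoundException as well as errors while closing the reader.
        throw new RuntimeException(e);
    }
};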
In processInput, the cases of each row are first sorted by date. Then duplicate countries are merged using reduce.
Stack Overflow questions I referred to for this answer:
Java 8 stream sum entries for duplicate keys
Apply reduction only if certain condition is met
The problem with reduce is that it cannot apply a condition to decide which elements to combine. For example, it cannot do the following:
if (country1.getCountry().equals(country2.getCountry())) {
    // reduce()
} else {
    // go to next.
}
Therefore, groupingBy is used to create a map of lists (Map<String, List<CasesByCountry>>). Each list holds the duplicate entries of one country. reduce is then performed on each individual List<CasesByCountry>, and the results are joined together again:
/**
 * map: sort cases by ascending date.
 * groupingBy: split into lists of countries to identify duplicates.
 * reduce: reduce CasesByCountry by merging sortedCasesByDate TreeMaps.
 */
public static UnaryOperator<List<CasesByCountry>> processInput = casesByCountryList -> {
    List<CasesByCountry> toR = new ArrayList<>();
    casesByCountryList.stream().map(
        casesByCountry ->
            casesByCountry.addToSortedCasesByDate(
                casesByCountry.getCasesByDate()
            )
    ).collect(
        Collectors.groupingBy(CasesByCountry::getCountry)
    ).forEach(
        (country, casesByCountry) ->
            // null is the seed, so the first entry of each group is taken as-is
            // and every later entry is merged into it.
            toR.add(casesByCountry.stream().reduce(
                null,
                (country1, country2) ->
                    country1 != null
                        ? country1.setSortedCasesByDate.apply(country1, country2)
                        : country2
            ))
    );
    // Sort the merged entries by country name.
    toR.sort(Comparator.comparing(CasesByCountry::getCountry));
    return toR;
};
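For comparison, the same "sum entries for duplicate keys" idea can also be written with Collectors.toMap and a merge function, which groups, merges and sorts by country in one pass. This is an alternative sketch, not the code used in this answer: Row is a hypothetical stand-in for CasesByCountry without the OpenCSV annotations, totalsByCountry is a made-up name, and the numbers are invented.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupMergeSketch {
    // Hypothetical stand-in for the CasesByCountry bean.
    static final class Row {
        final String country;
        final TreeMap<LocalDate, Integer> cases;

        Row(String country, TreeMap<LocalDate, Integer> cases) {
            this.country = country;
            this.cases = cases;
        }
    }

    // Builds one summed, date-sorted map per country; the outer TreeMap sorts by country name.
    static TreeMap<String, TreeMap<LocalDate, Integer>> totalsByCountry(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
                row -> row.country,
                // Copy the map so the input rows are not mutated by the merge.
                row -> new TreeMap<LocalDate, Integer>(row.cases),
                // Called only when two rows share a country: sum the counts per date.
                (left, right) -> {
                    right.forEach((date, n) -> left.merge(date, n, Integer::sum));
                    return left;
                },
                TreeMap::new));
    }

    // Small helper to build a one-day sample map.
    static TreeMap<LocalDate, Integer> oneDay(int count) {
        TreeMap<LocalDate, Integer> m = new TreeMap<>();
        m.put(LocalDate.of(2020, 1, 22), count);
        return m;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("China", oneDay(10)),      // first province row
                new Row("Australia", oneDay(0)),
                new Row("China", oneDay(5)));      // second province row
        System.out.println(totalsByCountry(rows));
        // {Australia={2020-01-22=0}, China={2020-01-22=15}}
    }
}
```

The trade-off is that this version returns maps keyed by country rather than a List of beans, so it fits best when only the totals are needed afterwards.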
Full code of Reader class:
imports...
public class Reader{
    private static List<CasesByCountry> confirmedCases;
    public Reader() {
        // CaseType is just an enum that stores the file path.
        confirmedCases = readFile.apply(CaseType.CONFIRMED.getPath());
    }
    /**
     * map: sort cases by ascending date.
     * groupingBy: split into lists of countries to identify duplicates.
     * reduce: reduce CasesByCountry by merging sortedCasesByDate TreeMaps.
     */
    public static UnaryOperator<List<CasesByCountry>> processInput = casesByCountryList -> {
        List<CasesByCountry> toR = new ArrayList<>();
        casesByCountryList.stream().map(
            casesByCountry ->
                casesByCountry.addToSortedCasesByDate(
                    casesByCountry.getCasesByDate()
                )
        ).collect(
            Collectors.groupingBy(CasesByCountry::getCountry)
        ).forEach(
            (country, casesByCountry) ->
                // null is the seed, so the first entry of each group is taken as-is
                // and every later entry is merged into it.
                toR.add(casesByCountry.stream().reduce(
                    null,
                    (country1, country2) ->
                        country1 != null
                            ? country1.setSortedCasesByDate.apply(country1, country2)
                            : country2
                ))
        );
        // Sort the merged entries by country name.
        toR.sort(Comparator.comparing(CasesByCountry::getCountry));
        return toR;
    };
    public static Function<String, List<CasesByCountry>> readFile = (path) -> {
        // try-with-resources closes the reader once parsing is done.
        try (FileReader reader = new FileReader(path)) {
            List<CasesByCountry> l = new CsvToBeanBuilder<CasesByCountry>(reader)
                .withType(CasesByCountry.class)
                .build()
                .parse();
            l = processInput.apply(l);
            l.forEach(System.out::println);
            return l;
        } catch (IOException e) {
            // Covers FileNotFoundException as well as errors while closing the reader.
            throw new RuntimeException(e);
        }
    };
    public List<CasesByCountry> getConfirmedCases() {
        return confirmedCases;
    }
}