I have a very large file (10^8 lines) with counts of events as follows,
A 10
B 11
C 23
A 11
I need to accumulate the counts for each event, so that my map contains
A 21
B 11
C 23
My current approach:
Read the lines, maintain a map, and update the counts in the map as follows
updateCount(Map<String, Long> countMap, String key,
Long c) {
if (countMap.containsKey(key)) {
Long val = countMap.get(key);
countMap.put(key, val + c);
} else {
countMap.put(key, c);
}
}
Currently this is the slowest part of the code, (takes around 25 ms). Note that the map is based on MapDB, but I doubt that updates are slow due to that (are they?)
This is the mapdb configs for the map,
DBMaker.newFileDB(dbFile).freeSpaceReclaimQ(3)
.mmapFileEnablePartial()
.transactionDisable()
.cacheLRUEnable()
.closeOnJvmShutdown();
Are there ways to speed this up?
EDIT:
The number of unique keys is of the order of the pages in wikipedia. The data is actually page traffic data from here.