I am trying to build a simple multithreaded dictionary (index) from a group of documents that contain words. The dictionary is stored in a ConcurrentHashMap with String keys and Vector values. For each word in the dictionary there is an appearance list, which is a Vector of Tuple objects (a custom class; in my case a Tuple is a pair of two numbers).
Each thread takes one document as input, finds all the words in it, and tries to update the ConcurrentHashMap. I should also point out that two threads may try to update the same key of the map at the same time, each adding a new Tuple to its value. I only perform write operations on the Vector.
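The update described above can be sketched in isolation as follows. This is only a minimal illustration, not the real project code: the class name AppearanceListSketch, the methods addAppearance and appearancesOf, and the fields of this stand-in Tuple are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;

public class AppearanceListSketch {
    // Hypothetical stand-in for the custom Tuple class: (document id, word frequency).
    public static final class Tuple {
        final int docId;
        final int frequency;

        Tuple(int docId, int frequency) {
            this.docId = docId;
            this.frequency = frequency;
        }
    }

    private final ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();

    // computeIfAbsent creates the Vector atomically, so two threads racing on a new
    // word end up with the same list; Vector.add is synchronized, so both appends are kept.
    public void addAppearance(String word, int docId, int frequency) {
        dictionary.computeIfAbsent(word, k -> new Vector<>()).add(new Tuple(docId, frequency));
    }

    // Returns a snapshot copy of the appearance list (empty if the word is unknown).
    public List<Tuple> appearancesOf(String word) {
        Vector<Tuple> list = dictionary.get(word);
        return list == null ? new ArrayList<>() : new ArrayList<>(list);
    }

    public static void main(String[] args) throws InterruptedException {
        AppearanceListSketch index = new AppearanceListSketch();
        Thread t1 = new Thread(() -> index.addAppearance("java", 1, 3));
        Thread t2 = new Thread(() -> index.addAppearance("java", 2, 5));
        t1.start();
        t2.start();
        // Wait for both writers before reading the result.
        t1.join();
        t2.join();
        System.out.println(index.appearancesOf("java").size());
    }
}
```

In this sketch the two writer threads are joined before the list is read, so both tuples for the same key should be visible at that point.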
Below you can see the code that submits the indexing tasks. As you can see, I pass in the dictionary, which is a ConcurrentHashMap with String keys and Vector values:
    public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            this.INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources,
                    crawler.getFinishedPages().take()));
        }
        this.INDEXING_SERVICE.shutdown();
    }
Below you can see the code of an indexing task:
    import java.util.Vector;
    import java.util.concurrent.ConcurrentHashMap;

    public class IndexingTask implements Runnable {

        private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
        private HtmlDocument document;

        public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
            this.dictionary = dictionary;
            this.document = document;
            sources.putIfAbsent(document.getDocId(), document.getURL());
        }

        @Override
        public void run() {
            for (String word : document.getTerms()) {
                this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                        .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));
            }
        }
    }
The code seems to be correct, but the dictionary is not updated properly: some words (keys) are missing from the original dictionary, and some other keys have fewer items in their Vector than expected.
I have done some debugging and found that, before a task terminates, it has computed the correct keys and values. However, the original dictionary that is passed to the task as input (see the first piece of code) is not updated correctly. Do you have any ideas or suggestions?
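One thing that may be worth ruling out: ExecutorService.shutdown() only stops the pool from accepting new tasks and does not wait for already submitted tasks to finish, so a dictionary that is read right after run() returns could still be incomplete. The following is a minimal sketch of waiting with awaitTermination before reading. It rests on several assumptions: that INDEXING_SERVICE is an ExecutorService, that a fixed pool of 4 threads and a one-minute timeout are acceptable, and that plain lists of words and Integer document ids can stand in for HtmlDocument and Tuple. The names AwaitIndexingSketch and indexAll are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitIndexingSketch {
    // Hypothetical helper: indexes each page (a list of words) on a thread pool
    // and returns only after every submitted task has finished.
    public static ConcurrentHashMap<String, Vector<Integer>> indexAll(List<List<String>> pages) {
        ConcurrentHashMap<String, Vector<Integer>> dictionary = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption

        for (int i = 0; i < pages.size(); i++) {
            final int docId = i;
            final List<String> words = pages.get(i);
            pool.submit(() -> {
                for (String word : words) {
                    // Same update pattern as IndexingTask.run(), with docId as the value.
                    dictionary.computeIfAbsent(word, k -> new Vector<>()).add(docId);
                }
            });
        }

        pool.shutdown(); // rejects new tasks, but does not wait for running ones
        try {
            // Block until all submitted tasks complete, or the (arbitrary) timeout expires.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b", "c"),
                Arrays.asList("b"));
        ConcurrentHashMap<String, Vector<Integer>> dictionary = indexAll(pages);
        System.out.println(dictionary.keySet());
        System.out.println(dictionary.get("b").size());
    }
}
```

If the dictionary comes out complete once the caller waits like this, the update code itself is probably fine and the problem would be in when the dictionary is read. A related thing to check under the same assumptions is whether the `isEmpty()` loop condition can end the loop while the crawler is still producing pages.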