My current project has us using TreeSet and TreeMap in Java, with an input array of 10514 Song elements read in from a text file. Each Song contains a Artist, Title and Lyric fields. The aim of this project is to conduct fast searches on the lyrics using sets and maps.
First, I iterate over the input Song array, accessing the lyrics field and creating a Scanner object to iterate over the lyric words using this code: commonWords
is a TreeSet of words that should not be keys, and lyricWords
is the overall map of words to Songs.
public void buildSongMap() {
for (Song song:songs) {
//method variables
String currentLyrics= song.getLyrics().toLowerCase();
TreeSet<Song> addToSet=null;
Scanner readIn= new Scanner(currentLyrics);
String word= readIn.next();
while (readIn.hasNext()) {
if (!commonWords.contains(word) && !word.equals("") && word.length()>1) {
if (lyricWords.containsKey(word)) {
addToSet= lyricWords.get(word);
addToSet.add(song);
word=readIn.next();
} else
buildSongSet(word);
} else
word= readIn.next();
}
}
In order to build the songSet, I use this code:
public void buildSongSet(String word) {
TreeSet<Song> songSet= new TreeSet<Song>();
for (Song song:songs) {
//adds song to set
if (song.getLyrics().contains(word)) {
songSet.add(song);
}
}
lyricWords.put(word, songSet);
System.out.println("Word added "+word);
}
Now, since buildSongSet is called from inside a loop, creating the map executes in N^2 time. When the input array is 4 songs, searches run very fast, but when using the full array of 10514 elements, it can take over 15+ min to build the map on a 2.4GHz machine with 6 GiB RAM. What can I do to make this code more efficient? Unfortunately, reducing the input data is not an option.