
I want to count word frequency across multiple files.

I have these words in these files:

a1.txt = {aaa, aaa, aaa} 
a2.txt = {aaa} 
a3.txt = {aaa, bbb} 

So the result must be aaa = 3, bbb = 1 (that is, the number of files each word appears in).

I have defined the following data structures:

LinkedHashMap<String, Integer> wordCount = new LinkedHashMap<String, Integer>();
Map<String, LinkedHashMap<String, Integer>> fileToWordCount =
        new HashMap<String, LinkedHashMap<String, Integer>>();

Then I read the words from the files and put them into wordCount and fileToWordCount:

/* lineWords[i] is a word from a line in the file */
if (wordCount.containsKey(lineWords[i])) {
    System.out.println("1111111::" + lineWords[i]);
    wordCount.put(lineWords[i], wordCount.get(lineWords[i]).intValue() + 1);
} else {
    System.out.println("222222::" + lineWords[i]);
    wordCount.put(lineWords[i], 1);
}
fileToWordCount.put(filename, wordCount); // here we map filename and occurrences of words

Finally, I print fileToWordCount with the following code:

Collection a;
Set filenameset;

filenameset = fileToWordCount.keySet();
a = fileToWordCount.values();
for (Object filenameFromMap : filenameset) {
    System.out.println("FILENAMEFROMAP::" + filenameFromMap);
    System.out.println("VALUES::" + a);
}

which prints:

FILENAMEFROMAP::a3.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a1.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]
FILENAMEFROMAP::a2.txt
VALUES::[{aaa=5, bbb=1}, {aaa=5, bbb=1}, {aaa=5, bbb=1}]

So how can I use the map fileToWordCount to find the word frequency in the files?

Fahim Parkar
chkontog
  • Why not just hold a `Map<String, Set<String>>` to map a word to a set of files it appears in? – Itay Karo Nov 25 '12 at 09:16
  • @Itay.. And why not just post it as answer? It seems to be a valid answer. :) – Rohit Jain Nov 25 '12 at 09:19
  • @Rohit - because the question was how to use `fileToWordCount` and my answer doesn't use `fileToWordCount` :) – Itay Karo Nov 25 '12 at 09:21
  • I think Map> is more useful with the implementation that I have. – chkontog Nov 25 '12 at 09:29
  • @chktong - Your current code is very close to being correct. You have a problem with ***scope***, and with the way you are printing. Correct those two problems and you are ok with what you are doing now. – Perception Nov 25 '12 at 09:30
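A minimal sketch of the two fixes that Perception's comment points to, assuming the words of each file are already available in memory. The `ScopeFixDemo` class and the `countPerFile` and `filesContaining` helpers are hypothetical names introduced only for illustration. A new `wordCount` map is created for every file instead of sharing one map across all files, and each file name is printed next to its own map rather than next to the whole `values()` collection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScopeFixDemo {

    // Builds one word-count map PER FILE; the code in the question reused a single map.
    public static Map<String, Map<String, Integer>> countPerFile(Map<String, String[]> wordsByFile) {
        Map<String, Map<String, Integer>> fileToWordCount = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> file : wordsByFile.entrySet()) {
            Map<String, Integer> wordCount = new LinkedHashMap<>(); // fresh map for this file
            for (String word : file.getValue()) {
                Integer old = wordCount.get(word);
                wordCount.put(word, old == null ? 1 : old + 1);
            }
            fileToWordCount.put(file.getKey(), wordCount);
        }
        return fileToWordCount;
    }

    // Derives, from the per-file maps, the number of files each word appears in.
    public static Map<String, Integer> filesContaining(Map<String, Map<String, Integer>> fileToWordCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map<String, Integer> wordCount : fileToWordCount.values()) {
            for (String word : wordCount.keySet()) {
                Integer old = result.get(word);
                result.put(word, old == null ? 1 : old + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data from the question, hard-coded here instead of read from disk.
        Map<String, String[]> wordsByFile = new LinkedHashMap<>();
        wordsByFile.put("a1.txt", new String[] {"aaa", "aaa", "aaa"});
        wordsByFile.put("a2.txt", new String[] {"aaa"});
        wordsByFile.put("a3.txt", new String[] {"aaa", "bbb"});

        Map<String, Map<String, Integer>> fileToWordCount = countPerFile(wordsByFile);
        // Print each file name next to ITS OWN map, not the whole values() collection.
        for (Map.Entry<String, Map<String, Integer>> e : fileToWordCount.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(filesContaining(fileToWordCount));
    }
}
```

With the sample data, the per-file maps are {aaa=3}, {aaa=1} and {aaa=1, bbb=1}, and the derived result is {aaa=3, bbb=1}, which matches what the question asks for.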

2 Answers


You're making it harder than necessary. Here's how I would do it:

Map<String, Counter> wordCounts = new HashMap<String, Counter>();
for (File file : files) {
    Set<String> wordsInFile = new HashSet<String>(); // to avoid counting the same word in the same file twice
    for (String word : readWordsFromFile(file)) {
        if (!wordsInFile.contains(word)) {
            wordsInFile.add(word);
            Counter counter = wordCounts.get(word);
            if (counter == null) {
                counter = new Counter();
                wordCounts.put(word, counter);
            }
            counter.increment();
        }
    }
}
JB Nizet
  • At counter.increment(); it shows me an error because the increment() method doesn't exist – chkontog Nov 25 '12 at 13:48
  • Counter would be a class of your own, wrapping an int value that can be incremented. Sorry if that wasn't clear. You can use an Integer instead, but since it's immutable, you need to replace it in the map each time you want to increment it. – JB Nizet Nov 25 '12 at 13:50
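For the snippet in this answer to compile, `Counter` has to exist. A minimal sketch of such a class, together with the loop from the answer, might look like the following. The `WordFileFrequency` class name, the `countFilesPerWord` method, and the in-memory `List<List<String>>` input standing in for `readWordsFromFile(file)` are assumptions made for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordFileFrequency {

    // Hand-written mutable wrapper around an int, as described in the comment above.
    public static class Counter {
        private int value;

        public void increment() {
            value++;
        }

        public int get() {
            return value;
        }

        @Override
        public String toString() {
            return Integer.toString(value);
        }
    }

    // Counts, for each word, the number of files it appears in.
    // Each inner list stands in for the words read from one file (an assumption).
    public static Map<String, Counter> countFilesPerWord(List<List<String>> files) {
        Map<String, Counter> wordCounts = new HashMap<>();
        for (List<String> wordsOfFile : files) {
            Set<String> wordsInFile = new HashSet<>(); // words already seen in this file
            for (String word : wordsOfFile) {
                if (wordsInFile.add(word)) { // add() returns false for a repeated word
                    Counter counter = wordCounts.get(word);
                    if (counter == null) {
                        counter = new Counter();
                        wordCounts.put(word, counter);
                    }
                    counter.increment();
                }
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        // Sample data from the question: a1.txt, a2.txt, a3.txt.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("aaa", "aaa", "aaa"),
                Arrays.asList("aaa"),
                Arrays.asList("aaa", "bbb"));
        Map<String, Counter> counts = countFilesPerWord(files);
        System.out.println("aaa = " + counts.get("aaa") + ", bbb = " + counts.get("bbb"));
    }
}
```

Because `Counter` is mutable, the map entry is written once per word and then updated in place, which is the difference from the `Integer` variant mentioned in the comment.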

If I may suggest another approach :)

Use a Map<String, Set<String>> map.

foreach file f in files
  foreach word w in f
    if w in map.keys()
      map[w].add(f)
    else
      map[w] = a new set containing only f
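The pseudocode above could be written in Java roughly as follows. This is a sketch under assumptions: the `WordToFiles` class, the `index` method, and the input map from file name to word list are hypothetical and replace the actual file reading. The frequency the question asks for is then the size of each word's set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordToFiles {

    // Maps each word to the set of file names it appears in.
    public static Map<String, Set<String>> index(Map<String, List<String>> wordsByFile) {
        Map<String, Set<String>> map = new HashMap<>();
        for (Map.Entry<String, List<String>> file : wordsByFile.entrySet()) {
            for (String word : file.getValue()) {
                Set<String> files = map.get(word);
                if (files == null) {
                    files = new HashSet<>(); // first time this word is seen
                    map.put(word, files);
                }
                files.add(file.getKey()); // a set ignores repeated additions
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample data from the question, hard-coded instead of read from disk.
        Map<String, List<String>> wordsByFile = new LinkedHashMap<>();
        wordsByFile.put("a1.txt", Arrays.asList("aaa", "aaa", "aaa"));
        wordsByFile.put("a2.txt", Arrays.asList("aaa"));
        wordsByFile.put("a3.txt", Arrays.asList("aaa", "bbb"));

        Map<String, Set<String>> map = index(wordsByFile);
        // The number of files containing a word is the size of its set.
        for (Map.Entry<String, Set<String>> e : map.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue().size());
        }
    }
}
```

Compared with a plain counter, this keeps the file names themselves, so it can also answer which files contain a given word, at the cost of more memory.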
Itay Karo