3

I have a set of documents. I want to compute the term frequency, i.e. the frequency count of each word in each document, using a Java program. I already know how to count the frequency of each word. My question is how to extract the unique words of each document from the list of documents. Thanks in advance.

Karthi

2 Answers

2

You can split each document on whitespace and punctuation, iterate over the resulting array, and count the occurrences of each word (a Map<String, Integer> would really help you with this). The keys of each document's map are then exactly the unique words of that document.
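A minimal sketch of that idea, assuming each document is already loaded into a String. The class name TermFrequency, the method name countTerms, and the sample sentences are made up for illustration. Lowercasing the text and treating every run of non-letter, non-digit characters as a separator are assumptions too, so adjust them to your data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class TermFrequency {

    // Counts how often each word occurs in one document.
    // The key set of the returned map is the set of unique words.
    static Map<String, Integer> countTerms(String document) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Assumption: case is ignored and any run of characters that are
        // neither letters nor digits separates two words.
        for (String token : document.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token for a leading separator
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Builds one frequency map per document, in the same order as the input.
    static List<Map<String, Integer>> countTerms(List<String> documents) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (String document : documents) {
            result.add(countTerms(document));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample documents, invented for this example.
        List<String> documents = Arrays.asList(
                "The cat sat on the mat.",
                "A dog; a cat!");

        List<Map<String, Integer>> frequencies = countTerms(documents);
        for (int i = 0; i < frequencies.size(); i++) {
            Map<String, Integer> tf = frequencies.get(i);
            System.out.println("Document " + i + " unique words: " + tf.keySet());
            System.out.println("Document " + i + " term frequencies: " + tf);
        }
    }
}
```

A LinkedHashMap is used only so that words are printed in order of first appearance; a HashMap works equally well if the order does not matter.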


Colin Hebert
1

If this is more than a one-time problem, you should consider using Lucene to index your documents. This post would then help you answer your question.

Damien