I have a repository of files. The files are plain English text written by humans, and each file contains a few paragraphs describing an incident. Since every person writes differently, two or more accounts of similar incidents can use different wording and different grammar. Even the same person may describe an incident in different words and with different grammar on another occasion.

How can I find and cluster similar files together?

Viraj Pai

1 Answer

There are various approaches. One option is the method described in "Clustering text documents using k-means". See also the discussion here.
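To make the k-means suggestion concrete, here is a minimal, dependency-free sketch. It assumes a TF-IDF bag-of-words representation and uses a spherical variant of k-means (cosine similarity on unit-length vectors) with a deterministic farthest-first initialisation; none of that is prescribed by the linked material. The function names (`tfidf_vectors`, `cosine`, `kmeans`, `cluster_documents`) and the sample incident texts are hypothetical. In practice, scikit-learn's `TfidfVectorizer` and `KMeans` cover the same steps.

```python
import math
import re
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (a term -> weight dict) per document."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms that occur in every document get weight 0 and so carry no signal.
        vec = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (cosine similarity if both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means: assign each vector to the most similar centroid."""
    # Farthest-first initialisation: start from the first document, then
    # repeatedly add the document least similar to the centroids chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(farthest))

    labels = [-1] * len(vectors)
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = defaultdict(float)
            for v in members:
                for t, w in v.items():
                    total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


def cluster_documents(docs, k):
    """Return a cluster label (0..k-1) for each document."""
    return kmeans(tfidf_vectors(docs), k)


if __name__ == "__main__":
    incidents = [
        "The server crashed after the disk filled up overnight.",
        "Overnight the disk became full and the server went down.",
        "A customer reported that the payment page rejected her credit card.",
        "The credit card of a customer was declined on the payment page.",
    ]
    print(cluster_documents(incidents, 2))
```

With these four sample texts, the two disk reports should share one label and the two payment reports the other. Note that a bag-of-words model only groups files that share vocabulary; accounts that paraphrase an incident with entirely different words would need a richer representation, and the number of clusters `k` has to be chosen up front.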

Miriam Farber