Find similar files from repository

Question

I have a repository of plain-English text files written by people. Each file contains a few paragraphs describing an incident. Because everyone writes differently, two or more similar incidents can be described in different words and with different grammar. Even the same person may describe an incident in different words and with different grammar.

How can I find and cluster similar files together?

score 0 · Answer 1 · edited May 23 '17 at 12:32

There are various approaches. One is to follow the scikit-learn example "Clustering text documents using k-means". See also the discussion here.
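To make the k-means suggestion concrete, here is a minimal, dependency-free sketch, assuming each file has already been read into a string. It turns every document into a TF-IDF vector and groups the vectors with k-means using cosine similarity. The function names (`tokenize`, `tfidf_vectors`, `kmeans`), the tiny stopword list, and the sample texts are all made up for illustration; for real use, a library such as scikit-learn (`TfidfVectorizer` plus `KMeans`) is likely a better choice.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, far from complete).
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "and", "to", "was", "is", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (term -> weight dict) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic start: first document, then the documents least similar
    # to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Recompute each centroid as the normalised sum of its members.
        new_centroids = []
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            new_centroids.append({t: w / norm for t, w in total.items()} if total else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical sample incidents: two about a road accident, two about a fire.
docs = [
    "A car crashed into a truck on the highway during heavy rain.",
    "During heavy rain, a truck and a car collided on the highway.",
    "A fire broke out in the warehouse and smoke filled the building.",
    "Smoke filled the building after a fire started in the warehouse.",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

Files that end up with the same label form one cluster. Note that this only matches shared vocabulary, so two descriptions that use entirely different words for the same incident will not be grouped, and the number of clusters `k` has to be chosen up front.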

edited May 23 '17 at 12:32

Community


answered Mar 18 '17 at 04:02

Miriam Farber
