I am using Solr 6, and my requirement is to find documents in which a sequence of 5 consecutive words (separated by spaces) is duplicated.
To achieve this, I am planning to index the content in overlapping windows of 5 words. For example, if my content is "The quick brown fox jumps over the lazy dog", it should be indexed as "The quick brown fox jumps", "quick brown fox jumps over", "brown fox jumps over the", and so on.
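To make the intended output concrete, here is a minimal plain-Java sketch of the windowing I have in mind. It does not use any Solr or Lucene API; the class name `WordShingles` and the methods `shingles` and `hasDuplicateShingle` are hypothetical names made up for illustration, and splitting on whitespace is an assumption about how words are separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordShingles {

    // Returns every run of `size` consecutive whitespace-separated words.
    static List<String> shingles(String text, int size) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || size <= 0) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of `size` words one word at a time.
        for (int i = 0; i + size <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return result;
    }

    // True if any window of `size` words occurs more than once in the text.
    // Note: overlapping repeats (e.g. "a a a a a a") also count here.
    static boolean hasDuplicateShingle(String text, int size) {
        Set<String> seen = new HashSet<>();
        for (String shingle : shingles(text, size)) {
            if (!seen.add(shingle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "The quick brown fox jumps over the lazy dog";
        for (String shingle : shingles(content, 5)) {
            System.out.println(shingle);
        }
    }
}
```

Running `main` prints one 5-word window per line, starting with "The quick brown fox jumps". What I am looking for is the equivalent of `shingles` at index time inside Solr's analysis chain.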
To configure the tokenizer, I referred to this wiki but didn't find any provided tokenizer that solves this problem. So I am looking for a way to create a new tokenizer class, or any other way to solve this using the provided tokenizers. Any help would be appreciated.