I need some help with the problem below.

The goal is to implement auto-lookup functionality that meets the following requirements:

Input -> BTech
Output ->
 BTech in cse
 b.tech in computer science
 b tech in computer

Input -> B.Tech
Output ->
 BTech in cse
 b.tech in computer science
 b tech in computer

Input -> Tec
Output ->
 Technological Advance 
 Artificial Technology
 BTech in cse
 b.tech in computer science
 b tech in computer

However, when I search with text=b.tech, my results also match "b.e." tokens, and those come out on top. Results containing btech should be ranked first. (I have used n-grams, word_delimiter, etc.)

I am intentionally not showing my query here, because it has become very complex and would only create confusion. I would appreciate it if someone could write one from scratch with a fresh mind.

Can anyone please help me with the desired query? :|

pankaj
  • You can achieve this by creating a custom synonym analyzer. [This answer](https://stackoverflow.com/questions/55481799/elasticsearch-custom-file-based-analyzer/55486950#55486950) might help you in getting started. – Nishant May 20 '19 at 03:42
  • Thanks Nishant for the reply! Actually, the data here is dynamic, which is why I cannot define synonyms in a file. What I am looking for is a way to ignore special characters like "." in both the query text and the indexed data. – pankaj May 20 '19 at 04:38
  • But as per the expected output, it's not just about ignoring `.`. What about `b tech`? This can't match `BTech` unless you use synonyms. – Nishant May 20 '19 at 04:48
  • Yes, right. But we can ignore spaces too, just like ".". All non-alphanumeric characters should be ignored. – pankaj May 20 '19 at 04:52
  • Then a string like `BTech in cse` will turn into `BTechincse` or `btechincse`, which won't be acceptable. In this case n-grams might work, but they will generate a lot of terms, which will have an impact on disk space. – Nishant May 20 '19 at 04:56
  • Disk space is fine for me. Actually I tried n-grams too, but could not combine ignoring non-alphanumeric chars and n-grams in one single configuration. After applying n-grams, "B.Tech" matches both "B.E" and "Btech in cse", but "B.E" comes out on top of the results instead of "Btech in cse". – pankaj May 20 '19 at 04:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/193611/discussion-between-pankaj-and-nishant-saini). – pankaj May 20 '19 at 05:37
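
The idea discussed in the comments (drop every non-alphanumeric character from both the query and the indexed text, then match on substrings the way n-grams would) can be sketched in plain Python to check it against the expected outputs. This is only an illustration of the matching logic, not an Elasticsearch query: the function names `normalize` and `lookup` are made up for this sketch, and the `"B.E. in civil"` entry is a hypothetical document added to stand in for the unwanted "b.e." match.

```python
import re

def normalize(text):
    """Lowercase the text and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def lookup(query, documents):
    """Return the documents whose normalized form contains the normalized query.

    Substring containment stands in for n-gram matching on the
    normalized string; the original document order is preserved.
    """
    needle = normalize(query)
    if not needle:
        return []
    return [doc for doc in documents if needle in normalize(doc)]

# Sample data from the question, plus one hypothetical "B.E." entry.
DOCS = [
    "Technological Advance",
    "Artificial Technology",
    "BTech in cse",
    "b.tech in computer science",
    "b tech in computer",
    "B.E. in civil",  # hypothetical, represents the unwanted match
]

if __name__ == "__main__":
    for query in ("BTech", "B.Tech", "Tec"):
        print(query, "->", lookup(query, DOCS))
```

Because the whole normalized query has to appear in the document, "B.Tech" no longer matches the "B.E." entry at all. The trade-off Nishant points out still applies: once spaces are removed, a query can match across word boundaries (for example "nc" inside "in cse"). In Elasticsearch, the analogous building blocks would presumably be a `pattern_replace` char filter combined with n-gram tokens at index time, but that mapping is an assumption and has not been tested here.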
