I'm writing a tool for searching code, but I'm having a hard time finding the right analyzer to use. I've tried a whitespace analyzer, but it keeps an identifier like dbo.My_Procedure as a single token, whereas searching for "my_procedure" should match it, and so should searching for ".My_Procedure".
My idea is to split on special characters but also store the special characters as their own tokens. But then, if you search for my_procedure, it just looks for my, _ and procedure anywhere in the file unless you wrap the term in quotes (even though to the user it looks like a single word). What approach have people taken for analyzing code?
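A minimal Python sketch of the splitting idea described above, written without any search library. The tokenizer is hypothetical: runs of letters and digits become one token, every other non-whitespace character becomes its own token, and everything is lowercased. The point it illustrates is the difference between matching the analyzed query as independent terms and matching it as a phrase (consecutive token positions). In Lucene/Elasticsearch terms that roughly corresponds to running the query as a phrase query (e.g. Elasticsearch's match_phrase) instead of a plain term match.

```python
import re

# Hypothetical analyzer: alphanumeric runs are one token, each other
# non-whitespace character (".", "_", ...) is a token of its own.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def analyze(text: str) -> list[str]:
    """Return the lowercased token stream; list index = token position."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def any_term_match(doc_tokens: list[str], query: str) -> bool:
    """Naive matching: every query token occurs somewhere in the document."""
    return all(t in doc_tokens for t in analyze(query))


def phrase_match(doc_tokens: list[str], query: str) -> bool:
    """Phrase matching: the query tokens occur at consecutive positions."""
    q = analyze(query)
    if not q:
        return False
    n = len(q)
    return any(doc_tokens[i:i + n] == q for i in range(len(doc_tokens) - n + 1))


doc = analyze("EXEC dbo.My_Procedure")
print(doc)                                  # ['exec', 'dbo', '.', 'my', '_', 'procedure']
print(phrase_match(doc, "my_procedure"))    # True
print(phrase_match(doc, ".My_Procedure"))   # True

# A file that merely contains "my", "_" and "procedure" in different places:
other = analyze("-- my notes: call procedure_x")
print(any_term_match(other, "my_procedure"))  # True (the false positive)
print(phrase_match(other, "my_procedure"))    # False
```

Because the query is run through the same analyzer as the document, the user can type my_procedure without quotes and still get phrase semantics, as long as the search layer always issues the analyzed query as a phrase.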
Asked by Nived
1 Answer
If your code is in Java, Java naming conventions say that methods and classes should be camel-cased, so you shouldn't run into names like my_search, but rather mySearch.
If that is the case, you can use the (default) standard analyzer, which splits text on word boundaries.
That said, if there's no way around it and your tokenization has to handle names like my_search, you can implement your own custom analyzer.
This answer shows an example of setting up a custom analyzer.
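A minimal sketch of what such a custom analyzer might do, written in plain Python rather than against any search library's API; the function names are hypothetical. It assumes whitespace tokenization, then for each identifier keeps the original token (lowercased) and also emits its word parts, splitting on non-alphanumeric characters such as _ and . and on case changes. That way my_search, mySearch and dbo.My_Procedure all produce searchable parts like my, search and procedure. If you are on Elasticsearch, the built-in word_delimiter_graph token filter (with preserve_original enabled, used after a whitespace tokenizer) covers much of the same ground; check its documentation for the exact splitting rules rather than assuming they match this sketch.

```python
import re

# Split points between identifier "words": any run of non-alphanumerics.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")
# Zero-width case-change boundaries (assumed rules, for illustration):
#   "mySearch"  -> "my" | "Search"    (lower/digit followed by upper)
#   "XMLParser" -> "XML" | "Parser"   (upper followed by Upper+lower)
_CASE_CHANGE = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def analyze_identifier(token: str) -> list[str]:
    """Original token plus its word parts, all lowercased, without duplicates."""
    out = [token.lower()]
    for chunk in _NON_ALNUM.split(token):
        for part in _CASE_CHANGE.split(chunk):
            if part and part.lower() not in out:
                out.append(part.lower())
    return out


def analyze(text: str) -> list[str]:
    """Whitespace-split the text, then expand each identifier."""
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(analyze_identifier(word))
    return tokens


print(analyze_identifier("my_search"))         # ['my_search', 'my', 'search']
print(analyze_identifier("mySearch"))          # ['mysearch', 'my', 'search']
print(analyze_identifier("dbo.My_Procedure"))  # ['dbo.my_procedure', 'dbo', 'my', 'procedure']
```

Keeping the original token alongside the parts is the design choice that lets an exact identifier search still hit, while the parts make partial searches like search or procedure work regardless of whether the code uses underscores or camel case.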


Answered by Nir Alfasi