Different software projects have different coding conventions; even within one project, several languages may be used, each with its own conventions. What is a good way to search documentation (which lives outside the source files) using identifier tokens taken from the source code?
For example, if the source has self._def_passwd or this.defPasswrd, a query on the documentation tree should strive to match "default password".
So far I've been sorting candidates by Levenshtein distance. That works nicely for small edit distances, but there are too many false positives when I raise the threshold, which is especially problematic because of the whitespace in documentation text. Some sample matches:
8 0.666667 announcement getContent AnnouncementBean.java (Token.Name.Function)
8 0.666667 announcement getPercent DataObservation.java (Token.Name.Function)
8 0.666667 announcement GroupBean GroupBean.java (Token.Name.Class)
where the first value is the Levenshtein distance and the second is that distance divided by the length of the word matched. I'm thinking of:
- looking into the Jaccard and Tanimoto similarity measures
- IntelliSense/autosuggest-style code
- the algorithms that bioinformatics people use for matching sequences (there were some posts about them somewhere on SO)
- coming up with a chain of regular-expression rules based on http://en.wikipedia.org/wiki/Naming_convention_%28programming%29
The last one is literally the last resort. Which other algorithms do you think could give better results for this kind of problem?
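
To make the regex and Jaccard ideas a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions: the helper names (split_identifier, trigrams, jaccard, score) and the tiny stop-word list are made up for this post, and the scoring rule (average of each identifier word's best trigram overlap with any documentation word) is just one arbitrary choice.

```python
import re

# Assumption: receiver names like these carry no meaning for the search.
STOP_WORDS = {"self", "this"}

# Zero-width boundaries: "defPasswrd" -> def|Passwrd, "HTTPServer" -> HTTP|Server
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_identifier(identifier):
    """Split an identifier into lowercase words, ignoring stop words."""
    words = []
    # First cut on anything that is not a letter or digit ('.', '_', '->', ...).
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Then cut each chunk at camelCase boundaries.
        for word in _CAMEL_BOUNDARY.sub(" ", chunk).split():
            word = word.lower()
            if word not in STOP_WORDS:
                words.append(word)
    return words


def trigrams(word):
    """Return the set of character trigrams of a word."""
    if len(word) < 3:
        return {word}
    return {word[i:i + 3] for i in range(len(word) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(identifier, phrase):
    """Average, over identifier words, of the best trigram overlap
    with any word of the documentation phrase."""
    id_words = split_identifier(identifier)
    doc_words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not id_words or not doc_words:
        return 0.0
    best = [
        max(jaccard(trigrams(w), trigrams(d)) for d in doc_words)
        for w in id_words
    ]
    return sum(best) / len(best)


if __name__ == "__main__":
    for ident in ("self._def_passwd", "this.defPasswrd", "getContent"):
        print(ident, split_identifier(ident), score(ident, "default password"))
```

With this, AnnouncementBean scores well above getContent for the query "announcement", which is the ordering I would want from the sample above. Abbreviations such as "def" for "default" still score low, since they share only a single trigram, so something like prefix matching would probably be needed on top.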