First of all, as your mkdocs.yml file does not specify a theme, it is assumed that you are using the default theme, which uses the default search implementation. Note that some other themes (especially material) implement their own search solution, which differs from the default. This answer does not apply to those themes.
The search tokenizer setting is being ignored because you are defining it incorrectly. As documented, the setting is named separator, not tokenizer, and it needs to be defined as a sub-section of the search plugin. Like this:
plugins:
    - search:
        separator: '[\s\-\.]+'
Regarding the search terms, note that MkDocs uses lunr.js as its search engine. Lunr.js documents how the end user can modify the search in various ways.
By the way, your search for "auto-filling" will not match as you expect because the hyphen (-) is a separator character. In other words, when the search index is created, the hyphen is treated the same as a space and the words "auto" and "filling" are indexed as two separate words. If you don't want that behavior, you need to remove the hyphen from your setting. But that is probably not what you want.
The default is to use an OR search. If any one of the terms (each term being separated by any one of the separator characters) exists within a document, then that document is returned as a search result. If multiple terms exist within a document, then that document is ranked higher. However, an OR search does not consider the terms in relation to each other within the document.
You might find an AND search to be more effective. Simply prepend a + to each term (+do +not +select +auto +filling) and then you will only get results that contain all of the terms. Notice that I also left the hyphen out of the search terms as it is a separator as explained above.
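The difference between the default OR search and a +term AND search can be sketched with a toy model. This is not how lunr.js scores results (its real ranking is more sophisticated); the documents, the search function, and the "count of matched terms" score are all hypothetical and exist only to show which documents are returned in each mode.

```python
import re

SEPARATOR = re.compile(r"[\s\-\.]+")

# Hypothetical documents, invented for this illustration.
DOCS = {
    "a": "select the auto filling option",
    "b": "do not select auto filling here",
    "c": "filling out forms",
}


def tokenize(text):
    return [t for t in SEPARATOR.split(text.lower()) if t]


def search(query, docs=DOCS):
    """Return doc ids ranked by the number of distinct query terms matched.

    Terms prefixed with "+" are required (AND); all others are optional (OR).
    """
    required, optional = set(), set()
    for raw in query.split():
        if raw.startswith("+"):
            required.update(tokenize(raw[1:]))
        else:
            optional.update(tokenize(raw))

    results = []
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if not required <= words:
            continue  # a required term is missing from this document
        score = len((required | optional) & words)
        if score:
            results.append((score, doc_id))

    # Highest score first; ties broken by doc id for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]


print(search("do not select auto-filling"))
# → ['b', 'a', 'c']
print(search("+do +not +select +auto +filling"))
# → ['b']
```

In the OR case every document sharing at least one term is returned, with more matches ranking higher. In the AND case only the document containing every term survives.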
However, while that will only return results which contain all of the terms, it does not favor results which contain the terms grouped together in that specific order. A common solution which search engines employ is to require terms enclosed in quotes to match the specific order. However, as per olivernn/lunr.js#62, lunr.js does not support that feature at this time.
Additionally, the search engine ignores stop words. Specifically, some words are so common that they are ignored completely by the search engine. For example, words like "the" or "a" occur multiple times in every English language document. Therefore, the search engine ignores them.
Then there is the issue of stemming, which is explained in lunr.js' documentation:
Stemming is the process of reducing inflected or derived words to
their base or stem form. For example, the stem of “searching”,
“searched” and “searchable” should be “search”. This has two benefits:
firstly the number of tokens in the search index, and therefore its
size, is significantly reduced, and in addition, it increases the
recall when performing a search. A document containing the word
“searching” is likely to be relevant to a query for “search”.
Given the above, you will probably find that the search for "select auto fill" will most likely return the exact same results as "do not select auto-filling". However, using +filling should help as it forces an exact match for the term "filling" rather than the stem word "fill".
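As a rough sketch of why those two queries can collapse to the same terms, the following toy pipeline drops stop words and strips a few suffixes. Everything here is an assumption made for illustration: the stop-word set is a tiny made-up subset rather than lunr.js's actual list, and the suffix rules are far cruder than a real stemmer.

```python
import re

SEPARATOR = re.compile(r"[\s\-\.]+")

# Illustrative subset only (assumption); not lunr.js's real stop-word list.
STOP_WORDS = {"a", "the", "do", "not"}

# Toy suffix rules (assumption); a real stemmer is far more careful.
SUFFIXES = ("able", "ing", "ed")


def stem(word):
    """Strip one known suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(query):
    """Tokenize, drop stop words, then stem what remains."""
    tokens = [t for t in SEPARATOR.split(query.lower()) if t]
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(normalize("do not select auto-filling"))
# → ['select', 'auto', 'fill']
print(normalize("select auto fill"))
# → ['select', 'auto', 'fill']
```

Under these assumptions both queries reduce to the same three terms, which is why they would be expected to return the same results.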
Finally, you ask...
How to implement a good search system
Note that such a question is too broad and off-topic here. However, the lunr.js documentation linked to above provides a nice summary of many of the basic concepts used by most search engines. While you would likely make some different choices in your implementation (as would I), the basic concepts should give you a starting point for terms to search in your research if you really are interested in creating an entire search engine of your own.