I have a web app that searches through a few databases. Some of the data is saved in uppercase and some in a mix of upper and lowercase. When searching for a keyword, I want it to ignore case and just bring up results that match the word. For example, I want to search for "document_reference" without having to type it the way it is saved, which is "Document_Reference".

I was told to add case insensitivity to my index, however I'm not sure what to do or add there. I tried this (found it in the Whoosh documentation):

from whoosh import analysis

class CaseSensitivizer(analysis.Filter):
    def __call__(self, tokens):
        for t in tokens:
            yield t
            if t.mode == "index":
                low = t.text.lower()
                if low != t.text:
                    t.text = low
                    yield t

This is what my index and query parser look like:

from whoosh import index, qparser
from whoosh.qparser import MultifieldParser

def open_index(indexDirectory):
    # Open the index and return an index object
    ix = index.open_dir(indexDirectory)
    return ix


def search_index(srch, ix):
    #  Search the index and print results
    #  ix = open_index(indexDirectory)
    results = ''
    lst = []
    qp = MultifieldParser(['Text', 'colname',
        'tblname', 'Length', 'DataType', 'tag_name'],
        schema=ix.schema, group=qparser.OrGroup)
    # qp = QueryParser('Text', schema=ix.schema)
    q = qp.parse(srch)
    with ix.searcher() as s:
        results = s.search(q, limit=None)
        for r in results:
            print('\n', r)
            lst.append(r.fields())
        if DEBUG:
            print('Search Results:\n', lst)
            print('\nFinished in search.py')
        return lst

Currently it only ever gives results that exactly match what I typed in the search bar, so if I type "document" but the source is actually stored as "DOCUMENT", I wouldn't get any results.

2 Answers

0

I know this is an older issue, but I thought I would reply in case somebody like me came here looking for a solution.

The CaseSensitivizer class needs to be used when you define your schema. This is how you would use it to create the schema from the quickstart example in the docs:

>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> from whoosh import analysis
>>> class CaseSensitivizer(analysis.Filter):
...     def __call__(self, tokens):
...         for t in tokens:
...             yield t
...             if t.mode == "index":
...                 low = t.text.lower()
...                 if low != t.text:
...                     t.text = low
...                     yield t
...
>>> myanalyzer = analysis.RegexTokenizer() | CaseSensitivizer()
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT(analyzer=myanalyzer))

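Now you can use this schema to create your index and search the same way you were before. The analyzer is applied when documents are indexed, so an existing index has to be rebuilt with the new schema before the lowercase forms become searchable. That worked for me.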
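
To see why indexing both forms helps, here is a rough stdlib-only model of the idea. This is not Whoosh code: `index_terms`, `build_index` and `lookup` are made-up names for illustration, and the `\w+` tokenizer is an assumption. Each token is stored as written and, if different, in lowercase as well, while the query is looked up exactly as typed.

```python
import re
from collections import defaultdict


def index_terms(text):
    """Yield each token as written, plus its lowercase form if it differs."""
    for token in re.findall(r"\w+", text):
        yield token
        low = token.lower()
        if low != token:
            yield low


def build_index(docs):
    """Map every indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Look a query term up exactly as typed (nothing is lowercased here)."""
    return sorted(index.get(term, set()))


# Hypothetical sample data, not taken from the question's databases.
docs = {1: "Document_Reference", 2: "DOCUMENT owner"}
index = build_index(docs)

print(lookup(index, "document_reference"))  # lowercase query finds doc 1
print(lookup(index, "Document_Reference"))  # exact-case query finds doc 1
print(lookup(index, "DOCUMENT_REFERENCE"))  # this casing was never indexed
```

So with this approach a lowercase query and an exact-case query both match, but other casings do not. If every casing should match, the usual route is to lowercase on both the index and query side; in Whoosh I believe chaining `analysis.LowercaseFilter()` after the tokenizer does that, though I have not tested it against the schema in the question.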

-1

Instead of using lower() or upper(), you can use casefold() for string comparison.

A very good example is given here.

In short, the example is:

s1 = 'Apple'
s3 = 'aPPle'
s1.casefold() == s3.casefold()

returns True.
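
As a slightly fuller sketch of the same idea, the helper below filters a list of stored values by a keyword while ignoring case. The function name `matches_ignoring_case` and the sample values are made up for illustration. The last line shows a case where casefold() and lower() differ.

```python
def matches_ignoring_case(keyword, stored_values):
    """Return the stored values that equal keyword when case is ignored."""
    key = keyword.casefold()
    return [value for value in stored_values if value.casefold() == key]


# Hypothetical stored values for illustration.
stored = ["Document_Reference", "DOCUMENT", "Straße"]

print(matches_ignoring_case("document_reference", stored))
print(matches_ignoring_case("STRASSE", stored))

# lower() keeps the German sharp s, so this comparison is False,
# while the casefold() comparison above treats the two as equal.
print("Straße".lower() == "STRASSE".lower())
```

As far as I can tell this only helps when you compare strings yourself in Python; it does not change how terms are matched inside a Whoosh index.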