
Article Schema:

Below is the article schema that I have created.

class ArticleSchema(SchemaClass):
    title = TEXT(
        phrase=True, sortable=True, stored=True,
        field_boost=2.0, spelling=True, analyzer=StemmingAnalyzer())
    keywords = KEYWORD(
        commas=True, field_boost=1.5, lowercase=True)
    authors = KEYWORD(stored=True, commas=True, lowercase=True)
    content = TEXT(spelling=True, analyzer=StemmingAnalyzer())
    summary = TEXT(spelling=True, analyzer=StemmingAnalyzer())
    published_time = DATETIME(stored=True, sortable=True)
    permalink = STORED
    thumbnail = STORED
    article_id = ID(unique=True, stored=True)
    topic = TEXT(spelling=True, stored=True)
    series_id = STORED
    tags = KEYWORD(commas=True, lowercase=True)

Search Query:

FIELD_TIME = 'published_time'
FIELD_TITLE = 'title'
FIELD_PUBLISHER = 'authors'
FIELD_KEYWORDS = 'keywords'
FIELD_CONTENT = 'content'
FIELD_TOPIC = 'topic'

def search_query(search_term=None, page=1, result_len=10):
    '''Search the provided query.'''
    if not search_term or search_term == '':
        return None, 0
    if not index.exists_in(INDEX_DIR, indexname=INDEX_NAME):
        return None, 0
    ix = get_index()
    parser = qparser.MultifieldParser(
            [FIELD_TITLE, FIELD_PUBLISHER, FIELD_KEYWORDS, FIELD_TOPIC],
            ix.schema)
    query = parser.parse(search_term)
    query.normalize()
    search_results = []
    with ix.searcher() as searcher:
        results = searcher.search_page(
            query,
            pagenum=page,
            pagelen=result_len,
            sortedby=[sorting_timestamp, scores],
            reverse=True,
            terms=True
        )
        if results.scored_length() > 0:
            for hit in results:
                search_results.append(append_to(hit))
            return (search_results, results.pagecount)

    parser = qparser.MultifieldParser(
            [FIELD_TITLE, FIELD_PUBLISHER, FIELD_TOPIC],
            ix.schema, termclass=FuzzyTerm)
    parser.add_plugin(qparser.FuzzyTermPlugin())
    query = parser.parse(search_term)
    query.normalize()
    search_results = []
    with ix.searcher() as searcher:
        results = searcher.search_page(
            query,
            pagenum=page,
            pagelen=result_len,
            sortedby=[sorting_timestamp, scores],
            reverse=True,
            terms=True
        )
        if results.scored_length() > 0:
            for hit in results:
                search_results.append(append_to(hit))
            return (search_results, results.pagecount)
    return None, 0

Searching by title works, but searching by author or keyword does not. I can't figure out what I am doing wrong. I fetch the data from an API and then build the index, and that part works fine. Only searches against the authors and keywords fields fail.

Rahul Shrivastava

1 Answer


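Both authors and keywords are of type KEYWORD, which does not support phrase searching. This means you have to search for the exact keyword as it was indexed. Note that the StemmingAnalyzer in your schema is attached only to the TEXT fields (title, content, summary); the KEYWORD fields are not stemmed.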

For authors, I think you should use TEXT.

From the Whoosh documentation:

whoosh.fields.KEYWORD

This type is designed for space- or comma-separated keywords. This type is indexed and searchable (and optionally stored). To save space, it does not support phrase searching.

Assem