
I have a problem doing partial (prefix) search with the kuromoji plugin.

When I index a complete word such as ホワイトソックス with an analyzer like this:

{
  "tokenizer": {
    "type": "kuromoji_tokenizer",
    "mode": "search"
  },
  "filter": ["lowercase"],
  "text" : "ホワイトソックス"
}

the word is correctly split into ホワイト and ソックス, and I can search for either word on its own, as expected.

But when the user hasn't finished typing and the last character is still missing (ホワイトソック), every kuromoji analyzer treats the input as a single word. Because of that, the search returns no results.
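To make the failure concrete, here is a small JavaScript sketch that takes the token lists described above as given (it does not run kuromoji, and the real token stream may differ with other tokenizer settings). It shows that the typed text is a prefix of the original string but not a prefix of either token, so token-level prefix matching has nothing to hit.

```javascript
// Tokens as described in the question for ホワイトソックス (assumed, not computed here).
const text = "ホワイトソックス";
const indexedTokens = ["ホワイト", "ソックス"];
const typed = "ホワイトソック"; // user input missing its last character

// The typed text is a prefix of the original string...
const prefixOfText = text.startsWith(typed);
// ...but not a prefix of any single token, so prefix matching on tokens finds nothing.
const prefixOfSomeToken = indexedTokens.some((t) => t.startsWith(typed));

console.log({ prefixOfText, prefixOfSomeToken });
```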

Is there anything I can do about this, either by indexing the data differently or by running the query in a different way? I'm sure partial search for Japanese is possible, but I can't find the right settings.

Example index settings:

{
    analyzer: {
        ngram_analyzer: {
            tokenizer: 'search_tokenizer',
            filter: ['lowercase', 'cjk_width', 'ngram_filter'],
        },
        search_analyzer: {
            tokenizer: 'search_tokenizer',
            filter: ['asciifolding'],
        }
    },
    filter: {
        ngram_filter: {
            type: 'edge_ngram',
            min_gram: '1',
            max_gram: '20',
            preserve_original: true,
            token_chars: ['letter', 'digit']
        }
    },
    tokenizer: {
        search_tokenizer: {
            type: 'kuromoji_tokenizer',
            mode: 'search'
        }
    }
}
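For reference, a rough JavaScript model of what `ngram_filter` above does to each token at index time. `edgeNgrams` is a hypothetical helper, not the Lucene implementation; it assumes `preserve_original` only adds the whole token when it is shorter than `min_gram` or longer than `max_gram`.

```javascript
// Simplified model of an edge n-gram token filter (illustrative assumption).
// Counts characters by code point so multi-unit characters are not split.
function edgeNgrams(token, minGram, maxGram, preserveOriginal) {
  const chars = Array.from(token);
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, chars.length); n++) {
    grams.push(chars.slice(0, n).join(""));
  }
  // Keep the whole token only when it falls outside [minGram, maxGram].
  if (preserveOriginal && (chars.length < minGram || chars.length > maxGram)) {
    grams.push(token);
  }
  return grams;
}

// Index-side expansion of the two tokens described in the question
// (min_gram 1, max_gram 20, preserve_original true).
for (const token of ["ホワイト", "ソックス"]) {
  console.log(token, edgeNgrams(token, 1, 20, true));
}
```

With `max_gram: 20`, each of these short tokens yields only its own prefixes, so ホワイトソック never appears among the indexed terms in this model.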

Search query:

{
    query: {
        query_string: {
            fields: ["..."],
            query: "ホワイトソック",
            fuzziness: "0",
            default_operator: "AND",
            analyzer: "search_analyzer"
        }
    }
}
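A simplified model of how `default_operator: "AND"` decides a match, assuming the index holds the edge n-grams of ホワイト and ソックス and that the search analyzer returns ホワイトソック as a single token, as described above. `matchesAll` is a hypothetical helper; real `query_string` scoring, positions, and token grouping are ignored.

```javascript
// A document matches only if every analyzed query token is one of its indexed terms.
function matchesAll(queryTokens, indexedTerms) {
  const terms = new Set(indexedTerms);
  return queryTokens.every((t) => terms.has(t));
}

// Hypothetical indexed terms: edge n-grams (min_gram 1) of ホワイト and ソックス.
const indexedTerms = [
  "ホ", "ホワ", "ホワイ", "ホワイト",
  "ソ", "ソッ", "ソック", "ソックス",
];

console.log(matchesAll(["ホワイト", "ソック"], indexedTerms)); // true
console.log(matchesAll(["ホワイトソック"], indexedTerms));     // false
```

In this model the query would succeed if the partial input were split into ホワイト and ソック, but as a single token it has no counterpart in the index.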

Any help appreciated!

sdooo