Preserving word order in Vespa in non-English

Question

I am creating a Vespa schema, mainly for English, but with two fields in Wylie transliteration of Tibetan, which looks like this:

'jam dpal smra ba'i seng ge la bstod pa ut+pal dmar po'i do shal

Typically, users want to match every token and preserve the word order, preferably with the match at the beginning of the field.

For example, to find the field above, a user might enter "'jam dpal smra ba'i seng ge". They would not appreciate results where these tokens appear in a different order, even if those results rank high with BM25. BM25 would still be needed as a fallback.

Could you give me an example of the schema field / ranking expression to rank in this order:

exact match in the beginning of field
exact match anywhere
bm25

Naturally, I'll turn off stemming. Also, apostrophes and, less importantly, plus signs should be preserved.

I have read the Vespa docs, especially the Schema Reference, but I did not find a solution.

Please define what you mean by `exact match` and how this compares to bm25. BM25 is a ranking function for text, relying on exact token matching (possibly with stemming, etc.). — Jo Kristian Bergum, Nov 01 '22 at 07:14
@JoKristianBergum, I changed "exact match" to "preserve word order" — Roope K, Nov 01 '22 at 08:34

score 2 · Answer 1 · answered Nov 06 '22 at 17:07

I got the best results with the following field definition and rank profile:

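# Wylie field: indexed for text matching, BM25 enabled, no stemming
field wylie type string {
    indexing: index | summary
    index: enable-bm25
    stemming: none
}
# title and body are the English fields, defined elsewhere in the schema
rank-profile native_rank_and_wylie {
    first-phase {
        # earliness rewards matches near the start of the field;
        # longestSequence rewards query tokens matched in order
        expression: nativeRank(title, body) + fieldMatch(wylie).earliness + fieldMatch(wylie).longestSequence * 0.4
    }
}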

Note that longestSequence is not normalized (it is a count of tokens rather than a value between 0 and 1), so it can shift scores a lot relative to the other terms.
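If the unnormalized longestSequence is a problem, one possible variation is sketched below. It is untested and rests on a few assumptions: that fieldMatch(wylie).longestSequenceRatio, which Vespa's rank feature list describes as a normalized measure of the longest matched sequence, suits this data; that bm25(wylie) is a reasonable fallback term (it relies on the enable-bm25 setting above); and that the profile name wylie_ordered and the 0.1 weight are placeholders to be tuned, not recommended values.

```
# Sketch only: same fields as above, with a normalized sequence feature
rank-profile wylie_ordered {
    first-phase {
        expression {
            nativeRank(title, body)
            + fieldMatch(wylie).earliness
            + fieldMatch(wylie).longestSequenceRatio
            + 0.1 * bm25(wylie)
        }
    }
}
```

Note that bm25 is itself unbounded, so its weight would need to be kept small enough that it only breaks ties among documents with similar word-order scores.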
