Have a look at the delimited payload token filter, which you can use to store the scores as payloads, and at text scoring in scripts, which gives you access to those payloads.
UPDATED TO INCLUDE EXAMPLE
First you need to set up an analyzer that takes the number after | and stores that value as a payload with each token:
curl -XPUT "http://localhost:9200/myindex/" -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payloads": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "delimited_payload_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "payloads",
          "term_vector": "with_positions_offsets_payloads"
        }
      }
    }
  }
}'
Then index your document:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
  "text": "James|2.14 Bond|2.14 world|0.86 somemore|3.15"
}'
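To make it clearer what ends up in the index, here is a rough Python sketch of what the analyzer above does to that text. It is only a local simulation, not Elasticsearch code: the function name `analyze_with_payloads` is made up for illustration, and it assumes each token is split on the first `|` with the payload parsed as a float.

```python
def analyze_with_payloads(text, delimiter="|"):
    """Roughly emulate: whitespace tokenizer -> lowercase -> delimited payload filter.

    Hypothetical helper for illustration only; the real work is done by
    the analyzer configured in the index settings.
    """
    tokens = []
    for position, raw in enumerate(text.split()):
        # Lowercase first (as in the filter chain), then split term from payload.
        term, sep, payload = raw.lower().partition(delimiter)
        # Tokens without a delimiter carry no payload (None here).
        tokens.append((position, term, float(payload) if sep else None))
    return tokens


tokens = analyze_with_payloads("James|2.14 Bond|2.14 world|0.86 somemore|3.15")
for position, term, payload in tokens:
    print(position, term, payload)
```

So the indexed terms are the lowercased words (james, bond, world, somemore), and each position carries its number as a payload, which is what the scoring script reads back later.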
And finally, search with a function_score query that iterates over each term, retrieves the payload and incorporates it with the _score:
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "james bond"
        }
      },
      "script_score": {
        "script": "score=0; for (term: my_terms) { termInfo = _index[\"text\"].get(term,_PAYLOADS); for (pos : termInfo) { score = score + pos.payloadAsFloat(0);} } return score;",
        "params": {
          "my_terms": [
            "james",
            "bond"
          ]
        }
      }
    }
  }
}'
The script itself, when not compressed into one line, looks like this:
score=0;
for (term: my_terms) {
    termInfo = _index['text'].get(term,_PAYLOADS);
    for (pos : termInfo) {
        score = score + pos.payloadAsFloat(0);
    }
}
return score;
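If it helps to see the arithmetic, here is a small Python sketch of what that script computes for the example document. It is a simulation under assumptions, not the Elasticsearch scripting API: `payload_score` and the `doc` dictionary are hypothetical stand-ins for the per-document term information, and the `missing` argument plays the role of the `0` passed to `payloadAsFloat`, which I take to be the fallback when a position has no payload.

```python
def payload_score(term_payloads, my_terms, missing=0.0):
    """Sum the payloads of every occurrence of each query term in one document.

    term_payloads: hypothetical mapping of term -> list of payloads, one per position.
    """
    score = 0.0
    for term in my_terms:
        # A term that does not occur in the document contributes nothing.
        for payload in term_payloads.get(term, []):
            score += missing if payload is None else payload
    return score


# Payloads of the example document, keyed by lowercased term.
doc = {"james": [2.14], "bond": [2.14], "world": [0.86], "somemore": [3.15]}
score = payload_score(doc, ["james", "bond"])
print(round(score, 2))
```

For the query terms james and bond this gives 2.14 + 2.14. In the real index payloads are stored as 32-bit floats, so expect the value coming back from the script to be approximate.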
Warning: accessing payloads has a significant performance cost, and running scripts adds a performance cost of its own. You may want to experiment with dynamic scripts as above, then rewrite the script as a native (Java) script once you're satisfied with the result.