2

Just wanted to know: is it possible to highlight text in Elasticsearch on an index with _source = false?

I mean, I know that if ES doesn't have the document it can't do the highlighting, but is there a way to use ES purely as a highlighting engine instead of a full search engine with highlights? (I would provide the full document in the highlight query.)

Thanks

Sebastien Lorber

3 Answers

3

I don't believe it's possible.

However, you can run _analyze on both your search query and your document, then compare the tokens and do the highlighting in your own code.

For example:

curl -XGET 'localhost:9200/test/_analyze?analyzer=snowball' -d 'some search query keywords'

{"tokens":[{"token":"some","start_offset":0,"end_offset":4,"type":"","position":1},{"token":"search","start_offset":5,"end_offset":11,"type":"","position":2},{"token":"query","start_offset":12,"end_offset":17,"type":"","position":3},{"token":"keyword","start_offset":18,"end_offset":26,"type":"","position":4}]}

curl -XGET 'localhost:9200/test/_analyze?analyzer=snowball' -d "$document_text"

{"tokens":..}

Then look for the query tokens among the document tokens; the start_offset and end_offset of each matching document token give you the correct highlight location in the document.
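The comparison step could be sketched in Python roughly as follows. This is only a minimal sketch under assumptions, not an Elasticsearch feature: the helper name `highlight` and the `<em>` tags are made up for illustration, the token lists are hand-written in the shape of an `_analyze` response, and the offsets are assumed to index directly into the Python string (which may not hold for characters outside the Basic Multilingual Plane).

```python
def highlight(text, query_tokens, doc_tokens, pre="<em>", post="</em>"):
    """Wrap every span of `text` whose analyzed token also occurs in the query.

    `query_tokens` and `doc_tokens` are the "tokens" lists taken from two
    _analyze responses. Hypothetical helper, not part of any ES client.
    """
    wanted = {t["token"] for t in query_tokens}
    # Collect the (start, end) offsets of document tokens that match the query.
    spans = sorted(
        {(t["start_offset"], t["end_offset"])
         for t in doc_tokens if t["token"] in wanted}
    )
    out, cursor = [], 0
    for start, end in spans:
        if start < cursor:
            # Span overlaps one already emitted (possible with n-gram
            # analyzers): skip it rather than nest tags.
            continue
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# Hand-written tokens shaped like an _analyze response (illustrative only).
query_tokens = [{"token": "keyword", "start_offset": 0, "end_offset": 8}]
doc_text = "Some keywords here"
doc_tokens = [
    {"token": "some", "start_offset": 0, "end_offset": 4},
    {"token": "keyword", "start_offset": 5, "end_offset": 13},
    {"token": "here", "start_offset": 14, "end_offset": 18},
]
print(highlight(doc_text, query_tokens, doc_tokens))
# prints: Some <em>keywords</em> here
```

Because the match is done on analyzed tokens, "keywords" in the document is highlighted even though the stemmed token is "keyword"; the offsets always point back at the original text.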

marc_s
farid
  • Can you check http://stackoverflow.com/questions/11303660/elasticsearch-edgengram-highlight-term-vector-bad-highlights please – Sebastien Lorber Jul 03 '12 at 15:10
  • I've looked into it, but I don't have enough experience with the NGram analyzer; you could ask on the Elasticsearch mailing list. – farid Jul 11 '12 at 22:12
1
{
  "query": {
    "query_string": {
      "query": "**",
      "fields": ["sometext"]
    }
  },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "order": "score",
    "require_field_match": true,
    "fields": {
      "sometext": {
        "fragment_size": 180,
        "number_of_fragments": 1
      }
    }
  }
}
taras
  • We use highlighting because a match_phrase query search in the content.text field takes from 5 to 30 seconds. Highlight retrieval for the content.text field takes on average 10 seconds per hit – AMIT KUMAR Sep 18 '17 at 08:12
0

If _source is not disabled, you can do this:

{
    "_source" :  ["_id"],
    "query": {
        "match" : {
            "attachment.content" : "Setup"
        }
    },
    "highlight": {
        "fields" : {
            "attachment.content" : {}
        }
    }
}

You have to put something in _source. It still returns all the "metadata" for each document it found:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.2919385,
        "hits": [
            {
                "_index": "test",
                "_type": "_doc",
                "_id": "xpto",
                "_score": 0.2919385,
                "_source": {},
                "highlight": {
                    "attachment.content": [
                        "<em>Setup</em> the [GenericCommand.properties] file\n\nThe commands that ought to be recognized have to be defined"
                    ]
                }
            }
        ]
    }
}
jo se