I have an Elasticsearch index where documents look like the following:

{
  "labels": ["Common label for doc 1", "Other possible label"],
  "year": 1923,
  "boolProp": true
},
{
  "labels": ["Only one label here"],
  "year": 1812,
  "boolProp": true
},
...

When I query the labels field, I would like to retrieve not only the best-matching document but also the label that matched.

I've read that this field is actually indexed as one single aggregated string... Do I have to convert my labels field to nested objects for this kind of query? I'm wondering if there's a more direct approach I'm missing...

Kevin
  • For each of these labels such as `"Common label for doc 1"`, `"Other possible label"`, `"Only one label here"`, are you trying to do full-text search, or only exact matching? That is, when you query for `possible label`, do you expect `"Other possible label"` to be returned? – kgf3JfUtW Dec 20 '17 at 15:53
  • I need a full-text search on this field – Kevin Dec 21 '17 at 08:57

1 Answer

One way would be to use Highlighting.

This is a fairly rich feature, but the following example may help you achieve your goal.

{
    "query": {
        "match": {
            "myfield": "another"
        }
    },
    "highlight": {
        "fields": {
            "myfield": {
                "type": "plain"
            }
        },
        "pre_tags": [""],
        "post_tags": [""]
    }
}

You may choose to keep the matching text highlighted, or specify empty pre_tags and post_tags to just show the original text.
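As a rough sketch, the same request body could be built in Python for the `labels` field from the question. The helper name `build_highlight_query` and its `keep_tags` flag are made up for illustration; only the body it produces mirrors the example above.

```python
import json


def build_highlight_query(field, text, keep_tags=False):
    """Build a match query that also highlights `field`.

    Hypothetical helper: the name and the keep_tags flag are not part of
    any Elasticsearch client, they only wrap the request body shown above.
    """
    highlight = {"fields": {field: {"type": "plain"}}}
    if not keep_tags:
        # Empty tags return the matching text without any markup around it.
        highlight["pre_tags"] = [""]
        highlight["post_tags"] = [""]
    return {"query": {"match": {field: text}}, "highlight": highlight}


body = build_highlight_query("labels", "possible label")
print(json.dumps(body, indent=4))
```

The resulting dictionary can be sent as the search request body with whatever client you already use.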

The highlight field in the response will include only the values from the original source array that match.

{
  ...
    "hits": {
        "total": 1,
        "max_score": 0.28582606,
        "hits": [
            {
                "_index": "test",
                "_type": "mytype",
                "_id": "AWB6-u6V3-7fA7oZt-aX",
                "_score": 0.28582606,
                "_source": {
                    "myfield": [
                        "My favorite toy",
                        "Another toy for me"
                    ]
                },
                "highlight": {
                    "myfield": [
                        "Another toy for me"
                    ]
                }
            }
        ]
    }
}
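To pull the matching values out of such a response on the client side, something like the following might do. It is a minimal sketch that assumes the response has already been parsed into a Python dictionary; `matching_values` is a hypothetical helper name, and the sample data is a trimmed copy of the response above.

```python
def matching_values(response, field):
    """Map each hit's _id to the highlight fragments returned for `field`.

    Hits without a highlight entry for `field` are skipped.
    """
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {}).get(field, [])
        if fragments:
            result[hit["_id"]] = fragments
    return result


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "AWB6-u6V3-7fA7oZt-aX",
                "_source": {"myfield": ["My favorite toy", "Another toy for me"]},
                "highlight": {"myfield": ["Another toy for me"]},
            }
        ],
    }
}

print(matching_values(response, "myfield"))
```

Keep in mind that the highlighter returns fragments, so very long values may come back shortened depending on the highlighter settings.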

If more than one value in the array matches, they are all returned.

{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.3938048,
        "hits": [
            {
                "_index": "blah",
                "_type": "mytype",
                "_id": "AWB6-u6V3-7fA7oZt-aX",
                "_score": 0.3938048,
                "_source": {
                    "myfield": [
                        "My favorite toy",
                        "Another toy for me"
                    ]
                },
                "highlight": {
                    "myfield": [
                        "My favorite toy",
                        "Another toy for me"
                    ]
                }
            }
        ]
    }
}

There are certainly other options, such as the nested documents you mentioned or a parent-child relationship, where you obtain the matching entries through inner hits. Highlighting was the only solution I could find that maintains your original document structure.
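For comparison, the nested alternative could look roughly like this. The sketch assumes the documents are reshaped so that each label becomes an object such as `{"text": "Only one label here"}`; the `text` sub-field, the mapping fragment, and both helper names are assumptions made for illustration, not something taken from the question.

```python
# Assumed mapping fragment; it would go under "properties" in the index mapping.
LABELS_MAPPING = {
    "labels": {
        "type": "nested",
        "properties": {"text": {"type": "text"}},
    }
}


def build_nested_query(text, path="labels", subfield="text"):
    """Nested full-text query that asks for the matching nested objects."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{subfield}": text}},
                # An empty inner_hits block requests the nested objects that matched.
                "inner_hits": {},
            }
        }
    }


def matching_labels(response, path="labels", subfield="text"):
    """Map each hit's _id to the label texts found in its inner hits.

    Assumes inner hits are keyed by the nested path and that each inner
    hit's _source holds the nested object itself.
    """
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"][subfield] for h in inner]
    return result


print(build_nested_query("possible label"))
```

The trade-off is that the documents have to be reindexed with the new shape, whereas highlighting works on the array of strings as it is.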

derickson82
  • Didn't know about this feature which seems really promising. I will give it a try! Thanks for the detailed answer! – Kevin Dec 22 '17 at 11:00