I am using elasticsearch-py (Elasticsearch version 2.3) and would like to return just the 'title' field from all documents in an index whose mapping has the fields: actors, director, genre, plot, title, year.

I'm currently trying `messages = es.search(index="movies", _source=['hits.hits.title'])`, and the resulting response is:

{u'hits': {u'hits': [{u'_score': 1.0, u'_type': u'movie', u'_id': u'tt0116996', u'_source': {}, u'_index': u'movies'}, {u'_score': 1.0, u'_type': u'movie', u'_id': u'1', u'_source': {}, u'_index': u'movies'}], u'total': 2, u'max_score': 1.0}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 2, u'timed_out': False}

I've tried different versions of filter paths and source field lists but can't seem to get it right.

nmacc
  • You probably want just `_source: ['title']` – Ryan Walker Nov 29 '16 at 16:46
  • Tried `messages = es.search(index="movies", _source=['title'])` which returned `{u'hits': {u'hits': [{u'_score': 1.0, u'_type': u'movie', u'_id': u'tt0116996', u'_source': {u'title': u'Mars Attacks!'}, u'_index': u'movies'}, {u'_score': 1.0, u'_type': u'movie', u'_id': u'1', u'_source': {u'title': u'I saw a movie once a tale'}, u'_index': u'movies'}], u'total': 2, u'max_score': 1.0}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}`, so I'm still missing something somewhere – nmacc Nov 29 '16 at 16:52

3 Answers

You can apply source filtering with:

messages = es.search(index="movies", _source=["title"])

but you'll still need to parse the response. For this you can do something like:

titles = [hit["_source"]["title"] for hit in messages["hits"]["hits"]]

There is nothing in the elasticsearch-py API (as far as I know) that will flatten down the rather verbose response you get from Elasticsearch.
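
As a minimal sketch of that parsing step, the helper below pulls the titles out of a response shaped like the one in the question. The name `extract_titles` is hypothetical (it is not part of elasticsearch-py), and the sample dictionary is a trimmed copy of the response posted in the comments, so no running cluster is needed to try it:

```python
def extract_titles(response):
    """Collect the 'title' value from each hit's _source, skipping hits without one."""
    hits = response.get("hits", {}).get("hits", [])
    return [
        hit["_source"]["title"]
        for hit in hits
        if "title" in hit.get("_source", {})
    ]


# Trimmed copy of the response shown in the question's comments.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "tt0116996", "_source": {"title": "Mars Attacks!"}},
            {"_id": "1", "_source": {"title": "I saw a movie once a tale"}},
        ],
        "total": 2,
    },
    "timed_out": False,
}

print(extract_titles(sample_response))
# ['Mars Attacks!', 'I saw a movie once a tale']
```

The `.get` calls are only a defensive choice so that a response without hits, or a hit with an empty `_source`, yields an empty list instead of a `KeyError`.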

Ryan Walker

You can now use the `_source_exclude` and `_source_include` kwargs in the search function to limit which fields are returned.

So something like:

messages = es.search(index="movies", _source=["title"], _source_include=['title'])
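
Since the question also mentions filter paths: source filtering only trims `_source`, while the client's `filter_path` argument trims the response envelope itself. The sketch below combines the two; `fetch_titles` is a hypothetical helper, `es` is assumed to be an already-connected `Elasticsearch` client, and the `_source` keyword is assumed to behave as in the client version used in the question:

```python
def fetch_titles(es, index="movies"):
    """Return the titles in `index`, asking Elasticsearch for as little as possible.

    `es` is assumed to be a connected elasticsearch-py client.
    """
    response = es.search(
        index=index,
        _source=["title"],                        # keep only 'title' inside _source
        filter_path=["hits.hits._source.title"],  # drop the rest of the envelope
    )
    # With filter_path, a search that matches nothing can come back as {},
    # so the lookups below fall back to empty containers.
    hits = response.get("hits", {}).get("hits", [])
    return [
        hit["_source"]["title"]
        for hit in hits
        if "title" in hit.get("_source", {})
    ]
```

The list comprehension is still needed: `filter_path` shrinks the response but does not flatten it.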
Metropolis

I had a similar problem, and this is how I solved it. I needed it in a slightly different context: I had to use the title later in the loop:

res = es.search(index="movies", body={"query": {"match_all": {}}})
for hit in res['hits']['hits']:
    title = hit['_source'].get('title')
Amela