I am trying to get data from Elasticsearch using Python. I was able to connect to the data source.

Connect To Elasticsearch

from elasticsearch import Elasticsearch

try:
    es = Elasticsearch(
        ['https://movies-elk-prod-elastic-ssl.pac.com'],  # scheme needs two slashes
        http_auth=('xxxxx', 'xxxx'),
        port=10202,
    )
    # es.info() is the first call that actually contacts the cluster
    print("Connected", es.info())
except Exception as ex:
    print("Error:", ex)
es.info(pretty=True)

The Input Data

   {
                "_index": "movies",
                "_type": "movie",
                "_id": "4444169",
                "_score": null,
                "_source": {
                    "actor": [
                        "Josh Duhamel",
                        "Megan Fox",
                        "Shia LaBeouf"
                    ],
                    "director": [
                        "Michael Bay"
                    ],
                    "full_text": "When Sam Witwicky learns the truth about the ancient origins of the Transformers, he must accept his fate and merge with Optimus Prime and Bumblebee in their epic battle against the Decepticons, who are stronger back than ever and plan to destroy our world.!",
                    "title": "Transformers: Revenge of the Fallen",
                    "type": "movie"
                },
                "sort": [
                    1544310000000,
                    4.05
                ]
            },
            {
                "_index": "movies",
                "_type": "movie",
                "_id": "4051",
                "_score": null,
                "_source": {
                    "actor": [
                        "Josh Duhamel",
                        "Shia LaBeouf",
                        "Patrick Dempsey"
                    ],
                    "director": [
                        "Michael Bay"
                    ],
                    "full_text": "A mysterious event from the Earth's past threatens to unleash such a devastating war that the Transformers can not possibly save the planet on its own. Sam Witwicky and the Autobots fight the darkness to defend our world against the devastating evil of the Decepticons.",
                    "title": "Transformers: Dark of the Moon",
                    "type": "movie"
                },
                "sort": [
                    1544310000000,
                    4.03949
                ]
            },

Next, I want to write a Python function that queries Elasticsearch by actor. For example, if I pass in Josh Duhamel, it should return all the movies featuring Josh Duhamel.

As a next step, I want to convert the data into a Python (pandas) data frame. I tried a few things based on functions I found, shown below, but they are not working for me. (P.S. I am new to Elasticsearch as well as Python :-))

import json

import requests

# Elasticsearch 6+ rejects request bodies that are not declared as JSON
HEADERS = {"Content-Type": "application/json"}

def search(uri, term):
    """Simple Elasticsearch Query"""
    query = json.dumps({
        "query": {
            "match": {
                "content": term
            }
        }
    })
    response = requests.get(uri, data=query, headers=HEADERS)
    results = json.loads(response.text)
    return results

def format_results(results):
    """Print results nicely:
    doc_id) content
    """
    data = [doc for doc in results['hits']['hits']]
    for doc in data:
        print("%s) %s" % (doc['_id'], doc['_source']['content']))

def create_doc(uri, doc_data=None):
    """Create new document."""
    # Avoid a mutable default argument; fall back to an empty document
    query = json.dumps(doc_data or {})
    response = requests.post(uri, data=query, headers=HEADERS)
    print(response)

from elasticsearch import Elasticsearch

es = Elasticsearch()
res = es.search(index="test", doc_type="articles", body={"query": {"match": {"content": "fox"}}})
print("%d documents found" % res['hits']['total'])
for doc in res['hits']['hits']:
    print("%s) %s" % (doc['_id'], doc['_source']['content']))

I would appreciate your help and insights. Thanks in advance

James Taylor

1 Answer

The code you posted searches Elasticsearch for a field named "content". To make it do what you want, change "content" to the field you actually want to query in your documents, e.g. "actor". The same applies everywhere else in the code that reads the "content" field, such as doc['_source']['content'].

    "query": {
        "match": {
            "actor": term
        }
    }
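
Putting that together, here is a minimal sketch of an actor search whose results can be loaded into a data frame. It is an illustration under assumptions, not a tested solution for your cluster: the helper names (build_actor_query, hits_to_rows, search_by_actor, to_dataframe) are made up for this example, the index name "movies" and the field names are taken from the sample documents in the question, and es.search(index=..., body=..., size=...) assumes an older elasticsearch-py client like the one used in the question. pandas is assumed to be installed for the last step.

```python
def build_actor_query(actor):
    """Build a match query against the 'actor' field (field name from the sample data)."""
    return {"query": {"match": {"actor": actor}}}


def hits_to_rows(response):
    """Flatten a search response into one plain dict per movie."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        rows.append({
            "id": hit["_id"],
            "title": source.get("title"),
            # 'actor' and 'director' are lists in the sample data; join them for display
            "actor": ", ".join(source.get("actor", [])),
            "director": ", ".join(source.get("director", [])),
        })
    return rows


def search_by_actor(es, actor, index="movies", size=100):
    """Run the actor query with an already-connected client and return flat rows.

    'es' is the Elasticsearch client created in the question; 'size' caps the
    number of hits returned (the server default is 10).
    """
    response = es.search(index=index, body=build_actor_query(actor), size=size)
    return hits_to_rows(response)


def to_dataframe(rows):
    """Convert the flat rows into a pandas DataFrame (assumes pandas is installed)."""
    import pandas as pd
    return pd.DataFrame(rows)
```

With the es object from the question, usage would look like df = to_dataframe(search_by_actor(es, "Josh Duhamel")). Note that a match query on an analyzed text field matches documents containing either word by default, so if "Josh" alone produces unwanted hits, a match_phrase query on the same field is the stricter alternative.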
MDah