I am new to Elasticsearch, the real-time distributed search engine, and I would like to ask a technical question.

I have written a Python crawler module that parses a web page and creates JSON objects with native information. The next step for my crawler is to store this information in Elasticsearch.

The real question is the following: which approach is better for my use case, the Elasticsearch RESTful API or the Python API for Elasticsearch (elasticsearch-py)?

nbompetsis

2 Answers

0

If you already have Python code, then the most natural way for you would be to use the elasticsearch-py client.

After installing the elasticsearch-py library via pip install elasticsearch, here is a simple code example to get you going:

# import the elasticsearch library
from elasticsearch import Elasticsearch

# get your JSON data
json_page = {...}

# create a new client to connect to ES running on localhost:9200
es = Elasticsearch()

# index your JSON data (doc_type belongs to older client versions; newer ones no longer accept it)
es.index(index="webpages", doc_type="webpage", id=1, body=json_page)
Val
  • OK, your answer is very informative. From what I have seen, though, the elasticsearch-py client is nothing more than a wrapper around the Elasticsearch HTTP API. Am I right? Thanks for your answer – nbompetsis Dec 01 '15 at 12:56
  • That's correct. Although, in the end it all depends on what you mean exactly by "better", i.e. easier to code, more performant, more feature-rich, etc. With the Python library you have less coding to do, and you benefit from a lot of built-in boilerplate (connections, retries, etc.) that you would otherwise have to write yourself. – Val Dec 01 '15 at 12:58
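
To make the comparison in the comments above concrete, here is a minimal sketch of what indexing the same document through the raw REST API might look like, using only the Python standard library. The helper name build_index_request, the sample fields in json_page, and the default _doc type are assumptions for illustration (recent Elasticsearch versions use _doc, while older ones took a custom type name such as "webpage"). The request is only built, not sent, because sending it needs a running node.

```python
import json
import urllib.request


def build_index_request(doc, index="webpages", doc_type="_doc", doc_id=1,
                        base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP PUT request that would index `doc`.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    payload = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical crawler output, for illustration only
json_page = {"url": "http://example.com", "title": "Example"}
request = build_index_request(json_page)

# Sending requires Elasticsearch listening on localhost:9200, so it stays commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Everything around this call (connection reuse, retries, error handling, serialization) is what you would have to add by hand, and what the elasticsearch-py client already provides.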
0

You may also try elasticsearch_dsl; it is a high-level wrapper for Elasticsearch.

buxizhizhoum