I want to know whether Elasticsearch native SQL will be parsed as Elasticsearch DSL (json query)? And how to get the parsing time??
I found a link about that.
An Introduction to Elasticsearch SQL with Practical Examples - Part 1
You can find a picture about Elasticsearch SQL implementation consists of 4 execution phases
at Implementation Internals.
I suppose the answer is "YES". But I can not find a way to get the "parsed time" which ES native SQL will be parsed as ES DSL (json query). I think "the way" for a long time, then I write following codes.
My local environment:
- macOS Mojave Version 10.14.2
- MacBook Pro(Retina, 13-inch, Early 2015)
- Processor 2.7 GHz Intel Core i5
- Memory 8 GB 1867 MHz DDR3
- Elasticsearch 6.5.4
- python 2
Requirements:
- pip install elasticsearch
- pip install requests
"""
SQL Access (X-Pack)
https://www.elastic.co/guide/en/elasticsearch/reference/6.5/xpack-sql.html
"""
import random
import requests
import datetime
import elasticsearch
URL_PREFIX = 'http://localhost:9200'
def get_url(url):
return URL_PREFIX + url
def translate_sql(sql):
return requests.post(url=get_url('/_xpack/sql/translate/?pretty'),
json={'query': sql}).content
def search_with_query(query):
headers = {'Content-type': 'application/json'}
start_time = datetime.datetime.now()
result = requests.post(url=get_url('/_search?pretty'),
headers=headers,
data=query).content
return datetime.datetime.now() - start_time
def search_with_sql(sql):
# https://www.elastic.co/guide/en/elasticsearch/reference/6.6/sql-rest.html
query = '{"query":"' + sql.replace('\n', '') + '"}'
headers = {'Content-type': 'application/json'}
start_time = datetime.datetime.now()
result = requests.post(url=get_url('/_xpack/sql?format=txt&pretty'),
headers=headers,
data=query).content
return datetime.datetime.now() - start_time
def gen_data(number):
es = elasticsearch.Elasticsearch()
for i in range(number):
es.index(index='question_index', doc_type='question_type', body={
'a': random.randint(0, 10),
'b': random.randint(0, 10),
'c': random.randint(0, 10),
'd': random.randint(0, 10),
'e': random.randint(0, 10),
})
if i % 10000 == 0:
print i
if __name__ == '__main__':
# gen_data(100000)
sql = '''
select max(a) as max_a
from question_index
group by a
having max_a > 1
order by a limit 1
'''
json_query = translate_sql(sql)
sql_cost_time = search_with_sql(sql)
query_cost_time = search_with_query(json_query)
print 'sql :', sql_cost_time
print 'query:', query_cost_time
if query_cost_time < sql_cost_time:
print 'parsing time is:', sql_cost_time - query_cost_time, ' ???'
else:
print 'parsing time is :', query_cost_time - sql_cost_time, ' ???'
Actual results:
I don't know whether the code is right?
Expected:
I expect the code is right.
Your answer may save my hair!
Thanks