2

I want to know whether Elasticsearch native SQL will be parsed as Elasticsearch DSL (json query)? And how to get the parsing time??

I found a link about that.
An Introduction to Elasticsearch SQL with Practical Examples - Part 1

You can find a picture about Elasticsearch SQL implementation consists of 4 execution phases at Implementation Internals.

I suppose the answer is "YES". But I can not find a way to get the "parsed time" which ES native SQL will be parsed as ES DSL (json query). I think "the way" for a long time, then I write following codes.

My local environment:
- macOS Mojave Version 10.14.2
- MacBook Pro(Retina, 13-inch, Early 2015)
- Processor 2.7 GHz Intel Core i5
- Memory 8 GB 1867 MHz DDR3
- Elasticsearch 6.5.4
- python 2

Requirements:
- pip install elasticsearch
- pip install requests

"""
SQL Access (X-Pack)
https://www.elastic.co/guide/en/elasticsearch/reference/6.5/xpack-sql.html
"""
import random
import requests
import datetime
import elasticsearch

URL_PREFIX = 'http://localhost:9200'


def get_url(url):
    return URL_PREFIX + url


def translate_sql(sql):
    return requests.post(url=get_url('/_xpack/sql/translate/?pretty'),
                         json={'query': sql}).content


def search_with_query(query):
    headers = {'Content-type': 'application/json'}
    start_time = datetime.datetime.now()
    result = requests.post(url=get_url('/_search?pretty'),
                           headers=headers,
                           data=query).content
    return datetime.datetime.now() - start_time


def search_with_sql(sql):
    # https://www.elastic.co/guide/en/elasticsearch/reference/6.6/sql-rest.html
    query = '{"query":"' + sql.replace('\n', '') + '"}'
    headers = {'Content-type': 'application/json'}
    start_time = datetime.datetime.now()
    result = requests.post(url=get_url('/_xpack/sql?format=txt&pretty'),
                           headers=headers,
                           data=query).content
    return datetime.datetime.now() - start_time


def gen_data(number):
    es = elasticsearch.Elasticsearch()
    for i in range(number):
        es.index(index='question_index', doc_type='question_type', body={
            'a': random.randint(0, 10),
            'b': random.randint(0, 10),
            'c': random.randint(0, 10),
            'd': random.randint(0, 10),
            'e': random.randint(0, 10),
        })
        if i % 10000 == 0:
            print i


if __name__ == '__main__':
    # gen_data(100000)
    sql = '''
        select max(a) as max_a 
        from question_index 
        group by a 
        having max_a > 1
        order by a limit 1
    '''
    json_query = translate_sql(sql)

    sql_cost_time = search_with_sql(sql)
    query_cost_time = search_with_query(json_query)

    print 'sql  :', sql_cost_time
    print 'query:', query_cost_time

    if query_cost_time < sql_cost_time:
        print 'parsing time is:', sql_cost_time - query_cost_time, ' ???'
    else:
        print 'parsing time is :', query_cost_time - sql_cost_time, ' ???'

Actual results:
I don't know whether the code is right?

Expected:
I expect the code is right.

Your answer may save my hair!
Thanks

Alvin Nguyen
  • 250
  • 4
  • 10
Lin Liu
  • 21
  • 1

0 Answers0