I have written a parser using the Python PLY library.

My Elasticsearch mapping looks like this:

{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "type1": {
      "properties": {
        "prop1": {
          "type": "keyword"
        },
        "prop2": {
          "type": "keyword"
        },
        "query": {
          "properties": {
            "regexp": {
              "properties": {
                "prop1": {
                  "type": "keyword"
                },
                "prop2": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    },
    "type2": {
      "properties": {
        "prop3": {
          "type": "keyword"
        },
        "prop4": {
          "type": "keyword"
        },
        "prop5": {
          "type": "keyword"
        }
      }
    }
  }
}

The parser looks like this:

import ply.lex as lex

tokens = (  
        'LP',
        'RP',
        'FUNC1',
        'FUNC2',
        'OP',
        'PARAM',
)

t_PARAM = r'[^ \/\(\),&:\"~]+'

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

t_ignore = ' \t'

def t_OP(t):
    r'INTERSECT|UNION|MINUS'
    return t

def t_LP(t):
    r'\('
    return t

def t_RP(t):
    r'\)'
    return t

# Accept both spellings, since the example queries below use lowercase.
def t_FUNC1(t):
    r'FUNC1|func1'
    return t

def t_FUNC2(t):
    r'FUNC2|func2'
    return t

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

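import ply.yacc as yacc

# Make OP left-associative so "a MINUS b MINUS c" groups as
# "(a MINUS b) MINUS c" and the grammar has no shift/reduce conflict.
precedence = (
    ('left', 'OP'),
)

def p_expr_op_expr(p):
    'expression : expression OP expression'
    if p[2] == 'INTERSECT':
        # form the Elasticsearch AND query (no idea how to write the ES query here)
        pass
    elif p[2] == 'MINUS':
        # form the Elasticsearch MINUS query (no idea how to write the ES query here)
        pass
    elif p[2] == 'UNION':
        # form the Elasticsearch OR query (no idea how to write the ES query here)
        pass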

def p_expr_paren(p):
    'expression : LP expression RP'
    p[0] = p[2]

def p_expr_func1(p):
    'expression : FUNC1 LP PARAM RP'
    # Form the Elasticsearch query so that it refers to prop1
    # from the ES schema and looks for PARAM,
    # e.g. {'prop1': 'PARAM'} or {'regexp': {'prop1': 'PARAM'}}
    p[0] = {'regexp': {'prop1': p[3]}}

def p_expr_func2(p):
    'expression : FUNC2 LP PARAM RP'
    # Form the Elasticsearch query so that it refers to prop2
    # from the ES schema and looks for PARAM,
    # e.g. {'prop2': 'PARAM'} or {'regexp': {'prop2': 'PARAM'}}
    p[0] = {'regexp': {'prop2': p[3]}}


def p_expr_param(p):
    'expression : PARAM'
    # Form the Elasticsearch query so that it refers to prop3
    # from the ES schema and looks for PARAM,
    # e.g. {'prop3': 'PARAM'} or {'regexp': {'prop3': 'PARAM'}}
    p[0] = {'regexp': {'prop3': p[1]}}

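def p_error(p):
    # PLY passes None when the error is an unexpected end of input.
    if p is None:
        print("Syntax error at end of input")
    else:
        print("Syntax error at '%s'" % p.value)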

parser = yacc.yacc()

while True:
    try:
        s = input('input > ')
    except EOFError:
        break
    parser.parse(s)

Input query examples:

1.) func1(foo) UNION func2(bar) => union is OR
2.) (func1(foo) UNION func2(bar.*)) MINUS baz
3.) (func1(foo) UNION func2(bar.*)) MINUS func1(boo)
4.) (foo.* UNION bar) INTERSECT baz.*

My parser works fine, but I am not sure how to form the ES queries after parsing. For example:

If my input is func1(foo) UNION func2(bar):

func1(foo) will be turned by p_expr_func1 into {'regexp': {'prop1': 'foo'}}, and func2(bar) will be turned by p_expr_func2 into {'regexp': {'prop2': 'bar'}}.

Next, the parser reaches p_expr_op_expr because the input contains the operator UNION. How do I write the final union ES query there?
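
A minimal sketch of one way the operator step could work, assuming each sub-expression has already been reduced to an Elasticsearch query dict. The helper name `combine` is hypothetical; the `bool` query with `must`, `should`, and `must_not` clauses is standard Elasticsearch query DSL. Inside p_expr_op_expr it could be called as `p[0] = combine(p[2], p[1], p[3])`.

```python
def combine(op, left, right):
    """Combine two Elasticsearch query dicts with a set-style operator.

    `combine` is a hypothetical helper, not part of PLY or any ES client.
    """
    if op == 'INTERSECT':
        # AND: a document must match both sub-queries.
        return {'bool': {'must': [left, right]}}
    if op == 'UNION':
        # OR: a document must match at least one sub-query.
        return {'bool': {'should': [left, right], 'minimum_should_match': 1}}
    if op == 'MINUS':
        # AND NOT: match the left side, exclude anything matching the right.
        return {'bool': {'must': [left], 'must_not': [right]}}
    raise ValueError('unknown operator: %r' % op)


# The two leaf queries from the example above.
left = {'regexp': {'prop1': 'foo'}}
right = {'regexp': {'prop2': 'bar'}}
union_query = combine('UNION', left, right)
```

Because the result is itself a query dict, it can be passed back into `combine` for nested expressions. If scoring does not matter, `filter` could probably replace `must`, but that is a tuning choice rather than a requirement.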

Please advise, or point me to examples that form ES queries after parsing with PLY. What is the best way to parse this sort of expression and form ES queries?
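
One alternative that is sometimes used is to have the grammar actions build a small expression tree and translate it to a query in a separate pass. The sketch below assumes the actions return nested tuples such as `('UNION', left, right)` for operators and `('LEAF', field, value)` for terms; these node shapes and the name `to_es` are made up for illustration and are not part of PLY.

```python
BINARY_OPS = ('INTERSECT', 'UNION', 'MINUS')


def to_es(node):
    """Translate a nested-tuple expression tree into an ES query dict."""
    kind = node[0]
    if kind == 'LEAF':
        _, field, value = node
        return {'regexp': {field: value}}
    if kind not in BINARY_OPS:
        raise ValueError('unknown node type: %r' % (kind,))
    # Translate both children first, then wrap them in a bool query.
    lhs, rhs = to_es(node[1]), to_es(node[2])
    if kind == 'INTERSECT':
        return {'bool': {'must': [lhs, rhs]}}
    if kind == 'UNION':
        return {'bool': {'should': [lhs, rhs], 'minimum_should_match': 1}}
    # Only MINUS is left: keep the left side, exclude the right side.
    return {'bool': {'must': [lhs], 'must_not': [rhs]}}


# Hand-built tree for: (func1(foo) UNION func2(bar.*)) MINUS baz
tree = ('MINUS',
        ('UNION', ('LEAF', 'prop1', 'foo'), ('LEAF', 'prop2', 'bar.*')),
        ('LEAF', 'prop3', 'baz'))
query = to_es(tree)
```

Keeping the tree separate from the query building means the grammar stays free of Elasticsearch details, and the tree can be inspected or tested without a cluster.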

Here, func1 and func2 in the expressions decide which property of the ES schema the query refers to.
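
That choice of property could be kept in a lookup table instead of being spread across the grammar actions. A sketch under some assumptions: the names `FIELD_BY_FUNC` and `leaf_query` are hypothetical, a bare PARAM is assumed to target prop3 as in the parser comments, and a simple character check (my own heuristic, not an Elasticsearch rule) picks between an exact `term` query and a `regexp` query.

```python
import re

# Hypothetical mapping from function name to mapped field; None stands
# for a bare PARAM with no function around it.
FIELD_BY_FUNC = {'FUNC1': 'prop1', 'FUNC2': 'prop2', None: 'prop3'}

# Characters taken here as a sign that PARAM is a pattern, not a literal.
# This is an assumed heuristic for the sketch.
REGEX_CHARS = re.compile(r'[.*+?\[\]{}|\\]')


def leaf_query(func, param):
    """Build the query dict for one FUNC(PARAM) or bare PARAM term."""
    field = FIELD_BY_FUNC[func.upper() if func else None]
    if REGEX_CHARS.search(param):
        # Pattern: regexp query against the keyword field.
        return {'regexp': {field: param}}
    # Plain value: exact match against the keyword field.
    return {'term': {field: param}}
```

For example, `leaf_query('func2', 'bar.*')` gives a `regexp` query on prop2, while `leaf_query(None, 'baz')` gives a `term` query on prop3.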

zubug55