10

We are building our own query language similar to Mysql using antlr4. Except we only use where clause, in other words user does not enter select/from statements.

I was able to create grammar for it and generate lexers/parsers/listeners in golang.

Below our grammar file EsDslQuery.g4:

grammar EsDslQuery;

options {
language = Go;
}

query
   : leftBracket = '(' query rightBracket = ')'                             #bracketExp
   | leftQuery=query op=OR rightQuery=query                                 #orLogicalExp
   | leftQuery=query op=AND rightQuery=query                                #andLogicalExp
   | propertyName=attrPath op=COMPARISON_OPERATOR propertyValue=attrValue   #compareExp
   ;

attrPath
   : ATTRNAME ('.' attrPath)?
   ;

fragment ATTR_NAME_CHAR
   : '-' | '_' | ':' | DIGIT | ALPHA
   ;

fragment DIGIT
   : ('0'..'9')
   ;

fragment ALPHA
   : ( 'A'..'Z' | 'a'..'z' )
   ;

attrValue
   : BOOLEAN           #boolean
   | NULL              #null
   | STRING            #string
   | DOUBLE            #double
   | '-'? INT EXP?     #long
   ;

...

Query example: color="red" and price=20000 or model="hyundai" and (seats=4 or year=2001)

ElasticSearch supports sql queries with plugin here: https://github.com/elastic/elasticsearch/tree/master/x-pack/plugin/sql.

Having hard time to understand java code.

Since we have Logical Operators I am quite not sure how to get parse tree and convert it to ES query. Can somebody help/suggest ideas?

Update 1: Added more examples with corresponding ES query

Query Example 1: color="red" AND price=2000

ES query 1:

{
    "query": {
      "bool": {
        "must": [
          {
            "terms": {
              "color": [
                "red"
              ]
            }
          },
          {
            "terms": {
              "price": [
                2000
              ]
            }
          }
        ]
      }
    },
    "size": 100
  }

Query Example 2: color="red" AND price=2000 AND (model="hyundai" OR model="bmw")

ES query 2:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": {
              "terms": {
                "color": ["red"]
              }
            }
          }
        },
        {
          "bool": {
            "must": {
              "terms": {
                "price": [2000]
              }
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "model": "hyundai"
                }
              },
              {
                "term": {
                  "region": "bmw"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "size": 100
}


Query Example 3: color="red" OR color="blue"

ES query 3:

{
    "query": {
      "bool": {
        "should": [
          {
            "bool": {
              "must": {
                "terms": {
                  "color": ["red"]
                }
              }
            }
          },
          {
            "bool": {
              "must": {
                "terms": {
                    "color": ["blue"]
                }
              }
            }
          }
        ]
      }
    },
    "size": 100
  }

omurbek
  • 742
  • 1
  • 7
  • 23
  • Consider adding some example output. How does `color="red" and price=20000 or model="hyundai" and (seats=4 or year=2001` look like in ES syntax? You want the JSON syntax, or the short query string syntax, or something all together different? It also helps if you add more than 1 example. Also, have you already tried something yourself? – Bart Kiers Feb 20 '20 at 18:26
  • I have added more examples. Yes, I want to build json syntax from parse tree. I am in progress of doing it with golang, have not finished it yet – omurbek Feb 21 '20 at 04:05

3 Answers3

8

Working demo url: https://github.com/omurbekjk/convert-dsl-to-es-query-with-antlr, estimated time spent: ~3 weeks

After investigating antlr4 and several examples I found simple solution with listener and stack. Similar to how expressions are calculated using stack.

We need to overwrite to default base listener with ours to get triggers for each enter/exit grammar rules. Important rules are:

  1. Comparison expression (price=200, price>190)
  2. Logical operators (OR, AND)
  3. Brackets (in order to correctly build es query we need to write correct grammar file remembering operator precedence, that's why brackets are in the first place in the grammar file)

Below my custom listener code written in golang:

package parser

import (
    "github.com/olivere/elastic"
    "strings"
)

type MyDslQueryListener struct {
    *BaseDslQueryListener
    Stack []*elastic.BoolQuery
}

func (ql *MyDslQueryListener) ExitCompareExp(c *CompareExpContext) {
    boolQuery := elastic.NewBoolQuery()

    attrName := c.GetPropertyName().GetText()
    attrValue := strings.Trim(c.GetPropertyValue().GetText(), `\"`)
    // Based on operator type we build different queries, default is terms query(=)
    termsQuery := elastic.NewTermQuery(attrName, attrValue)
    boolQuery.Must(termsQuery)
    ql.Stack = append(ql.Stack, boolQuery)
}

func (ql *MyDslQueryListener) ExitAndLogicalExp(c *AndLogicalExpContext) {
    size := len(ql.Stack)
    right := ql.Stack[size-1]
    left := ql.Stack[size-2]
    ql.Stack = ql.Stack[:size-2] // Pop last two elements
    boolQuery := elastic.NewBoolQuery()
    boolQuery.Must(right)
    boolQuery.Must(left)
    ql.Stack = append(ql.Stack, boolQuery)
}

func (ql *MyDslQueryListener) ExitOrLogicalExp(c *OrLogicalExpContext) {
    size := len(ql.Stack)
    right := ql.Stack[size-1]
    left := ql.Stack[size-2]
    ql.Stack = ql.Stack[:size-2] // Pop last two elements
    boolQuery := elastic.NewBoolQuery()
    boolQuery.Should(right)
    boolQuery.Should(left)
    ql.Stack = append(ql.Stack, boolQuery)
}

And main file:

package main

import (
    "encoding/json"
    "fmt"
    "github.com/antlr/antlr4/runtime/Go/antlr"
    "github.com/omurbekjk/convert-dsl-to-es-query-with-antlr/parser"
)

func main() {
    fmt.Println("Starting here")
    query := "price=2000 OR model=\"hyundai\" AND (color=\"red\" OR color=\"blue\")"
    stream := antlr.NewInputStream(query)
    lexer := parser.NewDslQueryLexer(stream)
    tokenStream := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)
    dslParser := parser.NewDslQueryParser(tokenStream)
    tree := dslParser.Start()

    listener := &parser.MyDslQueryListener{}
    antlr.ParseTreeWalkerDefault.Walk(listener, tree)

    esQuery := listener.Stack[0]

    src, err := esQuery.Source()
    if err != nil {
        panic(err)
    }
    data, err := json.MarshalIndent(src, "", "  ")
    if err != nil {
        panic(err)
    }

    stringEsQuery := string(data)
    fmt.Println(stringEsQuery)
}

/**     Generated es query
{
  "bool": {
    "should": [
      {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must": {
                        "term": {
                          "color": "blue"
                        }
                      }
                    }
                  },
                  {
                    "bool": {
                      "must": {
                        "term": {
                          "color": "red"
                        }
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must": {
                  "term": {
                    "model": "hyundai"
                  }
                }
              }
            }
          ]
        }
      },
      {
        "bool": {
          "must": {
            "term": {
              "price": "2000"
            }
          }
        }
      }
    ]
  }
}

*/

omurbek
  • 742
  • 1
  • 7
  • 23
3

Have you thought about converting your sql-like statements to query string queries?

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string" : {
            "query" : "(new york city) OR (big apple)",
            "default_field" : "content"
        }
    }
}
'

If your use-cases stay simple like color="red" and price=20000 or model="hyundai" and (seats=4 or year=2001), I'd go with the above. The syntax is quite powerful but the queries are guaranteed to run more slowly than the native, spelled-out DSL queries since the ES parser will need to convert them to the DSL for you.

Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
  • The thing here is I will need to validation of passed properties. Say if user mistyped "pricee" or instead of passing number it passes invalid value. (ex: "price=adfasdf") – omurbek Feb 27 '20 at 04:50
  • Well that's another story. You may wanna get your mapping first (`GET index_name/_mapping`), identify which fields you'll want to expose to the users to search (so you can build your validator or a "did-you-mean" functionality). If you wanna enforce the field value data types, you can extract that information from the mapping too... – Joe - GMapsBook.com Feb 27 '20 at 09:39
2

There is a software called Dremio https://www.dremio.com/

It can translate SQL query to elastic search query

https://www.dremio.com/tutorials/unlocking-sql-on-elasticsearch/

user1392853
  • 269
  • 1
  • 6
  • 19