How to differentiate a Cypher clause from an SQL clause in C?

Question

I am working on adding support for Cypher clauses to the Postgres psql client. So far, we have used if statements with string comparisons to separate Cypher clauses from SQL clauses, with one parser for each. The HandleCypherCmds() function calls the Cypher parser, and the SendQuery() function calls the SQL parser.

/* handle Cypher commands (MATCH, OPTIONAL, EXPLAIN, CREATE) */
if (pg_strncasecmp(query_buf->data, "MATCH", 5) == 0 ||
    pg_strncasecmp(query_buf->data, "OPTIONAL", 8) == 0 ||
    pg_strncasecmp(query_buf->data, "EXPLAIN", 7) == 0 ||
    pg_strncasecmp(query_buf->data, "CREATE", 6) == 0)
{
    cypherCmdStatus = HandleCypherCmds(scan_state,
                                       cond_stack,
                                       query_buf,
                                       previous_buf);

    success = cypherCmdStatus != PSQL_CMD_ERROR;

    if (cypherCmdStatus == PSQL_CMD_SEND)
    {
        success = SendQuery(convert_to_psql_command(query_buf->data));
    }
}
else
    success = SendQuery(query_buf->data);

The problem with this approach is that some keywords are shared: CREATE, for example, can start either an SQL clause or a Cypher clause. Also, if the user makes a typo, such as "MATH" instead of "MATCH", the clause never reaches the Cypher parser. I am looking for a better way to differentiate a Cypher clause from an SQL one. Is there a way to do this in C?
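To make both failure modes concrete, here is the prefix test from the snippet above as a standalone function. This is only a sketch: strncasecmp() (POSIX) stands in for pg_strncasecmp(), and the function name is hypothetical.

```c
#include <string.h>
#include <strings.h>   /* strncasecmp(), POSIX; stand-in for pg_strncasecmp() */

/* Same test as the question: does the buffer start with one of these
 * keywords (case-insensitive)? Returns 1 if so, 0 otherwise. */
static int starts_with_cypher_keyword(const char *buf)
{
    static const char *const keywords[] = {"MATCH", "OPTIONAL", "EXPLAIN", "CREATE"};

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strncasecmp(buf, keywords[i], strlen(keywords[i])) == 0)
            return 1;
    return 0;
}
```

Both "CREATE TABLE t (id int);" and "CREATE (n:Person);" return 1, so the SQL statement is routed to the Cypher path, while "MATH (n) RETURN n;" returns 0 and is sent to the server as SQL. "EXPLAIN SELECT 1;" also returns 1, even though EXPLAIN is an ordinary PostgreSQL command.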

If you match on keywords and want to handle misspellings, then maybe you need an approximate match, or to match on other elements? — Allan Wind, Jun 30 '23 at 05:54
@AllanWind I updated my question. I forgot to mention that we implemented the parsers to handle the misspellings. The functions that call the parser are inside the condition for the commands that end with a semicolon. A SQL clause can't enter the Cypher clause parser and vice versa. — Carla, Jun 30 '23 at 15:30
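The approximate-match idea from the first comment can be sketched with an edit (Levenshtein) distance: compare the first word of the input against a keyword list and suggest the closest keyword within a small distance. This is a sketch under assumptions; the function names, the 32-character word limit, and the keyword list are hypothetical, not taken from the project.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32   /* hypothetical limit on keyword length */

/* Case-insensitive Levenshtein distance between two words.
 * Returns (size_t)-1 if either word is longer than MAX_WORD. */
static size_t edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD + 1], cur[MAX_WORD + 1];

    if (la > MAX_WORD || lb > MAX_WORD)
        return (size_t)-1;
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;                            /* "" -> b[0..j): j insertions */
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;                             /* a[0..i) -> "": i deletions */
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = toupper((unsigned char)a[i - 1]) ==
                          toupper((unsigned char)b[j - 1]) ? 0 : 1;
            size_t best = prev[j] + 1;                                  /* deletion */
            if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1;           /* insertion */
            if (prev[j - 1] + cost < best) best = prev[j - 1] + cost;   /* substitution */
            cur[j] = best;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Returns the closest keyword within max_dist edits of word, or NULL. */
static const char *suggest_keyword(const char *word, size_t max_dist)
{
    /* Hypothetical keyword list for illustration. */
    static const char *const keywords[] = {"MATCH", "OPTIONAL", "CREATE", "MERGE", "RETURN", "UNWIND"};
    const char *best = NULL;
    size_t best_dist = max_dist + 1;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t d = edit_distance(word, keywords[i]);
        if (d < best_dist) {
            best = keywords[i];
            best_dist = d;
        }
    }
    return best;
}
```

With max_dist set to 1, "MATH" (or "math") yields "MATCH", which could feed an error message such as "did you mean MATCH?", while "SELECT" yields NULL. A small threshold keeps genuine SQL keywords from being "corrected" into Cypher ones.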

score 1 · Accepted Answer · answered Jul 12 '23 at 00:07

We have solved this, if anyone is interested. Instead of doing the string comparison in the C file, we now check variables that are set by the parser file.

The user input is passed to the Cypher parser regardless of whether it is a Cypher or an SQL query, and it is sent to the server as a Cypher command only if the parser returns success. For the parser to return success, we have assigned each Cypher clause a boolean variable that is set to true only if the grammar rules for that specific command are satisfied. If no rule matches, the variables stay false and the parser returns failure.
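The same idea can be sketched in plain C without Bison: a tiny hand-written recognizer plays the role of the grammar, and a flag is set only when a whole rule matches. This is a toy under stated assumptions, not the project's cypher.y: only two hypothetical rules are modelled (MATCH pattern RETURN ..., and CREATE pattern), a pattern is a single non-nested "( ... )", and all names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp(), POSIX */

/* One flag per recognised clause (only two modelled here). */
struct cypher_flags {
    bool match;
    bool create;
};

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Consumes keyword kw (case-insensitive) if it is not followed by an
 * identifier character; returns the position after it, or NULL. */
static const char *eat_keyword(const char *p, const char *kw)
{
    size_t n = strlen(kw);

    p = skip_ws(p);
    if (strncasecmp(p, kw, n) != 0)
        return NULL;
    if (isalnum((unsigned char)p[n]) || p[n] == '_')
        return NULL;
    return p + n;
}

/* Consumes a node pattern "( ... )" without nesting, or returns NULL. */
static const char *eat_pattern(const char *p)
{
    p = skip_ws(p);
    if (*p != '(')
        return NULL;
    const char *close = strchr(p, ')');
    return close ? close + 1 : NULL;
}

/* Sets a flag only if the complete toy rule matches, then reports
 * whether any flag was set (the "parser returns success" case). */
static bool scan_cypher(const char *input, struct cypher_flags *f)
{
    const char *p;

    memset(f, 0, sizeof *f);   /* start every command with all flags false */
    if ((p = eat_keyword(input, "MATCH")) && (p = eat_pattern(p)) &&
        (p = eat_keyword(p, "RETURN")))
        f->match = true;
    else if ((p = eat_keyword(input, "CREATE")) && (p = eat_pattern(p)))
        f->create = true;

    return f->match || f->create;
}
```

"CREATE (n:Person);" sets create, but "CREATE TABLE t (id int);" sets nothing because TABLE is not a pattern, so it can fall through to SendQuery(). A typo such as "MATH (n) RETURN n;" also sets nothing, so the caller can decide whether to report an error or treat it as SQL.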

For clarification, here is a snippet of the parser:

%{
/* include statements */
...
bool match = false;
bool set = false;
bool set_path = false;
bool create = false;
bool drop = false;
bool alter = false;
bool load = false;
...
%}

...

%%
statement:
    query
    | statement query
    | statement SEMICOLON { YYACCEPT; }
    ;

query:
    match_clause
    | create_clause { create = true; }
    | drop_clause { drop = true; }
    | alter_clause { alter = true; }
    | load_clause { load = true; }
    | set_clause { set = true; }
    ...
    ;

...
%%

...

bool
psql_scan_cypher_command(char* data)
{
    ...

    YY_BUFFER_STATE buf = yy_scan_string(data);
    yypush_buffer_state(buf);
    yyparse();

    if (match || optional || explain || create || drop || alter || load ||
        set || set_path || merge || rtn || unwind || prepare || execute)
        return true;

    return false;
}

...

Refer to the 'cypher.y' and 'mainloop.c' files for the complete implementation.
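Two details are worth checking in a driver like psql_scan_cypher_command() above. First, Bison's yyparse() returns 0 on success, 1 for invalid input, and 2 on memory exhaustion; the snippet discards that value, yet a grammar action can set a flag before a later syntax error is detected. Second, the flags are globals, so unless they are reset before each parse (possibly in the elided "..."), a flag from a previous command carries over. (Per the Flex manual, a buffer from yy_scan_string() should also eventually be freed with yy_delete_buffer().) The sketch below uses a hypothetical fake_yyparse() in place of the generated parser to show both guards; all names here are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical globals standing in for the flags set by grammar actions. */
static bool match, create;

/* Hypothetical stand-in for the Bison-generated yyparse(). Like yyparse(),
 * it returns 0 when the input is accepted and non-zero otherwise. It sets
 * a flag as soon as it sees the leading clause, before checking the rest,
 * just as a grammar action can run before a later syntax error. */
static int fake_yyparse(const char *input)
{
    if (strncmp(input, "MATCH (", 7) == 0)
        match = true;
    else if (strncmp(input, "CREATE (", 8) == 0)
        create = true;
    else
        return 1;                          /* syntax error */
    return strchr(input, ';') ? 0 : 1;     /* toy rule: input must contain ';' */
}

/* Driver in the spirit of psql_scan_cypher_command(). */
static bool scan_cypher_command(const char *data)
{
    match = create = false;                /* drop flags left over from the previous command */
    int rc = fake_yyparse(data);
    return rc == 0 && (match || create);
}
```

With a flags-only check, "MATCH (n) RETURN n" (no semicolon) would be reported as Cypher because match was set before the error; and without the reset, "SELECT 1;" issued right after a Cypher command would inherit the stale match flag.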

score 0 · Answer 2 · answered Jun 30 '23 at 13:48

If the query contains a typo or another error, it is best to inform the user with an error message. This way, the user can easily identify and correct the problem in the query.

To differentiate between SQL clauses and Cypher clauses, you can use a delimiter: for instance, enclose every Cypher query in a function call such as cypher(query), or between BEGIN_CYPHER and END. Another option is a naming convention, such as marking Cypher commands with a prefix like an underscore: _MATCH.

score 0 · Answer 3 · answered Jun 30 '23 at 14:21


A useful method is to write lexer and/or parser rules.

For instance, you can have lexer rules to match keywords such as "CREATE" or "MATCH".

Similarly, you can define rules for parsing the tokens generated by the lexer and build an abstract syntax tree based on the grammar rules.
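As a rough illustration of the lexer half, here is a minimal hand-written tokenizer that classifies words as keywords or identifiers, plus numbers and single-character punctuation. It is a sketch only: the token kinds, function names, and the short keyword table are hypothetical, and a real lexer (for example one generated by Flex) would list every keyword of both languages and handle strings, comments, and multi-character operators.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp(), POSIX */

enum token_kind { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END };

struct token {
    enum token_kind kind;
    const char *start;   /* points into the input; not NUL-terminated */
    size_t len;
};

/* Hypothetical, deliberately short keyword table. */
static const char *const keywords[] = {"MATCH", "OPTIONAL", "CREATE", "RETURN", "TABLE", "SELECT"};

static int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncasecmp(s, keywords[i], len) == 0)
            return 1;
    return 0;
}

/* Returns the token starting at *pos and advances *pos past it. */
static struct token next_token(const char **pos)
{
    const char *p = *pos;
    struct token t;

    while (isspace((unsigned char)*p))
        p++;
    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;
        t.len = 0;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.len = (size_t)(p - t.start);
        t.kind = is_keyword(t.start, t.len) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.len = (size_t)(p - t.start);
        t.kind = TOK_NUMBER;
    } else {
        p++;             /* single-character punctuation such as ( ) : ; , */
        t.len = 1;
        t.kind = TOK_PUNCT;
    }
    *pos = p;
    return t;
}
```

For "MATCH (n) RETURN n;" this yields KEYWORD, PUNCT "(", IDENT "n", PUNCT ")", KEYWORD, IDENT, PUNCT ";", END. A misspelled "MATH" comes out as an IDENT, which gives the parser a concrete place to report an error.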

Hope these help.

answered Jun 30 '23 at 14:21 by Ahmad Tashfeen

I updated my question. I forgot to mention that we implemented the parsers to handle the misspellings. The functions that call the parser are inside the condition for the commands that end with a semicolon. A SQL clause can't enter the Cypher clause parser and vice versa. – Carla Jun 30 '23 at 15:31

score 0 · Answer 4 · answered Aug 16 '23 at 05:26

To differentiate Cypher clauses from SQL ones in your Postgres psql project, try this approach: instead of a direct string comparison, use variable checks within the parser. Pass the user input to the Cypher parser regardless of its type, and treat it as Cypher only if the parser confirms that its grammar rules were satisfied. Assign a boolean variable to each Cypher clause and set it to true on a successful parse; otherwise, it stays false.

score 0 · Answer 5 · answered Aug 23 '23 at 06:04

A more reliable approach than a basic string comparison is required to distinguish between Cypher and SQL clauses, especially since keywords can overlap and the input may contain mistakes.

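A typical strategy is to use a lexer or tokenizer to find the keywords and syntax components in the input query. The context can then be examined to decide whether it is a Cypher or an SQL query: for example, in SQL, CREATE is followed by an object keyword such as TABLE, while in Cypher it is usually followed by a pattern such as (n:Person).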
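A sketch of that context check, assuming a three-way result so that input matching neither language (such as a typo) can be reported as an error instead of guessed. The keyword lists are hypothetical and deliberately incomplete; for instance, PostgreSQL 15 also has a MERGE command, so MERGE is not treated as Cypher-only here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp(), POSIX */

enum query_kind { QUERY_CYPHER, QUERY_SQL, QUERY_UNKNOWN };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Copies the next identifier-like word into buf (truncating if needed)
 * and returns a pointer just past it in the input. */
static const char *read_word(const char *p, char *buf, size_t size)
{
    size_t n = 0;

    while (isspace((unsigned char)*p))
        p++;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 < size)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    return p;
}

static int in_list(const char *word, const char *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strcasecmp(word, list[i]) == 0)
            return 1;
    return 0;
}

static enum query_kind classify_query(const char *input)
{
    /* Hypothetical, incomplete keyword lists for illustration. */
    static const char *const cypher_first[] = {"MATCH", "OPTIONAL", "UNWIND"};
    static const char *const sql_first[] = {"SELECT", "INSERT", "UPDATE", "DELETE"};
    static const char *const sql_objects[] = {"TABLE", "VIEW", "SCHEMA", "FUNCTION", "EXTENSION"};
    char word[32];

    const char *p = read_word(input, word, sizeof word);
    if (in_list(word, cypher_first, COUNT(cypher_first)))
        return QUERY_CYPHER;
    if (in_list(word, sql_first, COUNT(sql_first)))
        return QUERY_SQL;
    if (strcasecmp(word, "CREATE") == 0) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '(')
            return QUERY_CYPHER;            /* CREATE (n:Person) */
        read_word(p, word, sizeof word);
        if (in_list(word, sql_objects, COUNT(sql_objects)))
            return QUERY_SQL;               /* CREATE TABLE ... */
    }
    return QUERY_UNKNOWN;                   /* e.g. MATH: report an error rather than guess */
}
```

This only looks at the first one or two tokens, so it is a heuristic; overlapping forms still need a real grammar, which is why the accepted answer's parser-based check is more robust.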

Hope it is helpful.
