
I have the following data:

dbCon= {
    main = {
        database = "db1",
        hostname = "db1.serv.com",
        maxConnCount = "5",
        port = "3306",
        slaves = [
            {
                charset = "utf8",
                client = "MYSQL",
                compression = "true",
                database = "db1_a",
                hostname = "db1-a.serv.com",
                maxConnCount = "5",
                port = "3306",
            }
            {
                charset = "utf8",
                client = "MYSQL",
                compression = "true",
                database = "db1_b",
                hostname = "db1-b.serv.com",
                maxConnCount = "5",
                port = "3306",
            }
        ]
        username = "user-1"
    }
}

I'm trying to use Grako to convert this into JSON, but I can't get the EBNF format correct. Here's what I have:

import grako
import json

grammar_ebnf = """
    final = @:({ any } | { bracketed } | { braced });
    braced = '{' @:( { bracketed } | { braced } | { any } ) '}' ;
    bracketed = '[' @:( { braced } | { bracketed } | { any } ) ']' ;
    any = /^[^\[\{\]\}\n]+/ ;
"""

model = grako.genmodel("final", grammar_ebnf)
with open('out.txt') as f:
    ast = model.parse(f.read())
    print (json.dumps(ast, indent = 4))

However, this just prints out:

[
    "dbCon = "
]

Where am I going wrong? I've never used Grako. I just want to be able to parse this into something usable/accessible, without designing a static parser in case the format changes. If the format changes later, it seems easier to update the EBNF rather than reworking a whole parser.

Jay Kominek
MrDuk
  • Cross-posted: http://cs.stackexchange.com/questions/43060/how-can-i-represent-this-in-ebnf. – André Souza Lemos May 28 '15 at 21:26
  • Well, your grammar appears to be incorrect ... but I'm guessing you know that already ;) Some things I noticed: nowhere in your grammar is there a token for '=', which is a fundamental part of your input. I would try adding a rule for a key=value pair. I will think about this a bit more tomorrow. – PeterE May 28 '15 at 22:32
  • Do you need to use grako? Or do you just want to convert that to JSON in Python? – paulotorrens May 30 '15 at 20:29
  • Yeah I don't have to use grako - just looking for something a little more easily changed than writing a custom parser. – MrDuk May 30 '15 at 22:54

1 Answer


It's hard to be sure what the real grammar is with just one example, but hopefully this is enough that you'll be able to finish tweaking it to deal with any weirdness.

We need the Semantics class to convert the key/value pairs, and the lists of them, into dictionaries. Careful use of @: takes care of everything else.

As a suggestion, when naming rules in a grammar, name them after what they are (list, dict, etc.), not what they look like (braced, bracketed). Also, split things up into lots of rules to start with; you can always coalesce them later.

#!/usr/bin/python

import grako
import json

grammar = """
final = kvpair;
kvpair = key '=' value;
key = /[^\s=]+/;
value = @:(dict | list | string) [','];
list = '[' @:{ value } ']';
string = '"' @:/[^"]*/ '"';
dict = '{' @:{ kvpair } '}';
"""

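class Semantics(object):
    def kvpair(self, arg):
        # arg is [key, '=', value]; the '=' token is discarded
        key, ignore, value = arg
        return { key: value }
    def dict(self, arg):
        # arg is a list of single-entry dicts from kvpair; merge them
        d = { }
        for v in arg:
            d.update(v)
        return d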

model = grako.genmodel("final", grammar)

with open('out.txt') as f:
    ast = model.parse(f.read(), semantics=Semantics())
    print(json.dumps(ast, indent=4))

This produces output of:

{
    "dbCon": {
        "main": {
            "username": "user-1",
            "maxConnCount": "5",
            "slaves": [
                {
                    "maxConnCount": "5",
                    "hostname": "db1-a.serv.com",
                    "compression": "true",
                    "database": "db1_a",
                    "charset": "utf8",
                    "port": "3306",
                    "client": "MYSQL"
                },
                {
                    "maxConnCount": "5",
                    "hostname": "db1-b.serv.com",
                    "compression": "true",
                    "database": "db1_b",
                    "charset": "utf8",
                    "port": "3306",
                    "client": "MYSQL"
                }
            ],
            "database": "db1",
            "hostname": "db1.serv.com",
            "port": "3306"
        }
    }
}
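
To see what the two semantic actions contribute on their own, they can be called by hand, without Grako, on values shaped like the ones the parser passes in. This is only a sketch: the hand-built lists below are assumed stand-ins for the parser's output (a `[key, '=', value]` list for `kvpair`, and a list of `kvpair` results for `dict`).

```python
import json


class Semantics(object):
    def kvpair(self, arg):
        # arg is [key, '=', value]; drop the '=' and build a one-entry dict
        key, ignore, value = arg
        return {key: value}

    def dict(self, arg):
        # arg is a list of one-entry dicts; merge them into a single dict
        d = {}
        for v in arg:
            d.update(v)
        return d


sem = Semantics()

# Hand-built stand-ins for what the parser would hand to each action
# (assumed shapes, not captured from a real Grako run).
port = sem.kvpair(["port", "=", "3306"])
user = sem.kvpair(["username", "=", "user-1"])
main = sem.dict([port, user])

print(json.dumps(main, indent=4, sort_keys=True))
```

Without the `dict` action, the body of `{ ... }` would stay a list of one-entry dictionaries rather than a single mapping, which is why the class is needed in addition to `@:`.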
Jay Kominek
  • The use of `@+:(` (and then `arg[0]`) in `kvpair` is unnecessary. By default, Grako will return a list of the parsed elements if no decorations are added to the rule. – Apalala May 31 '15 at 15:29
  • I'm not sure what the `@:(` stuff even does - I just saw it in an example; would you (or someone) mind giving me the tl;dr version of when to use it and when not to? – MrDuk May 31 '15 at 20:28
  • The default semantic action associated with every rule is to take all the values from the right hand side of the rule, stuff them in a list, and return that value. `@:` says that the semantic action should instead return the value of whatever the `@:` is attached to. For instance, on the string rule, it indicates that the value returned should just be the stuff between the quotes, instead of a list made up of a quote, the stuff between the quotes, and another quote. (So `'foo'` instead of `['"', 'foo', '"']`.) – Jay Kominek May 31 '15 at 21:32
  • Your answer works great for the example I gave, but not on the "real" file (which I'd condensed for brevity's sake) -- any idea how I can find the mix-up? I'd like to understand how your EBNF works in relation to the real data, rather than just asking for help every time. – MrDuk Jun 01 '15 at 16:05
  • Hmm, I don't see anything in that file that shouldn't parse, and it does parse for me. Old version of grako? Python file mode or unicode weirdness? You might try editing the file down to smaller chunks and see which chunk causes the problem. – Jay Kominek Jun 01 '15 at 16:24
  • As for how to learn how to produce the grammar from example data, I'm not really sure what to suggest, I've done it so many times I just... write it out. 95% of the time I spent on this was just learning how to use grako. I'll think about it, though, and get back to you. – Jay Kominek Jun 01 '15 at 16:26
  • Ah, I found the issue - it was something I'd removed (again). It doesn't handle blank entries, e.g.,: `loggingDir = "",` – MrDuk Jun 01 '15 at 19:35
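
Since Grako turned out not to be a hard requirement, the same rules can also be written as a small hand-rolled recursive-descent parser using only the standard library. The following is a sketch under that assumption, not the approach from the answer: the names `tokenize`, `Parser`, and `parse_config` are made up for illustration, and the key pattern is stricter than the answer's `/[^\s=]+/`. It accepts empty strings such as `loggingDir = ""`.

```python
import json
import re

# One token per match: a quoted string (may be empty), a punctuation
# character, or a bare key.
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|([=\[\]{},])|([^\s=\[\]{},"]+))')


def tokenize(text):
    """Return a list of (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        pos = m.end()
        if m.group(1) is not None:
            tokens.append(("string", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("punct", m.group(2)))
        else:
            tokens.append(("key", m.group(3)))
    return tokens


class Parser(object):
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ValueError("expected %s %r, got %r" % (kind, value, tok))
        self.pos += 1
        return tok[1]

    def kvpair(self):
        # kvpair = key '=' value
        key = self.expect("key")
        self.expect("punct", "=")
        return key, self.value()

    def value(self):
        # value = (dict | list | string) [',']
        tok = self.peek()
        if tok[0] == "string":
            self.pos += 1
            result = tok[1]
        elif tok == ("punct", "{"):
            result = self.parse_dict()
        elif tok == ("punct", "["):
            result = self.parse_list()
        else:
            raise ValueError("unexpected token %r" % (tok,))
        if self.peek() == ("punct", ","):
            self.pos += 1  # the trailing comma is optional
        return result

    def parse_dict(self):
        # dict = '{' { kvpair } '}'
        self.expect("punct", "{")
        result = {}
        while self.peek()[0] == "key":
            key, val = self.kvpair()
            result[key] = val
        self.expect("punct", "}")
        return result

    def parse_list(self):
        # list = '[' { value } ']'
        self.expect("punct", "[")
        items = []
        while self.peek() != ("punct", "]"):
            items.append(self.value())
        self.expect("punct", "]")
        return items


def parse_config(text):
    """Parse a single top-level `key = value` pair into a dict."""
    parser = Parser(text)
    key, value = parser.kvpair()
    if parser.pos != len(parser.tokens):
        raise ValueError("trailing input after top-level value")
    return {key: value}


sample = 'dbCon= { main = { loggingDir = "", port = "3306", } }'
print(json.dumps(parse_config(sample), indent=4, sort_keys=True))
```

The trade-off is the one raised in the question: a format change means editing Python methods instead of a grammar string, although each method here corresponds to one rule of the grammar.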