Let's say I'd like to reinvent CoffeeScript :) Or Python. Or Stylus, or YAML :) I need a tool that will turn my indentation-based syntax into an abstract syntax tree. Unfortunately, Google knows nothing about [indentation-based syntax to AST]. Do you know of any tool like this? To be more specific, here is what I have:

===source===
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore

...and what I need:

===result===
[
    {
        directive: "Lorem ipsum",
        content: [
            {
                directive: "dolor sit amet",
                content: [
                    {directive: "consectetuer adipiscing elit", content: []}
                ]
            },
            {directive: "sed diam nonummy", content: []}
        ]
    },
    {
        directive: "nibh euismod tincidunt",
        content: [
            {directive: "ut laoreet dolore", content: []}
        ]
    }
]

It would be great if you could recommend a tool like this. It would be awesome if the tool were written in Python or JavaScript and output the result as JSON. It would also be cool if you could give me some advice on how to build this dream tool myself :) Thanks!

Grundiss
  • NB. I cannot use YAML parsers for my purposes, as my "language" contains a lot of "forbidden" characters (!@#$%^&*) – Grundiss May 13 '14 at 20:15
  • AFAIK there is no off-the-shelf tool for this (and even if there were, tool requests are off-topic). It's fairly straightforward to create a `flex` scanner which implements Python's INDENT/DEDENT rules (see the Python language reference for the algorithm), particularly if you're actually looking for YAML, which has no exceptions. (For Python, you have to turn the indent/dedent algorithm off inside brackets of any kind.) – rici May 13 '14 at 20:18
  • This is sometimes called "offside rule". You may want to google that – salezica May 13 '14 at 20:47
  • possible duplicate of [How would you go about implementing off-side rule?](http://stackoverflow.com/questions/232682/how-would-you-go-about-implementing-off-side-rule) – Robᵩ May 13 '14 at 21:15
  • I spent a lot of time on this topic, see here http://script.cirru.org/ (some examples are outdated), and here https://github.com/Cirru/cirru-script – jiyinyiyong Dec 16 '15 at 05:16
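
To make the INDENT/DEDENT idea from the comments concrete, here is a minimal sketch in Python rather than `flex`. It is a simplified take on the stack-based rule described in the Python language reference, not a full lexer: the function name `indent_tokens` and the `(kind, text)` token tuples are made up for illustration, and it assumes indentation uses spaces only and that blank lines can be skipped.

```python
def indent_tokens(lines):
    """Yield (kind, text) tokens: kind is 'INDENT', 'DEDENT' or 'LINE'."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # assumption: blank lines carry no structure
        width = len(raw) - len(raw.lstrip(' '))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ('INDENT', None)
        else:
            # Close blocks until we are back at a known indentation width.
            while width < stack[-1]:
                stack.pop()
                yield ('DEDENT', None)
            if width != stack[-1]:
                raise ValueError("dedent does not match any outer level: %r" % raw)
        yield ('LINE', text)
    # Close every block that is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ('DEDENT', None)
```

A parser can then treat INDENT and DEDENT like opening and closing braces, which keeps the grammar itself free of whitespace handling.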

1 Answer


It's simple enough to write this yourself using recursion. Here is one that creates a list -- I'll leave the dict version as an exercise for you.

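import json
import re
import sys


def DentArthurDent(fp, dents=0, nextline=None):
    '''Read from fp until EOF or a dedent.
       Return the list for this level and the next unread line.'''

    tree = []
    while True:
        # Use the line handed over by the caller first, then read new ones.
        line, nextline = nextline or fp.readline(), None
        if not line:
            return tree, ''
        # Split into leading spaces and the rest of the line.
        parts = re.match(r'(^ *)(.*)', line).group(1, 2)
        dent = len(parts[0])
        if dent == dents:
            tree.append(parts[1])
        elif dent > dents:
            # Deeper indentation: parse a nested block, starting with this line.
            child_tree, nextline = DentArthurDent(fp, dent, line)
            tree.append(child_tree)
        else:
            # Shallower indentation: hand the line back to the caller.
            return tree, line


tree, _ = DentArthurDent(sys.stdin)
print(json.dumps(tree, indent=4))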

This input:

line 1
line 2
  line 3
    line 4
    line 5
  line 6

yields this output:

[
    "line 1", 
    "line 2", 
    [
        "line 3", 
        [
            "line 4", 
            "line 5"
        ], 
        "line 6"
    ]
]
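
For the dict version left as an exercise above, one possible sketch follows. It is not the answer's own code: it swaps the recursion for an explicit stack, and the name `parse_indented` is made up for illustration. It assumes indentation uses spaces only, that blank lines can be ignored, and that a trailing colon is not part of the directive (the question's expected result drops it).

```python
import json


def parse_indented(text):
    """Turn indentation-structured text into nested directive/content dicts."""
    root = []
    # Stack of (indent, children) pairs; the -1 sentinel keeps root on the stack.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        indent = len(raw) - len(raw.lstrip(' '))
        # Assumption: a trailing ':' only marks a block and is dropped.
        node = {"directive": raw.strip().rstrip(':').rstrip(), "content": []}
        # Leave every block that is at the same depth as this line or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node["content"]))
    return root


SAMPLE = """\
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore
"""

print(json.dumps(parse_indented(SAMPLE), indent=4))
```

Run on the sample from the question, this should print the same nesting as the requested result, as JSON with quoted keys.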
Robᵩ