
I am researching how to implement a DSL in Python, and I am looking for a small DSL that is approachable for someone with no experience in designing and implementing languages. So far I have reviewed two implementations, Hy and Mochi. Hy is a dialect of Lisp, and Mochi seems to be very similar to Elixir. Both are too complex for me right now, as my aim is to prototype the language and play around with it in order to find out whether it really helps in solving the problem and fits the style the problem requires. I am aware that Python has good support for this through the language tools in its standard library. So far I have implemented a very simple dialect of Lisp; I did not use the Python AST at all, and it is implemented purely via string processing, which is not flexible enough for what I am looking for.

Are there any implementations other than the two languages mentioned above that are small enough to be studied?

What are some good books on this subject (practical ones, not limited to the theoretical and academic aspects)?

What would be a good way to start studying the Python AST and using it?

Are there any significant performance issues with languages built on top of Python (like Hy), in terms of overhead in the bytecode they produce?

Thanks

fferri
mitghi
  • Are you interested in a language that can be embedded into Python code (i.e. it is recognized by the Python parser), or do you want to design a completely new syntax independent of the Python parser? – fferri Jun 26 '15 at 19:34
  • I want to design a new syntax. – mitghi Jun 26 '15 at 19:36
  • Well, then there is nothing specific to Python. Grab a parser (LL(k), LR(k), Yacc, whatever), write your grammar, and write the operational/transition semantics for it. – fferri Jun 26 '15 at 19:38
  • I added some details to my answer in case you want to skip the syntax part and concentrate on the operational semantics part... – fferri Jun 26 '15 at 20:13

2 Answers


You can split the task of creating (yet another!) new language into at least two big steps:

  • Syntax
  • Semantics & Interpretation

Syntax

You need to define a grammar for your language, with production rules that specify how to create complex expressions from simple ones.

Example: syntax for LISP:

expression ::= atom   | list
atom       ::= number | symbol    
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'

How to read it: an expression is either an atom or a list; an atom is a number or a symbol; a number is... and so on.

Often you will also define some tokenization rules, because most grammars work at the token level rather than at the character level.
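A minimal sketch of such tokenization rules for the LISP grammar above, using Python's `re` module. The token names (`LPAREN`, `NUMBER`, ...) and the `tokenize` function are illustrative choices, not part of any library. One assumption differs from the grammar as written: a symbol is taken to be a letter followed by any characters other than whitespace and parentheses, since a literal `.*` would swallow the rest of the input.

```python
import re

# One named alternative per token kind; the names are illustrative only.
TOKEN_RE = re.compile(r"""
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NUMBER>[+-]?[0-9]+)
  | (?P<SYMBOL>[A-Za-z][^\s()]*)
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs; raise SyntaxError on bad input."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace only separates tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("(add 1 x)"))
```

The parser then only has to deal with a flat list of tokens instead of individual characters.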

Once you have defined your grammar, you want a parser that, given a sentence (a program), is able to build the derivation tree or the abstract syntax tree.

For example, for the expression x=f(y+1)+2, you want to obtain the tree:

[Figure: abstract syntax tree for x=f(y+1)+2]
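As an illustration, Python's own parser can produce such a tree for this expression, since it happens to be valid Python. The sketch below uses the standard `ast` module; the `outline` helper is a hypothetical name introduced here to print one node type per line. Note that the tree Python builds also contains context and operator nodes (`Store`, `Load`, `Add`), so it is more detailed than a hand-drawn tree would be.

```python
import ast

tree = ast.parse("x = f(y + 1) + 2")
assign = tree.body[0]  # the single assignment statement


def outline(node, depth=0):
    """Return the node type names of a tree, indented by nesting depth."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(tree)))
print(ast.dump(assign))  # full detail, including field names and values
```

Exploring small snippets this way is also a reasonable first step for studying the Python AST.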

There are several parsing techniques (LL, LR, recursive descent, ...). You don't necessarily need to write your language's parser yourself, as there are tools that generate the parser from the grammar specification (LEX & YACC, Flex & Bison, JavaCC, ANTLR; also check this list of parsers available for Python).

If you want to skip the step of designing a new grammar, you may want to start from a simple one, like the grammar of LISP. There is even a LISP parser written in Python in the Pyperplan project. They use it for parsing PDDL, which is a domain specific language for planning that is based on LISP.
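For a grammar this small, a hand-written recursive descent parser is also an option. The following is a minimal sketch, not the Pyperplan parser: the function names are made up for illustration, tokens are obtained by simply padding parentheses with spaces and splitting on whitespace, and symbols are not checked against the grammar's "starts with a letter" rule. Lists become Python lists, numbers become `int`, and symbols stay as strings.

```python
def split_tokens(text):
    """Crude tokenizer: isolate parentheses, then split on whitespace."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(text):
    """Parse one expression and return it as nested Python lists."""
    tokens = split_tokens(text)
    expr, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return expr


def parse_expression(tokens, pos):
    """expression ::= atom | list. Returns (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        # list ::= '(' expression* ')'
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = parse_expression(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1  # skip the closing parenthesis
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return parse_atom(tok), pos + 1


def parse_atom(tok):
    """atom ::= number | symbol"""
    try:
        return int(tok)
    except ValueError:
        return tok


print(parse("(add 1 (mul x -2))"))
```

Each grammar rule maps to one function, which is what makes this style easy to follow and extend while prototyping.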

Useful readings:

Semantics & Interpretation

Once you have the abstract syntax tree of your program, you want to execute your program. There are several formalisms for specifying the "rules" to execute (pieces of) programs:

  • Operational semantics: a very popular one. It comes in two styles:
    • Small Step Semantics: describe individual steps of computation
    • Big Step Semantics: describe the overall results of computation
  • Reduction semantics: a formalism based on lambda calculus
  • Transition semantics: if you view your interpreter as a transition system, you can specify its semantics using transition semantics. This is especially useful for programs that do not terminate (i.e. run continuously), like controllers.
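To make the big-step style concrete, here is a minimal sketch of an evaluator for a tiny LISP-like language whose programs are already parsed into nested Python lists. The forms `if` and `let`, the builtin names `add`, `sub`, `mul`, and the `evaluate` function itself are assumptions made for this example only. Each call to `evaluate` returns the final value of a whole expression, which is what "describing the overall result" amounts to in code.

```python
import operator

# Illustrative builtins; a real DSL would choose its own set.
BUILTINS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(expr, env):
    """Big-step evaluation: map an expression and an environment to a value."""
    if isinstance(expr, int):   # numbers evaluate to themselves
        return expr
    if isinstance(expr, str):   # symbols are looked up in the environment
        return env[expr]
    head, *args = expr
    if head == "if":            # (if cond then else)
        cond, then, other = args
        return evaluate(then if evaluate(cond, env) else other, env)
    if head == "let":           # (let name value body)
        name, value, body = args
        return evaluate(body, {**env, name: evaluate(value, env)})
    func = evaluate(head, env)  # anything else is a function application
    return func(*[evaluate(arg, env) for arg in args])


env = {**BUILTINS, "x": 3}
print(evaluate(["add", 1, ["mul", "x", -2]], env))
```

A small-step version would instead rewrite the expression one reduction at a time and loop until only a value is left.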

Useful readings:

fferri
  • Thank you very much, this is very helpful. I have the book Structure and Interpretation of Computer Programs but put it down after one chapter; now there is a good reason to finish it. I think re-implementing a Lisp interpreter with the tools you mentioned is a good start. I used my implementation to query my machines and to express the logic by which the data would be transformed. My goal is to target the Python VM, yet it is still unclear to me how the AST would be transformed into bytecode without a runtime cost. Right now I am recursively mapping the tree onto functions inside the Python global frame. – mitghi Jun 26 '15 at 20:37
  • Put better, I should be able to compile it down to Python code somehow so that there would be no overhead. What would be your advice about this? – mitghi Jun 26 '15 at 20:38
  • You can use Python's `ast` module. Transform your DSL AST into a Python `ast` tree, and `compile()` it. – fferri Jun 26 '15 at 20:40
  • I added "A Structural Approach to Operational Semantics" with a PDF link, and I strongly recommend reading it before any other. – fferri Jun 26 '15 at 22:30
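The suggestion in the comments above (build a Python `ast` tree from the DSL tree and `compile()` it) can be sketched roughly as follows. This is a minimal example under several assumptions: the DSL tree is a nested list of binary operations, the operator names `add`, `sub`, `mul` and the helper names `to_python_ast` and `compile_expr` are invented for illustration, and `ast.Constant` requires Python 3.8 or later.

```python
import ast

# Illustrative mapping from DSL operator names to Python AST operator classes.
BINOPS = {"add": ast.Add, "sub": ast.Sub, "mul": ast.Mult}


def to_python_ast(expr):
    """Translate a nested-list DSL expression into a Python ast expression node."""
    if isinstance(expr, int):
        return ast.Constant(value=expr)
    if isinstance(expr, str):
        return ast.Name(id=expr, ctx=ast.Load())
    op, left, right = expr
    return ast.BinOp(left=to_python_ast(left), op=BINOPS[op](), right=to_python_ast(right))


def compile_expr(expr):
    """Compile a DSL expression to a code object that eval() can run."""
    tree = ast.Expression(body=to_python_ast(expr))
    ast.fix_missing_locations(tree)  # compile() needs line/column info on every node
    return compile(tree, filename="<dsl>", mode="eval")


code = compile_expr(["add", 1, ["mul", "x", -2]])
print(eval(code, {"x": 3}))
```

Once compiled, the code object is ordinary Python bytecode, so the translation cost is paid once rather than on every evaluation.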

You don't really need to know a lot about parsing to write your own language.

I wrote a library that lets you do just that very easily: https://github.com/erezsh/lark

Here's a blog post by me explaining how to use it to write your own language: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/

I hope you don't mind my shameless plug, but it seems very relevant to your question.

Erez
  • Hello, @Erez! Thanks for your library, but are there any docs on creating conditions? – Petr Petrov Mar 26 '18 at 13:37
  • There is a lot of documentation. There is also a full syntax for Python in the examples directory. Can you be more specific? – Erez Mar 27 '18 at 16:36
  • Can you explain what `?` and `!` mean before a declaration, e.g. `?action_operator: ACTION_OPERATOR` and `ACTION_OPERATOR : "<"|">"|"="|">="|"<="|"!="`? And can you tell me how to fix the error `FileNotFoundError: [Errno 2] "dot" not found in path.` when using `pydot__tree_to_png(tree, "tree_1.png")`? @Erez – Petr Petrov Mar 28 '18 at 09:12
  • I have opened an issue, https://github.com/erezsh/lark/issues/118 , and I'd be grateful if you could answer it @Erez – Petr Petrov Mar 28 '18 at 13:13