30

How do you define a context-free grammar for a new imperative programming language that you want to design from scratch?

In other words: how do you proceed when you want to create a new programming language from scratch?

Book Of Zeus
Ayoub M.
  • define for programming? for define for reference? – Baget Feb 23 '10 at 17:46
  • Perhaps if you gave us more information. You mention compiler as a tag. Are you writing a compiler for a new programming language? – tzenes Feb 23 '10 at 17:49

5 Answers

35

One step at a time.

No, seriously: start with expressions and operators, work upwards to statements, then to functions, classes and so on. Keep a list of what each piece of punctuation is used for.

In parallel, define the syntax for referring to variables, arrays, hashes, number literals, string literals and any other built-in literals. Also in parallel, define your data naming model and scoping rules.

To check whether your grammar makes sense, focus on one level at a time (literal/variable, operator, expression, statement, function etc.) and make sure that punctuation and tokens from other levels, whether interspersed, appended or prepended, are not going to cause an ambiguity.
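One low-tech way to keep that punctuation list honest is to store it as data and flag every token that plays more than one role. This is only a sketch: the inventory below is made up for illustration, and `overloaded_tokens` is a hypothetical helper, not part of any parser tool. A flagged token is not necessarily a problem, just a place to look for ambiguity.

```python
from collections import defaultdict

# Hypothetical inventory of (token, level, role) for an imaginary language.
USES = [
    ("(", "expression", "grouping"),
    ("(", "expression", "function call"),
    ("{", "statement", "block start"),
    ("{", "literal", "hash literal start"),
    ("<", "operator", "less than"),
    (";", "statement", "terminator"),
]


def overloaded_tokens(uses):
    """Return the tokens that appear with more than one role."""
    roles = defaultdict(list)
    for token, level, role in uses:
        roles[token].append((level, role))
    return {token: r for token, r in roles.items() if len(r) > 1}


for token, roles in overloaded_tokens(USES).items():
    print(token, roles)
```

Here `{` is flagged because it starts both a block and a hash literal, which is exactly the kind of overlap that tends to show up later as a parser conflict.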

Finally write it all out in EBNF and run it through ANTLR or similar.
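To make the bottom-up order concrete (expressions first, then statements), here is a minimal hand-written sketch in Python. The toy grammar in the comment, the token kinds and the names `tokenize` and `Parser` are all assumptions made up for this example; a generator such as ANTLR would build the parser from the grammar for you.

```python
import re

# Toy grammar in EBNF (hypothetical, for illustration only):
#   statement  = IDENT "=" expression ";" ;
#   expression = term { ("+" | "-") term } ;
#   term       = factor { ("*" | "/") factor } ;
#   factor     = NUMBER | IDENT | "(" expression ")" ;

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Split source text into (kind, value) pairs."""
    tokens = []
    for number, ident, punct in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif ident:
            tokens.append(("IDENT", ident))
        elif punct.strip():
            tokens.append(("PUNCT", punct))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok}")
        self.pos += 1
        return tok

    def statement(self):
        name = self.expect("IDENT")[1]
        self.expect("PUNCT", "=")
        value = self.expression()
        self.expect("PUNCT", ";")
        return ("assign", name, value)

    def expression(self):
        node = self.term()
        while self.peek() in (("PUNCT", "+"), ("PUNCT", "-")):
            op = self.expect("PUNCT")[1]
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("PUNCT", "*"), ("PUNCT", "/")):
            op = self.expect("PUNCT")[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.peek()
        if kind in ("NUMBER", "IDENT"):
            self.pos += 1
            return value
        self.expect("PUNCT", "(")
        node = self.expression()
        self.expect("PUNCT", ")")
        return node


tree = Parser(tokenize("x = 1 + 2 * (3 - y);")).statement()
print(tree)  # ('assign', 'x', ('+', 1, ('*', 2, ('-', 3, 'y'))))
```

Splitting `expression`, `term` and `factor` into separate rules is what gives `*` a higher precedence than `+` without any extra machinery.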

Also, it's best not to reinvent the wheel. I normally start off by choosing the sequences that start and end statement blocks and functions, and the mathematical operators, from a style that is fundamentally C-like, ECMAScript-like, Basic-like, command-list based or XML-based. This helps a lot because it is what people are used to working with.

Of course you have to come up with a pretty compelling reason not to abandon writing a new language and just stick with C, ECMAScript, or Basic which are well tested and much used.

I've often started defining a new language only to find that someone else has already implemented the feature in some existing language.

If your goal is speed of development for a specific project, you might be better off prototyping in something like Python, Lua or SpiderMonkey, which get you up and running quickly and need less typing than most compiled languages.

martinr
  • Can you explain about 3rd step(`make sure that punctuation and tokens from other levels interspersed or appended/prepended is not gonna cause an ambiguity.`) how to achieve this ? Individual constructs works fine but when they are combined together then it leads to conflict . I am using `YACC/BISON` as compiler construction tool. – sonus21 Mar 28 '15 at 13:08
11

You'll want to have a look at EBNF (Extended Backus-Naur Form).

(Assuming you want to write a context free grammar, that is.)

Chris Tonkinson
  • EBNF is only useful for expressing a CFG, not in actually designing it. – David Kanarek Feb 23 '10 at 18:07
  • 4
    OP wants to define a grammar; a question asking how to implement it would surely involve a healthy dose of answers containing lex/yacc (or flex/bison) - in which case, yacc/bison syntax is just a stone's throw from EBNF. Additionally, implementing a language is not the same (on an academic or a practical level) as implementing, say, a linked list. There needs to be a strong theoretical foundation, or all those `shift/reduce` conflicts are going to be baffling. And EBNF is a great place to get your feet wet, IMHO. – Chris Tonkinson Feb 23 '10 at 18:17
3

If you mean defining a grammar, you would be best served by starting with an existing language and modifying its grammar to match what you are after. Creating a grammar specification is a fairly mechanical exercise, using a set of patterns in your own head. For instance, what does an if statement look like? Does it look like C

if <- if(exp) block

if <- if(exp) block else block2

or like ML?

if <- if exp then block else block end

or maybe you want to use elseifs like Lua:

if <- if exp then block end

if <- if exp then block (elseif exp then block)* else block end

The grammar and semantics codify these decisions. Note that none of these are quite suitable for implementation in a LALR or LL(*) compiler generator yet, and would have to be massaged for implementation because they are ambiguous.
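The ambiguity in the C-like rules is the classic "dangling else". As a rough sketch, the Python below enumerates every parse of a nested if under a stripped-down version of those two rules. The token spelling (`e` for any expression, `s` for any simple statement, no parentheses) and the function name `parse_stmt` are simplifications assumed for this example.

```python
# Simplified grammar (assumed for illustration):
#   stmt <- "if" "e" stmt
#   stmt <- "if" "e" stmt "else" stmt
#   stmt <- "s"


def parse_stmt(tokens, i=0):
    """Return every (tree, next_index) way to read one statement at index i."""
    results = []
    if tokens[i:i + 1] == ["s"]:
        results.append(("s", i + 1))
    if tokens[i:i + 2] == ["if", "e"]:
        for then_tree, j in parse_stmt(tokens, i + 2):
            # Rule 1: if without else.
            results.append((("if", then_tree), j))
            # Rule 2: if with else.
            if tokens[j:j + 1] == ["else"]:
                for else_tree, k in parse_stmt(tokens, j + 1):
                    results.append((("if", then_tree, else_tree), k))
    return results


tokens = "if e if e s else s".split()
# Keep only the parses that consume the whole input.
trees = [tree for tree, end in parse_stmt(tokens) if end == len(tokens)]
for tree in trees:
    print(tree)
# ('if', ('if', 's'), 's')       else attached to the outer if
# ('if', ('if', 's', 's'))       else attached to the inner if
```

Two complete parse trees for one input is what an LALR tool reports as a shift/reduce conflict. The usual fixes are to attach the else to the nearest if by convention, or to require a closing keyword such as `end`.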

Programming Language Pragmatics by Michael Scott is a good introduction to the design of programming languages. It's available on Amazon.

Joel
1

Have a look at Bison, maybe that's what you are looking for?

tanascius
0

You'll need to know quite a lot about programming languages before you start designing one. I recommend Programming Languages: Application and Interpretation by Shriram Krishnamurthi.

David Kanarek