Here is a toy macroexpander written in Racket which deals with CL-style macros. I have used Racket macros and other facilities in the process of writing this so it is not bootstrapped in terms of itself. Obviously such a thing can be done, but it would be significantly more hairy to do so.
The purpose of this is solely to demonstrate how a simple macroexpander might work: it is not in any sense meant to be something suitable for real use.
Special forms
First of all we need to deal with special forms. Special forms are things which have magic semantics. This expander has a very simple notion of how they work:
- a special form is a compound form whose first element is a special operator;
- each element of the rest of the form is either some kind of special thing which is not expanded, or it's expanded normally, which is done by saying
expr
in the definition;
- the way this is done is by a fairly mindless pattern matcher, which is probably adequate only because there are a minute number of special forms the expander knows about.
So here is how special forms are defined, and the definition of three of them:
(define special-patterns (make-hasheqv))
(define (special-pattern? op)
(and (symbol? op)
(hash-has-key? special-patterns op)))
(define (special-pattern op)
(hash-ref special-patterns op))
(define-syntax-rule (define-special-pattern (op spec ...))
(hash-set! special-patterns 'op '(op spec ...)))
(define-special-pattern (quote thing))
(define-special-pattern (lambda args expr ...))
(define-special-pattern (define thing expr ...))
(define-special-pattern (set! thing expr))
Now we can ask if something is a special form (special pattern in the code) and retrieve its pattern:
> (special-pattern? 'lambda)
#t
> (special-pattern 'lambda)
'(lambda args expr ...)
Note that things like if
are not special operators to the macro expander, even though they are in fact special: in a form like (if test then else)
all of the subforms should be expanded, so there's no reason for the macroexpander to know about them. It's only things like lambda
where some subforms should not be expanded that the macroexpander needs to know about.
Macro definitions
Macros are compound forms whose first element is recognised as naming a macro. For each such macro there is a macro expander function which is going to be responsible for expanding the form: that function is passed the whole form. There is a bit of syntax, which is define-macro
, which wraps this function in a similar way that defmacro
does in CL (but there is no &whole
support, or support for arglist destructuring or any of that).
(define macros (make-hasheqv))
(define (macro? op)
(and (symbol? op)
(hash-has-key? macros op)))
(define (macro op)
(hash-ref macros op))
(define-syntax-rule (define-macro (m arg ... . tail) form ...)
(hash-set! macros 'm (lambda (whole)
(apply (lambda (arg ... . tail) form ...)
(rest whole)))))
With this we can define a simple macro: here are four definitions for let
.
First of all here is the most rudimentary one: this doesn't even use define-macro
but is what it turns into: the outer function gets the whole form, and then calls the inner one on the bit of it which is not the macro name. The inner function then laboriously turns (let ((x y) ...) ...)
into ((lambda (x ...) ...) y ...)
, which is the right expansion for let
. (Note that none of this deals with CL's (let (x) ...)
).
(hash-set! macros 'let
;; this is what define-macro turns into
(lambda (whole)
(apply (lambda (bindings . body)
(cons (cons 'lambda
(cons (map first bindings) body))
(map second bindings)))
(rest whole))))
Now here is that but using define-macro
to reduce the pain:
(define-macro (let bindings . body)
;; Really primitive version
(cons (cons 'lambda (cons (map first bindings) body))
(map second bindings)))
And another version using list*
to make things a little less horrible:
(define-macro (let bindings . body)
;; without backquote, but usung list* to make it a bit
;; less painful
(list* (list* 'lambda (map first bindings) body)
(map second bindings)))
And finally a version using backquote (aka quasiquote).
(define-macro (let bindings . body)
;; with backquote
`((lambda ,(map first bindings) ,@body)
,@(map second bindings)))
Here is a version of a macro definition for prog1
which is broken due to hygiene failure:
(define-macro (prog1 form . forms)
;; Broken
`(let ([r ,form])
,@forms
r))
And here is how you need to write it to be more hygienic (although it is still unhygienic by Scheme's somewhat extreme standards):
(define-macro (prog1 form . forms)
;; Working
(let ([rn (string->uninterned-symbol "r")])
`(let ([,rn ,form])
,@forms
,rn)))
Note that this macro turns into another macro: it expands to let
: the expander needs to deal with this (and it does).
The macro expander
The macro expander consists of two functions: expand-macros
is the thing that actually does the expansion, and it dispatches to expand-special
for special forms.
Here is expand-macros
:
(define (expand-macros form)
;; expanding a form
(if (cons? form)
;; only compound forms are even considered
(let ([op (first form)])
(cond [(macro? op)
;; it's a macro: call the macro function & recurse on the result
(expand-macros ((macro op) form))]
[(special-pattern? op)
;; it's special: use the special expander
(expand-special form)]
[else
;; just expand every element.
(map expand-macros form)]))
form))
Notes about this:
- only compound forms can be macro forms;
- this is a lisp-1, so the cars of compound forms are evaluated completely normally and can be macro forms:
((let (...) ...) ...)
is fine;
- macros are expanded recursively until there is nothing left to do.
Here is expand-special
: this is much fiddlier than expand-macro
and probably buggy: what it's trying to do is match the definition of a special form against the form it's been given.
(define (expand-special form)
;; expand a special thing based on a pattern.
(match-let* ([(cons op body) form]
[(cons pop pbody) (special-pattern op)])
(unless (eqv? op pop)
(error 'expand-special "~s is not ~s" pop op))
(let pattern-loop ([accum (list op)]
[tail body]
[ptail pbody]
[context 'expr])
(cond [(null? tail)
(unless (or (null? ptail)
(eqv? (first ptail) '...))
(error 'expand-special "~s is not enough forms for ~s"
body op))
(reverse accum)]
[(null? ptail)
(error 'expand-special "~s is too many forms for ~s"
body op)]
[else
(match-let* ([(cons btf btr) tail]
[(cons ptf ptr) ptail]
[ellipsis? (eqv? ptf '...)]
[ctx (if ellipsis? context ptf)]
[ptt (if ellipsis? ptail ptr)])
(pattern-loop (cons (if (eqv? ctx 'expr)
(expand-macros btf)
btf)
accum)
btr ptt ctx))]))))
The fiddly bit here is the handling of ellipsis (...
) which are used in the matcher to indicate 'more stuff here': I can't remember whether it can deal with ellipsis which are not the last thing in a pattern but I suspect strongly not. Note that although the underlying macro system also uses ellipsis these are unrelated: this is just relying on the fact that ...
is a legal symbol name.
Note also that this recurses back into expand-macros
where needed, of course.
Given these definitions we can now expand some macros:
> (expand-macros '(let ((x y)) x))
'((lambda (x) x) y)
> (expand-macros '(prog1 a b))
'((lambda (r) b r) a)
Note that Racket's printer doesn't print uninterned specially but the r
above is uninterned.
With a simple tracing utility you can define a traced version of the macroexpander:
> (expand-macros '(let ([x 1]) (prog1 x (display "1"))))
[expand-macros (let ((x 1)) (prog1 x (display "1")))
[expand-macros ((lambda (x) (prog1 x (display "1"))) 1)
[expand-macros (lambda (x) (prog1 x (display "1")))
[expand-special (lambda (x) (prog1 x (display "1")))
[expand-macros (prog1 x (display "1"))
[expand-macros (let ((r x)) (display "1") r)
[expand-macros ((lambda (r) (display "1") r) x)
[expand-macros (lambda (r) (display "1") r)
[expand-special (lambda (r) (display "1") r)
[expand-macros (display "1")
[expand-macros display
-> display]
[expand-macros "1"
-> "1"]
-> (display "1")]
[expand-macros r
-> r]
-> (lambda (r) (display "1") r)]
-> (lambda (r) (display "1") r)]
[expand-macros x
-> x]
-> ((lambda (r) (display "1") r) x)]
-> ((lambda (r) (display "1") r) x)]
-> ((lambda (r) (display "1") r) x)]
-> (lambda (x) ((lambda (r) (display "1") r) x))]
-> (lambda (x) ((lambda (r) (display "1") r) x))]
[expand-macros 1
-> 1]
-> ((lambda (x) ((lambda (r) (display "1") r) x)) 1)]
-> ((lambda (x) ((lambda (r) (display "1") r) x)) 1)]
'((lambda (x) ((lambda (r) (display "1") r) x)) 1)
A version of this code is available here.