Go to the first, previous, next, last section, table of contents.

Defining New Special Forms

Sometimes we want to write stereotyped code, not just stereotyped data structures. As with data, we sometimes want part of our stereotyped piece of code to vary. We can do this with syntax extensions, also known as macros.

(If you're familiar with macros from C, don't scoff. Macros in C are stunningly lame and hard to use compared to Lisp or Scheme macros. Read on to find out what you've been missing. If you're familiar with Lisp macros, but have never done advanced programming with them, you probably don't realize how powerful they are--Lisp macros are so error-prone that people often avoid them. Scheme macros are very powerful, but automate away some of the tricky parts.)

Macros are syntax extensions to a programming language, expressed as a translation of expressions. By writing a macro, what you're really doing is extending the functionality of the compiler or interpreter--you're telling it how to compile (or interpret) a new construct, by telling it how to rewrite that construct into something it already knows how to compile or interpret.

(Conceptually, defining a macro is extending the compiler--you're telling the parser how to recognize a new construct, to change the grammar of the language, and also specifying how to generate code for the new construct. This is something you can't do in most languages, but it's easy in Scheme.)

Scheme recognizes macro definitions, and then uses them to recognize and translate the new constructs into other constructs. The interpreter or compiler's process of translating a level constructs is often called "macro expansion," despite the fact that the resulting expression may not be bigger than the original expression. Macroexpansion can be recursive, because macros can use macros, and a macro can even use itself, like a recursive procedure.

Syntax extension is powerful, and hence somewhat dangerous when used too casually. Be aware that when you write a macro, you can change the syntax of your programming language, and that can be a bad thing--you and others may no longer be able to easily understand what the program does. Used judiciously, however, such syntactic extensions are often just what you need to simplify your programs. They are especially useful for writing programs that write programs, so that you can avoid a lot of tedious repetitive coding.

Macros are so useful that they're usually used in the implementation of Scheme itself. Most Scheme compilers actually understand only a few special forms, and the rest are written as macros.

In a later chapter, I'll describe some advanced uses of macros, which let your "roll your own" language with powerful new features.

Macros vs. Procedures

Why do we want macros? In Scheme, the main code abstraction mechanism is procedural abstraction, e.g. using define or lambda to write procedures that do stereotyped things. In a sense, we "specialize" procedures by passing them argument values--a procedure can do different things depending on the values it's given to work with. We can also "specialize" procedures by creating closures in different environments. Isn't this enough?

Not in general. While procedural abstraction is very powerful, there are times when we may want to write stereotyped routines that can't be written as procedures.

Suppose, for example, you have a Scheme system which gives you things like let and if, but not or. (Real Schemes all provide or, but pretend they don't. It makes a nice, simple example.)

You want an or construct (rather like the one actually built into Scheme). This or can take two arguments; it evaluates the first one and returns the result if it's a true value, otherwise it evaluates the second one and returns that result.

Notice that you can't write or as a procedure. If or were a procedure, both of its arguments would always be evaluated before the actual procedure call. Since or is only supposed to evaluate its second argument if the first one returns #f, it just wouldn't work.

If Scheme didn't have or, you could fake it at any given point in your program, by writing an equivalent let expression with an if statement in it.

For example, suppose you wanted to write the equivalent of (or (foo?) (bar?)).

As a first try, you might do this:

(if (foo?)
    (foo?)
    (bar?))

That is, test (foo?), and return its value if it's a true value. That's not really quite right though, because this if statement evaluates foo? twice: once to test it, and once to return it.

We really only want to evaluate it once--if (foo?) is an expression with side effects, evaluating it twice could make the program incorrect as well as inefficient.

To avoid this, we can always evaluate the first expression just once to get the value, and store that value in a temporary variable so that we can return it without evaluating it again:

You could instead write

(let ((temp (foo?)))
   (if temp
       temp
       (bar?)))

This let expression gives the same effect as (or (foo?) (bar?)), because it evaluates foo exactly once, and then tests the value; if the value is true, it returns that value. (The use of a let variable to stash the value allows us to test it without evaluating (foo?) again.) If the value is #f, it evaluates (bar?) and returns the result.

This is the transformation we'd like to be able to automate by defining or as a macro.

Here's a simple version of or written as a macro. I've called it or2 to distinguish it from Scheme's normal or.

(define-syntax or2
   (syntax-rules ()
      ((or2 a b)        ; pattern
       (let ((temp a)) ; template
          (if temp
              temp
              b)))))

What we're saying to Scheme is that we want to define the syntax of or by giving syntax rules for recognizing it and translating it. For this simple version of or, we only need one rule, which says to translate (or a b) into the desired let expression.

(or a b) is called the rule's pattern, which specifies what counts as a call to the or macro, and the let expression is the rule's template, which specifies the equivalent code that Scheme should translate calls into. The variables a and b are called pattern variables. They stand for the actual expressions passed as arguments to the macro. They are "matched" to the actual expressions when the pattern is recognized, and when the template is interpreted or compiled, the actual expressions are used where the pattern variables occur.

You can think of this in two steps. When a macro is used,

  1. the template is copied, except that the pattern variables are replaced with the macro's argument expressions,
  2. the result is interpreted (or compiled) in place of the call expression.

(It's really not quite this simple, but that's the basic idea.)

In some ways, macro arguments are a lot like procedure arguments, but in other ways they're very different. The pattern variables are not bound at run time, and don't refer to storage locations. They're only used in translating a macro call into the equivalent expression.

Always remember that arguments to a macro are expressions used in transforming the code, and then the code is executed. (For example, the output of the or macro doesn't contain a variable named a; a is just a shorthand for whatever expression is passed as an argument to the macro. In the example use (or (foo?) (bar?)), the expression (foo?) is what gets evaluated at the point where a is used in the macro body.)

This is why our macro has to use a temporary variable, like the hand-written equivalent of or. If we tried to write the macro like a procedure, without using a temporary variable, like this

(define-syntax or
   (syntax-rules ()
      ((or a b)
       (if a
           a
           b))))

then (or (foo?) (bar?)) would be translated into

(if (foo?)
    (foo?)
    (bar?))

As with the buggy handwritten version, (foo?) would be evaluated twice when this expression was evaluated.

(This is the most common mistake in writing macros--forgetting that while macros give you the ability to control when argument expressions are evaluated, they also require you to control it. It's safe to use a procedure argument multiple times, because that's just referring to a value in a run-time binding. Using a macro argument causes evaluation of the entire argument expression at that point.)

We can make a better or by using more rules. We might want or to work with any number of arguments, so that

  1. or of zero arguments returns #f, because it has zero true arguments,
  2. or of one argument is equivalent to that argument--it's true if and only if that argument is true.
  3. or of two or more arguments evaluates its first argument, and returns its value if it's true. Otherwise, it computes the or of the rest of its arguments, and returns its result.

Here's the Scheme definition with these three rules:

(define-syntax or
   (syntax-rules ()
      ((or)             ; OR of zero arguments
       #f)              ;   is always false
      ((or a)           ; OR of one argument
       a)               ;   is equivalent to the argument expression
      ((or a b c ...)   ; OR of two or more arguments 
       (let ((temp a))  ;   is the first or the OR of the rest
          (if temp           
              temp             
              (or b c ...)))))) 

Notice that this definition is recursive. (The third rule's template uses the or macro recursively.) If we hand or four arguments, like this: (or foo bar baz quux), it should be equivalent to (or foo (or bar (or baz (or quux)))).

Scheme will use recursion to translate the expression, one step at a time. When Scheme encounters a macro call, it transforms the call into the equivalent code, using the appropriate rule. It then interprets (or compiles) the resulting expression. If the result itself includes a macro call, then the interpreter (or compiler) calls itself recursively to translate that before evaluating it. For a correctly written macro, the recursive translation will eventually "bottom out" when no more macro calls result, and the code will be evaluated in the usual way.

(As I'll show later, this fits in very neatly with the interpreter's or compiler's recursive evaluation of expressions.)

This recursion is recursion in Scheme's transformation of the call expression into equivalent code--it doesn't mean that the resulting code is recursive. A Scheme compiler will do all of the recursive transformation at compile time, so there's no runtime overhead. Of course, the recursion has to terminate, or the compiler will not be able to finish the translation.

In this definition of or, the third rule contains the symbols c .... The Scheme identifier ... is treated specially, to help you write recursive rules. (In previous examples, I used ... as an ellipsis to stand for code I didn't want to write out, but here we're usuing the actual Scheme identifier ...; it's actually used in the Scheme code for macros.)

Scheme treats a pattern variable followed by ... as matching zero or more subexpressions. In this or macro, c ... matches all of the arguments after the first two.

Scheme matches (or foo bar baz quux) by the third rule, whose pattern (or a b c ...), because it has at least two arguments. In applying the rule, Scheme matches a to foo, b to bar, and c ... to the sequence of expressions baz bleen.

[ This is similar to how you use unquote-splicing inside backquote--you can splice a list into a list at the same level, rather than nesting it. ]

The result of processing this or is

(let ((temp foo))
      (if temp
          temp
          (or bar baz quux)))

Now Scheme evaluates this expression.

But there's another or in there--when Scheme gets to (or bar baz quux) it will match the third rule again, with a matched to bar, b matched to baz, and c ... being matched to just quux. The result of this macro-processing step is

(let ((temp foo))
   (if temp
       temp
       (let ((temp bar))
          (if temp
              temp
              (or baz quux)))))

And the new let expression is evaluated.

There or is again, so Scheme will treat (or baz quux) the same way, again using the third rule--this time matching a to baz, b to quux, and c ... to nothing at all, producing

(let ((temp foo))
   (if temp
       temp
       (let ((temp bar))
          (if temp
              temp
              (let ((temp baz)
                 (if temp
                     temp
                     (or quux))))))))

And this will be evaluated.

Now the resulting or matches the second rule in the or macro, because it has only one argument quux, which is matched to a. The whole translation is therefore:

(let ((temp foo))
   (if temp
       temp
       (let ((temp bar))
          (if temp
              temp
              (let ((temp baz)
                 (if temp
                     temp
                     quux)))))))

There are no more macro calls here, so the recursion terminates.

You might have noticed that the example translation of (or foo bar baz quux) has several different variables named temp in it. You might have wondered if this could cause problems--is there a potential for accidentally referring to the wrong variable in the wrong place in the code generated by a macro?

The answer no. Scheme's macro system actually does some magic to avoid this, which I'll discuss later in a later section. Scheme actually keeps track of which variables are introduced by different applications of macros, and keeps them distinct--the different variables named temp are treated as though they had different names, so that macros follow the same scope rules as procedures. (Scheme macros are said to be hygienic; what that really means is that they respect lexical scope.)

You can think of this as a renaming, as though Scheme had sneakily changed the names each time the macro was applied to transform the expression, and the result were

(let ((temp_1 foo))
   (if temp_1
       temp_1
       (let ((temp_2 bar))
          (if temp_2
              temp_2
              (let ((temp_3 baz)
                 (if temp_3
                     temp_3
                     quux)))))))

Scheme implements the same scoping rule for macros and their arguments as for procedures and their arguments. When you call a procedure, the argument expressions are evaluated at the call site, i.e., in the call site environment, and the values are passed to the procedure--the environment inside the called procedure doesn't affect the meaning of the argument expressions. Likewise

In writing macros like or, we want to control when and whether the arguments are evaluated, but otherwise we want them to mean the same thing they would if they were arguments to a procedure.

For example, suppose we call or with an argument expression that happens to use a name that's used inside or. or uses a local variable named temp, and we might just happen to pass it an expression using the name temp.

Consider the following procedure, which uses local variables perm and temp, and calls or in their scope.

(define (employee? person)
  (let ((perm (member person permanent-employees))
        (temp (member person temporary-employees)))
     (or perm temp))

If we translated the or macro naively, without worrying about accidental naming conflicts, we'd get this:

(define (employee? person)
  (let ((perm (member person permanent-employees))
        (temp (member person temporary-employees)))
     (let ((temp perm)
        (if temp
            temp
            temp)))

(This is not what R5RS Scheme macros do!)

Note what's wrong here. The name temp was passed into the macro from the call site, but it appeared in the body of the macro inside the let binding of temp. At the call site, it referred to the "outer" temp, but inside the macro, it turne out to refer to something else--in the process of moving the expression around, we accidentally changed its meaning.

Implementing More Scheme Special Forms

As examples of Scheme macros, I'll show how to implement several special forms in terms of lambda. This is how most real Scheme compilers work--the compiler itself only "understands" how to compile a few special forms directly, but the others can be defined as macros.

Traditionally, the compiler understands lambda, and all other binding forms are implemented in terms of lambda and procedure calling. The compiler must also understand a few other special forms, if, set!, quote, a simple version of define [ did I leave one out? ].)

let

Recall that in chapter [ whatever ], I explained how the semantics of let can be explained in terms of lambda. For any let expression, which binds variables and evaluates body expressions in that scope, there is an exactly equivalent expression using lambda and procedure calling. The lambda creates a procedure which will bind the variables as its argument variables, and execute the body of the let. This lambda is then used in a combination--calling the procedure makes it bind variables when it accepts arguments.

(define-syntax let ()
   (syntax-rules
     ((_ ((var value-expr) ...) body-expr ...)  ; pattern
      ((lambda (var ...)
          body-expr ...)
       (value-expr ...)))))

Here I've used an underscore to stand for the keyword let in the macro call pattern. This is allowable, and recommended, because it avoids having to write the keyword in several places. (If you had to write out the keyword in each pattern, it would make it more difficult and error-prone to change the name of a macro.)

I've also taken advantage of the fact that Scheme is pretty smart about patterns using the ... (ellipsis) symbol. The pattern has two ellipses. One matches any number of binding forms (variable names and initial value expressions); the other matches any number of body expressions.

The body expressions matched by body-expr ... are simply used in the body of the lambda expression.

The expressions matched by (var value-expr) ... are used differently, however--they are not simply substituted into the macro template. Instead, (var ...) is used to generate the argument list for the lambda, and value-expr ... is used to generate the list of initial expressions.

Scheme's macro system is smart enough to figure out what's going on. If the pattern contains an ellipsis following a compound form (like (var init-expr) ..., and the template uses one of the pattern variables from that compound form (followed by an ellipsis), then Scheme assumes you want the corresponding part of each form matched by the pattern form.

If we think of the expressions as s-expressions, we've matched a pattern that is one list of two-element lists, and restructured it into two separate lists of elements. (That is, we're going from a list of cars and cadrs to a list of cars and a list of cadrs.)

As an example of use, consider

(let ((var-a (some-procedure foo))
      (var-b (some-procedure bar)))
   (quux var-a)
   (quux var-b))

which translates to

((lambda (var-a var-b)
    (quux var-a)
    (quux var-b))
 (some-procedure foo)
 (some-procedure bar))

[ The following is out of place--here I should just be showing some uses of macros. The problem is that I don't want to lie and pretend that it's all very simple--Scheme does something sophisticated when you write binding contstructs as macros... This stuff will all be clearer after I've talked about hygiene problems with Lisp macros, and laziness and call-by-name... how to fwd ref gracefully? ] An extraordinarily astute and thoughtful reader might wonder if there's something wrong here. (Luckily, there's actually nothing to worry about.) Recall that when discussing or, I said that Scheme is careful to treat names introduced by a macro as though they were distinct, effectively renaming variables introduced in a macro. What about the argument variables to lambda in this example? One might think var-a and var-b would just be renamed and we'd get:

((lambda (var-a-1 var-b-1)
    (quux var-a)
    (quux var-b))
 (some-procedure foo)
 (some-procedure bar))

Clearly, this isn't what we want--we want var-a and var-b in the lambda body to refer to the variables introduced in by lambda---that's what it's for.

Scheme's macro processor is smart enough to infer that this is what you want. When you write a macro that accepts a name as an argument and binds it, Scheme assumes you're doing that for a good reason. If you then take another argument to the same macro and use it in the scope of that new variable, Scheme assumes you want occurrences of the name to refer to the new variable.

That is, Scheme uses an algorithm that checks what you do with names in forms that get passed as arguments into a macro. If you just use them in the normal ways, evaluating or assigning to them as variable names, Scheme assumes you mean to refer to whatever those names refer to at the call site of the macro. (That's normal lexical scope.) But if you take the name and use it as the name of a new variable, Scheme assumes you're defining a binding construct, and that any other arguments you put in that scope should see the new binding, instead of being scoped at the call site.)

Scheme can generally assume this, because if you're not implementing a scoping binding form (like let or do), there's no reason for a macro to accept a name as an argument and then turn around and bind it.

let*

Once we have let, we can implement let* in terms of that. We simply write a recursive macro that peels off one binding form at a time and generates a let, so that we get a nested set of lets that each bind one variable.

(define-syntax let* ()
   (syntax_rules
      ((_ () body-expr ...)
       (begin body-expr ...))
      ((_ ((var1 value-expr1)(var value-expr) ...)
       (let ((var1 value-expr))
          (_ ((var value-expr) ...)
             body-expr ...)))))

This macro uses two syntax rules. The first is the termination case for recursive macroexpansion. A let* that has an empty binding form (i.e., binds zero variables) should be translated into a begin; it will just sequence the body expressions.

The recursive rule says that a let* with one or more binding subforms should translate into a let that performs the first binding and another let* to bind the rest and evaluate the body. (Note that I've used the _ shorthand for let* in the recursive call, as well as in the pattern.)

As with let, Scheme recognizes this as a binding construct, and does the right thing--it notices that the var argument passed into the macro is used as a name of a new binding in the macro, so it assumes that the new binding should be visible to the body expressions.

cond

Discussion

Scheme macros also have several features I haven't demonstrated, to make it easier to write more sophisticated macros than or, and I'll demonstrate those later, too.

In the next section, though, I will discuss a different and simpler kind of macro system, which is not standard Scheme, and does have problems with variable names.

Lisp-style Macros

In this section, I'll discuss a simple kind of macro system that isn't standard in Scheme (and you might be able to skim this section without too much loss) but is interesting for several reasons.

Ultra-simple Lispish Macros

The classic macro system is the Lisp macro system, which allows the user to define an arbitrary Lisp procedure to rewrite a new construct. (Most dialects of Lisp, e.g., Common Lisp, have a macro facility of the same general kind, called defmacro.) We'll talk for a moment about a simplified version of Lisp-style macros. Later we'll explain why and how Scheme macros are better, at least for most purposes.

Suppose we have a macro system that we can use to tell the interpreter or compiler that when it sees an expression that's a list starting with a particular symbol, it should call a particular routine to rewrite that expression, and use the rewritten version in its place.

For the or example, we want to tell the compiler that if it sees an expression of the form (or a b) it should rewrite that into an expression of the form

(let ((temp a)
   (if temp
       temp
       b))

So now we want to tell the compiler how to rewrite expressions like that. Since Lisp expressions are represented as lists, we can use normal list operations to examine the expression and generate the new expression. Let's assume our system has a special form called define-rewriter that lets us specify a procedure of one argument to write a particular kind of expression.

Here's a rather ugly rewriter macro for or:

; OR with subtle scope bug
(define-rewriter 'or          ; tell compiler how to rewrite (or ...)
   (lambda (expr)
      (let ((a (cadr expr))              
            (b (caddr expr)))
         (cons 'let                                 ; make LET form
               (cons (list (list 'temp a)))         ; make let binding form
                     (append '(if temp temp)        ; make IF form
                             (list b))

There's actually a scoping problem with this macro, which I'll ignore for now--it's the problem that define-syntax fixes. Later, I'll show what's wrong and fix it, but for a while I just want to talk about basic syntax of Lisp-style macros.

Now when the interpreter or compiler is about to evaluate an expression represented as a list, it will check to see if it starts with or. If so, it will pass the expression to the above rewriter procedure, and interpret or compile the resulting list instead.

(Actually, macroexpansion doesn't have to happen just before interpreting or compiling a particular expression. The system might rewrite all of the macro calls in a whole procedure, or a whole program, before feeding the procedure or program to the normal part of the compiler. It's easier to understand macros if they're interleaved with expression evaluation or compilation, though--it's just an extra case in the main dispatch of your interpreter or compiler.)

Implementing define-rewriter is easy. (We'll show an implementation for our example interpreter in a later section.) We only need to do two simple things:

That's all.

The above system works, but it has several awkwardnesses. One is that it is tedious to write routines that construct s-expressions directly. We can use quasiquote to make this easier. It will allow us to simply write the s-expression we want the macro to produce, and use unquote to fill in the parts we get from the arguments to the macro.

; OR with subtle scope bug
(define-rewriter 'or       ; tell compiler how to rewrite (or ...)
   (lambda (expr)
      (let ((a (cadr expr))              
            (b (caddr expr)))
         `(let ((temp ,a)) ; return an s-expression of this form
             (if temp
                 temp
                 ,b))

This is much easier to read. The backquoted expression is now readable as code--it tells us the general structure of the code produced by the macro, and the commas indicate the parts that vary depending on the arguments passed to the macro.

Note that there is no magic here: define-rewriter and quasiquotation can be used independently, and are very different things. It just happens that quasiquotation is often very useful for the things you want to do in macros--returning an s-expression of a particular form.

This simple rewriting system is still rather tedious to use, for several of reasons. First, we always have to quote the name of the special form we're defining. Second, it's tedious to write a lambda every time. Third, it's tedious to always have to destructure the expression we're rewriting to get the parts we want to put into the expression we generate. ("Destructure" means take apart to get at the components--in this case, subexpressions.)

Better Lisp-style Macros

It would be nice if the macro facility allowed you to declare the argument pattern to the macro, and automatically destructured it for you. Most Lisp systems have a special form called defmacro that does this for you, and avoids the need to write a lambda expression every time. For consistency with Scheme naming conventions, we'll call our equivalent define-macro.

define-macro implicitly creates a transformation procedure whose body is the body of the define-macro form. It also implicitly destructures the expression to be transformed, and passes the subexpressions to the transformation procedure.

Using define-macro, we can write or this way, specifying that or takes two arguments:

; OR with subtle scope bug
(define-macro (or a b)
   `(let ((temp ,a))
       (if temp
           temp
           ,b))

We didn't have to write the code that destructures the form into a and b---define-macro did that for us. We also didn't have to explicitly write a lambda to generate the transformation procedure; define-macro did that too.

This makes the syntax of define-macro similar to a procedure-defining define form. Still, you should always remember that you're not writing a normal procedure: you're writing a procedure to transform code before it is interpreted or compiled. The combination of automatic argument destructuring and template-filling (using backwuote and comma) makes it easier to do this in many cases.

Like a procedure, a macro can take a variable number of arguments, with the non-required ones automatically packaged up into a rest list. We can define a variable-arity or with define-macro:

[ need to check this example--it's off the top of my head ]

; variable arity OR with subtle scope bug
(define-macro (or . args)
   (if (null? args)  ; zero arg or?
       #f
       (if (null? (cdr? arg-exprs)) ; one arg or?
           (car arg-exprs)          
           `(let ((temp ,(car arg-exprs)))
               (if temp
                   temp
                   (or ,@(cdr arg-exprs)))))))

Here we're just accepting the list of argument expressions to the or expression as the rest list args.

If it's an empty list, we return #f. Keep in mind that we're returning the #f object, which will be used in place of the or expression, i.e. as the literal #f to use in the resulting code. (Conceptually, it's a fragment of a program code, even though that program fragment will in fact return the value #f when it executes, because #f is self-evaluating. We could have quoted it to make that clearer.)

If it's a one-element list, we just return the code (s-expression) for the first argument.

If it's a list of more than one argument expression, we return an s-expression for the let with a nested if. (Note the use of unquote-splicing (,@) to splice the cdr of the expression list into the or form as its whole list of arguments.)

You should be aware, though, that what you're really doing is specifying a procedure for transforming expressions before they're compiled or interpreted, and that quasiquote is just syntactic sugar for procedural code that constructs an s-expression.

define-macro is easy to write, once we've got define-rewriter; we don't have to modify the interpreter or compiler at all. We just use define-rewriter to write define-macro as a simple macro. We'll make define-macro a macro that generates transformation procedures, and uses define-rewriter to register them with the interpreter.

Problems With Lisp-Style Macros

Earlier we alluded to a lurking bug in our define-rewriter and define-macro definitions for or.

Suppose we use the or macro this way--we check to see if someone is employed as either a permanent or temporary employee, and generate a w2 tax form if either of those is true.

  (let ((temp (member x temporary-employees))
        (perm (member x permanent-employees)))
     (if (or temp perm)
         (generate-w2 x)))

The expansion of this is:

(let ((temp (member x temporary-employees))
      (perm (member x permanent-employees)))
   (if (let ((temp temp))
          (if temp
              temp
              temp)))    ;BUG! (should refer to outer temp, not inner)
       (generate-w2 x)))

The problem here is that we happened to use the same name, temp, at both the call site and inside the macro definition. The reference to temp in (or temp perm) gets "captured" by the binding of temp introduced in the macro.

This occurs because a normal macro facility does not understand issues of name binding--the name temp refers to one program variable at the call site, and another at the site of its use inside the macro--and the macroexpander doesn't know the difference. To the macroexpansion mechanism, the symbol temp is just a symbol object, not a name of anything in particular, i.e., a particular program variable.

There are two ways to get around this problem. One is for the macro-writer to be very careful to use names that are very unlikely to conflict with other names. This makes code very ugly, because of the unnatural names given to variables, but more importantly, it's harder to get right than it may seem. The other way around the problem is to get a much smarter macro facility, like the new Scheme define-syntax macro system.

Ugly Hacks Around Name Conflicts

One way to avoid name conflicts is to pick names for variables used inside macros in such a way that they're unlikely to conflict with names that users of the macros might pick, e.g.,

(define-macro (or first-arg second-arg)
   `(let ((temp!in!or!macro ,first-arg)
       (if temp!in!or!macro
           temp!in!or!macro
           ,second-arg)))

It's unlikely that anyone will name a different variable temp!in!or!macro someplace else, so the problem is solved, right? Not necessarily.

Besides the fact that this is incredibly tacky, there's still a situation where this kind of solution is likely to fail--when people nest calls to the same macro. Each nested call will use the same name for different variables, and things can go nuts. (Food for thought: is this true of the or macro above? Does it nest properly?)

The standard hack around that problem is to have each use of the macro use a different name for each local variable that might get captured. This requires some extra machinery from the underlying system--there has to be a procedure that generates new, unique symbols, and which can be called by the macro code each time the macro is expanded. The traditional Lisp name for this procedure is gensym, but we'll call it generate-symbol for clarity.

Now we can write a fixed version of the two-argument OR macro.

; Version of 2-arg OR with scope bug fixed

(define-macro (or first-arg second-arg)
   (let ((temp-name (generate-symbol))) 
      `(let ((,temp-name ,first-arg)
          (if ,temp-name
              ,temp-name
              ,second-arg)))

Notice that the outer let is outside the backquote--it will be executed when the macro is used (i.e., once each time an or expression is rewritten; the quasiquoted part is the code to be interpreted or compiled (after the comma'd holes are filled in).

Each time a call to or is processed by the compiler (or interpreter), this let will generate a new symbol before translating it; quasiquote will fill in the holes for the new symbol. (Be sure to get your metalevels right here: temp-name is the name of a variable in the macro transformation procedure, whose binding will hold a pointer to the the actual name symbol that will be used for the variable.)

Isn't this ugly? To some degree, Lisp macros are nice because you can use the same language (Lisp) in macros as you can in normal code. But due to these funky scoping issues, you effectively end up having to learn a new language--one with lots of generate-symbol calls and commas.

On the other hand, maybe it builds character and abstract reasoning abilities, because you have to think a lot about names of names and things like that. Fun, maybe, but not for the faint of heart.

Implementing Simple Macros and Quasiquote

Implementing Simple Macros

Implementing quasiquote and unquote

[ This section is particularly rough and needs to be reworked. Sorry. ]

quasiquote is a special form that (like quote) has a very special sugared syntax. Part of this syntax is recognized by the reader, rather than the compiler or interpreter proper; the rest of the work is done by the compiler or interpreter.

Translating backquotes to quasiquote

A backquote character is interpreted very specially by the Scheme (or Lisp) reader, and backquoted expressions are converted into quasiquote expressions with a normal-looking nested-prefix-expression syntax. The expression `(a b c) is actually just shorthand for (quasiquote (a b c)) Similarly, comma'd expressions are converted, e.g. `(a ,b ,c) is read in as (quasiquote (a (unquote b) (unquote c))). Notice that as far as the reader is concerned, these are just lists--it is up to the compiler or interpreter to process them further, and the reader just preprocesses them into lists that the compiler or interpreter can deal with.

quasiquote

The quasiquote special form may be built into the compiler or interpreter, but it can be implemented as a macro, in Scheme. That's the easy way to do it, and it's what we'll do.

I'll show a simplified version of quasiquote that only deals with commas at the top level of a list, e.g.,

(quasiquote (foo ,bar (baz x y)))

but not

(quasiquote (foo ,bar (baz ,x y)))

Notice that (quasiquote (foo ,bar (baz x y))) should expand to something like

(list 'foo bar '(baz x y))

We'll actually generate an expression that uses cons instead of list, because we want to write quasiquote recursively; if its argument is a list, it will peel one element at a time of off the list of arguments, and either quote it or not before using it in the resulting expression that is the rewritten version of the macro call.

Given this strategy, (quasiquote (foo ,bar (baz x y))) should expand to

(cons 'foo
      (cons bar
            (cons '(baz x y))
                  '()))

Notice that what we've done is generate an expression to generate a list whose components are explicitly quoted where necessary, as opposed to the original backquoted list where things are quoted by default and explicitly unquoted. And since 'thing is just a shorthand for (quote thing), we'll really generate an ugly expression like

(cons (quote foo)
      (cons bar
            (cons (quote baz x y)
                  '())))

written as a straighforward low-level macro. We'll define it as a trivial macro that just calls a procedure quasiquote1 to do the actual transformation.

[ NEED TO DEBUG THIS... PRW ]

(define-macro (quasiquote expr)
   (quasiquote1 expr))
(define (quasiquote1 expr)
   (if (not (pair? expr)) ; if quoted expr is not a list, it's just
       expr               ; a literal
       ; else we'll grab a subexpression and cons it (appropriately
       ; quoted or not) onto the result of recursively quasiquoting
       ; the remaining arguments
       (let ((first-subexpr (car expr))
             (rest-subexprs (cdr expr)))
          (if (and (pair? next-subexpr)
                   (eq? (car first-subexpr) 'unquote)))
              (list 'cons
                    first-subexpr       ; gen expr to eval this subexpr
                    (quasiquote1 rest-subexprs))                     
              (list 'cons
                    (list 'quote first-subexpr)    ; quote this subexpr
                    (quasiquote1 rest-subexprs)))))

A full implementation of quasiquote is a little trickier, because it must deal with nested uses of quasiquote and unquote; each subexpression that is not unquoted must be traversed and treated similarly to the top-level list--i.e., rather than just using the subexpressions as literals and quoting them, an equivalent expression should be constructed to create a similarly-structured list with the unquoted holes filled in. Also, a full implementation should handle unquote-splicing as well as unquote.

define-rewriter

In Chapter [ whatever ], I showed the code for an interpretive evaluator that was designed to support macros. In this section, I'll explain how to implement the macro processor and install it in the interpreter.

Recall that when eval encounters an expression that's represented as a list, it must determine whether the list represents a combination (procedure call), a built-in special form, or a macro call. It calls eval-list to do this dispatching.

Also recall that we implemented environments that can hold different kinds of bindings--of normal variables or macros. A macro binding holds a transformation procedure that can be used to rewrite an expression before it is interpreted.

eval-list checks to see if the list begins with a symbol, which might be the name of a macro, or the name of a procedure. It looks in the environment to find the current binding of the symbol.

If it's a syntax (macro) binding, eval-list it extracts the transfromer procedure from the binding information, and calls eval-macro-call to evaluate the list expression.

Here's eval-macro-call:

(define (eval-macro-call transformer expr envt)
   (eval (transformer expr) envt))

All it does is apply the transformation procedure to the expression, and call eval recursively to evaluate the result.

This is sufficient to be able to use macros, once they're defined. We also need to be able to define macros, so that we can use them.

For that, we'll add one special form to our interpreter, define-rewriter, which takes a name symbol and a transformation procedure as its arguments.

[ Show define-rewriter ... has to accept a closure in our language, not the underlying Scheme ]

define-macro

Once we've added define-rewriter to our interpreter, we don't have to modify the interpreter at all to add define-macro. We can simply define it as a macro in the languauge we interpret, using define-rewriter "from the inside." We had to add define-rewriter to the language implementation itself, but once that's done, we can bootstrap a better macro system with no extra help from the interpreter.

define-macro does three things:

Bear in mind that the following code is not code in the interpreter, but code to be interpreted, to create a define-macro macro, from inside our language.

[ show define-macro ... pattern matching on arg form and creating a routine to destructure and bind... ]


Go to the first, previous, next, last section, table of contents.