Study guide for 386L (Advanced Programming Languages), Part I

---
Copyright 1994 Paul R. Wilson.  Until 1998, you may use these notes freely
for noncommercial purposes, as long as this notice is preserved in all
copies.  After January 1, 1996, you must obtain permission from the author
to copy or redistribute.
---

* "Object" can mean several things.  In general, data objects include C structs, Pascal records, and so on.  In the context of OBJECT-ORIENTED PROGRAMMING, "objects" have both state (the data component in the more general sense of "object") and BEHAVIOR, a procedural component.  (In Scheme, Lisp, Smalltalk, and Self, most objects are composed of some number of VALUE CELLS, machine word-sized units of storage that are tagged to say what they hold --- immediate values, or pointers to objects on the heap.)

* First-class data objects are any objects (in the general, not necessarily object-oriented, sense) that are represented as data objects (with object identity) that you can manipulate within a program.  They can be treated as data in the sense of being passed as arguments to (or returned as values from) procedures or methods.  They can be stored into data structures such as lists, and so on.  The essential property of first-class objects is OBJECT IDENTITY, which means that there is a distinct representation of each first-class object AT RUN TIME.  (In some strongly-typed languages, however, you can't compare objects for identity if they are not of the same type.)

* Scheme procedure closures are first class, as are cons cells (also known as dotted pairs, or just pairs), vectors, symbols, characters, booleans, and numbers.  In Scheme, all of these things can be compared for equality in the sense of OBJECT IDENTITY --- that is, you can always tell one from another, even if they have the same values for their components.  So for example, two Scheme cons cells can be compared for identity, and you can tell whether they are the EXACT SAME OBJECT, even if all of their car and cdr fields have identical values.  This is generally implemented as a comparison of pointer values---the addresses of the cons cells, not the contents of the cons cells.  Similarly, two closures can be compared for identity---even if they are closures of the same procedure, and even closures of the same procedure in the same environment, each has its own identity that can be compared against others.  Again, this is generally done by comparing the addresses of the two closures, not their structures.  (Note that the structure of a Scheme closure is not defined in the standard---about all you can count on is its behavior when you call it, its type (closure), and its object identity.)

Note that object identity is usually compared by comparing bit patterns stored in a value cell.  Immediate types' representations are the bit patterns themselves, while heap-allocated types are represented as tagged pointers.  Either way, the bit pattern in the value cell can be tested against another by simple bitwise equality---there is exactly one representation for the integer 1 (a bit pattern like 00000001, plus some tag bits saying it's an integer); there is also exactly one bit pattern representing any particular object on the heap---a bit pattern representing the object's address, plus a tag saying it's a pointer to the object.  The Scheme procedure used to compare objects in this way is called "eq?".

* Note that in Scheme, symbols like "foo" and "bar" can exist even if there are no variables by those names.
Symbols are first-class objects used as identifiers.

* You should understand how expressions are evaluated in Scheme.  An expression that consists of a variable name, not in parentheses, is assumed to be a variable reference.  The current binding of that variable (in the environment of execution---see below) is looked up, and its value is fetched.  An expression that consists of a number or other "self-evaluating" object simply evaluates to that object.  So the value of the variable foo may be 1, but the value of 1 is also 1.  An expression in parentheses is a macro expression, a special form, or a procedure call.

   * If the first subexpression is the name of a macro, it is a macro expression.
   * If the first subexpression is the name of a special form, it is a special form.
   * Otherwise, the first subexpression is assumed to evaluate to a procedure that can be executed.

Review your notes about evaluation and compilation.  Note that procedure closures are first class, which is why the first subexpression of a procedure call can be an arbitrary computation that returns a closure object.  Macros and special forms are NOT first-class.

* Macros and special forms are NOT first class, because the compiler typically does all macroexpansion at compile time.  All of the macros are expanded (recursively, if necessary) so that the resulting expression is entirely in terms of procedure calls and special forms before it is actually compiled.  The compiler understands procedure calling, and has specialized knowledge of each special form.  Macros are expanded so that the compiler need not know anything about them after the macroexpansion phase.  That is, when you are defining a macro, you are defining a transformation between the syntactic forms the programmer will see (the macro calls) and something the compiler understands very well --- procedure calling and a handful of special operations.

* SPECIAL FORMS are required to limit evaluation of subforms.  The special form "quote" does ONLY this---it takes an argument, but the argument is not evaluated.  (quote foo) evaluates to the symbol foo, not the value of the variable foo.  This can't be done with a simple procedure, because normal procedures have their arguments evaluated BEFORE the procedure is called.  The procedure can't suppress the evaluation of an argument, because argument evaluation happens BEFORE the procedure gets control.

* Other special forms you should know (names in angle brackets are placeholders for expressions you supply):

  (define <varname> <value-expr>)
                        ; <varname> is assumed to be a symbol, which is
                        ; NOT evaluated.  A binding for a variable by
                        ; that name is created in the current
                        ; environment, and initialized with the value
                        ; of <value-expr>, which IS evaluated normally.
                        ; (See also the alternative define syntax
                        ; described later.)

  (lambda (arg1 ... argn)
    <body-expr>         ; creates a procedure closure that will execute
    ...                 ; in the environment where this lambda
    <body-expr>)        ; expression is evaluated, i.e., lambda CAPTURES
                        ; the CURRENT environment and pairs it with a
                        ; procedure that lives in that environment.

  (set! <varname> <value-expr>)
                        ; NOTE that <varname> is not evaluated---we're
                        ; not fetching the value from a variable binding
                        ; and operating on that, we're operating on the
                        ; binding itself.  The second argument,
                        ; <value-expr>, IS evaluated NORMALLY.

  (if <condition>       ; <condition> is always evaluated
      <then-expr>       ; this is evaluated if <condition> is not false
      <else-expr>)      ; and this is evaluated if <condition> IS false

  (let ((var1 <init-expr1>)   ; init-exprs are evaluated in the
        ...                   ; ENCLOSING environment, then the
        (varn <init-exprn>))  ; variables are bound to fresh value cells
    <body-expr>               ; (extending the envt), and initialized.
    ...                       ; Then the body expressions are evaluated
    <body-expr>)              ; in sequence, and the value of the last
                              ; body-expr is returned as the value of
                              ; the whole LET form.  (Note that the new
                              ; environment is then exited!)
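The if form above is a good place to see why special forms can't be ordinary procedures.  Here is a small added demonstration (my-if and loud are made-up names; this example is not in the original notes):

  ;; my-if is an ORDINARY procedure, so its arguments are evaluated
  ;; before it ever gets control.
  (define (my-if condition then-val else-val)
    (if condition then-val else-val))

  (define (loud x)      ; prints its argument, then returns it
    (display x)
    (newline)
    x)

  (if #f (loud "then") (loud "else"))     ; prints only "else"
  (my-if #f (loud "then") (loud "else"))  ; prints BOTH "then" and "else",
                                          ; because both arguments were
                                          ; evaluated before my-if was called

The real if still returns the right value either way, but my-if cannot suppress the evaluation (and side effects) of the branch that isn't taken; only a special form or macro can do that.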
LETREC is like LET except for one detail.  With a LETREC, the new environment is created BEFORE the initialization expressions are evaluated, and the initialization expressions are evaluated IN THE NEW ENVIRONMENT, not in the enclosing environment.  This lets you define local variables whose values are procedures that can call each other, etc., e.g.,

  (letrec ((foo (lambda (x) ... (bar 1) ...))
           (bar (lambda (y) ... (foo 2) ...)))
    (foo 0))

In the above code, we use letrec instead of let so that the lambdas will be evaluated in the new environment, and the procedure closures can see each other.  If we used LET instead, the call to bar from foo (and the call to foo from bar) would refer to the bindings of foo and bar in the enclosing environment.

* A VARIABLE is an occurrence in a program of a symbol that is used to name something.  So in the fragment

  (define (foo x y)
    (write (+ x y))
    (let ((x 1))
      (write (+ x y)))
    (let ((x 2))
      (write (+ x y))))

there are THREE DIFFERENT variables named x.  Note that until the procedure foo is called, this does not create any particular variable BINDINGS.

* A VARIABLE BINDING is an instance of a particular variable, "binding" the variable name TO a particular piece of memory (not a value!) in a particular BINDING ENVIRONMENT.  When the procedure foo, above, is called, a variable binding is created to hold the argument variable x.  When the first let expression is evaluated, another BINDING ENVIRONMENT is created, having a new binding of the inner variable x.  When the second let is evaluated, yet another binding environment is created, binding a different variable x.  Note that each time one of the lets is evaluated (entered), a NEW environment is created, but it is an EXTENSION of the environment created by calling foo, binding the arguments x and y.  Note that the expressions evaluated in these environments see the new binding of x, which overrides, or SHADOWS, the (argument) binding created by the call to foo.  (Keep in mind that arguments are just local variables that are implicitly initialized to the values passed by a procedure call.)

* Sometimes people use the word "variable" to mean a particular variable binding, because it's easier to talk about "this variable foo" than to always say "this variable binding of the variable foo in this particular environment."  You should get used to this sloppy usage, but know what's going on.

* Lambda creates a procedure, and thus may define variables as arguments or other local variables of the procedure, but does NOT directly create any BINDINGS.  The following form, when evaluated at the top level, creates a variable binding (which binds the variable "double").  It also evaluates the argument expression (the lambda expression) to create a procedure closure (in the top-level environment).  The closure is the initial value of the variable "double."

  (define double (lambda (x) (+ x x)))

(Another confusing terminological point is that some people use the term "lambda" as a shorthand for "the closure created by evaluating this lambda expression.")

* Suppose we have evaluated the above form to define a variable named double (whose value is a procedure closure taking one argument...).  If we now evaluate the form

  (double 2)

we are calling the procedure closure with an argument of 2.  The first thing the procedure does is bind the argument as the variable x, in a new binding environment, and evaluate the expression (+ x x) in that environment.
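To make the shadowing described above concrete, here is the earlier fragment actually called; the call and the printed values are added for illustration and are not in the original notes:

  (define (foo x y)
    (write (+ x y))      ; sees the argument binding of x
    (let ((x 1))
      (write (+ x y)))   ; sees the first let's binding of x
    (let ((x 2))
      (write (+ x y))))  ; sees the second let's binding of x

  (foo 10 20)            ; writes 30, then 21, then 22

Each let creates a new binding of x that shadows the argument binding for the expressions in its body; the argument binding of y is visible throughout, because nothing shadows it.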
* Scheme provides an alternative syntax for defining variables whose initial values are procedures.  Rather than writing

  (define double (lambda (x) (+ x x)))

we can also write

  (define (double x) (+ x x))

where the first argument to define looks like a list, not a symbol.  This reflects the syntax of a procedure call (e.g., the above call "(double 2)").  NOTE THAT THE SEMANTICS OF THE TWO FORMS ARE EXACTLY THE SAME, AND THEY SHOULD BOTH BE THOUGHT OF AS DEFINING A VARIABLE WHOSE INITIAL VALUE IS A PROCEDURE CLOSURE.  The explicit "lambda" may be easier to understand, because it clearly shows the closure creation.  The other form is merely for convenience.  (In particular, either way we've bound a variable named double.  We can say (set! double 'foo) if we want, and install a new value---in this case, the symbol foo---in that variable binding.)

* Top-level variables can be confusing because the "define" form creates a variable with exactly one instance---the top-level variable by that name.  So saying

  (define baz 22)

defines a variable named baz, and BINDS it in the top-level environment---i.e., there is actually space allocated to hold a value.  On the other hand, if we say

  (define (fubar) (let ((baz 6)) ...))

or

  (define fubar (lambda () (let ((baz 6)) ...)))

then we've named a variable baz---the let variable---but it isn't BOUND until we actually call fubar and the let is entered, creating a new binding (i.e., a new location named baz) and initializing it to the value 6.  Note that the binding and initializing (of a new binding of baz) happens at EACH call to fubar.

* Internal defines (that is, defines scoped inside a variable-binding form like let or a procedure-defining define) are treated specially.  In standard Scheme, you can't really define a new variable in an arbitrary binding environment.  The language constrains the way internal defines work, so that implementations can actually transform them into letrec and lambda.  So, for example,

  (define (foo x)
    (define (bar y) (some-computation y))
    (define (baz z) (some-other-computation z))
    (yet-another-computation (bar 1) (baz 2)))

can be transformed into

  (define (foo x)
    (letrec ((bar (lambda (y) (some-computation y)))
             (baz (lambda (z) (some-other-computation z))))
      (yet-another-computation (bar 1) (baz 2))))

Top-level defines are "real", but inner defines are changed (by macro expansion or whatever) into letrecs before actually being compiled.  The restriction that supports this is that all internal defines must come at the beginning of a code sequence, before any actual executable forms.

* Evaluating a let expression creates a binding environment (not a closure).  The expressions making up the body of the let are evaluated in the new environment, and the value of the last of those expressions is returned.  So for example,

  (let ((a 2) (b 3))
    (set! a 3)
    (* a b))

returns the integer 9.  Entering the let binds the variables a and b to fresh locations, and initializes their values to 2 and 3, respectively.  Then the two body expressions are evaluated in sequence.  The first, (set! a 3), assigns a new value, 3, to a.  The second multiplies a and b, and returns the value 9.  This is the last expression in the body of the let, so that value is also returned as the value of the let expression as a whole.  NOTE THAT NO PROCEDURES OR CLOSURES ARE CREATED.  (But see below.)
* Let CAN be implemented in terms of lambda (closure creation) and procedure calling, because calling a procedure can also bind arguments.  A particular implementation MAY implement let as a macro that expands into a procedure call to a new closure created by a lambda.  The preceding example might be transformed into

  ((lambda (a b)
     (set! a 3)
     (* a b))
   2 3)

NOTE THE PARENTHESES!  This is a three-element list-like expression, whose first element is not a variable reference, but is itself a (closure-creating) lambda expression.  Its second and third elements are the arguments 2 and 3.  Evaluating this expression will evaluate the three subforms.  The lambda expression will be evaluated to make a BRAND NEW closure of the procedure.  2 and 3 will evaluate to themselves, because numbers are self-evaluating.  Then the new closure will be called, with the arguments 2 and 3.  THIS CALL implements the bindings of the variables (as arguments), and executes the set! and * expressions.

NOTE THAT THIS IS NOT NECESSARILY HOW LET IS IMPLEMENTED, and most implementations DON'T bother to create a closure and call it.  It's just a demonstration of the power of lambda and procedure calling, and a way of showing the semantics of the constructs.  While it's important to realize that lambda and procedure calling are sufficiently powerful that you don't NEED let, you should probably not generally think about let that way.  In fact, it's probably better to think of calling a closure as "doing something like let" to bind its arguments; procedure calling is a very powerful thing that includes the ability to bind variables---the arguments to the procedure.  So when you see an expression like

  (let ((foo 3))
    (write foo))

you should realize that a variable (foo) is bound, but no closures are created.  In some implementations a closure MAY be created, but it's not one that matters to the semantics of the program.

* Variable bindings are NOT first class.  You can't store a pointer to a variable (or a variable binding) in another variable, even though you CAN store a symbol (which may or may not name a variable or variables) in a variable.  Note that a symbol is globally unique---there is only one symbol foo, which has object identity.  There may be any number of strings whose values are "foo"---strings composed of the sequence of characters f, o, and o.  There may also be different variables named foo in different scopes, and for each of those scopes, there may be several actual bindings of the variables, due to multiple executions of the same code.

So, for example, if a procedure foo calls itself recursively, and foo has a local variable x, there will be a binding of x for each activation of foo.  Note that this corresponds to the notion of a LOGICAL VARIABLE, even if the value of a particular binding of x cannot vary because the program has no side effects.  In logical terms, x is a program variable, which is "variable" in the sense that different BINDINGS may have different values, NOT in the sense that a given binding's value may change over time!  It just so happens that in Scheme, the value stored in a variable binding (that is, a memory location with a given name in a given environment) CAN change over time, because we have the special form set!.  But even in FUNCTIONAL LANGUAGES (without side effects), we still talk about variables, because the location or value referred to by x, say, depends on which call to foo we're talking about.
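A few added examples of object identity and symbol uniqueness (the expressions and noted results are illustrations, not from the original notes; the results follow from the standard behavior of eq?, equal?, cons, and string-append):

  (eq? 'foo 'foo)                     ; => #t  -- there is only one symbol foo
  (equal? (string-append "fo" "o")
          (string-append "fo" "o"))   ; => #t  -- same characters
  (eq? (string-append "fo" "o")
       (string-append "fo" "o"))      ; => #f  -- two distinct string objects
  (eq? (cons 1 2) (cons 1 2))         ; => #f  -- two distinct cons cells
  (let ((p (cons 1 2)))
    (eq? p p))                        ; => #t  -- the EXACT SAME OBJECT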
* Scheme procedures you should know (and you should know why it's no problem that they're just plain procedures, not special forms).  THAT IS, SINCE THESE ARE ALL NORMAL PROCEDURES, ALL OF THEIR ARGUMENTS ARE ALWAYS EVALUATED!

  (cons <expr1> <expr2>)   ; creates a data object of type cons-cell,
                           ; with CAR field = the value of <expr1> and
                           ; CDR field = the value of <expr2>

  (car <pair-expr>)        ; returns the value of the CAR field

  (cdr <pair-expr>)        ; returns the value of the CDR field

  (list <expr1> ... <exprn>)
                           ; returns a list of the argument values

  (map <proc-expr> <list-expr1> ... <list-exprn>)
                           ; applies <proc-expr>'s value, a procedure
                           ; closure (of n arguments), to the elements
                           ; of the lists <list-expr1> through
                           ; <list-exprn>.  Returns a list of all of the
                           ; results of the calls to the procedure.
                           ; (That is, it MAPs the function over the
                           ; values of the argument lists, and returns a
                           ; list that is that function of those values.)

  (apply-to-all <proc-expr> <list-expr1> ... <list-exprn>)
                           ; like map, except that the procedure is
                           ; applied to the listed values for side
                           ; effects only, rather than returning a list
                           ; of values.  (In standard Scheme, this
                           ; procedure is called for-each.)

  (call-with-current-continuation <proc-expr>)
                           ; discussed in detail later in these notes

------ LEXICAL SCOPE, CLOSURES, AND INDEFINITE EXTENT ------

In this section I'll be overly specific and talk about particular compilation and code generation details that are relevant to Scheme-48 and RScheme, but not necessarily to every Scheme implementation.  Most of the details are common to most Scheme compilers, and all Scheme compilers have to do something to achieve the same semantics.

Scheme is lexically scoped, meaning that the structure of the binding environment that exists at runtime CORRESPONDS to the TEXTUAL nesting of environment-creating forms.  Scheme has a notion of the CURRENT binding environment.  In Scheme-48, binding environments are represented as hash tables, if they are TOP LEVEL environments, or as chains of BINDING ENVIRONMENT FRAMES, if they are environments nested in other environments.  (In RScheme, the compiler can often allocate variables in registers rather than in the heap, but that's an optimization we'll generally ignore for simplicity of explanation.)

In Scheme-48, the current environment is represented as a virtual machine register, REG_ENVT, which usually holds a pointer to the relevant binding environment frame.  (Top-level environments are treated specially at compile time, and don't need to be explicitly pointed to at run time.)

When we create a procedure closure (using lambda or the special syntax for define), the current binding environment (at the point of closure creation) is captured and paired with a procedure object to create the closure.  Suppose we execute the following form at the top level:

  (define my-counter
    (let ((count 0))
      (lambda ()
        (set! count (+ count 1))
        count)))

The define form creates a top-level variable named my-counter, and evaluates the let form to determine its initial value.  (Note: no procedures or closures have been created yet!)

Evaluating the let form binds the variable count---that is, it creates a NEW binding environment, binding the variable count to a fresh value cell.  The new environment is scoped inside the current (top-level) environment, and shares all of its variable bindings.  It also has the new binding for count, which SHADOWS (overrides) any previous bindings of variables named count.  (Note: we STILL haven't created any procedures or closures yet, just a binding environment!  Evaluating a let creates ONE binding environment.)

After creating the new binding environment, let evaluates the expressions in the let body---in this case there is just one---in the newly-created environment.  In RScheme, this is done by SAVING the current value of the environment register and (temporarily) setting it to point to the new environment frame.
Then the code for the body expression is evaluated, and the OLD value of the environment register is restored.  The value of the last let body expression is returned as the value of the whole let expression.  In this case there's only one, so the lambda expression is evaluated to return a closure, which is also returned as the value of the let.

At the point where the closure is created, the current environment is the new environment created by the LET.  THIS is the environment that is paired with the procedure representing the lambda.  At this point, we have created one new environment, binding count, and one new procedure closure, which captures that environment.

Now if we evaluate this form at the top level:

  (my-counter)

we can execute the procedure closure, which will perform the actions in the body of the lambda, IN THE ENVIRONMENT we created with the LET.  This will increment the variable count and return its value.

When the compiler generates code for the lambda body, it precomputes some of the information necessary to look up the location of the variable count.  At run time, rather than actually looking through a list of name/binding pairs, the compiled code can ASSUME that the environment will have a particular SHAPE.  In this case, the compiler can tell that the lambda body code will execute in an environment that has a single binding environment frame (binding one variable, count), scoped inside the top-level environment we're doing all this in.

Since this information is available at compile time, and the corresponding code to fetch the variable can be generated then, there is no need to SEARCH the environment at RUN TIME.  The compiler can simply emit code that will dereference the environment pointer (the environment register), with an appropriate offset to get the right variable out of the environment frame.

This technique is called LEXICAL ADDRESSING, and the information the compiler uses is called a LEXICAL ADDRESS.  A LEXICAL ADDRESS consists of two parts: a RELATIVE ENVIRONMENT FRAME NUMBER, and a SLOT NUMBER within that environment frame.  In this case, the lexical address of count is 0:0.  That is, the compiler knows that the binding of count will be in the first frame of the runtime environment, and its location in that frame will be the first one (i.e., offset 0).

----------------- DYNAMIC AND STATIC TYPING ----------------

In Scheme and other Lisp dialects, variables do not have types---VALUES do.  That is, a particular variable binding may hold an integer at some times, a list at others, and a symbol or boolean at other times.  This is called DYNAMIC typing, because the type of a variable is not fixed.  This is true of Smalltalk and Self as well, and also true of several languages we haven't covered yet, including Icon.

A major criticism of dynamic typing is that programmers often KNOW the types they intend for variables to hold, but can't tell the implementation about it.  In STRONGLY TYPED languages, variables (and their bindings) have types, so the compiler can tell at compile time what kinds of values are legal for those variables to hold.  This lets the compiler know what operations can be applied to a given variable's value at any point in the code---e.g., it will catch TYPE ERRORS at compile time.  This avoids runtime errors where the program attempts an operation that is illegal for the kind of value it's operating on.  Note that STRONGLY TYPED languages are not necessarily STATICALLY TYPED!
Strongly typed means (for this class, anyway) that the compiler knows which operations are supported by different variables' values, but it may not know the exact type of the value at compile time. So for example, in a language with SUBTYPING, it may be possible to have a variable of type number, but not know at compile time whether the number will be an integer or a floating-point number. The compiler can then make sure that it's legal to multiply the variable's value by another variable's value, but will not know at compile time how that will be accomplished---the variable may be STRONGLY but still DYNAMICALLY typed. STATIC typing is a stronger constraint, meaning that the compiler must be able to tell the actual (concrete, not abstract) type of a given value at any point in the code. This allows the compiler to emit the appropriate code for any operation without inserting any runtime checks. ------------------ OBJECT ORIENTATION ---------------------- * Object orientation has two important threads: * ENCAPSULATION which you should know from ABSTRACT DATA TYPES that support (abstract) operations, irrespective of how they're (concretely) implemented, and * INHERITANCE, the structured reuse of code, allowing implementations to be defined by specifying differences from pre-existing implementations (NOTE: this is actually a controversial interpretation of inheritance! In some systems, the stress is not on code reuse, but on strict SUBTYPING.) ENCAPSULATION supports programming by allowing multiple implementations of the same abstractions. It avoids letting client code at the "innards" of objects. This INFORMATION HIDING promotes flexibility, by allowing later decisions to change the way things are implemented WITHOUT BREAKING CODE THAT ALREADY WORKS. Modular, "plug compatible code" is the goal here. INHERITANCE, when used as a CODE REUSE mechanism, supports programming by allowing similar classes to share code. Often, existing classes can be changed slightly by adding or altering a few functions, to make them suitable for new purposes. INHERITANCE, when used as a SUBTYPING mechanism, often allows programmers to reason about programs at a higher level of abstraction, depending only on the aspects of objects that matter in a particular context. (Note that this is more closely related to ENCAPSULATION than to CODE REUSE.) * ENCAPSULATION means that the state of a data object can only be accessed through a well-defined INTERFACE--- a set of routines that can be called without depending on the internal representation of the object. The interface of a routine consists of the MESSAGES it ACCEPTS, which is another way of saying what GENERIC OPERATIONS it supports. In some systems, rather than talking about "message passing," we may talk about "virtual procedures." A particular class of objects has a particular implementation for a given message (virtual procedure). This class-specific implementation is known as a METHOD. * For the purposes of this class, VIRTUAL PROCEDURES are the same thing as MESSAGES. They define the INTERFACE of a CLASS, saying what GENERIC OPERATIONS are supported. * A MESSAGE SEND is the same thing as a VIRTUAL PROCEDURE CALL. The source code specifies an operation to be performed, but exactly how that is carried out depends on the type of the RECEIVER. * THE RECEIVER OF A MESSAGE IS THE PARTICULAR OBJECT WE SEND THE MESSAGE TO (apply the virtual procedure to), NOT ITS CLASS. The receiver is the particular instance. 
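To connect these terms to the Scheme used elsewhere in these notes, here is a small closure-based sketch of objects, messages, methods, and a receiver.  It is added for illustration only; make-point, get-x, get-y, and move! are made-up names, and this is not how any particular object system is actually implemented:

  ;; A "class" here is just a constructor returning a dispatching closure.
  ;; The closure is the object; the symbol passed in is the message; the
  ;; code chosen for that message is the method.
  (define (make-point x y)
    (lambda (message . args)
      (case message
        ((get-x) x)
        ((get-y) y)
        ((move!) (set! x (+ x (car args)))
                 (set! y (+ y (cadr args))))
        (else (error "point does not understand" message)))))

  (define p (make-point 1 2))   ; p is an instance
  (p 'get-x)                    ; => 1   -- a "message send"; p is the RECEIVER
  (p 'move! 10 10)
  (p 'get-y)                    ; => 12

(The error procedure is assumed to be available, as it is in most Scheme implementations.)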
* When we send a message to an object, the appropriate METHOD is looked up. (This may happen at compile time in some systems, at run time in others.) This is known as MESSAGE DISPATCHING. The system finds the appropriate METHOD for the class of the receiver, which is then executed. * INHERITANCE for CODE REUSE means that subclasses are created so that they can reuse their parents' IMPLEMENTATION code, NOT NECESSARILY TO CREATE SUBTYPES. So for example, an arbitrary-precision integer class might be created as a subclass of LIST, even though the class describes objects that are conceptually NUMBERS, NOT LISTS! This is common in dynamically typed object- oriented languages like Smalltalk and Self, as well as Lisps with object-oriented extensions. * INHERITANCE as SUBTYPING is a more principled approach to subclassing, preserving the conceptual notion of real classes of objects. If you believe inheritance should define true subTYPES, you would not make arbitrary-precision integers a subclass of list; you would make them a subclass of integer, which in turn is a subclass of number, even if this meant you couldn't re-use any code, and had to write entirely new methods. On this view, the inheritance hierarchy should define strict set-theoretic classes---the inheritance links should literally mean IS-A: each arbitrary-precision integer IS an integer, and each integer IS a number. (Not a list!) * Using inheritance for strict subTYPING means that the INTERFACES of different classes of objects (i.e., the messages they accept) have WELL-DEFINED relationships. In particular, there is a definite inclusion relationship between the interfaces of classes and those of their subclasses. To say that one class, say arbitrary- precision-integer, is a subclass (subtype) of another, say, integer, means that the subclass supports AT LEAST the operations specified in the superclass. So if you can multiply and add integers, you must be able to multiply and add arbitrary-precision-integers. * NOTE that the inclusion relationship between supertypes and subtypes (every element of the subtype is an element of the supertype) implies a reversed relationship between the interfaces of the two types. The more specific (sub)type's interface is BIGGER than the supertypes, because the subtype supports AT LEAST the supertype's operations. So you should remember that the SMALLER SET OF INSTANCES (the more specific type) is described by a LARGER set of criteria (the things that make it specific). This is just the normal kind of inclusion you expect from mathematical sets---the more criteria you have for set membership, the fewer things qualify as members of the set. * Using inheritance for CODE REUSE makes no such guarantees! If arbitrary-precision-integer is defined as a subclass of list, so that it can share implementation details with lists, that DOESN'T guarantee it will make sense to apply list operations to arbitrary-precision integers! This has important consequences when attempting to define generic types, as described later. * Suppose we have a class D-E-QUEUE that implements double-ended queues, supporting insertion and deletion at either end of a list. So we have the operations push (to push something onto the front of the queue), pop (to remove something from the front of the queue), add-last (to insert something at the end of the queue), remove-last (to delete at the end of the queue). If we're using inheritance for code reuse, and we need a STACK class, we could define it as a subclass of D-E-QUEUE. 
This is ideal, in that D-E-QUEUE already has the necessary functionality, and more.  We simply override the add-last and remove-last operations (either explicitly, if our language supports "does not implement" declarations, or implicitly, by just not using those operations).

The problem here, in terms of SUBTYPING, is that D-E-QUEUE is really a subclass of STACK, not the other way around.  All D-E-QUEUEs implement stack functionality, and NOT vice versa.  The subclass link does not respect the intuitive notion of set inclusion, because some instances of subclasses of D-E-QUEUE are in fact NOT double-ended queues.  You can't write code for the superclass and expect it to work with the subclass, because you can't assume that all of the operations of the superclass are supported by all of the subclasses.

----------------------- GENERICITY -------------------------

* GENERIC TYPES are really type CONSTRUCTORS, which define a family of types.  They are TEMPLATES for the construction of types.  (For the purposes of this class, at least, GENERIC TYPES are the same thing as PARAMETERIZED TYPES.  A generic type is conceptually a function that takes a TYPE as a PARAMETER and returns a concrete type---i.e., it defines a function from simpler concrete types onto more complex concrete types.)

* Most conventional programming languages have a very limited form of generic typing.  There are a couple of built-in generic types, like ARRAY and POINTER-TO, that allow you to construct specific types by specifying array-of-what and pointer-to-what.  So, in Pascal, you can define a record type "myrec", and instantiate the generic type ARRAY to give the CONCRETE type ARRAY OF myrec.  Similarly, you can instantiate the generic type POINTER and declare the concrete type POINTER TO myrec.

* Some languages, like Eiffel, Ada, ML, and the new version of C++, allow the programmer to define NEW generic types, like QUEUE OF T, where T can be any type, either built in or program-defined.

* It might be better to speak of generic CLASSES, rather than generic TYPES.  On the other hand, for compatibility with genericity, classes generally SHOULD correspond to types, and their subclasses should be strict SUBTYPES.  This ensures SUBSTITUTABILITY---you can always substitute an instance of a subclass anywhere that an instance of the superclass is expected.

Suppose we define a generic type QUEUE OF NUMBER, which is a queue whose elements can be of any one kind of number.  We can instantiate this class with any concrete number type, to get a QUEUE OF FLOAT, QUEUE OF INT, or whatever.  If we've defined ARBITRARY-PRECISION-INTEGER to be a subclass of NUMBER, then this will work fine with QUEUE OF NUMBER.  The compiler can check that any type we use supports the operations required by the generic type.  So if we instantiate the generic queue-of-number to give us the concrete type queue-of-arbitrary-precision-integer, the compiler can tell that arbitrary-precision-integers satisfy all of the criteria of numbers.

This allows us to define an operation on any kind of queue of number, such as sum-of-elements, that DEPENDS on the queue elements having a certain interface---e.g., supporting addition.

But suppose we defined arbitrary-precision-integer as a subclass of LIST (for the purpose of code reuse, since arbitrary-precision integers can be represented as lists of normal integers).
That is, suppose we DON'T ensure SUBSTITUTABILITY---conceptually, arbitrary-precision integers are numbers, and shouldn't be substituted for LISTS, even though they're subclassed from LIST.  Then the compiler doesn't know that arbitrary-precision integers are NUMBERS and support the NUMBER interface, and are NOT conceptually LISTS.  Code that expects some kind of LIST may apply LIST operations to arbitrary-precision integers, which will probably not make sense.  At the very least, it will violate encapsulation, allowing code to depend on the list representation of arbitrary-precision integers, as opposed to their interface (being a kind of number).

--------------------- POLYMORPHISM ------------------------

* POLYMORPHISM is a GENERAL TERM that ENCOMPASSES both subclassing and genericity, as well as dynamic typing.  All it means is that you can write code that is general and applies to more than one kind of thing.  (polymorphism = poly (many) + morph (shapes) + ism (you know, ism)).

* GENERICITY is STATIC (compile-time) polymorphism.  You write code for queues in general, but you instantiate that generic type to create queues of particular kinds of things.  So, for instance, in ML you can have lists of integers or lists of floats, but NOT a list of integers and floats.  The compiler must be able to infer the concrete type of all of the list elements at compile time.

* DYNAMIC TYPING is RUNTIME POLYMORPHISM.  In Scheme or Smalltalk or Self, any variable can hold any type of value, so we can write code like this:

  (define foo #f)           ; foo holds a boolean
  (set! foo (cons 1 '()))   ; now foo holds a cons cell
  (set! foo 6)              ; now foo holds an integer

More importantly, it allows you to write routines like MAP and APPLY-TO-ALL that are VERY general.  They can be used either to operate on HOMOGENEOUS COLLECTIONS (whose elements are all of the same type, like a list of integers), or on HETEROGENEOUS COLLECTIONS (whose elements are of different types, like a list of integers and floats).

* Some languages (like Eiffel) allow combinations of static polymorphism and dynamic polymorphism.  They may CONSTRAIN polymorphism so that the compiler can check more things at compile time to avoid run-time errors.  So in Eiffel, you may have a method for summing a list of numbers, and the compiler will allow you to use it on lists of integers, or lists of floats, but NOT lists of ASCII characters.

* Eiffel generally assumes that when you define a subclass, you mean to define a subtype, and it will let you substitute an instance of the subtype where the program specifies the supertype.  Some languages directly support a DISTINCTION BETWEEN subclassing-as-SUBTYPING and subclassing-for-CODE-REUSE.  Then the compiler can allow substitutability for subtypes, but NOT for arbitrary subclasses, if they're not explicitly subtypes.

--------------- INTERACTIONS BETWEEN GENERICITY AND SUBTYPING

Suppose we have a type T and another type, S, which is a subtype of T.  (For this example, S could be a subclass, but we assume inclusion is preserved.)  Now suppose we instantiate a generic type, say LIST-OF, with the type parameter T, to yield LIST-OF-T, and we instantiate it again with the type parameter S, to yield LIST-OF-S.

What is the relationship between LIST-OF-T and LIST-OF-S?  Is the latter a subtype of the former?  NO!  LIST-OF-T has an interface that requires T's, where LIST-OF-S has a more specific interface, which requires S's, not just any old T.
So for example, if T is Number, and S is UnsignedShortInt, LIST-OF-UnsignedShortInt is NOT a subclass of LIST-OF-Number, because it doesn't support all of the operations that LIST-OF-Number does, because the types of its arguments are MORE RESTRICTIVE. E.g., if I have a variable of type LIST-OF-NUMBER, I assume I can put any kind of number into the list. But if I have a variable of type LIST-OF-UnsignedShortInt, I can't put a negative long integer, into it, or a floating-point value. MORAL: subclassing (for subtyping) is very different from instantiating a parameterized type. Both of them "specialize" existing code, but in very different senses of the word "specialize". A "specialized" subtype of a type has ADDED functionality that lets it do MORE things---i.e., it has some special functionality, but it still preserves its inherited abilities. A "specialized" instance of a parameterized type has RESTRICTED functionality, that lets it do some particular thing, not the more general thing. Understanding this distinction is essential to using subtyping and genericity appropriately. For example, it's appropriate to use LIST-OF-UnsignedLongInt when you know that for your application it wouldn't make any sense to put a float or negative int on the list in question. If client code then tries to use the list inappropriately by putting a float in it, the compiler's type-checker will flag it as an error. On the other hand, if it makes sense for the list to hold any kind of number, you should use LIST-OF-NUMBER; if some other part of the code decides to put a float in the list, the compiler will not scream, and it will compile in dynamic dispatching to make sure that the correct operation is done for either type at run time. Note that if your subclassing violates substitutability, the compiler's checking WILL BE WRONG and your quality of life may deteriorate rapidly. The compiler will not be able to tell whether the types really are compatible, and will either complain about things that DO work, or won't complain about things that WON'T work. Note that in C++ (as opposed to a well-designed type system), compilers tend to err on the side of assuming you know what you're doing. This is because you're allowed to violate inclusion willy-nilly, and the compiler is clueless what it all means. Keep in mind that subtyping defines a set inclusion relationship, and parameterized typing works best when it can count on that inclusion property---i.e., knowing that anything on a LIST-OF-NUMBER will act like a number, and that anything on a LIST-OF-UnsignedShortInt will not only act like a number, but can be counted on to be an instance of UnsignedShortInt (or a subtype thereof). It's ESPECIALLY important to understand these things, and use them properly, in a language like C++, where the compiler isn't watching out for you. ------- PURE VS. HYBRID OBJECT-ORIENTED LANGUAGES---------- Most object-oriented languages are really HYBRID languages, with a conventional procedural language at the bottom and an object system grafted on top of that. For example, in C++, objects may be composed of other objects (i.e., have them as instance variables), but the bottom level of these composed types consists of the same old primitive types we had in C: ints, floats, character pointers, and so on. You can't generally define new methods that operate appropriately on the primitive types. CommonLisp with CLOS (the Common Lisp Object System) is the same way---at bottom, you've still got fixnums, cons cells, etc. 
The advantage of hybrid languages is that it's easy to make them pretty fast.  The leaves of the method-call graph consist of nontrivial amounts of work done on built-in types with no polymorphism.  (E.g., a "point" made of an int x and an int y will operate on those instance variables without overhead from method dispatching---the object can grab its own internal values and crunch on them without going through anything "object oriented".)

The problem with hybrid languages, besides their being an obviously ugly kludge, is that they don't have the flexibility of dealing with objects at the lowest level.  For example, if you want to redefine the CAR operation and make it polymorphic, you can't.  You can't define a new type of cons cell, and have the old CAR operate on it appropriately.

In contrast, PURE object-oriented languages treat everything as an object in the object system, and all operations have the semantics of message sends.  If you want to redefine the "+" method for integers, and define a "+" method for a user-defined number type, it all works together properly.  (Of course, your "+" may be slower than the usual one, and the whole system may slow down...)

Smalltalk, Self, and Dylan are pure object-oriented languages, i.e., they're "objects all the way down".  Self is especially pure because even instance variable lookup has the semantics of a message send.  Conceptually, an object sends itself a message to get the value of one of its own fields---an object sees almost the same messaging interface to itself that other objects see from the outside.  This allows methods to be redefined in superclasses (e.g., computing a value rather than simply looking at a field) without breaking subclass code.

-------------------- GARBAGE COLLECTION -------------------

* GARBAGE COLLECTION supports the ABSTRACTION OF INFINITE MEMORY, subject to certain constraints.  It allows programs to ALLOCATE UNLIMITED AMOUNTS OF DATA, as long as there's not too much data that's actually live at any given time.

* GARBAGE COLLECTION supports the abstraction that OBJECTS NEVER DIE DURING THE EXECUTION OF THE PROGRAM; that is, objects have INDEFINITE EXTENT---they live "as long as necessary."  All objects are kept around if they could possibly affect the course of the computation.  The abstraction, then, is that one has infinite memory, and all objects live indefinitely.  The actual reclamation of storage is AN IMPLEMENTATION-LEVEL OPTIMIZATION.

---------------------- PERSISTENCE ------------------------

* Persistence means that data can survive ("persist") beyond the execution (and termination) of the programs that create and operate on them.  In conventional systems, file data are persistent, but objects on the heap are not.  In a persistent object store, heap objects may be saved beyond program termination.  We usually use the term persistence, however, when talking about objects with object identity, as opposed to plain boring file data.

* Persistent object stores are different from file systems in that objects retain their IDENTITY over time.  Rather than writing out the contents of a record, and reading the data back into another record, you can simply "store" objects and get back the same individual objects.  (Naturally, at the implementation level this may involve copying, but the abstraction of object identity is preserved.)
--------------------------------------------------------------------

ENVIRONMENTS AND CONTINUATIONS

In Scheme, unlike most block-structured languages (e.g., Pascal and C), binding environment frames are allocated separately from activation records.  (A particular Scheme compiler may allocate some variable bindings in some activation records, but in the general case binding environment frames are allocated in the heap, separately from activation records.)

The Scheme equivalent of an activation record is called a PARTIAL CONTINUATION.  This is not quite the same as the usual notion of activation record, because it holds the state of the CALLER, not the CALLEE.  That is, it is more like a SUSPENSION record, holding the state of the procedure invocation that was SUSPENDED when it called another procedure.  This suspension record is called a (partial) CONTINUATION because it holds the information necessary to CONTINUE (resume) the suspended procedure (the caller).

Scheme's equivalent of an ACTIVATION STACK is a FULL CONTINUATION, which is simply a chain of PARTIAL CONTINUATIONS representing the ``stack'' of suspended procedures waiting to be resumed.  These partial continuations are allocated in the HEAP in the general case, and linked by pointers of some kind that act as return addresses.  (Again, a given Scheme compiler may allocate some continuations in a stack for efficiency, but this will not work in the general case---in general, the Scheme ``stack'' (continuation chain) is NOT guaranteed to follow a true stack discipline, because of CALL-WITH-CURRENT-CONTINUATION.)

It is useful to consider a Scheme system to consist of a current machine state, probably a small set of state registers, plus a continuation chain on the heap.  The state registers include the program counter, which points to the currently-executing instruction, an environment register that points to the binding environment frame the current procedure is executing in, a continuation pointer that points to the top partial continuation, and a value register where procedures leave their values when they return.

A PARTIAL CONTINUATION holds several things.  In particular, it saves the current ENVIRONMENT POINTER, so that when a procedure is resumed, it can restore the appropriate execution environment.  It also saves the previous CONTINUATION; that is, it has to know how to continue if IT returns after it's resumed.  This is just a pointer to a previously-saved partial continuation, i.e., its caller's continuation.  Pushing a partial continuation involves saving the current value of the continuation register as part of the new partial continuation; this creates a linked list onto which partial continuations are pushed in a (usually) stacklike manner.

A PROCEDURE RETURN pops the top (partial) continuation off of the full continuation chain.  The old values of the execution environment, program counter, and next continuation are stored into the registers, and execution resumes.  Note that this popping of a (partial) continuation does NOT overwrite the values in the (partial) continuation.  They're still out there in the partial continuation object on the heap.  The only ``side effects'' are to the basic state registers.  The continuations on the heap thus continue to exist unless and until they are garbage collected, and their values never change---they are IMMUTABLE.  This will turn out to be important when we discuss call-with-current-continuation, below.
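As a purely conceptual aid (this sketch is added here; it is NOT the representation used by Scheme-48, RScheme, or any particular implementation, and the names are made up), you can picture a partial continuation as an immutable record of the saved state just described:

  ;; What a partial continuation conceptually saves.
  (define (make-partial-continuation saved-envt       ; caller's environment pointer
                                     return-address   ; where the caller resumes
                                     previous-cont)   ; the caller's own continuation
    (vector 'partial-continuation saved-envt return-address previous-cont))

  ;; "Pushing" a continuation builds a new record that points at the current
  ;; top of the chain; "popping" one reloads the state registers from the top
  ;; record WITHOUT modifying it---the record stays on the heap, immutable,
  ;; until the garbage collector reclaims it.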
Try to visualize the continuation chain on the heap, and the process of saving and restoring partial continuations. Note that since the continuations remain on the heap and are immutable, that means that the graph of continuations is really a tree, like the dynamic call graph. The current ``activation stack'' is really just a particular chain from a record in this tree up to the root. If we never garbage collected them, we'd have a complete tree of all the continuations we saved, and every state of the continuation chain would still be there, as a path from some partial continuation up to the root. The root of the tree would be an initial procedure invocation (e.g., the read-eval-print loop, which in turn calls each program). ------ SUBPROBLEMS vs. REDUCTIONS, and TAIL RECURSION Scheme is one of the very few languages that is guaranteed to be properly TAIL RECURSIVE. This allows Scheme programmers to implement loops as recursive procedure calls WITHOUT the system using an indefinite amount of memory for activation records. TAIL RECURSION is not only important for loops, however; it is more general than that. Consider the following procedure: (define (foo) (bar) (baz) (quux)) Note that when we call bar, we must save the state of foo, so that when bar returns, we can resume foo. We save the state by pushing a partial continuation onto the current continuation chain, indicating that when we return, we should return to the point where we call baz. Similarly, when we call baz, we must save the state of foo, indicating that when we return to foo we resume execution at the call to quux. NOTE THAT THIS IS A CALLER-SAVES CONVENTION. The caller saves its own state, rather than assuming that the callee will save it before clobbering it. This is not as lazy as a callee-saves convention, and in some cases is less efficient. (A given compiler can often use callee-saves techniques, but in the GENERAL case, caller-saves is required.) To understand why we need a caller-saves convention, consider the call to quux above. When we return from quux, the only thing left to do is to return from foo. There's nothing interesting about the state of foo, so there's no point in saving it. Rather than pushing a continuation, calling quux, having quux return, and then returning from foo, we can simply skip the pushing of the continuation and the return from foo. The call to quux will clobber the state of foo, but it won't matter, because that state isn't going to be used again. Then the return from quux will do nicely as a return from foo, because we skipped pushing foo's continuation. The calls to bar and baz are called SUBPROBLEMS, because they solve part of the problem foo solves, on the way to a whole solution. But the call to quux is a REDUCTION, because the problem solved by (foo) REDUCES TO the problem solved by (quux), once we've gotten to that point in the computation. NOTE THAT THIS IS A PERVASIVE ASPECT OF SCHEME. It doesn't just apply in simple cases like the one above. It also works for conditional constructs, even nested ones, like this: (define (foo) (if (expr1) (if (expr2) (bar) (baz)) (quux))) Depending on the values of (expr1) and (expr2), we will execute (bar), (baz), or (quux), but they're ALL reductions---whatever value they return will be the value of the call to foo. Scheme implementations are required to implement these as tail-calls, without pushing a partial continuation onto the continuation chain. 
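As an added illustration of the subproblem/reduction distinction (these two procedures are not from the original notes; the names are made up):

  ;; The recursive call in sum-below is a REDUCTION (a tail call): once we
  ;; reach it, there is nothing left for this invocation to do, so no partial
  ;; continuation needs to be pushed, and the loop runs in constant space.
  (define (sum-below n acc)
    (if (= n 0)
        acc
        (sum-below (- n 1) (+ acc n))))

  ;; The recursive call in sum-list is a SUBPROBLEM: after it returns, we
  ;; still have to add (car lst) to its result, so the caller's state must
  ;; be saved by pushing a partial continuation for each element.
  (define (sum-list lst)
    (if (null? lst)
        0
        (+ (car lst) (sum-list (cdr lst)))))

  (sum-below 10 0)       ; => 55, using bounded continuation space
  (sum-list '(1 2 3 4))  ; => 10, pushing a continuation per element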
Compilers for other languages MAY do this as an optimization; e.g., the GNU C compiler often does this ``tail call optimization,'' but in Scheme it's NOT an optimization, it's part of the defined semantics of the language.  The main reason it's required is so that you can implement LOOPS as RECURSION.  In fact, the semantics of Scheme's loop constructs are DEFINED AS SPECIAL CASES OF RECURSION.

Consider the following loop construction, called a NAMED LET.  (KNOW THIS SYNTAX!)

  (let myloop ((i 0))         ; loop var. i initialized to 0
    (if (< i 10)              ; loop test
        (begin                ; begin just sequences its subforms
          (display i)
          (myloop (+ i 1))))) ; tail call to loop

A named let sets up a loop variable (or several loop variables), in this case i.  The variable i is initially bound to the value of the initialization expression, namely 0.  Then the loop body is executed.  If the name of the let is used as a procedure, and called, as in the last line, the loop iterates.  The argument(s) passed by this call become the value(s) of the loop variable(s).  (Note that BEGIN just executes its subforms in sequence, like a BEGIN...END block in Algol or {...} in C.  KNOW THIS SYNTAX.)

In this case, every time we iterate, we pass the next iteration a new value for i, which is 1 greater than the current value of i.  For the first 10 iterations, we'll print out the value of i and iterate.  On the last iteration, the test will fail and the loop will terminate.

Why the peculiar syntax?  The reason is that each iteration of the loop is conceptually a PROCEDURE CALL, and the new value(s) of the loop variable(s) are passed as arguments to the procedure.  This is NOT just a syntactic cuteness.  Scheme's loops are defined to have exactly the same semantics as equivalent procedure calls.  For example, we could say (set! myproc myloop) and actually capture the loop procedure in a variable, then call it from some other context.

The above named let form is EXACTLY equivalent to the following:

  (letrec ((myloop (lambda (i)
                     (if (< i 10)
                         (begin
                           (display i)
                           (myloop (+ i 1)))))))   ; iterate
    (myloop 0))   ; start the loop, passing the initial value as the argument

Recall that LETREC is a lot like LET, but the initialization expression(s) are evaluated in the NEW environment created by the form.  In this case, that's so that the lambda expression can see the variable myloop, created by the letrec.  Since myloop can see its own name, it can call itself.

What's going on here is that we're creating a variable named myloop, and creating a procedure closure to put in it.  The procedure closure will act as the body of the loop, accepting a new value of the loop variable i each time it's called.  We start off the loop by calling the procedure from the body of the letrec, starting it off with the argument that we get from the initial value clause of the named let.

NOTE THAT THIS DOES NOT INVOLVE ASSIGNMENT!  When we iterate, we do a procedure call.  This creates NEW BINDINGS for the loop variables.  It does NOT re-use the old bindings by simply side-effecting them with the new values.  That's because the semantics of named let are defined to be precisely the same as those of a tail-calling letrec, as above.  So, for example, the code for the loop body might create closures that capture the loop variables.  Each iteration would create a new set of loop variables, capturable by any closure-creating operations (lambdas) in the body, as in the example below.
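Here is a small added example of that (not from the original notes): each iteration's binding of i is captured by a different closure, so the closures remember different values.

  (define closures
    (let loop ((i 0) (acc '()))
      (if (= i 3)
          (reverse acc)
          (loop (+ i 1) (cons (lambda () i) acc)))))

  ((car closures))    ; => 0
  ((cadr closures))   ; => 1
  ((caddr closures))  ; => 2

If iteration re-used a single mutated binding of i instead of creating fresh bindings, all three closures would see the same final value; the fresh-binding semantics is what makes the results differ.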
(Again, smart compilers may do something different in the usual case.  Even the RScheme compiler recognizes simple loops, and smarter compilers can optimize away most of the per-iteration environment creation.  But in the general case, you're allowed to write code that captures loop variable bindings, and in those cases the compiler must guarantee that the bindings are there to capture.)

One of the reasons that Scheme's iteration constructs are defined this way is to make programs easier to reason about---it is easier to prove characteristics of programs that re-bind variables than those of programs that side-effect variables.

The point of the above example is that the loop will use bounded space, even if we changed the condition so that it would loop infinitely.  We would create a binding environment for each iteration, but only one will ever be live at a time, so all but one of them will be reclaimed at any garbage collection.  They will NOT fill up the heap with unreclaimable junk.  We will also NOT create a huge number of unreclaimable partial continuations---since we don't push a continuation before each call, the continuation chain doesn't grow.  Iterating in this way MAY cause garbage collections, but it WON'T cause you to build up a huge amount of LIVE data, which could eventually defeat the garbage collector and crash the system.

------ CALL-WITH-CURRENT-CONTINUATION

Call-with-current-continuation is a very powerful control construct, which can be used to create more conventional control constructs, like catch and throw in Lisp (or setjmp and longjmp in C), or coroutines, and many more.  It is extremely powerful because it allows a program to manipulate its own control ``stack'' so that procedure calls and returns needn't follow the normal depth-first textual call ordering.

Recall that we said (above) that Scheme's equivalent of an activation stack is really a chain of partial continuations (suspension records), and this chain is known as a full continuation.  And since continuations are immutable, they usually form a tree reflecting the call graph (actually, only the non-tail calls).  Normally, the parts of this tree that are not in the current continuation chain are garbage, and can be garbage collected.

If you take a pointer to the current continuation, and put it in a live variable or data structure, however, then that continuation chain will remain live and not be garbage collected.  That is, you can ``capture'' the current state of the stack.

If you keep a captured state of the stack around, and later install the pointer to it in the system's continuation register, then you can return through that continuation chain, just as though it were a normal continuation.  That is, rather than returning to your caller in the normal way, you can take some old saved continuation and return into that instead!

You might wonder why anybody would want to do such a weird thing to their ``stack,'' but there are some useful applications.  One is COROUTINES.  It is often convenient to structure a computation as an alternation between two different processes, perhaps one that produces items and another that consumes them.  It may NOT be convenient to turn either of those processes into a subroutine that can be called once to get an item, because each process may have complex state encoded into its control structure.  (You probably wouldn't want to have to structure your program as a bunch of incremental operations that were called by successive calls to a do-next-increment routine.  It may be that the program it gets its data from can't easily be structured that way, either.
------

CALL-WITH-CURRENT-CONTINUATION is a very powerful control construct, which can be used to create more conventional control constructs, like catch and throw in Lisp (or setjmp and longjmp in C), or coroutines, and many more. It is extremely powerful because it allows a program to manipulate its own control ``stack'' so that procedure calls and returns needn't follow the normal depth-first textual call ordering.

Recall that we said (above) that Scheme's equivalent of an activation stack is really a chain of partial continuations (suspension records), and this chain is known as a full continuation. And since continuations are immutable, they usually form a tree reflecting the call graph (actually, only the non-tail calls). Normally, the parts of this tree that are not in the current continuation chain are garbage, and can be garbage collected.

If you take a pointer to the current continuation, and put it in a live variable or data structure, however, then that continuation chain will remain live and not be garbage collected. That is, you can ``capture'' the current state of the stack. If you keep a captured state of the stack around, and later install the pointer to it in the system's continuation register, then you can return through that continuation chain, just as though it were a normal continuation. That is, rather than returning to your caller in the normal way, you can take some old saved continuation and return into that instead!

You might wonder why anybody would want to do such a weird thing to their ``stack,'' but there are some useful applications. One is COROUTINES. It is often convenient to structure a computation as an alternation between two different processes, perhaps one that produces items and another that consumes them. It may NOT be convenient to turn either of those processes into a subroutine that can be called once to get an item, because each process may have complex state encoded into its control structure. (You probably wouldn't want to have to structure your program as a bunch of incremental operations that were called by successive calls to a do-next-increment routine. It may be that the program it gets its data from can't easily be structured that way, either. Each program should probably be written with its own natural control structure, each suspending its operation when it needs the other to do its thing.)

COROUTINES allow you to structure cooperating subprograms this way, without making one subservient to (and callable piecemeal from) another. Coroutines can be implemented as operations on continuations. When a coroutine wants to suspend itself and activate another (co-)routine, it simply pushes a partial continuation to save its state, then captures the value of the continuation register in some way, so that it can be restored later. To resume a suspended routine, the continuation chain is restored and the top partial continuation is popped into the system state registers. Alternation between coroutines is thus accomplished by saving and restoring the routines' continuations.

Note that in this case, we can have TWO (or more) trees of continuations that represent the course of the computation, and that control flow can alternate back and forth between trees. Usually, computations are structured so that most of the work is done in the usual depth-first procedure calling and returning, with occasional jumps from one routine's depth-first activity to another's.

Another use of continuations is to implement CATCH and THROW, which are roughly like setjmp and longjmp in C. The idea is that you may want to abort a computation without going through the normal nested procedure returns. In a normal stack-based language (without continuations), this is usually accomplished by storing a pointer into the stack before starting the abortable computation. If it is necessary to abort the computation, all of the activation records above the point of call can be ignored, and the stack pointer can be restored to that point, just as though all of the invocations above it had returned normally. This can be implemented with call-with-current-continuation by saving the continuation at the point where an abortable computation is begun. Anywhere within that computation, that continuation may be restored (clobbering the ``normal'' value of the continuation register, etc.) to resume from that point.

But call-with-current-continuation is more powerful than coroutines or catch and throw. Not only can we escape ``downward'' from a computation (by popping multiple partial continuations at once without actually returning through them), we can also escape ``upward'' BACK INTO a computation that we bailed out of before. This can be useful in implementing EXCEPTION HANDLING, where we may want to transfer control to a special coroutine that may ``fix up'' an error that was detected, but then resume the procedure that encountered the error, after the problem is fixed.

Call-with-current-continuation can also be used to implement BACKTRACKING, where the control flow backs up and re-executes from some saved continuation. In this case, we may save the continuation for some computation, but go ahead and return through it normally. Later, we may restore the saved continuation and return through it AGAIN. Note that in general, continuations in Scheme can be used MULTIPLE times. The essential idea is that rather than using a stack, which dictates a depth-first call graph, Scheme allows you to view the call graph AS A GRAPH, which may contain cycles, even directed cycles (which represent backtracking).
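Here is a tiny self-contained sketch (the names are mine) of returning through the same continuation more than once; the calling conventions of call-with-current-continuation itself are spelled out below.

   (define (twice-through)
     (let ((resume #f)    ; will hold the captured escape procedure
           (count 0))
       (let ((n (call-with-current-continuation
                 (lambda (k)
                   (set! resume k)   ; save the escape procedure
                   0))))             ; first time through, n is 0
         (set! count (+ count 1))
         (display n)
         (newline)
         (if (< count 2)
             (resume (+ n 1))   ; return through the SAME continuation again
             'done))))

   (twice-through)   ; prints 0, then 1, and finally returns done

The second pass through the body is reached not by calling twice-through again, but by returning through the captured continuation a second time, with a new value for n.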
The SYNTAX of call-with-current-continuation is fairly ugly, but for some good reasons; in its raw form, it is very powerful, but correspondingly hard to use. Typically, it is encapsulated in macros or other procedures to implement other, higher-level control constructs. Call-with-current-continuation is itself a normal first-class procedure, which encapsulates the very low-level continuation-munging abilities in something like a civilized package. Since it's a first-class procedure, you can write higher-order procedures that treat it like data, or call it, or both.

Call-with-current-continuation is a procedure of exactly one argument, which is another procedure to execute after the current continuation has been captured. The current continuation will be passed to that procedure, which can use it (or not) as it pleases. The captured continuation is itself packaged up as a procedure, also of one argument. That's so that you can't muck around with the continuation itself in any data-structure-like way. There are only two things you can do with continuations --- capture them and resume them. Continuations are captured by executing call-with-current-continuation, which creates an ESCAPE PROCEDURE. They are resumed by EXECUTING the ESCAPE PROCEDURE.

So: call-with-current-continuation is a procedure of one argument, which is some procedure we want to call; we'll call that argument the ABORTABLE procedure. The abortable procedure must ALSO be a procedure of exactly one argument. (If you want to call a procedure that takes a bunch of arguments, and still make it abortable using call-with-current-continuation, you have to use a trick described below.) The abortable procedure's argument is the ESCAPE procedure that encapsulates the captured continuation --- the continuation that can resume the computation just beyond the call to call-with-current-continuation.

As if this weren't cryptic enough, the escape procedure is ALSO a procedure of exactly one argument. When the escape procedure is executed, it resumes the saved continuation, just beyond the point where call-with-current-continuation was executed. That call (to call-with-current-continuation) may be expected to return a value, however; the argument to the escape procedure is the value that will be returned as the value of the call. Note that if the escape procedure is NOT executed, and the abortable procedure returns normally, the value it returns is returned as the value of the call to call-with-current-continuation.

Consider the following example. (Assume do-something and do-some-more are procedures defined elsewhere.)

   0:  (define some-flag #f)

   1:  (define (my-abortable-proc escape-proc)
   2:    (display "in my-abortable-proc")
   3:    (if some-flag
   4:        (escape-proc "ABORTED"))
   5:    (display "still in my-abortable-proc")
   6:    "NOT ABORTED")

   7:  (define (my-resumable-proc)
   8:    (do-something)
   9:    (display (call-with-current-continuation my-abortable-proc))
   10:   (do-some-more))

   11:  (my-resumable-proc)

At line 11, we call my-resumable-proc. It calls do-something, and then calls display. But before it calls display, it has to evaluate its argument, which is the call to call-with-current-continuation. Call-with-current-continuation saves the continuation at that point, and packages it up as an escape procedure. The escape procedure, if executed, will return its argument as the value of the call-with-current-continuation form. That is, if the escape procedure is called, it will resume execution of the display procedure, which prints that value, and then execution will continue, calling do-some-more.
Once call-with-current-continuation has created the escape procedure, it calls its argument, my-abortable-proc, with the escape procedure as ITS argument. My-abortable-proc then displays (prints) "in my-abortable-proc". Then it checks some-flag, which is false, so it does NOT evaluate the consequent of the if --- that is, it doesn't execute the escape procedure. It continues executing, displaying "still in my-abortable-proc". It then evaluates its last body form, the string "NOT ABORTED", which evaluates to itself, and returns that as the normal return value of the procedure call. At this point, the value returned from my-abortable-proc is printed by the call to display in line 9.

But suppose we had set some-flag to #t instead of #f. Then when control reaches line 3, the if DOES evaluate its consequent. This calls the escape procedure, handing it the string "ABORTED" as its argument. The escape procedure resumes the captured continuation, returning control to the caller of call-with-current-continuation, without executing lines 5 and 6. The escape procedure returns its argument, the string "ABORTED", as the value of the call-with-current-continuation form. That resumes execution of my-resumable-proc at line 9, handing display the string "ABORTED" (as the value of its argument form). At this point "ABORTED" is displayed, and execution continues at line 10.

Often we want to use call-with-current-continuation to call some procedure that takes arguments other than an escape procedure. For example, we might have a procedure that takes two arguments besides the escape procedure, thus:

   (define (foo x y escape)
     ...
     (if (= x 0)
         (escape 'ERROR))
     ...)

We can fix this by CURRYING the procedure, making it a procedure of one argument. Suppose we want to pass 0 and 1 as the values of x and y, as well as handing foo the escape procedure. Rather than saying

   (call-with-current-continuation foo)

which doesn't pass enough arguments to the call to foo, we say

   (call-with-current-continuation
     (lambda (escape) (foo 0 1 escape)))

The lambda expression creates a closure that does exactly what we want. It will call foo with arguments 0, 1, and the escape procedure created by call-with-current-continuation.
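As a final illustration of wrapping call-with-current-continuation in a friendlier package, here is a small sketch (the procedure find-first and its use are my own, not part of the standard) of a catch/throw-style early exit: we search a list, and escape from the traversal as soon as a match is found.

   (define (find-first pred lst)
     (call-with-current-continuation
      (lambda (return)                  ; return is the escape procedure
        (for-each (lambda (x)
                    (if (pred x)
                        (return x)))    ; "throw": abandon the rest of the loop
                  lst)
        #f)))                           ; normal return value if nothing matched

   (find-first even? '(1 3 4 5 6))   ; => 4
   (find-first even? '(1 3 5 7))     ; => #f

If no element satisfies pred, for-each returns normally, and the #f that follows it becomes the value of the call to call-with-current-continuation, and hence of find-first.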