Study guide for 386L (Advanced Programming Languages), Part I

---
Copyright 1994 Paul R. Wilson.  Until 1998, you may use these notes freely
for noncommercial purposes, as long as this notice is preserved in all
copies.  After January 1, 1996, you must obtain permission from the author
to copy or redistribute.
---

* "Object" can mean several things.  In general, data objects include C structs, Pascal records, and so on.  In the context of OBJECT-ORIENTED PROGRAMMING, "objects" have both state (the data component in the more general sense of "object") and BEHAVIOR, a procedural component.  (In Scheme, Lisp, Smalltalk, and Self, most objects are composed of some number of VALUE CELLS, machine word-sized units of storage that are tagged to say what they hold --- immediate values, or pointers to objects on the heap.)

* First-class data objects are any objects (in the general, not necessarily object-oriented, sense) that are represented as data objects (with object identity) that you can manipulate within a program.  They can be treated as data in the sense of being passed as arguments to (or returned as values from) procedures or methods.  They can be stored into data structures such as lists, and so on.  The essential property of first-class objects is OBJECT IDENTITY, which means that there is a distinct representation of each first-class object AT RUN TIME.  (In some strongly-typed languages, however, you can't compare objects for identity if they are not of the same type.)

* Scheme procedure closures are first class, as are cons cells (also known as dotted pairs, or just pairs), vectors, symbols, characters, booleans, and numbers.  In Scheme, all of these things can be compared for equality in the sense of OBJECT IDENTITY --- that is, you can always tell one from another, even if they have the same values for their components.  So for example, two Scheme cons cells can be compared for identity, and you can tell whether they are the EXACT SAME OBJECT, even if all of their car and cdr fields have identical values.  This is generally implemented as a comparison of pointer values---the addresses of the cons cells, not the contents of the cons cells.  Similarly, two closures can be compared for identity---even if they are closures of the same procedure, and even closures of the same procedure in the same environment, each has its own identity that can be compared against others.  Again, this is generally done by comparing the addresses of the two closures, not their structures.  (Note that the structure of a Scheme closure is not defined in the standard---about all you can count on is its behavior when you call it, its type (closure), and its object identity.)

Note that object identity is usually compared by comparing bit patterns stored in a value cell.  Immediate types' representations are the bit patterns themselves, while heap-allocated types are represented as tagged pointers.  Either way, the bit pattern in the value cell can be tested against another by simple bitwise equality---there is exactly one representation for the integer 1 (a bit pattern like 00000001, plus some tag bits saying it's an integer); there is also exactly one bit pattern representing any particular object on the heap---a bit pattern representing the object's address, plus a tag saying it's a pointer to the object.  The Scheme procedure used to compare objects in this way is called "eq?".

* Note that in Scheme, symbols like "foo" and "bar" can exist even if there are no variables by those names.
Symbols are first-class objects used as identifiers.

* You should understand how expressions are evaluated in Scheme.  An expression that consists of a variable name, not in parentheses, is assumed to be a variable reference.  The current binding of that variable (in the environment of execution---see below) is looked up, and its value is fetched.  An expression that consists of a number or other "self-evaluating" object simply evaluates to that object.  So the value of the variable foo may be 1, but the value of 1 is also 1.  An expression in parentheses is a macro expression, a special form, or a procedure call.

   * If the first subexpression is the name of a macro, it is a macro expression.
   * If the first subexpression is the name of a special form, it is a special form.
   * Otherwise, the first subexpression is assumed to evaluate to a procedure that can be executed.

Review your notes about evaluation and compilation.  Note that procedure closures are first class, which is why the first subexpression of a procedure call can be an arbitrary computation that returns a closure object.  Macros and special forms are NOT first-class.

* Macros and special forms are NOT first class, because the compiler typically does all macroexpansion at compile time.  All of the macros are expanded (recursively, if necessary) so that the resulting expression is entirely in terms of procedure calls and special forms before it is actually compiled.  The compiler understands procedure calling, and has specialized knowledge of each special form.  Macros are expanded so that the compiler need not know anything about them after the macroexpansion phase.  That is, when you are defining a macro, you are defining a transformation between the syntactic forms the programmer will see (the macro calls) and something the compiler understands very well --- procedure calling and a handful of special operations.

* SPECIAL FORMS are required to limit evaluation of subforms.  The special form "quote" does ONLY this---it takes an argument, but the argument is not evaluated.  (quote foo) evaluates to the symbol foo, not the value of the variable foo.  This can't be done with a simple procedure, because normal procedures have their arguments evaluated BEFORE the procedure is called.  The procedure can't suppress the evaluation of an argument, because argument evaluation happens BEFORE the procedure gets control.

* Other special forms you should know (names in angle brackets are placeholders for expressions you supply):

  (define <varname> <value-expr>)
                        ; <varname> is assumed to be a symbol, which is
                        ; NOT evaluated.  A binding for a variable by
                        ; that name is created in the current
                        ; environment, and initialized with the value
                        ; of <value-expr>, which IS evaluated normally.
                        ; (See also the alternative define syntax
                        ; described later.)

  (lambda (arg1 ... argn)
    <body-expr>         ; creates a procedure closure that will execute
    ...                 ; in the environment where this lambda
    <body-expr>)        ; expression is evaluated, i.e., lambda CAPTURES
                        ; the CURRENT environment and pairs it with a
                        ; procedure that lives in that environment.

  (set! <varname> <value-expr>)
                        ; NOTE that <varname> is not evaluated---we're
                        ; not fetching the value from a variable binding
                        ; and operating on that, we're operating on the
                        ; binding itself.  The second argument,
                        ; <value-expr>, IS evaluated NORMALLY.

  (if <condition>       ; <condition> is always evaluated
      <then-expr>       ; this is evaluated if <condition> is not false
      <else-expr>)      ; and this is evaluated if <condition> IS false

  (let ((var1 <init-expr1>)   ; init-exprs are evaluated in the
        ...                   ; ENCLOSING environment, then the
        (varn <init-exprn>))  ; variables are bound to fresh value cells
    <body-expr>               ; (extending the envt), and initialized.
    ...                       ; Then the body expressions are evaluated
    <body-expr>)              ; in sequence, and the value of the last
                              ; body-expr is returned as the value of
                              ; the whole LET form.  (Note that the new
                              ; environment is then exited!)
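The if form above is a good place to see why special forms can't be ordinary procedures.  Here is a small added demonstration (my-if and loud are made-up names; this example is not in the original notes):

  ;; my-if is an ORDINARY procedure, so its arguments are evaluated
  ;; before it ever gets control.
  (define (my-if condition then-val else-val)
    (if condition then-val else-val))

  (define (loud x)      ; prints its argument, then returns it
    (display x)
    (newline)
    x)

  (if #f (loud "then") (loud "else"))     ; prints only "else"
  (my-if #f (loud "then") (loud "else"))  ; prints BOTH "then" and "else",
                                          ; because both arguments were
                                          ; evaluated before my-if was called

The real if still returns the right value either way, but my-if cannot suppress the evaluation (and side effects) of the branch that isn't taken; only a special form or macro can do that.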
LETREC is like LET except for one detail.  With a LETREC, the new environment is created BEFORE the initialization expressions are evaluated, and the initialization expressions are evaluated IN THE NEW ENVIRONMENT, not in the enclosing environment.  This lets you define local variables whose values are procedures that can call each other, etc., e.g.,

  (letrec ((foo (lambda (x) ... (bar 1) ...))
           (bar (lambda (y) ... (foo 2) ...)))
    (foo 0))

In the above code, we use letrec instead of let so that the lambdas will be evaluated in the new environment, and the procedure closures can see each other.  If we used LET instead, the call to bar from foo (and the call to foo from bar) would refer to the bindings of foo and bar in the enclosing environment.

* A VARIABLE is an occurrence in a program of a symbol that is used to name something.  So in the fragment

  (define (foo x y)
    (write (+ x y))
    (let ((x 1))
      (write (+ x y)))
    (let ((x 2))
      (write (+ x y))))

there are THREE DIFFERENT variables named x.  Note that until the procedure foo is called, this does not create any particular variable BINDINGS.

* A VARIABLE BINDING is an instance of a particular variable, "binding" the variable name TO a particular piece of memory (not a value!) in a particular BINDING ENVIRONMENT.  When the procedure foo, above, is called, a variable binding is created to hold the argument variable x.  When the first let expression is evaluated, another BINDING ENVIRONMENT is created, having a new binding of the inner variable x.  When the second let is evaluated, yet another binding environment is created, binding a different variable x.  Note that each time one of the lets is evaluated (entered), a NEW environment is created, but it is an EXTENSION of the environment created by calling foo, binding the arguments x and y.  Note that the expressions evaluated in these environments see the new binding of x, which overrides, or SHADOWS, the (argument) binding created by the call to foo.  (Keep in mind that arguments are just local variables that are implicitly initialized to the values passed by a procedure call.)

* Sometimes people use the word "variable" to mean a particular variable binding, because it's easier to talk about "this variable foo" than to always say "this variable binding of the variable foo in this particular environment."  You should get used to this sloppy usage, but know what's going on.

* Lambda creates a procedure, and thus may define variables as arguments or other local variables of the procedure, but does NOT directly create any BINDINGS.  The following form, when evaluated at the top level, creates a variable binding (which binds the variable "double").  It also evaluates the argument expression (the lambda expression) to create a procedure closure (in the top-level environment).  The closure is the initial value of the variable "double."

  (define double (lambda (x) (+ x x)))

(Another confusing terminological point is that some people use the term "lambda" as a shorthand for "the closure created by evaluating this lambda expression.")

* Suppose we have evaluated the above form to define a variable named double (whose value is a procedure closure taking one argument...).  If we now evaluate the form

  (double 2)

we are calling the procedure closure with an argument of 2.  The first thing the procedure does is bind the argument as the variable x, in a new binding environment, and evaluate the expression (+ x x) in that environment.
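To make the shadowing described above concrete, here is the earlier fragment actually called; the call and the printed values are added for illustration and are not in the original notes:

  (define (foo x y)
    (write (+ x y))      ; sees the argument binding of x
    (let ((x 1))
      (write (+ x y)))   ; sees the first let's binding of x
    (let ((x 2))
      (write (+ x y))))  ; sees the second let's binding of x

  (foo 10 20)            ; writes 30, then 21, then 22

Each let creates a new binding of x that shadows the argument binding for the expressions in its body; the argument binding of y is visible throughout, because nothing shadows it.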
* Scheme provides an alternative syntax for defining variables whose initial values are procedures.  Rather than writing

  (define double (lambda (x) (+ x x)))

we can also write

  (define (double x) (+ x x))

where the first argument to define looks like a list, not a symbol.  This reflects the syntax of a procedure call (e.g., the above call "(double 2)").  NOTE THAT THE SEMANTICS OF THE TWO FORMS ARE EXACTLY THE SAME, AND THEY SHOULD BOTH BE THOUGHT OF AS DEFINING A VARIABLE WHOSE INITIAL VALUE IS A PROCEDURE CLOSURE.  The explicit "lambda" may be easier to understand, because it clearly shows the closure creation.  The other form is merely for convenience.  (In particular, either way we've bound a variable named double.  We can say (set! double 'foo) if we want, and install a new value---in this case, the symbol foo---in that variable binding.)

* Top-level variables can be confusing because the "define" form creates a variable with exactly one instance---the top-level variable by that name.  So saying

  (define baz 22)

defines a variable named baz, and BINDS it in the top-level environment---i.e., there is actually space allocated to hold a value.  On the other hand, if we say

  (define (fubar) (let ((baz 6)) ...))

or

  (define fubar (lambda () (let ((baz 6)) ...)))

then we've named a variable baz---the let variable---but it isn't BOUND until we actually call fubar and the let is entered, creating a new binding (i.e., a new location named baz) and initializing it to the value 6.  Note that the binding and initializing (of a new binding of baz) happens at EACH call to fubar.

* Internal defines (that is, defines scoped inside a variable-binding form like let or a procedure-defining define) are treated specially.  In standard Scheme, you can't really define a new variable in an arbitrary binding environment.  The language constrains the way internal defines work, so that implementations can actually transform them into letrec and lambda.  So, for example,

  (define (foo x)
    (define (bar y) (some-computation y))
    (define (baz z) (some-other-computation z))
    (yet-another-computation (bar 1) (baz 2)))

can be transformed into

  (define (foo x)
    (letrec ((bar (lambda (y) (some-computation y)))
             (baz (lambda (z) (some-other-computation z))))
      (yet-another-computation (bar 1) (baz 2))))

Top-level defines are "real", but inner defines are changed (by macro expansion or whatever) into letrecs before actually being compiled.  The restriction that supports this is that all internal defines must come at the beginning of a code sequence, before any actual executable forms.

* Evaluating a let expression creates a binding environment (not a closure).  The expressions making up the body of the let are evaluated in the new environment, and the value of the last of those expressions is returned.  So for example,

  (let ((a 2) (b 3))
    (set! a 3)
    (* a b))

returns the integer 9.  Entering the let binds the variables a and b to fresh locations, and initializes their values to 2 and 3, respectively.  Then the two body expressions are evaluated in sequence.  The first, (set! a 3), assigns a new value, 3, to a.  The second multiplies a and b, and returns the value 9.  This is the last expression in the body of the let, so that value is also returned as the value of the let expression as a whole.  NOTE THAT NO PROCEDURES OR CLOSURES ARE CREATED.  (But see below.)
* Let CAN be implemented in terms of lambda (closure creation) and procedure calling, because calling a procedure can also bind arguments.  A particular implementation MAY implement let as a macro that expands into a procedure call to a new closure created by a lambda.  The preceding example might be transformed into

  ((lambda (a b)
     (set! a 3)
     (* a b))
   2 3)

NOTE THE PARENTHESES!  This is a three-element list-like expression, whose first element is not a variable reference, but is itself a (closure-creating) lambda expression.  Its second and third elements are the arguments 2 and 3.  Evaluating this expression will evaluate the three subforms.  The lambda expression will be evaluated to make a BRAND NEW closure of the procedure.  2 and 3 will evaluate to themselves, because numbers are self-evaluating.  Then the new closure will be called, with the arguments 2 and 3.  THIS CALL implements the bindings of the variables (as arguments), and executes the set! and * expressions.

NOTE THAT THIS IS NOT NECESSARILY HOW LET IS IMPLEMENTED, and most implementations DON'T bother to create a closure and call it.  It's just a demonstration of the power of lambda and procedure calling, and a way of showing the semantics of the constructs.  While it's important to realize that lambda and procedure calling are sufficiently powerful that you don't NEED let, you should probably not generally think about let that way.  In fact, it's probably better to think of calling a closure as "doing something like let" to bind its arguments; procedure calling is a very powerful thing that includes the ability to bind variables---the arguments to the procedure.  So when you see an expression like

  (let ((foo 3))
    (write foo))

you should realize that a variable (foo) is bound, but no closures are created.  In some implementations a closure MAY be created, but it's not one that matters to the semantics of the program.

* Variable bindings are NOT first class.  You can't store a pointer to a variable (or a variable binding) in another variable, even though you CAN store a symbol (which may or may not name a variable or variables) in a variable.  Note that a symbol is globally unique---there is only one symbol foo, which has object identity.  There may be any number of strings whose values are "foo"---strings composed of the sequence of characters f, o, and o.  There may also be different variables named foo in different scopes, and for each of those scopes, there may be several actual bindings of the variables, due to multiple executions of the same code.

So, for example, if a procedure foo calls itself recursively, and foo has a local variable x, there will be a binding of x for each activation of foo.  Note that this corresponds to the notion of a LOGICAL VARIABLE, even if the value of a particular binding of x cannot vary because the program has no side effects.  In logical terms, x is a program variable, which is "variable" in the sense that different BINDINGS may have different values, NOT in the sense that a given binding's value may change over time!  It just so happens that in Scheme, the value stored in a variable binding (that is, a memory location with a given name in a given environment) CAN change over time, because we have the special form set!.  But even in FUNCTIONAL LANGUAGES (without side effects), we still talk about variables, because the location or value referred to by x, say, depends on which call to foo we're talking about.
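A few added examples of object identity and symbol uniqueness (the expressions and noted results are illustrations, not from the original notes; the results follow from the standard behavior of eq?, equal?, cons, and string-append):

  (eq? 'foo 'foo)                     ; => #t  -- there is only one symbol foo
  (equal? (string-append "fo" "o")
          (string-append "fo" "o"))   ; => #t  -- same characters
  (eq? (string-append "fo" "o")
       (string-append "fo" "o"))      ; => #f  -- two distinct string objects
  (eq? (cons 1 2) (cons 1 2))         ; => #f  -- two distinct cons cells
  (let ((p (cons 1 2)))
    (eq? p p))                        ; => #t  -- the EXACT SAME OBJECT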
* Scheme procedures you should know (and you should know why it's no problem that they're just plain procedures, not special forms).  THAT IS, SINCE THESE ARE ALL NORMAL PROCEDURES, ALL OF THEIR ARGUMENTS ARE ALWAYS EVALUATED!

  (cons <expr1> <expr2>)   ; creates a data object of type cons-cell,
                           ; with CAR field = the value of <expr1> and
                           ; CDR field = the value of <expr2>

  (car <pair-expr>)        ; returns the value of the CAR field

  (cdr <pair-expr>)        ; returns the value of the CDR field

  (list <expr1> ... <exprn>)
                           ; returns a list of the argument values

  (map <proc-expr> <list-expr1> ... <list-exprn>)
                           ; applies <proc-expr>'s value, a procedure
                           ; closure (of n arguments), to the elements
                           ; of the lists <list-expr1> through
                           ; <list-exprn>.  Returns a list of all of the
                           ; results of the calls to the procedure.
                           ; (That is, it MAPs the function over the
                           ; values of the argument lists, and returns a
                           ; list that is that function of those values.)

  (apply-to-all <proc-expr> <list-expr1> ... <list-exprn>)
                           ; like map, except that the procedure is
                           ; applied to the listed values for side
                           ; effects only, rather than returning a list
                           ; of values.  (In standard Scheme, this
                           ; procedure is called for-each.)

  (call-with-current-continuation <proc-expr>)
                           ; discussed in detail later in these notes

------ LEXICAL SCOPE, CLOSURES, AND INDEFINITE EXTENT ------

In this section I'll be overly specific and talk about particular compilation and code generation details that are relevant to Scheme-48 and RScheme, but not necessarily to every Scheme implementation.  Most of the details are common to most Scheme compilers, and all Scheme compilers have to do something to achieve the same semantics.

Scheme is lexically scoped, meaning that the structure of the binding environment that exists at runtime CORRESPONDS to the TEXTUAL nesting of environment-creating forms.  Scheme has a notion of the CURRENT binding environment.  In Scheme-48, binding environments are represented as hash tables, if they are TOP LEVEL environments, or as chains of BINDING ENVIRONMENT FRAMES, if they are environments nested in other environments.  (In RScheme, the compiler can often allocate variables in registers rather than in the heap, but that's an optimization we'll generally ignore for simplicity of explanation.)

In Scheme-48, the current environment is represented as a virtual machine register, REG_ENVT, which usually holds a pointer to the relevant binding environment frame.  (Top-level environments are treated specially at compile time, and don't need to be explicitly pointed to at run time.)

When we create a procedure closure (using lambda or the special syntax for define), the current binding environment (at the point of closure creation) is captured and paired with a procedure object to create the closure.  Suppose we execute the following form at the top level:

  (define my-counter
    (let ((count 0))
      (lambda ()
        (set! count (+ count 1))
        count)))

The define form creates a top-level variable named my-counter, and evaluates the let form to determine its initial value.  (Note: no procedures or closures have been created yet!)

Evaluating the let form binds the variable count---that is, it creates a NEW binding environment, binding the variable count to a fresh value cell.  The new environment is scoped inside the current (top-level) environment, and shares all of its variable bindings.  It also has the new binding for count, which SHADOWS (overrides) any previous bindings of variables named count.  (Note: we STILL haven't created any procedures or closures yet, just a binding environment!  Evaluating a let creates ONE binding environment.)

After creating the new binding environment, let evaluates the expressions in the let body---in this case there is just one---in the newly-created environment.  In RScheme, this is done by SAVING the current value of the environment register and (temporarily) setting it to point to the new environment frame.
Then the code for the body expression is evaluated, and the OLD value of the environment register is restored.  The value of the last let body expression is returned as the value of the whole let expression.  In this case there's only one, so the lambda expression is evaluated to return a closure, which is also returned as the value of the let.

At the point where the closure is created, the current environment is the new environment created by the LET.  THIS is the environment that is paired with the procedure representing the lambda.  At this point, we have created one new environment, binding count, and one new procedure closure, which captures that environment.

Now if we evaluate this form at the top level:

  (my-counter)

we can execute the procedure closure, which will perform the actions in the body of the lambda, IN THE ENVIRONMENT we created with the LET.  This will increment the variable count and return its value.

When the compiler generates code for the lambda body, it precomputes some of the information necessary to look up the location of the variable count.  At run time, rather than actually looking through a list of name/binding pairs, the compiled code can ASSUME that the environment will have a particular SHAPE.  In this case, the compiler can tell that the lambda body code will execute in an environment that has a single binding environment frame (binding one variable, count), scoped inside the top-level environment we're doing all this in.

Since this information is available at compile time, and the corresponding code to fetch the variable can be generated then, there is no need to SEARCH the environment at RUN TIME.  The compiler can simply emit code that will dereference the environment pointer (the environment register), with an appropriate offset to get the right variable out of the environment frame.

This technique is called LEXICAL ADDRESSING, and the information the compiler uses is called a LEXICAL ADDRESS.  A LEXICAL ADDRESS consists of two parts: a RELATIVE ENVIRONMENT FRAME NUMBER, and a SLOT NUMBER within that environment frame.  In this case, the lexical address of count is 0:0.  That is, the compiler knows that the binding of count will be in the first frame of the runtime environment, and its location in that frame will be the first one (i.e., offset 0).

----------------- DYNAMIC AND STATIC TYPING ----------------

In Scheme and other Lisp dialects, variables do not have types---VALUES do.  That is, a particular variable binding may hold an integer at some times, a list at others, and a symbol or boolean at other times.  This is called DYNAMIC typing, because the type of a variable is not fixed.  This is true of Smalltalk and Self as well, and also true of several languages we haven't covered yet, including Icon.

A major criticism of dynamic typing is that programmers often KNOW the types they intend for variables to hold, but can't tell the implementation about it.  In STRONGLY TYPED languages, variables (and their bindings) have types, so the compiler can tell at compile time what kinds of values are legal for those variables to hold.  This lets the compiler know what operations can be applied to a given variable's value at any point in the code---e.g., it will catch TYPE ERRORS at compile time.  This avoids runtime errors where the program attempts an operation that is illegal for the kind of value it's operating on.  Note that STRONGLY TYPED languages are not necessarily STATICALLY TYPED!
Strongly typed means (for this class, anyway) that the compiler knows which operations are supported by different variables' values, but it may not know the exact type of the value at compile time. So for example, in a language with SUBTYPING, it may be possible to have a variable of type number, but not know at compile time whether the number will be an integer or a floating-point number. The compiler can then make sure that it's legal to multiply the variable's value by another variable's value, but will not know at compile time how that will be accomplished---the variable may be STRONGLY but still DYNAMICALLY typed. STATIC typing is a stronger constraint, meaning that the compiler must be able to tell the actual (concrete, not abstract) type of a given value at any point in the code. This allows the compiler to emit the appropriate code for any operation without inserting any runtime checks. ------------------ OBJECT ORIENTATION ---------------------- * Object orientation has two important threads: * ENCAPSULATION which you should know from ABSTRACT DATA TYPES that support (abstract) operations, irrespective of how they're (concretely) implemented, and * INHERITANCE, the structured reuse of code, allowing implementations to be defined by specifying differences from pre-existing implementations (NOTE: this is actually a controversial interpretation of inheritance! In some systems, the stress is not on code reuse, but on strict SUBTYPING.) ENCAPSULATION supports programming by allowing multiple implementations of the same abstractions. It avoids letting client code at the "innards" of objects. This INFORMATION HIDING promotes flexibility, by allowing later decisions to change the way things are implemented WITHOUT BREAKING CODE THAT ALREADY WORKS. Modular, "plug compatible code" is the goal here. INHERITANCE, when used as a CODE REUSE mechanism, supports programming by allowing similar classes to share code. Often, existing classes can be changed slightly by adding or altering a few functions, to make them suitable for new purposes. INHERITANCE, when used as a SUBTYPING mechanism, often allows programmers to reason about programs at a higher level of abstraction, depending only on the aspects of objects that matter in a particular context. (Note that this is more closely related to ENCAPSULATION than to CODE REUSE.) * ENCAPSULATION means that the state of a data object can only be accessed through a well-defined INTERFACE--- a set of routines that can be called without depending on the internal representation of the object. The interface of a routine consists of the MESSAGES it ACCEPTS, which is another way of saying what GENERIC OPERATIONS it supports. In some systems, rather than talking about "message passing," we may talk about "virtual procedures." A particular class of objects has a particular implementation for a given message (virtual procedure). This class-specific implementation is known as a METHOD. * For the purposes of this class, VIRTUAL PROCEDURES are the same thing as MESSAGES. They define the INTERFACE of a CLASS, saying what GENERIC OPERATIONS are supported. * A MESSAGE SEND is the same thing as a VIRTUAL PROCEDURE CALL. The source code specifies an operation to be performed, but exactly how that is carried out depends on the type of the RECEIVER. * THE RECEIVER OF A MESSAGE IS THE PARTICULAR OBJECT WE SEND THE MESSAGE TO (apply the virtual procedure to), NOT ITS CLASS. The receiver is the particular instance. 
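To connect these terms to the Scheme used elsewhere in these notes, here is a small closure-based sketch of objects, messages, methods, and a receiver.  It is added for illustration only; make-point, get-x, get-y, and move! are made-up names, and this is not how any particular object system is actually implemented:

  ;; A "class" here is just a constructor returning a dispatching closure.
  ;; The closure is the object; the symbol passed in is the message; the
  ;; code chosen for that message is the method.
  (define (make-point x y)
    (lambda (message . args)
      (case message
        ((get-x) x)
        ((get-y) y)
        ((move!) (set! x (+ x (car args)))
                 (set! y (+ y (cadr args))))
        (else (error "point does not understand" message)))))

  (define p (make-point 1 2))   ; p is an instance
  (p 'get-x)                    ; => 1   -- a "message send"; p is the RECEIVER
  (p 'move! 10 10)
  (p 'get-y)                    ; => 12

(The error procedure is assumed to be available, as it is in most Scheme implementations.)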
* When we send a message to an object, the appropriate METHOD is looked up. (This may happen at compile time in some systems, at run time in others.) This is known as MESSAGE DISPATCHING. The system finds the appropriate METHOD for the class of the receiver, which is then executed. * INHERITANCE for CODE REUSE means that subclasses are created so that they can reuse their parents' IMPLEMENTATION code, NOT NECESSARILY TO CREATE SUBTYPES. So for example, an arbitrary-precision integer class might be created as a subclass of LIST, even though the class describes objects that are conceptually NUMBERS, NOT LISTS! This is common in dynamically typed object- oriented languages like Smalltalk and Self, as well as Lisps with object-oriented extensions. * INHERITANCE as SUBTYPING is a more principled approach to subclassing, preserving the conceptual notion of real classes of objects. If you believe inheritance should define true subTYPES, you would not make arbitrary-precision integers a subclass of list; you would make them a subclass of integer, which in turn is a subclass of number, even if this meant you couldn't re-use any code, and had to write entirely new methods. On this view, the inheritance hierarchy should define strict set-theoretic classes---the inheritance links should literally mean IS-A: each arbitrary-precision integer IS an integer, and each integer IS a number. (Not a list!) * Using inheritance for strict subTYPING means that the INTERFACES of different classes of objects (i.e., the messages they accept) have WELL-DEFINED relationships. In particular, there is a definite inclusion relationship between the interfaces of classes and those of their subclasses. To say that one class, say arbitrary- precision-integer, is a subclass (subtype) of another, say, integer, means that the subclass supports AT LEAST the operations specified in the superclass. So if you can multiply and add integers, you must be able to multiply and add arbitrary-precision-integers. * NOTE that the inclusion relationship between supertypes and subtypes (every element of the subtype is an element of the supertype) implies a reversed relationship between the interfaces of the two types. The more specific (sub)type's interface is BIGGER than the supertypes, because the subtype supports AT LEAST the supertype's operations. So you should remember that the SMALLER SET OF INSTANCES (the more specific type) is described by a LARGER set of criteria (the things that make it specific). This is just the normal kind of inclusion you expect from mathematical sets---the more criteria you have for set membership, the fewer things qualify as members of the set. * Using inheritance for CODE REUSE makes no such guarantees! If arbitrary-precision-integer is defined as a subclass of list, so that it can share implementation details with lists, that DOESN'T guarantee it will make sense to apply list operations to arbitrary-precision integers! This has important consequences when attempting to define generic types, as described later. * Suppose we have a class D-E-QUEUE that implements double-ended queues, supporting insertion and deletion at either end of a list. So we have the operations push (to push something onto the front of the queue), pop (to remove something from the front of the queue), add-last (to insert something at the end of the queue), remove-last (to delete at the end of the queue). If we're using inheritance for code reuse, and we need a STACK class, we could define it as a subclass of D-E-QUEUE. 
This is ideal, in that D-E-QUEUE already has the necessary functionality, and more.  We simply override the add-last and remove-last operations (either explicitly, if our language supports "does not implement" declarations, or implicitly, by just not using those operations).

The problem here, in terms of SUBTYPING, is that D-E-QUEUE is really a subclass of STACK, not the other way around.  All D-E-QUEUEs implement stack functionality, and NOT vice versa.  The subclass link does not respect the intuitive notion of set inclusion, because some instances of subclasses of D-E-QUEUE are in fact NOT double-ended queues.  You can't write code for the superclass and expect it to work with the subclass, because you can't assume that all of the operations of the superclass are supported by all of the subclasses.

----------------------- GENERICITY -------------------------

* GENERIC TYPES are really type CONSTRUCTORS, which define a family of types.  They are TEMPLATES for the construction of types.  (For the purposes of this class, at least, GENERIC TYPES are the same thing as PARAMETERIZED TYPES.  A generic type is conceptually a function that takes a TYPE as a PARAMETER and returns a concrete type---i.e., it defines a function from simpler concrete types onto more complex concrete types.)

* Most conventional programming languages have a very limited form of generic typing.  There are a couple of built-in generic types, like ARRAY and POINTER-TO, that allow you to construct specific types by specifying array-of-what and pointer-to-what.  So, in Pascal, you can define a record type "myrec", and instantiate the generic type ARRAY to give the CONCRETE type ARRAY OF myrec.  Similarly, you can instantiate the generic type POINTER and declare the concrete type POINTER TO myrec.

* Some languages, like Eiffel, Ada, ML, and the new version of C++, allow the programmer to define NEW generic types, like QUEUE OF T, where T can be any type, either built in or program-defined.

* It might be better to speak of generic CLASSES, rather than generic TYPES.  On the other hand, for compatibility with genericity, classes generally SHOULD correspond to types, and their subclasses should be strict SUBTYPES.  This ensures SUBSTITUTABILITY---you can always substitute an instance of a subclass anywhere that an instance of the superclass is expected.

Suppose we define a generic type QUEUE OF NUMBER, which is a queue whose elements can be of any one kind of number.  We can instantiate this class with any concrete number type, to get a QUEUE OF FLOAT, QUEUE OF INT, or whatever.  If we've defined ARBITRARY-PRECISION-INTEGER to be a subclass of NUMBER, then this will work fine with QUEUE OF NUMBER.  The compiler can check that any type we use supports the operations required by the generic type.  So if we instantiate the generic queue-of-number to give us the concrete type queue-of-arbitrary-precision-integer, the compiler can tell that arbitrary-precision-integers satisfy all of the criteria of numbers.

This allows us to define an operation on any kind of queue of number, such as sum-of-elements, that DEPENDS on the queue elements having a certain interface---e.g., supporting addition.

But suppose we defined arbitrary-precision-integer as a subclass of LIST (for the purpose of code reuse, since arbitrary-precision integers can be represented as lists of normal integers).
That is, suppose we DON'T ensure SUBSTITUTABILITY---conceptually, arbitrary-precision integers are numbers, and shouldn't be substituted for LISTS, even though they're subclassed from LIST.  Then the compiler doesn't know that arbitrary-precision integers are NUMBERS and support the NUMBER interface, and are NOT conceptually LISTS.  Code that expects some kind of LIST may apply LIST operations to arbitrary-precision integers, which will probably not make sense.  At the very least, it will violate encapsulation, allowing code to depend on the list representation of arbitrary-precision integers, as opposed to their interface (being a kind of number).

--------------------- POLYMORPHISM ------------------------

* POLYMORPHISM is a GENERAL TERM that ENCOMPASSES both subclassing and genericity, as well as dynamic typing.  All it means is that you can write code that is general and applies to more than one kind of thing.  (polymorphism = poly (many) + morph (shapes) + ism (you know, ism)).

* GENERICITY is STATIC (compile-time) polymorphism.  You write code for queues in general, but you instantiate that generic type to create queues of particular kinds of things.  So, for instance, in ML you can have lists of integers or lists of floats, but NOT a list of integers and floats.  The compiler must be able to infer the concrete type of all of the list elements at compile time.

* DYNAMIC TYPING is RUNTIME POLYMORPHISM.  In Scheme or Smalltalk or Self, any variable can hold any type of value, so we can write code like this:

  (define foo #f)           ; foo holds a boolean
  (set! foo (cons 1 '()))   ; now foo holds a cons cell
  (set! foo 6)              ; now foo holds an integer

More importantly, it allows you to write routines like MAP and APPLY-TO-ALL that are VERY general.  They can be used either to operate on HOMOGENEOUS COLLECTIONS (whose elements are all of the same type, like a list of integers), or on HETEROGENEOUS COLLECTIONS (whose elements are of different types, like a list of integers and floats).

* Some languages (like Eiffel) allow combinations of static polymorphism and dynamic polymorphism.  They may CONSTRAIN polymorphism so that the compiler can check more things at compile time to avoid run-time errors.  So in Eiffel, you may have a method for summing a list of numbers, and the compiler will allow you to use it on lists of integers, or lists of floats, but NOT lists of ASCII characters.

* Eiffel generally assumes that when you define a subclass, you mean to define a subtype, and it will let you substitute an instance of the subtype where the program specifies the supertype.  Some languages directly support a DISTINCTION BETWEEN subclassing-as-SUBTYPING and subclassing-for-CODE-REUSE.  Then the compiler can allow substitutability for subtypes, but NOT for arbitrary subclasses, if they're not explicitly subtypes.

--------------- INTERACTIONS BETWEEN GENERICITY AND SUBTYPING

Suppose we have a type T and another type, S, which is a subtype of T.  (For this example, S could be a subclass, but we assume inclusion is preserved.)  Now suppose we instantiate a generic type, say LIST-OF, with the type parameter T, to yield LIST-OF-T, and we instantiate it again with the type parameter S, to yield LIST-OF-S.

What is the relationship between LIST-OF-T and LIST-OF-S?  Is the latter a subtype of the former?  NO!  LIST-OF-T has an interface that requires T's, where LIST-OF-S has a more specific interface, which requires S's, not just any old T.
So for example, if T is Number, and S is UnsignedShortInt, LIST-OF-UnsignedShortInt is NOT a subclass of LIST-OF-Number, because it doesn't support all of the operations that LIST-OF-Number does, because the types of its arguments are MORE RESTRICTIVE. E.g., if I have a variable of type LIST-OF-NUMBER, I assume I can put any kind of number into the list. But if I have a variable of type LIST-OF-UnsignedShortInt, I can't put a negative long integer, into it, or a floating-point value. MORAL: subclassing (for subtyping) is very different from instantiating a parameterized type. Both of them "specialize" existing code, but in very different senses of the word "specialize". A "specialized" subtype of a type has ADDED functionality that lets it do MORE things---i.e., it has some special functionality, but it still preserves its inherited abilities. A "specialized" instance of a parameterized type has RESTRICTED functionality, that lets it do some particular thing, not the more general thing. Understanding this distinction is essential to using subtyping and genericity appropriately. For example, it's appropriate to use LIST-OF-UnsignedLongInt when you know that for your application it wouldn't make any sense to put a float or negative int on the list in question. If client code then tries to use the list inappropriately by putting a float in it, the compiler's type-checker will flag it as an error. On the other hand, if it makes sense for the list to hold any kind of number, you should use LIST-OF-NUMBER; if some other part of the code decides to put a float in the list, the compiler will not scream, and it will compile in dynamic dispatching to make sure that the correct operation is done for either type at run time. Note that if your subclassing violates substitutability, the compiler's checking WILL BE WRONG and your quality of life may deteriorate rapidly. The compiler will not be able to tell whether the types really are compatible, and will either complain about things that DO work, or won't complain about things that WON'T work. Note that in C++ (as opposed to a well-designed type system), compilers tend to err on the side of assuming you know what you're doing. This is because you're allowed to violate inclusion willy-nilly, and the compiler is clueless what it all means. Keep in mind that subtyping defines a set inclusion relationship, and parameterized typing works best when it can count on that inclusion property---i.e., knowing that anything on a LIST-OF-NUMBER will act like a number, and that anything on a LIST-OF-UnsignedShortInt will not only act like a number, but can be counted on to be an instance of UnsignedShortInt (or a subtype thereof). It's ESPECIALLY important to understand these things, and use them properly, in a language like C++, where the compiler isn't watching out for you. ------- PURE VS. HYBRID OBJECT-ORIENTED LANGUAGES---------- Most object-oriented languages are really HYBRID languages, with a conventional procedural language at the bottom and an object system grafted on top of that. For example, in C++, objects may be composed of other objects (i.e., have them as instance variables), but the bottom level of these composed types consists of the same old primitive types we had in C: ints, floats, character pointers, and so on. You can't generally define new methods that operate appropriately on the primitive types. CommonLisp with CLOS (the Common Lisp Object System) is the same way---at bottom, you've still got fixnums, cons cells, etc. 
The advantage of hybrid languages is that it's easy to make them pretty fast.  The leaves of the method-call graph consist of nontrivial amounts of work done on built-in types with no polymorphism.  (E.g., a "point" made of an int x and an int y will operate on those instance variables without overhead from method dispatching---the object can grab its own internal values and crunch on them without going through anything "object oriented".)

The problem with hybrid languages, besides their being an obviously ugly kludge, is that they don't have the flexibility of dealing with objects at the lowest level.  For example, if you want to redefine the CAR operation and make it polymorphic, you can't.  You can't define a new type of cons cell, and have the old CAR operate on it appropriately.

In contrast, PURE object-oriented languages treat everything as an object in the object system, and all operations have the semantics of message sends.  If you want to redefine the "+" method for integers, and define a "+" method for a user-defined number type, it all works together properly.  (Of course, your "+" may be slower than the usual one, and the whole system may slow down...)

Smalltalk, Self, and Dylan are pure object-oriented languages, i.e., they're "objects all the way down".  Self is especially pure because even instance variable lookup has the semantics of a message send.  Conceptually, an object sends itself a message to get the value of one of its own fields---an object sees almost the same messaging interface to itself that other objects see from the outside.  This allows methods to be redefined in superclasses (e.g., computing a value rather than simply looking at a field) without breaking subclass code.

-------------------- GARBAGE COLLECTION -------------------

* GARBAGE COLLECTION supports the ABSTRACTION OF INFINITE MEMORY, subject to certain constraints.  It allows programs to ALLOCATE UNLIMITED AMOUNTS OF DATA, as long as there's not too much data that's actually live at any given time.

* GARBAGE COLLECTION supports the abstraction that OBJECTS NEVER DIE DURING THE EXECUTION OF THE PROGRAM; that is, objects have INDEFINITE EXTENT---they live "as long as necessary."  All objects are kept around if they could possibly affect the course of the computation.  The abstraction, then, is that one has infinite memory, and all objects live indefinitely.  The actual reclamation of storage is AN IMPLEMENTATION-LEVEL OPTIMIZATION.

---------------------- PERSISTENCE ------------------------

* Persistence means that data can survive ("persist") beyond the execution (and termination) of the programs that create and operate on them.  In conventional systems, file data are persistent, but objects on the heap are not.  In a persistent object store, heap objects may be saved beyond program termination.  We usually use the term persistence, however, when talking about objects with object identity, as opposed to plain boring file data.

* Persistent object stores are different from file systems in that objects retain their IDENTITY over time.  Rather than writing out the contents of a record, and reading the data back into another record, you can simply "store" objects and get back the same individual objects.  (Naturally, at the implementation level this may involve copying, but the abstraction of object identity is preserved.)
--------------------------------------------------------------------

ENVIRONMENTS AND CONTINUATIONS

In Scheme, unlike most block-structured languages (e.g., Pascal and C), binding environment frames are allocated separately from activation records.  (A particular Scheme compiler may allocate some variable bindings in some activation records, but in the general case binding environment frames are allocated in the heap, separately from activation records.)

The Scheme equivalent of an activation record is called a PARTIAL CONTINUATION.  This is not quite the same as the usual notion of activation record, because it holds the state of the CALLER, not the CALLEE.  That is, it is more like a SUSPENSION record, holding the state of the procedure invocation that was SUSPENDED when it called another procedure.  This suspension record is called a (partial) CONTINUATION because it holds the information necessary to CONTINUE (resume) the suspended procedure (the caller).

Scheme's equivalent of an ACTIVATION STACK is a FULL CONTINUATION, which is simply a chain of PARTIAL CONTINUATIONS representing the ``stack'' of suspended procedures waiting to be resumed.  These partial continuations are allocated in the HEAP in the general case, and linked by pointers of some kind that act as return addresses.  (Again, a given Scheme compiler may allocate some continuations in a stack for efficiency, but this will not work in the general case---in general, the Scheme ``stack'' (continuation chain) is NOT guaranteed to follow a true stack discipline, because of CALL-WITH-CURRENT-CONTINUATION.)

It is useful to consider a Scheme system to consist of a current machine state, probably a small set of state registers, plus a continuation chain on the heap.  The state registers include the program counter, which points to the currently-executing instruction, an environment register that points to the binding environment frame the current procedure is executing in, a continuation pointer that points to the top partial continuation, and a value register where procedures leave their values when they return.

A PARTIAL CONTINUATION holds several things.  In particular, it saves the current ENVIRONMENT POINTER, so that when a procedure is resumed, it can restore the appropriate execution environment.  It also saves the previous CONTINUATION; that is, it has to know how to continue if IT returns after it's resumed.  This is just a pointer to a previously-saved partial continuation, i.e., its caller's continuation.  Pushing a partial continuation involves saving the current value of the continuation register as part of the new partial continuation; this creates a linked list onto which partial continuations are pushed in a (usually) stacklike manner.

A PROCEDURE RETURN pops the top (partial) continuation off of the full continuation chain.  The old values of the execution environment, program counter, and next continuation are stored into the registers, and execution resumes.  Note that this popping of a (partial) continuation does NOT overwrite the values in the (partial) continuation.  They're still out there in the partial continuation object on the heap.  The only ``side effects'' are to the basic state registers.  The continuations on the heap thus continue to exist unless and until they are garbage collected, and their values never change---they are IMMUTABLE.  This will turn out to be important when we discuss call-with-current-continuation, below.
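As a purely conceptual aid (this sketch is added here; it is NOT the representation used by Scheme-48, RScheme, or any particular implementation, and the names are made up), you can picture a partial continuation as an immutable record of the saved state just described:

  ;; What a partial continuation conceptually saves.
  (define (make-partial-continuation saved-envt       ; caller's environment pointer
                                     return-address   ; where the caller resumes
                                     previous-cont)   ; the caller's own continuation
    (vector 'partial-continuation saved-envt return-address previous-cont))

  ;; "Pushing" a continuation builds a new record that points at the current
  ;; top of the chain; "popping" one reloads the state registers from the top
  ;; record WITHOUT modifying it---the record stays on the heap, immutable,
  ;; until the garbage collector reclaims it.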
Try to visualize the continuation chain on the heap, and the process of saving and restoring partial continuations. Note that since the continuations remain on the heap and are immutable, that means that the graph of continuations is really a tree, like the dynamic call graph. The current ``activation stack'' is really just a particular chain from a record in this tree up to the root. If we never garbage collected them, we'd have a complete tree of all the continuations we saved, and every state of the continuation chain would still be there, as a path from some partial continuation up to the root. The root of the tree would be an initial procedure invocation (e.g., the read-eval-print loop, which in turn calls each program). ------ SUBPROBLEMS vs. REDUCTIONS, and TAIL RECURSION Scheme is one of the very few languages that is guaranteed to be properly TAIL RECURSIVE. This allows Scheme programmers to implement loops as recursive procedure calls WITHOUT the system using an indefinite amount of memory for activation records. TAIL RECURSION is not only important for loops, however; it is more general than that. Consider the following procedure: (define (foo) (bar) (baz) (quux)) Note that when we call bar, we must save the state of foo, so that when bar returns, we can resume foo. We save the state by pushing a partial continuation onto the current continuation chain, indicating that when we return, we should return to the point where we call baz. Similarly, when we call baz, we must save the state of foo, indicating that when we return to foo we resume execution at the call to quux. NOTE THAT THIS IS A CALLER-SAVES CONVENTION. The caller saves its own state, rather than assuming that the callee will save it before clobbering it. This is not as lazy as a callee-saves convention, and in some cases is less efficient. (A given compiler can often use callee-saves techniques, but in the GENERAL case, caller-saves is required.) To understand why we need a caller-saves convention, consider the call to quux above. When we return from quux, the only thing left to do is to return from foo. There's nothing interesting about the state of foo, so there's no point in saving it. Rather than pushing a continuation, calling quux, having quux return, and then returning from foo, we can simply skip the pushing of the continuation and the return from foo. The call to quux will clobber the state of foo, but it won't matter, because that state isn't going to be used again. Then the return from quux will do nicely as a return from foo, because we skipped pushing foo's continuation. The calls to bar and baz are called SUBPROBLEMS, because they solve part of the problem foo solves, on the way to a whole solution. But the call to quux is a REDUCTION, because the problem solved by (foo) REDUCES TO the problem solved by (quux), once we've gotten to that point in the computation. NOTE THAT THIS IS A PERVASIVE ASPECT OF SCHEME. It doesn't just apply in simple cases like the one above. It also works for conditional constructs, even nested ones, like this: (define (foo) (if (expr1) (if (expr2) (bar) (baz)) (quux))) Depending on the values of (expr1) and (expr2), we will execute (bar), (baz), or (quux), but they're ALL reductions---whatever value they return will be the value of the call to foo. Scheme implementations are required to implement these as tail-calls, without pushing a partial continuation onto the continuation chain. 
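As an added illustration of the subproblem/reduction distinction (these two procedures are not from the original notes; the names are made up):

  ;; The recursive call in sum-below is a REDUCTION (a tail call): once we
  ;; reach it, there is nothing left for this invocation to do, so no partial
  ;; continuation needs to be pushed, and the loop runs in constant space.
  (define (sum-below n acc)
    (if (= n 0)
        acc
        (sum-below (- n 1) (+ acc n))))

  ;; The recursive call in sum-list is a SUBPROBLEM: after it returns, we
  ;; still have to add (car lst) to its result, so the caller's state must
  ;; be saved by pushing a partial continuation for each element.
  (define (sum-list lst)
    (if (null? lst)
        0
        (+ (car lst) (sum-list (cdr lst)))))

  (sum-below 10 0)       ; => 55, using bounded continuation space
  (sum-list '(1 2 3 4))  ; => 10, pushing a continuation per element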
Compilers for other languages MAY do this as an optimization; e.g., the GNU C compiler often does this ``tail call optimization,'' but in Scheme it's NOT an optimization, it's part of the defined semantics of the language.  The main reason it's required is so that you can implement LOOPS as RECURSION.  In fact, the semantics of Scheme's loop constructs are DEFINED AS SPECIAL CASES OF RECURSION.

Consider the following loop construction, called a NAMED LET.  (KNOW THIS SYNTAX!)

  (let myloop ((i 0))         ; loop var. i initialized to 0
    (if (< i 10)              ; loop test
        (begin                ; begin just sequences its subforms
          (display i)
          (myloop (+ i 1))))) ; tail call to loop

A named let sets up a loop variable (or several loop variables), in this case i.  The variable i is initially bound to the value of the initialization expression, namely 0.  Then the loop body is executed.  If the name of the let is used as a procedure, and called, as in the last line, the loop iterates.  The argument(s) passed by this call become the value(s) of the loop variable(s).  (Note that BEGIN just executes its subforms in sequence, like a BEGIN...END block in Algol or {...} in C.  KNOW THIS SYNTAX.)

In this case, every time we iterate, we pass the next iteration a new value for i, which is 1 greater than the current value of i.  For the first 10 iterations, we'll print out the value of i and iterate.  On the last iteration, the test will fail and the loop will terminate.

Why the peculiar syntax?  The reason is that each iteration of the loop is conceptually a PROCEDURE CALL, and the new value(s) of the loop variable(s) are passed as arguments to the procedure.  This is NOT just a syntactic cuteness.  Scheme's loops are defined to have exactly the same semantics as equivalent procedure calls.  For example, we could say (set! myproc myloop) and actually capture the loop procedure in a variable, then call it from some other context.

The above named let form is EXACTLY equivalent to the following:

  (letrec ((myloop (lambda (i)
                     (if (< i 10)
                         (begin
                           (display i)
                           (myloop (+ i 1)))))))   ; iterate
    (myloop 0))   ; start the loop, passing the initial value as the argument

Recall that LETREC is a lot like LET, but the initialization expression(s) are evaluated in the NEW environment created by the form.  In this case, that's so that the lambda expression can see the variable myloop, created by the letrec.  Since myloop can see its own name, it can call itself.

What's going on here is that we're creating a variable named myloop, and creating a procedure closure to put in it.  The procedure closure will act as the body of the loop, accepting a new value of the loop variable i each time it's called.  We start off the loop by calling the procedure from the body of the letrec, starting it off with the argument that we get from the initial value clause of the named let.

NOTE THAT THIS DOES NOT INVOLVE ASSIGNMENT!  When we iterate, we do a procedure call.  This creates NEW BINDINGS for the loop variables.  It does NOT re-use the old bindings by simply side-effecting them with the new values.  That's because the semantics of named let are defined to be precisely the same as those of a tail-calling letrec, as above.  So, for example, the code for the loop body might create closures that capture the loop variables.  Each iteration would create a new set of loop variables, capturable by any closure-creating operations (lambdas) in the body, as in the example below.
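Here is a small added example of that (not from the original notes): each iteration's binding of i is captured by a different closure, so the closures remember different values.

  (define closures
    (let loop ((i 0) (acc '()))
      (if (= i 3)
          (reverse acc)
          (loop (+ i 1) (cons (lambda () i) acc)))))

  ((car closures))    ; => 0
  ((cadr closures))   ; => 1
  ((caddr closures))  ; => 2

If iteration re-used a single mutated binding of i instead of creating fresh bindings, all three closures would see the same final value; the fresh-binding semantics is what makes the results differ.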
(Again, smart compilers may do something different in the usual case.  Even the RScheme compiler recognizes simple loops, and smarter compilers can optimize away most of the per-iteration environment creation.  But in the general case, you're allowed to write code that captures loop variable bindings, and in those cases the compiler must guarantee that the bindings are there to capture.)

One of the reasons that Scheme's iteration constructs are defined this way is to make programs easier to reason about---it is easier to prove characteristics of programs that re-bind variables than those of programs that side-effect variables.

The point of the above example is that the loop will use bounded space, even if we changed the condition so that it would loop infinitely.  We would create a binding environment for each iteration, but only one will ever be live at a time, so all but one of them will be reclaimed at any garbage collection.  They will NOT fill up the heap with unreclaimable junk.  We will also NOT create a huge number of unreclaimable partial continuations---since we don't push a continuation before each call, the continuation chain doesn't grow.  Iterating in this way MAY cause garbage collections, but it WON'T cause you to build up a huge amount of LIVE data, which could eventually defeat the garbage collector and crash the system.

------ CALL-WITH-CURRENT-CONTINUATION

Call-with-current-continuation is a very powerful control construct, which can be used to create more conventional control constructs, like catch and throw in Lisp (or setjmp and longjmp in C), or coroutines, and many more.  It is extremely powerful because it allows a program to manipulate its own control ``stack'' so that procedure calls and returns needn't follow the normal depth-first textual call ordering.

Recall that we said (above) that Scheme's equivalent of an activation stack is really a chain of partial continuations (suspension records), and this chain is known as a full continuation.  And since continuations are immutable, they usually form a tree reflecting the call graph (actually, only the non-tail calls).  Normally, the parts of this tree that are not in the current continuation chain are garbage, and can be garbage collected.

If you take a pointer to the current continuation, and put it in a live variable or data structure, however, then that continuation chain will remain live and not be garbage collected.  That is, you can ``capture'' the current state of the stack.

If you keep a captured state of the stack around, and later install the pointer to it in the system's continuation register, then you can return through that continuation chain, just as though it were a normal continuation.  That is, rather than returning to your caller in the normal way, you can take some old saved continuation and return into that instead!

You might wonder why anybody would want to do such a weird thing to their ``stack,'' but there are some useful applications.  One is COROUTINES.  It is often convenient to structure a computation as an alternation between two different processes, perhaps one that produces items and another that consumes them.  It may NOT be convenient to turn either of those processes into a subroutine that can be called once to get an item, because each process may have complex state encoded into its control structure.  (You probably wouldn't want to have to structure your program as a bunch of incremental operations that were called by successive calls to a do-next-increment routine.  It may be that the program it gets its data from can't easily be structured that way, either.
------

CALL-WITH-CURRENT-CONTINUATION is a very powerful control construct, which can be used to create more conventional control constructs, like catch and throw in Lisp (or setjmp and longjmp in C), or coroutines, and many more. It is extremely powerful because it allows a program to manipulate its own control ``stack'' so that procedure calls and returns needn't follow the normal depth-first textual call ordering.

Recall that we said (above) that Scheme's equivalent of an activation stack is really a chain of partial continuations (suspension records), and this chain is known as a full continuation. And since continuations are immutable, they usually form a tree reflecting the call graph (actually, only the non-tail calls). Normally, the parts of this tree that are not in the current continuation chain are garbage, and can be garbage collected.

If you take a pointer to the current continuation, and put it in a live variable or data structure, however, then that continuation chain will remain live and not be garbage collected. That is, you can ``capture'' the current state of the stack. If you keep a captured state of the stack around, and later install the pointer to it in the system's continuation register, then you can return through that continuation chain, just as though it were a normal continuation. That is, rather than returning to your caller in the normal way, you can take some old saved continuation and return into that instead!

You might wonder why anybody would want to do such a weird thing to their ``stack,'' but there are some useful applications. One is COROUTINES. It is often convenient to structure a computation as an alternation between two different processes, perhaps one that produces items and another that consumes them. It may NOT be convenient to turn either of those processes into a subroutine that can be called once to get an item, because each process may have complex state encoded into its control structure. (You probably wouldn't want to have to structure your program as a bunch of incremental operations that were called by successive calls to a do-next-increment routine. It may be that the program it gets its data from can't easily be structured that way, either. Each program should probably be written with its own natural control structure, each suspending its operation when it needs the other to do its thing.)

COROUTINES allow you to structure cooperating subprograms this way, without making one subservient to (and callable piecemeal from) another. Coroutines can be implemented as operations on continuations. When a coroutine wants to suspend itself and activate another (co-)routine, it simply pushes a partial continuation to save its state, then captures the value of the continuation register in some way, so that it can be restored later. To resume a suspended routine, the continuation chain is restored and the top partial continuation is popped into the system state registers. Alternation between coroutines is thus accomplished by saving and restoring the routines' continuations.

Note that in this case, we can have TWO (or more) trees of continuations that represent the course of the computation, and that control flow can alternate back and forth between trees. Usually, computations are structured so that most of the work is done in the usual depth-first procedure calling and returning, with occasional jumps from one routine's depth-first activity to another's.

Another use of continuations is to implement CATCH and THROW, which are roughly like setjmp and longjmp in C. The idea is that you may want to abort a computation without going through the normal nested procedure returns. In a normal stack-based language (without continuations), this is usually accomplished by storing a pointer into the stack before starting the abortable computation. If it is necessary to abort the computation, all of the activation records above the point of call can be ignored, and the stack pointer can be restored to that point, just as though all of the invocations above it had returned normally. This can be implemented with call-with-current-continuation by saving the continuation at the point where an abortable computation is begun. Anywhere within that computation, that continuation may be restored (clobbering the ``normal'' value of the continuation register, etc.) to resume from that point.

But call-with-current-continuation is more powerful than coroutines or catch and throw. Not only can we escape ``downward'' from a computation (by popping multiple partial continuations at once without actually returning through them), we can also escape ``upward'' BACK INTO a computation that we bailed out of before. This can be useful in implementing EXCEPTION HANDLING, where we may want to transfer control to a special coroutine that may ``fix up'' an error that was detected, but then resume the procedure that encountered the error, after the problem is fixed.

Call-with-current-continuation can also be used to implement BACKTRACKING, where the control flow backs up and re-executes from some saved continuation. In this case, we may save the continuation for some computation, but go ahead and return through it normally. Later, we may restore the saved continuation and return through it AGAIN. Note that in general, continuations in Scheme can be used MULTIPLE times. The essential idea is that rather than using a stack, which dictates a depth-first call graph, Scheme allows you to view the call graph AS A GRAPH, which may contain cycles, even directed cycles (which represent backtracking).
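Here is a tiny self-contained sketch (the names are mine) of returning through the same continuation more than once; the calling conventions of call-with-current-continuation itself are spelled out below.

   (define (twice-through)
     (let ((resume #f)    ; will hold the captured escape procedure
           (count 0))
       (let ((n (call-with-current-continuation
                 (lambda (k)
                   (set! resume k)   ; save the escape procedure
                   0))))             ; first time through, n is 0
         (set! count (+ count 1))
         (display n)
         (newline)
         (if (< count 2)
             (resume (+ n 1))   ; return through the SAME continuation again
             'done))))

   (twice-through)   ; prints 0, then 1, and finally returns done

The second pass through the body is reached not by calling twice-through again, but by returning through the captured continuation a second time, with a new value for n.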
The SYNTAX of call-with-current-continuation is fairly ugly, but for some good reasons; in its raw form, it is very powerful, but correspondingly hard to use. Typically, it is encapsulated in macros or other procedures to implement other, higher-level control constructs. Call-with-current-continuation is itself a normal first-class procedure, which encapsulates the very low-level continuation-munging abilities in something like a civilized package. Since it's a first-class procedure, you can write higher-order procedures that treat it like data, or call it, or both.

Call-with-current-continuation is a procedure of exactly one argument, which is another procedure to execute after the current continuation has been captured. The current continuation will be passed to that procedure, which can use it (or not) as it pleases. The captured continuation is itself packaged up as a procedure, also of one argument. That's so that you can't muck around with the continuation itself in any data-structure-like way. There are only two things you can do with continuations --- capture them and resume them. Continuations are captured by executing call-with-current-continuation, which creates an ESCAPE PROCEDURE. They are resumed by EXECUTING the ESCAPE PROCEDURE.

So: call-with-current-continuation is a procedure of one argument, which is some procedure we want to call; we'll call that argument the ABORTABLE procedure. The abortable procedure must ALSO be a procedure of exactly one argument. (If you want to call a procedure that takes a bunch of arguments, and still make it abortable using call-with-current-continuation, you have to use a trick described below.) The abortable procedure's argument is the ESCAPE procedure that encapsulates the captured continuation --- the continuation that can resume the computation just beyond the call to call-with-current-continuation.

As if this weren't cryptic enough, the escape procedure is ALSO a procedure of exactly one argument. When the escape procedure is executed, it resumes the saved continuation, just beyond the point where call-with-current-continuation was executed. That call (to call-with-current-continuation) may be expected to return a value, however; the argument to the escape procedure is the value that will be returned as the value of the call. Note that if the escape procedure is NOT executed, and the abortable procedure returns normally, the value it returns is returned as the value of the call to call-with-current-continuation.

Consider the following example. (Assume do-something and do-some-more are procedures defined elsewhere.)

   0:  (define some-flag #f)

   1:  (define (my-abortable-proc escape-proc)
   2:    (display "in my-abortable-proc")
   3:    (if some-flag
   4:        (escape-proc "ABORTED"))
   5:    (display "still in my-abortable-proc")
   6:    "NOT ABORTED")

   7:  (define (my-resumable-proc)
   8:    (do-something)
   9:    (display (call-with-current-continuation my-abortable-proc))
   10:   (do-some-more))

   11:  (my-resumable-proc)

At line 11, we call my-resumable-proc. It calls do-something, and then calls display. But before it calls display, it has to evaluate its argument, which is the call to call-with-current-continuation. Call-with-current-continuation saves the continuation at that point, and packages it up as an escape procedure. The escape procedure, if executed, will return its argument as the value of the call-with-current-continuation form. That is, if the escape procedure is called, it will resume execution of the display procedure, which prints that value, and then execution will continue, calling do-some-more.
Once call-with-current-continuation has created the escape procedure, it calls its argument, my-abortable-proc, with the escape procedure as ITS argument. My-abortable-proc then displays (prints) "in my-abortable-proc". Then it checks some-flag, which is false, so it does NOT evaluate the consequent of the if --- that is, it doesn't execute the escape procedure. It continues executing, displaying "still in my-abortable-proc". It then evaluates its last body form, the string "NOT ABORTED", which evaluates to itself, and returns that as the normal return value of the procedure call. At this point, the value returned from my-abortable-proc is printed by the call to display in line 9.

But suppose we had set some-flag to #t instead of #f. Then when control reaches line 3, the if DOES evaluate its consequent. This calls the escape procedure, handing it the string "ABORTED" as its argument. The escape procedure resumes the captured continuation, returning control to the caller of call-with-current-continuation, without executing lines 5 and 6. The escape procedure returns its argument, the string "ABORTED", as the value of the call-with-current-continuation form. That resumes execution of my-resumable-proc at line 9, handing display the string "ABORTED" (as the value of its argument form). At this point "ABORTED" is displayed, and execution continues at line 10.

Often we want to use call-with-current-continuation to call some procedure that takes arguments other than an escape procedure. For example, we might have a procedure that takes two arguments besides the escape procedure, thus:

   (define (foo x y escape)
     ...
     (if (= x 0)
         (escape 'ERROR))
     ...)

We can fix this by CURRYING the procedure, making it a procedure of one argument. Suppose we want to pass 0 and 1 as the values of x and y, as well as handing foo the escape procedure. Rather than saying

   (call-with-current-continuation foo)

which doesn't pass enough arguments to the call to foo, we say

   (call-with-current-continuation
     (lambda (escape) (foo 0 1 escape)))

The lambda expression creates a closure that does exactly what we want. It will call foo with arguments 0, 1, and the escape procedure created by call-with-current-continuation.
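As a final illustration of wrapping call-with-current-continuation in a friendlier package, here is a small sketch (the procedure find-first and its use are my own, not part of the standard) of a catch/throw-style early exit: we search a list, and escape from the traversal as soon as a match is found.

   (define (find-first pred lst)
     (call-with-current-continuation
      (lambda (return)                  ; return is the escape procedure
        (for-each (lambda (x)
                    (if (pred x)
                        (return x)))    ; "throw": abandon the rest of the loop
                  lst)
        #f)))                           ; normal return value if nothing matched

   (find-first even? '(1 3 4 5 6))   ; => 4
   (find-first even? '(1 3 5 7))     ; => #f

If no element satisfies pred, for-each returns normally, and the #f that follows it becomes the value of the call to call-with-current-continuation, and hence of find-first.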