Dispatching Techniques

Object-oriented languages require the ability to DISPATCH at run time,
i.e., to find the code that implements an operation for a given object,
when that may not be known at compile time.  This is true not only for
"dynamically typed" languages like Smalltalk and object-oriented Lisps,
but also for "strongly typed" object-oriented languages such as Eiffel
and C++.  In the latter case, the compiler may be able to tell at
compile time that method dispatching can never fail (i.e., that method
lookup will never fail at run time), but it often can't tell which
method or methods will be invoked at a particular point in the code.

Dynamic dispatching can be costly in two ways:

 * it takes time to find the right method

 * the fact that the method being called is unknown prevents
   optimizations based on specialization and inlining

Both of these issues are important for achieving good performance in an
object-oriented system.

In these notes, we deal with dynamic dispatching techniques and their
costs; we do not address the issue of how to reduce the amount of
dynamic dispatching that's necessary.  (That's a very important topic
that should be addressed as well, but not here.  The work of Hoelzle,
Ungar, and Chambers on compilers for Self is particularly relevant when
dealing with dynamically-typed object-oriented languages.)

---

There are several basic runtime dispatching techniques for
dynamically-typed languages:

 1. conditional branches or switch tables to cover a fixed set of
    possibilities

 2. inheritance hierarchy searches at run time, typically augmented
    with caching techniques to avoid repeated searches

 3. dispatch tables

(For RScheme, we would like to support all of these with a single
base-language compiler, and let the metalevel code pick a dispatch
mechanism that's appropriate for the language being compiled and
whatever knowledge the metalevel system has about the particular
situation relevant to the particular code being compiled.)
--------------------------------------------------------------------
Conditional branching

Simple conditional branching techniques only work for languages with
limited dynamic typing, or in situations where the compiler can tell
that the number of possibilities is limited.

The classic example of this is a Lisp system with operations that have
limited polymorphism.  For example, the procedure CAR only works when
given a pair (cons cell), so the code for CAR can simply do a type test
to see if the object is a pair.  If it is, it extracts the car field of
the object.  If not, it signals an error.  In pseudo-C:

   if (PAIR_P(arg))
      GET_CAR(arg);
   else
      signal_error(...);

The next simplest case is when an operation operates on a fixed set of
types, e.g., + in a Lisp system that has only two numeric types, int
and float.  This can be done with nested if code, which checks the type
of the first argument and then the type of the second argument, and
bottoms out in the appropriate low-level code to do the operation on
the particular argument type combination at hand.  In pseudo-C:

   if (INT_P(arg1)) {
      if (INT_P(arg2))
         value = i_i_add(arg1, arg2);
      else if (FLOAT_P(arg2))
         value = i_f_add(arg1, arg2);
      else
         signal_error(...);
   } else if (FLOAT_P(arg1)) {
      if (FLOAT_P(arg2))
         value = f_f_add(arg1, arg2);
      else if (INT_P(arg2))
         value = f_i_add(arg1, arg2);
      else
         signal_error(...);
   } else {
      signal_error(...);
   }

Notice that this kind of thing will result in a big piece of code if
several arguments can be of any of several types.  It can also be a big
piece of code if only single dispatching is used, but the actual
argument type could be any of many types.

---

Another important issue in dynamic dispatching mechanisms is how
EXTENSIBLE they are.  The ideal dynamic dispatching mechanism would be
fast, but also allow incremental program development.  The extreme case
is to be able to define new operations and new methods on old
operations at run time.
Simple inheritance hierarchy searches accommodate this easily.  A more
limited case is to allow old code to be linked with new modules, which
may define subclasses of the classes in the old code.

----------------------------------------------------------------------
Inheritance Hierarchy Searches

In a system with inheritance, the obvious way of implementing method
lookup is by actually looking up the inheritance hierarchy at run time.
This is easy if there are class objects available at run time, which
have pointers to their superclasses and lists of methods.

When dispatching on a single argument (message receiver), we can simply
look up the class of the instance (perhaps by extracting a class
pointer from its header), and look at the class object to see if it has
a method for the operation in question.  If it does, we use it.  If it
doesn't, we look at its superclass, and see if it has a method for the
operation, and so on, until we either find a method or reach the
topmost class in the inheritance chain and fail.

The problem with this technique is that it can be slow.  When looking
for a method in a class, we have to search some kind of data structure
to see if it has an appropriate method.  If we fail, we have to search
in the superclass.  If methods are stored in a linear list in each
class, we have a cost that's linear in the number of methods.  If the
method is defined in a distant superclass, this cost is multiplied by
the number of levels of the inheritance hierarchy we have to search.

-----------------------------------------------------------------------
Caching of Inheritance Lookups

A simple way of speeding up the average message lookup time in an
inheritance graph is to cache the results of previous searches.  In
most programs, the large majority of message dispatches are similar to
other recent message dispatches.
For example, when invoking a particular generic function, it is very
likely that the arguments are of the same types as the last time that
function was invoked.  If each generic function caches the result of
the previous lookup, it will usually find the appropriate class/method
pair in its cache, and be able to skip the lookup.

This idea can be generalized to store the last n (say, 3) results of
inheritance hierarchy searches.  Searching the cache may take longer in
the worst case, because we may have to check 3 cache slots and then
still have to do a full inheritance lookup, but in the typical case we
will probably win---as long as a given message is not invoked in more
than n ways over a short period of time, we will seldom have to do a
full inheritance hierarchy search.

An even more effective technique can be used if we know what generic
function is being called (or what message is being sent) at the call
site.  (This is usually the case in Smalltalk, because Smalltalk
message selectors are second-class---we don't have to indirect through
a first-class generic function object.)  We can keep a separate cache
per call site, which is specific to that piece of code.  This may work
better than keeping a cache per generic function, because a generic
function may be called from several call sites in different
characteristic ways.  (It's not actually clear, though, how much of a
win you get by caching for each call site separately.)

---

Invalidating Caches when Methods are Added or Changed

Note that if incremental development is supported, cached lookup data
may become invalid.  For example, if an existing class inherits a
method from another existing class, but a new method for the same
operation is then defined in the subclass, the cached results for that
subclass become invalid.  The subclass now has its own method, but if
the result of a previous dispatch is cached, the cache will hold an
erroneous pointer to the superclass method.
Supporting incremental definitions requires having a CACHE INVALIDATION
mechanism to discard obsolete lookup information when the inheritance
hierarchy changes.  A simple way of doing this is to invalidate ALL
cached lookup data whenever ANY new method is defined.  This will cause
a large slowdown every time a method is changed.

More refined techniques allow SELECTIVE INVALIDATION.  For example, if
each generic function has its own cache, then the cache only needs to
be invalidated when new methods are defined on that generic function,
or when the inheritance relationships between existing classes are
changed.

Using Hash Tables for Caching

We can make a very effective cache if we use a large hash table to
cache lookups for different generic functions and argument
combinations.  We can assign an integer ID to each class, and another
integer ID to each generic function.  We can combine these to compute a
hash value, and use that to probe a large hash table.  If the hash
table is large enough, we will usually find the exact item we want in
constant time.

Notice that this can be a very fast hash table.  Since we can always
fall back on an inheritance hierarchy search, we don't have to store
all of the data in bucket overflow tables.  If there are collisions, we
can discard the old data, and the only cost will be that we perform
those lookups again if we need that data again.

Ways to make hash tables fast:

 * We could use a simple direct-mapped cache design if we think that
   hash table speed is more important than minimizing the frequency of
   table misses.  (This is likely to be true if the hash table size is
   large relative to the number of function/argument combinations
   encountered over any reasonable period of time.)  That is, we hash
   into the table, and if we don't find what we're looking for, we
   don't bother with collision resolution.

 * We can reduce the probability of collisions by ensuring that the
   hash values of related functions are likely to be different.
   (E.g., if the hash value depends primarily on the low bits of the
   type IDs, we can ensure that the type IDs of siblings in the class
   hierarchy differ in the low bits.)

 * We can use fast probabilistic matching, by computing a "signature"
   value for the function/argument combination.  We use a subset of
   this signature (some of the bits) for hashing, and the rest of it to
   resolve collisions.  We may be wrong if the signature isn't unique,
   but if we're wrong we just take a performance hit (an extra
   inheritance lookup), not a correctness hit.

Here's one possible hash-table caching scheme, off the top of my head:

Whenever we define a generic function, we give it an integer ID.  We
keep a counter for this, which counts up from 0.  We store this ID in
the generic function object.  Whenever we define a class, we give it an
integer ID.  We keep a counter for this, which counts down from the
largest integer we can store in an immediate integer value.  (Assuming
we don't have more than about a billion classes and generic functions,
all of these IDs will be unique.)

When we dispatch, we compute a "signature" for the function and
argument combination by XORing the IDs together.  (XOR is nice, because
every bit in the result depends on every bit in each of the things
being XORed together.)  Then we select some low bits from this (a mask
instruction) to serve as a hash table index.  We index into the hash
table, which is just a vector.  We check the signature stored in that
entry.  If it matches, we check that the full function/argument pattern
matches.  If so, we're done, and we use the method stored there.

(XOR is nice here, too.  If we've checked the signature, and it
matches, we don't have to check the whole pattern too.  If the
signature matches, and all but one of the IDs in the pattern match,
then the whole pattern matches---the remaining ID is the XOR of the
signature with the IDs we've checked, so in checking the signature we
implicitly checked it.
So for single dispatching, we only have to check the signature and the
function ID, and if they match we know the argument type ID matches
too.)

If the signature doesn't match, or the full pattern doesn't match,
we've missed the cache.  We do a full lookup, and store the appropriate
pattern, signature, and method in the hash table entry.  Then we invoke
that method.

So the usual-case code for single dispatching looks like this:

   extract class from argument
   extract ID from argument class
   extract ID from generic function
   XOR them together to get signature
   mask signature to get hash value
   index into hash table
   load signature from hash table
   compare signatures
   conditional branch to inheritance lookup (not taken)
   compare argument (or generic function) ID
   conditional branch to inheritance lookup (not taken)
   invoke method

(If we use the same code for every generic function, then the loading
of the generic function's ID is a reference to its environment.  But if
we compiled different code for every generic function, it could be an
instruction-stream literal.  That's why I have GF IDs counting up from
0---to keep the bit pattern small so we don't need an extra instruction
to load it, as long as we don't have tons of GFs.)

---

Invalidation of Hash Table Caches

Selective invalidation is trickier when a large, unified cache is used.
The cached results for a given operation will typically be scattered
through the hash table.  Either the entire cache must be flushed, or
some algorithm must be available to compute all of the possible hash
values for the relevant operation applied to existing classes.  (For
example, we might look down the inheritance hierarchy to find the
affected subclasses, and compute the relevant hash values.  This may be
complicated by multiple dispatching techniques.)

---

Other forms of caching, and "copy down"

A simple lookup cache can be very fast in the usual case, but may be
slow when a cache miss occurs.
An inheritance hierarchy search, if implemented naively, may take time
roughly proportional to the height of the inheritance graph (since we
may have to look all the way up the hierarchy to find the method) times
the number of methods defined on a class (since we may have to look
through all of the methods in a class to find out that the one we want
is not there).

To decrease the average-case cost of these lookups, we can cache some
or all of the inheritance lookup information in the subclasses as well.
The extreme case of this is to keep a "cache" for each class, which
holds all of the methods defined on that class, whether they're defined
directly in the class or inherited from superclasses.  In that extreme
case, it's not really a cache, technically, because it holds all of the
relevant information rather than a (recently-used) subset of it.  This
technique is called "copy down", because we copy the method information
down into each subclass that can inherit it.

Copy-down can cost significant space, because it means that each class
must have lookup information proportional to the number of methods
defined (directly or indirectly) on that class.  If the number of
methods may be large, it is also important to represent the set of
available methods in a way that's relatively fast to access, such as a
hash table.

(Copy-down has advantages when you have a good compiler, because you
can compile different code for the SAME method in each class that
inherits it, and you may be able to inline the code for many methods
into other methods on the same class.  You should be able to use
techniques analogous to the Self compiler's, so that you can inline
messages to self.  In a multimethod system, that should generalize to
being able to inline methods for any of the dispatched-on arguments.
That's beyond the scope of these notes, however, and I hope it will
(eventually) be dealt with in notes on MOPs and advanced compilers.)
------------------------------------------------------------------------
Dispatch tables

A very different technique is used in some languages, like C++, which
are based on mostly static analysis.

Oversimplified C++ Function Dispatching

The (over-)simplified idea of C++ function dispatching (which nobody
actually uses to implement C++) is that a method of an object is
actually a field of the object, which is just a function pointer to the
code for the method.  To invoke a method, you just index into the
object, extract the value of the appropriate method field, and use the
value stored there as a function pointer---you call that function with
a pointer to the object as its first argument.  So to call the "foo"
method of an object, you just extract the value of its "foo" field, and
call that function.

Let's see how this works out if we have single inheritance.  Suppose we
define a class bar, with fields x, y, foo, and bleen, where x and y are
integers, and foo and bleen are method pointers.  The compiler will lay
it out something like this:

           +---------+
        x  |   10    |
           +---------+
        y  |   15    |
           +---------+
      foo  |    +----+---->code for foo method for class bar
           +---------+
    bleen  |    +----+---->code for bleen method for class bar
           +---------+

If we define a subclass baz, which inherits all of these fields, plus a
double-float field quux and a method fubar, it will be laid out like
this:

           +---------+
        x  |   18    |
           +---------+
        y  |  -23    |
           +---------+
      foo  |    +----+---->code for foo method for class baz
           +---------+
    bleen  |    +----+---->code for bleen method for class baz
           +---------+
           | 95687.  |
     quux  | 876500  |
           +---------+
    fubar  |    +----+---->code for fubar method for class baz
           +---------+

The key thing to notice here is that the layout of the superclass is a
prefix of the layout of the subclass---the subclass layout consists of
the layout of the superclass, plus some other fields appended to it.
Notice that this implies that the foo pointer in the subclass is at the
SAME OFFSET from the beginning of the object as the foo pointer in the
superclass.  To invoke the foo method of either object, all we have to
do is index into the object a known distance, grab the foo pointer, and
call whatever function it points to.  When compiling a call to the
foo() virtual function, all the compiler has to do is emit instructions
to index a known distance into the object, extract the function
pointer, and call it.

All that's necessary to make this work is to ensure that every class's
layout includes its superclass's layout as its first part, followed by
any additional instance variables defined in the subclass.  (Actually,
you also have to ensure that the appropriate function pointers are
installed in the appropriate fields when the object is allocated.  The
compiler automatically generates an allocation routine for each class.)

This would work for C++, because a virtual function declaration is
associated with a particular class, and applies only to that class and
its subclasses.  Just as an instance variable can be defined in a
class, and each subclass will inherit that variable, a virtual function
is something defined in a class and inherited in all subclasses.  The
virtual function can essentially be implemented as a slot that holds a
pointer to a procedure.  Each subclass inherits the slot, but
initializes it to point to the appropriate procedure (method) that
implements the operation for that class.

This means that dynamic dispatching can be VERY FAST.  All you have to
do is index a known offset into the object, extract a pointer value,
and do a normal procedure call.

Also notice that if we actually implemented it this way, each
individual object could store its own method (function pointer) for any
operation---you could change the function pointer in a field of an
object, and make it behave differently.
Also notice that this depends on the compiler looking at the
inheritance hierarchy and the objects' layouts at compile time.  The
compiler can emit a fixed sequence of instructions to grab the object's
own function pointer and call it.

---

Actual (single-inheritance) function dispatching in C++

The actual scheme used by C++ compilers isn't quite this fast, but it's
close, and it saves space.  Notice that the methods applicable to an
instance are determined by its class, so it's not really necessary to
store the function pointers in every instance, as instance variables.
Instead, each class can have a record holding the function pointers,
and the instances can have a pointer to that.  The per-class table of
function pointers is called a "virtual function table", or "vtbl" for
short.  (It'd probably be better to call it a method table, or a
"member function" table, but "vtbl" is the standard terminology.)

So instead of storing an instance like this:

           +---------+
        x  |   10    |
           +---------+
        y  |   15    |
           +---------+
      foo  |    +----+---->code for foo method for class bar
           +---------+
    bleen  |    +----+---->code for bleen method for class bar
           +---------+

we can store it like this:

           +---------+
        x  |   10    |
           +---------+
        y  |   15    |          virtual function table for bars
           +---------+
     vtbl  |    +----+------->+---------+
           +---------+   foo  |    +----+---->foo method for class bar
                              +---------+
                        bleen |    +----+---->bleen method for class bar
                              +---------+

As before, a simple instruction sequence can get at the foo or bleen
method pointer for class bar, only now we've added a level of
indirection.  To invoke a method, we index into the object at a known
offset to get the vtbl pointer.  Then we index into the vtbl at a known
offset to get the pointer to the method we want.  Then we take that
function pointer and call it.  This saves space, and still allows
constant-time dynamic dispatching, while adding an indirection to the
constant cost.
We only have to ensure that each class has a virtual function table
pointer in the same place as its superclass, and that its virtual
function table layout includes at least the same fields as the virtual
function table of the superclass---it must have pointers to the
corresponding methods at the same offsets.

To support this, we change the layout rules accordingly.  When we
define a subclass with extra data fields, those are appended to the
layout of the superclass to give the layout of the subclass.  If we
define new virtual functions in the subclass, those fields are appended
to the layout of the superclass's virtual function table to give the
layout of the virtual function table for the subclass.

So when we define baz as a subclass of bar, it ends up like this:

           +---------+
        x  |   10    |
           +---------+
        y  |   15    |          virtual function table for bazzes
           +---------+
     vtbl  |    +----+------->+---------+
           +---------+   foo  |    +----+---->foo method for class baz
           | 472918. |        +---------+
     quux  | 720193  |  bleen |    +----+---->bleen method for class baz
           +---------+        +---------+
                        fubar |    +----+---->fubar method for class baz
                              +---------+

(As long as we're only using single inheritance, we only need one
virtual function table pointer per instance.  The location of that
virtual function table pointer is determined by the first class to
define a virtual function.  A class with no virtual functions doesn't
have a virtual function table pointer, so there is no space cost if
virtual functions are not used.  The first subclass to define virtual
functions will have a virtual function table pointer appended to its
layout.  This turns out to be important for C++, because it is
backward-compatible with C.  C structs can be viewed as C++ classes
with no virtual functions, and hence no virtual function table
pointers.)

The main point to notice here is that a superclass defines an
interface, which is a set of generic functions, and all subclasses must
implement that interface.
In terms of how the compiler implements subclasses, an interface
corresponds to a virtual function table.  The compiler ensures that
code compiled to operate on instances of the superclass will work with
instances of any subclass, because subclasses will have virtual
function table pointers in the same place, and the virtual function
tables they refer to will have method pointers for the right methods in
the right places.

---

Supporting Multiple Inheritance in C++ Dispatching

The scheme we've outlined allows virtual function dispatching in
constant time---a double indirection to find the function pointer,
followed by a normal procedure call.  Unfortunately, it depends on
having the virtual function table pointer for the class at the same
offset in instances of all subclasses of a class.  What happens when we
add multiple inheritance to the language?

At first glance, that would seem to wreck the whole scheme---if you
inherit from two classes that each have virtual function table pointers
in the same place, but define different virtual functions, you can't
have the right methods for both superclasses in the right place in the
virtual function table.  Right?

E.g., if you have two different classes that each define one integer
variable and one method, they both expect to be laid out this way:

           +---------+
      var  |         |
           +---------+
     vtbl  |    +----+------->+---------+
           +---------+   meth |    +----+---->method
                              +---------+

Since classes expect their subclasses' layouts to include the
superclass layout as a prefix (and likewise for virtual function table
layouts), how can we reconcile having two superclasses that each expect
their subclass to have a particular layout as its first part?

The extremely clever technique used by most C++ compilers is to change
the notion of what it means to have a "pointer to" an object.  (This is
either extremely clever or grotesquely bizarre, depending on your point
of view.)
Rather than having every pointer point to the beginning of an object,
some pointers can point into the middles of objects, and a sneaky
technique is used to ensure that a pointer of a given type points to
something that's laid out in the way it expects.

When a subclass C inherits from classes A and B, a C++ compiler
generally constructs the layout of C by appending the layouts of A and
B, and adding any extra instance variables defined in class C.  (We
assume that we're talking about normal inheritance here.  Virtual base
classes are different.)

Some compilers do it this way:

   Layout of A
   Layout of B
   Stuff added for C

Others use a different ordering rule, so you may get:

   Layout of B
   Layout of A
   Stuff added for C

The important thing to notice is that the layout of a C INCLUDES the
layout of an A or a B.  The trick that C++ compilers use is to
represent pointers of different types that point to the same object as
pointers to the appropriate PART of the object, whose layout is a
concatenation of the layouts of its supertypes.

Let's assume that our compiler puts the first parent first, so in this
case the A part comes first, followed by the B part, followed by the
C-specific additional stuff.  A pointer of type pointer-to-A that
points to this object (whose concrete type is C) will be implemented as
the address of the beginning of the object.  A pointer of type
pointer-to-B that points to it will be implemented as the address of
the B part of the object, embedded within the whole C object.  A
pointer of type pointer-to-C will point to the beginning of the object.

   ptr of type A or C --->  A part (inherited)
   ptr of type B      --->  B part (inherited)
                            C-specific part

The resulting object will have two virtual function table pointers.  It
will have a virtual function table pointer in the same position that
A's have, which will serve as the virtual function table pointer when
the object is viewed as an A or a C.
It will also have a virtual function table pointer in the B part, which
will serve as the virtual function table pointer when the object is
viewed as a B.

These will point to two DIFFERENT virtual function tables.  The first
(A-or-C) virtual function table will be laid out in the usual way for
single inheritance---it will include method fields for the virtual
functions defined on an A, in the same positions as in a virtual
function table for an actual A, but giving the corresponding methods
for a C.  It will also include additional fields, for C-specific
methods.  It therefore implements two interfaces to the class:
C-viewed-as-an-A and C-viewed-as-a-C.  The second (B) virtual function
table pointer, in the same position within the B part as the virtual
function table pointer of a B, encodes the B interface to a C.

Notice how this preserves the fast (constant-time) dispatching of
virtual functions.  If you have a pointer of type A that points to a C,
the code for invoking a virtual function defined in A does not change:
you index into the object to find the A virtual function table, and
index into that to find the appropriate method.  The fact that the
virtual function table also encodes C-specific virtual functions
doesn't matter---if the pointer is of type A, you won't be using those
methods.  You just grab the method and call it.

Similarly, if you have a pointer of type B to a C, you still use the
same code sequence to invoke a B method as if the object were actually
a B.  You have a pointer to the B part, and you index into it to find a
B virtual function table---it just happens that what you get is a
virtual function table for a C VIEWED AS a B.  You index into that to
find the right method for a B virtual function, and call it.
If your pointer is of type C, then the first virtual function table
pointer can also be used for C-specific methods---the virtual function
table contains not only methods for virtual functions defined in A, but
(following those) methods for all other virtual functions defined on a
C, including those inherited from B.

(You should recognize that a language-level "interface" to an
object---i.e., the set of virtual functions defined for a class---is
implemented as a pattern of double indirections via a pointer of the
appropriate type.  For a given interface and a given virtual function,
there is a fixed code sequence that will index into the object to find
the virtual function table that implements the interface, and index
into that virtual function table to find the method for the virtual
function.)

This trick complicates certain other pointer operations a little bit.
If you have two pointers of different types (say, B and C) to the same
object, then you can't just compare the bit patterns, because they're
implemented as different addresses.  In generating code for pointer
comparisons, the compiler must keep track of the fact that different
kinds of pointers point to different offsets within the same object,
and spit out code that compensates.  When you compare a B pointer and a
C pointer, it must generate code to convert them to a comparable
form---e.g., converting the C pointer to a B pointer (by adding the
offset of the B part) before comparing it to the B pointer.

This means that in C++, unlike C, a pointer cast may end up causing
actual instructions to be executed at run time, rather than just being
a declaration to the compiler at compile time.  (Usually, in C, a
pointer cast is an unsafe conversion---you're just telling the compiler
to regard the address of an object as the address of an object of a
different type.  It's up to you to ensure that you're not doing
something completely stupid and misinterpreting what's on the end of
that pointer.
In C++, however, some pointer conversions are safe, as in our example
of converting a C pointer to a B pointer.  That's safe because a C
really IS a B, and the compiler knows how to do the conversion
correctly, i.e., find the right "interface" to the object.)

---

The C++ dynamic dispatch technique is EXTENSIBLE, in that we can define
subclasses without changing the virtual function tables or dispatching
code for the superclasses.  We can lay out the virtual function table
for a class by examining the class definition and the definitions of
the superclasses.  We DON'T have to look at the subclasses---the
interface to a subclass will be the same as the interface to the
superclass (maybe plus some other virtual functions).  In the
implementation, that means that the layouts of superclasses affect the
layouts of subclasses, but not vice versa.

We can therefore compile code using some classes, and we won't have to
recompile that code just because we link with code that uses subclasses
of those classes.  This was important in making C++ practical for use
with normal operating systems and normal linkers.  You can compile a
library and know that you won't have to recompile it just because
somebody wants to link it with code that defines subclasses.

To compile the subclasses, you do have to have enough information about
the superclasses to tell how they're laid out---that's what C++ HEADER
FILES are for.  You can separate out the information about the fields
and virtual functions of a class from the definitions of the methods.
One nice advantage of this is that you can share (or sell) binary code
without giving away all of the source code.

---

Unfortunately, the C++ dynamic dispatching technique is NOT
INTEROPERABLE, in that code compiled with different compilers won't
generally link into a usable executable.  All (or almost all) C++
compilers use the same basic technique for laying out classes and doing
dynamic dispatching, but they don't do it exactly the same way.
Some compilers lay out multiply-inherited classes with their superclass parts laid out left-to-right, while others do it right-to-left. They also may represent their virtual function tables differently, so that the offset of a method within a virtual function table will be different with one compiler than it is with another. If you try to link code generated with different compilers, it usually won't link. If it does link, the code will not necessarily run properly, because the table and method pointers generated by one compiler will be at the wrong offsets when code generated by another compiler tries to call them.

This is an annoying stumbling block with C++, compared to C. You can't buy a compiled library and expect it to work with your code unless you're using the same compiler. Library vendors must not only provide different binaries for different combinations of hardware and operating system, but also for different compilers. Eventually, compiler vendors may get together and agree to use interoperable calling and dispatch conventions, but they haven't done that yet.

--------------------------------------------------------------------
Other Dispatch Table Schemes

The C++ scheme works pretty well for single dispatching and multiple inheritance, as long as the compiler can see enough of the program to know how objects and their virtual function tables will be laid out. It also requires that the rest of the system be able to cope with the fact that a pointer "to" an object may not actually point to the beginning of the object. (This could confuse a garbage collector very seriously, but you can work around it.)

Two things can complicate the picture very seriously: dynamic changes to the code for a class, and multiple dispatching. If it's possible to add new virtual functions to a class, then all of the subclass tables need to be updated, and lots of offsets recomputed.
If multiple dispatching (for multimethods) is used, then we can't rely on the simple prefix property of virtual function tables.

---
Global Analysis, Perfect Hashing, etc.

If all of the code for a program is available at compile time, the compiler can design the function dispatching code and dispatching data together. For example, if we use multiple dispatching, we may not be able to use C++ virtual function table layouts, but we may be able to design a layout that works well for a particular program.

One example of this is for the compiler to devise a custom hash table for the particular program in question, and a custom hash function to make it work well. For any given predetermined set of values, we can derive a "perfect hash" function that will yield no collisions and guarantee constant-time access. Unfortunately, we can't in general guarantee that this table will be small---in order to ensure that there are no collisions in the hash table slots that are used, we may have to make the hash table larger than strictly necessary, and leave slots that are unused because nothing in the program hashes to that value.

-------------------------------------------------------------------
Limited Incremental Development and "Sealing"

One way to trade flexibility for efficiency is to limit the degree to which programmers can change programs, so that the compiler can use fast representations and compile good code. The Dylan language supports "sealing", which means declaring that a module will not change. The idea is that you can develop code interactively, but when you're happy with a module, you "seal" it, making it unchangeable, so that the compiler can use techniques that are fast but do not support incremental development. In Dylan, you can seal classes, generic functions, or whole modules. A sealed class cannot be subclassed, a sealed generic function cannot have new methods defined on it, and a sealed module cannot have its classes or methods redefined.
(I'm not sure what else sealing a module prevents you from doing.)

One important example of this is when you're using a system where the compiler itself is just code in a module. When you're incrementally developing other modules, but not modifying the compiler, you don't need to change the compiler module. If the compiler itself is sealed, it can be compiled to fast code, while still allowing you to redefine things in other modules.

Another important example is when you're developing a program in layers. After developing and testing the time-critical code for low-level operations, you may seal the code and recompile it to make it very fast. Then you can still incrementally develop the higher-level code. This same general strategy works for any well-identified module that you're confident is correct (or correct enough for the time being), whether it's low-level code or not. Sealing is analogous to the way many systems allow you to declare functions to be inlinable---you promise not to change things in ways that will undermine the compiler's assumptions.

Notice that this is a generalization of the idea of having a HYBRID object-oriented language. In a hybrid language, you can't define new methods on some existing operations, or subclass some existing classes. (For example, in most Lisp-based O-O systems, you can't define new methods on CAR, because car is a monomorphic procedure, not a generic function. You also probably can't define a subclass of the pair class, because that would complicate common operations on normal lists.) Sealing essentially allows you to define your own hybrid language, limiting the flexibility where it's worth it to get better performance.

For example, in a graphics program, it's likely to be worthwhile to define a sealed rectangle class, and a sealed set of monomorphic operations on rectangles. Then the compiler can compile very fast code for rectangles, with no dynamic dispatching overhead except simple monomorphic type checks.
-------------------------------------------------------------------
Exploiting Type Declarations

We've seen that the C++ type system ensures that we can do fast, constant-time method dispatching even in the presence of multiple inheritance. Type information may also be helpful in other languages, such as Dylan, that support general dynamic typing but also the ability to declare the types of some values. In Dylan, you can declare the type of a variable (whether it's a local or module variable, or an argument to a procedure or method). Since dynamic typing is also available, the compiler must emit type-checking code in cases where the value being assigned to a variable (or passed to a procedure or method) cannot be statically determined to be of the right type. Subsequent uses of that variable's value can depend on the type information.

If a variable's type is a sealed class, the compiler knows the exact concrete type of the value---since a sealed class cannot have subclasses, there is no ambiguity as to whether the value is of exactly the declared type, or of some subclass type. In those cases, the compiler can take advantage of knowing the exact type of a value, compiling calls to methods that don't do type checking, or even inlining the methods at the call site to avoid the dispatching and calling overhead, and to enable optimization across the call.

-------------------------------------------------------------------
Exploiting Expected Behavior

A compiler can generate better code if it is able to predict the run-time behavior of a program. In terms of type dispatching, it may be able to generate code that is very fast in the usual case, but slower in the less common cases. On average, this will improve performance. This kind of optimization is often done by hand, in an ad hoc way. For example, many Smalltalk compilers assume that the receiver of a + message is most likely an integer.
The code for the + message first checks to see if its arguments are integers; if they are, it just does an integer addition and returns the result. If not, it goes through the more general dispatch mechanism, checking in a cache and looking up the inheritance hierarchy for the appropriate method. The code that's inlined into the caller is the code for the checking and the usual case. If the arguments are not integers, it does an out-of-line call to a more general routine that can handle the general case, inheritance lookup and all. This is usually a big win in a Smalltalk system, because + usually IS invoked on integers, in particular for loop control variables. In the usual case, the arguments pass the integer type test and are added together, and it's only a few instructions total. In the worst case, those instructions are mostly wasted, and the normal type dispatching is done.

This basic trick can be generalized by PROFILE-GUIDED code generation. A "profile" is a set of statistics about how often different things happen in a program, e.g., how often the arguments to a given call at a particular point in the code are of a given type. If such a profile is available, the compiler can use it to guide code generation, compiling code that will be fast in the usual case, and slower in the infrequent cases, which matter less.

One way of obtaining a profile is by "instrumenting" the code, i.e., modifying it to keep statistics about what actually happens at run time. Typically, instrumented code is significantly slower than normal code, so you don't run instrumented code all the time. You pick some "example" inputs and make a few "representative" program runs, trying to accumulate statistics that are reasonably close to what the program will do when you run uninstrumented code on other data sets in the future.
On the basis of the accumulated statistics from the profiling runs, you recompile the program, optimizing it for the most common cases, and hope that it will run faster on whatever data it's presented with in the future.

Another way of doing this is by using a DYNAMIC COMPILER, which recompiles code periodically based on what has happened in the past. In the more recent versions of the Self compiler, for example, the compiler first generates relatively slow instrumented code, which keeps statistics (in a crude way) about its own execution. As the program runs, frequently-executed pieces of code are recompiled to perform better for the kinds of execution patterns observed so far. This is a very tricky way of doing things---to make it worthwhile, the instrumentation necessary to guide the recompilation must not slow the program down more than the recompilation speeds it up. You also have to assume that what the program has done lately is indicative of what it's likely to do in the future. (In the worst case, this could always be wrong, but in the usual case it's a pretty good assumption.)

---

One of the ways that this TYPE PREDICTION is especially likely to pay off in an object-oriented system is that if you send multiple messages to the same object, then you know that the receiver for all of them is going to be of the same type, e.g., in the code

   foo(bar)
   baz(bar)
   quux(bar)

you know that once you've determined the class of the bar object for the purposes of dispatching the foo message, you'll dispatch to the same class for the baz and the quux messages. If you can guess at compile time what the class of bar is, you can compile this to a simple type test followed by the appropriate code for foo, baz, and quux applied to that type. Suppose that you expect bar to be an integer, for example, and that you have specialized code for foo, baz, and quux assuming their arguments are integers.
You also have general code for foo, baz, and quux, which works for any type that bar might turn out to be. You can compile that to

   if (INT_P(bar)) then {
      int_foo(bar);       /* inline code for integer version of foo  */
      int_baz(bar);       /* inline code for integer version of baz  */
      int_quux(bar);      /* inline code for integer version of quux */
   } else {
      general_foo(bar);   /* general dispatch to foo  */
      general_baz(bar);   /* general dispatch to baz  */
      general_quux(bar);  /* general dispatch to quux */
   }

If you've guessed right, and bar is usually an int, then usually you'll do one type test and then execute the appropriate foo, baz, and quux code for that type. If you've guessed wrong, then you'll waste a type test and go through the general dispatching overhead for foo, baz, and quux. This is likely not to lose you much, though, because a single simple type test is pretty cheap compared to the full dynamic dispatch you have to go through if you guess wrong. And if you guess right, you win big, because the (inlinable) special-case code is likely to be much faster.

---

Inlining "self" methods