Proposed Tagging and Generic Dispatching Scheme for RScheme and/or the GNU Extension Language Goals A tagging scheme that allows both fast type checking the conventional Scheme sense and fast generic function dispatching for OO code. Minimal space cost, in particular, not requiring that pairs (cons cells) have headers. Design Tagging: 1. All heap allocated objects are allocated on 8-byte boundaries. This means that raw addresses always end in 000, i.e., you you can use 3-bit low tagging. 2. Pairs have no headers, hence must have their own 3-bit tag. 3. All other heap-allocated objects have a 2-word header which includes a pointer to the appropriate class object. 4. Integers are 30 bits (NOT 29) with a two-bit low tag of 00. This keeps array/struct indexing fast: you can add a tagged int to the address of the array with no masks or shifts. (The two-bit 00 tag in effect means that integers are pre-shifted to be word offsets...). A 3-bit tag of 000 is really a two bit-0 tag plus the low bit of an (even) integer. A 3-bit tag of 100 is likewise a two-bit integer tag plus the low ibt of an odd integer. Integer tag checks can thus be implemented as though two-bit tagging was used, and we also maintain the usual benefit of 00 tags for ints---we can add or subtract ints without stripping the tags out, because the 00 tags will add or subtract to yield 00 tags. 5. Noninteger immediate values (e.g., ASCII characters, booleans) have a secondary tag field immediately above the primary tag. This will probably be three bits. Dispatching: There are two basic kinds of dispatching: * simple type checks for most built-in classes, and * general dispatching, which requires fetching a class pointer of the kind of object in question. To a first approximation, we can do type checks for immediate (nonpointer) values just by checking the tag bits, but we'll use a simple trick to allow the same fast code to work for pairs. For non-pair pointer types, either built-in or user-defined, we can do simple type checks (e.g., vector?, port?, some-user-defined-class?) by checking that the tag is a nonpair pointer tag, extracting the class pointer from the header of the object, and comparing that to the class pointer of the type we're checking for. General dispatching requires a little bit of cleverness to make fast. To a first approximation here's what we do: We extract the primary tag bits and see if we've got an immediate value. If so, we use primary and secondary tag bits as an index into a table of class pointers for immediate values. If not, we grab the class pointer from the header of the object. We have to enhance this a little bit to make it fast and to enable it to deal with pair pointers: We generalize the table lookup to use the primary and secondary tag bits UNCONDITIONALLY, even if the value in question may not really be an immediate value with a secondary tag. No matter what kind of value it is, we take the low six bits and use them as an index into a 64-element array of class pointers. For immediate values with secondary tags, there will be a unique entry in the 64-element table. For other kinds of values, with shorter tags, we will put multiple copies of the class pointer in the relevant slots of the table. E.g., for integers, we'll have a copy of the pointer to the integer class object in every slot of the table whose offset ends in 00. For pairs, we'll have a copy of the pair class pointer in every slot of the table whose offset ends in the 3-bit pair tag. For "other pointer" types, we provide an escape mechanism. In every slot of the table whose offset ends in the 3-bit "other pointer" (non-pair pointer) type, we put a not-in-table flag of some kind. (Probably the integer 0, which has a bit pattern of all zeroes, which is easy to test against in one instruction on most machines) Now all we have to do to get a class pointer for any kind of value is this: Mask in the primary and secondary tag bits, irrespective of whether the value really has both primary and secondary tags. (Masking those 6 low bits in is faster than masking the upper bits out, because the mask will probably be an immediate bit pattern in the operand field of some instruction.) Use the resulting bits as an index into the table of class pointers, and load the class pointer. Check to see that we didn't get the not-in-table flag. If we did, load the class pointer from the head of the object (an indexed load). Assuming that most checks are for immediate values or pairs, the usual instruction sequence will be something like this: mask-imm 111111 ; get primary+secondary tag bits shift 2 ; shift it 2 bit positions to make it an index indexed load ; get table entry cmp 0 ; compare to 0 to see if we missed table branch (taken) ; If we miss the table, we won't take the branch, and we'll do an indexed load too. (To make that load fast, we have to ensure that the table of class pointers we're checking against is in some place we can load from quickly.) Depending on the instruction set, we may be able to branch-on-zero instead of doing a compare and a branch. We'll probably lose a cycle for the branch, but maybe not if branch prediction works. Bottom line: about 5 instructions if we're lucky with branch prediction and have single-cycle loads from a small first-level cache. Up to about 8 if we're unlucky. We can write three or four minor variations on this in C, e.g. reversing the branches of the IF statement, and see what works best, maybe on a platform-by-platform basis with an #ifdef or equivalent. Another thing to notice is that we've got some spare 3-bit primary tag values. So far we've used these: EvenInt OddInt OtherImmediate PairPtr OtherPtr That leaves 3 more values we can use for specialized pointer types that we want types for in our 64-element table. Based on profiling and thought, those might be allocated to closures (to speed procedure call checks) or vectors or strings. I'd be very cautious in using up the remaining tag values, though. Another possibility (maybe relevant to GNU ext. lang.) is to make sure that #f and '() have different primary tags, but tags which differ in exactly one bit. Then we can check for #f-or-'() by extracting those tag bits they have in common and checking to see that they're what we expect. We might use one of the tag values for headless double floats, but it's not clear that's worthwhile. If we're heap-allocating a lot of floats, we're probably losing already, and need to work on type declarations. (E.g., homogenous float arrays, type-declared float expressions.) Cutting the size of a float down to 2 words from 4 is probably not going to help, because the main costs will be the time cost of boxing and unboxing floats, and maybe the generational GC cost of tracking pointers to short-lived float objects from long-lived objects (e.g., general arrays). --- Supporting advanced (e.g., MOP-based) object systems Advanced language systems based on metalevel programming (e.g. metaobject protocols) typically divide compilation into high-level and low-level components. The goal is to divide the language into a BASE LANGUAGE which can be compiled to fast code using relatively conventional compiler techniques, and a META-LANGUAGE that allows higher-level constructs to be mapped onto the base language. This modularizes the language and its implementation, allowing the separation of high-level concerns from low-level concerns. The art of designing such a language system is to identify the critical issues that must be dealt with in the (relatively conventional) base language compiler, and to allow higher-level issues to be addressed cleanly in higher-level modules. (See Kiczales et al., esp. the book "The Art of the Metaobject Protocol".) The key step is to design the interface (e.g., a metaobject protocol) so that things that really matter to performance are exposed and made clear, while abstracting away from unnecessary detail. We think that for a fairly conventional OO-Lisp or Smalltalk-like system, the only real support we need from the basic language compiler is the ability to quickly get a class pointer for any object (see above) and support for FUNCALLABLE INSTANCES (see below). (Thanks to Gregor Kiczales for pointing out the latter.) ------ Supporting user-defined procedure types ("Funcallable instances" in CLOS MOP terminology) It seems to be a very good idea to leave two similar 3-bit tags free to represent different kinds of procedures. The basic idea is that along with being able to find a class pointer (quickly) for any arbitrary value, there is one more little bit of support that's useful to high-level object systems like metaobject protocols: being able to use a uniform calling discipline for either plain procedures or odd user-defined kinds of procedures. In CLOS terminology, the latter are known as "funcallable instances", i.e., things that are instances of some class, but also happen to be structured so that you can call them as procedures. An example of this is a CLOS generic function (or an instance of any subclass of generic-fucnction), which has environment and code pointers like any other function, but which may have other slots defined for its class. As far as the basic language goes, all that's necessary is that there's some layout common to plain closures and funcallable instances, and that it be able to check very quickly that something is one or the other. (Any other stuff will be handled by the high-level object system, so the basic compiler doesn't need to know about it.) In terms of defining a tagging scheme, it's desirable to reserve a primary tag for pointers to callable objects. Then the header of the object can be used to discriminate between different kinds of callable objects (e.g., plain closures vs. funcallable instances). The check for callability can just extract the primary tag and check it against a constant. --- Other uses of the basic dispatch mechanism (probably not worth it) Another possibility is to use the basic dispatch mechanism to do some other handy dispatching. If we use quad-word alignment, and have a few tag values to burn, we could refine the representation of procedure pointers to encode the most common argument patterns as well as the fact that a value is a procedure. E.g., we could have a different tag value for procedure-that-takes-exactly-two-integer-arguments or for procedure-that-takes-a-list-argaument. It's not clear that this is worth the hair in the interface between the high-level compiler and the low-level compiler, however, especially since the biggest wins will come from static determination (and inlining) of methods.