[moon] home
IPv4

Erlkönig: The Development of Spatial Environments

Copyright ©1998, 1999, 2004, 2005 C. Alex. North-Keys, all rights reserved
parent
[parent webpage]

server
[webserver base]

search
[search erlkonig webpages]

trust
[import certificates]


homes
[talisman]
[zoion]
show entire document
show outline only
(You can jump forward to the introduction to Z)

Note: This document supports subsection focusing if viewed in W3C DOM conformant web browsers such as Mozilla with JavaScript support. Simply click on any of the section title bars to shift the focus, or use the links to the upper right to show the entire document or just the outline form.

(While the subsection focusing is pretty handy, it has a bug regarding section numbers being affected by lists inside of the sections, and probably needs to be entirely reworked - oops)

Note: If some images are showing up entirely black, your browser probably doesn't support translucent PNGs. Upgrade!

This document is currently under construction

  1. abstract

    images of various topics from this document [mainframe; Jacquard loom; kinesis; spaceball; a spatial environment; SGI Onyx2]

    This document provides a historical and analytic perspective on the ongoing paradigmic change of the Human/Computer Interface, or HCI, along with a hypothesis of what paradigm can be expected to follow.

    Some of the more significant past transitions are described for evolutionary context, including technological and cultural factors which contributed to or inhibited their success.

    In particular, the issue of backwards compatibility between each new HCI and its predecessor is analyzed both for its historical rôle and for its contribution to the adoption of future interface models.

    Finally, one possible such model, that of the spatial environment, is described and evaluated for fitness and acceptance as a successor to the desktop paradigm ubiquitous in the early years of the second millennium.

  2. human / computer interface history

    Before looking at what the next paradigm change in HCI might be, a brief overview of past changes is in order.

    1. punched cards and batch mode

      Jacquard loom, with card in front [Jacquard loom, with the cards shown in front]

      In Paris in 1801, Joseph-Marie Jacquard was the first to exhibit a commercializable loom based upon the earlier binary-programmable looms of Basile Bouchon and Jean-Baptise Falcon (around 1728), and the automaton inventor Jacques de Vaucanson (around 1741-1744). All of these looms controlled the production of intricately patterned fabrics under direction of what was effectively a program, but Jacquard's variant used a series of edge-connected perforated pasteboard cards which combined the resilience of medium and the open-ended program size independently offered by its predecessors. [fof01]

      Hollerith card from 1890 census [1890 Hollerith card, as used in the census]

      Herman Hollerith adapted the concept of the Jacquard's punched cards into an input method for the mechanical tabulating machine he patented in 1884 pursuant to winning the competition towards tabulating the results of the 1890 United States Census.

      post-1928 80 column Hollerith card [80 column Hollerith card, as after 1928]

      In its early round-hole form with up to 45 columns, and later in its canonical 80 column rectangular-holed version first seen in 1928, the Hollerith card formed the basis for much of computer input for almost a hundred years, from the 1880s to the 1980s.

      1. the punch card paradigm

        modern Hollerith card with JCL [modern Hollerith card, with JCL and interp]

        This canonical punch card, in combination with the lineprinters of the early mainframe computer age, supporting infrastructure and staff, and certain social effects, gave rise to the first widespread paradigm of human-computer interface.

        modern IBM 029 card punch [modern IBM 029 cardpunch]

        During the 1950s and 1960s, when computers were first appearing at universities, computer programming was an incredibly abstract pursuit. To communicate with the machine, programmers created a series of punch cards, with each card representing an individual software command. Programmers would then hand the cards over to a central system administrator who would then insert them, one by one, into the machine, waiting for the machine to spit out a new set of punch cards, which the programmer would then decipher as output. This process, known as batch processing, was cumbersome and time consuming. It was also prone to abuses of authority. One of the motivating factors behind hackers inbred aversion to centralization was the power held by early system operators in dictating which jobs held top priority.

      2. frustration in the edit / test / correct cycle

        The punch-card interface came with many frustrations. Editing concepts such as line insertion and removal were done purely manually. Modifications to a line usually resulted in brute card replacement. Dropping a card deck was generally a significant disaster. Aside from these difficulties, even in the 1970's, for thousands of people the height of computing interaction could be roughly described thus:

        1. Spend hours on a noisy, complicated, often dysfunctional card punch.
        2. Consume a startlingly large number of blank cards in the process of trying to produce the error-free input typically required.
        3. Carefully cart the resulting pounds of cards to a computing center.
        4. Wait for the scheduled moment you'd reserved, often the preceding day.
        5. Offer your card deck to the mainframe operations staff for processing.
        6. Wait an indeterminate amount of time for the program to be run.
        7. Collect the reams of paper produced by the lineprinter as output, often emerging in a different or apparently random building elsewhere on campus.
        8. Cart it off to a place to contemplate the results.
        9. Probably start the whole process again due to an unforeseen bug.
      3. the priesthood

        stereotypical computation center, JP [classical computation center, in japan]

        The blatant division of users from mainframe was most evident in the specialized rôle of the operations staff as intermediaries. Users were required to offer up their card decks to the operations staff as though to priests in some religious ritual. A deck could be refused based on past acceptance of a user's offering by the mainframe, as though the mainframe the deity to be pleased, deep in the shrine of the computing center. If such an offering did fail to execute, or actually negatively impacted the mainframe as a whole, the implications of the user having personally failed, annoying or even upsetting the god the priesthood was present to defend, could be quite palpable.

        This unfortunate marginalization of the users often gave the impression that computers were too big, powerful, and complex to be used by normal people. This mindset would delay the acceptance of interactive mainframe time sharing, as well as the idea of personal computers as relevant, capable tools. Moreover, the idea of having a mainframe of one's own, as seen later with the introduction of personal computers running Unix, would have conflicted directly with the mindset of the time.

      4. high latency

        One of the most pragmatically obvious factors of the punched-card paradigm, was the extreme latency present in the development process. For normal users, it was not uncommon to only have one opportunity in a day to submit one's card deck and hope for a successful run. This heavily impacted how quickly changes could be applied and tested, dragging the both the software development process and the use of that software to a crawl. Coding tended to be cautious, with heavy reliance on proven methods, since any failure came with such a significant time cost.

      5. benefits

        Despite all of the negatives, there were a few benefits to the punched card experience. Most notably, the need for physical access to card punches and card submission areas often fostered a certain level of social interaction and among peers, sometimes to the benefit of their projects. In a perverse way, the intrinsic lag of the development process encouraged conversation while standing around waiting for output.

      One rendition of this period, highlighting the issues of the cumbersome HCI, impact of systems operation, and high latency reads as follows:

      Despite the many practical concerns which had shaped computing into the batch-processed, punched-card and priesthood experience, the latency imposed by the need to serially share computing resources was an obvious detriment. Clearly a model was needed to grant users at least the illusion of having the computer to themselves.

    2. printing teletypes and timesharing

      Ken Thompson and Dennis Ritchie on UNIX [Ken Thompson and Dennis Ritchie on Unix]

      Printing teletypes were the major stage of computing after punched cards and printers, refocusing interaction to defer to the user, rather than forcing the user to defer to the mainframe. Finally, the mainframe would appear to await the user's every command, taking advantage of vastly decreased latency to accelerate development time, and most importantly, popularize a whole new style of interaction.

      1. lower latency and conversational computing

        ASR 33 printing teletype [ASR 33 printing teletype]

        In the teletype paradigm, the command/response cycle between user and computer could be decreased to intervals of seconds rather than minutes, hours, or days. This inspired a profoundly different model of interaction, often referred to as conversational computing, which led to the creation of entire new operating systems, including TOPS-10, MULTICS, and UNIX. These supported not only far tighter interaction between the computers and their users, but even brought the computer into a new rôle as medium for real-time interaction between its users.

        Complex, monolithic batch jobs could be broken down into smaller, simpler interactions, enabling tighter discretionary control of jobs by users. Line editors flourished and replaced the task of manually editing card decks. Interactive programs and games were developed. Tons of paper were consumed playing SpaceWar on DEC timesharing systems.

        The priesthood had been removed from the human/computer interface, and the result to the users was not unlike emancipation.

      2. compatibility

        Compatibility with the older metaphor was still retained in some cases. Some teletype drivers were mere wrappers around the older card reader drives, simply making each line of typed input into a virtual Hollerith card. Similarly, sending output to a teletype was little different from sending it to a printer. Older applications dependent on cards could often be used transparently from the new teletypes, preserving the time and money already invested into such legacy software -- at the cost keeping cards' eighty (80) column line-length limit, still in evidence decades later.

        Even the batch job model garnered an extended life in conversational systems like UNIX, which allowed background jobs to be run in a detached manner which still left the console free for normal user interaction.

    3. glass teletypes

      1. virtual paper

        Lear Siegler ADM 3 dumb terminal [Lear Siegler ADM 3 dumb terminal]

        Glass teletypes were one expression for the next stage of computing after printing teletypes. Notable for reducing the risk of deforestation, quieter operation, and somewhat-improved speed, these combined keyboard and cathode-ray tube (CRT) terminals soon decorated the desks of many installations.

        The paradigm implemented was not particularly new, since the CRT for the most part emulated the behavior of the printing teletype, just without the box of paper. The screen's contents scrolled upwards after each linefeed in the same way that paper output would have been expected to, save for the detail of that text simply vanishing once having passed off of the top of the CRT.

      2. compatibility

        In general, many features of both printer and punched cards were retained for compatibility's (or laziness') sake. The eighty column line was usually retained, and the screen's text scrolled upwards as did the paper in a printing teletype, or, for that matter, a printer. In effect, these were more simple technological optimizations, rather than fundamental changes in the paradigm.

        As with printing teletypes, some implementations managed to interface the terminals to existing hardware by simply emulating teletypes, or even the card reader and lineprinter model. All editing for a line of user input was done locally on the terminal screen, and upon pressing a “send” button, the mainframe still effectively received a simulated punched card.

      3. foreshadowing

        The glass teletype remapped what had been a concrete process into a mere image, transforming a familiar physical interface into a understandable emulation of one. This was significant in a way that the mere reuse of card reader and printer drivers wasn't: it was perhaps the earliest widely accepted virtual human/computer interface, and would lay the groundwork for greater virtualization to come.

    4. terminals

      1. cursor address-ability

        DEC VT100 terminal [the oft-emulated DEC VT100 terminal]

        The glass teletype gave way, slowly, to the so-called intelligent terminal, a terminal with additional features outside the paradigm of printing teletypes. The most important of these new features was cursor addressability, which allow a character to be drawn an any arbitrary point on the screen, rather than in strict left-to-right, top-to-bottom order as the printing teletype metaphor would have dictated.

      2. editors and screen-oriented applications

        Cursor-addressability lead to the development of line editors with a visually-oriented, full-screen mode. One such, the Unix vi(1). editor, was itself built upon the older, traditional, printing-teletype style line editor ex(1). vi was groundbreaking not only in having a movable cursor to mark the point of edit anywhere on screen, but also in its interlock with the termcap (5). library, allowing the same program to be used on hundreds of different incompatible types of terminals. This same cursor-addressability also stimulation a new wave a game development, new types of monitoring tools, forms-based data entry, and so forth.

      3. faster connections

        All of these new applications, due to their tighter coupling of human/computer exchanges, added new incentive to increase the speed of mainframe/terminal connections and reduce latency. The early speeds of up to about 1200 baud (perhaps a line and a half of transmitted text per second) would be raised gradually to, and later past, 19200 baud (one screen of text per second) in an effort to address the these issues.

      4. compatibility

        Despite cursor-addressability, the new terminals generally defaulted to printing-teletype compatibility mode, allowing line-editors such as ex(1) and other legacy software to be used without difficulty. The eighty column custom persisted, and most terminals tended to scroll automatically if one generated a linefeed on the bottom screen line or tried to use the last screen position, making the bottom-right corner in particular into a kind of no-mans land, and complicating the writing of terminal-based editors for decades to follow. Users' demand for graphics was also on the rise, and some kind of union between graphics and text would have to come soon.

    5. early graphics

      1. early admixture attempts

        Apple ][ Plus [Apple ][ Plus]

        Early graphics developments happened along many different time lines, from the Video Display Storage Tube (VDST), to vector graphics, to the early highly modal text-or-graphics environments seen on computers like the Apple in 1976. All of these were attempts to broaden the range and presentation possibilities of representable data by adding graphics and color to text, or at least to use them as an adjunct to text.

      2. disjoint text and graphics

        Most of these attempts, although astounding for the time, were only marginally successful in advancing the user-interface paradigm. The graphics devices available in these disparate communities were not typically well suited to the display of text. As a result, a user either had to have both types of display hardware available, or use a single display device for which could be toggled between text and graphics. Serious attempts to merge these two disjoint mediums would be prohibitive until higher resolution displays became available.

      3. cultural resistance

        Due to the early adoption of color by games in the personal computer and video game markets, business users commonly perceived color as an unnecessary, expensive feature useful only for frivolous gaming and unsuited to serious work. This tended to keep color support off of the requirements list of business purchases, marginalizing computers and monitors with color support and preventing their early widespread adoption and commoditization.

    6. the desktop

      1. Douglas Engelbart

        In the 1980s, many aspects of the desktop idiom were being commercialized in projects such as the Apple Lisa, Sun workstations, and other projects - many of which were inspired during visits to XEROX PARC in the 1970s. But PARC itself had taken much of its inspiration from Douglas Engelbart of the Stanford Research Institute (SRI). From him had come the pioneering synergy of the mouse, the windowed graphic user interface, hyperlinks, email, and videoconferencing -- in the 1950s and '60s, decades before such concepts would see widespread adoption.

        From an article by Tia O'Brien on Engelbart's 1968 lecture:

        Diehards can repeat Engelbart's opening line from memory:

        “ If in your office, you as an intellectual worker were supplied with a computer display backed up by a computer that was alive for you all day and was instantly responsive, how much value could you derive from that? ”

        Then, on a gigantic movie screen, he brought this new world alive. TV cameras switched from shots of Engelbart's hands working new contraptions called a mouse and a chord keyset, in conjunction with a standard keyboard, to shots of the computer screen where Engelbart was effortlessly adding, deleting and reorganizing a grocery shopping list. Like magic, the cursor moved words and thoughts.

        The world premiere of video conferencing was a show-stopper: Talking into a director's-style headset, Engelbart punched up his colleague at SRI, 30 miles away. “Hi, Bill,” said Engelbart as Bill's head filled the left corner of the screen, surrounded by text. “Now we're connected . . . let's do some collaborating.” The two proceeded to work jointly on a piece of text, passing the cursor and computer controls back and forth. Engelbart and his team had invented what's now called “groupware”; 30 years later it's hard to find software that allows you to do what they did in the demo - share control of a computer screen for sophisticated collaboration.

      2. merger of text, graphics, mouse, and sound

        the desktop port of the XEROX ALTO, 1970s [XEROX ALTO, 1970s]

        Based on Engelbart's work, in the 1970s the XEROX PARC project developed the Altos, a new computer which featured a high resolution bitmapped display, a mouse as a pointing device, and Ethernet networking. The console fit comfortably upon a desk, with the main computer (disk storage and processor) being housed in a textured beige metal cabinet only slightly taller than the desk itself and placed on the floor alongside.

        the screen of the XEROX STAR c.1981 [XEROX STAR, 1981]

        These computers became some of the first full desktop interfaces seen outside of the lab, with fifty being donated to Stanford, Carnegie-Mellon, and MIT. Their use of graphics and text together on one screen, integrated with the mouse as a precise, intuitive pointing device suitable for a two-dimensional environment, quickly became the standard by which many others were judged.

        XEROX lost people like Charles Simonyi, the Hungarian implementer of the Alto text editor Bravo (designed by Butler Lampson), who later joined Microsoft and led the creation of Microsoft Word. (Some would argue that Word's interface has come full circle, and become nearly as impenetrable as Bravo's original cryptic commands.)

        XEROX management apparently had little idea as to what to do with this new computing paradigm, but word, and personnel, tended to leak out. After the donations and demonstrations of XEROX ALTO and STAR systems through the late 1970s and 1980, various Unix vendors and others also entered the realm of the desktop paradigm, chewing away at a market XEROX itself seemed unable to even identify. Later, Apple, after experimenting with these ideas in the largely unsuccessful Apple Lisa, would finally bring a simplistic desktop paradigm to the mass market with the Macintosh in 1984.

      3. compatibility

        At XEROX PARC (using the program Chat [alto01] ), and later on Unix workstations (via cmdtool, xterm, etc.), backwards-compatibility with the terminal paradigm was achieved through little pictures of terminals that behaved like the real thing - amounting to virtual representations of terminals which in some cases had themselves been virtual representations of teletypes - allowing the continued use of all older terminal applications within the new environment.

      4. hardware requirements

        The original Alto as used by PARC in 1972:
        • Bitmapped monochrome 606x808 display
        • 5.8 MHz CPU
        • 128KB of memory
        • 2.5MB removable cartridge hard drive
        • Three button mouse
        • 64-key keyboard

        It is common in the desktop paradigm to want to display multiple windows simultaneously, some stacked, partly hidden, or side-by-side for concurrent use, etc. It became clear to workstation manufacturers that a certain minimal resolution and monitor size would be necessary to view such windows with both readable text and a minimal amount of inconvenient overlap. The minimum generally adopted was typically a 19 inch diagonal graphics monitor with around one million pixels (or “megapixel”) visible, such as the 1152x900 displays seen on Sun workstations in the 1980s.

      5. sound as notifier

        Another interesting aspect of the desktop in the 1990s emerged as a side effect of widespread sound support on workstations and personal computers, largely as a side effect of the popularity of games and their associated sound cards. Many of these newer desktops have sound integrated into the environment as a notification and audio-feedback mechanism. Typically such audio alerts are strictly monaural sound effects, providing a simple monotonic cues to the user.

        Preliminary work has also taken place on network-transparent audio systems, providing for sound the same type of transparency seen for graphics in X and GLX.

    7. 3D applications in the desktop paradigm

      1. XEROX Palo Alto research center

        XEROX/PARC UIL splashscreen [XEROX/PARC UIL splashscreen]

        The XEROX PARC project has done some interesting work in creating some 3-dimensional tools for the desktop environment, with a focus on data visualization.

        browsing an information base (click for animation) [hyperbolic browser]

        The Hyperbolic Browser, which uses diminution rather than perspective in order to provide logarithmic rather than geometric amounts of context. There is a Java-based interactive version on Inxight (as of 2004-01-26).

        display of an MSOffice directory and subdirectories [conic directory tree]

        The Cone Tree is a method for reducing the screen real estate required to usefully display a hierarchy of directories, by using a projection to cause the amount of space to represent a given directory to scale by the square root of the number of entries instead of linearly.

        viewing a schedule centered on November of 1993 [perspective wall]

        The Perspective Wall allows a user to more easily retain context for a specific section of a table by displaying nearby data at a diminished level of detail. Although not all data is being completely displayed, the partial data is still sufficient, either in the data partially displayed or by the implicit hinting towards nearby data the user had previously viewed, to augment the context surrounding the section of the table being viewed with full detail.

        webbook view of a page on the Information Visualizer and George Robertson's homepage [webbook]

        The WebBook provides an interface to a webpage and all of the webpages to which it refers, coding them to highlight whether each page is from inside the collection the origin page came from, or from outside of the collection, elsewhere on the web. It also supports viewing all of the pages simultaneously. [parc01]

        webforager, showing the Product webbook, with the desktop and bookcase context areas. [webforager]

        The Web Forager workspace is arranged hierarchically (in terms of interaction rates) into three main levels: (1) a Focus Place (the large book or page) showing a page, a book, or an open book at full size for direct interaction between user and content; (2) an Immediate Memory space (the air and the desk), where pages or books can be placed when they are in use, but not the immediate focus (like pages on a desk). A Tertiary Place (the bookcase) where many pages and books can be stored.

      2. computer-aided design

        Computer aided design has historically been a big player in the 3D space, in the sense that architectural and simulations modeling frequently deal in real-world objects and scenarios. A great of deal of relevant work has taken place in the realm of creating and editing three-dimensional data, and the applications developed have been quite successful.

        Somewhat in contrast to the 3D nature of the models they manipulate, CAD programs tend to use traditional planar manipulation of projections of the object - just as seen in classic paper draftsmanship and blueprints - in a style now considered a standard aspect of the desktop paradigm. This approach usually interferes significantly with the use of multiple applications, since in order to adequately display the complex objects and the CAD program's flat icon-based interface, the application typically commandeers the entire screen upon initialization.

      3. modern computer games and simulations

        Modern computer games, as well as certain simulation packages such as flight simulators, have made incredible inroads in providing real-time simulated 3D environments in consumer-accessible hardware, and some even bring multitudes of users together into environments presented as shared realistic spaces to those users.

        These environments are typically tightly tailored around a particular gaming or simulation scenario, and so naturally limit users to the tools and actions endemic to that specific environment. Moreover, the environments themselves are often written in widely disparate and incompatible underlying toolkits, limiting the extent to which even the developers can rely on or leverage the work of their peers. The toolkits themselves are often tied to proprietary and nonportable underlying libraries, best exemplified by Microsoft's ongoing determination to lure developers into its own proprietary 3D visualization libraries (and thereby into a full web of nonportable subsystems), instead of the higher end and nearly ubiquitously portable OpenGL. Problems like these leave development both fragmented and often redundant, hurting both developers and their end users, and cursing everyone with balkanized, non-interoperable, lower-utility 3D environments.

  3. directions for future growth

    1. application interface interpenetrability

      Application interface interpenetrability refers to the ability for the user working within the context of one application to access the facilities of other applications without departing the original context, and without requiring the programs themselves to have detailed knowledge of each other. This is implemented by placing the disparate application interface contexts in an enclosing frame of reference. This frame of reference can be something as simple as a nested three-dimensional coördinate system.

      By way of example: consider that if one is wandering around in some 3D world online, such as one of the many multiplayer fantasy role-playing games, it's difficult to attach a mail-alert icon for your main Internet email account to the side of your in-game helmet visor. Nor is it easy to bring up an instance of Mozilla within that world where you can read said email and show it to another player in that world. Forget doing a copy-and-paste of some sinister-looking tree from the game world into your email as an attachment, or attaching a structural capture of the (interpenetrated) terminal screen the other player just showed you. Adding your own private annotations - in the form of a waving smiley animated model - to the original tree itself for a particular friend and co-player to discover later just isn't going to happen. All of these possibilities are unavailable because you normally can't access the behavior of one application by tunneling through the interface provided by another.

      Exceptions can be made for the X window system's concept of window managers, which pass the majority of input events through to applications about which the window manager has no specific knowledge, and through which output events (graphic updates) are for the most part visible to the end user without window manager interaction or knowledge. It is also noteworthy in X that panels, intended to remain on the desktop, can be used to contain and organize the active running windows of other applications (sometimes called widgets), like a calender program, clock, load meter, etc., which were not specifically written for this purpose. In such cases, once the mouse pointer has been moved into the panel to interact with the widget's interface, the panel itself will be almost completely unaware of the following human/widget interactions.

      On the other hand, it's difficult to drop an arbitrary X application window/widget into the middle of another arbitrary X window and expect it to work. Hence, we can only claim limited application interpenetrability for X, since the containing application must specifically aid and abet the containment of the other. This is largely due to that fact that X apps must each maintain their own knowledge of the hierarchy of windows and widgets they're using, and thus importing a new widget (such as swallowing a running xterm already on the display) into that hierarchy must be a deliberate act.

      Outside of the X window system, the desktop paradigm is typically opaque and occlusive, especially so in subcultures where applications commonly run in fullscreen mode, requiring a crude and complete switch from one application to another without the user even being able to comfortably view two apps simultaneously. The idea of using normal windows to float a load meter over a software development GUI to monitor CPU saturation during compiles is basically impossible for most users with the default configurations of Windows and (pre OS X) Mac OSes (clicking in the dev GUI to use it will immediately hide the load meter), and interpenetration to insert the load meter into the software development GUI's interface is simply unavailable.

      While the common need to display rectangles of text is not expected to change, clearly many desktop tasks have been somewhat arbitrarily divided from one another in enclosing rectangles, especially in those apps where subwindows are caged within the application's primary window. Shaped windows provide an alternative approach, but given the lack of any type of depth cuing or parallax on the desktop, it becomes difficult to distinguish shaped applications from one another. The simple lack of space available in the windows of the desktop paradigm is a significant factor preventing interpenetration, since usually no suitable hole exists into which to drop the interpenetrating application window.

      In summary, a different method of task visualization, and direct support for interpenetration (as well as at a level underneath that of the applications could provide for a better integrated and more useful interface than that commonly seen in most of today's implementations of the desktop paradigm.

    2. shared environments

      1. collaboration

        Popular desktop environments are generally built upon operating systems with support for sharing resources such as memory, cpu, filesystems, and so forth. These underlying systems typically provide support for various programs supporting collaboration such as email, internet chat, and many others.

        To project the interface of the locally-run X program xeyes to the remote X display of the hypothetical host far.z.org, one once simply ran:
        xeyes -display far.z.org:0.0
        As of 2010, using SSH tunnels for safety, the last part usually become a reference to a tunnel on the local computer.

        However, of the X, Windows, and Apple desktops, only X has long explicitly supported (for over 20 years) a network-transparent window system permitting X programs to interact with virtually any X display on the Internet with simple, broadly supported methods (see sidebar). Yet, even in this best-case scenario, special application code is still required to share the program's HCI across multiple users' displays simultaneously.

        Programs like Emacs (an extremely powerful text editor) allow collaboration as simply as using a built-in command, make-frame-on-display, to add an additional editing window to the current session, projected to any X screen in the world that allows for the connection. Unfortunately, most X displays no longer allow such connections by default as a result of the higher level of background threat on the Internet, and the commands required to overcome this more-restrictive default are complex enough to discourage productive collaboration for many users.

        A environment where all task interfaces are automatically shared by default - by the nature of the environment rather than per-application code - would implicitly enable virtually all tasks to act as groupware, media of expression, and as potential adjuncts to other shared systems such as the World Wide Web. Further, the social interaction possible within such an environment would do much to counter the loss of the social interactions seen in earlier computing paradigms where human proximity and conversation was often a side effect of access to limited computing resources.

      2. task interface security

        Shared environments give rise to a number of practical and philosophical questions, mostly over the issue of whether one's fellow man is worthy of trust, whether one's comrades - or in a networked environment, total strangers - can always be trusted to protect one's work from misuse, damage, and destruction. If one subscribes to the view that a person changes moment by moment, then even the self of one's future can be treated with a certain degree of suspicion, especially if one has created an irreplacable work that one's future self may accidently mangle or forever erase. Further, task-interfaces are often viewed as representative of an active user, especially programs for instant messaging, email, and so forth. Securing these against misuse is even more important than the familiar controls on files.

        For a while, in the early days of X, a race condition could be exploited to screenlock a session yet leave a single application window usable to the public on top of the screenlock's protective shield over the user's more sensitive windows.

        The fact that we found this useful demonstrates that discretionary task-interface access controls are beneficial even in a crudely shared real-world environment, and should be more so in virtual environment designed for collaboration.

        Most operating systems have some intrinsic implementation of security with respect to the filesystem, and it's common to consider basing a security model on a underlayer of a filesystem shared between multiple machines. However, virtually all of the filesystems were designed with the idea of controlling permissions for a known set of users local to the system, and opening such systems to users from outside of that domain often leads to massive security penetrations. Of those designed with global access in mind, like the Andrew File System (AFS and OpenAFS), the heavy reliance on Kerberos and the mismatch between AFS permission and the underlying filesystem have proven problematic. No widely-deployed current filesystem appears to be an adequate basis for an open-networked task interface security system.

        A shared environment which allows for task interfaces to be developed within it must implement a security model which is well suited to large-scale, decentralized use and in a way that eases security development of apps to run within that environment. While the typical filesystem is not a candidate as a supporting layer, the concept of hierarchical access rights can act as a starting point for considering object permissions in a shared environment supporting nested objects. It's also possible that a new type of filesystem, potentially implemented via a FUSE userspace filesystem, would be able to address the access requirements.

      3. spreading the load

        Given a sufficient number of users, almost any service can be rendered unserviceable.

        Especially true when used in reference to Internet services, even in the absence of anomalous effects like denial of service and slashdotting, any shared environment seeking a broad base of users will eventually exceed the capacity of any single computer to host it. In order to mitigate this problem, the shared environment should be capable of spanning multiple servers as transparently as possible. In the ideal case, the computer of every participating user would contribute enough resources to meet or exceed the additional load imposed on the system by the user's tasks. In such a case, the loophole of “almost” in the truism might be realized, providing a shared environment for any number of concurrent users.

        Unfortunately this problem is much more complicated than a metric of mere CPU load would imply, and issues of network saturation with the environment's communications protocol, topological complexity over too much internode interconnectivity, and other architectural concerns can cause difficulties. It's technically quite challenging to build a system where the entire planet's population can enter a shared environment, each person move some representation of himself into a shared room, and then wave and see the other 6.4 billion little representations waving little arms back at him.

        One the other hand, most users would prefer to share their workspace with just certain friends and colleagues. In this scenario, all that's required is for any one user's computer to be connected to (or be the same host as) each other friend's or colleague's computer. Those users may be connected to yet other users, but by choosing to exclude indirectly connected users (or users exceeding a certain number of levels of indirection), the biggest potential problem of scaling is removed.

    3. candidate applications

      1. shared environment gaming

        Xybots - a classic shared-reality 3D game supporting independent views for 2 players [xybots - a classic shared-reality 3D game]

        Massively Multiplayer Online Rôle-Playing Games (or MMORPG), do not typically allow users to bring along other 3D tools into the shared gaming environment. For example, a gamer will not typically be able to retire to her virtual abode in the game and pop up a terminal within the game environment allowing them to interact with some arbitrary computer on the net, nor pull up a web browser within the game to show to some other player.

        So essentially, the MMORPGs users are only allowed to use the gaming environment to play the game. While this is completely understandable, a more seamless integration of in-game and non-game tools might allow a player to overlap work and play in new ways, and explore new kinds of interaction bridging the game and non-game environments.

      2. collaborative CAD

        CAD tools generally handle 3D data as something to be viewed and manipulated through the customary two-dimensional desktop environment, and hence do not typically model their user interfaces in a 3D paradigm, nor do they typically attempt to provide for collaboration with other CAD designers in a shared space.

        It is this latter weakness that could most profit from a shared environment, which would allow multiple designers to work collaboratively together even across national boundaries, while allowing offsite nondesigners and clients to view the work-in-progress without having to install the primary design application locally first.

        VRML was intended to make VRML objects (documents, in a sense), widely available and editable. Unfortunately, the lack of a ubiquitous VRML browser has greatly hampered the adoption of VRML. This is possibly due to a combination of SGI's early Cosmo VRML project being closed-source and Windows-specific in a crucial early period, and the high complexity of later VRML standards impeding open-source implementations. As a result, VRML comprises a vanishingly small fraction of the content of the World Wide Web.

        This type of collaboration brings to 3D designers the same kind of ease that web designers and surfers have today by using desktops shared through tools such as VNC (or shared remote emacs sessions via M-x make-frame-on-display), and common SGML-derived data representations viewable through web browsers. Using the shared paradigm of the World Wide Web has led to an explosion of content on the Internet, and a shared, easily modified spatial environment could lend itself well to a proliferation and wide availability of 3D content as well.

      3. shared-forum course instruction

        A common idiom in course instruction in the field of chemistry involves a group of students, a professor, and the delightful or terrifying risk of someone's classwork exploding. Despite the drama, such environments offer a high level of interaction, the chance to see others' experiments in progress, and the presence of many simultaneous paths of communication. More importantly, the offer the opportunity to see the actual tools being used, barring incident, as they should be.

        In contrast, much of today's vaunted internet-based course instruction restricts users to a tightly-moderated chat forum controlled by the instructor, and the less dramatic but also rather less immersive model of the broadcast slide presentation as courseware. Interactivity is much lower, and hypothetical questions asked of the instructor typically cannot be immediately, practically explored in a way the entire class can see.

        However, when the class material includes computer-representable in-class assignments, such as a 3D modelling class for CAD designers or webpage layout for designers, much of the lab environment can be reproduced. A shared environment can provide any-to-any communication, the ability of students to see each other's work in progress, the ability for students to point at the instructor's materials in a way other students can see, and other typical aspects of lab-type training. Conversational audio can be used when asking a question in order to annotate the actual act-in-progress which raised it, being viewed in realtime by the listener, typically reducing the class time taken to process and answer the request.

        Indeed the MUPPETS program at the Rochester Institute of Technology is pursing many of these goals:

        The experience of the authors and others in delivering coursework to students in the area of multimedia programming have lead us to seek ways to enhance student involvement. Through capitalizing on research in the areas of gaming and virtual community social psychology, the authors plan to develop a Collaborative Virtual Environment (CVE). This CVE will be aimed specifically at engaging upper-division students in the education of lower-division students. The authors propose to build on existing research and technical developments in the field to design and construct a CVE and supporting infrastructure. This will be aimed at encouraging and rewarding student engagement and peer-knowledge transmission.

        One advantage over a physical lab is that task security can be applied in a way to prevent the accident modification of one's work in the virtual class environment. The student and instructor have write access, but other students can only view the virtual tools being used.

      4. other applications

        Some other possibilities which could profit from group involvement include:

        • Network operations and control
        • Software code reviews
        • The classic shared-whiteboard meme
    4. backwards compatibility

      1. interacting with images of the past

        Under X11, and other graphics systems, one can run emulators of historic video games, work with console of ancient mainframes now tending towards dust and decay, see the boot sequence of the old Apple ][, use an old imaginary glass teletype by typing into a picture of one, and scores of other things. These simulations provide a link between the software of today, and the software and even hardware of the past, providing a sense of history and continuity.

      2. retention of legacy software

        Backwards compatibility with X [terminals floating in a virtual space]

        More important than the feeling of nostalgia, however, is the the fact that these older interfaces, whether they be an obsolete game machine or a noisy paper terminal, were the environments required by software often still essential to some purpose today.

        It would be difficult, although not impossible, to design a new interface paradigm from scratch, and then create all needed software within that new paradigm, discarding all known user-level utilities and tools from the past. A more consistent interface can be achieved in this manner, but at exorbitant cost, considering the thousands of tools that would have to be rewritten or totally redesigned to build a complete system.

        On the other hand, by the simple expedient of simulating the legacy software's environment, almost all of the older software can be retained and made fully functional in the new medium. This immediately magnifies the value of any new environment, and reduces the barrier to entry for new users. The process has already happened numerous times, from punch cards and lineprinters, to printing terminals, to glass teletypes, to intelligent terminals, to window systems.

        This type of backwards compatibility with the desktop paradigm is a required attribute of any interface with a stated goal of widespread adoption.

  4. the spatial interface

    1. introduction

      1. questions raised by newer technology

        There are a number of assumptions made in conventional desktop design that, given the new technologies available for HCI design, now seem open to examination, including but by no means limited to:

        • Why should a task window have to face a user square on, instead of being angled back out of the way while the user works on another task?
        • Why shouldn't an email notifier be attached to the screen pointer, or arbitrarily to any available task window?
        • Why can't a player of a first-person perspective game such a Quake, in a rare idle moment, access the web through a browser within the game environment, and show it to another player?
        • Why can't the environment's default graphics model be display-resolution independent? (note: Sun's NeWS window system provided a valid two-dimensional solution to this problem)
        • Why can't task interfaces be effectively shared?
        • Why can't shared task interfaces rely on an underlying, environmental security model to provide discretionary access controls?
        • Why can't mentoring over the network be done without being restricted to the functional equivalent of internet chat and a slideshow?
      2. key concepts

        The are a number of general concepts are essential to discussing this area:

        A spatial environment is a three-dimensional environment wherein the user can, in general, freely position and orient his point of view and whatever task interfaces which may be in use. Where desktops primarily use a two-dimensional coördinate system with perpendicular axes labelled x and y, the spatial environment adds the third axis, z.

        A hierarchical spatial environment has the added attribute of using nested coördinate system transforms to ease issues such as allowing the user's viewpoint to be fixed within a particular frame of reference, permitting task interfaces to be attached to the user's viewpoint, and having window-manager style manipulation of other tasks' positions and orientations within the spatial context.

        A shared environment posits that multiple users or user agents (programs) can work within the shared context. It is desirable for any shared environment to have access controls of some kind to establish various levels of privacy and other characteristics.

        A networked environment is an environment which can be accessed remotely, presumably across the Internet.

        A distributed environment is one where environments which might otherwise have been isolated may be interconnected to allow for larger, more complex environments, or better performance under load.

        A spatial human/computer interface is the input and visualization paradigm providing a front end to a spatial environment.

      3. the third axis

        left handed mnemonic
        [left handed mnemonic]
        right handed mnemonic
        [right handed mnemonic]
        +X   +Y   +Z

        The position of a shape in desktop-style environments can be described using simply the horizontal and vertical axes x and y, along with a measure of rotation within the xy plane which will be called roll here. In a spatial interface six control axes are required to completely describe a position and orientation in three-dimensional space. These axes are given below in the right handed, conventional mathematics system. Notes for left-handed coördinates, which are hardwired into the API of some graphics libraries, are marked by  , but are presented here for comparison's sake only.

        In this document, an OpenGL-typical coördinate system is generally assumed (+X right, +Y up, -Z forward from the user) unless otherwise specified.

        Left handed cartesian systems flip the z/-z axis, so rotations in planes containing z are also reversed

        right handed xyz axes; +z towards the viewer [right handed xyz axes]
        x
        The lateral, increasing left-to-right axis.
        y
        The vertical, increasing bottom-to-top axis.
        z
        The longitudinal, increasing far-to-near or near-to-far axis.

        Rotations are typically described as being around specific axes, in the simple cases around X, Y, and Z, although rotations can more generally be specified around a vector in any arbitrary direction.

        It's not uncommon for flight simulators to use a left-handed coördinate system, with -Z pointing out through the nose of the plane and +X out through the left wing, so that the +/- signs for pitch, yaw, and roll agree with a pilot's standard notion of things in real life, and map conveniently to instrumentation in the user interface. However, in the science/technology fields where mathematicians have greater sway (taking NASA as the specific example) right-handed coördinate systems are used for virtually everything, including spacecraft, each with its own particular assignment of +X, +Y, and +Z. [nasa01]. For this reason any assignment of pitch, yaw, and roll to the various rotations should be viewed as use-specific and not as a canonicalization of those terms,

        A humanistic description of rotations, assuming viewpoint at the origin:

        xr / rotation around X
        Facing towards +X, with +Y above, lean right left towards +Z.
        yr / rotation around Y
        Facing towards +Y, with +Z above, lean right left towards +X.
        zr / rotation aroudn Z
        Facing towards +Z, with +X above, lean right left towards +Y.

        Note that if the right hand is held with the four fingertips curled back to face the palm, and the thumb extended in the direction of +x, +y, or +z, that the fingertips point in the direction of positive or negative rotation around the axis.

        Left handed rotations,
        DirectX view, +Z is away:
        [left handed rotations]
        Left handed, sign-matched
        pitch, yaw, and roll, with
        aircraft if -Z is forward:
        [left handed rotations]
        Right handed rotations,
        OpenGL view, +Z is closer:
        [right handed rotations]
        Right-handed, -Z forward,
        inverted signs on roll, yaw
        if compared to aircraft:
        [right handed rotations]

        Alternatively, if facing the palm of the hand in the appropriate of the two hand-based mnemonics shown, the positive rotation around any given axis can be found by rotating counterclockwise or clockwise through the shorter distance between the fingers for the other two axes. Hence, for Z axis rotations (around your middle finger, not your index finger), positive rotations go counterclockwise clockwise from the thumb to the index finger.

        The NASA Orbital Body Axis Coordinate System, Note the right-handed system with +Z pointing down. [nasa02].
        [NASA shuttle, +X is forward, +Y is starboard, +Z is down]

        The desktop paradigm typically makes no use of tilt (roll) for the simple reason that tilted text is far more difficult to read on the low-resolution displays that dominated during the early decades of its development. In the desktop the z axis only had the token rôle of determining which windows were visible where windows overlapped, or in certain environments (the Amiga), individual shadowcasting distances for windows (using the Amiga dropshadow program).

        In a spatial environment, many more options exist for moving objects out of the user's way, tilting them to allow other objects to be seen, flipping them over to look at their backs, placing them into the user's heads-up display to follow the viewpoint around, or simply scaling them up or down in size. In short, all the axes can be used more flexibly than in the desktop paradigm.

      4. advantages

        Where most tasks in a desktop paradigm involved the manipulation of planar objects such as window, icons, and menus, in the spatial paradigm tasks commonly involve the manipulation of three-dimensional objects. As a corollary, this means that many of the manipulations will be spatial, so rather than sliding an aircraft around in several different planar projections alternately to achieve a specific position and attitude (or even laboriously adjusting 6 separate linear slider controls), a user might now virtually grasp the aircraft and move it simultaneously in all the projections until the desired effect is obtained.

        Of course, the same manipulations can be applied to objects for which they would be inapplicable in a desktop environment, such as angling a terminal window partway to one side, or affixing various interface gadgets such as clocks and email notifiers into the viewpoint's frame of reference so that they will then follow changes in the user's point of view - even if that point of view is entering some simulated enclosing environment such as a first-person perspective game or a CAD model.

        The use of an underlying three-dimensional model removes the pixel-mapped desktop's bias towards limited integer dimensions, providing a much smoother mechanism for scaling, display resolution independence, and greater dimensional range.

        Window permissions have been implemented for X, in a sense. There was a University of Texas project to implement security classification restrictions throughout the OS and X, and thus mouse-highlighted high-security data could only be pasted into X windows of equal or higher security classification.

        A spatial interface also lends itself very well to the prospect of sharing an environment. A desktop environment has a fairly limited planar surface to share, and lacks permission constructs to allow for selective protection or hiding of specific objects. By contrast, a spatial environment has vast amounts of space both inside and outside of the current viewing volume, which when combined with object permissions allows a great number of users to peacefully share computing resources and selectively enjoy a more sociable atmosphere.

        In particular, this endemic sharing supports the users' abilities to mentor one another, and collaborate on certain types of tasks without the need for each application to explicitly support these notions. This sharing becomes even more powerful when made available over a network due to the vastly greater number of potential concurrent users, although the correspondingly increased need to distribute the load across multiple host computers is an issue.

    2. hardware

      1. output

        The need to display more content yields a different output model from that of the desktop paradigm, and changes the baseline hardware requirements.

        1. perspective display

          1. nonphysical depth cuing

            Virtual worlds, and spatial environments in general, are customarily presented as perspective projections onto the user-facing surface of one or more display monitors or projection screens. The believability of these images is typically enhanced with a number of depth cuing methods such as distance fog, saturation adjustment by distance, line attenuation by distance, and so forth. Further, animated worlds now generally use a projection of the environment which produces parallax effects as the user's viewpoint travels. While these methods are good, without presenting a separate image to each eye it can sometimes be difficult for a user to visually parse confusing environments, since all the above methods rely on cues processed without either of the two major physical mechanisms of depth perception: focus (accommodation) and visual triangulation (fusion).

          2. options for physical depth cuing

            Although essentially no consumer-available apparatus exists to provide focus-based depth cuing, quite a number of options exist for providing fusion-based cues, and in fact there has been an explosion in 3D cinema, 3D gaming, and work in other fields taking advantage of them.

            1. cinema and television 3D mechanisms

              The cinema has been taking advantage of its single, large viewscreen by using either polarization or color filtering - both allowing extremely inexpensive glasses to be worn by viewers.

              Home 3D movies are initially using 120 Hz television screens and shutter glasses (image alternation). These unfortunately entail a significant per-viewer cost, but new developments in screen technology are showing a likely long-term move to one of the cinema-based polarization mechanisms.

            2. personal computer 3D mechanisms

              Computer users have a wider range of options with their monitors, including image alternation with shutter glasses (widely supported by upper-end yet still consumer-grade graphics cards), splitscreen, and polarized solutions. This market is already quite mature and well commoditized, largely due to the extensive development of perspective in the first-person subgenre of computer games such as Doom, Quake, Thief, Half-Life, and so forth.

            3. head-mounted 3D displays

              For more affluent computer users, head-mounted displays, or HMDs, with smaller per-eye viewscreens can be used instead of the single display screen, providing each eye with an independent view. These often have the distinct advantage of head-tracking support, allowing a user to simply turn the head to change the in-simulation view angle.

              Proview 60 head-mounted stereo display [proview 60 head-mounted stereo display]

              The HMD market is, however, far from being commoditized, and a middle-class consumer would have difficulty affording any but baseline models with sharply limited resolution, often either monochromatic or monocular imaging, sometimes both, and with a very narrow field of view. In such cases, head tracking becomes almost necessary, since otherwise the extreme tunnelvision would make most environments nearly unusable. In such low-end devices, the expense of this feature, combined with the manufacturers' desire to stay under a consumer-acceptable price point, typically comes at the cost of sacrificing any improvement to the quality of the images.

              As of 2011, high-end stereo vision hardware like the Sensics piSight HMDs supported up to a 180° view angle, 4200x2400 pixel resolution, 24 bit color, and so on. The starting price range of ~$20k put them well out of the price range of a typical consumer, although the prices were entirely consistent with such an uncommoditized market. In 2013, the Oculus project was working on an extremely low-latency stereo HMD with a consumer-accessible price.

            4. environmental 3D displays

              Other options exist, such as the CAVE systems which surround the user with screens on all sides, but these are generally too rare, due to expense and space constraints, to include as widely-implementable methods.

            The easy availability of nonphysical depth cuing, plus various social factors in the typical workplace which work against the acceptability of workers wearing 3D glasses or HMDs most of the time, make it both reasonable and necessary to support the typical, nonstereo computer monitor as a baseline. However, the commoditized nature of stereo glasses, especially the polarized variants, makes it easy for users to optionally enhance the immersive nature of their interfaces if desired. This mono-required/stereo-optional duality will be taken as the baseline to support for a spatial environment. This means that the software should fully support the 2D screen by using a perspective projection of the modelled 3D world, but should be trivially able to enable binocular support. For this reason, the binocular option must be implemented by the system, rather than being left to individual applications.

          3. hardware requirements

            The approximately one-megapixel display capability commonly used in the desktop paradigm is inadequate for 3D work, because in a spatial environment windows may now be nonparallel to the viewing plane, which tends to show severe text artifacting even on megapixel displays. Furthermore, more colors are required for smooth object color shading, far more than the 256 simultaneously displayable colors found on many inexpensive high-pixel-resolution color systems from the 1990s. In addition, immersive environments using parallax-based perspective are often difficult to use if a certain minimum arc of viewing is not subtended. Lastly, fast texture mapping is needed, where fast means speed equivalent to that seen in the faster graphics cards of the 2002 timeframe, with at least 1024x1024 square textures to support the mapping of X window system framebuffers, and adequate bandwidth to support animation of those same textures. Pursuant to these difficulties, the following recommendations can be made:

            In 2003, a 20 inch diagonal 4x3 aspect display with 1600x1200 pixels, 1237 cm² of screen area, 1.92 megapixels, and typically 24 bit truecolor was readily available and fairly inexpensive.
            Of course, resolutions of up to 9.1 megapixels (at 1429 cm²), or 48 bit truecolor with 281 trillion possible shades per pixel, were available, but at five to ten times the price.
            In 2007, a 2585 cm² (30 inch diag), 4 megapixel display was around $1200.
            Resolution
            2 megapixels, to allow legibility of rotated text.
            Color
            16 bit truecolor (65536 colors) or better to reduce banding in the fog depth cuing. Note that although this is adequate, 24 bit truecolor is more effective, and trivially available.
            Textures
            Fast 1024x1024 or higher pixel textures, to allow easier mapping of 2D applications onto surfaces in the 3D environment.

            PC graphics cards from around 2003 typically used AGP graphics busses up to about AGP 8x. This speed was barely adequate for animation of a single X framebuffer as a texture using standard methods. This limitation could be mitigated by modifying the graphics subsystem to texture directly from the virtual X framebuffer in primary system RAM, by modifying the X server to render directly into an OpenGL pbuffer, by a texturing extension to OpenGL allowing manipulation of arbitrary subsections of (probably uncompressed) textures on the GPU, or by faster AGP speeds, among other possibilities.
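
            As a rough sketch of the standard-method path referred to above - assuming a current OpenGL context, an already-allocated 1024x1024 GL_RGBA texture, and a pointer to the virtual X framebuffer's pixels, all of which a hypothetical caller would supply - updating one dirty rectangle of the texture per frame might look like this:

              #include <GL/gl.h>

              /* Sketch: push one dirty rectangle of an X framebuffer into an existing
               * 1024x1024 GL_RGBA texture.  Assumes a current OpenGL context; 'tex',
               * 'fb', and the dirty rectangle are supplied by the (hypothetical) caller. */
              void update_window_texture(GLuint tex, const unsigned char *fb,
                                         int fb_width, int x, int y, int w, int h)
              {
                  glBindTexture(GL_TEXTURE_2D, tex);
                  /* Tell GL how wide one framebuffer row is, so a subrectangle can be read. */
                  glPixelStorei(GL_UNPACK_ROW_LENGTH, fb_width);
                  glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                                  GL_RGBA, GL_UNSIGNED_BYTE,
                                  fb + 4 * (y * fb_width + x));
                  glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
              }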

            By 2013, most of these problems were essentially solved, or no longer major impediments. Various X quirks still retard development, however, including limitations on subtexture manipulations, and draconian limits on sending synthetic events. X doesn't support any notion of deputizing an X client as a user agent, and distrusts events generated by such proxies.

        2. positional sound

          Although largely underestimated in applications outside of the gaming realm, sound provides the ability to deliver blended alerts not only to a user for applications that may be out of sight - either buried under other applications or off the edge of the visible screen - but also to a user who has moved away from the computer console for some reason.

          This ability wouldn't have been particularly useful in the computing paradigm promulgated by Microsoft where typically only a single application could be run reliably at a time, which would then usually gobble up the entire screen and wait for user interaction, leaving little ambiguity over which application might be having a problem.

          Yet, in contrast, the Unix environment of the same era from around 1989 to 1994 had been characterized by virtual window managers such as olvwm, vtwm, tvtwm, and fvwm, where the majority of applications running could be off screen, some of the remaining windows might be partially or totally concealed even on the visible screen, and a significant number of windows might be candidates for generating an alert of some type at any given time independent of user interaction.

          In such an environment, audio cues can provide specific verbal information about the alert, while the positional aspect provides an immediate hint as to in which direction the window of interest can be found. In a spatial environment, where the alerting application might even be above the user's virtual position, the positional information which can be imbued on the audio stream can be even more useful. In a number of games, as well, direct 3D audio data could be advantageous and desirable to the player.

          Unfortunately the adoption of full positional audio is hindered by tradeoffs, as noted in this extract on the difficulties of the positioning of speakers (referred to as “transducers” in the upcoming quote) for full 3D sound, complete with vertically offset speakers:

          These systems use externally placed transducers to reproduce appropriate sound fields. This is cumbersome, hardware intensive and expensive. A different approach is to reproduce not the “free space” sound field, but its effect on our two ears. We only have two ears and all sound directions in real life are interpreted using them. A dichotic, audio signal played back over good quality headphones should reproduce the sound accurately and carry spatialisation information.

          Typical coplanar ITU 5.1 positions, listener center, up represents forward
          [ITU 5.1 speaker positions, up is forward]

          Yet despite the economy of headphones, many computer users dislike wearing them for long, or at all, and so naturally fall to the widespread assumption that coplanar audio outputs (such as the coplanar layout of 5.1 and 7.1 speakers) are sufficient. This coplanar bias has also been hard-coded into most audio drivers, retarding support for true 3D audio.

          Due to these inhibitors, it's fairly likely that most users in a spatial environment will still use legacy stereo or coplanar systems for a long time. However, the environment should retain support for full 3D positioning, for those users using headphones, and for those few who are dedicated enough to audio to possess a full positional speaker configuration. This entails little more than retaining a y coördinate along with the x and z coördinates already required for planar positioning, along with the y component in any velocity vector.
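
          As a deliberately simplified illustration of why the y component is worth keeping (this is not the Z audio path; a real spatializer using HRTFs or an ITU downmix is considerably more involved), the sketch below computes inverse-distance attenuation plus a naive constant-power pan, assuming the listener faces -z with +x to the right:

            #include <math.h>
            #include <stdio.h>

            /* Naive spatialization sketch: inverse-distance attenuation plus a simple
             * constant-power pan from the emitter's left/right offset.  The y (height)
             * component still affects distance, so an alert above the user's position
             * is attenuated correctly even though this toy model has no elevation cue. */
            static void spatialize(const double emitter[3], const double listener[3],
                                   double *left_gain, double *right_gain)
            {
                double dx = emitter[0] - listener[0];
                double dy = emitter[1] - listener[1];
                double dz = emitter[2] - listener[2];
                double dist = sqrt(dx * dx + dy * dy + dz * dz);
                double atten = 1.0 / (1.0 + dist);              /* crude rolloff */
                double pan = 0.0;
                if (dist > 0.0)
                    pan = dx / dist;                            /* -1 = hard left, +1 = hard right */
                *left_gain  = atten * sqrt(0.5 * (1.0 - pan));  /* constant-power pan law */
                *right_gain = atten * sqrt(0.5 * (1.0 + pan));
            }

            int main(void)
            {
                double src[3] = { 2.0, 3.0, -1.0 }, ear[3] = { 0.0, 0.0, 0.0 };
                double l, r;
                spatialize(src, ear, &l, &r);
                printf("left %.3f  right %.3f\n", l, r);
                return 0;
            }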

      2. input

        1. keyboard

          Kinesis 510 USB keyboard [kinesis 510]

          The modern keyboard model is arguably adequate for textual input into a spatial environment, although typically some method will be required for selecting a destination for whatever input is typed, just as is currently done in the desktop paradigm.

          Although Emacs's modifier usage may appear to be a victim of the difficulties described below, it should be noted that in ten years of Unix classes, nonprogramming students exposed equally to both Vi and Emacs were about evenly divided in preference after about eight hours spent using each.

          Keyboards usually have a number of modifier keys whose purpose is to change the effect of other keys on the keyboard, such as a held Shift key causing the “a” key to send “A”, instead. Additional modifiers such as Control, Meta, Alt, and so on have been used extensively in many environments, most commonly for special characters, abbreviated triggers for specific actions (hotkeys), or to provide a command set to allow the editing of text. However, novice users typically have difficulty when first faced with modifiers other than Shift, especially when combined or spanning letters to be input. For example, input puzzles for early Emacs users include:

          C-_ - undo
          keyed in as: control shift _
          C-x C-s - save current buffer to its associated file
          keyed in as: control x s or control x control s
          C-M-% - replace string fitting a pattern, asking y/n for each
          keyed in as: control meta shift %, or escape control shift %, or just meta & control & shift %

          Usually students have mastered the mechanism once they internalize the last one, where the order-independence of the modifier keys has been recognized. The same input concept appears in many applications, generally targeting advanced users, and posing a hurdle (or barrier) to many novices.

          Modern keyboards also often have various buttons which have been added with the stated purpose of controlling multimedia applications on the host computer. Although potentially convenient, these buttons are limited by the expectations of the manufacturer, and often don't map very well to the needs of the end user.

          It is unfortunate that the keyboard is rarely seen as an output device, and so almost never has reprogrammable text labels which might appear on the keys themselves to indicate to which user-defined function the key might be currently assigned. Applied to all keys on a keyboard, the effects of modifiers such as shift, alt-graph, greek, APL, or language-shift keys might become much more obvious, although at the expense of tighter integration with the HCI.

        2. mouse

          1. standard use

            R.A.T.™7 mouse with 6 obvious buttons (one the wheel)
            and 4 more logical ones used for wheel events
            Newer models often have more. [Saitek Cyborg R.A.T. ™ 7 Mouse]

            Although the mouse is fundamentally a two-dimensional device, its rôle as a tool for pointing and selection remains quite viable in spatial environments, although it is not in itself a complete answer for manipulating the selected objects. The presence of various buttons, usually three on the mice of Unix workstations in the 1980 - 2000 timeframe, as well as more recent additions such as the scroll wheel (the RAT7 mouse shown has 6 useful buttons and two axes of scroll wheels), has provided a number of opportunities to map these actuators to various functions.

            It is unfortunate that design decisions at Apple and Microsoft have produced millions of mice with but one or two buttons, but popular demand has mitigated this in recent years, and the third mouse button of the 1980s workstation has returned, often in the form of a thumb button, on many modern mice.

            In some Unix environments (DEC, HP-UX, Linux, probably others), an extra virtual middle mouse button is simulated on two-button mice by actuating both buttons together. Theoretically, on a standard three button mouse, four additional virtual mouse buttons could be obtained with various such combinations of the three present, but this somewhat pathological extrapolation has not been seen in practice.

            Another approach to working around the limited number of buttons on a mouse is to use double- and triple-clicks on an existing button. However, this is difficult for some users and thus poses accessibility concerns, as well as contributing to repetitive stress injury (RSI) and carpal tunnel syndrome (CTS) in some users (including this author). It also contributes to HCI confusion for most novice and many intermediate users. Due to these issues, the practice of multiple clicking is deprecated for purposes of this paper and will be avoided in most cases. Those vendors using double click as a fundament of their HCI design can hardly be lauded for it.

            A less-detrimental solution to the problem of limited buttons is seen in the use of modifier keys in combination with mouse buttons, allowing a left-mouse-click while the Control key is held down to be mapped to a different action than that to which a left-mouse-click (without Control) would normally be assigned. This HCI feature is seen across virtually all systems employing a mouse and keyboard, including Unix, Microsoft, Apple, and nearly all other desktop systems. Although posing accessibility concerns for users limited to single-hand or single-key input, these can typically be addressed by allowing the modifiers to be toggled through their corresponding keys instead of requiring those keys to be held. In contrast with the observation that novice users are familiar with the use of the Shift key, virtually any use of modifier-affected mouse buttons tends to confuse novices.
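
            For concreteness, a minimal Xlib sketch of that mechanism is given below (link with -lX11; the two actions are placeholders, not bindings defined by this document): the state field of a ButtonPress event carries the modifier mask, so a Control-modified left click can be dispatched differently from a plain one.

              #include <X11/Xlib.h>
              #include <stdio.h>

              /* Minimal Xlib sketch: dispatch plain vs. Control-modified left clicks.
               * The two actions are placeholders for whatever the HCI binds them to. */
              int main(void)
              {
                  Display *dpy = XOpenDisplay(NULL);
                  if (!dpy)
                      return 1;
                  Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                                   0, 0, 200, 200, 1,
                                                   BlackPixel(dpy, DefaultScreen(dpy)),
                                                   WhitePixel(dpy, DefaultScreen(dpy)));
                  XSelectInput(dpy, win, ButtonPressMask);
                  XMapWindow(dpy, win);
                  for (;;) {
                      XEvent ev;
                      XNextEvent(dpy, &ev);
                      if (ev.type == ButtonPress && ev.xbutton.button == Button1) {
                          if (ev.xbutton.state & ControlMask)
                              printf("control + left click: alternate action\n");
                          else
                              printf("left click: default action\n");
                      }
                  }
              }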

            Radial-displacement mouse mappings in Croquet [Radial-displacement mouse mappings in Croquet]

            Mouse mappings have been applied successfully to the problem of predominantly planar movement, as seen in this extract from Croquet, where mouse +y is mapped to velocity on -z, and mouse +x is mapped to angular velocity around +y:

            If you move the mouse underneath the cross hair, you will move backwards. Moving it right rotates you to the right. Again, distance determines velocity in this case, angular velocity, or the speed at which you rotate. If you are directly over or under the cross hair you will move in a straight line with no rotation. If you move directly to the left or right of the cross hair, you will rotate around your center without any forward or backward motion. If you put the cursor in just above and to the right of the cross hair you will move forward a bit and rotate to the right a bit all at the same time. This allows you to walk in a circle.

            However, in spatial movement when accessing other movements or manipulations, HCIs will typically employ one or two modifier keys, such as shift and control, used in conjunction with the mouse. Direct experience has shown that many users, especially less technical users, have trouble employing such modifiers correctly, particularly the one to which they ascribe the lesser significance. Users often end up trying to use both, which depending on the implementation might shift the HCI to yet a fourth interaction mode, or the extra modifier might be ignored, or override the first, etc.

            Games have provided more leeway for alternatives to overloading the mouse with extra movement mappings, typically using the keyboard to augment the two axes provided by the mouse.

            Some implementations of Descent, which demands the full use of the six axes of freedom it provides, actually support the Spaceball directly, obviating the mouse entirely.

            In FPS games like Quake and Descent the keyboard is often used for all translation and for the rotation around the z axis (roll), and the mouse is used for rotation around x (pitch) and y (yaw) in order to retain precise control of the game's shooting reticule. However, since nongaming applications typically require the use of the keyboard for other things besides movement - such as text entry - there've been various approaches taken to get the mouse to handle more than its native two axes.

          2. radial access to additional device mappings

            Although not a direct solution for spatial manipulation, one indirect approach, called a radial menu, involves selecting a manipulation axis based upon a brief initial directional stroke following the pressing of a particular mouse button. Various portions (usually contiguous) of the 360° of possible directions are mapped to a small number of modes, and the following mouse motion events are mapped through the stroke-selected mode until button release.

            This creates a small dead zone in the center of the possible gesture, since the pointer must pass a certain distance from the location of the button press before the gesture direction can be determined. This dead area can be used to allow the passthrough of normal mouse clicks if the gesture acceptance threshold is not reached before button release.
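
            A small sketch of the stroke-to-mode step just described (the dead-zone radius and sector count are arbitrary choices for illustration): until the pointer passes the dead-zone radius the gesture remains undecided and a release can fall through as an ordinary click; afterwards, the stroke angle selects one of N contiguous sectors.

              #include <math.h>
              #include <stdio.h>

              #define DEADZONE   8.0    /* pixels the stroke must travel before a mode is chosen */
              #define N_SECTORS  6      /* contiguous angular sectors mapped to modes */

              /* Returns the selected sector 0..N_SECTORS-1, or -1 while still inside the
               * dead zone (i.e. the gesture may yet resolve to an ordinary click). */
              static int radial_mode(double press_x, double press_y,
                                     double cur_x, double cur_y)
              {
                  double dx = cur_x - press_x;
                  double dy = cur_y - press_y;
                  if (hypot(dx, dy) < DEADZONE)
                      return -1;
                  double angle = atan2(-dy, dx);           /* screen y grows downward */
                  if (angle < 0.0)
                      angle += 2.0 * M_PI;                 /* 0 .. 2*pi, 0 = stroke to the right */
                  return (int)(angle / (2.0 * M_PI / N_SECTORS)) % N_SECTORS;
              }

              int main(void)
              {
                  printf("%d\n", radial_mode(100, 100, 103, 101));   /* -1: still in dead zone */
                  printf("%d\n", radial_mode(100, 100, 100, 80));    /* upward stroke selects a sector */
                  return 0;
              }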

            Translations and rotations applied to objects through the radial menus should be done in a way that makes visual sense to the manipulating user, and having a way to map the initial mouse direction into a directionally consistent manipulation of the target object eases user familiarization, for those radially-chosen modes which correspond to such manipulation actions.

            It's thought that these mappings could be useful for users who dislike a paradigm which switches the 6-axis controller between movement and manipulation mode, since it gives the option of doing reasonable manipulation with the mouse. They might also be useful if no spatial controller is available, probably in conjunction with a special keyboard modifier or held mouse button (beyond the basic three) to enable them.

            In the example radial mappings shown below, the translations and rotations selected by the initial mouse direction are modelled after a view of (0,0,0) from a point along the ray through (1,1,1). Perspective drawings are given of each translational axis and circles around each axis of rotation.

            In most of the examples which use a 1st/2nd degree modal, the user never has to use the modifier or other mechanism necessary to access the 2nd degree mappings, since all the rotations can be performed, albeit serially, through the composition of 1st degree mappings.

            1. linear-left / rotate-right mouse mappings

              The mappings below are predicated on a gridlike arrangement of controls, with linear controls on the left and rotational on the right.

              The 1st degree rotational controls (below, left) have the idiom of initiating movement from a point on the object slightly up, right, and near to the user, and so generating (from top to bottom in the figure) roll, yaw, and pitch motion.

              1st degree radial mouse mappings [1st degree radial mouse mappings linear left, rotate right]
              2nd degree radial mouse mappings [2nd degree radial mouse mappings linear left, rotate right]

              The 2nd degree rotational controls (above, right) have the same basic idiom.

            2. trifold symmetric linear / rotational mappings

              The mappings below are predicated on a trisymmetrical arrangement of controls, with vertical motion and vertical pull rotation above, longitudinal motion and pull rotation in the lower left, and horizontal motion and pull rotation in the lower right.

              Axes x and y map to mouse motion as would be expected; the +z axis maps to the mouse's -y axis when z-axis mode is in use. The circles in the three planar quadrants (-x,y), (-y,z), and (x,-z) are each associated with the positive axes tangent to them - y, z, and x, respectively. In the rotational modes, what would have been a translation along the associated axis is mapped to be proportional to rotation in the circle's plane. Notionally, the circle's periphery is being pushed by the axis, giving the impression that rotational moves are simply linear pushes to visible sides of a sphere, thus forming a type of virtual trackball.

              1st degree radial mouse mappings [1st degree radial mouse mappings]
              2nd degree radial mouse mappings [2nd degree radial mouse mappings]

              The 2nd degree version changes the rotations within a plane to be translations within the same plane, and the axial translations to be planar pushes on the side of the virtual trackball (at the origin) which would have been penetrated by that axis.

            3. Separate linear / rotational radial menus

              Users comfortable with using the mechanism for enabling the additional mapping layer might prefer a layout which provides all translations in one, and all rotations in the other.


              translational radial mouse mappings [translational degree radial mouse mappings]
              rotational radial mouse mappings [rotational radial mouse mappings]

              However, this layout will generally require the user to be able to toggle between the translation and rotation mappings, which was not required in the 1st/2nd degree pair.

            4. A variant 1st degree radial mapping

              1st order (deprecated) radial mouse mappings [radial mouse mapping with x-bias]

              A variant radial mouse mapping with an x-bias flips non-screen-parallel rotations in order to make rotations around +y and +x better fit the mouse motion; however, this breaks the relationship between lateral motion and pitch, and could make user errors in moving out into a section more frustrating.

        3. 6 axis controller

          SpaceTec Spaceball 5000 USB 6-axis controller [Spaceball 5000 USB]

          A paddle control can be used for any one axis, usually moving a simple bar along a linear path as seen in the classic games Pong and Breakout. The canonical mouse controls movement along the x and y axes, as did early joysticks. Modern joysticks often have control of 3 axes, typically mapped to pitch, yaw, and roll when used in flight simulations. A six-axis controller such as the SpaceTec Spaceball allows a user to change x, y, z, pitch, yaw, and roll simultaneously with a single hand and device.

          Various methods of applying a mouse to the problem of velocity, orientation, and object manipulation in 3D space exist, but generally exhibit an undesirable quality of being a modal interface which interferes with smooth control. This comes from the need for the user to constantly remap which axes are being controlled by the mouse in order to orient an object in a particular way.

          By contrast, 6-axis controllers allow graceful, modeless control at this basic level. Although 3D work such as CAD can obviously be done in the desktop model using a mouse, there are ergonomic and efficiency issues to be considered. Addressing each of these in turn:

          This study does not provide data on whether or not the objects within the task environment were manipulated more often or with more efficacy with the 6-axis controller, whether the controller contributed to the exploration of additional activities within the task, or whether task completion was accelerated by the use of the 6-axis controller. Hence, this report is principally useful only for ergonomic factors rather than performance factors.

          In physical measurements and user perceptions of 20 CAD subjects, using a two-handed working style (3D motion controller and mouse) vs. a one-handed working style (mouse) yielded the following results:

          Physical Measurements:

          • Left hand motions were reduced 67%
          • Right hand motions were reduced 64%
          • Average muscle activity was 33% less
          • Peak levels of muscle activity were 35% less
          • Average and peak flexion/extension wrist deviation were reduced 57% and 34% respectively

          Perceptions:

          • All nine body comfort metrics were rated better
          • 90% of the subjects would prefer to have a 3D motion controller available for their CAD use

          So clearly there were ergonomic benefits to using 6-axis controllers with a mouse instead of the mouse alone. (This composite controller model will be discussed in more detail as part of the subject of object manipulation.)

          However, 6-axis controllers also provide for demonstrably less modal interaction with experienced users, allowing for complex adjustments in 3D space without the need to constantly switch which axes are being controlled by a single 2D input device such as a mouse or trackball. The existence of an initial learning curve as well as the resulting facility is seen in this summary:

          In a rather consistent priority order, subjects tended to concentrate on fewer degrees of freedom at a time during early learning stages and progressed to cope with more degrees of freedom together during later learning stages. Between horizontal, vertical, and depth dimensions, the horizontal dimension appears to take attentional priority. Between translation aspects and rotation aspects, translation appears to take higher priority. It was found that after 40 minutes of practice more than 80% of the subjects were able to control all 6 DOF simultaneously.

          In a number of studies, it is noted that use of the spaceball in combination with a “laser” or “ray-casting” technique for selection was a definite advantage in manipulating virtual 3D objects. This technique mathematically projects a ray from the user's idealized (i.e. monocular) viewpoint in real-world space, through the current cursor position on screen, and into the virtual world, where any object in its path is a candidate for selection. Typically the nearest object is chosen.
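
          The following is a simplified sketch of such ray-casting selection, using spheres as stand-in bounding volumes and taking the ray direction as given (deriving it from the cursor position and the projection is omitted); the nearest intersected candidate wins:

            #include <math.h>
            #include <stdio.h>

            typedef struct { double x, y, z; } vec3;

            static vec3 v3(double x, double y, double z) { vec3 v = { x, y, z }; return v; }
            static vec3 sub(vec3 a, vec3 b) { return v3(a.x - b.x, a.y - b.y, a.z - b.z); }
            static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

            /* Distance along the (normalized) ray to a sphere, or -1 if missed. */
            static double ray_sphere(vec3 origin, vec3 dir, vec3 center, double radius)
            {
                vec3 oc = sub(origin, center);
                double b = 2.0 * dot(oc, dir);
                double c = dot(oc, oc) - radius * radius;
                double disc = b * b - 4.0 * c;
                if (disc < 0.0)
                    return -1.0;
                double t = (-b - sqrt(disc)) / 2.0;
                return (t >= 0.0) ? t : -1.0;
            }

            int main(void)
            {
                /* Two candidate objects; the pick ray points down -Z from the eye. */
                vec3 eye = v3(0, 0, 0), dir = v3(0, 0, -1);
                vec3 centers[2] = { v3(0, 0, -5), v3(0.2, 0, -3) };
                double radii[2] = { 1.0, 0.5 };
                int nearest = -1;
                double best = 1e30;
                for (int i = 0; i < 2; i++) {
                    double t = ray_sphere(eye, dir, centers[i], radii[i]);
                    if (t >= 0.0 && t < best) { best = t; nearest = i; }
                }
                printf("selected object %d at distance %.2f\n", nearest, best);
                return 0;
            }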

          3D interaction task with jet engine [3D interaction task with jet engine]

          Here users were required to select the components of the object (an exhaust system for a jet engine) one-by-one and pull the object apart using the laser cursor. Users' performance in this task, and others, clearly indicated that: they were cognizant of the fact that the 3D cursor is an object in its own right in the scene, that assigning effective metaphors to the cursor improved its usability when performing selection, highlighting and movement operations, and that using a 3D pointing device reinforces the feeling of a direct engagement between the user and the environment. We also measured the speed of performance and the number of errors made: this task was one the users were able to complete more effectively using the 3D (laser) cursor and spaceball than using the 2D cursor and a mouse.

          These factors suggesting ergonomic and manipulatory fitness of the spaceball for working with 6-axis information will be taken as sufficient cause to use it as the representative but non-exclusive sample of 6-axis controllers for the remainder of this document.

  5. Z

    1. introduction

      One tiny letter; one vast universe...

      - Dawn Patterson, 1999-05-10

      In this document, the illustrative environment named “Z” is presumed to refer collectively to a distributed1, networked2, shared3, hierarchical, virtual spatial environment and its associated spatial human/computer interface and other client programs.

      (as of 2010, 1, 2, and 3 are still awaiting implementation)
      Sheep heirarchy being manipulated near a 3D coördinate grid.
      A yellow-highlighted sheep is being rotated.
      Its green-highlighted sub-sheep follow in lockstep. [sheep being manipulated near a 3D coördinate grid]

      Z is modelled as a zone, which manages a structure of access-controlled, nested frames of reference, called zys, which can send various types of messages called zevents to one another, or to zys within other interconnected zones. In order to allow for a user-visible representation of this hierarchy, many zys have attached zobjects, which act as the visible representation of the zys, and which are themselves composed of sub-zobjects and zops, which allow for basic graphic and audio operations.

      Since the zone is based around the management of the zys tree itself, and the zys typically only refer to zobjects by reference - typically URIs - user agents are responsible for acquiring such resources on the own. Zones are thus left free to concentrate on handling relatively small zevents without bottlenecking on representational content such as image and audio files.

      Z agents are programs which connect to a zone, typically through the Internet but possibly even just from another thread within a program, to a proxying zys within the zone, often created by the zone on the client's behalf. Once connected, the client can then send events to other zys within the zone or to the zone itself, and thereby create and manipulate zys, attach visual zobject representations to them, and listen through them for requests and events from other zys.

      User interaction with a zone takes place almost entirely through a special kind of client program called a user agent, which connects to the zone in typical agent fashion, but then handles additional HCI issues to allow for human interaction with a zone. This includes channeling the user's device inputs into special zevents sent from the proxying object representing the user's location in the zone environment, listening to status updates about the zone from the proxying zys, and fetching the zobject data required to create a visible representation of the Z environment for display to the user.

      The attachment of a user agent to a particular zys, and by extension to that zys's frame of reference and local coördinate system, allows for the user's viewpoint and any other zys position and orientation to be manipulated via other programs. This allows for the spatial equivalent of the X's window managers, and for the control of the user viewpoint by another agent such as a game, tutorial, guided tour, or other application.

      Furthermore, by taking advantage of distributed aspects of Z, the user's viewpoint can be moved by the mediating agent program into an alternate zone. This alternate zone's contents may actually be layered on top of those in the original zone, it might act as an intermediary between the original zone and the annexed user agent, or it might completely supplant the original. This ability can be used within the user agent itself as a method of caching, both to lower network traffic as well as to greatly increase the ease of performing 3D rendering.

      Combined, these abilities allow for both layered and disjunct shared environments, providing for user overlays, collaborative applications, dynamic changes to the user's HCI via intermediary programs, and shared environments with special constraints via user agent viewpoint annexation and input interception. The last represents the base ability needed for full-scale gaming, one of several key components in wide-scale general platform acceptance.

    2. transforms

      The fundamental abstraction in Z is a hierarchy (an N-tree) of nested coördinate transforms. A transform is a short term for a mathematical coördinate transformation, which is a set of parameters for converting between two frames of reference. Every object in the spatial environment is associated with a particular transform which describes that object's position in the context of an enclosing transform. These nested transforms allow objects sharing a frame of reference to be moved as a unit, and allow this to be done with computational efficiency.

      For a real-world example (with the constraint that it assumes consistent coördinate systems), consider the scenario of creating a description (also called a scene graph) of the hierarchy of frames of reference implicitly acknowledged when describing the view of an astronaut during a lunar mission.

      20°11'27" N latitude, 30°46'18" E longitude is the location in IAU Mean Earth Polar Axis coördinates of the Apollo 17 lunar landing at 1972-12-11 19:54:59 GMT. [NASA]
      1. The astronaut's view is currently 30° to his left
      2. The astronaut sits in the left seat of the rover
      3. The rover is canted 5° nose up and 3° left from level
      4. The rover is 200m out at the lunar module's “7 o'clock” heading,
      5. The lunar module is at 20°11'27" N latitude, 30°46'18" E longitude
      6. The moon is at a certain position in its orbit about earth
      7. The earth is in a position in its orbit about the sun, its northern axis angled somewhat away from the sun

      To correctly show the objects in the astronaut's view, including whether the lunar module, sun, and earth are currently visible (and in a virtual environment, therefore selectable), one first sets the rendering projection to correspond with the field of view through the astronaut's helmet, a perspective viewing frustum. Next, one composes the transforms for each frame of reference from sun (the base frame of reference in this example), to earth, moon, lunar module, rover position, rover canting and astronaut, to the astronaut's view direction. This provides the viewing transform (called the modelview matrix) for a spatial rendering. Once these are set, the zys' zobjects can each be rendered until the complete scene is visualized.

      With this process, the two transforms indicating the positions and rotations of the earth and moon can be easily updated, and the correct rendering of the astronaut's view seen in the next rendering pass. By contrast, were the scene graph structured so that positions for all objects were kept in the sun's frame of reference, then the transforms for every object on the earth and moon would have to be updated to account for stepping the two orbits forward. Considering that thousands of vertex positions might be involved in the rendering of just the moon's surface, it becomes quickly obvious that pushing the math for determining all these positions out to the graphics unit using nested transformations can dramatically decrease the workload on the main computer's processors.

      These transforms are usually implemented as either the standard 4x4 3D matrices used by visualization libraries such as OpenGL, or as more mathematically efficient quaternions in host software. Although the latter approach often adds the overhead of converting back and forth between the two paradigms - since the libraries are almost always matrix based - it can still pay off in performance gains for certain applications.
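
      The sketch below shows the bookkeeping that the nested-transform hierarchy buys, using column-major 4x4 matrices as OpenGL expects; the node structure is illustrative rather than Z's actual data layout. Each node stores only its local transform, the world transform of any node is the product of its ancestors' locals, and stepping the moon's orbit forward means touching one matrix rather than thousands of vertices.

        #include <stdio.h>
        #include <string.h>

        /* Column-major 4x4 matrices, as used by OpenGL: element (row r, col c) is m[c*4 + r]. */
        typedef struct node {
            const char  *name;
            double       local[16];     /* transform relative to the parent's frame */
            struct node *parent;
        } node;

        static void identity(double m[16])
        {
            static const double I[16] = { 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 };
            memcpy(m, I, sizeof I);
        }

        static void translate(double m[16], double x, double y, double z)
        {
            identity(m);
            m[12] = x; m[13] = y; m[14] = z;        /* last column holds the translation */
        }

        /* out = a * b (apply b first, then a) */
        static void mul(double out[16], const double a[16], const double b[16])
        {
            for (int c = 0; c < 4; c++)
                for (int r = 0; r < 4; r++) {
                    double s = 0.0;
                    for (int k = 0; k < 4; k++)
                        s += a[k*4 + r] * b[c*4 + k];
                    out[c*4 + r] = s;
                }
        }

        /* World transform of a node: compose locals from the root down. */
        static void world_transform(const node *n, double out[16])
        {
            if (n->parent == NULL) {
                memcpy(out, n->local, sizeof n->local);
                return;
            }
            double up[16];
            world_transform(n->parent, up);
            mul(out, up, n->local);
        }

        int main(void)
        {
            node earth = { "earth", {0}, NULL };
            node moon  = { "moon",  {0}, &earth };
            translate(earth.local, 100, 0, 0);       /* earth somewhere in the sun's frame */
            translate(moon.local,   10, 0, 0);       /* moon relative to the earth */
            double w[16];
            world_transform(&moon, w);
            printf("moon in sun frame: (%.0f, %.0f, %.0f)\n", w[12], w[13], w[14]);  /* 110, 0, 0 */
            return 0;
        }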

    3. objects

      The basic premise of the Z environment is that the realm made available to the user is composed primarily of zys with visible and/or audible zobject representations, of realistic, abstract, or purely metaphorical nature, within a navigable zone. Zobjects are built from references to other zobjects, as well as the zop primitives, and include multimedia elements to allow for transform information to be shared, and for tight binding, for example, between sound samples and frames of zop-based animation. Multimedia such as images, audio files, and so on, are referred to from the zop primitives by URI, and are intended to be fetched by agents as needed.

      1. visual representation

        Tasks and data in a spatial environment are typically represented as clearly distinguished graphical entities in the virtual space, and that visible component is the key enabler for pointer-based interaction with the controlling program for that graphic.

        In the prototypes, the graphical zop primitives are essentially wrappers around OpenGL calls, and include the ability to define cachable display lists and light sources, and to define textures through URIs referring both to files and to shared memory areas. The latter allowed for texturing the display of a special shmem X server directly onto objects in the user agent.

      2. audio representation

        Not all tasks and data need necessarily be represented visually. Audio can provide a number of cues in terms of distance, importance, direction, and to some extent velocity, all of which can be quite applicable to certain tasks. Obviously music and audio players produce audio output. However, where most desktop environments simply mix audio without attempting to model it within the desktop itself, there are gains to be realized by giving audio positional information in a spatial environment.

        In particular, the modelling of sound's radius of effect can lend background context and information to users working in virtual proximity. Further, modelling the attenuation of sound provides a method for limiting the potential interference this very sharing could otherwise cause.

        In the paradigm presented here, sound is considered to be a potential attribute of an object just as is its visual representation, and is governed by the same rules of hierarchical frames of reference as evinced in the impact of attenuation by distance and other supported factors.

    4. operations

      Objects in zones might be candidates in some cases for a wide range of application-specific operations, many implemented within the connected client rather than the zone itself. However, to avoid scenarios where ill-written client software connects to a zone and unconstrainedly manipulates the objects therein into some broken and inconsistent state, it is desirable to have a base set of solidly reliable operations into which all more complex actions must be converted. Such base actions include but are not limited to objects being queried, moved, manipulated, modified, cloned, or deleted.

      A more general set of input/output methods for client programs to communicate with one another via their objects is also necessary. Z provides specific approaches to encapsulating very common data such as Unicode glyph data, HCI-controller events like those generated by use of the mouse, six-axis controllers, and keyboards. Z also provides features for implementing interactive object selection.

      The zone server process must be able to process certain basic operations, including dealing with connections from client processes, allowing for resets, halts, restarts and status queries, and maintaining interlinks with other zones, possibly on other computers.

      These operations comprise the various forms of the event abstraction, used for all communications within zones by objects, for inter-zone communication, and for all communications with external client programs. Events being sent to a zone for processing are subject to constraint based mainly upon agent authorization and zys permission settings.

    5. authentication

      Authentication to a zone can be required through a number of different methods, such as exchanging credentials with the zone directly, or working by proxy through another Z agent already connected to the zone. Likely underlying mechanisms for authentication include PAM, remote PGP authentication, SSH keys, SASL (possibly with GSSAPI, but a SASL blending with SSH keys would be advantageous), remote LDAP auth, authentication against some other local database (including those not requiring root access), and so on. The ability to provide limited access to remote users, some of which may be allowed unauthenticated connections, is essential to certain operational models. Name clash issues from a potential global user space must be resolved and unique IDs generated for all connected users.

    6. permissions

      In order to prevent chaos in a zone, as well as to allow the innate groupware potential of zones to be applied on a discretionary basis, ownerships and permissions must be attached to all zys in the zone, supported by the authentication models. These permissions can be used to limit the availability of zys manipulation based upon user identity, groups, or other criteria. Permission types should minimally include a required set of owner rights, as well as permissions for an arbitrary set of other users, user groups, or roles.

      In general use, a user should be easily able to see a graphical display of the permissions allowed by an object to that particular user, possibly in a viewing reticule or some other modification to the pointer.

    7. events

      The following hierarchy of internal Z Zone events is currently evolving as it undergoes experimentation. They allow actions upon Zys and the Zone containing them, as well as providing base facilities for communication between programs interacting via Zys nodes within a Zone. They are the Z equivalent of the Unix kernel trap list (also known as system calls).

      The Rune and Unrune data below refer to character symbol data such as unicode glyphs, or to non-unicode input such as modifier and other special function keys, respectively.

               +-Noop                        - Null operation
               |
               |    (these first four may be better off not grouped under Zone)
               |      +-Ping                 - Z analogue of pings
               |      |-PingReply            - Z analogue of pings' replies
               |      |-Ingress              - Zone/Agent entering a Zone
               |      |-Egress               - Zone/Agent leaving a Zone
               |      |
               |      +-Syncher              - Zone/Agent reader-locking Zone
               |      |-Interlink            - Zones interlinking
               |      |-Expulsion            - Zone expels Zone/Agent/Syncher
               |      |-Scope                - Asking if local or shared Zone
               |      |-Halt                 - Zone halting
               |      |-Reset                - Zone resetting (warm-boot)
               |      |-Recache              - Zone cache resetting
               |      |-Status               - Report Zone status
               |      |
        ZEvent-+-Zone-+-List-+-Zones    - List Zone and other known Zones
               |             |-Agents   - List connected Agents
               |             |-Synchers - List reader-lockers
               |             +-Links    - List interlinked Zones
               |
               +-Zys-+-Make                  - Zys creation
               |     |-Kill                  - Zys destruction
               |     |-Base                  - Zys rebasing (change of parent)
               |     |-Leaf                  - Zys releafing (change of children)
               |     +-Warp                  - Zys transformation (geometric)
               |     |
               |     +-View                  - Zys viewing (by other Zys)
               |     |-Poll                  - Zys polling (for Zys's methods)
               |     +-Task                  - Zys method call (inter-Zys I/O)
               |
               |    +-Rune---+-Press   - Glyph key press events
               |    |        +-Release - Glyph key release events
               |    |
               |    |-Unrune-+-Press   - Non-glyph press events
               |    |        +-Release - Non-glyph release events
               |    |
               +-Io-+-Button-+-Press   - Button press events
               |             +-Release - Button release events
               |
               +-Motion           - 6-axis motion events
               |
               +-Selector-+-Enter - Raycast entering object
                          |-Leave - Raycast leaving object
                          +-Track - Raycast intersecting w/ object
           

      1. event representation

        1. normal events

          Events are sent between objects derived from the communicator type, such as agents, probably the most common. All events are tagged with increasing integer identifiers, set by the sender upon transmission and local to that directed path to the receiver. These IDs will be assumed to apply to all the normal event types described below.
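
          Z's wire encoding is not specified here, but a hypothetical header sketch of the tagging scheme just described might look like the following; the field names are invented, and the essential point is that each directed sender-to-receiver path keeps its own monotonically increasing counter.

            #include <stdint.h>

            /* Hypothetical zevent header sketch.  The 'id' is assigned by the sender and
             * increases monotonically per directed sender->receiver path, so it is only
             * meaningful in the context of that path. */
            typedef struct {
                uint32_t type;       /* e.g. a ZEventZysWarp / ZEventZysIoButtonPress tag */
                uint32_t sender;     /* originating communicator (agent, zys, zone) */
                uint32_t receiver;   /* destination communicator */
                uint64_t id;         /* per-path, monotonically increasing */
                uint32_t length;     /* payload bytes following the header */
            } zevent_header;

            /* Sender side: stamp the next id for a given directed path. */
            static uint64_t next_event_id(uint64_t *path_counter)
            {
                return ++*path_counter;
            }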

          1. ZEventNoop

            Null operation

          2. ZEvent[Zone]?Ping

            Z analogue of pings

          3. ZEvent[Zone]?PingReply

            Z analogue of pings' replies

          4. ZEvent[Zone]?Ingress

            Zone/Agent entering a Zone

          5. ZEvent[Zone]?Egress

            Zone/Agent leaving a Zone

          6. ZEventZoneSyncher

            Zone/Agent reader-locking Zone

          7. ZEventZoneInterlink

            Zones interlinking

          8. ZEventZoneExpulsion

            Zone expels Zone/Agent/Syncher

          9. ZEventZoneScope

            Asking if local or shared Zone

          10. ZEventZoneHalt

            Zone halting

          11. ZEventZoneReset

            Zone resetting (warm-boot)

          12. ZEventZoneRecache

            Zone cache resetting

          13. ZEventZoneStatus

            Report Zone status

          14. ZEventZoneListZones

            List Zone and other known Zones

          15. ZEventZoneListAgents

            List connected Agents

          16. ZEventZoneListSynchers

            List reader-lockers

          17. ZEventZoneListLinks

            List interlinked Zones

          18. ZEventZysMake

            Zys creation

          19. ZEventZysKill

            Zys destruction

          20. ZEventZysBase

            Zys rebasing (change of parent)

          21. ZEventZysLeaf

            Zys releafing (change of children)

          22. ZEventZysWarp

            Zys transformation (geometric)

          23. ZEventZysView

            Zys viewing (by other Zys)

          24. ZEventZysPoll

            Zys polling (for Zys's methods)

          25. ZEventZysTask

            Zys method call (inter-Zys I/O)

          26. ZEventZysIoRunePress

            Glyph key press events

          27. ZEventZysIoRuneRelease

            Glyph key release events

          28. ZEventZysIoUnrunePress

            Non-glyph press events

          29. ZEventZysIoUnruneRelease

            Non-glyph release events

          30. ZEventZysIoButtonPress

            Button press events

          31. ZEventZysIoButtonRelease

            Button release events

          32. ZEventZysIoMotion

            mono- or poly-axial motion events (mouse, spaceball, etc.)

          33. ZEventZysIoSelectorEnter

            Raycast entering object

          34. ZEventZysIoSelectorLeave

            Raycast leaving object

        2. user-interface input events

          User-interface events are typically translated into normal Z events; however, a method for notating input events from the user's perspective is convenient.

          Note that keyboard events are usually mapped into the Z event types of Rune, Unrune. Mouse and spaceball button events are mapped to Button, with spaceball movements mapping to Motion. Mouse movements are mapped to Pointer events.

          • Format

            References to user input events will generally be displayed in an informal form of:

            modifier device name transition

            ...where the input event components are:

            modifier
            Any held (see Transitions) buttons or keys required to compose an event. Modifier composition is independent of the order in which the press events may have taken place. A modifier can be represented as a simple device-name pair, for modifiers that might be considered distinct based upon the originating device; keyboard is assumed by default. If no modifiers are specified, none are assumed.
            device
            Usually keyboard, mouse, or spaceball in the context of input events. If device is not specified, keyboard is assumed.
            name
            The button, named keyboard key, or the name of a glyph composed via the keyboard or other factors by the user agent.
            transition
            One of press, held, release, or actuate. held refers to the status of the represented event, not actually referring to an event itself but rather to an earlier press event with no matched release. actuate refers to matched press and release, generally without any unmatched release events occurring in the interim. If not otherwise indicated, held is the default in respect to modifiers, actuate for all others.

            It's assumed that no transition name overlaps with any button/key name.

          • examples

            meta shift C press
            while Meta and Shift are held, C is pressed.
            hyper press
            Hyper is pressed.
            super held
            Actually a reference to keyboard state rather than an event; it means that a super press has occurred, but the matching super release has not yet.
    8. mapping events into output to the user

      The events relating to zys creation/destruction, as well as adjustments to the hierarchy and positions of zyses (i.e. Make, Kill, Base, Leaf, Warp), are the bulk of events streamed from the remote zones to connected agents. These are also the primary contributors affecting the visual and audio rendering in most HCI user agents.

      A visual user HCI agent is expected to produce a 3D rendered representation of the agent's internal zone. That internal zone is a dynamic structure representing a merger of data received from external zones and any additional zys maintained by the agent as an embellishment upon those zones. Those local embellishments include any HUD within the agent itself, as well as any other unshared interfaces connected to the agent's internal zone, such as timepieces, mailbox monitors, etc.

      A rendering pass basically involves creating a reader lock upon the internal zone via a Syncher, making a rendering pass, and then releasing the reader lock. For each zys encountered upon traversal of the internal zone, the agent generates a 3D representation based either upon available primitives, or upon data loaded dynamically from local filesystems or via protocols such as HTTP. The agent is expected to handle issues of clipping and object occlusion in an intelligent manner, although some zones may conceal objects which should not be visible to the agent.
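
      In outline, one such rendering pass might be structured as below; the names are placeholders, a Syncher is stood in for by a plain reader/writer lock, and pushing transforms and drawing zobjects are left as stubs.

        #include <pthread.h>

        /* Placeholder types standing in for the agent's internal zone structures. */
        typedef struct zys {
            struct zys *child, *sibling;     /* N-tree of nested frames of reference */
            double      local[16];           /* transform relative to the parent */
            void       *zobject;             /* visible representation, possibly NULL */
        } zys;

        typedef struct {
            pthread_rwlock_t lock;           /* stand-in for a Syncher reader lock */
            zys             *root;
        } internal_zone;

        /* Stubs: a real agent would push 'local' onto the matrix stack and issue
         * the zobject's zops (or fetch its data by URI if not yet cached). */
        static void push_transform(const double local[16]) { (void)local; }
        static void pop_transform(void) {}
        static void draw_zobject(void *zobj) { (void)zobj; }

        static void render_subtree(const zys *n)
        {
            for (; n != NULL; n = n->sibling) {
                push_transform(n->local);
                if (n->zobject)
                    draw_zobject(n->zobject);
                render_subtree(n->child);
                pop_transform();
            }
        }

        /* One rendering pass: reader-lock the zone, traverse, release. */
        void render_pass(internal_zone *zone)
        {
            pthread_rwlock_rdlock(&zone->lock);
            render_subtree(zone->root);
            pthread_rwlock_unlock(&zone->lock);
        }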

      Audio emitters are also handled during the rendering pass, sound data likewise being provided parametrically, or through loading from local filesystems or the network. Agents are expected to process this data appropriately for position (delay, attenuation, speaker positioning), velocity (pitch modification), and gross optimization (overly quiet sounds are suppressed). More advanced agents should handle audio boosters as well, which should mix the audio of sufficiently near emitter and booster objects, based on either line-of-sight audio reflection using a parameterized filter, or parameterized transmissive filtering when line-of-sight to the booster is blocked.

      Once the complete set of audible audio objects is determined, and as much as feasible of their time offsets relative to the base sound data (adjusting for distance), pitch variance, filtering parameters, and (3D) angles from the listener are established, all of this information can be passed to the next layer of the audio system for mixing and discrete speaker output.

    9. mapping user input into events

      The manipulation of virtual objects in a 3D environment is a fundamentally more complex endeavor than such control in a 2D desktop environment. The approach of harshly limiting the number of axes of freedom, such as constraining user motion to a primarily planar environment like a flat landscape, might be tempting. However, that approach will not be taken here, for the simple reason that applications may need to override the default manipulation model. Since their needs cannot be reliably determined in advance, no closed manipulation model can be asserted to be sufficient.

      Hence, the specific mappings documented below must be considered to be merely an imperfect and supplantable set of default mappings for the user's proxy agent program, which accepts and converts the user's input into Z events, as well as probably converting incoming Z events into a graphic realization of the virtual environment. However, any substitute method should provide at least the basic capabilities detailed in Requirements, unless dictated otherwise by the demands of a specific application.

      1. device repertoire

        The following devices or their near equivalents will be taken as a baseline, with general rôles as follows:

        1. rôle of the keyboard (and simulation aspects)

          The keyboard will most typically be used to enter character data into selected objects, but that data might also be intercepted and used by the user's viewpoint client (for example, an OpenGL environment visualizer), and object manager (by conceptual extension from the X window managers), among other possibilities.

          In cases where the physical keyboard is inadequate for generating a sufficiently broad set of runes, synthetic runes can be generated or composited by other methods. For simplicity, it is often useful to treat synthetic runes as though they were keyboard input.

          Internationalized input is expected to be supported mostly by a client-side application called an input-filter, which uses some arbitrarily complex process to map user input to internationalized synthetic input which then takes the place of the user's original input. For example, the input filter could map Roman character input phonetically to Japanese hiragana, and then into kanji, as in “toukyou” to 「とうきょう」 to 「東京」 for Japan's capital city. The input-filter can display intermediate translation results within the Z space within the viewer's viewing frustum, as a HUD. Note that this is primarily a fallback method, since better translational context (such as surrounding text) might be available within the target application (such as an editor) than within the input filter being described. Due to this reality, the input filter should be overridable, and might itself simply be a user-controllable agent.
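          As a toy illustration of that staged conversion, the sketch below uses two invented lookup tables; a real input-filter would rely on a full dictionary and conversion engine, and these function names are assumptions:

            // Staged mapping: Roman input -> hiragana -> kanji (toy tables only).
            #include <map>
            #include <string>

            std::string romajiToKana(const std::string& romaji)
            {
                static const std::map<std::string, std::string> table = {
                    { "toukyou", "とうきょう" },
                };
                auto it = table.find(romaji);
                return it != table.end() ? it->second : romaji;
            }

            std::string kanaToKanji(const std::string& kana)
            {
                static const std::map<std::string, std::string> table = {
                    { "とうきょう", "東京" },
                };
                auto it = table.find(kana);
                return it != table.end() ? it->second : kana;
            }
            // kanaToKanji(romajiToKana("toukyou")) yields "東京" with these tables;
            // intermediate results could be shown to the user in a HUD.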

          The keyboard also has a rich set of modifier keys, usually drawn from a subset of modifiers such as Shift, Control, Alt, Meta, Super, Hyper, Numlock, Capslock, Shiftlock, Altgraph, Compose, Graphic, as well as many others, including language-specific character access keys (Greek, 漢字, カタカナ) and proprietary operating system keys (Open-Apple, Windows, Command, Option). Although these must certainly be supported in the general sense within Z, they are excluded from consideration as modifiers for interactive spatial manipulation, for two reasons:

          • They're difficult to actuate when the spaceball and mouse are in concurrent use
          • Many of them are already expected input for applications in the desktop environment which we may be using through a backwards-compatibility feature in the spatial environment.

          It is desirable that some keyboard modifier, perhaps hyper held, be reserved as a modifier giving access to a special keymap for various useful spatial functions, so as to reduce the need for the hands to move frequently to the spaceball and mouse for simple manipulations like switching between the current and last selected object, flipping an object to look at its back, iterating through subobjects, activation of subobjects, etc.

        2. rôle of the spaceball

          The spaceball (or other 6-axis controller) orb will be used to control movement of the viewport and the manipulation of objects. Such controllers often come with a plethora of buttons, even more than most mice. This section will use Spaceball as a specific example.

          Spatial coördinate events are notated herein as either of the following, depending on type:

          modifier spatial button transition
          A spatial controller button event.
          modifier spatial x,y,z,xr,yr,zr
          A spatial controller motion event.

          The spaceball also has nine buttons accessible to the fingers and three more for the thumb. The first three buttons are particularly easy to reach, and can be manipulated by a user with only minor interference with the use of the orb. These buttons are numbered on the device from 1 to 9 and (lettered) A to C, each of which is the source of its name in its representation. Examples:

          • meta spatial D press
          • spatial 1 release
          • spatial 7 held
          • shift spatial -76,-4,-313,-2,12,0
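          One possible in-agent representation of such events is sketched below; the type and field names are invented for illustration and are not part of any Z protocol definition:

            // Carrying the spatial-event notation inside an agent.
            #include <array>
            #include <cstdint>

            enum class Transition { Press, Release, Actuate, Held };

            struct SpatialButtonEvent {
                std::uint32_t modifiers;      // bitmask: shift, meta, hyper, ...
                int button;                   // 1..12 on the example Spaceball
                Transition transition;
            };

            struct SpatialMotionEvent {
                std::uint32_t modifiers;
                std::array<double, 6> axes;   // x, y, z, xr, yr, zr
            };

            // "shift spatial -76,-4,-313,-2,12,0" might become, with SHIFT some bit:
            //   SpatialMotionEvent{ SHIFT, { -76, -4, -313, -2, 12, 0 } }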

          Although one might expect that these buttons would be the primary choice for object interaction, given that the mouse buttons tend to have preëxisting purposes in 2D applications, many drivers for 6-axis controllers monopolize the buttons for tailoring controller events. For example, the 3DxWare 1.4.3 driver for the Spaceball has these default mappings, most of them toggles:

          spatial 1   mapped as   spatial Translations
          spatial 2   mapped as   spatial Rotations
          spatial 3   mapped as   spatial Dominant Axis
          spatial 4   mapped as   spatial Keep on Top
          spatial 5   mapped as   spatial Decrease Sensitivity
          spatial 6   mapped as   spatial Increase Sensitivity
          spatial 7   mapped as   spatial Pan Only
          spatial 8   mapped as   spatial Zoom Only
          spatial 9   mapped as   spatial Restore Defaults

          Although all of these can be dynamically set to generate merely the basic button event instead of the default toggle (or restore), leaving them in place might be of benefit to the end user. Note that Z itself will never receive them, since the 3DxWare device driver absconds with them. The following buttons are free however:

          spatial A may be mapped as spatial 1 or spatial 10 (among others)
          spatial B may be mapped as spatial 2 or spatial 11 (among others)
          spatial C may be mapped as spatial 3 or spatial 12 (among others)

          The buttons can reasonably be mapped to one of two sets of values (1, 2, and 3 aren't generated while the device is sending Translations, Rotations, and Dominant Axis instead), although neither set matches their labelling. Mapping to 10 through 12 at least conflicts less with a configuration in which the 1-9 buttons have been set to send simple button events.

          Mappings like spatial Translations interfere with the temptation to use the multifarious buttons on the spaceball as an adjunct chord keyboard, as recommended many years ago by Engelbart, sufficient for very brief text input without having to move the hands back to the keyboard. However, should these modes turn out to be unneeded, with the buttons simply set to generate their button values, using them for a chord keyboard would be interesting, despite their not being ideally shaped or laid out for this task.

          One complication in the use of 6-axis controllers is that they tend to generate values rather differently, especially positional controllers versus force-based controllers. Positional controller inputs, once calibrated to the orientation of a selected object, can be directly applied to the object's orientation. Force-based controllers generate velocity inputs rather than simple offsets, and these need to be applied to the selected object with consideration for elapsed time, and may need more complex user-specific configuration in order to be comfortable to use. In both cases, but especially with force-based controllers, configuration may need to be application-specific as well, meaning that the parameters used to translate the controller inputs into suitable values to pass to an object may need to be tailored depending on the three-part combination of the controller type, the user, and the target object.
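          That distinction might be handled roughly as in this sketch, where a positional reading is applied as a calibrated offset while a force reading is scaled by elapsed time; treating the six axes as independently additive is itself a simplification:

            // Positional vs. force-based application of 6-axis controller input.
            #include <array>

            using Axes = std::array<double, 6>;   // x, y, z, xr, yr, zr

            // Positional device: apply the calibrated reading directly as an offset.
            Axes applyPositional(const Axes& calibratedBase, const Axes& reading)
            {
                Axes out;
                for (int i = 0; i < 6; ++i)
                    out[i] = calibratedBase[i] + reading[i];
                return out;
            }

            // Force device: the reading is a velocity, integrated over elapsed time
            // and scaled by per-axis factors from controller/user/application config.
            Axes applyForceBased(const Axes& current, const Axes& reading,
                                 double dtSeconds, const Axes& perAxisScale)
            {
                Axes out;
                for (int i = 0; i < 6; ++i)
                    out[i] = current[i] + reading[i] * perAxisScale[i] * dtSeconds;
                return out;
            }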

        3. rôle of the mouse

          Historically, the mouse is a well-accepted device for pushing a pointer around on the screen, and implicitly supplying the x and y coördinates of that screen location to the underlying environment. Mathematically, it is straightforward to model a ray from the user's assumed vantage in front of the screen, through the screen x,y coördinate indicated by the standard mouse screen pointer, and into the virtual space and possibly one or more objects in its path therein. However, that doesn't mean that a conventional two-axis mouse event will be delivered to target objects. Positional event information must first be requested by the object, and, if requested, would generally be presented as the point of intersection between the ray and the object, translated into the object's local coördinate system, and including at least the 3D coördinate (optional parts might include rotational information or the ray's point of origin).
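          A minimal sketch of constructing such a ray and re-expressing it in an object's frame follows; the camera and object models used here (orthonormal bases, a simple pinhole projection) are simplifying assumptions:

            // Build a world-space ray from the mouse's screen x,y, then translate it
            // into a target object's local frame of reference.
            #include <cmath>

            struct Vec3 { double x, y, z; };

            Vec3 add(Vec3 a, Vec3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
            Vec3 sub(Vec3 a, Vec3 b)     { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
            Vec3 scale(Vec3 v, double s) { return { v.x * s, v.y * s, v.z * s }; }
            double dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
            Vec3 normalize(Vec3 v)       { return scale(v, 1.0 / std::sqrt(dot(v, v))); }

            struct Camera {                  // eye position plus orthonormal basis
                Vec3 eye, right, up, forward;
                double tanHalfFovY, aspect;  // vertical field of view, width/height
            };

            struct Ray { Vec3 origin, direction; };

            // screenX/screenY in pixels, origin at the upper left as in X.
            Ray raycastFromMouse(const Camera& cam, double screenX, double screenY,
                                 double screenW, double screenH)
            {
                double ndcX = 2.0 * screenX / screenW - 1.0;   // -1 .. +1
                double ndcY = 1.0 - 2.0 * screenY / screenH;   // flip Y
                Vec3 dir = add(cam.forward,
                           add(scale(cam.right, ndcX * cam.tanHalfFovY * cam.aspect),
                               scale(cam.up,    ndcY * cam.tanHalfFovY)));
                return { cam.eye, normalize(dir) };
            }

            // An object's frame as an origin and orthonormal axes; a positional event
            // would carry the ray/object intersection expressed in this frame.
            struct Frame { Vec3 origin, xAxis, yAxis, zAxis; };

            Ray toObjectSpace(const Frame& f, const Ray& world)
            {
                Vec3 rel = sub(world.origin, f.origin);
                return { { dot(rel, f.xAxis), dot(rel, f.yAxis), dot(rel, f.zAxis) },
                         { dot(world.direction, f.xAxis),
                           dot(world.direction, f.yAxis),
                           dot(world.direction, f.zAxis) } };
            }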

          Despite all that, a notation for the incoming mouse events from the host system (e.g. X) independent of the host system's details is useful. Hence mouse events are notated as either of the following, depending on type:

          modifier mouse button transition
          A mouse button event.
          modifier mouse x,y
          A mouse motion event.

          Mouse coördinate events are notated herein as mouse x,y.

          Most mice also have at least three usable buttons (reportedly colored red, yellow, and blue on the Alto computers at XEROX PARC in the 1970s) which can be used with only minimal interference with the mouse's primary pointer function. The buttons will be referred to as follows:

          mouse 1
          the primary button, typically under the pointer finger.
          mouse 2
          the middle button, thumb button, or the result achieved by pressing the scroll wheel, or a synthetic event simulated by pressing buttons one and three simultaneously.
          mouse 3
          the “other” button besides button one on a two-button mouse, the button under the ring finger for the classic three-button mouse of the 1980s, or generally the button on the top of the mouse on the opposite lateral side from button one.

          Or, more pictorially for right-handed classical mice:

          [classic three-button mouse, buttons 1, 2, and 3 from left to right]



          For right-handed users, mouse buttons 1, 2, and 3 would have been activated by the pointer, middle, and ring finger respectively. For left-handed users the numbering was reversed, so that the ring, middle, and pointer fingers would still be activating buttons 3, 2, and 1, just now numbered from right to left. Thus the pointer finger for either handedness always had the same function in the window system.

          Notation is the same as for the spaceball, except for the substitution of the word “mouse” for “spatial”, and the addition of scrolling events for mice so equipped. Examples:

          • mouse 2 press
          • meta mouse 200,300
          • shift mouse scroll-up actuate
          • mouse scroll release

          Under X, mouse-wheel events are usually encoded as down+up events on hypothetical extra mouse buttons, which is a bit contorted, but does lend itself well to being converted to actuate events in Z.
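          A sketch of that conversion, assuming the common X convention of buttons 4 and 5 for the wheel, might look like the following; the event-name strings are only illustrative:

            // Fold X's wheel encoding (press+release on buttons 4/5) into actuates.
            #include <optional>
            #include <string>

            struct XButtonEvent { int button; bool press; };

            // Returns an actuate name for a wheel button press; everything else yields
            // nothing, leaving the caller to handle ordinary buttons and to drop the
            // wheel's synthetic releases.
            std::optional<std::string> wheelToActuate(const XButtonEvent& ev)
            {
                if (!ev.press)
                    return std::nullopt;
                if (ev.button == 4) return std::string("mouse scroll-up actuate");
                if (ev.button == 5) return std::string("mouse scroll-down actuate");
                return std::nullopt;
            }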

          Buttons which are being referred to by their use instead of their physical layout can be represented (with the parentheses shown) as mouse (meaning).

          In the case that either the virtual trackball or the radial mouse mapping concept is in use, the mouse is thereby able to generate spatial events as does a spatial controller like the spaceball. At the protocol level, these events should look exactly like those of a normal spatial controller, so any differences in their handling must be implemented by the user interface software.

          Mouse inputs also have per-user tailoring issues like those described for spatial controllers, but these are typically handled using mouse pointer acceleration configured in the window system hosting the Z user agent, and thus the user agent generally doesn't have to worry about them. If the user agent were the sole interface without an intervening system like X, then mouse acceleration handling would need to be intrinsic to the agent.

          This section doesn't define the mouse coördinate ranges for the simple reason that they shouldn't be passed into Z by the user agent to start with.

      2. device synergy

        A recurring motif in HCI studies is the two-handed input model for positional control, using the mouse to supply the screen coördinate through which to ray-cast, and the spaceball or similar tool for adjustments to objects and the viewpoint. The keyboard is retained for text input. Based on the research, it seems reasonable to take this model as a basis on which to build a model for interaction with a spatial environment.

        However, since the aliasing of object manipulation and viewpoint adjustment onto the same 6-axis controller might contribute to the user perception of having a bimodal interface, it is appropriate to attempt to minimize this impact. One of the simplest approaches to reducing subjective modality is to require the user to maintain the mode kinesthetically by holding down an actuator of some kind to use the secondary mode. For example, the emacs text editor uses this model extensively, where the user holds down the control and/or other modifier keys during non-text-entry operations. In the case of the spaceball, however, holding one of its buttons while using the ball is difficult. Since the spaceball is the device principally encumbered by this problem, buttons on other devices such as the mouse or a foot pedal can be used to activate auxiliary spaceball modes. Spaceball buttons can still be used as modifiers to the mappings of the mouse and other devices. Since there are cases where only the spatial controller and the keyboard are in use, without the mouse, it might be useful to map both a mouse button and a keyboard modifier to the secondary mode activator.

        Obviously an improved spaceball or other 6-axis controller which could be used effectively in conjunction with its own buttons would be a benefit. Being able to detect that the ball is being squeezed, for example, would translate fairly intuitively to manipulation mode rather than view mode, although the threshold would need to be user-tunable.

        Another factor against combining the 6-axis manipulator with using, especially holding, one of its own buttons is that twisting the manipulator changes the geometry of the hand in relation to the device's buttons. Since the mouse doesn't have this particular problem, preference could be given to the mouse buttons in cases where any button needs to be held as part of an action.

      3. A Mouse+Spatial Haptic Paradigm

        These requirements should not be interpreted as a standard look-and-feel in any way, but merely as a mode from which to begin. Indeed, despite current fashion to the contrary, the following has been stated about the benefit of a multiplicity of look-and-feel interface choices:

        Providing optional modules for selected look-and-feel interface characteristics would serve an important practical as well as evolutionary need. [...]

        Besides relaxing the troublesome need to make people conform to a standard look and feel, this approach has a very positive potential outcome. So far, the evolution of popular graphical user interfaces has been heavily affected by the “easy to use” dictum. This has served well to facilitate wide acceptance, but it is quite unlikely that the road to truly high performance can effectively be traveled by people who are stuck with vehicular controls designed to be easy to use by a past generation.

        The default manipulation model should support at least the following types of interactions, subject to whatever constraints may be imposed by object permissions or other system constraint factors.

        1. The spatial initiator button

          On my mouse there is a second thumb button, mouse 9, that I never use in 2D.
          [image: button 9 — my mouse init3d]

          An initiator button will be used in this model to preface many 3D interactions, in order to preserve the majority of mouse buttons for their conventional use. This is important both to allow for backwards compatibility with 2D applications mapped into the space, and to preserve that familiar paradigm for new apps. In cases where the difference between 3D Zone objects and 2D application objects is unclear, it is expected that Z-aware applications will also receive and reasonably interpret traditional mouse button actions implying selection. This model is quite close to that used by a 2D widget-based application, and should pose no challenge to the programmer.

          Since many modern mice have so many buttons on them that at least one tends to have no important predefined use, mapping one of them to be an initiator for 3D object actions is often quite viable. This paradigm will take this approach, creating a mouse init3d event from a spare mouse button.

        2. pointing

          A way must exist to allow the user to point at objects as a preliminary to selection or other actions.

          Raycasting will be done via the mouse-determined screen x, y position. The current objects along that ray are considered to be pointed, can receive pointed, unpointed, nearest-object pointed, and nearest-object unpointed events, and while pointed will be available for other actions. No buttons are involved in pointing.

        3. tracking

          The user must be able to indicate a position on the surface of an object, regardless of whether or not it's selected. Furthermore, being able to include additional axes, such as tablet stylus pressure, is highly desirable.

          The nearest object along the pointing ray which has elected to support tracking will be provided, in a tracking event, with the position in its own coördinate system where the ray entered the object (on the user-facing side). The object may use this information to update itself accordingly.

          Such events should include the near 3D contact point between the pointer's raycast and the object, possibly accompanied by the ray's origin translated into the object's frame of reference and optional additional axes to carry information such as stylus pressure, rotation, etc.

          Although tracking events would logically be bracketed in nearest-object pointed/unpointed events, relying on the presence of either is probably unwise, since there are risks of momentary occlusions from interposing moving objects, user agents might be disconnected midstream, and other reasons a user agent might erroneously send the events to different targets, tangle the expected order, or fail to send expected followup events.

        4. selection

          A method must be provided to select an object. Spatial selection should be distinct from selecting a point on an object's surface, which is assumed to use the first mouse button for compatibility with X and kin.

          • Selection mode is entered via mouse init3d press
          • The current pointer coördinate is recorded as frustum0.
          • Switch to selection modification by picking if mouse 1 press occurs.
          • (Other future event actions may be defined here)
          • If mouse init3d release occurs, then:
            • If there is a current pointed object, then:
              • clear the pointed-objects list of all other objects
              • toggle the selection state of the pointed object
            • exit selection mode

          One subtlety here is that mouse movement is required to select an object. Objects drifting into an idle 2D pointer raycast or unchanging frustum should not affect selection (or input focus, q.v.). A sketch of this basic selection cycle follows.
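          The sketch below assumes hypothetical handler and state names, and omits the picking and frustum refinements described in the following sections:

            // Basic selection cycle: enter on init3d press, resolve on release.
            #include <algorithm>
            #include <optional>
            #include <vector>

            struct ScreenXY { double x = 0, y = 0; };
            using ObjectId = unsigned long;

            struct SelectionState {
                bool selectionMode = false;
                ScreenXY frustum0;                  // recorded at "mouse init3d press"
                std::vector<ObjectId> pointed;      // objects along the current ray
                std::vector<ObjectId> selected;
            };

            static void toggle(std::vector<ObjectId>& set, ObjectId o)
            {
                auto it = std::find(set.begin(), set.end(), o);
                if (it == set.end()) set.push_back(o); else set.erase(it);
            }

            void onInit3dPress(SelectionState& s, ScreenXY pointer)
            {
                s.selectionMode = true;
                s.frustum0 = pointer;               // remember the starting corner
            }

            void onInit3dRelease(SelectionState& s, std::optional<ObjectId> nearestPointed)
            {
                if (s.selectionMode && nearestPointed) {
                    s.pointed.assign(1, *nearestPointed);  // drop all other pointed objects
                    toggle(s.selected, *nearestPointed);   // flip its selection state
                }
                s.selectionMode = false;            // exit selection mode
            }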

        5. selection modification by picking

          The user must be able to modify the selection in units of individual objects, without shifting hands from the current controls.

          • This mode was entered from the overall selection mode, so we expect mouse init3d held and mouse 1 held to be in effect, and frustum0 to be set.
          • If mouse init3d release, then process any pending objects-selecting or objects-deselecting lists, then exit selection mode.
          • Switch to selection modification by frustum if nearest-object unpointed occurs
          • If mouse 1 release occurs while still pointing to the same nearest pointed object (note that this state would have been exited already if it wasn't the same object) it's added to whichever of the objects-selecting or objects-deselecting lists could result in toggling its selected state. Any following mouse 1 press sets a new nearest pointed object.
          • On any other mouse (N != 1) press, exit selection mode.
        6. selection modification by frustum

          The user must be able to augment the selection in units of object groups swept by frustum, without shifting hands from the current controls. Only objects which are completely within the frustum are considered, in order to prevent undesired selection of large, container, or background objects such as the sky or ground.

          Whether this effect should be to select or toggle is an interesting question. For now, the effect is assumed to be nonexclusive selection, similar to that seen in the current desktop paradigm.

          • This mode was entered from the Selection Modification by Picking, so we expect mouse init3d held and mouse 1 held to be in effect, frustum0 to be set, and that it's likely there is no current pointed object.
          • Clone frustum0 into frustum1.
          • Record whether selecting or deselecting, based on whether nearest-pointed object at frustum0 was not in, or was in, respectively the selected objects list.
          • On pointer motion, record the updated pointer location into frustum1.
          • (note: a method should be added to control the near/far limits of the frustum).
          • If selecting, then enter all objects in the frustum which are NOT in the selected objects list into the objects-selecting list.
            Else enter all objects in the frustum which ARE in the selected objects list into the objects-deselecting list.
          • If mouse 1 release occurs, remove any object in objects-deselecting from the selected objects list, and add any object in objects-selecting to the selected objects list.
          • If mouse init3d release occurs, exit selection mode.
          • On any other mouse (N != 1) press, exit selection mode.
        7. selection cancellation

          The selection must be cancellable without shifting hands from the current controls. Hence, reaching for the escape key won't be necessary.

          Essentially this is intended to provide a way to abort everything done to the current selection since mouse init3d press + mouse 1 press happened.

          As a corollary of selection, if mouse init3d press then mouse init3d release occurs, and the only nearest current pointed-to object is currently selected, then all objects are unselected. However, a more state-independent mechanism is desirable.

          If a mouse (N != 1) press occurs before the mouse init3d release, then all objects are unselected. One example would be to do a mouse 3 actuate as an abort sequence.

          This means that users who want to access all selection activities purely via mouse init3d can still clear all current selections by selecting and deselecting a single object with mouse init3d actuate. For the same reason, the common desktop action of unselecting all selections by clicking somewhere nonreactive can be emulated by clicking twice on any nonreactive object - such as a sky or ground in environments modeled after physical spaces.

          Unfortunately, this brings up an important issue: should there actually be any nonreactive objects? There are definite advantages to being able to select any object whatsoever, for purposes such as copying its geometry. It may be that all objects should be selectable, but some are excluded unless special action is taken.

        8. motion

          The user must be able to move the viewpoint.

          By default the six axes of the spaceball, spatial x|y|z|xr|yr|zr, are applied as velocities to the user viewpoint transform.

        9. manipulation

          The user must be able to manipulate a selected object (or set of objects). It is desirable that this not involve the mouse, thus leaving it available for lathing.

          Although blatantly modal in nature without haptic feedback, the following is very similar to how the 3DxWare driver does things: spatial B actuate could toggle the effect of spatial x|y|z|xr|yr|zr events between viewpoint velocity control and object manipulation.

          This would appear to be an ideal instance for pressure sensitivity on a Spaceball, since pressure could be interpreted as a grasp, and thus support kinesthetic feedback in a natural manner. However, moving a spaceball itself requires some force, so some other actuator, whether a wrist pressure plate, a foot pedal, or a stray Spaceball button, might be preferable.

          This does not preclude the possibility of mapping spatial B held as the enabler for the radial mouse mappings discussed elsewhere in this document, but it should be noted that lathing, while still viable in this model, would then be suboptimally limited to a fixed pointer position.

          The manipulation toggle approach with spatial B seems rather coarse and it would be interesting to test the hypothesis that a held actuator, such as a footpedal, for enabling object manipulation would be more effective.

          Without a lathing option, a user could still use the pointer to describe a path across an object while moving the viewpoint, instead of manipulating the object.

        10. lathing

          There must be a way to manipulate an object with one hand as one indicates different points (generally with the notion of describing a path) across its surfaces with the other.

          While in object manipulation mode (see the manipulation section), the mouse is left completely free to generate tracking events via the raycast into the manipulated object, directed by mouse x|y events.

          These trackings may be interpreted by the object's controlling program as directions for where to draw upon or modify itself, such as the task of painting an object by rotating it under the brush, with the brush controlled by the mouse or tablet stylus. Further, if the raycast data is included, it too could affect object modification through considering the angle of the ray with respect to the drawing surface and so on.

        11. input focus

          First, it should be noted that even in normal manipulation/movement mode, tracking events from the mouse and keyboard input are still directed into the current pointed object. This is nearly identical to the default focus-follows-mouse input mode of classical X. However, being able to lock input on to an object that's about to be occluded or leave the viewing frustum is valuable, so the concept of input focus described here is quite distinct from the default state.

          In 2D window systems, input grabbing has caused problems. A popup window can grab focus without warning while the user is typing, causing an essentially random line of typed text (or a password intended for another program!) to be fed into the popup, which then consumes it (upon return) and vanishes. In Z, however: only the user agent may set the user's input focus. This means that even programs interposing UI elements between the user and other objects will not be granted input focus. Such popups might be allowed to request focus, and user agents might provide an accelerator to quickly switch input to such a petitioner and then restore input focus afterwards. Even if granted focus, however: no program may forcibly retain user input focus.

          A mechanism must exist to lock the input stream onto selected target objects, overriding normal focus-follows-pointer and being orthogonal to object selection.

          Note: Since the hands will most probably switch controls during input focus (for example, moving to the keyboard to enter text, then returning to the spatial controls), kinesthetic feedback on the current mode (barring footpedals or the like) is impractical. Further, input focus may well be left in place for an extended period. Hence a non-tactile mode switch is preferred here, and the provision of visual or other feedback on the input focus state is strongly recommended.

          In normal manipulation/movement mode, if any objects are currently selected, and spatial C press is followed by spatial C release then the mode is set to input focus mode.

          If nearest-object unpointed occurs before spatial C release then the input focus attempt is cancelled.

          While in input focus mode, input is sent to the selected object(s) with one exception: spatial C press + spatial C release restores normal manipulation/movement mode. Notice that in this case an object selection is not required for success - since the object earlier focused upon may be out-of-view or no longer exist (!).

          Input focus could potentially involve a set of selected objects rather than just one. For example, selecting a dozen terminals on different host computers, and then sending matched commands to all of them in unison.
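          A sketch of that toggle, with invented state and handler names, is shown below; cancellation via nearest-object unpointed is assumed to have been checked before the release is treated as an actuate:

            // Toggle input focus mode on a completed "spatial C" press + release.
            #include <vector>

            using ObjectId = unsigned long;

            struct FocusState {
                bool inputFocusMode = false;
                std::vector<ObjectId> focusTargets;   // may hold several objects
            };

            void onSpatialCActuate(FocusState& f, const std::vector<ObjectId>& selected)
            {
                if (!f.inputFocusMode) {
                    if (!selected.empty()) {          // entering focus needs a selection
                        f.focusTargets = selected;
                        f.inputFocusMode = true;
                    }
                } else {
                    f.inputFocusMode = false;         // leaving focus needs no selection
                    f.focusTargets.clear();
                }
            }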

      4. User Input Tracked State

        The context of selection requires several items to be tracked, including those listed below (a structural sketch follows the list):

        screenxy
        Current mouse-driven pointer position
        near-object
        Nearest pointable object in the current instant's ray-cast through screenxy.
        screenxy0
        Start of possible frustum sweep. Set from screenxy
        object
        Most likely object for a simple single-object selection. May be null/undefined. Set from near-object
        objects
        List of objects completely contained within a frustum sweep. May be null/undefined. Probably will not contain object, since that would normally be at frustum corner, and hence not completely within the frustum. Set from screenxy0 and screenxy.
        objectxyz
        Near-side intersection of raycast through screenxy with target object, in object coördinates. A somewhat expensive datum. Only tracked intermittently, as needed.
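        Written out as a structure, with placeholder types rather than protocol definitions, this tracked context might look like the following:

          // One possible shape for the user agent's selection-tracking context.
          #include <optional>
          #include <vector>

          struct ScreenXY { double x, y; };
          struct Point3   { double x, y, z; };
          using ObjectId  = unsigned long;

          struct InputTrackedState {
              ScreenXY screenxy;                    // current pointer position
              std::optional<ObjectId> nearObject;   // nearest pointable object on the ray
              ScreenXY screenxy0;                   // start of a possible frustum sweep
              std::optional<ObjectId> object;       // likeliest single-selection target
              std::vector<ObjectId> objects;        // objects wholly inside the frustum
              std::optional<Point3> objectxyz;      // raycast hit in object coördinates
          };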
    10. interaction with programs

      Once a program is started, it can create new, interactive objects in the virtual space for communication with other objects, some of which may be user agents for other users. Typically interactions with these objects will be forwarded to the controlling program for processing, and that program will then typically modify the aspect of the controlled objects as directed, or to provide feedback to the activating user.

      This sets up the virtual environment as a viable mechanism for IPC.

    11. multiuser interaction

      Multiple users can potentially share one environment, aiding them in asking questions of one another about their tasks, allowing coöperation, and potentially enhancing esprit de corps. By allowing interaction via the environment, instead of on a per-application basis, groupware becomes the norm, instead of a specialized software subcategory.

    12. backwards compatibility via X

      1. OpenGL and interactive textures

      2. rendering desktops into textures

      3. rendering application windows into distinct textures

  6. paradigm acceptance issues

    As was seen with the introduction of virtually every other paradigm shift, business users will typically classify a new paradigm as being gamer-specific, and then avoid it for some arbitrarily long time period. This has been especially obvious at the introduction of color and the later introduction of 3D graphics. It is expected that business users will react in a similar way to spatial interfaces, until either planned obsolescence forces them to upgrade, or some compelling feature in an application running under the spatial interface - some killer app, in other words - overcomes their pragmatic reluctance. Various other factors, however, will contribute to the creeping acceptance of any desirable meme, including:

    1. exposure

      For any new technology to succeed, the prospective audience must be exposed to it. Two key channels for exposure deserve attention. The first vector is composed of the technology's developers and their communications with associates and potential users. The second, and potentially far larger vector, is composed of those adopting the technology who did not develop it, and who further evangelize the new technology to others who might adopt it.

      In order to not alienate that second vector of adopters, who do not yet have a vested interest in the new technology, a new environment will likely have to provide all the key features - as seen subjectively by these users - of the environment being replaced, including:

      • Web browsing
      • File browsing
      • Email
      • Developer and WYSIWYG editing facilities
      • Instant messaging
      • Clock
      • Calculator
      • Terminal
      • Audio player
      • Solitaire and other simple demonstration games
      • A method for manipulating the representations of the above, as the window managers do in X

      Fortunately, the virtual X framebuffer facility can easily provide a fallback compatibility layer for all of these except for 3D applications such as CAD and 3D gaming, and in many cases even for those. However, it is important to provide incentive for adopters to write new software in the new paradigm in exactly these deficient areas. One useful technique in both proving the fitness of the environment and shortening the learning curve for developers is to produce a number of demonstration games and other programs, and include the source code for these as part of the system and API documentation.

      Applications particularly suited to early deployment include the terminal (absolutely essential in the minds of many developers and Unix users), clock, calculator, and solitaire games. Adding a multiuser game such as chess or go, or a simple drawing program, would allow the demonstration of the shared environment features, and lend credence to the environment's fitness for collaborative CAD, groupware, and more serious gaming.

    2. demand

    3. generational acceptance

      Although early adoption may permeate a significant chunk of the population, it's usually the case that the younger adopt earlier, and that eventually a generation will come that never knew a world that didn't possess the technology in question. This generation, by taking the technology for granted, and by taking for granted that it should be everywhere, will drive the technology from being merely a cool new thing to an intrinsic aspect of the ongoing culture.

  7. Evaluation of Fitness

  8. summation

  9. lexicon

    artifacts
    If undesired stairstepping pixels are visible in a curve or line, the effect is called artifacting or more casually, jaggies. The former implies that the visible errors (the jaggies) are an artifact of the inadequate graphics system.
    backwards-compatible
    Allows the continued use of part of an older technology within the framework of a newer one.
    cursor-addressable
    Allows almost any character on a terminal's screen to be changed without regard to any other.
    denial of service
    A denial of service attack, or DoS, is said to be occurring when an attacker sends so many messages to a Web site or other service that it can't respond to regular users or it shuts down completely.
    deprecated
    Worthy of avoidance, particularly in cases where some practice is thought to be unwise, or is expected to be eventually banned.
    desktop, two-dimensional; desktop paradigm
    A not-quite-realistic portrayal of a real-world desktop on a computer screen, in which applications' display areas can partly or wholly conceal one another as would sheets of paper on a desk. Such a display is two-dimensional in the sense that objects can only be moved along two axes along the surface of the desk, and not, for example, rotated, folded, or moved towards or away from the human viewing it.
    latency
    A delay of some kind in a system or communication link, usually repeatable and endemic.
    lathe
    A reference to the woodworking tool of the same name, where an object is rotated underneath a finely controlled scoring device.
    map
    To establish a mathematical correlation between sets of data, generally used in computer graphics to refer to correlating every point in a polygon being rendered to a corresponding point in a texture
    megapixel
    One million discrete, individually-controllable positions on a graphics screen. Possibly first used by NeXT to describe the display shipped with its first computer. Such a display would typically have minimal dimensions of 1152 x 900 pixels. A two-megapixel display might be 1600 by 1200 pixels in size.
    metaphor
    A particular way of conceptualizing a human/computer interface which is considered analogous to some allegedly more intuitive system outside the realm of computing.
    paradigm
    A particularly clear example, archetype, or framework upon which a human/computer interface can be based.
    ray-casting
    A selection technique where a ray is mathematically projected from the user's idealized viewpoint in real-world space, through the current cursor position on screen, and into the virtual world, where any object in its path is a candidate for selection. Typically the nearest object is chosen.
    resolution, visual display
    The ratio of pixels in a certain linear measure to the measure itself, i.e., 100 pixels per inch, or 40 pixels per centimeter, etc. Higher ratios are essential to viewing crisp, clear curves. See artifacts
    six-axis controller
    A hand-held control device giving six axes of control, such as vertical, horizontal, longitudinal, yaw, pitch, and roll. For contrast, a joystick provides two or three axes of control, a mouse but two, and a paddle-control only one.
    slashdot effect, slashdotted
    Refers to the huge, often debilitating, rush of Internet traffic to a website as a result of its being mentioned on Slashdot or some other popular site.
    spatial, three-dimensional
    Having three usable and/or perceivable dimensions such as width, height, and depth. In a three-dimensional human/computer interface, simulated objects can often be manipulated in ways not possible under the desktop metaphor, e.g. they can be rotated, stacked, brought closer for a detailed inspection, etc.
    texture mapping
    The technique whereby an image can be applied to the surface of a three-dimensional virtual object, acting perhaps as a decal or wallpaper upon the side of an object. In some cases a texture may be animated, or be interactive and respond to users in the environment.
    transparent
    In computing, this refers to something being unnoticeable, particularly in reference to hiding a complex system acting in some role as an intermediary, such as the network (which is transparent to the X window system), or a software layer which succeeds in completely hiding the fact it's an adapter layer by exposing all the expected functionality of the underlying system (such as a transparent encryption layer for a filesystem).
    virtual
    Not real. A virtual object does not exist in corporeal, physical form, but is merely suggested by data within a computer.
    zone
    A hierarchy of zyses, typically existing within a server program called a zone server, or as a caching mechanism within some Z agent programs, most notably a user's 3D interface agent.
    zys
    An object in the Z system representing a frame of reference, as well as the applicable access rights, event mask, and representation objects. Nestable.
  10. bibliography

    alto01
    XEROX (Lampson, Butler, and Taft, Ed), ALTO USER'S HANDBOOK, p 14, section 6.6, Access via Chat, (September 1979)
    cro01
    Smith, D. A., Raab, A., Reed, D., Kay, A., Croquet: The User Manual, Draft revision 0.1, (October 2002)
    eng92
    Engelbart, Douglas C., Toward High-Performance Organizations: A Strategic Role for Groupware, AUGMENT,132811, (June 1992)
    etc01
    Hausen, Derrick. Ergonomic Efficiency Testing Two-Handed vs. One-Handed CAD Working Styles (2003)
    fof01
    Volti, Rudi. “Jacquard” Loom. The Facts On File Encyclopedia of Science, Technology, and Society. New York: Facts On File, Inc. (1999)
    lagjm
    Leach, G., Al-Qaimari, G., Grieve, M., Jinks, N., McKay, C. Elements of a Three-dimensional Graphical User Interface, Department of Computer Science, Royal Melbourne Institute of Technology Melbourne VIC 3000 Australia (1997)
    manley
    England D, Manley D, 3D Spatialised Audio in Data Landscapes, in proceedings of the 7th UK VR-SIG Conference, UKVRSIG, September. (2000)
    nasa01
    Space Station Reference Coordinate Systems, NASA International Space Station Program, Revision F, 26 October 2001
    nasa02
    Ibid. p 4-8 fig 4.0-7
    mup01
    Phelps, A., Sonstein, J., Joy, D., and Stratton, W., MUPPETS: Multi-User Programming Pedagogy for Enhancing Traditional Study, grant proposal. (2001)
    parc01
    Card, Stuart K., Robertson, George G., and York, William. The WebBook and the Web Forager: An Information Workspace for the World-Wide Web, in CHI 96 Electronic Proceedings, ACM. (1996)
    swi01
    Williams, Sam. Free as in Freedom: Richard Stallman's Crusade for Free Software, p 61. (2003)
    tia01
    O'Brien, Tia. The Mouse (1999)
    zhai01
    Zhai, Shumin. Human Performance in Six Degree of Freedom Input Control xii-xiii, Ph.D. Thesis, University of Toronto, Department of Industrial Engineering (1995)