HUMAN-COMPUTER INTERACTION
SECOND EDITION
Speech recognition is a promising area of text entry, but it has been promising for a number of years without actually delivering usable systems! It is forecast that the market for a successful system runs into billions of pounds and therefore a lot of development work is being put into this area. Indeed, practical systems are beginning to be delivered commercially so a major growth in this area may occur in coming years. There is a natural enthusiasm for being able to talk to the machine and have it respond to commands, since this form of interaction is one with which we are very familiar. Successful recognition rates of over 97% have been reported, but since this represents a letter in error in approximately every 30, or one spelling mistake every six or so words, this is stoll unacceptible (sic)! Note also that this performance is usually quoted only for a restricted vocabulary of command words. Trying to extend such systems to the level of understanding natural language, with its inherent vagueness, imprecision and pauses, opens up many more problems that have not been satisfactorily solved even for keyboard-entered natural language. Moreover, since every person speaks differently, the system has to be trained and tuned to each new speaker, or its performance decreases. Strong accents, a cold or emotion can also cause recognition problems, as can background noise. This leads us on to the question of practicality within an office environment: not only may the background level of noise cause errors, but if everyone in an open-plan office were to talk to their machine, the level of noise would dramatically increase, with associated difficulties. Confidentiality would also be harder to maintain.
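The arithmetic behind the 97% figure can be checked with a short sketch. It assumes that the quoted rate is a per-letter accuracy, that errors are spread evenly, and that an average word has about five letters; the function name and the five-letter figure are illustrative assumptions, not taken from the text.

```python
def error_spacing(accuracy, letters_per_word=5.0):
    """Return (letters per error, words per error) for a per-letter accuracy.

    letters_per_word is an assumed average word length, not a measured one.
    """
    error_rate = 1.0 - accuracy
    letters_per_error = 1.0 / error_rate
    words_per_error = letters_per_error / letters_per_word
    return letters_per_error, words_per_error


letters, words = error_spacing(0.97)
print(f"about one letter in error every {letters:.0f} letters")
print(f"about one misspelt word every {words:.1f} words")
```

Under these assumptions a 97% rate gives an error roughly every 33 letters, or one misspelt word in every six to seven, which is in line with the rounded figures quoted above.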
All digitizing tablets are capable of high resolution, and are available in a range of sizes from A5 to 60 × 60 in (1.52 × 1.52 m). Their sampling rate can vary between 50 and 200 Hz, affecting the resolution of cursor movement, which gets progressively finer as the sampling rate increases. The digitizing tablet can be used to detect relative motion or absolute motion, but is an indirect device since there is a mapping from the plane of operation of the tablet to the screen. It can also be used for text input; if supported by character recognition software, handwriting can be interpreted. Problems with digitizing tablets are that they require a large amount of desk space, and may be awkward to use if displaced to one side by the keyboard.
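The difference between absolute and relative operation, and the mapping from the tablet plane to the screen, might be sketched as follows. The function names, the gain parameter and the sizes used are hypothetical and chosen only for illustration; a real tablet driver would be considerably more involved.

```python
def absolute_to_screen(tx, ty, tablet_size, screen_size):
    """Absolute mode: scale a tablet position onto the screen, axis by axis."""
    tablet_w, tablet_h = tablet_size
    screen_w, screen_h = screen_size
    x = round(tx / tablet_w * (screen_w - 1))
    y = round(ty / tablet_h * (screen_h - 1))
    return x, y


def relative_to_screen(cursor, delta, gain, screen_size):
    """Relative mode: move the cursor by a scaled displacement, clamped to the screen."""
    screen_w, screen_h = screen_size
    x = min(max(cursor[0] + round(delta[0] * gain), 0), screen_w - 1)
    y = min(max(cursor[1] + round(delta[1] * gain), 0), screen_h - 1)
    return x, y


# The far corner of a 200 x 100 unit tablet maps to the far corner of the screen.
print(absolute_to_screen(200, 100, (200, 100), (1024, 768)))
# A small movement of the stylus shifts the cursor from wherever it currently is.
print(relative_to_screen((10, 10), (-5, 2), 4, (1024, 768)))
```

In absolute mode every point on the tablet corresponds to one point on the screen, whereas in relative mode only the displacement matters, as with a mouse.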
The dataglove has the advantage that it is very easy to use, and is potentially very powerful and expressive (it can provide 10 joint angles, plus the 3D spatial information and degree of wrist rotation, 50 times a second). Its drawbacks are extreme expense and the fact that it is difficult to use in conjunction with a keyboard. However, to see the latter as a fundamental limitation is shortsighted; one can imagine a keyboard drawn onto a desk, with software detecting hand positions and interpreting whether the virtual keys had been hit or not. The potential for the dataglove is vast; gesture recognition and sign language interpretation are two obvious areas that are the focus of active research, whilst less obvious applications are evolving all the time.
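To give a feel for the data involved, the sketch below models one glove sample and a very simple test for the imagined desk-drawn keyboard. Everything named here is hypothetical: it assumes the 3D spatial information is three coordinates and the wrist rotation a single value, that a fingertip position has already been derived from the glove data, and that keys are rectangles on the desk surface.

```python
from dataclasses import dataclass

SAMPLES_PER_SECOND = 50  # rate quoted in the text


@dataclass
class GloveSample:
    joint_angles: tuple    # 10 joint angles, in degrees
    position: tuple        # (x, y, z) of the hand, in metres (assumed units)
    wrist_rotation: float  # degrees


def values_per_second():
    """Numbers delivered per second: 10 angles + 3 coordinates + 1 rotation per sample."""
    return (10 + 3 + 1) * SAMPLES_PER_SECOND


# Hypothetical key rectangles on the desk: label -> (x0, y0, x1, y1) in metres.
KEYS = {"A": (0.00, 0.00, 0.02, 0.02), "S": (0.02, 0.00, 0.04, 0.02)}


def key_hit(fingertip, keys=KEYS, touch_height=0.005):
    """Return the key under a fingertip that is close enough to the desk, else None."""
    x, y, z = fingertip
    if z > touch_height:
        return None  # finger is still hovering above the desk
    for label, (x0, y0, x1, y1) in keys.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return label
    return None


sample = GloveSample(joint_angles=(0.0,) * 10, position=(0.01, 0.01, 0.0), wrist_rotation=0.0)
print(values_per_second(), key_hit(sample.position))
```

Deriving reliable fingertip positions from joint angles is the hard part and is deliberately left out of this sketch.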
Printers take electronic documents and put them on paper -- scanners reverse this process. They start by turning the image into a bitmap, but with the aid of optical character recognition can convert the page right back into text. The image to be converted may be printed, but may also be a photograph or hand-drawn picture.
Another application area is in document storage and retrieval systems, where paper documents are scanned and stored on computer rather than (or sometimes as well as) in a filing cabinet. The costs of maintaining paper records are enormous,
Optical character recognition (OCR) is the process whereby the computer can 'read' the characters on the page. It is only comparatively recently that print could be reliably read, since the wide variety of typefaces and print sizes makes this more difficult than one would imagine -- it is not simply a matter of matching a character shape to the image on the page. In fact, OCR is rather a misnomer nowadays as, although the document is optically scanned, the OCR software itself operates on the bitmap image. Current software can recognize 'unseen' fonts and can even produce output in word-processing formats preserving super- and subscripts, centring, italics and so on.
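A toy example suggests why simple shape matching is not enough. The sketch below is not how OCR software works; it is a deliberately naive matcher, with made-up 3 x 3 templates, that picks whichever stored shape differs from the scanned glyph in the fewest pixels.

```python
# Hypothetical 3 x 3 bitmap templates; "1" is ink, "0" is paper.
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def pixel_distance(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify(glyph, templates=TEMPLATES):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda label: pixel_distance(glyph, templates[label]))


noisy_t = ("111", "010", "011")    # a 'T' with one stray pixel
shifted_i = ("100", "100", "100")  # an 'I' moved one pixel to the left
print(classify(noisy_t), classify(shifted_i))
```

The matcher copes with a stray pixel, but the shifted 'I' lies closer to the 'L' template than to the 'I' template and is misread. Differences in typeface and print size disturb the bitmap far more than a one-pixel shift.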
(iii) Real keyboard -- you can't word process without a reasonable keyboard and stylus handwriting recognition is not good enough.
Some whole new application areas have become possible because of advances in memory and processing. For example, most multimedia applications, such as voice recognition and the on-line capture and storage of video and audio, require enormous amounts of processing and/or memory. In particular, large optical storage devices have been the key to electronic document storage whereby all paper documents are scanned and stored within a computer system. In some contexts such systems have completely replaced paper-based filing cabinets.
In a menu-driven interface, the set of options available to the user is displayed on the screen, and selected using the mouse, or numeric or alphabetic keys. Since the options are visible they are less demanding of the user, relying on recognition rather than recall. However, menu options still need to be meaningful and logically grouped to aid recognition. Often menus are hierarchically ordered and the option required is not available at the top layer of the hierarchy. The grouping and naming of menu options then provides the only cue for the user to find the required option. Such systems either can be purely text based, with the menu options being presented as numbered choices (see Figure 3.8), or may have a graphical component in which the menu appears within a rectangular box and choices are made, perhaps by typing the initial letter of the desired selection, or by entering the associated number, or by moving around the menu with the arrow keys. This is a restricted form of a full WIMP system, described in more detail shortly.
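A purely text-based, hierarchical menu of this kind might be sketched as follows. The menu contents and function names are invented for illustration; selection is by typing either the option's number or its initial letter.

```python
# Hypothetical menu: a nested dictionary is a submenu, a string is a command name.
MENU = {
    "File": {"Open": "open_file", "Save": "save_file"},
    "Edit": {"Cut": "cut", "Paste": "paste"},
    "Quit": "quit",
}


def render(menu):
    """Return the options as numbered lines, so the user recognizes rather than recalls."""
    return "\n".join(f"{i}. {label}" for i, label in enumerate(menu, start=1))


def choose(menu, key):
    """Resolve a typed number or initial letter to (label, entry); None if unrecognized."""
    labels = list(menu)
    key = key.strip()
    if key.isdigit():
        index = int(key) - 1
        if 0 <= index < len(labels):
            return labels[index], menu[labels[index]]
        return None
    matches = [label for label in labels
               if key and label.lower().startswith(key.lower())]
    if len(matches) == 1:  # an ambiguous initial letter selects nothing
        return matches[0], menu[matches[0]]
    return None


print(render(MENU))
label, submenu = choose(MENU, "1")     # descend into the File submenu
print(render(submenu))
print(choose(submenu, "s"))            # select Save by its initial letter
```

Because "Save" is not visible at the top level, the user must guess that it lives under "File", which is why the grouping and naming of options matter so much.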