HUMAN-COMPUTER INTERACTION
SECOND EDITION
Recordings of users' speech can also be very useful, especially in collaborative applications; for example, many readers will have used voice-mail systems. Recordings can also be attached to other artefacts as audio annotations, in order to communicate with others or to remind oneself at a later time. For example, audio annotations can be attached to Microsoft Word documents.
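As a rough sketch only (the class names and file names below are invented for illustration, not taken from any particular product), an audio annotation can be modelled as a recorded clip attached to a position in a document:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AudioAnnotation:
    """A recorded voice note attached to a position in a document."""
    audio_file: str                      # path to the recorded clip, e.g. "note1.wav"
    author: str
    created: datetime = field(default_factory=datetime.now)

@dataclass
class Document:
    text: str
    # annotations keyed by character offset in the text
    annotations: dict = field(default_factory=dict)

    def annotate(self, offset: int, note: AudioAnnotation) -> None:
        self.annotations.setdefault(offset, []).append(note)

doc = Document("Draft agenda for Monday's meeting ...")
doc.annotate(6, AudioAnnotation("agenda_comment.wav", author="alison"))
for offset, notes in doc.annotations.items():
    print(offset, [n.audio_file for n in notes])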
Non-speech sound has traditionally been used in the interface to provide warnings and alarms, or status information. For example, there is experimental evidence to suggest that the addition of audio confirmation of modes, in the form of changes in key clicks, reduces errors [161]. Video games offer further evidence, since experts tend to score lower when the sound is turned off than when it is on; they pick up vital clues and information from the sound while concentrating their visual attention on other things. Dual-mode displays are, in general, thought to be better since the presentation of similar information along different channels allows the brain to search along two paths, with the faster path finishing first and therefore minimizing response time. The presentation of redundant information in this way may improve a user's performance since, for example, he may be able to remember the sound associated with a particular icon but not its visual representation. Ambiguity in one mode can also be resolved by using the information presented in the other. One example is a speech recognition system that also uses a camera to capture the lip movements of the speaker: indistinct words or phrases can be recognized more accurately by analyzing the visual information as well as the sound.
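As an illustration of audio confirmation of modes through key clicks (a sketch only: the editor modes, sound files and the play function are placeholders, not a real audio API):

# Distinct key-click sounds for each editor mode, so a mode change is
# confirmed audibly as well as visually.

CLICKS = {
    "insert":    "click_soft.wav",    # hypothetical sound files
    "overwrite": "click_sharp.wav",
    "command":   "click_dull.wav",
}

def play(sound_file: str) -> None:
    """Placeholder: hand the clip to whatever audio API the platform offers."""
    print(f"[audio] {sound_file}")

class Editor:
    def __init__(self) -> None:
        self.mode = "insert"

    def set_mode(self, mode: str) -> None:
        self.mode = mode

    def key_pressed(self, key: str) -> None:
        # The click accompanying every keystroke depends on the mode,
        # giving continuous, low-effort confirmation of which mode is active.
        play(CLICKS[self.mode])
        # ... insert or overwrite `key` in the document here ...

ed = Editor()
ed.key_pressed("a")         # soft click: insert mode
ed.set_mode("overwrite")
ed.key_pressed("b")         # sharper click warns that text will be overwritten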
We have previously discussed the role of speech in the interface, but non-speech sounds offer a number of inherent advantages. Speech is serial and we have to listen to most of a sentence before we can extract the meaning; since many words make up a message, this can take a relatively long time. Non-speech sounds, on the other hand, can be associated with a particular action and assimilated much more quickly. Non-speech sounds can also be universal: in much the same way that a visual icon can be recognized regardless of the viewer's language, a well-chosen sound conveys its meaning without depending on any particular natural language.
Soundtrack is an early example of a word processor with an auditory interface, designed for visually disabled users [76]. The visual items in the display have been given auditory analogs, made up of tones, with synthesized speech also being used. Soundtrack's main screen is a grid of four columns and two rows (see Figure 15.3); each cell makes a different tone when the cursor enters it, and by using these tones the user can navigate around the system. The tones increase in pitch from left to right, while the two rows have different timbres. Clicking on a cell makes it speak its name, giving precise information that can reorient a user who is lost or confused. Double clicking on a cell reveals a submenu of items associated with the main screen item. Items in the submenu also have tones; moving down the menu causes the tone to fall whilst moving up makes it rise. A single click causes the cell to speak its name, as before, whilst double clicking executes the associated action. Soundtrack allows text entry by speaking the words or characters as they are entered, with the user having control over the degree of feedback provided. It was found that users tended to count the different tones in order to locate their position on the screen, rather than identifying each tone directly.
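A minimal sketch of this style of auditory navigation is given below; the cell names, pitches, and the tone and speech functions are stand-ins for a real audio and speech-synthesis API, not Soundtrack's own code:

# A 4x2 grid in which each cell has a tone (pitch rises left to right,
# each row has its own register standing in for a different timbre) and
# can speak its name on request.

CELL_NAMES = [
    ["File", "Edit", "Sound", "Format"],                         # hypothetical menu names
    ["Document 1", "Document 2", "Document 3", "Document 4"],
]
BASE_PITCH_HZ = [440, 220]      # a different base pitch per row
PITCH_STEP_HZ = 55              # pitch rises as the cursor moves right

def play_tone(freq_hz: float) -> None:
    print(f"[tone] {freq_hz:.0f} Hz")

def speak(text: str) -> None:
    print(f"[speech] {text}")

def cursor_entered(row: int, col: int) -> None:
    # Each cell sounds its own tone, so the user can track position by ear.
    play_tone(BASE_PITCH_HZ[row] + col * PITCH_STEP_HZ)

def single_click(row: int, col: int) -> None:
    # A click speaks the cell's name, reorienting a lost user precisely.
    speak(CELL_NAMES[row][col])

cursor_entered(0, 0)
cursor_entered(0, 1)
cursor_entered(1, 3)
single_click(1, 3)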
These problems are reminiscent of those already discussed in speech recognition, and indeed the recognition problem is not dissimilar. The equivalent of co-articulation is also prevalent in handwriting, since different letters are written differently according to the preceding and succeeding ones. This causes problems for recognition systems, which work by trying to identify the lines that contain text and then segmenting the digitized image into separate characters. This is so difficult to achieve reliably that there are no systems in use today that are good at general cursive script recognition. However, when letters are written individually, with a small separation, the success of systems becomes more respectable, although they have to be trained to recognize the characteristics of the different users; if used by a person for whom they have not been trained, success is limited again. Many of the solutions being attempted in speech recognition are also being tried in handwriting recognition systems, such as whole-word recognition, the use of context to disambiguate characters, and neural networks, which learn by example.
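As a sketch of how context can disambiguate characters (the candidate letters and the word list are invented; a real recognizer would also rank candidates by confidence rather than simply listing them):

from itertools import product
from typing import Optional

# Suppose a character-level recognizer returns, for each written letter, a
# small set of candidate characters. Context (here a word list) picks the
# combination that forms a known word, disambiguating look-alike letters.

DICTIONARY = {"clip", "chip", "ship", "slip"}      # assumed vocabulary

def disambiguate(candidates) -> Optional[str]:
    """Return the first candidate spelling that is a dictionary word."""
    for letters in product(*candidates):
        word = "".join(letters)
        if word in DICTIONARY:
            return word
    return None

# 'c' and 'e', 'l' and 'h' are easily confused in handwriting.
per_letter = [["c", "e"], ["l", "h"], ["i"], ["p"]]
print(disambiguate(per_letter))    # -> 'clip'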
In all of these cases, the emphasis on ubiquity is clearly seen in the capture and integration phases. Electronic capture is moved away from traditional devices like the keyboard and brought closer to the user in the form of pen-based interfaces or
For users with speech and hearing impairments, multimedia systems provide a number of tools for communication, including synthetic speech and text-based communication and conferencing systems (see Chapter 13). Textual communication
Finally, users with learning disabilities such as dyslexia can find textual information difficult. In severe cases, speech input and output can remove the need to read and write, allowing more accurate interaction. In less severe cases, spelling correction facilities can help users. However, these need to be designed carefully: conventional spelling correction programs are often useless for dyslexic users, since they do not recognize their idiosyncratic word construction methods. As well as simple transpositions of characters, dyslexic users may spell phonetically, and correction programs must be able to deal with both kinds of error.
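The sketch below illustrates both kinds of correction; the word list and the phonetic key are deliberately crude and purely illustrative, not a description of any real corrector:

# A corrector that handles two error types mentioned above: adjacent-letter
# transpositions ("freind") and phonetic spellings ("fone").

WORDS = {"friend", "phone", "physics", "because"}            # assumed lexicon

def phonetic_key(word: str) -> str:
    """Very rough sound-alike key: normalize common letter/sound pairs."""
    w = word.lower()
    for src, dst in (("ph", "f"), ("ck", "k"), ("gh", "g")):
        w = w.replace(src, dst)
    # drop vowels after the first letter so "fone" and "phone" collapse together
    return w[0] + "".join(c for c in w[1:] if c not in "aeiou")

PHONETIC_INDEX = {}
for w in WORDS:
    PHONETIC_INDEX.setdefault(phonetic_key(w), set()).add(w)

def suggest(misspelt: str) -> set:
    suggestions = set()
    # 1. undo single adjacent transpositions: "freind" -> "friend"
    for i in range(len(misspelt) - 1):
        swapped = misspelt[:i] + misspelt[i+1] + misspelt[i] + misspelt[i+2:]
        if swapped in WORDS:
            suggestions.add(swapped)
    # 2. phonetic match: "fone" -> "phone"
    suggestions |= PHONETIC_INDEX.get(phonetic_key(misspelt), set())
    return suggestions

print(suggest("freind"))    # {'friend'}
print(suggest("fone"))      # {'phone'}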
One common approach is to convert the discrete structure into some measure of similarity. For a hypertext network this might be the number of links that need to be traversed between two nodes; for free text the similarity of two documents may be measured, for example, by the proportion of words they have in common.
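Both measures can be sketched in a few lines; the small network and example texts below are invented for illustration:

from collections import deque

LINKS = {                        # node -> nodes it links to
    "home":     {"intro", "glossary"},
    "intro":    {"glossary", "chapter1"},
    "chapter1": {"glossary"},
    "glossary": set(),
}

def link_distance(start, goal):
    """Fewest links to traverse from start to goal (breadth-first search)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in LINKS.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None                  # unreachable

def word_overlap(a, b):
    """Proportion of distinct words the two texts share (Jaccard measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

print(link_distance("home", "chapter1"))                                 # 2
print(word_overlap("sound in the interface", "speech in the interface"))  # 0.6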
There are many different ways of traversing the network, and so there are many different ways of reading a hypertext document - the intention is that the user is able to read it in the way that suits him best. Links can exist at the end of pages, with the user choosing which one to follow, or can be embedded within the document itself. For example, in an on-line manual, all the technical words may be linked directly to their definitions in the glossary. Simply clicking on an unknown word takes the user to the relevant place in the glossary. Another unknown word encountered there can also be traced back to its definition and then the user can easily return to his original place in the manual. The positions of these links are known as hot-spots since they respond to mouse clicks. Hot-spots can also be embedded within diagrams, pictures or maps, allowing the user to focus his attention on aspects that interest him.
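A minimal model of such embedded links, with a history stack so the user can return to the original place, might look like the following (the page names and link targets are invented for the example):

PAGES = {
    "manual:setup": "Connect the modem to the serial port ...",
    "glossary:modem": "A modem converts digital data to audio signals ...",
    "glossary:serial port": "A serial port transfers data one bit at a time.",
}
HOT_SPOTS = {                      # (page, word clicked) -> page it links to
    ("manual:setup", "modem"): "glossary:modem",
    ("glossary:modem", "serial port"): "glossary:serial port",
}

class Browser:
    def __init__(self, start: str) -> None:
        self.current = start
        self.history = []                        # pages we can go back to

    def click(self, word: str) -> None:
        target = HOT_SPOTS.get((self.current, word))
        if target:
            self.history.append(self.current)    # remember where we were
            self.current = target

    def back(self) -> None:
        if self.history:
            self.current = self.history.pop()

b = Browser("manual:setup")
b.click("modem")            # jump to the glossary definition
b.click("serial port")      # follow another unknown term from there
b.back()
b.back()                    # return, step by step, to the original place
print(b.current)            # manual:setup
print(PAGES[b.current])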