9. Evaluation techniques


In groups or pairs, use the cognitive walkthrough example, and what you know about user psychology (see Chapter 1), to discuss the design of a computer application of your choice (for example, a word processor or a drawing package). (Hint: Focus your discussion on one or two specific tasks within the application.)


This exercise is intended to give you a feel for using the technique of cognitive walkthrough (CW). CW is described in detail in Chapter 9 and the same format can be used here. It is important to focus on a task that is not too trivial, for example creating a style in a word processing package. Also assume a user who is familiar with the notion of styles, and with applications on the same platform (e.g. Mac, PC or UNIX), but not with the particular word processing package. Attention should be given to instances where the interface fails to support the user in resolving the goal, and to those where it presents false avenues.



What are the benefits and problems of using video in experimentation? If you have access to a video recorder, attempt to transcribe a piece of action and conversation (it does not have to be an experiment - a soap opera will do!). What problems did you encounter?


The benefits of video include: an accurate, realistic representation of task performance, especially where more than one camera is used; and a permanent record of the observed behaviour.

The disadvantages include: the vast amounts of data, which are difficult to analyse effectively; the time needed for transcription; obtrusiveness; and the special equipment required.

By carrying out this exercise, you will experience some of the difficulties of representing a visual record in a semi-formal written format. If you are working in a group, discuss which parts of the video are most difficult to represent, and how important these parts are to understanding the clip.



In Section 9.4.2 (An example: evaluating icon designs), we saw that the observed results could be the result of interference. Can you think of alternative designs that may make this less likely? Remember that individual variation was very high, so you must retain a within-subjects design, but you may perform more tests on each participant.


Three possible ways of reducing interference are:

  • During the initial training period, swap back and forth between learning the two sets of icons, with the aim of getting the subjects used to swapping between the two sets of remembered icons. However, this design could be argued to suffer the same flaws as the original: if the abstract icons had been taught in isolation, perhaps they would have fared far better.
  • We could invent a third set of 'random' icons (call them R). We could then interpose them in the experiment, that is, present the icons in the orders RARN and RNRA. The intention is to swamp any transfer effect in the 'noise' of the random icons. It could be argued that our experiment then measures the robustness of the icon sets to such 'noise'!
  • We could give the subjects multiple presentations, for example ANAN and NANA presentation orders. This would not remove transfer effects, but it would give us some way to quantify them. Imagine that in the ANAN group the second presentation of the abstract icons was significantly worse than the first, but there was not a similar effect for natural icons in the NANA group. This would give us both positive evidence of a transfer effect, and perhaps some quantitative measure. However, even going from this additional evidence to a strong conclusion will be difficult.

Notice that all the above measures require additional subject time and one has to constantly weigh up the advantages of richer experiments against those of larger subject groups.
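The quantification step in the third suggestion can be sketched in code. This is a minimal illustration with entirely invented accuracy scores: for each group, compare the mean score on the first and second presentations of the same icon set. A large drop for the ANAN group but not the NANA group is exactly the positive evidence of a transfer effect described above.

```python
from statistics import mean

def transfer_effect(scores):
    """Mean drop in accuracy between the first and second presentation
    of the same icon set (positions 0 and 2 in each subject's record).
    A positive value suggests interference from the interposed set.

    `scores` maps each subject to four presentation scores in order,
    e.g. for the ANAN group the order is [A1, N1, A2, N2]."""
    first = mean(s[0] for s in scores.values())
    second = mean(s[2] for s in scores.values())
    return first - second

# Invented data: ANAN group (abstract icons at positions 0 and 2).
anan = {
    "s1": [0.80, 0.70, 0.60, 0.68],
    "s2": [0.75, 0.72, 0.55, 0.70],
}
# Invented data: NANA group (natural icons at positions 0 and 2).
nana = {
    "s3": [0.90, 0.60, 0.88, 0.58],
    "s4": [0.85, 0.65, 0.86, 0.55],
}

drop_abstract = transfer_effect(anan)  # large drop: evidence of transfer
drop_natural = transfer_effect(nana)   # negligible drop
```

Even with such a measure in hand, the caution above stands: the comparison quantifies the effect but does not by itself license a strong conclusion.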



Choose an appropriate evaluation method for each of the following situations. In each case identify:

(i) The participants.
(ii) The technique used.
(iii) Representative tasks to be examined.
(iv) Measurements that would be appropriate.
(v) An outline plan for carrying out the evaluation.

(a) You are at an early stage in the design of a spreadsheet package and you wish to test what type of icons will be easiest to learn.
(b) You have a prototype for a theatre booking system to be used by potential theatre-goers to reduce queues at the box office.
(c) You have designed and implemented a new game system and want to evaluate it before release.
(d) You have developed a group decision support system for a solicitor's office.
(e) You have been asked to develop a system to store and manage student exam results and would like to test two different designs prior to implementation or prototyping.


Note that these answers are illustrative; there are many possible evaluation techniques that could be appropriate to the scenarios described.

Spreadsheet package

(i) Subjects Typical users: secretaries, academics, students, accountants, home users, schoolchildren
(ii) Technique Heuristic evaluation
(iii) Representative tasks Sorting data, printing spreadsheet, formatting cells, adding functions, producing graphs
(iv) Measurements Speed of recognition, accuracy of recognition, user-perceived clarity
(v) Outline plan Test the subjects with examples of each icon in various styles, noting responses.
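As a sketch of how the raw observations from this outline plan might be reduced to the measurements in (iv), the following is illustrative only (the trial figures are invented): each trial is logged as (icon style, response time, correct) and then summarised per style.

```python
from statistics import mean

def summarise(trials):
    """Aggregate recognition trials into the measures listed in (iv):
    mean response time (speed of recognition) and proportion correct
    (accuracy of recognition), grouped by icon style."""
    by_style = {}
    for style, rt, correct in trials:
        by_style.setdefault(style, []).append((rt, correct))
    return {
        style: {
            "mean_rt": mean(rt for rt, _ in data),
            "accuracy": sum(c for _, c in data) / len(data),
        }
        for style, data in by_style.items()
    }

# Invented trial log: (icon style, response time in seconds, correct?)
trials = [
    ("abstract", 1.9, True), ("abstract", 2.3, False),
    ("concrete", 1.1, True), ("concrete", 1.3, True),
]
results = summarise(trials)
```

User-perceived clarity, the remaining measure, is subjective and would be collected separately, for example on a rating scale after the trials.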

Theatre booking system

(i) Subjects Theatre-goers, the general public
(ii) Technique Think aloud
(iii) Representative tasks Finding next available tickets for a show, selecting seats, changing seats, changing date of booking
(iv) Measurements Qualitative measures of users' comfort with system, measures of cognitive complexity, quantitative measures of time taken to perform task, errors made
(v) Outline plan Present users with prototype system and tasks, record their observations whilst carrying out the tasks and refine results into categories identified in (iv).

New game system

(i) Subjects The game's target audience: age, sex, typical profile should be determined for the game in advance and the test users should be selected from this population, plus a few from outside to see if it has wider appeal
(ii) Technique Think aloud
(iii) Representative tasks Whatever gameplay tasks there are - character movement, problem solving, etc.
(iv) Measurements Speed of response, scores achieved, extent of game mastered.
(v) Outline plan Allow subjects to play game and talk as they do so. Collect qualitative and quantitative evidence, follow up with questionnaire to assess satisfaction with gaming experience, etc.

Group decision support system

(i) Subjects Solicitors, legal assistants, possibly clients
(ii) Technique Cognitive walkthrough
(iii) Representative tasks Anything requiring shared decision making: compensation claims, plea bargaining, complex issues with a diverse range of expertise needed.
(iv) Measurements Accuracy of information presented and accessible, veracity of audit trail of discussion, screen clutter and confusion, confusion owing to turn-taking protocols
(v) Outline plan Evaluate by having experts walk through the system performing tasks, commenting as necessary.

Exam result management

(i) Subjects Exams officer, secretaries, academics
(ii) Technique Think aloud, questionnaires
(iii) Representative tasks Storing marks, altering marks, deleting marks, collating information, security protection
(iv) Measurements Ease of use, levels of security and error correction provided, accuracy of use
(v) Outline plan Users perform tasks set, with running verbal commentary on immediate thoughts and considered views gained by questionnaire at end.



9.4 Complete the cognitive walkthrough example for the video remote control design.


Continue to ask the four questions for each Action in the sequence. Work out what the user will do and how the system will respond. If you can analyse B and C, you will find that Actions D to I are similar.

Hint: Remember that there is no universal format for dates.

Action J: Think about the first question. Will the user even know they need to press the transmit button? Isn't it likely that the user will reach closure after Action I?
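The hint about date formats can be made concrete. This is a purely illustrative sketch of the ambiguity a walkthrough should catch: the same keystrokes yield two different dates depending on which convention the user assumes the device follows.

```python
from datetime import datetime

# The same entered string is a different date under the US (month/day)
# and UK (day/month) conventions.
entered = "05/06/04"

us_reading = datetime.strptime(entered, "%m/%d/%y")  # 6 May 2004
uk_reading = datetime.strptime(entered, "%d/%m/%y")  # 5 June 2004
```

If the interface gives no indication of the expected format, the walkthrough should flag the date-entry Action as one where the user cannot tell whether the correct action is available or whether their action had the intended effect.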



9.5 In defining an experimental study, describe
(a) how you as an experimenter would formulate the hypothesis to be supported or refuted by your study
(b) how you would decide between a within-groups or between-groups experimental design with your subjects

answer available for tutors only


(a) Formulating the hypothesis involves:

  • Determining the independent variables, that is, the variables that can be controlled by the experimenter. These determine the number of experimental conditions, based on the number of different levels of each independent variable to be tested.
  • Determining the dependent variables, that is, the phenomena that can be measured for subjects in the various experimental conditions.
  • Phrasing the hypothesis of the experiment in terms of an expected relationship between the independent and dependent variables.

(b) Deciding on the experimental design, in terms of within-groups or between-groups design, depends on the kinds of subjects you will use, how many resources are available for experimentation and the problems associated with learning effects. A within-groups design will require fewer subjects (and therefore be cheaper in terms of cost and time) but may exhibit bad learning effects if the experiment is not carefully designed. Students should demonstrate that they know the difference between within- and between-groups design. The former has each subject tested under all experimental conditions. Between-groups has each subject tested under only one condition.
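The structural difference between the two designs can be sketched in code. This is a minimal illustration (the subject and condition names are invented): a within-groups assignment gives every subject every condition, with presentation order rotated across subjects to counterbalance learning effects, while a between-groups assignment gives each subject exactly one condition.

```python
def within_groups(subjects, conditions):
    """Within-groups: every subject is tested under all conditions,
    with order rotated across subjects to counterbalance learning
    effects (a simple rotation, not a full Latin square)."""
    n = len(conditions)
    return {
        s: [conditions[(i + j) % n] for j in range(n)]
        for i, s in enumerate(subjects)
    }

def between_groups(subjects, conditions):
    """Between-groups: each subject is tested under one condition
    only, assigned round-robin here for simplicity (in practice the
    assignment would normally be randomized)."""
    return {s: conditions[i % len(conditions)]
            for i, s in enumerate(subjects)}

subjects = ["s1", "s2", "s3", "s4"]
conditions = ["colour", "monochrome"]

within = within_groups(subjects, conditions)
between = between_groups(subjects, conditions)
```

Note how the within-groups plan needs half as many subjects for the same number of observations per condition, at the cost of possible transfer between conditions.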



9.6 What are the factors governing the choice of an appropriate evaluation method for different interactive systems? Give brief details.

answer available for tutors only

Any of the following may be included:

The stage in the cycle at which the evaluation is carried out: Evaluation of a design seeks to provide information to feed the development of the physical artefact; it tends to involve design experts only and to be analytic. Evaluation of an implementation evaluates the artefact itself, is more likely to bring in users as subjects, and is experimental.

The style of evaluation: Laboratory studies allow controlled experimentation and observation but lose something of the naturalness of the user's environment. Field studies retain the latter but do not allow control over user activity.

The level of subjectivity or objectivity of the technique: More subjective techniques, e.g. cognitive walkthrough or think aloud, rely largely on the expertise of the evaluator, who must recognize problems and understand what the user is doing. Objective techniques, e.g. controlled experiments, should produce repeatable results that do not depend on the interpretation of the particular evaluator.

The type of measures provided: Quantitative measurement is usually numeric and can be easily analysed using statistical techniques. Qualitative measurement is non-numeric and therefore more difficult to analyse, but can provide important detail that cannot be determined from numbers.

The information provided: Techniques such as controlled experiments are good at providing low-level information, e.g. 'is a particular font readable?'. Higher-level information, e.g. 'is the system usable?', can be gathered using survey techniques, which provide a more general impression of the user's view of the system.

The immediacy of the response: Methods such as think aloud record the user's behaviour at the time of the interaction. Others, e.g. post-task walkthrough, rely on the user's recollection of events.

The level of interference implied: Techniques that are obvious to the user during the interaction run the risk of influencing the way the user behaves.

The resources required: Resources to consider include equipment, time, money, subjects, the expertise of the evaluator and context.


EXERCISE 9.8 [extra - not in book]

This extended exercise is designed to give you practice in writing, testing and administering a questionnaire to a true user population. Although it does not train you in the very fine points of questionnaire design, it does alert you to the basic problems in obtaining valid responses from people.

In addition to practice in valid questionnaire design and questionnaire administration, the exercise asks you to focus on finding information about a user interface to a new computer system, by studying an analogous system. Its intent is to help you develop probing skills (through good question design). These skills can then be used to find out what failures and successes users are having with a system and even the underlying causes of these successes and failures.

The 7 steps of this exercise are:

1. Selection of an analogous interface to study.
2. Preparation of a draft questionnaire (1-2 pages).
3. Piloting of the draft questionnaire.
4. Preparation of a final questionnaire.
5. Administration of the final questionnaire.
6. Analysis of the results.
7. Write up and presentation of the results of the survey.

These steps are given in more detail below. Read through them all before you begin.

1. Decide on a new user interface for which you will collect information from potential users.
One of the methods for collecting this information is to look at existing user interfaces that have things in common with the interface you are designing, i.e. the computer program accomplishes the same or similar tasks, or you believe that the task that the program supports is in many ways similar to the task you will be supporting with your interface design. For example, if you were building a design for an interface that helped users find out which books were available in a university library system, you might look at the existing library system interface for accomplishing this task. If you were choosing to design a computer interface for ordering tickets to plays and concerts automatically, you might study a computer interface for obtaining cash from an automated teller machine (cashpoint).

2. The type of information you are to obtain about the user interface through the careful design of your questionnaire is:

(a) How easy has the system been for them to learn?
(b) What are the particular parts of the system that they are having the most trouble with?
(c) What kinds of recommendations do they have for improving the system?
(d) How useful are the manuals for the system?
(e) How much time did they spend learning the system?

From your reading about questioning people and good questionnaire design, you should know that you cannot ask the above questions directly and obtain very good answers: (a) is too ambiguous; (b) is much too broad to get useful answers; (c) is too difficult for new users; (d) is again ambiguous; and the users may not have the information to answer (e). Also, since the amount of difficulty a person has with the system depends on that person's previous experience (whether they are a computer science major, whether they are highly motivated, whether they have a good friend who is helping them out a lot, and whether they are very intelligent), questions have to be asked about these factors as well.

Design a questionnaire to administer to the users of the system of your choice. Take as much care as you can in the choice of your questions, taking into account the issues discussed.

3. Administer this draft questionnaire to 2 users to find out if they understand the questions in the way you meant them. Give them the questionnaire to fill in, then ask them what their answers mean and what they thought your questions meant. This is called pilot testing the questionnaire.

4. Use the feedback you received from your 2 trial respondents to redesign your questionnaire. If the design changes radically, it is a good idea to test out your questionnaire again on 2 other people.

5. When you think your questionnaire has been tested enough and will work on the targeted set of users, you have to find users outside of computer science who fit the eligibility requirements for your survey (as many as you can; six is a suggested minimum, but note that this low number of respondents would not normally be used in a real-world study). Ask your chosen users to fill out one of your questionnaires.

6. Summarize the data collected from your questionnaires. The structured question answers are usually presented as percentages, e.g. 25 percent responded 'strongly disagree' to the question 'Should the system always have menus available?' Often the percentages are presented across demographic data, e.g. '30 percent of the women and 35 percent of the men would like to have fewer commands to learn.' A clear way to present this information is in tables.
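The kind of percentage summary described here is easy to mechanise. A minimal sketch with invented respondent data: cross-tabulate a structured answer against a demographic field and express each cell as a percentage of its group.

```python
from collections import Counter

def percentages(responses, group_field, answer_field):
    """Percentage of each answer within each demographic group,
    e.g. '50 percent of the women responded agree'."""
    totals = Counter(r[group_field] for r in responses)
    cells = Counter((r[group_field], r[answer_field]) for r in responses)
    return {
        (group, answer): 100 * count / totals[group]
        for (group, answer), count in cells.items()
    }

# Invented questionnaire data for illustration only.
responses = [
    {"sex": "F", "q3": "agree"},
    {"sex": "F", "q3": "disagree"},
    {"sex": "M", "q3": "agree"},
    {"sex": "M", "q3": "agree"},
]
table = percentages(responses, "sex", "q3")
```

Each `(group, answer)` entry of `table` is one cell of the kind of table suggested above for presenting the results.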

Use the data results of your questionnaire to consider changes that might be made to the user interface to make it easier for users to learn and use the system. These can be changes in manuals and training as well as detailed changes to the interface commands and the documentation.

Now translate these ideas into how you would design your new interface to your system to solve the problems highlighted by your survey. Obviously, if the problems are in areas where there are few parallels between the studied system and your own, the information is of much less use than if it is directly applicable - be careful.

7. Write up and present the results of the survey. This should draw out the users' problems with the current interface, with the final portion discussing how your interface design will avoid these problems. You should also include a discussion of the reasons for each question or set of questions in your questionnaire, with an explanation of any changes you made between the draft and the final versions.

answer available for tutors only

extended project


EXERCISE 9.9 [extra - not in book]

Which evaluation methods do you think are most appropriate for group systems? What particular issues do evaluating group systems raise?

answer available for tutors only

This is a question where there are many possible answers, but appropriate methods to discuss include ethnography and field-based longitudinal studies, for assessing how group systems are actually used, and possible extensions to CW and heuristic evaluation, for assessing elements of the interface. The issues to be considered include: the choice of method and how methods must be adapted to suit groups; variation within as well as between groups; the complexity of who benefits from the system; and conflicts of interest. Context is also important. Co-operation depends on the formation of groups, which cannot properly happen out of context. Evaluation must assess support for co-operation and must therefore consider context.

Individual exercises

ex.9.1 (ans), ex.9.2 (ans), ex.9.3 (ans), ex.9.4 (ans), ex.9.5 (ans), ex.9.6 (tut), ex.9.7 (tut), ex.9.8 (open), ex.9.9 (tut)

Worked exercises in book


Design an experiment to test whether adding colour coding to an interface will improve accuracy. [page 339]


You have been asked to compare user performance and preferences with two different learning systems, one using hypermedia (see Chapter 21), the other sequential lessons. Design a questionnaire to find out what the users think of the system. How would you go about comparing user performance with these two systems? [page 351]