Definition
Usability testing measures the suitability of the software for its users, and is directed at measuring the effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments or contexts of use. Effectiveness is the capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use. Efficiency is the capability of the product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use. Satisfaction is the capability of the software product to satisfy users in a specified context of use (see Jacob Nielsen's work including web site www.useit.com). Attributes that may be measured are
§ understandability (attributes of the software that bear on the users’ effort for recognising the logical concept and its applicability)
§ learnability (attributes of software that bear on the users’ effort for learning the application)
§ operability (attributes of the software that bear on the users’ effort for operations and operation control)
§ attractiveness (the capability of the software to be liked by the user).
Note: Usability evaluation has two purposes: the first is to remove usability defects (sometimes referred to as formative evaluation) and the second is to test against usability requirements (sometimes referred to as summative evaluation). It is important to have high level goals for effectiveness, efficiency and satisfaction, but not all means to achieving these goals can be precisely specified or measured. It is important that usability evaluation has the objective of "getting inside the user's head" and understanding why they have a difficulty using the proposed design, using methods that help understand the problems.
More information on setting criteria for effectiveness, efficiency and satisfaction can be found in ISO 9241-11: Guidance on usability, and ISO/IEC 9126-4: Quality in use metrics ("quality in use" is defined is a similar way to "usability" in ISO 9241-11).
Overall approach
A three-step approach is suggested overlaid on the V model
1. Establish and validate usability requirements (outside the scope of this standard)
2. Inspect or review the specification and designs from a usability perspective (covered in the process section of this standard)
3. Verify and validate the implementation (usability testing)
Usability test documentation: Usability testing may be documented to follow ISO 14598 (also note that a Common Industry Format is being developed by the Industry USability Reporting project (IUSR) and documented on www.nist.gov.uk). The documentation may include the following:
§ Description of the purpose of the test
§ Identification of product types
§ Specification of quality model to be used
§ Identification of contexts of use (including users, goals and environment)
§ Identification of the context for the test showing how closely this meets the actual context of use
§ Selection of metrics, normally measuring at least one metric for each of effectiveness, efficiency, satisfaction and where relevant safety
§ Criteria for assessment
§ Interpretation of measures of the usability of the software.
Selection of techniques: Many techniques are available for developing usability requirements and for measuring usability. Each project or business would make its own decision about selection of techniques, depending on cost and risk. Example - a review process only by the development team incurs lower preparation and meeting costs but does not involve the users, so it addresses in theory how a user might react to the system to be built. A review with users costs more in the short term but the user involvement will be cost effective in finding problems early. A usability lab costs a great deal to set up (video cameras, mock up office, review panel, users, etc) but enables the development staff to observe the effect of the actual system on real people. This option may be attractive where this form of testing is high priority, but for a relatively small number of applications. It is possible to make a simpler, cheaper environment; for example, Perlman’s use of a mirror on the monitor with an over-the-shoulder video camera, so that he could record the screen and the user's expression.
Test environment: Testing should be done under conditions as close as possible to those under which the system will be used. It may be necessary to build a specific test environment, but many of the usability tests may be part of other tests, for example during functional system test. Part of the test environment is the context, so thought should be given to different contexts, including environment, and to the selection of specified users.
Size of sample user group: Research (ref. http://www.usability.serco.com/trump) shows that if users are selected who are representative of each user group then 3-5 users are sufficient to identify problems. 8 or more users of each type are required for reliable measures. The Common Industry Format (http://www.nist.gov/iusr) also requires a minimum of 8 users. In contrast, Nielsen measured the number of usability problems, not user performance (Nielsen, J. & Landauer, T. K. (1993) A mathematical model of the finding of usability problems. In: CHI '93. Conference proceedings on Human factors in computing systems, 206-213). In practice the number required would depend on the variance in the data, which will determine whether the results are statistically significant. Another paper stated "achievement of usability goals can only be validated by using a method such as the Performance Measurement Method to measure against the goals, and this requires 8 or more users to get reliable results." Macleod M, Bowden R, Bevan N and Curson I (1997) The MUSiC Performance Measurement method, Behaviour and Information Technology, 16, 279-293.
Test scenarios: The user tests require production of test scenarios. These include user instructions, allowance of time for pre and post test interviews for giving instructions and receiving feedback, logging the session (for the designers and developers to observe if they cannot be present), suitable environment, observer training, and an agreed protocol for running the sessions. The protocol includes a description of how the test will be carried out (welcome, confirmation that this is the correct user, timings, note taking and session logging, interview and survey methods used).
Usability issues: These may be raised in any area that a user (novice or expert) can be affected by. This includes documentation, installation, misleading messages, return codes. These can often be perceived as low priority if the functionality is correct even if the system is awkward to use. For instance a spelling mistake or obvious GUI problem in a screen that is frequently in use will be more serious than one in an obscure screen that is only occasionally seen.
Building the tests
In this standard there is not space to describe all the usability testing techniques in detail. We have selected a few important ones, but other techniques are available. These are listed but not defined at the end of this document. Whatever techniques are used, you will need to decide the goals for the test, make a task analysis, select the context of use, and decide on appropriate satisfaction, effectiveness and efficiency measurements. These could be based on published methods such as mental effort measured by how hard an effort someone has to make to solve a problem (referred to as Cognitive Workload), or by simply timing several users at a task, or by asking the users for their views.
Early life cycle techniques: Some techniques can be used early in the lifecycle and so influence and test the design and build of the system, for example Heuristic Evaluation (Heuristic evaluation is a systematic inspection of a user interface design for usability. The goal of heuristic evaluation is to find the usability problems in the design so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognised usability principles (the "heuristics").
Late life cycle techniques: Some techniques are used post software build, for example the use of survey and questionnaire techniques where the system is in use and observations of user behaviour with the system in a usability test lab.
An example of an observation technique is the Thinking aloud protocol. In this method the users describe what they are doing and why, their reaction to the system - they think aloud. This is recorded, either on a video recorder, an audiotape or by an observer sitting with the user. In this case, a "test lab" may be set up; mimicking the normal office set up. A video tape recorder is positioned behind/above the user, and the observer sits either with the user or behind a 2-way mirror. The users talk to the observers during the work to say what they are doing and what they are thinking. The purpose of the test is explained to the users - that it is a test of the system's usability not a test of the users. They are given instructions as to how to run the test, and observation and reporting rules. This type of test is explorative, using test scenarios which would be done first by the usability tester as use case tests, and then brought into the usability lab or thinking aloud with the user. It is important to consider the effect on the user of being observed; the tests must take place in an atmosphere of trust and honesty.
You may also wish to consider survey techniques and attitude questionnaires, either "home grown" or if you wish to measure against a benchmark the use of standardised and publicly available surveys such as SUMI and WAMMI, which are marked against a database of previous usability measurements. Part of the ESPIRIT MUSiC project, SUMI is developed and administered by the Human Factors Research Group at the University College Cork. SUMI is a brief questionnaire that is marked against a benchmark of responses to surveys of systems. WAMMI is the on-line survey administered as a page on the web site, and users are asked to complete it before they leave the page. This gives ongoing feedback to continue monitoring how the web site is used. Each organisation using the SUMI or WAMMI surveys send back their results to the HFRG who provide statistical results from the database build of all SUMI/WAMMI users.
Test case derivation: The test cases are the functional test cases but are measured for different outcomes, for example speed of learning rather than functional outcomes. Tests may be developed to test the syntax (structure or grammar) of the interface, for example what can be entered to a field as well as the semantics (meaning) for example that each input required, system message and output is reasonable to and meaningful to the user. These tests may be derived using black box or white box methods (for example those described in BS7925-2) and could be seen as functional tests ("we would expect this functionality in the system") which also are usability tests ("we expect that the user is protected from making this mistake"). Techniques outside BS7925-2, for example use cases, may also be used. Note: use cases are defined within UML (Unified Modelling Language) are often used in an Object Oriented (OO) development. However, UML constructs such as use cases may be used successfully in non-OO projects.
The context of use may be built into a checklist. This context checklist leads us to consider developing tests using equivalence partitioning and boundary value analysis (see BS7925-2 for the techniques). These techniques make it easier to define tests across the range of contexts without repetition or missing contexts. The partitions and boundaries in which we are interested are those between the contexts of use, rather than partitions and boundaries between inputs and outputs. We may wish to use the ideas of the techniques rather than necessarily following the standard to the letter. Use risk analysis for the effect of usability problems to weight the partitions and therefore include the most important tests.
Example: We may wish to test that a public emergency call button is available to everyone. What height should it be placed? Partitions to consider:
1. User over 2 m tall (user is extremely tall - may have to bend excessively)
2. User 1.8 m to 2m tall (tall)
3. User 1.3m to 1.79m tall (average)
4. User 1.0m to 1.29 m tall (user is short, must be able to reach)
5. User is less than 1.0 m tall (a child, or in a wheelchair, for example, must be able to reach)
6. User is prone (after accident, must be able to reach)
Type of technique
Examples
Comments
Inquiry
Contextual Inquiry
Ethnographic Study / Field Observation
Interviews and Focus Groups
Surveys
Questionnaires
Journaled Sessions
Self-reporting Logs
Screen Snapshots
These are a selection of methods that gather information as a system is in use, either by observation of the user, or by asking the user to comment.
Inspection
Heuristic evaluation
Cognitive Walkthroughs
Formal Usability Inspections
Pluralistic Walkthroughs
Feature Inspection
Consistency Inspection
Standards Inspection
Guideline checklists
These are all variations on Review, walkthrough and inspection techniques, with specialised checklists or specialist reviewers.
Testing
Thinking Aloud protocol
Co-discovery method
Question asking protocol
Performance measurement
Eye-tracking
These are techniques which help the user and the usability tester/analyst to discuss/discover how the user is using and thinking about the system
Related Techniques
Prototyping :Low-fidelity / High-fidelity / Horizontal / Vertical
Affinity Diagrams
Archetype Research
Blind voting
Card-Sorting
Education Evaluation
Although not primarily usability techniques, these are techniques that some writers on usability have recommended. They are all ways to elicit reactions from
No comments:
Post a Comment