Tuesday, December 26, 2006

Is testing advancing or stagnating?

The quality movement started in 1924 when Walter Shewhart gave his boss at Bell Labs a memo suggesting the use of statistics to improve quality in telephones. Later came Juran and Deming, and the movement was well on its way. Not surprisingly, the software industry eventually took up the challenge of systematically improving quality. Let’s look at how that began.


In 1976, Michael Fagan published his first paper on Design and Code Inspections. He talked about how inspections could reduce errors in software. In the same year, Glenford Myers wrote Software Reliability: Principles and Practices. In this book, Myers talks about testing philosophies—emphasizing the importance of designing good test cases. He goes on to describe test cases that treat the code as if it were sealed in a black box. In 1979, Myers wrote his fifth book, The Art of Software Testing, which soon became the bible of the Software Quality movement. In this book, he talks about the importance of Inspections, Black and White Box testing, and the benefits of regression testing.

Sounds like a solid beginning. So, what’s my point?

My point is this: I don’t think testing has advanced since Fagan and Myers wrote their first papers and books. We are still using the same methods to perform our work. We are still asking the same questions.

Now, I’m not suggesting that no important books have been written since Myers’ early ones. In fact, many fine books on software testing have appeared since then.

In 1983, seven years after Myers' software reliability book, Boris Beizer wrote Software Testing Techniques, a very good book on the subject of software testing. Beizer gives the terms Black Box and White Box testing new names—Functional Testing and Structural Testing, respectively. But for the most part he talks about testing methods similar to Myers'.

In 1995, a full nineteen years after Myers’ book, Edward Kit wrote Software Testing in the Real World, another good book on software testing. But still, Kit talks about Functional Testing (Black Box) as well as White Box Testing.

But if you have been in the business for any length of time, you get a distinct sense of déjà-vu. If you don’t believe me, take a look at the next testing conference advertisement you get in the mail. Then think about that talk you attended years ago. The one where the speaker described a testing oracle that would create test cases for you. Have you ever seen such a tool that really worked on real code? I doubt it.

What about the CMM and ISO 9000? These processes were going to help us produce high-quality software. How many of you are still using them? Have they solved your quality issues?

Like most of you, I create functional test cases, update regression test suites, and attend an occasional code review, all in the name of process improvement. But I haven’t seen anything new or revolutionary impact my world.

Again, I’m not trying to minimize or downplay software quality processes. But, like most of you, I work in the real world of tight deadlines and poor requirements. Most of the time I don’t even have real functional specifications. Software engineering documentation—what’s that?

So thinking back to Myers’ 1976 book and all the testing books and conferences since, have we advanced or are we stagnating?

Let’s just say, I feel the algae growing.

Thursday, December 14, 2006

Testing For Developers

When we testers find yet another "Did you even run this?" bug, it's easy to believe developers purposely inject bugs just to taunt us. I have worked with a lot of developers over the years, and I've found that they generally do try to test their code but simply don't know how to go about doing so effectively.

As testers gain experience, they build up a checklist of common problems, classes of bugs that crop up over and over, and areas that tend to be troublesome. Developers don't have our experience, so they don't have anything similar.

If developers don't know how to test very well, and testers have a simple checklist we use to find the most common bugs, then giving our checklist to developers should help out both sides.

That's my theory anyway, and this is my checklist. I'm testing it right now with my developers. Test it with yours/yourself and let me know what happens!

Customize This List: If you get a bug, determine what test (or even better, what general class of tests or testing technique) would have caught the bug, then add it to this list.
Use Your Tester: Brainstorm tests with your tester. Review your (planned/actual) tests with your feature team. Coordinate your testing with your tester, especially with respect to tests they have already written/are currently writing.
Focus On Your Customer: Think, "Where would the presence of bugs hurt our customers the most?", then let your answers drive your testing.
Test Around Your Change: Consider what it might affect beyond its immediate intended target. Think about related functionality that might have similar issues. If fixing these surrounding problems is not relevant to your change, log bugs for them. To quote a smart person I know, "Don't just scratch it. What's the bug bite telling you?"
Use Code Coverage: Code coverage can tell you what functionality has not yet been tested, but don't just write a test case to hit the code. Instead, let it help you determine what classes of testing and test cases the uncovered code indicates you are missing (see the sketch after this list).
Consider Testability: Hopefully you have considered testability throughout your design and implementation process. If not, think about what someone else will have to do to test your code. What can you do, or what do you need to do, to allow proper, authorized verification? (Hint: Test Driven Design is a great way to achieve testable code right from the get-go!)
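To make the code-coverage item concrete, here is a minimal, hypothetical sketch using Python's coverage.py package. The module name and the call it exercises are invented stand-ins for your own code and tests, so treat it as an illustration rather than a prescribed workflow.

import coverage

cov = coverage.Coverage()
cov.start()

import my_module            # hypothetical module under test
my_module.do_something()    # hypothetical call standing in for your actual tests

cov.stop()
# Lines that were never executed hint at whole classes of missing tests,
# not just individual statements to "hit".
cov.report(show_missing=True)

From the command line, the equivalent is usually just "coverage run -m pytest" followed by "coverage report -m".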

Ways To Find Common Bugs:
Reset to default values after testing other values (e.g., pairwise tests, boundary condition tests).
Look for hard coded data (e.g., "c:\temp" rather than using system APIs to retrieve the temporary folder), run the application from unusual locations, open documents from and save to unusual locations.
Run under different locales and language packs.
Run under different accessibility schemes (e.g., large fonts, high contrast).
Save/Close/Reopen after any edit.
Undo, Redo after any edit.
Test Boundary Conditions: Determine the boundary conditions and equivalency classes, and then test just below, at, in the middle of, and just above each condition. If multiple data types can be used, repeat this for each option (even if your change is to handle a specific type). For numbers, common boundaries include the following (see the sketch after this list):
smallest valid value
at, just below, and just above the smallest possible value
-1
0
1
some
many
at, just below, and just above the largest possible value
largest valid value
invalid values
different-but-similar datatypes (e.g., unsigned values where signed values are expected)
for objects, remember to test with null and invalid instances
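Here is a minimal sketch, using pytest, of what the boundary-condition list above can look like as actual tests. The function set_zoom and its 10-400 range are invented for illustration, and the toy implementation exists only so the sketch runs; substitute your own code and limits.

import pytest

MIN_ZOOM, MAX_ZOOM = 10, 400   # hypothetical valid range for this example

def set_zoom(percent):
    """Toy implementation so the sketch is runnable; replace with the real code under test."""
    if not isinstance(percent, int) or isinstance(percent, bool):
        raise TypeError("zoom must be an integer percentage")
    if not MIN_ZOOM <= percent <= MAX_ZOOM:
        raise ValueError("zoom out of range")
    return percent

# At and just inside each end of the valid range, plus a value in the middle.
@pytest.mark.parametrize("value", [MIN_ZOOM, MIN_ZOOM + 1, 100, MAX_ZOOM - 1, MAX_ZOOM])
def test_valid_zoom_levels(value):
    assert set_zoom(value) == value

# Just outside each end, plus -1, 0, something huge, and different-but-similar data types.
@pytest.mark.parametrize("value", [MIN_ZOOM - 1, -1, 0, MAX_ZOOM + 1, 10**9, None, "100", 2.5])
def test_invalid_zoom_levels(value):
    with pytest.raises((ValueError, TypeError)):
        set_zoom(value)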
Other Helpful Techniques:
Do a variety of smallish pairwise tests to mix and match parameters, boundary conditions, etc. (see the sketch after this list). One axis that often brings results is testing both before and after resetting to default values.
Repeat the same action over and over and over, both doing exactly the same thing and changing things up.
Verify that every last bit of functionality you have implemented that is discussed in the spec matches what the spec says should happen. Then look past the spec and think about what is not happening but should be.
"But a user would never do that!": To quote Jerry Weinberg, When a developer says, "a user would never do that," I say, "Okay, then it won't be a problem to any user if you write a little code to catch that circumstance and stop some user from doing it by accident, giving a clear message of what happened and why it can't be done." If it doesn't make sense to do it, no user will ever complain about being stopped.

Wednesday, December 13, 2006

The Tester's Mentality

Hey, one question: can anyone tell me what the tester's mentality is?

A good answer is:

Even if the product works perfectly, without a single crash or assert, it is useless if it is not intuitive and user friendly. What will the customer use it for? Does it provide value for the scenarios the customer cares about? Think of the variety of customers: not everybody will have a top-of-the-line machine, the same technical expertise, or the same expectations of the product. More than just breaking software, the tester should make sure it has the capabilities the customer needs.

If you aren't thinking about your customer, keeping them in mind as you test, then you aren't testing.

What do you think? Please leave a comment.

Tuesday, December 12, 2006

Usability Sidebar

1. Speak the users' language. Use words, phrases, and concepts familiar to the user. Present information in a natural and logical order.

2. Be Consistent. Indicate similar concepts through identical terminology and graphics. Adhere to uniform conventions for layout, formatting, typefaces, labeling, etc.

3. Minimize the users' memory load. Take advantage of recognition rather than recall. Do not force users to remember key information across documents.

4. Build flexible and efficient systems. Accommodate a range of user sophistication and diverse user goals. Provide instructions where useful. Lay out screens so that frequently accessed information is easily found.

5. Design aesthetic and minimalist systems. Create visually pleasing displays. Eliminate information which is irrelevant or distracting.

6. Use chunking. Write material so that documents are short and contain exactly one topic. Do not force the user to access multiple documents to complete a single thought.

7. Provide progressive levels of detail. Organize information hierarchically, with more general information appearing before more specific detail. Encourage the user to delve as deeply as needed, but to stop whenever sufficient information has been received.

8. Give navigational feedback. Facilitate jumping between related topics. Allow the user to determine her/his current position in the document structure. Make it easy to return to an initial state.

9. Don't lie to the user. Eliminate erroneous or misleading links. Do not refer to missing information.

For more information, see: http://stats.bls.gov/ore/htm_papers/st960150.htm

Monday, December 11, 2006

Usability Testing

Definition
Usability testing measures the suitability of the software for its users, and is directed at measuring the effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments or contexts of use. Effectiveness is the capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use. Efficiency is the capability of the product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use. Satisfaction is the capability of the software product to satisfy users in a specified context of use (see Jakob Nielsen's work, including the web site www.useit.com). Attributes that may be measured are:

§ understandability (attributes of the software that bear on the users’ effort for recognising the logical concept and its applicability)

§ learnability (attributes of software that bear on the users’ effort for learning the application)

§ operability (attributes of the software that bear on the users’ effort for operations and operation control)

§ attractiveness (the capability of the software to be liked by the user).

Note: Usability evaluation has two purposes: the first is to remove usability defects (sometimes referred to as formative evaluation) and the second is to test against usability requirements (sometimes referred to as summative evaluation). It is important to have high-level goals for effectiveness, efficiency and satisfaction, but not all means of achieving these goals can be precisely specified or measured. It is important that usability evaluation has the objective of "getting inside the user's head" and understanding why users have difficulty using the proposed design, using methods that help to understand those problems.

More information on setting criteria for effectiveness, efficiency and satisfaction can be found in ISO 9241-11: Guidance on usability, and ISO/IEC 9126-4: Quality in use metrics ("quality in use" is defined in a similar way to "usability" in ISO 9241-11).

Overall approach
A three-step approach is suggested, overlaid on the V model:

1. Establish and validate usability requirements (outside the scope of this standard)

2. Inspect or review the specification and designs from a usability perspective (covered in the process section of this standard)

3. Verify and validate the implementation (usability testing)

Usability test documentation: Usability testing may be documented to follow ISO 14598 (also note that a Common Industry Format is being developed by the Industry USability Reporting project (IUSR) and documented at www.nist.gov/iusr). The documentation may include the following:

§ Description of the purpose of the test

§ Identification of product types

§ Specification of quality model to be used

§ Identification of contexts of use (including users, goals and environment)

§ Identification of the context for the test showing how closely this meets the actual context of use

§ Selection of metrics, normally measuring at least one metric for each of effectiveness, efficiency, satisfaction and where relevant safety

§ Criteria for assessment

§ Interpretation of measures of the usability of the software.

Selection of techniques: Many techniques are available for developing usability requirements and for measuring usability. Each project or business will make its own decision about the selection of techniques, depending on cost and risk. For example, a review by the development team alone incurs lower preparation and meeting costs but does not involve the users, so it addresses only in theory how a user might react to the system to be built. A review with users costs more in the short term, but the user involvement will be cost-effective in finding problems early. A usability lab costs a great deal to set up (video cameras, mock-up office, review panel, users, etc.) but enables the development staff to observe the effect of the actual system on real people. This option may be attractive where this form of testing is a high priority, but for a relatively small number of applications. It is also possible to make a simpler, cheaper environment; for example, Perlman's use of a mirror on the monitor with an over-the-shoulder video camera, so that he could record both the screen and the user's expression.

Test environment: Testing should be done under conditions as close as possible to those under which the system will be used. It may be necessary to build a specific test environment, but many of the usability tests may be part of other tests, for example during functional system test. Part of the test environment is the context, so thought should be given to different contexts, including environment, and to the selection of specified users.

Size of sample user group: Research (ref. http://www.usability.serco.com/trump) shows that if users are selected who are representative of each user group then 3-5 users are sufficient to identify problems. 8 or more users of each type are required for reliable measures. The Common Industry Format (http://www.nist.gov/iusr) also requires a minimum of 8 users. In contrast, Nielsen measured the number of usability problems, not user performance (Nielsen, J. & Landauer, T. K. (1993) A mathematical model of the finding of usability problems. In: CHI '93. Conference proceedings on Human factors in computing systems, 206-213). In practice the number required would depend on the variance in the data, which will determine whether the results are statistically significant. Another paper stated "achievement of usability goals can only be validated by using a method such as the Performance Measurement Method to measure against the goals, and this requires 8 or more users to get reliable results." Macleod M, Bowden R, Bevan N and Curson I (1997) The MUSiC Performance Measurement method, Behaviour and Information Technology, 16, 279-293.
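As a worked example of the Nielsen and Landauer model cited above: the expected proportion of usability problems found by n users is 1 - (1 - p)**n, where p is the average probability that a single user exposes a given problem. The p = 0.31 used below is the average Nielsen reports for his own datasets; your own rate may well differ, which is exactly why the measured variance matters.

p = 0.31   # average per-user problem-discovery rate (Nielsen's reported average; an assumption here)
for n in (1, 3, 5, 8, 15):
    found = 1 - (1 - p) ** n
    print(f"{n} users find about {found:.0%} of the problems")
# With p = 0.31 this gives roughly 31%, 67%, 84%, 95% and 99.6%,
# which is why 3-5 users are often said to be enough to identify most problems,
# while 8 or more are wanted for reliable measurement.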

Test scenarios: The user tests require production of test scenarios. These include user instructions, allowance of time for pre- and post-test interviews for giving instructions and receiving feedback, logging of the session (for the designers and developers to observe if they cannot be present), a suitable environment, observer training, and an agreed protocol for running the sessions. The protocol includes a description of how the test will be carried out (welcome, confirmation that this is the correct user, timings, note taking and session logging, interview and survey methods used).

Usability issues: These may be raised in any area that a user (novice or expert) can be affected by, including documentation, installation, misleading messages and return codes. Such issues are often perceived as low priority if the functionality is correct, even when the system is awkward to use. For instance, a spelling mistake or obvious GUI problem in a screen that is frequently in use will be more serious than one in an obscure screen that is only occasionally seen.

Building the tests
In this standard there is not space to describe all the usability testing techniques in detail. We have selected a few important ones, but other techniques are available; these are listed, but not defined, at the end of this document. Whatever techniques are used, you will need to decide the goals for the test, make a task analysis, select the context of use, and decide on appropriate satisfaction, effectiveness and efficiency measurements. These could be based on published methods such as mental effort, measured by how hard someone has to work to solve a problem (referred to as Cognitive Workload), or on simply timing several users at a task, or on asking the users for their views.

Early life cycle techniques: Some techniques can be used early in the lifecycle and so influence and test the design and build of the system, for example Heuristic Evaluation. Heuristic evaluation is a systematic inspection of a user interface design for usability. Its goal is to find the usability problems in the design so that they can be attended to as part of an iterative design process. It involves having a small set of evaluators examine the interface and judge its compliance with recognised usability principles (the "heuristics").

Late life cycle techniques: Some techniques are used after the software is built, for example survey and questionnaire techniques once the system is in use, and observation of user behaviour with the system in a usability test lab.

An example of an observation technique is the Thinking Aloud protocol. In this method the users describe what they are doing, why they are doing it, and their reaction to the system - they think aloud. This is recorded, either on a video recorder, on an audiotape, or by an observer sitting with the user. In this case, a "test lab" may be set up, mimicking the normal office set-up. A video tape recorder is positioned behind/above the user, and the observer sits either with the user or behind a two-way mirror. The users talk to the observers during the work to say what they are doing and what they are thinking. The purpose of the test is explained to the users - that it is a test of the system's usability, not a test of the users. They are given instructions on how to run the test, and on the observation and reporting rules. This type of test is explorative, using test scenarios which the usability tester would first run as use case tests and then bring into the usability lab for thinking aloud with the user. It is important to consider the effect on the user of being observed; the tests must take place in an atmosphere of trust and honesty.

You may also wish to consider survey techniques and attitude questionnaires, either "home grown" or, if you wish to measure against a benchmark, standardised and publicly available surveys such as SUMI and WAMMI, which are marked against a database of previous usability measurements. Part of the ESPRIT MUSiC project, SUMI is developed and administered by the Human Factors Research Group (HFRG) at University College Cork. SUMI is a brief questionnaire that is marked against a benchmark of responses to surveys of other systems. WAMMI is an on-line survey administered as a page on the web site itself; users are asked to complete it before they leave the page, which gives ongoing feedback for monitoring how the web site is used. Each organisation using the SUMI or WAMMI surveys sends its results back to the HFRG, which provides statistical results from the database built up from all SUMI/WAMMI users.

Test case derivation: The test cases are the functional test cases, but they are measured for different outcomes, for example speed of learning rather than functional outcomes. Tests may be developed to test the syntax (structure or grammar) of the interface, for example what can be entered into a field, as well as the semantics (meaning), for example that each required input, system message and output is reasonable and meaningful to the user. These tests may be derived using black box or white box methods (for example those described in BS7925-2) and could be seen as functional tests ("we would expect this functionality in the system") which are also usability tests ("we expect that the user is protected from making this mistake"). Techniques outside BS7925-2, for example use cases, may also be used. Note: use cases are defined within UML (Unified Modelling Language) and are often used in Object Oriented (OO) development. However, UML constructs such as use cases may be used successfully in non-OO projects.
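To make the syntax/semantics distinction concrete, here is a small hypothetical sketch: a date-of-birth field whose structure (syntax) is checked separately from its meaning (semantics). The field, the format and the messages are invented for illustration, not taken from any particular product.

import re
from datetime import date

DATE_SYNTAX = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # expected structure: YYYY-MM-DD

def check_date_of_birth(text):
    """Returns an error message meaningful to the user, or None if the value is accepted."""
    # Syntax: is the input structured the way the interface says it must be?
    if not DATE_SYNTAX.match(text):
        return "Please enter the date as YYYY-MM-DD."
    year, month, day = map(int, text.split("-"))
    # Semantics: is the well-formed input actually meaningful?
    try:
        entered = date(year, month, day)          # rejects 2023-02-30 and similar
    except ValueError:
        return "That date does not exist."
    if entered > date.today():
        return "A date of birth cannot be in the future."
    return None

# Syntax tests exercise the structure; semantics tests exercise the meaning.
assert check_date_of_birth("31/12/1999") is not None    # wrong structure
assert check_date_of_birth("1999-02-30") is not None    # well-formed but meaningless
assert check_date_of_birth("1985-06-15") is None        # accepted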

The context of use may be built into a checklist. This context checklist leads us to consider developing tests using equivalence partitioning and boundary value analysis (see BS7925-2 for the techniques). These techniques make it easier to define tests across the range of contexts without repetition or missing contexts. The partitions and boundaries in which we are interested are those between the contexts of use, rather than partitions and boundaries between inputs and outputs. We may wish to use the ideas behind the techniques rather than necessarily following the standard to the letter. Use risk analysis of the effect of usability problems to weight the partitions and so include the most important tests.

Example: We may wish to test that a public emergency call button is available to everyone. At what height should it be placed? Partitions to consider (a short sketch follows the list):



1. User over 2 m tall (user is extremely tall - may have to bend excessively)

2. User 1.8 m to 2m tall (tall)

3. User 1.3m to 1.79m tall (average)

4. User 1.0m to 1.29 m tall (user is short, must be able to reach)

5. User is less than 1.0 m tall (a child, or in a wheelchair, for example, must be able to reach)

6. User is prone (after accident, must be able to reach)
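A minimal sketch of how the partitions above, weighted by risk as suggested earlier, could drive the order of usability tests. The representative users and the risk weights are invented for illustration; in practice they would come from your own risk analysis.

# Each partition gets a representative user and a risk weight (1 = low, 5 = high).
partitions = [
    # (description,                            representative user,     risk weight)
    ("over 2 m, may have to bend excessively", "2.05 m standing",       2),
    ("1.8 m to 2 m, tall",                     "1.90 m standing",       1),
    ("1.3 m to 1.79 m, average",               "1.60 m standing",       1),
    ("1.0 m to 1.29 m, short, must reach",     "1.10 m standing",       3),
    ("under 1 m, child or wheelchair user",    "seated, reach ~0.9 m",  5),
    ("prone after an accident, must reach",    "lying on the floor",    5),
]

# Run the highest-risk partitions first (and drop the lowest-risk ones if time runs out).
for description, user, risk in sorted(partitions, key=lambda p: p[2], reverse=True):
    print(f"risk {risk}: test call-button reach with user: {user} ({description})")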

Types of technique, with examples and comments:

Inquiry
Examples: Contextual Inquiry; Ethnographic Study / Field Observation; Interviews and Focus Groups; Surveys; Questionnaires; Journaled Sessions; Self-reporting Logs; Screen Snapshots.
Comments: These are a selection of methods that gather information as a system is in use, either by observation of the user, or by asking the user to comment.

Inspection
Examples: Heuristic Evaluation; Cognitive Walkthroughs; Formal Usability Inspections; Pluralistic Walkthroughs; Feature Inspection; Consistency Inspection; Standards Inspection; Guideline Checklists.
Comments: These are all variations on review, walkthrough and inspection techniques, with specialised checklists or specialist reviewers.

Testing
Examples: Thinking Aloud Protocol; Co-discovery Method; Question Asking Protocol; Performance Measurement; Eye-tracking.
Comments: These are techniques which help the user and the usability tester/analyst to discuss and discover how the user is using and thinking about the system.

Related Techniques
Examples: Prototyping (Low-fidelity / High-fidelity / Horizontal / Vertical); Affinity Diagrams; Archetype Research; Blind Voting; Card-Sorting; Education Evaluation.
Comments: Although not primarily usability techniques, these are techniques that some writers on usability have recommended. They are all ways to elicit reactions from