Using risk assessment techniques to improve the efficiency of testing.

Agile Bug Finding with Risk Targeted Testing in Java

Developers & Pigs

A trained pig can find truffles. I know a number of Java developers whom I could liken to these pigs, even though the nearest they get to truffles is the local chocolatier, because they can sniff out a bug just by watching the UI carefully or even the pattern of the network LEDs. Not only can they spot a flaw in the program, they can also tell you what has likely gone wrong; and when asked how they knew, they say something like 'well, if you are going to get it wrong then that is where you will get it wrong'. In other words, they use their experience of developing, testing, integrating and fixing to good effect. As an organisation you need to be able to do the same.


Commercial Testing

These days in commercial Java software, time to market or speed of enhancement is the key business driver behind delivery dates. Delivering a quality product on time and within budget requires effective use of resources. Testing is a large part of the project development effort, and when things get tight it is usually the first thing to get cut, either officially or unofficially. But how should you test for less?

The goal of testing is to highlight mistakes and to verify that the product is 'fit for purpose', both now and in the future.

Mistakes can occur in all parts of the development process and cover requirements, architecture, design, code, documentation and even the test suites themselves. People make mistakes everywhere, all the time, and so there are likely to be mistakes in every single project artefact. Halstead described the 'number of mental comparisons' of a piece of work as the measure of 'Volume', which means that the bigger the volume of the artefact, the more likely it is to contain errors.

Of the various types of testing Unit Testing is by far the most costly. The system and integration testing can be very expensive if you are developing a missile system, but generally in Java and open-source software the 'fit for purpose' level tests are delegated to the beta release. A nasty state of affairs, and a whole other topic. From now on I am discussing how to improve unit testing. Don't forget that units are not just source code; and don't forget that mistakes can be things like performance and security.

If you extensively unit test everything then you cover both the needs of now (finding mistakes) and the needs of the future (preventing mistakes by continuous regression testing). This 'good coverage' is hard to achieve in a realistic situation, because corners are always cut, either in the number of test cases or in the quality of the test data. So how do you balance the needs of now with the needs of the future? The answer is in layered testing. A system test implicitly covers your entire system, a code review covers at least one set of sources and a unit test covers its unit. Each layer of test performs checks at its level of scope.

One of the dangers of layered testing approaches is that tests will get repeated across layers. Repeated tests will increase your test execution time, require greater maintenance and give false metrics for pass rates. To avoid this you should have a common set of test descriptions covering all layers of testing.


Going Extreme

Agile processes like XP dictate that you should use TDD (test-driven development), where your software is written to satisfy a set of pre-written test cases. There are arguments going on about how beneficial it is as a technique: does being able to pass your driving test make you a good driver? No, but it means that your driving is of a particular standard. XP suggests doing the evident cases first; RUP, which does not mandate TDD, suggests doing the highest-risk elements first. Obviously projects use a mixture of inputs when prioritising the work.

The risk associated with any one particular artefact is a combination of project/product risk ('is it going to work?') and risk of mistakes ('have we broken it?'). Quantifying that risk involves assessing the chance of failing to do what you're supposed to do, quickly, safely, securely and without side-effects. The risk of failing to do what you are supposed to do comes from good analysis of your requirements: if you write good use cases and have a clear idea of their importance, then each piece of software has a level of risk that reflects the criticality of the use cases in which it is involved. Similar techniques can identify the risk of performance, safety or security problems. But how can you assess the risk of mistakes in software? And if you are using TDD, how do you do this before the software is written?

Some people advocate the use of McCabe's metric of 'Cyclomatic Complexity' as a guide to risk. But in my opinion this gives skewed results, as developers tend to be more careful with complex code. To measure the risk for a unit I suggest that you use the 'Volume' measure only, because it is quite accurate and it is fast to get.
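For reference, Halstead's Volume is usually given as V = N * log2(n), where N is the total number of operator and operand occurrences in the unit and n is the number of distinct ones. The sketch below assumes that a metrics tool has already produced those two counts; the class and method names are purely illustrative.

```java
// Illustrative sketch only: HalsteadVolume and its method are hypothetical names.
final class HalsteadVolume {
    private HalsteadVolume() {}

    /**
     * Halstead Volume, V = N * log2(n).
     *
     * @param totalTokens    N, the total occurrences of operators and operands
     * @param distinctTokens n, the number of distinct operators and operands
     */
    static double volume(int totalTokens, int distinctTokens) {
        if (totalTokens < 0 || distinctTokens < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        if (distinctTokens == 0) {
            return 0.0; // nothing written yet, so nothing to measure
        }
        // Java has no log2, so convert from the natural logarithm.
        return totalTokens * (Math.log(distinctTokens) / Math.log(2));
    }
}
```

A unit with 100 token occurrences drawn from 16 distinct tokens comes out at roughly 400, so a unit twice as long with the same vocabulary carries about twice the volume.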

But how do you calculate the 'Volume' metric of software that you have not written yet? If you use TDD you will have this problem. You could just ignore it and use the other risk factors to guide you, but there is a way to estimate the metric from the design document or interface definition. The design document and/or javadoc of the interface your unit needs to meet will have English language operators and operands (otherwise known as verbs and nouns). If you total up the number of unique nouns and verbs in the descriptions of the class and the methods then you get the vocabulary count n, and the formula (n * log n) then gives you a quantifiable figure. I call this the 'design risk', which is different from, but normally reflective of, the real risk. This technique can also be applied to requirements documents, by the way.
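The design risk estimate can be sketched in a few lines of Java. This is only a sketch under assumptions: the nouns and verbs are picked out of the javadoc by hand (no language parsing is attempted), the logarithm is taken to base 2 as in Halstead's work because the text does not name a base, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: DesignRisk and its methods are hypothetical names.
final class DesignRisk {
    private DesignRisk() {}

    /** Counts distinct words, ignoring case and surrounding whitespace. */
    static int distinct(Collection<String> words) {
        Set<String> unique = new HashSet<>();
        for (String word : words) {
            String normalised = word.trim().toLowerCase(Locale.ROOT);
            if (!normalised.isEmpty()) {
                unique.add(normalised);
            }
        }
        return unique.size();
    }

    /** Vocabulary count n: unique nouns plus unique verbs, counted separately. */
    static int vocabulary(Collection<String> nouns, Collection<String> verbs) {
        return distinct(nouns) + distinct(verbs);
    }

    /** Design risk n * log2(n); base 2 is an assumption. */
    static double score(int vocabulary) {
        if (vocabulary < 0) {
            throw new IllegalArgumentException("vocabulary must not be negative");
        }
        if (vocabulary == 0) {
            return 0.0;
        }
        return vocabulary * (Math.log(vocabulary) / Math.log(2));
    }
}
```

An interface whose javadoc uses four distinct nouns and four distinct verbs has a vocabulary of 8 and a design risk of about 24. The absolute number means little on its own; it is useful for ranking units against each other.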

Risk Register

The risk register is an important part of the management of what to test and how. At the start of the project, you probably have a budget of X and you know that untested lines of code cost Y each and test cases cost Z each. You probably calculated X based on the estimated number of lines of code to deliver the required functionality and added to that the cost of the testing based on a certain number of test cases per requirement point.

Create a table of artefacts to manage their risks and the mitigation of those risks. For each artefact note the risk associated with the product, performance, safety, security and mistakes, and create a score to sum the weighted risks. At the beginning of the project, your list will contain the requirements and use case descriptions, and later on be populated with all the source code and even the tests themselves.
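One row of this risk table might be scored as follows. This is a minimal sketch, assuming each risk is recorded as a number and each category has a weight that you choose for your own project; the enum, class and method names are hypothetical, and a spreadsheet formula would do the same job.

```java
import java.util.Map;

// Illustrative sketch only: RiskScore and its members are hypothetical names.
final class RiskScore {
    private RiskScore() {}

    /** The risk categories named in the text. */
    enum Category { PRODUCT, PERFORMANCE, SAFETY, SECURITY, MISTAKES }

    /**
     * Sums risk * weight over all categories for one artefact.
     * A missing risk counts as 0 and a missing weight as 1 (both assumptions).
     */
    static double weightedScore(Map<Category, Double> risks,
                                Map<Category, Double> weights) {
        double total = 0.0;
        for (Category category : Category.values()) {
            double risk = risks.getOrDefault(category, 0.0);
            double weight = weights.getOrDefault(category, 1.0);
            total += risk * weight;
        }
        return total;
    }
}
```

Keeping the weights in a separate map means they can be tuned for the whole project without touching the per-artefact entries.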

You then create a mitigation table that, again for every artefact, lists the mitigation activities that have been carried out against it: system tests, code reviews, number of unit test cases and, most importantly, the number of mistakes found so far. You can weight the different types of testing according to their degree of efficacy, which means that code reviews normally carry the highest weight. I also usually use the square root of the number of bugs found, as that limits the effect of a large bug count.
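A mitigation score along these lines could be computed as below. The weights are invented for illustration; the only constraints taken from the text are that code reviews carry the highest weight and that the number of bugs found enters through its square root. All names are hypothetical.

```java
// Illustrative sketch only: the weights are made-up example values.
final class MitigationScore {
    private MitigationScore() {}

    static final double SYSTEM_TEST_WEIGHT = 1.0;
    static final double CODE_REVIEW_WEIGHT = 3.0;   // highest, as suggested in the text
    static final double UNIT_TEST_CASE_WEIGHT = 0.5;
    static final double BUGS_FOUND_WEIGHT = 1.0;

    /** Weighted sum of mitigation activities for one artefact. */
    static double score(int systemTests, int codeReviews,
                        int unitTestCases, int bugsFound) {
        if (systemTests < 0 || codeReviews < 0 || unitTestCases < 0 || bugsFound < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        return systemTests * SYSTEM_TEST_WEIGHT
                + codeReviews * CODE_REVIEW_WEIGHT
                + unitTestCases * UNIT_TEST_CASE_WEIGHT
                // The square root damps the effect of a large bug count.
                + Math.sqrt(bugsFound) * BUGS_FOUND_WEIGHT;
    }
}
```

With these example weights, finding 9 bugs contributes 3 points, the same as a single code review, while finding 100 bugs contributes only 10.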

There is a third table, which relates each artefact to its development context. Elements recorded here are the parent of the artefact, who developed it (team/person) and the month in which it was developed (August and December are bad months). You can see why this table is sometimes dubbed the 'Big Brother' table. Its purpose is to identify patterns of risk, which is best performed as a human task rather than an automated one.

Your spreadsheet is already groaning, but you need one last table to pull all the others together. In it you add each artefact's risks to the average risks of its context and subtract the mitigations. In some cases you may want to do this by type so that, for example, performance risks are mitigated by performance tests. You should get a score for each artefact that evolves through the lifecycle of both the artefact and its context.

Obviously you should concentrate on getting the development right, but when you get to the point of saying 'what do I test now?' you look at the risks, context and the mitigations and pick the highest score.
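Picking the next thing to test can then be sketched as follows, assuming the simple unweighted combination of risk plus context average risk minus mitigation. The class, field and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: TestPriority and Artefact are hypothetical names.
final class TestPriority {
    private TestPriority() {}

    /** One row of the final table. */
    static final class Artefact {
        final String name;
        final double risk;         // weighted score from the risk table
        final double contextRisk;  // average risk of the artefact's context
        final double mitigation;   // score from the mitigation table

        Artefact(String name, double risk, double contextRisk, double mitigation) {
            this.name = name;
            this.risk = risk;
            this.contextRisk = contextRisk;
            this.mitigation = mitigation;
        }

        /** Risk plus context average risk, minus mitigations. */
        double score() {
            return risk + contextRisk - mitigation;
        }
    }

    /** Returns the artefact with the highest score, or empty for an empty list. */
    static Optional<Artefact> nextToTest(List<Artefact> artefacts) {
        return artefacts.stream().max(Comparator.comparingDouble(Artefact::score));
    }
}
```

Because the scores are recomputed from the tables each time, adding a code review or finding a bug changes the ranking the next time you ask.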

You should update the context data regularly. For example, if the code of one developer is buggier than that of the others, adjust the context table to raise the risk associated with all the code of that developer. Finding and fixing the discovered bugs will already have de-risked that code, but the increased awareness means that you might be best off spending extra effort testing their other artefacts.

Early testing is obviously an advantage, and iterative development processes provide well-defined cycles of all the different levels of testing. In the early iterations the emphasis will be on the project risks and high-level artefacts; the final iteration will be almost completely testing of low-level artefacts (even the tests themselves).

You should not forget that you are using a layered approach, and so before doing some unit testing you should already have a system test, a component test and a design review in place for the ancestors of that artefact. Doing this helps to ensure that what you are testing is correct.

Making Testing Easier in Java

Here are some tips for making your unit tests and code reviews easier in Java:

  • Run the tests every time you change something. If you use Ant then have a test task that gets triggered on change. Keep your old test results for reference.
  • Use XML for your test data. Hard coded tests are more difficult to maintain. An XML database is the ultimate form, but using a simple tool like XStream can do what you need. Use the XML include directive to share data between sets of test cases.
  • Check your test classes with FindBugs. FindBugs will examine the logic, which should be very simple in test classes. Once they have passed that check, your test classes can be safely used.
  • Use MockObjects or real classes with Proxy interceptors to keep the unit tests at the unit level. This insulates test classes, test data and test running against changes to other parts of the system.
  • The test class hierarchy should mimic that of the real code under test; this allows abstract classes to be tested properly.
  • For reviewing code use an integrated tool like Jupiter for Eclipse, that stores the review information with the source code.
  • Build metrics collection into your test process. Use the extensions to Checkstyle to get some metrics and use your tests to get the others. For example, try running a complex test case 1000 times to get an idea of performance, or at the end of the test serialize the object under test and see its size. You can then watch for dramatic changes in metrics.
  • Use architectural checks like Macker rules to preserve the architecture standards.
  • Use a tool like vDoclet to generate an initial set of test cases and your basic JUnit test class from the interface definition.
There are thousands of other ways to help your testing process; these are only some suggestions.
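As one example, the metrics tip (repeat a test case many times, then serialize the object under test) needs nothing beyond the standard library. The sketch below is a rough gauge rather than a proper benchmark, since it ignores JIT warm-up and garbage collection, and its class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Illustrative sketch only: TestMetrics and its methods are hypothetical names.
final class TestMetrics {
    private TestMetrics() {}

    /** Runs the test case the given number of times; returns mean nanoseconds per run. */
    static long averageNanos(Runnable testCase, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            testCase.run();
        }
        return (System.nanoTime() - start) / runs;
    }

    /** Size in bytes of the object when written with standard Java serialization. */
    static int serializedSize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Recording both figures alongside each test run makes a sudden jump easy to spot; the absolute values matter less than the change between builds.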

Conclusion

In this sense the table of risks represents the project pig, searching for those elusive mushrooms of bugs lying just below the surface, ready for the tester to dig out. The project manager, together with the context table, is the pig's handler, who has a good idea which woodland to go and search in.

Making testing more effective is a combination of testing what needs testing and testing more. You use the risks to define what needs testing, and you make testing easier so that you can do more of it.

Copyright © Hugh Reid, Creative Commons License
This work is licensed under a Creative Commons License.