Monday, September 10, 2007

Taking Exception with Exceptions

Exceptions in Java are thrown when an abnormal condition arises. Exceptions not handled by a method are passed (via the throws keyword) to its parent. More often that not, test suites focus on the normal behavior of a method, and skip over the cases where the unhandled exceptions are thrown. A comprehensive unit testing effort should test each error condition separately. Thus, fewer thrown exceptions means fewer individual tests will be needed.

Consider a method with this signature:

public void foo() throws Exception1, Exception2, Exception3

What if there's a bug in this method that causes Exception1 to be thrown instead of Exception2? In order to fully test this method, our test suite needs to include at least three JUnit tests to confirm that the proper exception is thrown when the corresponding abnormal condition arises.

Even if the correct type of exception is thrown, it may be the case that all three exception types are handled the same way, or it may be that some of the exceptions can be handled by the method itself. If we can remove even one of the thrown exception types, we'll reduce our testing effort by one-third. There is no shortcut for reducing the number of thrown exceptions, of course -- indeed, each distinct exception may be necessary -- but by keeping in mind the number of thrown exceptions as you code, you may be able to streamline your testing effort down the road.

Labels: , , ,

Monday, July 23, 2007

Seven Metrics to Improve Your Unit Testing

Unit testing is tough, regardless of manual or automated, test-first or test-after. With any of the approaches, there are certain metrics to pay attention to while you code. Staying within acceptable levels means your code will be much more testable, whether you unit test or not!

  • Parameters - Each parameter added to a method's signature increases the number of possible ways that method can be invoked, eventually creating an unmanageable combinatorial problem. By limiting the number of parameters to three or fewer, you can easily increase the testability of your code.
  • Exceptions - Each thrown exception represents a distinct error condition that must be tested, and error conditions are very hard to test. Keeping the number of thrown exceptions to three or less greatly simplifies your testing effort.
  • Cyclomatic Complexity - Cyclomatic Complexity (CC) is a measure of the decision logic in a method. The more decisions your method has, the more ways it will behave. Keeping the number of CC paths to 10 or less greatly reduces the risk of defects as well as increases the overall testability of your code.
  • Lines of Code - One of the oldest and simplest metrics used to measure a method's testability. A shorter method is easier to understand and test.
  • Paths - This metric looks at distinct data paths. Data paths are paths that link a data element's definition to its use. In other words, if you define a data element, you should use it, and if you use it, you should test it. Data paths often overlap with Cyclomatic Complexity paths.
  • Static Invocations - Methods under test invoke methods from external classes, and these external classes can usually be mocked or dummied up to behave exactly as required for an individual unit test. Static invocations of external methods often have an unchangeable behavior that thwarts even the best testing effort. Avoid this unnecessary coupling to external classes, and keep your external static invocations to a minimum.
  • Anonymous Classes - Anonymous classes make unit testing difficult because they exist only within the method under test, and cannot themselves be unit tested. Although anonymous classes are convenient (especially for GUI programming), their lack of testability and mockability can hinder your testing effort.

Labels: , ,

Wednesday, May 02, 2007

Define-Use (DU) Paths and Unit Testing

Yesterday, I blogged about the use of cyclomatic complexity (CC) and unit testing. Cyclomatic complexity is one way of looking at the paths. Specifically, CC measures the control flow within a method.

Another way of looking at paths is to analyze a methods' data flow. A well researched metric for this analysis is called 'Define Use', or DU. A simplistic definition of DU is that if a variable is defined, it should be used, and if it is used, it should be tested.

Specifically, a DU path for a variable is a path from the defining node to the usage node, thus the "flow of data". DU paths are very good at identifying problems because the error usually occurs between the definition and usage.

CC, on the other hand, only looks at control flow. A DU path has a better chance of being executed compared to a CC path, but there are also other factors that play into that as well. Which one is better for unit testing? My personal opinion is a hybrid approach, one that looks at the linear combination of DU and CC paths.

Take a look at the same example used in the last blog:



We know that total paths = 4 and CC = 3, but what about DU? And better yet, what about trying to combine DU and CC? Would that combination benefit you? Let's take a look:

Path 1: TT (DU & CC)
Path 2: FT (DU & CC)
Path 3: TF (DU & CC)
Path 4: FF (DU)

In this example, there are also 4 DU paths. If you only test the 3 CC paths, you would end up with at most 66% coverage (path 1 is unrealizable). If you test the DU & CC combination, you would have 75% coverage.

What do you think?

Labels: , ,

Sunday, April 29, 2007

Cyclomatic Complexity and Unit Testing

I follow both Andrew Binstock and Andy Glover from Stelligent and have much respect for them both and would like to add to a blog that was posted.

I like their approach to using metrics to figure out the number of unit tests needed for a method. Actually, metrics should be used to determine the minimal number of unit tests. Cyclomatic complexity (CC) can be used for this, but so can other metrics like Define-Use. For those who don't know what CC is, it is defined as the minimal number of linearly independent paths within a unit of code (like a method).

The blog above provides a correlation between the number of CC paths and the number of unit tests. I think any unit test is better than no unit test, but I believe their correlation only applies if you don't know what the CC paths are.

For full disclosure, I am a co-founder of a product company (www.codign.com). The products are Eclipse plug-ins that design unit tests based on Cyclomatic Complexity AND Define-Use. We have a new release coming out very soon, which takes a very interesting spin on commodity metrics, but I'll blog about that later :)

Having worked for Tom McCabe for many years (he came up with Cyclomatic Complexity), I understand the benefits and pitfalls to this metric. The calculation is very simple: CC equals the number of decisions + 1. The proof is a bit more complicated :) , but that's the gist of it.

So, for example, a method with 2 decisions has 4 total paths and 3 CC paths. Let's take this example:



The questions I hear are, what are the three CC paths? Shouldn't you try to test them all? Can you test them all?

To figure out what the three CC paths are, you have to define your basis path (or path #1). This is very much a judgment call - it can be the longest path, the shortest path, the happy path, and so on. For us at Codign, our Path 1 (aka our Basis Path) is the path that executes all the true branches in a method. From there, or from any basis path, you simply work your way down by flipping decisions.

With the example above, there are four TOTAL paths:

Path 1: TT (both if statements execute 'TRUE')
Path 2: TF (first if statement is true, second is false)
Path 3: FT (first if statement is false, second is true)
Path 4: FF (first if statement is false, second is false)

There are 2 sets of CC paths:
Set 1
Path 1: TT
Path 2: FT
Path 3: TF

Set 2
Path 1: FF
Path 2: TF
Path 3: FT

Look at the code again and you'll notice that Path 1 from Set 1 (TT) cannot be tested (it is unrealizable). But why not test the other two paths? Given this information, is the effort required to test the other remaining basis paths that difficult?

As an aside, ever measure branch coverage? The above example can be tested with two tests (TF, FT) which would produce 100% branch coverage.

My point is this ... don't take CC at face value. I don't agree that, for example, if you have 20 CC paths you should only write 10 unit tests. I believe that, if a CC path is testable, you should test it.

But like I said earlier, not all CC paths are realizable, so is there a better way to identify the minimal number of unit tests? Ones that are realizable and measurable? There is, but I will blog about that later. :)

Labels: , , ,