Wednesday, May 02, 2007

Define-Use (DU) Paths and Unit Testing

Yesterday, I blogged about the use of cyclomatic complexity (CC) and unit testing. Cyclomatic complexity is one way of looking at the paths. Specifically, CC measures the control flow within a method.

Another way of looking at paths is to analyze a methods' data flow. A well researched metric for this analysis is called 'Define Use', or DU. A simplistic definition of DU is that if a variable is defined, it should be used, and if it is used, it should be tested.

Specifically, a DU path for a variable is a path from the defining node to the usage node, thus the "flow of data". DU paths are very good at identifying problems because the error usually occurs between the definition and usage.

CC, on the other hand, only looks at control flow. A DU path has a better chance of being executed compared to a CC path, but there are also other factors that play into that as well. Which one is better for unit testing? My personal opinion is a hybrid approach, one that looks at the linear combination of DU and CC paths.

Take a look at the same example used in the last blog:



We know that total paths = 4 and CC = 3, but what about DU? And better yet, what about trying to combine DU and CC? Would that combination benefit you? Let's take a look:

Path 1: TT (DU & CC)
Path 2: FT (DU & CC)
Path 3: TF (DU & CC)
Path 4: FF (DU)

In this example, there are also 4 DU paths. If you only test the 3 CC paths, you would end up with at most 66% coverage (path 1 is unrealizable). If you test the DU & CC combination, you would have 75% coverage.

What do you think?

Labels: , ,

Sunday, April 29, 2007

Cyclomatic Complexity and Unit Testing

I follow both Andrew Binstock and Andy Glover from Stelligent and have much respect for them both and would like to add to a blog that was posted.

I like their approach to using metrics to figure out the number of unit tests needed for a method. Actually, metrics should be used to determine the minimal number of unit tests. Cyclomatic complexity (CC) can be used for this, but so can other metrics like Define-Use. For those who don't know what CC is, it is defined as the minimal number of linearly independent paths within a unit of code (like a method).

The blog above provides a correlation between the number of CC paths and the number of unit tests. I think any unit test is better than no unit test, but I believe their correlation only applies if you don't know what the CC paths are.

For full disclosure, I am a co-founder of a product company (www.codign.com). The products are Eclipse plug-ins that design unit tests based on Cyclomatic Complexity AND Define-Use. We have a new release coming out very soon, which takes a very interesting spin on commodity metrics, but I'll blog about that later :)

Having worked for Tom McCabe for many years (he came up with Cyclomatic Complexity), I understand the benefits and pitfalls to this metric. The calculation is very simple: CC equals the number of decisions + 1. The proof is a bit more complicated :) , but that's the gist of it.

So, for example, a method with 2 decisions has 4 total paths and 3 CC paths. Let's take this example:



The questions I hear are, what are the three CC paths? Shouldn't you try to test them all? Can you test them all?

To figure out what the three CC paths are, you have to define your basis path (or path #1). This is very much a judgment call - it can be the longest path, the shortest path, the happy path, and so on. For us at Codign, our Path 1 (aka our Basis Path) is the path that executes all the true branches in a method. From there, or from any basis path, you simply work your way down by flipping decisions.

With the example above, there are four TOTAL paths:

Path 1: TT (both if statements execute 'TRUE')
Path 2: TF (first if statement is true, second is false)
Path 3: FT (first if statement is false, second is true)
Path 4: FF (first if statement is false, second is false)

There are 2 sets of CC paths:
Set 1
Path 1: TT
Path 2: FT
Path 3: TF

Set 2
Path 1: FF
Path 2: TF
Path 3: FT

Look at the code again and you'll notice that Path 1 from Set 1 (TT) cannot be tested (it is unrealizable). But why not test the other two paths? Given this information, is the effort required to test the other remaining basis paths that difficult?

As an aside, ever measure branch coverage? The above example can be tested with two tests (TF, FT) which would produce 100% branch coverage.

My point is this ... don't take CC at face value. I don't agree that, for example, if you have 20 CC paths you should only write 10 unit tests. I believe that, if a CC path is testable, you should test it.

But like I said earlier, not all CC paths are realizable, so is there a better way to identify the minimal number of unit tests? Ones that are realizable and measurable? There is, but I will blog about that later. :)

Labels: , , ,