Article: Avoiding 5 Common Pitfalls in Unit Testing

Llewellyn Falco and I recently wrote an article for DevelopMentor’s Developments newsletter entitled Avoiding 5 Common Pitfalls in Unit Testing.

You can read it at the DevelopMentor website:

http://www.develop.com/testingpitfalls

I’ve republished it here for my readers. Enjoy!

[Update: We have also done a webcast demonstrating some of these ideas, which we wrote up here:

http://www.michaelckennedy.net/blog/2009/10/28/TDDInvadesSpaceInvaders.aspx]


Avoiding 5 Common Pitfalls in Unit Testing

by Llewellyn Falco and Michael Kennedy

When I started out with unit tests, I was enthralled with the promise of ease and security that they would bring to my projects. In practice, however, the theory of sustainable software through unit tests started to break down. This difficulty continued to build up, until I finally threw my head back in anger and declared that

“Unit Tests have become more trouble than they are worth.”

So we stopped. Not all at once, but over the months our unit tests died a quiet death. When tests stopped working, we just ignored them. When new features were requested, they were developed without unit tests. At first, it seemed great. We were able to move without the baggage of maintaining the old tests! But soon all the original problems of having a system without tests came back to us. Things kept breaking, and deadlines were increasingly pushed back. Releases came with an extraordinary amount of stress, late nights, and weekends. The final straw came when we were forced to rush out an immediate update, and ended up taking down the company for 2 days straight. Our new motto became:

“Unit Testing: you’re damned if you do, you’re damned if you don’t.”

In the end, we decided that despite the hardship caused by maintaining unit tests, it just wasn’t feasible to operate without them. So we started down the road of re-incorporating testing into our software development process. As the months went by, however, we discovered that the hardships we remembered had not returned. Looking back, we realized that we had made many mistakes the first time around. The second time around we were smarter. So that you, too, can enjoy the benefits of unit tests, here are the 5 major pitfalls we encountered the first time around, and how you can avoid them.

Pitfall #1: Tests are hard to maintain.

Because tests were only there to service and support the production code, they became second-class citizens. We would spend time carefully choosing method names, refactoring our code to keep our classes and methods small, and so on. But we never applied these same principles to our test code.

As a side-effect of adding back the old tests, we reviewed and cleaned them up with the same level of scrutiny we gave to our “real” code. Suddenly the tests were easier to maintain. While this should not be a surprise to anyone, it wasn’t until this moment that we realized why they had been so hard to maintain in the first place:

Our tests were hard to maintain because we weren’t maintaining them.

Solution: Going forward, we expect the same quality of code (or higher) in the unit tests as we do for our production code. That means

  • We remove duplication
  • We carefully consider method names
  • We create convenience functions for testing features
  • We keep our methods short
  • We code-review our unit tests
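To show what those rules look like in practice, here is a minimal sketch of a test fixture held to that standard. The Account class and its methods are hypothetical, invented purely for this example; the point is the short, well-named tests and the convenience function that keeps the setup in exactly one place.

```csharp
// Hypothetical example -- Account, Deposit, and Withdraw are invented
// for illustration; only the discipline applied to the tests is the point.
public class Account
{
    public decimal Balance { get; private set; }
    public void Deposit(decimal amount)  { Balance += amount; }
    public void Withdraw(decimal amount) { Balance -= amount; }
}

[TestFixture]
public class AccountQualityTests
{
    [Test]
    public void DepositIncreasesBalance()
    {
        Account account = CreateAccountWithBalance( 100 );
        account.Deposit( 25 );
        Assert.AreEqual( 125, account.Balance );
    }

    [Test]
    public void WithdrawalDecreasesBalance()
    {
        Account account = CreateAccountWithBalance( 100 );
        account.Withdraw( 40 );
        Assert.AreEqual( 60, account.Balance );
    }

    // Convenience function: the duplicated setup lives in one place,
    // so a change to Account construction touches one method, not every test.
    private Account CreateAccountWithBalance(decimal balance)
    {
        Account account = new Account();
        account.Deposit( balance );
        return account;
    }
}
```

Reviewing a fixture like this takes seconds, which is exactly what makes code-reviewing tests sustainable.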

Pitfall #2: Tests are a lot of work to write.

We found that in order to test even simple things we would have to write lots of code to set up and execute the scenario. Even something as simple as “create a new user, and receive a welcome email” would turn into 40-50 lines of step-by-step instructions.

Not only was this a pain to write the first time, it became a nightmare to maintain. Little changes meant re-reading those functions to determine whether the test was failing because we broke something, or simply because we had changed something. Once that was discovered, we would then have to update the now out-of-sync test code.

The solution we actually found may surprise you. We found that writing out our tests in English and then translating each line of English into one line of code naturally created the appropriate levels of abstraction and readability.

For example, let’s consider testing the following scenario:

Who are you receiving the most email from?

  1. Create you – the user
  2. Mike sends you 3 emails
  3. Mary sends you 4 emails
  4. Joan sends you 2 emails
  5. Verify your greatest “emailer” is Mary

This will naturally lead us to write the following test method and helper method:

[TestFixture]
public class AccountTests
{
    private MockMailServer mockMailServer = new MockMailServer();

    [Test]
    public void WhoAreYouReceivingTheMostEmailFrom()
    {
        User you = User.CreateNew( "you" );
        User mike = SendEmailHelper( CreateUser( "mike" ), you, 3 );
        User mary = SendEmailHelper( CreateUser( "mary" ), you, 4 );
        User joan = SendEmailHelper( CreateUser( "joan" ), you, 2 );

        Assert.AreEqual( mary, you.GetGreatestEmailer() );
    }

    private User SendEmailHelper(User from, User to, int quantity)
    {
        for ( int i = 0; i < quantity; i++ )
        {
            EMail mail = new EMail()
            {
                To = to,
                From = from,
                Body = "Sample",
                Subject = "test"
            };

            mail.SetFormat( Formats.Html );
            mockMailServer.Send( mail );
        }
        return from;
    }
}

Notice that to a programmer, the lines of code in the WhoAreYouReceivingTheMostEmailFrom test are as easy to read as the lines of English were. We were naturally motivated to create the “SendEmailHelper” function because it was required by the one-to-one correlation between the lines of English and the lines of test code. Without that helper, our test would have become an unreadable rat’s nest. This also naturally removes some duplication, increases maintainability, and allows for some reuse of the test convenience functions. This won’t be the only test that requires us to send email; for example, we may want to test that we can find out to whom you sent the most email.

Let’s compare this to how our tests would look if we had just hacked out the scenario:

[Test]
  public void WhoAreYouReceivingTheMostEmailFrom()
  {
      User you = User.CreateNew("you");
      User mike = CreateUser("mike");
      for (int i = 0; i < 3; i++)
      {
          EMail mail = new EMail{To = you,From = mike, Body = "Sample", Subject = "test"};
          mail.SetFormat(Formats.Html);
          mockMailServer.Send(mail);
      }

      User mary = CreateUser("mary");
      for (int i = 0; i < 4; i++)
      {
          EMail mail1 = new EMail{To = you,From = mary,Body = "Sample",Subject = "test"};
          mail1.SetFormat(Formats.Html);
          mockMailServer.Send(mail1);
      }

      User joan = CreateUser("joan");
      for (int i = 0; i < 6; i++)
      {
          EMail mail2 = new EMail {To = you, From = joan,Body = "Sample",Subject = "test"};
          mail2.SetFormat(Formats.Html);
          mockMailServer.Send(mail2);
      }

      Assert.AreEqual(mary, you.GetGreatestEmailer());
  }

Because we wrote the first version in English, it’s also easier to detect a mistake. You may have noticed that the second example has a typo in Joan’s loop (it sends 6 emails instead of 2), making Joan the biggest emailer. In general, long methods have the disadvantage of obscuring intent. Unfortunately, the ‘follow a script’ aspect of testing lends itself to writing long methods. By writing the tests in English and then doing a one-to-one conversion to code we can counter this vulnerability.

Write the tests in English before you code them.

Pitfall #3: Adding a new feature breaks a lot of tests that I then need to adjust.

I always dreaded adding new features because I knew it meant the existing tests were going to complain about the changes. It seemed like the tests themselves were resisting change to my system, rather than supporting it. As I made changes and the tests broke, I was always trying to figure out if those changes were “expected” because of the new feature, or unintended bugs I had introduced into my software.

Nowadays, we always prep the system for the new feature. This allows us to isolate ‘expected’ changes from unintended bugs. Furthermore, once we have finished prepping for the new feature, we find it extremely easy to actually add that new feature. Best of all, if the unit test breaks now, we know it’s because of unexpected side effects of our changes.

This ‘prepping’ period falls under the title of ‘refactoring’ and requires the simple rule that during this stage you do not change the behavior, only the implementation. This sounds straightforward and simple, but in practice it requires a great deal of discipline.

Personally, I still find it a challenge to NOT fix a bug discovered during refactoring. I have to force myself to leave it alone and wait until I have finished refactoring before changing (fixing) this behavior, but this discipline has paid off many, many times.

During this period, the support provided by your unit test suite really shines. Those tests allow me to rework my code with confidence. Afterwards the architecture in place has been custom designed to support the addition of this particular new feature, thus making its implementation quite straightforward. In our experience the ‘prepping’ work tends to actually take more time than we spend adding the actual feature itself, but the total time to implement is much less.
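To make the two phases concrete, here is a purely hypothetical sketch. Suppose reports are currently formatted as HTML inline, and the new feature is PDF output. None of these class names come from our system; they only illustrate the shape of the work.

```csharp
// Phase 1 -- prepping (pure refactoring, behavior unchanged):
// extract the existing inline formatting behind an interface.
public class ReportData { /* ... */ }

public interface IReportFormatter
{
    string Format(ReportData data);
}

public class HtmlFormatter : IReportFormatter
{
    public string Format(ReportData data)
    {
        // Exactly the code that used to live inline -- the existing
        // unit tests must still pass, untouched, after this step.
        return "<html>...</html>";
    }
}

// Phase 2 -- the new feature is now a small, isolated addition,
// and any test that breaks here is a genuine unintended side effect.
public class PdfFormatter : IReportFormatter
{
    public string Format(ReportData data)
    {
        return "%PDF...";
    }
}
```

Notice that by the time Phase 2 starts, the architecture has already been shaped to accept the feature, which is why the feature itself ends up being the quick part.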

By splitting the work into two phases, we emphasize that the unit tests are supporting the existing system, allowing its architecture to evolve so that extending it does not become increasingly difficult.

Before, I would ask myself “How can I add this new feature?” Now I ask, “How can I make it so this new feature will be easy to add?”

Prep the system for the new feature first.

Pitfall #4: When I change something a whole bunch of tests break even though I haven’t broken the system.

There are many ways to solve the same problem. In the past, we tended to test a specific implementation of a solution instead of testing that we had a solution. Because we were focused on the specifics of the implementation, changes to our solutions kept breaking our tests, even though we still had a valid solution. Moreover, because the tests were closely tied to the implementation, rediscovering the intent and separating it out in the tests became cumbersome. As we became more proficient at writing tests in English, we were able to create unit tests that described the expected behavior. This conveyed a higher level of intent and made the tests much less brittle.

Let’s look at an example:

[Test]
  public void TestGatewayCallSuccessful()
  {
      var gateway = new Gateway {Mask = "ExampleCode.*"};
      var environment = new Dictionary<string, string>();
      environment.Add("path", "ExampleCode.HelloWorld");
      string result = gateway.ExecuteRequest(environment);
      Assert.IsTrue(result.Contains("Hello World"));
  }

  [Test]
  public void TestGatewayBlocksInvalidMasks()
  {
      Assert.IsFalse(Gateway.IsValidForMask(
          "Example.*", "ExampleCode.HelloWorld"));

      Assert.IsFalse(Gateway.IsValidForMask(
              "ExampleCode.*.Extras.*", "ExampleCode.HelloWorld"));

      Assert.IsTrue(Gateway.IsValidForMask(
              "ExampleCode.*", "ExampleCode.HelloWorld"));
  }

In this example, we wrote our second test, TestGatewayBlocksInvalidMasks, so we could easily test a few examples to make sure our implementation was correct. In doing so we exposed a method, IsValidForMask, which was an implementation detail and was only made public in order to make testing easy and intentional. We did this because actually executing something to get the failure was much more involved, as evidenced by the first test (TestGatewayCallSuccessful).

Let’s take a look at the specific solution we’ve come up with:

public class Gateway : IRunner
  {
      public string Mask { get; set; }

      public String ExecuteRequest(Dictionary<string, string> environment)
      {
          string path = environment["path"];
          AssertValidClass(path);
          IRunner instance =
                (IRunner)Activator.CreateInstance(Type.GetType(path));

          return instance.ExecuteRequest(environment);
       }

       private void AssertValidClass(string path)
       {
          if (!IsValidForMask(Mask, path))
          {
              throw new Exception(String.Format(
                    "Invalid Path '{0}' for Mask '{1}'", path, Mask));
          }
       }

       public static bool IsValidForMask(String mask, String path)
       {
           mask = mask.Replace(".", "\\.").Replace("*", ".*");
           Regex regex = new Regex(mask);

           return regex.IsMatch(path);
       }
 }

As we can see, even though we only create this gateway once, each time a call to ExecuteRequest is made we have to rebuild the regular expression (inside IsValidForMask). It would be nice to do this just once. Let’s take a look at a more efficient solution:

public class Gateway : IRunner
  {
      private string mask;
      private Regex regex;

      public string Mask
      {
         get { return mask; }
         set
         {
            mask = value;
            regex = new Regex( mask.Replace( ".", "\\." ).Replace( "*", ".*" ) );
         }
      }

      public String ExecuteRequest(Dictionary<string, string> environment)
      {
           string path = environment["path"];
           AssertValidClass(path);
           IRunner instance = (IRunner) Activator.CreateInstance(Type.GetType(path));

           return instance.ExecuteRequest(environment);
       }

       private void AssertValidClass(string path)
       {
           if (!regex.IsMatch(path))
           {
               throw new Exception(String.Format("Invalid Path '{0}' for Mask '{1}'", path, Mask));
           }
       }
 }

Unfortunately this refactoring breaks the first set of tests. Notice that the function we are calling no longer even exists. However, let’s look at what happens if we write our tests for the behavior rather than the implementation:

[TestFixture]
  public class GatewayBehaviorTests
  {
     [Test]
     public void TestGatewayCallSuccessful()
     {
         string result = Run("ExampleCode.*", "ExampleCode.HelloWorld");
         Assert.IsTrue(result.Contains("Hello World"));
      }

      [Test]
      public void TestGatewayBlocksInvalidMasks()
      {
          AssertValidForMask(false, "Example.*", "ExampleCode.HelloWorld");
          AssertValidForMask(false, "ExampleCode.*.Extras.*", "ExampleCode.HelloWorld");
          AssertValidForMask(true, "ExampleCode.*", "ExampleCode.HelloWorld");
      }

      private static String Run(string mask, string path)
      {
          var gateway = new Gateway {Mask = mask};
          var environment = new Dictionary<string, string>();
          environment.Add("path", path);
          string result = gateway.ExecuteRequest(environment);
          return result;
      }

      private void AssertValidForMask(
          bool validExpected, string mask, string path)
      {
          Exception found = null;
          try
          {
              Run(mask, path);
          }
          catch (Exception ex)
          {
              found = ex;
          }
          Assert.AreEqual(validExpected, found == null);
      }
}

There are a few things to notice now:

  • This test not only works for the new code, but the old code as well. This is because the behavior has not changed, just the implementation.
  • We have not sacrificed readability or clarity of intent. In fact the first test TestGatewayCallSuccessful has actually gained readability.
  • Helper methods have been introduced in the unit test. We find that this is a natural side effect of writing tests for readability and intention.

In the end, we realized there is a particular code smell for this problem.

If a different implementation of a solution requires different tests, you are testing to the wrong level.

Pitfall #5: There are just too many possibilities to test.

When we first began unit testing, we felt that we had to test as many inputs as possible because we believed the purpose of the unit tests was to ensure complete quality of our code. What we have learned is that the world is not black and white, and neither is testing. It is not the case that we either have verified code or unverified code. There are levels of protection. In fact there is a level at which you get diminishing returns from new inputs and, surprisingly, that number is often very small.

For example, imagine the following scenarios:

Scenario 1
You have a method

public int doSomething(int a, int b) {/*...*/}

Does this method work?

Will it blow up if I run it?

On a scale of 1-10, what is your confidence level?

Confidence Level 2: In our case, our confidence started out at 2. All we know is that it compiled. Any number of things could be wrong.

Scenario 2
Now, assume you have an invocation of the method

doSomething(2,3);

When you run this, it does not crash although you have no way to check its result.

What’s your confidence level now?

Confidence Level 6: As soon as it’s been executed, our confidence jumps up to a 6. We know that most bugs come from incorrect wiring, null pointers, and so on. Now we know it’s not blowing up, but we still don’t really know that it’s working.

Scenario 3
Now imagine that you have a test

assertEquals(8, doSomething(2,3));

This test passes.

What’s your confidence level now?

Confidence Level 8: Just a single confirmation pulls us all the way up to a confidence level of 8. Notice that we are still at just one test case. A few more and we’ll be in the 9’s, but how many more cases would you need to say with absolute confidence that this works? (Hint: 2^32 * 2^32.)

Tests are like seatbelts: just because they won’t guarantee your survival in every crash doesn’t mean you shouldn’t wear them. Take the extreme case of a motorcycle helmet. It protects only a small part of your body, but it significantly improves your odds of survival if something goes wrong.

A general rule of thumb for the number of cases to test is “3 is a big number”.

  1. Test the happy path
  2. Test an edge case
  3. Test an error case, if you have one
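Here is a hypothetical sketch of that rule applied to a trivial Divide method. The Calculator class and the test names are invented for illustration (Assert.Throws assumes a reasonably recent version of NUnit):

```csharp
// Invented for illustration: a method small enough that three cases
// give us nearly all the confidence we can buy.
public static class Calculator
{
    public static int Divide(int a, int b) { return a / b; }
}

[TestFixture]
public class DivideTests
{
    [Test]
    public void HappyPath()
    {
        Assert.AreEqual( 4, Calculator.Divide( 8, 2 ) );
    }

    [Test]
    public void EdgeCase()
    {
        // Integer division truncates toward zero.
        Assert.AreEqual( -2, Calculator.Divide( -5, 2 ) );
    }

    [Test]
    public void ErrorCase()
    {
        Assert.Throws<DivideByZeroException>( () => Calculator.Divide( 1, 0 ) );
    }
}
```

Three small tests, and the method’s wiring, its main behavior, and its one failure mode are all covered.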

Start with the happy path. If you are still worried, try an edge case. Wait until a problem presents itself before you test further.

Spend your time where it counts.
