Programmer Test in eXtreme Programming

1       Introduction

The only business rule that never changes is “rules are always in the process of changing”. However, traditional software development practices, such as the waterfall approach, require business requirements to be frozen at a certain point in the software development life cycle, with the understanding that any subsequent change of requirements would incur higher cost and prolong the delivery schedule.

Agile development methodologies are largely seen as the solution to this problem by many software developers, and eXtreme Programming (XP) is the most prominent member of the family of agile development methodologies.

With first-hand working experience on an XP project and decades of programming experience, I did some research, reading and thinking on the testing aspect of XP practices. This article presents these aspects to you with the hope that they might be helpful to your software development project, or at least trigger some additional thinking and another perspective.

In chapter 2, I establish two principles of testing: testing is used to find errors rather than to prove the nonexistence of errors, and complete testing aimed at proving the correctness of a program is not possible.

In chapter 3, I transform the definition of Unit Test into that of Programmer Test.

In chapter 4, I discuss various attributes of Programmer Test. The topics include whether or not we should use stub objects, write the test first before we write the production code, and write tests for everything.

In chapter 5, I present a modified programming process for XP programming; this modified process can be used for non-XP programming as well. In this modified process, I list some guidelines and principles for programmer test and categorize test cases into 3 categories: Driver Test Case, Assumption Test Case and Bug Detector. Furthermore, I discuss moving the test case writing from the beginning of the programming activity into the programming activity itself.

Finally, I conclude the article with chapter 6, suggesting that this process promises to produce high quality test cases at lower cost; it also promises positive attitudes among programmers, because it does not aim for 100% correctness when we know that is not possible.

2       Principles of Testing

Glenford J. Myers in his book The Art of Software Testing [1] said:

“Testing is the process of executing a program with the intent of finding errors.”

On the same topic, Myers listed a few examples of wrong definitions of testing: “Testing is the process of demonstrating that errors are not present,” “The purpose of testing is to show that a program performs its intended functions correctly,” and “Testing is the process of establishing confidence that a program does what it is supposed to do.”

This definition was established on the assumption that the program contains errors (a valid assumption for almost any commercially valuable program) and that we want to find as many of these errors as possible and remove them.

In the same book, Glenford J. Myers continued on his definition of testing:

Another way of reinforcing the proper definition of testing is to analyze the use of the words "successful" and "unsuccessful," in particular their use by project managers in categorizing the results of test cases. Most project managers call a test case that did not find an error a "successful test run," whereas a test that discovers a new error is usually called "unsuccessful." This is often a sign that the wrong definition of testing is being used, for the word "successful" denotes an achievement and the word "unsuccessful" denotes something undesirable or disappointing. However, since a test case that does not find an error is largely a waste of time and money, the descriptor "successful" seems inappropriate.

The final statement we can make after we are done with testing would more appropriately be phrased as “the program passed all test cases that we currently have”. I emphasize that this statement is different from “after the testing, we conclude that the program does what it was designed to do, without defect”. Why can we not say so?

Bill Hetzel said in his book The Complete Guide to Software Testing [2]: “Complete Testing Is Not Possible”. To further explain this testing principle, he said:

Many programmers seem to believe that they can and do test their programs "thoroughly." When asking about testing practices, I frequently hear expressions such as the following: "I'll stop testing when I am sure it works"; "We'll implement the system as soon as all the errors are corrected." Such expressions assume that testing can be "finished," that one can be certain that all defects are removed, or that one can become totally confident of success. This is an incorrect view. We cannot hope to achieve complete testing, and, as we shall see, the reasons are both practical limitations and theoretical impossibility.

3       Defects vs. Bugs and Customer Test vs. Programmer Test


In the book titled “Refactoring: Improving the Design of Existing Code” [3], Martin Fowler described Unit Test as follows:

The tests I am talking about are unit tests. I write them to improve my productivity as a programmer… Unit tests are highly localized; each test class works within a single package. It tests the interfaces to other packages, but beyond that it assumes the rest just works.

From this description, we can almost give unit test another name: Programmer Test. Personally, I prefer the term Programmer Test, as everyone knows who the programmers are (those who write programs, like you and me), while Unit Test may cause confusion, as we sometimes have different definitions of a unit. How big is a unit? Can it be a method, a class or a package? It could be any of these, in certain circumstances.


The term Programmer Test was also introduced by James W. Newkirk and Alexei A. Vorontsov in their book Test-Driven Development in Microsoft .NET [5]:

Programmer Tests

If we are talking about the programmers writing the code, it is useful to think of these tests as focused on technology, so you can refer to them as technology facing or programmer tests. Some people refer to this type of test as a unit test; we are specifically not calling it that because unit testing's purpose is much different from what we are doing in TDD (Test-Driven Development). Because the audience for these tests is programmers, it is critically important that the tests be written in a language that the programmers understand. (It's even better if they are in the same language as the production code.) If the tests are in the same language, the programmer doesn't have to change paradigms to write tests.

Customer Tests

Tests that customers use to specify the functionality they need and how it should work can be referred to as business facing or customer tests. These types of tests are often referred to as acceptance tests or functional tests. As in the case of programmer tests, the tests need to be written in a language that the customer understands. The goal is to empower the customer to be able to write tests. The customer tests in this book are written using a tool named Framework for Integrated Test.

From this we can see that it is more appropriate to use the term Programmer Test instead of Unit Test in the world of XP programming.

William E. Perry, in his book titled “Effective Methods for Software Testing” [4], gave the definition of a defect as follows:

A defect is a variance from a desired product attribute. Testers look for defects. There are two categories of defects:

1. Variance from product specifications. The product built varies from the product specified. For example, the specifications may say that A is to be added to B to produce C. If the algorithm in the built product varies from that specification, it is considered to be defective.

2. Variance from customer/user expectation. This variance is something that the user wanted that is not in the built product, but also was not specified to be included in the built product. The missing piece may be a specification or requirement, or the method by which the requirement was implemented may be unsatisfactory.

Defects generally fall into one of the following three categories:

1. Wrong. The specifications have been implemented incorrectly. This defect is a variance from customer/user specification.

2. Missing. A specified or wanted requirement is not in the built product. This can be a variance from specification, an indication that the specification was not implemented, or a requirement of the customer identified during or after the product was built.

3. Extra. A requirement incorporated into the product that was not specified. This is always a variance from specifications, but may be an attribute desired by the user of the product. However, it is considered a defect.


This is a very comprehensive definition indeed; however, it is probably not as easy to apply from a programmer's perspective. After all, as programmers we write programs based on what we think we were told. If we mishear, misinterpret or misunderstand what the user requirements are, we will never be able to write a program that meets the user requirements, no matter how we as programmers test our program. To give us the right perspective as programmers, there is a need to further categorize defects. Every programmer is an expert on debugging, and since we are talking about programmer test, the word Bug naturally comes to mind. Let us differentiate bugs from defects.

In my simple definition, defects represent the discrepancies between what the user wants the software to do and what the software does. Bugs represent the discrepancies between what the programmer wants the software to do and what the software does. From this definition, one can conclude that all bugs are defects but not all defects are bugs.

Since we are talking about programmer testing, we are definitely talking about bugs rather than defects. From now on we will focus on programmer testing, bug detection and bug fixing; after all, that’s what we do as programmers.

4       Discussion on Principles of Programmer Testing

4.1    How testing is conducted: usage of stubs and mocks

Before we start on how testing is conducted, it is important to define a clear goal for programmer testing. Glenford J. Myers's definition of testing may not be comprehensive enough to cover all aspects of all the different types of testing, but it accurately describes the goal of Programmer Testing: spend as little effort as possible to identify as many bugs as possible in our program and fix them.

This may sound pretty simple. However, it is not as trivial as it may appear at first. You may get a response like this when you ask a fellow programmer to fix a bug:

“It is not my program.”

This is because in the classic programming world programmers claim ownership of the programs they wrote. If I am not the author of that program (it can be as big as a package, as small as a method), I am not going to own it; hence I am not responsible for fixing its bugs.

As the author of one Program Unit, I want to do my job in the best possible way, and I want to reduce or eliminate the impact other Program Units have on my Program Unit's test results. Essentially, I would like to draw a line between my Program Unit and other Program Units. This type of impact includes (but is not limited to) the availability of the other Program Units and the quality of the other Program Units. This is the idea behind the Isolated Test. In his book Test-Driven Development: By Example [7], Kent Beck introduced Isolated Tests as follows:

(Without isolated tests,) a huge stack of paper (failing test cases) didn’t usually mean a huge list of problems (bugs). More often it meant that one test had broken early, leaving the system in an unpredictable state for the next test.


Two aspects of Beck’s statement are worth mentioning. Let me address them in detail:

First of all, we want the execution of test cases to be independent of each other. It should not happen that Test Case A passes when run alone, and Test Case B passes when run alone, but when we run Test Case A and then Test Case B, Test Case B fails because Test Case A left some data in the system.
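
To make this concrete, here is a minimal NUnit sketch (all names are hypothetical, not from any cited book): the fixture cleans shared state in SetUp and TearDown so that no test can leave residue that breaks another.

using System.Collections.Generic;
using NUnit.Framework;

//Hypothetical stand-in for a database-backed repository: the static
//dictionary plays the role of shared state that outlives a single test.
public class AccountRepository
{
   private static readonly Dictionary<string, string> store =
      new Dictionary<string, string>();

   public void Save(string id, string name) { store[id] = name; }
   public string Load(string id)
   {
      string name;
      return store.TryGetValue(id, out name) ? name : null;
   }
   public void DeleteAll() { store.Clear(); }
}

[TestFixture]
public class AccountRepositoryFixture
{
   private AccountRepository repository;

   [SetUp]
   public void StartFromCleanState()
   {
      //every test begins with an empty store...
      repository = new AccountRepository();
      repository.DeleteAll();
   }
   [TearDown]
   public void RemoveLeftoverData()
   {
      //...and removes its own residue, so test order cannot matter
      repository.DeleteAll();
   }
   [Test]
   public void SaveThenLoad()   //"Test Case A"
   {
      repository.Save("A-001", "Petty Cash");
      Assert.AreEqual("Petty Cash", repository.Load("A-001"));
   }
   [Test]
   public void LoadMissingAccountReturnsNull()   //"Test Case B"
   {
      //without the cleanup above, this would fail whenever
      //SaveThenLoad happened to run first
      Assert.IsNull(repository.Load("A-001"));
   }
}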

Secondly, suppose Test Case B makes a call to a method of a class, and another test case, Test Case A, tests that same method. If Test Case A fails, most likely Test Case B will fail as well. This is what we call interdependence among test cases.

Some programmers do not like this type of interdependence among test cases, while other programmers like it. Both sides hold their reasons for their beliefs.

Kent Beck seems to belong to the first group. In the same book, he said:

…The main lesson I took was that tests should be able to ignore one another completely. If I had one test broken, I wanted one problem; if I had two tests broken, I wanted two problems. (Page 125)

This is a perfectly valid point, and it is every programmer’s hope. However, our programs are interdependent in nature. Take your own program as an example: when you compile a newly written program for the first time, most likely you will see a long list of compilation errors. This was especially true in the past, when we did not have good IDEs that support background syntax checking. What does a programmer do when faced with a long list of compilation errors? We look at the first few, fix them, and compile the program again; after all, it is quite fast to do so.

We do the same with automated testing. We fix a few bugs that we think are not errors caused by other bugs and run the tests again; after all, it is quite fast to do so.

If you want to achieve what Kent Beck said, you create stubs of the other Program Units and use these stubs in your unit test, even when the real program units are available and written by your fellow team members or even by yourself.

Such considerations are helpful in the classic programming world. To support them, we programmers invented the term Unit Test, i.e. a test of a single program unit. As a programmer I do not want the bugs in other Program Units to affect my test results; rather, I want to see the tests of my program unit succeed. Wrong again! According to Glenford J. Myers, if a test case does not catch a bug, most likely the test case was badly designed rather than the program perfectly written.

Two economic considerations need to be taken into account in practice:

1. As a programmer, you have to spend more time writing code: you need to write the code used for the production release, and you need to write your stubs. In general, the more code you write, the more bugs you will create; after all, programmers usually produce programs with bugs. I have seen my fellow programmers busy creating stub classes and debugging them, when sometimes the release production code was functioning just fine and the bugs were in the stubs. I have also seen a programmer spend 10 minutes writing a piece of production code (a call to a stored procedure to get a data set; very simple code, no more than 10 lines; the stored procedure also very simple, just one select statement). In order to create the stub, the programmer had to build a disconnected DataSet. He spent at least 20 minutes creating the stub and used it for his testing. I am not even addressing the fact that there might be bugs in the stub class that he would need to test and debug out. (See the sketch after this list.)

2. You prevent interface-related bugs from being detected. Also, you prevent valid changes in supporting components / methods from notifying you through the tests, should these changes not reach you via human communication.

In short, we spend more effort and detect fewer bugs.
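
To make the economics concrete, here is a minimal sketch with hypothetical names: the production method is a few straightforward lines, while the stub written solely so the test can avoid the real component is code of comparable size that can carry bugs of its own.

using NUnit.Framework;

//The seam the production code depends on (hypothetical).
public interface IRateSource
{
   decimal GetRate(string currency);
}

//Production code: a few straightforward lines.
public class PriceCalculator
{
   private readonly IRateSource rates;
   public PriceCalculator(IRateSource rates) { this.rates = rates; }

   public decimal ToLocal(decimal amount, string currency)
   {
      return amount * rates.GetRate(currency);
   }
}

//The stub: written purely for testing, roughly as long as the code
//under test, and just as capable of containing its own bugs.
public class StubRateSource : IRateSource
{
   public decimal GetRate(string currency)
   {
      if (currency == "USD") return 1.0m;
      if (currency == "EUR") return 1.2m;
      return 0m;   //a silently wrong default is a classic stub bug
   }
}

[TestFixture]
public class PriceCalculatorFixture
{
   [Test]
   public void ConvertsEuroAmount()
   {
      PriceCalculator calculator = new PriceCalculator(new StubRateSource());
      //if this fails, the bug may be in the calculator or in the stub
      Assert.AreEqual(12.0m, calculator.ToLocal(10.0m, "EUR"));
   }
}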

However, in the world of XP programming, all program units are collectively owned by every programmer in the team. There is neither your program unit nor my program unit; they are all ours, and we are all responsible for detecting bugs and fixing them. With that mind-set shift, Unit Tests become irrelevant and stubs are unnecessary in most cases. If one test case can detect bugs in component A as well as bugs in component B, I am happier to detect bugs in both components with that one test case than to detect bugs in only one of them, since the goal of Programmer Test is to spend the least effort possible to identify as many bugs as possible in our program and fix them.

Now let us take a moment to look at how bugs are created. We programmers are all very smart, or at least we think we are; otherwise we would not have become programmers. But why, year after year, do we continually produce software products full of silly bugs?

I do not have a large amount of statistical data on this matter (I will continue to research it), but I would like to share what I have observed in over 20 years of programming experience:

1. I have had very few situations whereby I needed to write a program from ground zero, regardless of programming language. Most of my programs were created by copying something from somewhere.
2. Most of the bugs I have created are the result of copying something from somewhere and forgetting to change something.
3. Another major source of bugs is that programmers do not think thoroughly and miss a piece of logic when a program is written.


As programmers, we always test our code, and testing is not new to us at all. Coding, testing, and debugging are part of our life.

Though there are lots of tools for code generation and lots of tools to help us with debugging, we still write and debug most of our code manually. Personally, I believe it will remain this way for the near future.

How about testing? If I only need to execute a test case once, I do not mind testing manually. However, software needs programmer tests, user tests, integration tests and finally regression tests. We need to run the same test case many times. Especially in the XP programming world, the same test cases need to be executed whenever we refactor our code.

As always, if something can be done by computers, we would like to let computers do the work for us. First, computers are fast; a computer can run 100 test cases in less than a minute. Secondly, computers are consistent; there is no difference between the first execution of a test case and the 100th. Thus we write programs for these test cases and let the computer run them for us.

When Kent Beck talks about the relationship between stress and tests (automated tests), he says:

This is a positive feedback loop: the more stress you feel, the less testing you will do. The less testing you do, the more errors you will make, and the more stress you will feel. … With automated tests, when I start to feel stress, I run the tests. Tests are the programmer’s stone, transmuting fear into boredom. “No, I didn’t break anything. The tests are still green.” The more stress I feel, the more I run the tests. Running the tests immediately gives me a good feeling and reduces the number of errors I make, which further reduces the stress I feel.

I agree with most of what Kent Beck says here; however, I would like to go a step further by introducing the activity of bug fixing. Let me start by saying that we programmers always create bugs. It might be hard to accept for some very talented programmers, but it is a fact we must face. The next principle I would like to establish is: passing tests does not ensure the correctness of the program. Passing all your tests only means there are no known / anticipated bugs in your program.

The more tests we create, the more bugs will be discovered. If you go and fix all the discovered bugs (I bet you surely will!), the less stress you will feel, because you have made your program better than it was before. However, the act of testing only identifies bugs, which by itself will not make you feel less stressed; it is fixing the bugs that reduces stress! Testing is the process of transforming what we call unknown risks into known risks, and the activity of fixing bugs is the process of eliminating risks. That is what reduces the stress level!

Our tests being all green may not be a good thing all the time, as we assume there are bugs in our programs. Try to recall the last time you responded to someone with “No, I didn’t break anything, the tests are still green”. What happened before you said that? I imagine that probably something went wrong and someone came to ask you a question like “did you do it?”, or you had just refactored some segment of the code.

“All green” may have different meanings in different scenarios. When something has gone wrong with a program and our tests are still “all green”, it means we have at least one unknown bug, which is very worrisome. When you have just finished refactoring a fraction of your system, “all green” means your refactoring work did not create known / anticipated bugs, which is a good thing to see.

But here’s the more important question: how do we measure the quality of our test cases? One could write a huge number of test cases, but if these test cases are poorly designed, they will detect few bugs and be of little value for the program under test. The quality of the test cases consists of two aspects. First, how many bugs have been detected by executing these test cases. Second, how precisely these bugs are located when they are detected. One could design a test case at the UI level, and it could detect a bug in the Data Access layer. That is both good and bad: good in the sense that we know there is a bug in our program (which is better than not knowing), and bad in the sense that we do not know where it is; we need to add a few more test cases to figure out where the bug is located (which component, which class, which method, and which line).

4.2    Which comes first, test or code?


When should you write your tests: before or after you write the code that is to be tested? Kent Beck, in his book Test-Driven Development: By Example [7], defines test-driven development using the following rules:

·         Never write a single line of code unless you have a failing automated test.
·         Eliminate duplication.



In his book titled “Extreme Programming Adventures in C#” [6], Ron Jeffries says:

The XP discipline that Chet and I follow is that when we find a defect, we write a test that shows the defect and then fix it. The reasoning is that if our tests were good, there would be no defects.

It appears to me that Ron Jeffries writes code first. Once a defect (bug) is found, he writes a test to automatically detect the defect (bug), then fixes it. Finally, he runs the automated test to show the defect (bug) is fixed.

Please pay attention to his book title: it is “Extreme Programming Adventures in C#”, not “Test-Driven Development”.

I personally lean towards Ron Jeffries’ approach. As we discovered in the previous sections, complete testing is not possible. It is impossible for us to write tests that guarantee the correctness of the code we are going to write. Even if it were possible to do so, the effort spent in writing those tests would be much more than the effort spent in writing the code itself.

From my decades of programming experience, I have come to realize that no matter how good a test case is, if you let me know the details of the test case design, I can almost always put a bug into the code that still passes the test. That is because testing is instance based while programming is rule based. For example, when we are asked to program a method for Addition, it is better if we are not told how a tester will test it, so that we make our best effort to ensure it works for every case. After the method is completed, we sit back and think: since we wrote a method for Addition, it should work for all inputs. What about passing 1 and 200: will it give back 201? This is our first test case. Since we chose 1 and 200 at random, we can confidently say the method works for all positive integers. This statement may not hold if we write the test first, for the reason stated above. That is why we want to write the code before we write the test, or even better, let someone else write the test case.
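
As a minimal sketch of the Addition example (the class and test names are mine, purely illustrative):

using NUnit.Framework;

public class Calculator
{
   //written to the rule "add any two integers", not to a known test case
   public int Add(int a, int b)
   {
      return a + b;
   }
}

[TestFixture]
public class CalculatorFixture
{
   [Test]
   public void AddTwoPositiveIntegers()
   {
      //1 and 200 were chosen at random after the method was written
      Calculator calculator = new Calculator();
      Assert.AreEqual(201, calculator.Add(1, 200));
   }
}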

Following the XP principle “Write the code when you need it”, I prefer to write the test when I need it, instead of trying to think about everything that could be wrong and writing tests for it. As a matter of fact, anything can be wrong. I find Ron Jeffries’ approach more practical: “let’s deal with it when it happens” rather than “what if it happens”.

4.3    What to test and what not to test?

There are two schools of thought on what to test and what not to test. Kent Beck says in his book Test-Driven Development: By Example [7]:

·         Never write a single line of code unless you have a failing automated test.
·         Eliminate duplication.

To put it in a crystal clear way: with test-driven programming, you will write automated test cases for everything you write, no matter whether it is a method or a property; no matter whether it is private, public or protected; and no matter how big the code segment is, 5 lines or 500 lines.

In most cases, your code base for test cases will be larger than the code base for the production release code. In order to write test cases for private members, your test fixture class needs to be in the same component / project, and it is not easy to strip test code out when you make your production release. I have not seen suggestions or advice in any XP book on how to strip off this test code when it goes to production release.

However, I figured out that there are some very fine differences between test-driven programming and Extreme Programming. In his book Extreme Programming Explained [9], when Kent Beck talks about testing strategy, he says:

You should test things that might break. If code is so simple that it can’t possibly break, and you measure that the code in question doesn’t actually break in practice, then you shouldn’t write a test for it. If I told you to test absolutely everything, pretty soon you would realize that most of the tests you were writing were valueless, and if you were at all like me, you would stop writing them. “this testing stuff is for the birds”.

Testing is a bet. The bet pays off when your expectations are violated. One way a test can pay off is when a test works that you didn’t expect to work. Then you better go find out why it works, because the code is smarter than you are. Another way a test can pay off is when a test breaks when you expected it to work. In either case, you learn something. And software development is learning. The more you learn, the better you develop.

So if you could, you would only write these tests that pay off. Since you can’t know which tests would pay off (if you did then you would already know and you wouldn’t be learning anything), you write tests that might pay off. As you test, you reflect on which kinds of tests tend to pay off and which don’t and you write more of the ones that do pay off, and fewer of the ones that don’t. (P117)


This is in line with Martin Fowler’s thinking. In his book “Refactoring: Improving the Design of Existing Code” [3], Martin gave the following description when he talked about testing:

The style I follow is to look at all the things the class should do and test one of them for any condition that might cause the class to fail. This is not the same as “test every public method,” which some programmers advocate. Testing should be risk driven; remember, you are trying to find the bugs now or in the future. So I don’t test accessors that just read and write a field; they are so simple that I am not likely to find a bug there.

This is important because trying to write too many tests usually leads to not writing enough. I’ve often read books on testing, and my reaction has been to shy away from the mountain of stuff I have to do to test. This is counterproductive, because it makes you think that to test you have to do a lot of work. You get many benefits from testing even if you do only a little testing. The key is to test the areas that you are most worried about going wrong. That way you get the most benefit for your testing effort. (P 97)

  
Many XP programmers understand that you test everything, while Kent says “You should test things that might break”. It is important for all XP programmers to clearly understand that you do NOT test everything; otherwise you would write test cases forever.

Although there are some fine differences between test-driven programming and XP programming, both insist on writing the test cases before you write the code (if indeed you decide to write test cases for the code you are going to write), and both aim to use the test cases to prove the correctness of the code you are going to write.

With the understanding that proving correctness is not possible, for better efficiency and to keep programmers motivated, I would like to set a more realistic goal for test case writing: to find more than 80% of the bugs before we step into manual testing and debugging. We then write test cases together with the (manual) testing and debugging process, aiming to find the rest of the bugs in the program. In this way, we spread test case writing throughout the programming / debugging process rather than concentrating it at the beginning of the programming process.

First we set the goal of test writing as finding bugs effectively. To write test cases effectively, you write the code first, and then you write your test cases. You continue until you find it is no longer effective to keep writing test cases. This does not put stress on our programmers, as we do not aim to achieve correctness of the program with these test cases. It is quite possible that 40% of the bugs remain undiscovered in the program we wrote, but in terms of time, it is very likely that we spent less than 20% of what we would have spent had we aimed to prove the correctness of the program, and we still have the opportunity to catch the remaining bugs along the way in a more effective manner. The test cases I am talking about here are called Driver Test Cases. You use this type of test case to test your code, methods, or properties, before they are consumed by their consumers.
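
Here is a hedged sketch of a Driver Test Case (hypothetical names): the test simply drives a newly written member with a representative input before any consumer exists.

using NUnit.Framework;

public class InterestCalculator
{
   //a newly written member; no consumer calls it yet
   public decimal MonthlyInterest(decimal balance, decimal annualRate)
   {
      return balance * annualRate / 12m;
   }
}

[TestFixture]
public class InterestCalculatorFixture
{
   [Test]
   public void DriveMonthlyInterest()   //Driver Test Case
   {
      //exercise the member once with a representative input
      //before any consumer code exists
      InterestCalculator calculator = new InterestCalculator();
      Assert.AreEqual(10m, calculator.MonthlyInterest(1200m, 0.10m));
   }
}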

When a program unit (a method or property) is consumed, you write some test cases for the program unit that assert assumptions on how the program unit should function. We call this type of test case an Assumption Test Case. Realistically speaking, we can catch more than 20% of the bugs in the program this way, and these test cases are very useful when a modification is needed at a later stage of the project, as they are consumer oriented. When a method is consumed by 3 different client program calls for different purposes, we write assumption tests for each of the purposes for which the method is consumed. At a later stage, if we change the method for one of the purposes and, as a result of the change, it no longer serves the other 2 purposes, our assumption tests for the other 2 purposes will most likely notify us. This greatly reduces the risk of the change when it happens.
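
A sketch of an Assumption Test Case, with hypothetical names: the consumer records, as a test, the behavior it relies on, so that a later change made for a different consumer trips the assertion.

using NUnit.Framework;

public class NameFormatter
{
   //consumed by several client modules for different purposes
   public string Format(string first, string last)
   {
      return last + ", " + first;
   }
}

[TestFixture]
public class NameFormatterAssumptionFixture
{
   [Test]
   public void ReportModuleAssumesLastNameFirst()   //Assumption Test Case
   {
      //the (hypothetical) reporting module sorts on this output and
      //relies on "Last, First"; if Format is later changed for some
      //other caller, this failing test notifies us that the report breaks
      NameFormatter formatter = new NameFormatter();
      Assert.AreEqual("Smith, John", formatter.Format("John", "Smith"));
   }
}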

Throughout the development process, we test (manually) and debug. When we find a bug, we write a test case to identify the bug, fix it, and then run the test case again to prove the bug is fixed. We call this type of test case a Bug Detector Test. In this process, we aim to find the rest of the bugs in the program.
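
A sketch of a Bug Detector Test, assuming a hypothetical off-by-one bug found during manual testing: the test is written to fail against the buggy code and passes once the fix is in place.

using NUnit.Framework;

public class DiscountPolicy
{
   //manual testing found that an order of exactly 10 items received
   //no discount; before the fix, this comparison read "quantity > 10"
   public decimal DiscountFor(int quantity)
   {
      return quantity >= 10 ? 0.05m : 0m;
   }
}

[TestFixture]
public class DiscountPolicyFixture
{
   [Test]
   public void TenItemsEarnDiscount()   //Bug Detector Test
   {
      //written first to demonstrate the bug: it failed against the old
      //"> 10" code and passes once the fix is in; it now guards against
      //the bug ever returning
      DiscountPolicy policy = new DiscountPolicy();
      Assert.AreEqual(0.05m, policy.DiscountFor(10));
   }
}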

4.4    What do we do when a test case is broken?

The immediate reaction to a broken test could be either (a) fix the code that caused the test case to break, or (b) change the test case so that it will not break any more. However, these may not be the right responses to a broken test in all situations; it depends on what went wrong. In general, the last thing you should do is fix the code in the test case. First of all, test cases are not production code; fixing them adds no value to your production code except that it makes you feel better to see all tests pass. Secondly, a broken test case is like a ringing alarm: turning it off does not fix the cause of the alarm. We fix the test case only when we conclude that everything is fine or the problem in the production code has been fixed. One should investigate the root cause of the broken test and then take appropriate action on the broken test case. Let’s have a closer look at all the possible scenarios and suggested actions.

1. There is a bug in the program unit concerned.
Action: fix the bug in the program unit and re-run the test.

2. There is a bug in the test case design.
Action: fix the bug in the test case and re-run the tests. (We must make sure the production code is fine and that the only problem is in the test case before we make changes to the test case.)

3. A change has been made to the production code based on a new requirement, but the newly changed logic has not been implemented in the test case yet.
Action: make the necessary changes in upstream program units to make sure the new requirement is implemented fully in the production code, then change your test case to reflect the new requirement and re-run the test case.

4. A change has been made to a method (Method A) based on a new requirement for Program Unit 1. However, Method A is also used (invoked) by another program unit (Program Unit 2), and the assumption test case (Test Case A) for Program Unit 2 is broken by the changes made in Method A for Program Unit 1. This is illustrated in the following figure:

[Figure omitted: Method A is invoked by both Program Unit 1 and Program Unit 2; Test Case A asserts Program Unit 2’s assumptions about Method A.]

Action: this is a relatively complex case, but it happens all the time. When it happens, the first question we want to ask ourselves is: can Method A still be used by Program Unit 2? Sometimes we can keep it usable for Program Unit 2 by making some changes in Method A; sometimes Method A can no longer be used by Program Unit 2, and we have to write another method for Program Unit 2. One should investigate the situation and make a judgment call on this question. Once we have the answer, we will know what action to take on the broken test case.
Scenario 1: make changes in Method A so that it can be used by both Program Unit 1 and Program Unit 2, and make changes in Program Unit 1 / Program Unit 2 in the way they call Method A. Change Test Case A as well as the other affected test cases to reflect the changes just made, and re-run the tests.
Scenario 2: add another method to the class where Method A belongs, to be used by Program Unit 2. In the process of working on the new method, a driver test case (or cases) should be developed. Change Program Unit 2 so that it calls the new method instead of Method A. Detach Test Case A from Method A, attach it to the new method, make the necessary changes in Test Case A, and then re-run the tests.
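
A sketch of Scenario 2 with hypothetical names: Method A has been changed for Program Unit 1, a new method is added for Program Unit 2, and the old assumption test is detached from Method A and attached to the new method.

using NUnit.Framework;

public class TaxRules
{
   //Method A: changed for Program Unit 1 to return a tax-inclusive total
   public decimal Total(decimal net)
   {
      return net * 1.07m;
   }
   //new method added for Program Unit 2, which still needs the net total
   public decimal NetTotal(decimal net)
   {
      return net;
   }
}

[TestFixture]
public class TaxRulesFixture
{
   [Test]
   public void ProgramUnit2StillGetsNetTotal()
   {
      //formerly "Test Case A", attached to Total(); detached and
      //re-attached to the new method written for Program Unit 2
      TaxRules rules = new TaxRules();
      Assert.AreEqual(100m, rules.NetTotal(100m));
   }
}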

5       Modified Programming process in XP

Before we dive into the details of implementing programmer testing, it may be helpful to look at some guidelines.

5.1    General Guidelines

1. Do not modify your system architecture for the convenience of automated testing.
2. Do not modify production code for the purpose of automated testing.
3. Do not compile test code into the production release.
4. Do not try to fix a bug before you have an automated test that can detect the bug.

Remember: the goal of testing is to spend the least amount of effort possible to identify as many bugs as possible in our program and fix them.

1. Create a Fixture class for each class in each of the components in your solution.
2. When a bug is identified (by you or your fellow programmers), our rules of engagement require us to write an automated test case, first to show the existence of the bug and later to verify that we have fixed it.
3. Use the same language to write the automated test cases as you use to write your program.
4. Create mock-up classes only when necessary.
5. Create stub classes when the real class is not available due to project progress or environment availability, and replace them when the real classes become available.
6. Write automated test cases in the respective test classes where the code modification will be carried out.
7. If you are working on SQL stored procedures, write your automated test cases using SQL statements, and run them with a scheduled job.[1]
8. Use as much production code as possible in your automated test cases.

5.2    Programming Process

The following proposed process assumes that you, as an XP programmer, are given the task of writing a feature that crosses the Data Access component, the Business component and some User Interface layer.

When you need to add a new class to any component, you add the Fixture class to the same component at the same time. If a new class is to be named “AccountEvent”, the Fixture class is to be named “AccountEventFixture”. If you are using the NUnit suite for your programmer tests, your code will look similar to what is shown below:

using NUnit.Framework;

[TestFixture]
public class AccountEventFixture
{
   [TestFixtureSetUp]
   public void FixtureSetUpMethod()
   {
      //runs once before all tests: open the connection to the database
   }
   [TestFixtureTearDown]
   public void FixtureTearDownMethod()
   {
      //runs once after all tests: close the connection to the database
   }
   [SetUp]
   public void TestCaseSetupMethod()
   {
      //runs before each test case
   }
   [TearDown]
   public void TestCaseTearDownMethod()
   {
      //runs after each test case
   }
   [Test]
   public void CreateAccountingEvent()
   {
      //load one record using the open database connection
   }
   [Test]
   public void SaveAccountingEvent()
   {
      //save the record back using the open database connection
   }
}

Here, CreateAccountingEvent and SaveAccountingEvent are 2 test cases. The execution of this code fragment will be as follows:

FixtureSetUpMethod()
   TestCaseSetupMethod()
   CreateAccountingEvent()
   TestCaseTearDownMethod()

   TestCaseSetupMethod()
   SaveAccountingEvent()
   TestCaseTearDownMethod()
FixtureTearDownMethod()

This article does not intend to spend time on how to use NUnit. If you need more information on NUnit, you can get a copy of the book [5]; it contains a whole chapter on the NUnit test tool. If you need the NUnit software itself, you can download it from www.nunit.org.

For your information, Microsoft Visual Studio .NET 2005 implements the NUnit suite. It goes even a step further: after you complete coding your class / method, you can ask Visual Studio to generate the Fixture class and methods for you.

Once you have your classes created, it is time to code the methods and properties for the feature you are working on. When a new member (method, property) is needed, you write the shell for the new member first and then continue to work on the logic of the member.

When you are done coding the member, by your own judgment you may or may not write an automated test case for it. This test case is a driver to see your code in action, which I call Active Compiling. Normal compiling (I call it static compiling) detects syntax errors, but it does not detect reference errors for late binding, nor does it detect constant-value-related errors in calls to methods and properties (a wrong stored procedure name or wrong database table column names are typical errors in this category). Active Compiling is designed to detect these errors before the testers see them, and to make your members executable at a minimum. If your judgment leads you to the decision to create an automated test case, you do it now.
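
As a hedged sketch (hypothetical names and stored procedure): static compiling accepts any stored procedure name, but a driver test that actually executes the data-access member surfaces a misspelled name immediately.

using System.Data;
using System.Data.SqlClient;
using NUnit.Framework;

public class AccountEventData
{
   //static compiling accepts any string here; only execution reveals
   //a misspelled stored procedure or column name
   public DataSet LoadEvents(string connectionString)
   {
      SqlDataAdapter adapter = new SqlDataAdapter(
         "usp_GetAccountEvents", connectionString);   //hypothetical proc
      adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
      DataSet events = new DataSet();
      adapter.Fill(events);
      return events;
   }
}

[TestFixture]
public class AccountEventDataFixture
{
   [Test]
   public void DriveLoadEvents()   //Driver Test Case for Active Compiling
   {
      //executing the member once catches wrong procedure names and
      //similar late-binding errors that the compiler cannot see;
      //the connection string is environment specific
      AccountEventData data = new AccountEventData();
      DataSet events = data.LoadEvents("your connection string here");
      Assert.IsNotNull(events);
   }
}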

In the object-oriented programming world, it is almost certain that you have more than one component in your solution, more than one class in each component, and more than one member in each class. It is also almost certain that a specific member of a class (a method or property) is referenced in multiple places for different purposes. (By the way, this is the beauty of OO programming: no duplicated code; business logic is encapsulated and abstracted into classes and members.)

In the project life cycle, it is common that when we modify the logic of a method of a class for one purpose, we break the assumptions under which the same method is used for another purpose. How nice it would be if the test cases could notify us when this happens! This is what you do to gain this benefit:

While you are coding, you realize you need to make a call to a member of another class. If the member is relatively complex, and it could be or has to be referenced for other purposes, you write a few (one or two) tests to assert your assumptions, with the understanding that if these tests fail, your program will not function as expected. With these test cases in place, you will be notified throughout the project life cycle whenever these tests fail, and I am sure you will take action. Possible solutions include modifying your code so that it does not call the method (by calling something else), or modifying the implementation of the method so that the assumptions you made about the method still stand (making sure it does not break the assumptions of others).

This is where the (manual) testing starts. More often than not, you will find bugs in your code. In the classic programming world, you would quickly fix the bug and test again (manually). When you are certain that the bug is fixed, you move on.

In the XP programming world, we do things slightly differently. In his book titled “Extreme Programming Adventures in C#” [6] Ron Jeffries said:

The XP discipline that Chet and I follow is that when we find a defect, we write a test that shows the defect and then fix it. The reasoning is that if our tests were good, there would be no defects.

To make it even clearer, I would like to re-phrase it as follows:

1. We write our code as we did before.
2. We write drivers (tests) for Active Compiling for considerably large members. (Driver Test Case)
3. When we make calls to other members which could be used for multiple purposes and are relatively large, we write test cases to state our assumptions. (Assumption Test Case)
4. In the process of debugging / testing, when we find a bug, we write an automated test case in its corresponding fixture class to show the existence of the bug. If our automated test case is written correctly, the test should fail. Then we start to fix the bug. When we think the bug is fixed, we run the test again; if it passes, this indicates we have fixed the bug in question, and we move on. (Bug Detector Test)
5. When a change made to the production code causes a test to break, we take appropriate action on the production code first and then fix the test case code when necessary.

6       Conclusion

Programmer testing is one of the major activities of any programmer, as we always test our code before we declare a programming task complete. During the lifetime of the system, changes are unavoidable.

During the development phase, whether you use the XP methodology or plan-driven methodologies like waterfall and RUP, changes are unavoidable. Steve McConnell, in his book Code Complete, Second Edition [8], said:

Just as the more you work with the project, the better you understand it, the more they (the customers) work with it, the better they understand it. The development process helps customers better understand their own needs, and this is a major source of requirements changes. A plan to follow the requirements rigidly is actually a plan not to respond to your customer. (P40)


In addition to this type of requirements change, changes in the business process and newly developed business opportunities also lead to requirements changes during the course of development and in the post-production phase. For these types of changes, it is also true that “a plan to follow the requirements rigidly is actually a plan not to respond to your customer”.

Once a change needs to be made to the system, it disturbs the quality / cost / time-to-market balance. When I use the term quality here, I do not mean only the correctness of the system; I refer to the following quality aspects of the system:

·         Correctness
·         Readability
·         Maintainability
·         Extendibility

Depending on the situation of each individual project, the project manager makes the call on the quality / cost / time-to-market balance. More often than not, readability, maintainability and extendibility give in to cost and time to market. XP insists on quality in all situations. There is no doubt that XP projects produce systems of higher quality, but this is not necessarily cost effective in all development / support projects, nor feasible in all business scenarios.

Regardless of what methodology you are using, once any change is made to the system, regression tests are highly recommended, especially in the later stages of the system’s lifetime (the later phases of the development stage and the post-production-release phase). That is when the automated test cases you built for the system during the development process pay off. These test cases can be further developed into regression test plans; in fact, I highly recommend you do so. Hence, regardless of what methodology you are using, automated test cases help you produce high quality software without the high cost of testing, and the programmer test process produces these automated test cases in the most effective manner.

With the proposed process, automated test cases are developed throughout the project life cycle, unlike the classic waterfall process, where the test cases are developed during the testing phase, and unlike the test-driven process, where the test cases are developed upfront.

With this process, we write different tests at various stages of the coding effort for different purposes. We write drivers for Active Compiling when we write our members; we write Assumption Tests when we use the members; and we write Bug Detectors when we debug the members. Programmers are more motivated to write test cases; managers are happier as they see programmers spending more time writing production code; and customers are also happier, as the test cases developed throughout the project life cycle can be used for regression testing and impact analysis.

This process promises to produce high quality test cases at lower cost; it also promises positive attitudes among programmers, because it does not aim for 100% correctness when we know that this is impossible to achieve.

7       References

1. Glenford J. Myers, The Art of Software Testing, John Wiley & Sons, 1979. (Chapter 2: The Psychology and Economics of Program Testing)
2. Bill Hetzel, The Complete Guide to Software Testing, Second Edition, John Wiley & Sons, 1998. (Chapter 2: Principles of Testing)
3. Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional, 1999. (Chapter 4: Building Tests)
4. William E. Perry, Effective Methods for Software Testing, John Wiley & Sons, 2000.
5. James W. Newkirk and Alexei A. Vorontsov, Test-Driven Development in Microsoft .NET, Microsoft Press, 2004. (Chapter 1: Test-Driven Development Practices)
6. Ron Jeffries, Extreme Programming Adventures in C#, Microsoft Press, 2004.
7. Kent Beck, Test-Driven Development: By Example, Addison-Wesley Professional, 2003.
8. Steve McConnell, Code Complete, Second Edition, Microsoft Press, 2004.
9. Kent Beck, Extreme Programming Explained, Addison-Wesley Professional, 2003.


[1] There is an open source tool kit called tsqlunit, which implements the NUnit test framework on the T-SQL platform. For more information, visit http://tsqlunit.sourceforge.net/tsqlunit_cookbook.htm for documentation and http://sourceforge.net/projects/tsqlunit to download the tool kit.


Note: this article was authored in 2004, when I was working on XP programming for the first time. I published the article within Avanade as an IP.
