next_inactive up previous


Guidelines for creating test programs in the CISST-ERC development library

Ofri Sadowsky
Department of Computer Science (NEB B26), Whiting School of Engineering,
Johns Hopkins University 3400 N. Charles st., Baltimore, MD 21218

May 2003


Contents

1 Scope

This document is aimed at defining guidelines for creating test programs in the CISST-ERC development library. Software testing is an inexact science, whose main goal is to foresee the unforeseeable. Basic computability theorems show that it is impossible to design a 100% proof test program. The best we can hope for in the dynamic environment of the ERC development library is to be able to collect over time a test suite for various features of the system. Therefore, this document does not try to teach how to write good test programs, but rather is an organizational document, whose goal is to set rules that will facilitate control over the collection of test programs, extensibility of the collection, and an automation of the testing process. To meet this goal, we first have to analyze the behavior of test programs, set definitions of inputs and outputs, and to classify the test programs to cover sufficient variety. The first part of the document makes this analysis. The second part sets the rules based on the analysis results.


2 Classfying test programs

The ERC development library includes a broad spectrum of features. Among them: 2D and 3D vectors and matrices, general linear algebra, optimization, computational geometry, communications and distributed computing, robot control and sensory systems, computer graphics and image processing, user interface, and possibly more. While each sub-package has its unique testing requirements, and needs a specific set-up protocol for the testing, we can still try to identify features which are common to all test programs. To achieve that, we start by listing the requirements for testing different features of the library. The list is sorted by the features to be tested, and may grow or change in the future. Below, we discuss the requirements for testing each of the above categories, and identify needed resources.

2.1 Object I/O

An important feature of the ERC development library is storing and retrieving object information and configuration data. While the requirement identification and design of serialization and other types of I/O are planned for elsewhere, we can still outline some possible tests for I/O operations. We should also note that having an object I/O interface is a useful feature for the test framework as well. For example, object I/O can be used to import test cases into the framework and to dump the configuration of a failed test for debugging purposes. Therefore, we discuss the tests for this feature set first.

2.1.1 Serialization

We would like to be able to serialize objects in and out of our application. The serialized format can take many shapes. However, the following seems to be the most reliable test for serialization:
  1. Get an object (from an input source, through serialization, hard coded, random, etc.).
  2. Serialize it out.
  3. Deserialize from the same stream.
  4. Compare the result to the original object.
The test may use temporary files to store the serialized object(s), and may use external input to fetch the original object. Note that the test requires an effective method for comparing instances of the same class, and that the deserialized object must be identical to the original object. Also note that a process of serializing an object x, deserializing into object y, then serializing y again and comparing only the serialized output from x and y is an insufficient test. That is because if information is lost during the serialization of x, it is not recovered in y, and we cannot detect this by comparing the serialized y with the serialized x.

2.1.2 Parsing

The term ``parsing'' usually applies to machine processing of data in human readable format. While in essence it is equivalent to serialization, it is more loosely associated with single objects. Parsing always requires external input to be parsed. Of course, one could hard-code the input to the parsing as well, but it seems to miss the point. When dealing with parsing of text files, we should pay attention to the following items:
  1. There is no direct way to verify the correctness of parsing a text file, when it is the sole source of input. To overcome this problem, we can adopt one or more of the following strategies:
    1. Import objects from two files (one is text, and the other may be text or binary) and compare them. This tests the validity of one I/O method against another.
    2. Add checksums at the end of sections in the text file, and use them to verify the correctness of the parsing. The checksums must be optional, so that a human composer can create the text file without worrying about them. Then, we may provide a small utility program to add the checksums to a text file.
  2. Due to the limited precision of both binary and text floating point representation, we may have trouble in restoring identical objects from a text file. This affectes directly the ability to calculate checksums, for example. If we want to use checksum as part of our parsing format, we have to be sure that it is robust to perturbations of low-order bits in a floating-point representation.

2.2 General execution-control issues

Here we list some issues that relate to testing any piece of software, regardless of correctness or dependencies on other packages. In other words, we test if a package is ``well behaved'' regarding the criteria below.

2.2.1 Memory leaks

Our test corpus should include tests for memory leaks, for example, in containers or library objects with dynamic memory allocation. There are existing test tools to detect memory leaks, such as pstack and gmmprof, or the MFC internal classes for memory monitoring. We should find a way to incorporate them into our development environment, preferably in a portable manner. It remains to be decided if all the test programs should run in a memory monitoring environment, or only some should have memory leak detectors invoked. Running all the tests with memory monitoring probably takes longer and may lead to unexpected behavior resulting from possible bugs in the memory monitor rather than in our code. Running only some tests in a monitored environment leads to questions of which tests are monitored and which are not, and how we can invoke the monitor for some tests and not for others. These should be answered better when we learn more about memory monitors.

2.2.2 Time-bounded execution

A possible programming error is writing code that does not terminate. It is not necessarily an infinite loop, but may be the result of an error in the termination conditions, wrong assumptions on convergence, etc. In addition, modifications to an algorithm or a data structure can significantly increase the runtime of the algorithm. We would like to be able to handle such cases in an automatic testing module. It is therefore suggested that each test program should have a runtime specification in the form of estimated average runtime, estimated worst-case runtime, and estimated time to decide it's an infinite loop (some or all of the values may be identical). The test program or the test framework should issue a warning message in the test log for exceeding each of these times, and the test framework should kill the test program if it exceeds the time bound to determine it's an infinite loop. Since running the same program on different machines takes different times, we should use a normalized time unit to measure program runtimes. For example, we can have one program that iterates a simple loop (not empty –- it may get optimized and got rid of by the compiler) $k$ times, and state that this takes $l$ normalized time units. By measuring its runtime and dividing it by $l$, we immediately get the normalized time unit. Then, we may be able to run each test procedure in a ``diagnostic'' mode to provide an estimate of its runtime for future tests. The subject of time-bounded execution raises the issue of providing some query interface to each test program, that enables the user to find which tests are included, run them selectively and in different modes, and so on. This will be discussed with more details later.

2.2.3 Exception handling

It is important to verify that our exception handling mechanisms work. For that, we try to generate artificially exception conditions or invalid input. We also need a methodology of specifying which exceptions are generated during the test. Possibly, the easiest way is to write one test program for each exception case, but that can be tedious. The subject should be thought over more when exception handling scheme is developed. In addition to normal functionality, if we want to ensure that our software is robust, it should also be tested for exception conditions. As we have chosen to use the CppUnit package as our test framework, we can also use its exception testing features. See more about it in this document and in the CppUnit documentation. Additionally, when performing automatic testing, we need to have tools to handle program crashes, record them, and overcome them to continue the test. We would not like a single bug in one of the library classes or test programs to stop an automatic test process.

2.2.4 Concurrency

Concurrency is among the hardest targets for testing. The reason is, a context switch between concurrent processes can generally occur at any time. It is possible to insert breakpoints or controlled preemptions (CPU yield) in strategic locations in the code to control context switches, but I suspect it turns out to be an infinite amount of work. Hope can rise from the fact that in most cases, the cycle of a concurrent thread is short enough in time to be executed many times in a relatively small number of seconds. By probability theory, and especially by Murphy's Theorem, if anything can go wrong in a minute or so of concurrent running, it will. As of now, it seems that the best testing strategy for concurrency is to write normal test programs which involve concurrent execution, let them run for a while, and see if they crash, fails to meet a goal, etc. As of (August 2003) our code is not designed for thread safety, and is not meant to run concurrently. The best we can do to detect concurrency errors is to treat them like any other error, and enforce the strategies listed under ``Time-bounded execution'' and ``Exception handling''.

2.3 Algorithmic correctness and computational precision

We must be able to verify that our computational modules perform according to the specifications. The simplest way to do that is by knowing in advance what is the output of an algorithm for a given input. However, it may not be as simple as that in most cases, as discussed below.

2.3.1 Expected output for a given input

A straightforward example for testing expected output is the following. Take two known vectors, add them, and compare the result to a third vector that was evaluated in advance. The test fails if the addition result is not equal to the expected. The operands and the known result can be hard coded or can be stored in an input file. In any case, they are determined before the actual execution of the test. This test strategy has several drawbacks. Firstly, when our operations involve floating point multiplications, there is an accumulated error. Different computation environments (machines, compilers, etc.) may accumulate the errors in a different way, making it hard to predict the exact expected result. Therefore, in many cases we can only specify the precision of the output within a certain tolerance. Secondly, sometimes the computation of the known value is complicated enough, and cannot be carried out without running the same algorithm. This means, that if we use a computer to evaluate the expected result, we basically let the algorithm test itself, which defeats the idea of testing. On the other hand, carrying out the computation manually may be too tedious and error prone to start with. Thirdly, testing against known inputs lacks coverage of the problem domain. Usually, we cannot even run through the entire domain, and would like to generate a reasonable random sample of it. But then, we do not want to evaluate the results for each and every element of the domain, just to be able to draw random samples. This leads us to a more sophisticated strategy of testing, that verifies internal consistency of implemented methods, and thus indirectly indicates to the correctness of more than one method at the same time. In other words, we verify the correctness of an output indirectly by using other methods and checking the output consistency. We assume that the chance that all the methods demonstrate consistent behavior while being in fact incorrect is small enough. Thus, we can use random inputs and with a high likelihood detect errors in the code. The strategy is as follows.

2.3.2 Independent verification and randomized testing

Independent verification is a sophisticated test, that can test several functions at the same time. For example, evaluate the cross product of two vectors, then verify that the cross product is perpendicular to both and has an appropriate magnitude. Here we test cross product, dot product, vector norm and vector angle. If all the results are consistent, we assume that the functions are evaluated correctly. Another example: solve a linear equation system, then see that the result is a solution by multiplying it by the original matrix. Note that for independent verification we can choose a random input, which usually makes the test more robust. However, attention should be paid to numerical precision issues. In many cases, the examples above included, one cannot simply test for equality of independent results, due to roundoff errors. In such cases, a reasonable error tolerance should be spcified in the test, and the test should fail if the results do not agree with each other within that tolerance. Another aspect to consider is the generation of random input. There is a number of different randomization methods, for example in [Numerical Recipes], and they differ in runtime and randomization quality. Although all these methods are in fact pseudo-randomized, and not randomized in the strong sense of unpredictability, we can, in fact, take advantage of the pseudo-random property. That is because being able to repeat a certain pseudo-random sequence may enable us to reproduce inputs that were detected as error-generating. Therefore, when reporting about a failed test, we may wish to dump out not only the objects for which the error was detected, but also the randomization seeds.

2.3.3 Control groups

In some cases, such as computational learning, we have no direct way to verify that our result is accurate. Instead, we rank the accuracy of the result by using a control group. For example, to evaluate the robustness of a registration algorithm to noisy input, we can select a training group of points, inject random noise, and compute the registration. The result depends on our random noise, and often cannot be predicted with enough precision. However, we can use another group of points –- the control group –- and measure the precision of applying the computed transformation to the contol group by comparing it with application of the originally known transformation. In a test of this kind, the pass/fail criterion is fuzzy, and depends on statistical significance rules, the magnitude of the noise, etc. In general, the decision if the test was successful should be left to a human observer. However, the process of running the test can be automated, and at the very least it can be used to verify successful termination of algorithms, that is, termination with no exceptions.

2.4 Supervised tests

Many of our tests, such as the ones listed below, require a supervised set-up, and cannot be run automatically. For these, a detailed instruction sheet should be attached to the test, so that the test operator can run it smoothly. Some of the tests listed below fall outside the scope of our planned automated test framework. Yet we list them to be sure we don't miss important questions.

2.4.1 Interactive tests

Automated testing of interactive behavior is impossible. The best we can do is simulate input sequences by fixed or randomized input. However, in some cases, and especially with GUI, we can at least check that the interactive modules run without exceptions for certain correct and exceptional input sequences. We can consider using some scripts to generate artificial events in a sequence spcified by an external resource file. This seems to go beyond the scope of this document as of (August 2003), and may be dealt with later.

2.4.2 Distributed objects

There are several difficulties in testing the functionality of distributed objects. One is setting up the environment: do we test them as independent processes on a single machine, or do we actually run them on separate hosts. If we choose the second option, we need a way to remotely launch a remote object from the main test machine. Another problem is testing for exceptions and failures, which requires an even more complicated set-up. A third is the scope of the test. To test algorithmic correctness, we would generally prefer a direct method rather than involving distibuted computation. Therefore, we are left with testing the networking functionality, which simplifies our work, but still requires special set-up. Currently, we do not plan to add distributed objects to the features of our test framework.

2.4.3 Interfacing external devices

External devices increase the complexity of the system by requiring an interface abstraction layer, increasing the number of potential failures of the system, requiring additional set-up for testing, and exposing the system to the unexpected behavior of the physical environment. Special attention should be paid to testing robot interfaces, as they are an essential part of our development. Note that as the device tests require the device being connected to the computer, these tests usually cannot be run unsupervised. These subjects are discussed in more detail next.

2.4.3.1 Abstraction layer tests and diagnostics

Any device abstraction layer should have a standard diagnostics package that's capable of performing elementary tests for the device. These can include: initialization, sending commands, running device-internal diagnostics, and testing simple configurations or instruction sets. The diagnostics can use fixed configurations, which are either hard-coded or stored in input files. Diagnostics should be simple to run and require little more than connecting the external device to the computer, switching it on, and running the test. Therefore, they are likely to fit into our test framework, even if they would not be run automatically. For many external hardware items, a diagnostics package is provided by the vendor. In the simplest case, our own diagnostics can wrap the vendor's. In more sophisticated cases, our diagnostics may involve invoking API commands. Yet, they should be simple and short.

2.4.3.2 Sensory systems

Sensory systems can include trackers, digitizers, encoders, force sensors, cameras, imaging systems, temperature sensors, and any other device that reports to the computer of an external physical event (except for standard input devices, such as keyboard, mouse, etc.). Robots represent a higher level of complexity, and perhaps only robot encoder testing falls into this category. Under normal circumstances, we are not taking responsibility for calibrating sensors, unless the calibration is part of the standard protocol of using them. However, we may still wish to have at least a diagnostics package for the correctness of the sensory input. These may require supervision and should have an appropriate instruction sheet.

2.4.3.3 Robot motion tests

As part of our diagnostics, we may wish to verify that robot motion commands are executed correctly. For example, that a certain positioning command brings the end effector to the desired pose. Motion tests can be more complicated, and test also the dynamics of the robot. These can include velocity control, real-time avoidance of forbidden configurations, responding to exception conditions, and possibly more. The entire subject is left for the developers of the MRC package, and will not be discussed further here.

2.5 Requirements summary

We can now make a list of requirements from our test framework. These will be classified as input requirements, output requirements, and test characteristics.

2.5.1 Input requirements

  1. Object input. We should be able to import objects in text or binary formats from external sources (e.g., files, standard input) into the test program.
  2. Randomization. We need a reliable pseudo-random generator, which when initialized with the same seed yields the same sequence. We may want to record randomization seeds for debugging purposes.
  3. Configuration files. To test external device interfaces, output LOD, etc., we may depend on common configuration files. These files should be located easily by the test program.
  4. Time control. We need to be able to measure the execution time of a program, either externally (from a launching script) or internally. It is good to have each program able to measure its runtime in ``standard'' units, that is, compared to some small common denominator.

2.5.2 Output requirements

  1. Object output. We should be able to export objects from the test program in a way that enables to reproduce an error condition.
  2. Test log. A test program should be able to store output directly to a test log, and flush it immediately. It cannot rely on accumulating failed tests into some buffer like JUnit, for example, does, as the test program may crash and leave the buffer unflushed.
  3. Termination conditions. We need to a way identify that a program terminated successfully, terminated unsuccessfully, crashed, or was killed from the outside, and record these conditions to the test log.

2.5.3 Test characteristics

All test programs should have a fixed set of command line options to provide standard operations. Below is a list of suggested standard operations for a test program.
  1. Test query. The program should list the tests it performs. The list should include the name of the test and optionally a description.
  2. Test selection. The program should let the user select through a command-line option a single test to be executed. The test may be described by its name or a number, and should be consistent with the query output. If no test is selected, the program should run all the tests.
  3. Time diagnostics. The program should be able to estimate the time needed to run each of the tests, in ``standard'' units. The estimation does not have to be accurate, but should provide a reasonable upper bound. With more sophisticated work, we can try to deduce tighter bounds for the execution time of a test, but for now we will suffice with that.
  4. Randomization seed. The program should be initializable with an external randomization seed to reproduce identical random sequences.
In addition, a test program may or may not require specifications of the number of iterations to perform, test-specific inputs, and so on. These cannot, in general, be part of the standard command-line interface. However, when running in time diagnostics mode, the program may ask the user for numbers of iterations, test-specific input, etc., in order to estimate the runtime. Therefore, the program may have the same interaction with the user in diagnostic mode as in test mode.

2.5.4 Additional featurs

The following features are optional for our test suite. We may or may not implement them in different stages of the development.
  1. Tcl/Tk graphical user interface.
  2. Automated scripts to find and run tests.
  3. Implementation of test modules as DLLs/shared objects that can be dynamically loaded into a standard tesing ``shell'' or GUI.

3 Test framework

Based on the analysis in Section [*], we seek for a software framework to satisfy our requirements. We would like to explore existing solutions first, then propose an organizational method and framework for our tests.

3.1 Existing tools for software testing

There is a large number of unit-test frameworks available from Gnu. Examples include: Another large collection of testing frameworks, with design similar to the JUnit package, is available at http://www.xprogramming.com/software.htm. One particular test framework to be discussed is the CppUnit - http://cppunit.sourceforge.net. There is a decision to be made if we want to develop out internal test framework, or to rely on external packages. For any option, there is likely to be a learning curve.

3.2 CppUnit

The CppUnit project has combined and built on previous works of Michael Feathers and Jerome Lacoste. It is released under the GNU Lesser General Public License. The main design and usage features of the CppUnit framework include: Examining these features suggests that we can use the CppUnit as a basis for our test framework, even when it does not provide what we need by default. For example, to satisfy the requirement for time control, we can create a Runner class that times the cases and outputs the times. We should probably be able to solve the other requirements one way or another. I believe it is worthwhile to begin the development of our test framework with CppUnit, and extend it towards our needs. If during the work we discover that CppUnit cannot handle one of our requirements, we can try to build on top of the existing test cases a new framework which does. But for the meanwhile, I believe CppUnit can be a good start.

3.2.1 Programming rules for using CppUnit for CISST

CppUnit provides a test framework, but does not say much about the practical programming rules for employing it. As CppUnit is an adaptation of JUnit, we may be able to learn something from the JUnit documentation. However, JUnit is not very specific about the rules either, as much as I could tell. For the most part, JUnit conventions rely on the use of the Java reflection mechanism and the well-developed (compared to C++) type information mechanism. For what it's worth, the JUnit convention seems to be as follows: We can adopt similar rules in our test framework, with name changes to match our own naming conventions. I would also suggest that every test function (with a signature void TestSomething();) will be a simple wrapper for another function that performs the actual test. The actual test class can look something like the following.
class TriangleTest : public CpUnit::TestCase
{
public:
  void TestArea()
  {
    TestArea(TheTriangleInstance);
  }

  // This is the actual test
  static void TestArea(Triangle & t);

private:
  Triangle TheTriangleInstance;

  CPPUNIT_TEST_SUITE(TriangleTest);
  CPPUNIT_TEST(TestArea);
  CPPUNIT_TEST_SUITE_END();
};
The CPPUNIT_ macros create an interface to the TriangleTest class that enables it to create a list of available tests. An additional step enables us to register TriangleTest in a global test registry. When we come to executing the tests, we do not even have to explicitly instantiate a TriangleTest object, as the CppUnit framework can do it for us.

3.3 Automated tests

This section will most likely be revised, and is presented here as an initial suggestion only. We can use an automated script to run any test program that does not require manual set-up.

3.4 File system organization

The rules below will be revised. Currently, a description of the directory structure for the whole CISST development environment is presented elsewhere. Each test program should have its own project directory, and be configured to build using CMake. The name of the executable should be the same as the name of the project directory. The name of the source file where the main() function is should be xxxMain.cpp (for a C++ file), where xxx is the name of the project directory. The names of alternative main input files should be xxx-##.in, where xxx is the name of the project and ## indicates the sequential number of the input file (use leading zero??). Standard configuration files will be refered to in the config-files directory as specified in the general programming guidelines. The setting of path variables for the local installation can be done using CMake. Any additional data files required for the tests should be in a directory named Data under the project directory. The test program should be accompanied by a brief and meaningful explanation of what is being tested, including an explanation about each input file. The explanation should be in a file named README in the project directory. Names of project directories for test programs should start with Test. If a test program is a diagnostic for a specific class, the project name will be TestClassname, where Classname is the name of the class without any prefix. For example, for the class vctVec3 the name of a test program will be TestVec3.

About this document ...

Guidelines for creating test programs in the CISST-ERC development library

This document was generated using the LaTeX2HTML translator Version 2002 (1.62)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -nofootnode -show_section_numbers -split 0 TestPlan.tex

The translation was initiated by Ofri Sadowsky on 2003-08-27


next_inactive up previous
Ofri Sadowsky 2003-08-27