Test data generation

Test data generation, an important part of software testing, is the process of creating a set of data for testing the adequacy of new or revised software applications. It may be the actual data that has been taken from previous operations or artificial data created for this purpose. Test Data Generation is seen to be a complex problem and though a lot of solutions have come forth most of them are limited to toy programs. The use of dynamic memory allocation in most of the code written in industry is the most severe problem that the Test Data Generators face as the usage of the software then becomes highly unpredictable, due to this it becomes harder to anticipate the paths that the program could take making it nearly impossible for the Test Data Generators to generate exhaustive Test Data. However, in the past decade significant progress has been made in tackling this problem better by the use of genetic algorithms and other analysis algorithms. Moreover, Software Testing is an important part of the Software Development Life Cycle and is basically labor-intensive. It also accounts for nearly third of the cost of the system development. In this view the problem of generating quality test data quickly, efficiently and accurately is seen to be important.

A program P could be considered as a function, P:S → R, where S is the set of all possible inputs and R the set of all possible outputs. An input variable of function P is mapped to an input parameter of P. P(x) denotes execution of program for certain input x.

A Control Flow Graph of a program P is a directed graph G = (N, E, s, e) consisting of a set of nodes N and a set of edges E = {(n, m)|n, m ∈ N } connecting the nodes.

Each node denotes a basic block which itself is a sequence of instructions. It is important to note that in every basic block the control enters through the entry node and leaves at the end without stopping or branching except at the end. Basically, a block is always executed as a whole. The entry and exit nodes are two special nodes denoted by s and e respectively.

An edge in a control flow graph represents possible transfer of control. All edges have associated with them a condition or a branch predicate. The branch predicate might be the empty predicate which is always true. In order to traverse the edge the condition of the edge must hold. If a node has more than one outgoing edge the node is a condition and the edges are called branches.

A Test Data Generator follows the following steps

The basis of the Generator is simple. The path selector identifies the paths. Once a set of test paths is determined the test generator derives input data for every path that results in the execution of the selected path. Essentially, our aim is to find an input data set that will traverse the path chosen by the path selector. This is done in two steps:

...
Wikipedia