How do we measure the progress of testing? When do we release the software? Why do we devote more time and resources to testing a particular module? What is the reliability of the software at the time of release? Who is responsible for the selection of a poor test suite? How many faults do we expect during testing? How much time and how many resources are required to test the software? How do we know the effectiveness of a test suite? We may keep on framing such questions without much effort. However, finding answers to such questions is not easy and may require a significant amount of effort. Software testing metrics may help us to measure and quantify many things, which may in turn provide answers to such important questions.
10.1 Software Metrics
“What cannot be measured, cannot be controlled” is a reality in this world. If we want to control something, we should first be able to measure it. Therefore, everything should be measurable. If a thing is not measurable, we should make an effort to make it measurable. The area of measurement is very important in every field and we have mature and established metrics to quantify various things. However, in software engineering this “area of measurement” is still in its developing stage and may require significant effort to make it mature, scientific and effective.
10.1.1 Measure, Measurement and Metrics
These terms are often used interchangeably. However, we should understand the difference amongst these terms. Pressman explained this clearly as [PRES05]:
“A measure provides a quantitative indication of the extent, amount, dimension, capacity or size of some attributes of a product or process. Measurement is the act of determining a measure. The metric is a quantitative measure of the degree to which a product or process possesses a given attribute”. For example, a measure is the number of failures experienced during testing. Measurement is the way of recording such failures. A software metric may be average number of failures experienced per hour during testing.
Fenton [FENT04] has defined measurement as:
“It is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules”.
The basic issue is that we want to measure every attribute of an entity. We should have established metrics to do so. However, we are in the process of developing metrics for many attributes of various entities used in software engineering.
Software metrics can be defined as [GOOD93]: “The continuous application of measurement based techniques to the software development process and its products to supply meaningful and timely management information, together with the use of those techniques to improve that process and its products.”
Many things are covered in this definition. Software metrics are related to measures which, in turn, involve numbers for quantification. These numbers are used to produce a better product and to improve its related process. We may like to measure quality attributes such as testability, complexity, reliability, maintainability, efficiency, portability, enhanceability, usability etc. for a software product. We may also like to measure size, effort, development time and resources for a software product.
Software metrics are applicable in all phases of the software development life cycle. In the software requirements and analysis phase, where the output is the SRS document, we may have to estimate the cost, manpower requirement and development time for the software. The customer may like to know the cost of the software and the development time before signing the contract. As we all know, the SRS document acts as a contract between customer and developer. The readability and effectiveness of the SRS document may help to increase the confidence level of the customer and may provide better foundations for designing the product. Some metrics are available for cost and size estimation like COCOMO, the Putnam resource allocation model, the function point estimation model etc. Some metrics are also available for the SRS document like number of mistakes found during verification, change request frequency, readability etc.
In the design phase, we may like to measure the stability of a design, coupling amongst modules, cohesion of a module etc. We may also like to measure the amount of data input to the software, processed by the software and produced by the software. A count of the amount of data input to, processed in, and output from software is called a data structure metric. Many such metrics are available like number of variables, number of operators, number of operands, number of live variables, variable spans, module weakness etc. Some information flow metrics are also popular like FAN-IN, FAN-OUT etc.
Use cases may also be used to design metrics like counting actors, counting use cases, counting number of links etc. Some metrics may also be designed for various applications of websites like number of static web pages, number of dynamic web pages, number of internal page links, word count, number of static and dynamic content objects, time taken to search a web page and retrieve the desired information, similarity of web pages etc.
Software metrics have a number of applications during the implementation phase and after the completion of that phase. Halstead software size measures are applicable after coding like token count, program length, program volume, program level, difficulty, estimation of time and effort, language level etc. Some complexity measures are also popular like cyclomatic complexity, knot count, feature count etc.
Software metrics have found a good number of applications during testing. One area is reliability estimation, where popular models are Musa’s basic execution time model and the logarithmic Poisson execution time model. The Jelinski-Moranda model [JELI72] is also used for the calculation of reliability. Source code coverage metrics are available that calculate the percentage of source code covered during testing. Test suite effectiveness may also be measured. Number of failures experienced per unit of time, number of paths, number of independent paths, number of du paths, percentage of statement coverage and percentage of branch conditions covered are also useful software metrics.
The maintenance phase may have many metrics like number of faults reported per year, number of requests for changes per year, percentage of source code modified per year, percentage of obsolete source code per year etc.
We may find a number of applications of software metrics in every phase of the software development life cycle. They provide meaningful and timely information which may help us to take corrective actions as and when required. Effective implementation of metrics may improve the quality of software and may help us to deliver the software on time and within budget.
10.2 Categories of Metrics
There are two broad categories of software metrics namely product metrics and process metrics. Product metrics describe the characteristics of the product such as size, complexity, design features, performance, efficiency, reliability, portability, etc. Process metrics describe the effectiveness and quality of the processes that produce the software product. Examples are effort required in the process, time to produce the product, effectiveness of defect removal during development, number of defects found during testing, maturity of the process [AGGA08].
10.2.1 Product metrics for testing
These metrics provide information about the testing status of a software product. The data for such metrics are also generated during testing and may help us to know the quality of the product. Some of the basic metrics are given as:
(i) Number of failures experienced in a time interval
(ii) Time interval between failures
(iii) Cumulative failures experienced up to a specified time
(iv) Time of failure
(v) Estimated time for testing
(vi) Actual testing time
With these basic metrics, we may find some additional metrics as given below:
(ii) Average time interval between failures
(iii) Maximum and minimum failures experienced in any time interval
(iv) Average number of failures experienced in time intervals
(v) Time remaining to complete the testing.
We may design similar metrics to find the indications about the quality of the product.
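The derived metrics above follow directly from the basic ones. The sketch below is only an illustration: the failure log, the planned testing time and the function name `intervals_between_failures` are hypothetical values and names chosen for this example, not data from the text.

```python
from statistics import mean

# Hypothetical failure log: execution time (in minutes) of each failure.
failure_times = [7, 19, 32, 58, 95]
estimated_testing_time = 180  # minutes planned for testing (assumed)
actual_testing_time = 100     # minutes spent so far (assumed)


def intervals_between_failures(times):
    """Time between successive failures; the first interval starts at 0."""
    return [later - earlier for earlier, later in zip([0] + times[:-1], times)]


intervals = intervals_between_failures(failure_times)
average_interval = mean(intervals)  # average time interval between failures
time_remaining = estimated_testing_time - actual_testing_time

print(intervals)       # [7, 12, 13, 26, 37]
print(time_remaining)  # 80
```

With this log, the failures arrive at growing intervals, which is the pattern we hope to see as faults are removed.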
10.2.2 Process metrics for testing
These metrics are developed to monitor the progress of testing, status of design and development of test cases and outcome of test cases after execution.
Some of the basic process metrics are given below:
(i) Number of test cases designed
(ii) Number of test cases executed
(iii) Number of test cases passed
(iv) Number of test cases failed
(v) Test case execution time
(vi) Total execution time
(vii) Time spent for the development of a test case
(viii) Total time spent for the development of all test cases
On the basis of the above direct measures, we may design the following additional metrics, which convert the base metric data into more useful information.
(i) % of test cases executed
(ii) % of test cases passed
(iii) % of test cases failed
(iv) Total actual execution time / total estimated execution time
(v) Average execution time of a test case
These metrics, although simple, may help us to know the progress of testing and may provide meaningful information to the testers and project manager.
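A minimal sketch of how the base measures may be turned into the derived process metrics is given below. The class `RunSummary` and all the numbers are hypothetical. We also assume that the percentages of passed and failed test cases are taken relative to the executed test cases, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Base process measures for one test cycle (hypothetical container)."""
    designed: int
    executed: int
    passed: int
    failed: int
    actual_minutes: float
    estimated_minutes: float

    def percent_executed(self):
        return 100 * self.executed / self.designed

    def percent_passed(self):
        # Assumption: measured against executed test cases.
        return 100 * self.passed / self.executed

    def percent_failed(self):
        # Assumption: measured against executed test cases.
        return 100 * self.failed / self.executed

    def time_ratio(self):
        """Total actual execution time / total estimated execution time."""
        return self.actual_minutes / self.estimated_minutes

    def average_execution_time(self):
        return self.actual_minutes / self.executed


summary = RunSummary(designed=200, executed=160, passed=120, failed=40,
                     actual_minutes=400, estimated_minutes=500)
print(summary.percent_executed())        # 80.0
print(summary.percent_passed())          # 75.0
print(summary.average_execution_time())  # 2.5
```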
An effective test plan may force us to capture data and convert it into useful metrics for both process and product. The test plan also guides the organization in future projects and may suggest changes in the existing processes in order to produce a good quality, maintainable software product.
10.3 Object Oriented Metrics used in Testing
Object oriented metrics capture many attributes of a software and some of them are relevant in testing. Measuring structural design attributes of a software system, such as coupling, cohesion or complexity, is a promising approach towards early quality assessments. There are several metrics available in the literature to capture the quality of design and source code.
10.3.1 Coupling Metrics
Coupling relations increase complexity, reduce encapsulation and potential reuse, and limit understandability and maintainability. The coupling metrics require information about attribute usage and method invocations of other classes. These metrics are given in table 10.1. Higher values of coupling metrics indicate that a class under test will require more stubs during testing. In addition, each interface will need to be tested thoroughly.
Coupling between Objects (CBO): CBO for a class is a count of the number of other classes to which it is coupled.
Data Abstraction Coupling (DAC): Data abstraction is a technique of creating new data types suited for an application to be programmed. DAC = number of ADTs defined in a class.
Message Passing Coupling (MPC): It counts the number of send statements defined in a class.
Response for a Class (RFC): It is defined as the set of methods that can potentially be executed in response to a message received by an object of that class. It is given by RFC = |RS|, where RS, the response set of the class, is given by $RS = \{M\} \cup_{\text{all } i} \{R_i\}$, with $\{M\}$ the set of all methods in the class and $\{R_i\}$ the set of methods called by method $i$.
Information flow-based coupling (ICP): The number of methods invoked in a class, weighted by the number of parameters of the methods invoked.
Information flow-based inheritance coupling (IHICP): Same as ICP, but only counts method invocations of ancestors of classes.
Information flow-based non-inheritance coupling (NIHICP): Same as ICP, but only counts method invocations of classes not related through inheritance.
Fan-in: Count of modules (classes) that call a given class, plus the number of global data elements.
Fan-out: Count of modules (classes) called by a given module, plus the number of global data elements altered by the module (class).
Table 10.1: Coupling Metrics
10.3.3 Inheritance Metrics
Inheritance metrics require information about the ancestors and descendants of a class. They also collect information about methods overridden, inherited and added (i.e. neither inherited nor overridden). These metrics are summarized in table 10.3. If a class has more children (or subclasses), more testing may be required for the methods of that class. The greater the depth of the inheritance tree, the more complex the design, as more methods and classes are involved. Thus, we may have to test all the inherited methods of a class, and the testing effort will increase accordingly.
Number of Children (NOC): The NOC is the number of immediate subclasses of a class in a hierarchy.
Depth of Inheritance Tree (DIT): The depth of a class within the inheritance hierarchy is the maximum number of steps from the class node to the root of the tree and is measured by the number of ancestor classes.
Number of Parents (NOP): The number of classes that a class directly inherits from (i.e. multiple inheritance).
Number of Descendants (NOD): The number of subclasses (both direct and indirectly inherited) of a class.
Number of Ancestors (NOA): The number of superclasses (both direct and indirectly inherited) of a class.
Number of Methods Overridden (NMO): When a method in a subclass has the same name and type signature as in its superclass, then the method in the superclass is said to be overridden by the method in the subclass.
Number of Methods Inherited (NMI): The number of methods that a class inherits from its super (ancestor) class.
Number of Methods Added (NMA): The number of new methods added in a class (neither inherited, nor overriding).
Table 10.3: Inheritance Metrics
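To make the inheritance metrics concrete, the sketch below computes them for a small, hypothetical class hierarchy using Python introspection. The classes (`Shape`, `Polygon`, `Circle`, `Square`) and helper names are invented for illustration. We assume that Python's implicit root `object` lies outside the hierarchy and that methods are matched by name only. This is one possible reading of the definitions, not a standard tool.

```python
class Shape:
    def area(self):
        raise NotImplementedError

    def describe(self):
        return f"{type(self).__name__} with area {self.area()}"


class Polygon(Shape):
    def area(self):      # overrides Shape.area
        return 0.0

    def sides(self):     # newly added method
        return 0


class Circle(Shape):
    def area(self):
        return 3.14159


class Square(Polygon):
    def area(self):
        return 1.0


def parents(cls):
    """Direct superclasses, ignoring the implicit root `object`."""
    return [base for base in cls.__bases__ if base is not object]


def ancestors(cls):
    """All direct and indirect superclasses, ignoring `object`."""
    return [c for c in cls.__mro__[1:] if c is not object]


def descendants(cls):
    """All direct and indirect subclasses."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= descendants(sub)
    return found


def depth_of_inheritance(cls):
    """Maximum number of steps from the class to the root of the tree."""
    if not parents(cls):
        return 0
    return 1 + max(depth_of_inheritance(p) for p in parents(cls))


def own_methods(cls):
    """Names of methods defined in the class body (special methods skipped)."""
    return {name for name, value in vars(cls).items()
            if callable(value) and not name.startswith("__")}


def inheritance_metrics(cls):
    own = own_methods(cls)
    inherited = set().union(*(own_methods(a) for a in ancestors(cls)))
    return {
        "NOC": len(cls.__subclasses__()),
        "DIT": depth_of_inheritance(cls),
        "NOP": len(parents(cls)),
        "NOD": len(descendants(cls)),
        "NOA": len(ancestors(cls)),
        "NMO": len(own & inherited),
        "NMI": len(inherited - own),
        "NMA": len(own - inherited),
    }


print(inheritance_metrics(Polygon))
```

For `Polygon`, every metric in this hierarchy evaluates to 1. `Shape` has NOC = 2 and NOD = 3, which suggests that its methods deserve the most testing attention.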
10.3.4 Size Metrics
Size metrics indicate the length of a class in terms of lines of source code and methods used in the class. These metrics are given in table 10.4. If a class has more methods with greater complexity, then more test cases will be required to test that class. When a class with many complex methods is inherited, it will require more rigorous testing. Similarly, a class with more public methods will require thorough testing of those public methods, as they may be used by other classes.
Number of Attributes per Class (NA): It counts the total number of attributes defined in a class.
Number of Methods per Class (NM): It counts the number of methods defined in a class.
Weighted Methods per Class (WMC): The WMC is the sum of the complexities of all methods in a class. Consider a class K1 with methods $M_1, \ldots, M_n$ that are defined in the class. Let $C_1, \ldots, C_n$ be the complexities of the methods. Then $WMC = \sum_{i=1}^{n} C_i$.
Number of public methods (PM): It counts the number of public methods defined in a class.
Number of non-public methods (NPM): It counts the number of private methods defined in a class.
Lines Of Code (LOC): It counts the lines in the source code.
Table 10.4: Size Metrics
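The size metrics can be collected mechanically from source code. The following sketch uses Python's standard `ast` module on a hypothetical `Account` class. Several choices here are assumptions made for illustration only: method complexity is approximated as one plus the number of decision nodes, a leading underscore marks a non-public method, only class-level assignments count as attributes, and LOC counts non-blank lines.

```python
import ast

# Hypothetical class under measurement, kept as text so it can be parsed.
SOURCE = """
class Account:
    balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        self.balance -= amount

    def _audit(self, entries):
        for entry in entries:
            print(entry)
"""

# Node types treated as decision points (an approximation).
DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def method_complexity(func):
    """Approximate cyclomatic complexity: 1 + number of decision nodes."""
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(func))


def size_metrics(source):
    class_node = next(node for node in ast.parse(source).body
                      if isinstance(node, ast.ClassDef))
    methods = [n for n in class_node.body if isinstance(n, ast.FunctionDef)]
    attributes = [n for n in class_node.body
                  if isinstance(n, (ast.Assign, ast.AnnAssign))]
    public = [m for m in methods if not m.name.startswith("_")]
    return {
        "NA": len(attributes),
        "NM": len(methods),
        "PM": len(public),
        "NPM": len(methods) - len(public),
        "WMC": sum(method_complexity(m) for m in methods),
        "LOC": len([line for line in source.splitlines() if line.strip()]),
    }


metrics = size_metrics(SOURCE)
print(metrics)
```

Here `deposit` and `_audit` each contribute a complexity of 2 and `withdraw` contributes 3, so WMC is 7 for a class of 3 methods.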
10.4 What should we measure during testing?
We should measure everything (if possible) which we want to control and which may help us to find answers to the questions given at the beginning of this chapter. Test metrics may help us to measure the current performance of any project. The collected data may become historical data for future projects. This data is very important because, in the absence of historical data, all estimates are just guesses. Hence, it is essential to record the key information about current projects. Test metrics may become an important indicator of the effectiveness and efficiency of a software testing process and may also identify risky areas that may need more testing.
We may measure many things during testing with respect to time and some of them are given as:
1) Time required to run a test case.
2) Total time required to run a test suite.
3) Time available for testing
4) Time interval between failures
5) Cumulative failures experienced up to a given time
6) Time of failure
7) Failures experienced in a time interval
A test case requires some time for its execution. A measurement of this time may help to estimate the total time required to execute a test suite. This is the simplest metric and may estimate the testing effort. We may calculate the time available for testing at any point in time during testing, if we know the total allotted time for testing. Generally unit of time is seconds, minutes or hours, per test case. Total testing time may be defined in terms of hours. Time needed to execute a planned test suite may also be defined in terms of hours.
When we test software, we experience failures. These failures may be recorded in different ways like time of failure, time interval between failures, cumulative failures experienced up to a given time and failures experienced in a time interval. Consider table 10.5 and table 10.6, where the time based failure specification and the failure based failure specification are given:
Table 10.5: Time based failure specification (columns: Sr. No. of failure occurrences; Failure time measured in minutes; Failure intervals in minutes)
Table 10.6: Failure based failure specification (columns: Time in minutes; Failures in interval of 20 minutes)
1) Time taken to experience ‘n’ failures
2) Number of failures in a particular time interval
3) Total number of failures experienced after a specified time
4) Maximum / minimum number of failures experienced in any regular time interval.
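The two ways of recording failures carry the same information, and one can be converted into the other. The sketch below starts from a hypothetical list of failure times (not the data of tables 10.5 and 10.6) and derives the quantities listed above. Intervals are assumed to be 20 minutes wide and closed on the right, and every failure time is assumed to be greater than zero.

```python
import math
from bisect import bisect_right
from itertools import accumulate

# Hypothetical time based specification: failure times in minutes, sorted.
failure_times = [7, 19, 32, 58, 95]


def failures_per_interval(times, width, total_time):
    """Failure based specification: failures in each interval of `width`."""
    counts = [0] * math.ceil(total_time / width)
    for t in times:
        counts[math.ceil(t / width) - 1] += 1  # interval (k*width, (k+1)*width]
    return counts


def time_to_n_failures(times, n):
    """Time taken to experience n failures."""
    return times[n - 1]


def failures_up_to(times, t):
    """Total number of failures experienced up to time t."""
    return bisect_right(times, t)


counts = failures_per_interval(failure_times, width=20, total_time=100)
cumulative = list(accumulate(counts))

print(counts)                                 # [2, 1, 1, 0, 1]
print(cumulative)                             # [2, 3, 4, 4, 5]
print(time_to_n_failures(failure_times, 3))   # 32
print(failures_up_to(failure_times, 60))      # 4
print(max(counts), min(counts))               # 2 0
```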
10.4.2 Quality of source code
We may know the quality of the delivered source code after reasonable time of release using the following formula:
Where WDB: Number of weighted defects found before release
WDA: Number of weighted defects found after release
The weight for each defect is defined on the basis of defect severity and removal cost. A severity is assigned to each defect by the testers based on how important or serious the defect is. A lower value of this metric indicates that fewer defects, or less serious defects, were detected.
We may also calculate the number of defects per execution test case. This may also be used as an indicator of source code quality as the source code progressed through the series of test activities [STEP03].
10.4.3 Source Code Coverage
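We may like to execute every statement of a program at least once before its release to the customer. Hence, percentage of source code coverage may be calculated as:
$$\text{Percentage of source code coverage} = \frac{\text{Number of statements of source code covered by the test suite}}{\text{Total number of statements of source code}} \times 100$$
A higher value of this metric gives confidence about the effectiveness of a test suite. We should write additional test cases to cover the uncovered portions of the source code.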
10.4.4 Test Case Defect Density
This metric may help us to know the efficiency and effectiveness of our test cases.
$$\text{Test case defect density} = \frac{\text{Number of failed test cases}}{\text{Number of executed test cases}} \times 100$$
Where Failed test case: A test case that, when executed, produced an undesired output.
Passed test case: A test case that, when executed, produced a desired output.
Every executed test case is either a failed or a passed test case.
A higher value of this metric indicates that the test cases are effective and efficient because they are able to detect more defects.
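Both the source code coverage and the test case defect density are simple ratios. The sketch below assumes that coverage is measured over executable statements identified by line number and that defect density is the share of executed test cases that failed. All numbers and function names are hypothetical.

```python
def statement_coverage(covered_lines, all_lines):
    """Percentage of statements executed at least once by the test suite."""
    return 100 * len(set(covered_lines) & set(all_lines)) / len(set(all_lines))


def defect_density(passed, failed):
    """Assumed form: failed test cases as a percentage of executed ones."""
    executed = passed + failed
    return 100 * failed / executed


all_lines = range(1, 11)            # a hypothetical 10-statement program
covered_lines = {1, 2, 3, 5, 6, 8}  # statements hit by the test suite
uncovered = sorted(set(all_lines) - covered_lines)

print(statement_coverage(covered_lines, all_lines))  # 60.0
print(uncovered)                                     # [4, 7, 9, 10]
print(defect_density(passed=45, failed=5))           # 10.0
```

The list of uncovered statements is the practical output: it tells us where additional test cases are needed.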
10.4.5 Review Efficiency
Review efficiency is a metric that gives insight on the quality of review process carried out during verification.
The higher the value of this metric, the better the review efficiency.
10.5 Software Quality Attributes Prediction Models
Software quality is dependent on many attributes like reliability, maintainability, fault proneness, testability, complexity, etc. A number of models are available for the prediction of one or more such attributes of quality. These models are especially beneficial for large-scale systems, where testing experts need to focus their attention and resources on problem areas in the system under development.
10.5.1 Reliability Models
Many reliability models for software are available where the emphasis is on failures rather than faults. We experience failures during execution of any program. A fault in the program may lead to failure(s) depending upon the input(s) given to the program with the purpose of executing it. Hence, time of failure and time between failures may help us to find the reliability of software. As we all know, software reliability is the probability of failure free operation of software in a given time under specified conditions. Generally, we consider the calendar time. We may like to know the probability that a given software will not fail in one month or one week and so on. However, most of the available models are based on execution time. The execution time is the time for which the computer actually executes the program. Reliability models based on execution time normally give better results than those based on calendar time. In many cases, we have a mapping table that converts execution time to calendar time for the purpose of reliability studies. In order to differentiate the two, execution time is represented by τ and calendar time by t.
Most of the reliability models are applicable at system testing level. Whenever software fails, we note the time of failure and also try to locate and correct the fault that caused the failure. During system testing, software may not fail at regular intervals and may also not follow a particular pattern. The variation in time between successive failures may be described in terms of following functions:
μ(τ): average number of failures up to time τ
λ(τ): average number of failures per unit time at time τ, known as the failure intensity function.
It is expected that the reliability of a program increases due to fault detection and correction over time and hence the failure intensity decreases accordingly.
(i) Basic Execution Time Model
This is one of the popular models of software reliability assessment and was developed by J.D. Musa [MUSA79] in 1979. As the name indicates, it is based on execution time (τ). The basic assumption is that failures may occur according to a non-homogeneous Poisson process (NHPP) during testing. Many examples may be given of real world events where Poisson processes are used. A few examples are given as:
* Number of users using a website in a given period of time.
* Number of persons requesting for railway tickets in a given period of time
* Number of e-mails expected in a given period of time.
The failures during testing represent a non-homogeneous process, and the failure intensity decreases as a function of time. J.D. Musa assumed that the decrease in failure intensity, as a function of the number of failures observed, is constant and is given as:
$$\lambda(\mu) = \lambda_0\left(1 - \frac{\mu}{V_0}\right)$$
Where $\lambda_0$: Initial failure intensity at the start of testing.
$V_0$: Total number of failures experienced up to infinite time.
$\mu$: Number of failures experienced up to a given point in time.
Musa [MUSA79] has also given the relationship between failure intensity (λ) and the mean failures experienced (μ) and is given in 10.1.
If we take the first derivative of the equation given above, we get the slope of the failure intensity as given below:
$$\frac{d\lambda}{d\mu} = -\frac{\lambda_0}{V_0}$$
The negative sign shows that there is a negative slope, indicating a decreasing trend in failure intensity.
This model also assumes a uniform failure pattern, meaning thereby equal probability of failures due to various faults. The relationship between execution time (τ) and mean failures experienced (μ) is given in 10.2.
The derivation of the relationship of 10.2 may be obtained as:
$$\frac{d\mu(\tau)}{d\tau} = \lambda(\tau) = \lambda_0\left(1 - \frac{\mu(\tau)}{V_0}\right)$$
$$\mu(\tau) = V_0\left(1 - \exp\left(-\frac{\lambda_0 \tau}{V_0}\right)\right)$$
The failure intensity as a function of time is given in 10.3.
This relationship is useful for calculating the present failure intensity at any given value of execution time. We may find this relationship by differentiating μ(τ) with respect to τ:
$$\lambda(\tau) = \lambda_0 \exp\left(-\frac{\lambda_0 \tau}{V_0}\right)$$
Two additional equations are given to calculate the additional failures required to be experienced to reach a failure intensity objective ($\lambda_F$) and the additional time required to reach the objective. These equations are given as:
$$\Delta\mu = \frac{V_0}{\lambda_0}\left(\lambda_P - \lambda_F\right)$$
$$\Delta\tau = \frac{V_0}{\lambda_0}\ln\left(\frac{\lambda_P}{\lambda_F}\right)$$
Where $\Delta\mu$: Expected number of additional failures to be experienced to reach the failure intensity objective.
$\Delta\tau$: Additional time required to reach the failure intensity objective.
$\lambda_P$: Present failure intensity.
$\lambda_F$: Failure intensity objective.
$\Delta\mu$ and $\Delta\tau$ are very interesting metrics to know the additional failures and additional time required to achieve a failure intensity objective.
Example 10.1: A program will experience 100 failures in infinite time. It has now experienced 50 failures. The initial failure intensity is 10 failures/hour. Use the basic execution time model for the following:
(a) Find the present failure intensity.
(b) Calculate the decrement of failure intensity per failure.
(c) Determine the failures experienced and failure intensity after 10 and 50 hours of execution.
(d) Find the additional failures and additional execution time needed to reach the failure intensity objective of 2 failures/hour.
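Here, $V_0$ = 100 failures, $\mu$ = 50 failures and $\lambda_0$ = 10 failures/hour.
(a) Present failure intensity can be calculated using the following equation:
$$\lambda(\mu) = \lambda_0\left(1 - \frac{\mu}{V_0}\right) = 10\left(1 - \frac{50}{100}\right) = 5 \text{ failures/hour}$$
(b) Decrement of failure intensity per failure can be calculated using the following:
$$\frac{d\lambda}{d\mu} = -\frac{\lambda_0}{V_0} = -\frac{10}{100} = -0.1 \text{ per hour}$$
(c) Failures experienced and failure intensity after 10 and 50 hours of execution can be calculated as:
(i) After 10 hours of execution
$$\mu(10) = 100\left(1 - \exp\left(-\frac{10 \times 10}{100}\right)\right) = 100(1 - 0.3679) \approx 63.21 \text{ failures}$$
$$\lambda(10) = 10 \exp(-1) \approx 3.679 \text{ failures/hour}$$
(ii) After 50 hours of execution
$$\mu(50) = 100\left(1 - \exp\left(-\frac{10 \times 50}{100}\right)\right) = 100(1 - 0.00674) \approx 99.33 \text{ failures}$$
$$\lambda(50) = 10 \exp(-5) \approx 0.0674 \text{ failures/hour}$$
(d) $\Delta\mu$ and $\Delta\tau$ with failure intensity objective of 2 failures/hour
$$\Delta\mu = \frac{V_0}{\lambda_0}(\lambda_P - \lambda_F) = \frac{100}{10}(5 - 2) = 30 \text{ failures}$$
$$\Delta\tau = \frac{V_0}{\lambda_0}\ln\left(\frac{\lambda_P}{\lambda_F}\right) = \frac{100}{10}\ln\left(\frac{5}{2}\right) \approx 9.16 \text{ hours}$$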
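The calculations of Example 10.1 can be checked with a few lines of code. The sketch below codes what we take to be the standard relations of Musa's basic execution time model. The function names are hypothetical and the symbol `v0` stands for the total number of failures expected in infinite time.

```python
import math


def intensity_after_failures(lambda0, v0, mu):
    """Failure intensity after mu failures: lambda0 * (1 - mu / v0)."""
    return lambda0 * (1 - mu / v0)


def mean_failures(lambda0, v0, tau):
    """Mean failures experienced after tau hours of execution."""
    return v0 * (1 - math.exp(-lambda0 * tau / v0))


def intensity_at_time(lambda0, v0, tau):
    """Failure intensity after tau hours of execution."""
    return lambda0 * math.exp(-lambda0 * tau / v0)


def additional_failures(lambda0, v0, present, objective):
    """Extra failures expected before the intensity objective is reached."""
    return (v0 / lambda0) * (present - objective)


def additional_time(lambda0, v0, present, objective):
    """Extra execution time needed to reach the intensity objective."""
    return (v0 / lambda0) * math.log(present / objective)


# Data of Example 10.1
lambda0, v0, mu = 10, 100, 50
present = intensity_after_failures(lambda0, v0, mu)

print(present)                                                # 5.0
print(round(mean_failures(lambda0, v0, 10), 2))               # 63.21
print(round(intensity_at_time(lambda0, v0, 50), 4))           # 0.0674
print(round(additional_time(lambda0, v0, present, 2), 2))     # 9.16
```

Keeping the model as small functions makes it easy to repeat the analysis whenever new failure data revises the estimates of the initial failure intensity or the total expected failures.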
(ii) Logarithmic Poisson Execution time model
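With a slight modification in the failure intensity function, Musa presented the logarithmic Poisson execution time model. The failure intensity function is given as:
$$\lambda(\mu) = \lambda_0 \exp(-\theta\mu)$$
Where $\theta$: Failure intensity decay parameter, which represents the relative change of failure intensity per failure experienced.
The slope of failure intensity is given as:
$$\frac{d\lambda}{d\mu} = -\theta\lambda_0 \exp(-\theta\mu) = -\theta\lambda$$
The expected number of failures for this model is always infinite at infinite time. The relation for mean failures experienced is given as:
$$\mu(\tau) = \frac{1}{\theta}\ln\left(\lambda_0\theta\tau + 1\right)$$
The expression for failure intensity with respect to time is given as:
$$\lambda(\tau) = \frac{\lambda_0}{\lambda_0\theta\tau + 1}$$
The relationships for the additional number of failures and additional execution time are given as:
$$\Delta\mu = \frac{1}{\theta}\ln\left(\frac{\lambda_P}{\lambda_F}\right)$$
$$\Delta\tau = \frac{1}{\theta}\left(\frac{1}{\lambda_F} - \frac{1}{\lambda_P}\right)$$
Here $\lambda_0$ is the initial failure intensity, $\lambda_P$ the present failure intensity and $\lambda_F$ the failure intensity objective. When the execution time is large, the logarithmic Poisson model may give larger values of failure intensity than the basic model.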
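For readers who wish to see where the time-based relations of the logarithmic Poisson model come from, a short derivation is sketched below. It assumes the usual form of the model, in which the failure intensity decays exponentially with the mean failures experienced, and the initial condition $\mu(0) = 0$. The symbols $\lambda_0$ (initial failure intensity) and $\theta$ (decay parameter) follow the definitions of this section.

```latex
\begin{align*}
\frac{d\mu}{d\tau} &= \lambda_0 e^{-\theta\mu}
  && \text{(failure intensity is the rate of change of } \mu\text{)} \\
e^{\theta\mu}\, d\mu &= \lambda_0\, d\tau
  && \text{(separate the variables)} \\
\frac{1}{\theta}\left(e^{\theta\mu} - 1\right) &= \lambda_0 \tau
  && \text{(integrate, using } \mu(0) = 0\text{)} \\
\mu(\tau) &= \frac{1}{\theta}\ln\left(\lambda_0\theta\tau + 1\right) \\
\lambda(\tau) &= \frac{d\mu}{d\tau} = \frac{\lambda_0}{\lambda_0\theta\tau + 1}
\end{align*}
```

Since the logarithm grows without bound, the mean failures experienced keep increasing with execution time, which is why the expected number of failures is infinite at infinite time.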
Example 10.2: The initial failure intensity of a program is 10 failures/hour. The program has experienced 50 failures. The failure intensity decay parameter is 0.01/failure. Use the logarithmic poisson execution time model for the following:
(a) Find present failure intensity.
(b) Calculate the decrement of failure intensity per failure.
(c) Determine the failure experienced and failure intensity after 10 and 50 hours of execution.
(d) Find the additional failures and additional execution time needed to reach the failure intensity objective of 2 failures/hour.
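(a) Present failure intensity can be calculated as:
$\lambda_0$ = 10 failures/hour
$\mu$ = 50 failures
$\theta$ = 0.01/failure
$$\lambda(\mu) = \lambda_0 \exp(-\theta\mu) = 10 \exp(-0.01 \times 50) \approx 6.065 \text{ failures/hour}$$
(b) Decrement of failure intensity per failure can be calculated as:
$$\frac{d\lambda}{d\mu} = -\theta\lambda = -0.01 \times 6.065 \approx -0.0607 \text{ per hour}$$
(c) Failures experienced and failure intensity after 10 and 50 hours of execution can be calculated as:
(i) After 10 hours of execution
$$\mu(10) = \frac{1}{0.01}\ln(10 \times 0.01 \times 10 + 1) = 100 \ln 2 \approx 69.31 \text{ failures}$$
$$\lambda(10) = \frac{10}{10 \times 0.01 \times 10 + 1} = 5 \text{ failures/hour}$$
(ii) After 50 hours of execution
$$\mu(50) = \frac{1}{0.01}\ln(10 \times 0.01 \times 50 + 1) = 100 \ln 6 \approx 179.18 \text{ failures}$$
$$\lambda(50) = \frac{10}{10 \times 0.01 \times 50 + 1} \approx 1.667 \text{ failures/hour}$$
(d) $\Delta\mu$ and $\Delta\tau$ with failure intensity objective of 2 failures/hour
$$\Delta\mu = \frac{1}{\theta}\ln\left(\frac{\lambda_P}{\lambda_F}\right) = \frac{1}{0.01}\ln\left(\frac{6.065}{2}\right) \approx 110.9 \text{ failures}$$
$$\Delta\tau = \frac{1}{\theta}\left(\frac{1}{\lambda_F} - \frac{1}{\lambda_P}\right) = \frac{1}{0.01}\left(\frac{1}{2} - \frac{1}{6.065}\right) \approx 33.5 \text{ hours}$$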
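The figures of Example 10.2 can be reproduced with the sketch below, which codes what we take to be the standard relations of the logarithmic Poisson execution time model. Function names are hypothetical. For the comparison at the end, the total of 100 expected failures is borrowed from Example 10.1 as an assumption, since the logarithmic model itself has no such parameter.

```python
import math


def lp_intensity_after_failures(lambda0, theta, mu):
    """Failure intensity after mu failures: lambda0 * exp(-theta * mu)."""
    return lambda0 * math.exp(-theta * mu)


def lp_mean_failures(lambda0, theta, tau):
    """Mean failures experienced after tau hours of execution."""
    return math.log(lambda0 * theta * tau + 1) / theta


def lp_intensity_at_time(lambda0, theta, tau):
    """Failure intensity after tau hours of execution."""
    return lambda0 / (lambda0 * theta * tau + 1)


def lp_additional_failures(theta, present, objective):
    return math.log(present / objective) / theta


def lp_additional_time(theta, present, objective):
    return (1 / objective - 1 / present) / theta


def basic_intensity_at_time(lambda0, v0, tau):
    """Basic execution time model, used here only for comparison."""
    return lambda0 * math.exp(-lambda0 * tau / v0)


# Data of Example 10.2
lambda0, theta, mu = 10, 0.01, 50
present = lp_intensity_after_failures(lambda0, theta, mu)

print(round(present, 3))                                       # 6.065
print(round(lp_mean_failures(lambda0, theta, 10), 2))          # 69.31
print(round(lp_intensity_at_time(lambda0, theta, 50), 3))      # 1.667
print(round(lp_additional_time(theta, present, 2), 1))         # 33.5

# At 50 hours the logarithmic model predicts a higher intensity than the
# basic model with an assumed total of 100 failures.
print(lp_intensity_at_time(lambda0, theta, 50)
      > basic_intensity_at_time(lambda0, 100, 50))             # True
```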
(iii) The Jelinski – Moranda Model
The Jelinski-Moranda model [JELI72] is the earliest and simplest software reliability model. It proposed a failure intensity function in the form of
$$\lambda(t) = \phi\,(N - i + 1)$$
Where $\phi$ = Constant of proportionality
N = total number of errors present
i = number of errors found by time interval $t_i$.
This model assumes that all failures have the same failure rate. It means that the failure rate is a step function and there will be an improvement in reliability after fixing an error. Hence, every failure contributes equally to the overall reliability. Here, failure intensity is directly proportional to the number of errors remaining in the software.
Once we know the value of the failure intensity function using any reliability model, we may calculate reliability using the equation given below:
$$R(t) = e^{-\lambda t}$$
Where λ is the failure intensity and t is the operating time. The lower the failure intensity, the higher the reliability, and vice versa.
Example 10.3: A program may experience 200 failures in infinite time of testing. It has experienced 100 failures. Use the Jelinski-Moranda model to calculate the failure intensity after the experience of 150 failures.
Total expected number of failures (N) = 200
Failures experienced (i) = 100
Constant of proportionality ($\phi$) = 0.02
$\lambda(t) = \phi\,(N - i + 1) = 0.02\,(200 - 100 + 1)$
= 2.02 failures/hour
After 150 failures
$\lambda(t) = 0.02\,(200 - 150 + 1)$
= 1.02 failures/hour
Failure intensity will decrease with every additional failure experienced.
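Example 10.3 can be worked through in code as follows. The sketch assumes the Jelinski-Moranda intensity in the form "constant of proportionality times (N - i + 1)", as used in the example, and the exponential reliability relation for a constant failure intensity. The function names and the one-hour operating time are hypothetical choices for illustration.

```python
import math


def jm_failure_intensity(phi, total_errors, found):
    """Jelinski-Moranda failure intensity: phi * (N - i + 1)."""
    return phi * (total_errors - found + 1)


def reliability(failure_intensity, operating_time):
    """Probability of failure free operation for the given operating time."""
    return math.exp(-failure_intensity * operating_time)


phi, total_errors = 0.02, 200

after_100 = jm_failure_intensity(phi, total_errors, 100)
after_150 = jm_failure_intensity(phi, total_errors, 150)

print(round(after_100, 2))  # 2.02
print(round(after_150, 2))  # 1.02

# Reliability over one hour of operation improves as the intensity drops.
print(reliability(after_150, 1) > reliability(after_100, 1))  # True
```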
10.5.2 An example of fault prediction model in practice
It is clear that software metrics can be used to capture the quality of object oriented design and code. These metrics provide ways to evaluate the quality of software and their use in earlier phases of software development can help organizations in assessing a large software development quickly, at a low cost.
To help in planning and executing testing by focusing resources on the fault prone parts of the design and code, a model that predicts faulty classes should be used. The fault prediction model can also be used to identify classes that are prone to severe faults. One can use this model with respect to high severity faults to focus the testing on those parts of the system that are likely to cause serious failures. In this section, we describe models used to find the relationship between object oriented metrics and fault proneness, and how such models can be of great help in planning and executing testing activities [MALH09, SING10].
In order to perform the analysis, we used the public domain KC1 NASA data set [NASA04]. The data set is available on www.mdp.ivv.nasa.gov. The 145