Anne-Marie Sassen
Software Engineering Division - Sema Group Sae, Spain
and
Radu Marinescu
"Politehnica" University of Timisoara, Romania
Abstract
Audit-Reengineer is a product based on Concerto2/Audit, SEMA's tool for quality assessment, and on the results of ESPRIT project 21975 FAMOOS. In this article we describe the specific functionality Audit-Reengineer provides for reengineering, with special emphasis on the results we obtained using object-oriented metrics for problem detection in legacy systems.
The increasing reliance on information technology for consumer and industrial goods imposes new requirements on software flexibility. The major trends in customer requirements are customer-specific modifications and software versions (custom-made systems), much faster response to change requests and new requirements (evolution), and the ability to easily modify the software based on immediate customer needs (tailoring).
Object oriented programming has often been promoted as the most effective approach to building inherently flexible software, and it has been quickly adopted by industry in recent years. There are already applications consisting of millions of lines of code, developed in several hundred man-years. While the benefits of object oriented technology are widely recognized, its use does not necessarily result in general, adaptable families of systems. These large applications often suffer from improper use of object oriented techniques (such as inheritance) and from the lack of object oriented methods geared towards the construction of families of systems rather than single applications. The result is a new generation of inflexible legacy systems.
In order to better meet customer requirements, industrial users need to re-engineer these monolithic object oriented legacy systems into flexible frameworks and libraries of small, understandable software components. Such frameworks will allow greater flexibility in meeting the varying needs of different categories of customers, as well as easier integration of new requirements.
2. The FAMOOS Approach to Re-engineering
Within the FAMOOS project, an approach to re-engineer object oriented legacy systems into frameworks has been developed. This approach is described in the FAMOOS Handbook [Ciupke 99]. Because managing systems consisting of millions of lines of code is so complex, the handbook alone is not enough: an adequate tool is needed to support the FAMOOS methodology. This tool is Audit-Reengineer.
Within FAMOOS the following reengineering life cycle model is defined:
3. Functionality of Audit-Reengineer
Audit-Reengineer is based upon the tool Concerto2/Audit [Audit 98]. It provides:
After a careful evaluation of existing metrics for detecting flexibility problems within the FAMOOS project [Mar 97], we selected the following metrics for Audit-Reengineer. We evaluated the hypotheses formed in earlier case studies on two new case studies. Case A was a C++ system developed with the Microsoft Visual C++ programming environment, in conjunction with the Microsoft Foundation Classes for Windows NT; it has 51 classes and 54,104 lines of code. Case B was a C++ system developed with the Borland C++ compiler for Windows 95 using the ObjectWindows Library; it has 81 classes and 16,581 lines of code.
4.1. Weighted Method Count (WMC) [Chid 94]. The WMC of a class is defined as the sum of the complexities of the methods of that class. How complexity is defined in an implementation of this metric is a decision that can be made in different ways. We have followed the suggestion of Li and Henry [LiHe 93] and used McCabe's Cyclomatic Complexity Metric, defined as "the number of linearly independent paths and therefore, the minimum number of paths that should be tested" [McCa 76]. Another possibility is to assign a unitary complexity to each method; in that case the WMC of a class is equal to the number of methods of that class.
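To make the two weighting choices concrete, here is a minimal Python sketch. It assumes that a parser has already counted, for each method, its binary decision points (branches, loop conditions, case labels, `&&`/`||`; counting conventions differ between tools), and it uses the familiar shortcut that McCabe's complexity of a single-entry, single-exit method is that count plus one. The `Method` and `ClassInfo` types are hypothetical and are not part of Audit-Reengineer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Method:
    name: str
    # Binary decision points already counted by some (hypothetical) front end.
    decisions: int = 0

    def cyclomatic_complexity(self) -> int:
        # For a single-entry, single-exit method, McCabe's V(G) = E - N + 2
        # reduces to the number of binary decision points plus one.
        return self.decisions + 1


@dataclass
class ClassInfo:
    name: str
    methods: List[Method] = field(default_factory=list)


def wmc_mccabe(cls: ClassInfo) -> int:
    """WMC weighting each method by its cyclomatic complexity."""
    return sum(m.cyclomatic_complexity() for m in cls.methods)


def wmc_unitary(cls: ClassInfo) -> int:
    """WMC with a unit weight per method: simply the number of methods."""
    return len(cls.methods)


if __name__ == "__main__":
    # Hypothetical class: open() has 3 decisions, save() 1, title() none.
    doc = ClassInfo("Document", [Method("open", 3), Method("save", 1), Method("title")])
    print(wmc_mccabe(doc), wmc_unitary(doc))  # prints: 7 3
```

With unitary weights the metric only counts methods, so a class with one enormous method and a class with one trivial method look identical; the McCabe-based variant separates them.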
We developed the following hypotheses about this metric:
Hypothesis 1. The outliers are the central (major) classes of the project, i.e., the main control classes.
Hypothesis 2. The outliers are more error prone and harder to maintain.
Hypothesis 3. Outliers have few or no children; classes with high WMC values that have many descendants are critical from the maintenance point of view, and their redesign should be considered.
Hypothesis 4. Classes with large numbers of methods are likely to be more application specific, limiting the possibility of reuse.
| Case | Minimum | Average | Maximum |
|---|---|---|---|
| Case A | 1 | 41 | 260 |
| Case B | 1 | 26 | 179 |
Hypotheses Validation
Average Method-Complexity of a Class
We also analyzed the classes that concentrate a great deal of complexity in a few methods. These are classes with a high ratio of WMC (based on McCabe's cyclomatic complexity) to WMC (based on unitary complexity of each method) [Mari97]. We expected that these methods could be split, thereby distributing the complexity over more methods. This also has the advantage of increasing the potential reusability of the class. Talking to the developers of Case A, we found that the few methods of the outlier classes contained huge selector structures ("switch-case" in C/C++). The designers admitted that, although these huge methods affect neither the maintainability nor the understandability of the class, it would be a wise decision to split them into more methods. This encourages us to seek further validation of this observation on other projects.
In Case B this ratio also revealed classes with a small number of very large methods. In this case the classes could not be split, but their complexity could have been distributed over more methods.
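The ratio used above is simply the average cyclomatic complexity per method. A minimal sketch follows; the input format (a list of per-method cyclomatic complexities per class), the class names and the `threshold` value are made up for illustration and are not values used in the case studies.

```python
from typing import Dict, List


def average_method_complexity(complexities: List[int]) -> float:
    """WMC(McCabe) / WMC(unitary): mean cyclomatic complexity per method."""
    if not complexities:
        return 0.0
    return sum(complexities) / len(complexities)


def few_heavy_methods(classes: Dict[str, List[int]], threshold: float) -> List[str]:
    """Classes whose average method complexity exceeds `threshold`, worst first."""
    ratios = {name: average_method_complexity(cc) for name, cc in classes.items()}
    flagged = [name for name, ratio in ratios.items() if ratio > threshold]
    return sorted(flagged, key=lambda name: ratios[name], reverse=True)


if __name__ == "__main__":
    # Made-up data: per-method cyclomatic complexities of three classes.
    system = {
        "Dispatcher": [42, 1],  # one huge switch-case method
        "Point": [1, 1, 1, 1],
        "Parser": [6, 5, 7],
    }
    print(few_heavy_methods(system, threshold=10.0))  # prints: ['Dispatcher']
```

A single huge selector method, like those found in Case A, dominates the ratio even when the class also contains trivial methods.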
4.2 Data Abstraction Coupling (DAC) [LiHe 93]. A class can be viewed as an implementation of an abstract data type (ADT). A variable declared within a class X may have as its type an ADT that is another class definition; this causes a particular type of coupling between X and the other class, since X can access the properties of the ADT class. The DAC of a class is defined as the number of ADTs defined in that class, and hence it measures coupling complexity.
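A minimal sketch of the counting, assuming the attribute types of each class have already been extracted as bare class names (pointer, reference and template decorations stripped). Ignoring a self-typed attribute and offering a variant that counts each referenced class only once are choices made for this illustration, not part of the definition above.

```python
from typing import Iterable, Set


def dac(owner: str, attribute_types: Iterable[str], known_classes: Set[str]) -> int:
    """DAC: number of attributes of `owner` whose type is another class.

    `known_classes` holds every class the analysis knows about (it may include
    library classes). Built-in types such as int are not ADTs and are ignored;
    attributes typed with `owner` itself are also ignored (sketch assumption).
    """
    return sum(1 for t in attribute_types if t in known_classes and t != owner)


def dac_distinct(owner: str, attribute_types: Iterable[str], known_classes: Set[str]) -> int:
    """Variant that counts each referenced class only once."""
    return len({t for t in attribute_types if t in known_classes and t != owner})


if __name__ == "__main__":
    # Hypothetical class Order with two Address attributes and one int.
    classes = {"Order", "Customer", "Address", "Money"}
    attrs = ["Customer", "Address", "Address", "Money", "int", "Order"]
    print(dac("Order", attrs, classes), dac_distinct("Order", attrs, classes))  # prints: 4 3
```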
We developed the following hypotheses about this metric.
Hypothesis 1. Outlier classes are mainly the central control classes of the system. DAC outliers that are not central classes are undesirable, and their redesign should be considered.
Hypothesis 2. The outliers are harder to maintain, as they will often need to change because of the classes they depend on.
| Case | Minimum | Average | Maximum |
|---|---|---|---|
| Case A | 0 | 9.24 | 37 |
| Case B | 0 | 4.03 | |
Hypotheses Validation
4.3 Change Dependency Between Classes (CDBC) [Hitz 1996]. This metric determines the potential amount of follow-up work to be done in a client class (CC) when the server class (SC) is being modified, by counting the number of methods in the CC that might need to be changed because of a change in SC.
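The following Python sketch shows the counting idea in its simplest form: a client method is counted whenever it references the server class at all. The definition in [Hitz 1996] is more refined than this, so the numbers should be treated as an approximation. The input format, a map from each client method to the set of class names it uses, is hypothetical.

```python
from typing import Dict, Set, Tuple


def cdbc(client_methods: Dict[str, Set[str]], server: str) -> int:
    """Simplified CDBC(CC, SC): how many methods of the client class CC
    reference the server class SC and may need follow-up changes."""
    return sum(1 for used in client_methods.values() if server in used)


def cdbc_table(system: Dict[str, Dict[str, Set[str]]]) -> Dict[Tuple[str, str], int]:
    """Simplified CDBC for every (client, server) pair with a non-zero value.
    `system` maps class name -> {method name -> set of classes it uses}."""
    table = {}
    for client, methods in system.items():
        # Every class used by some method of the client, except the client itself.
        servers = set().union(*methods.values()) - {client}
        for server in servers:
            table[(client, server)] = cdbc(methods, server)
    return table


if __name__ == "__main__":
    # Hypothetical two-class system.
    system = {
        "View": {"OnDraw": {"Document", "Brush"}, "OnSize": set(), "Update": {"Document"}},
        "Document": {"Save": {"Archive"}},
    }
    print(sorted(cdbc_table(system).items()))
```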
The metric is defined not on a single class, but on a client class/server class pair. This allows us to define hypotheses from different perspectives:
Hypothesis 1. Client perspective:
- CCs with high CDBC values (and many such pairs) are at the "heart" of the design (similar to WMC).
- CC outliers are very hard to maintain, as they strongly depend on many other classes.
- CC outliers are more error prone and harder to understand.
Hypothesis 2. Server perspective: classes that are used by many other classes should be stable. If they are not, they should be consolidated.
Hypothesis 3. Pairs of classes with a strong mutual dependency (high CDBC in both directions) are not desirable.
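The server perspective and Hypothesis 3 can both be checked from a table of pairwise CDBC values. In this sketch the table is a dict keyed by `(client, server)`; the `threshold` for a "strong" dependency and the class names used are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (client, server)


def clients_per_server(cdbc: Dict[Pair, int]) -> Dict[str, int]:
    """Server perspective: number of client classes that depend on each server."""
    return dict(Counter(server for (_, server), value in cdbc.items() if value > 0))


def mutual_strong_pairs(cdbc: Dict[Pair, int], threshold: int) -> List[Pair]:
    """Pairs of classes whose CDBC reaches `threshold` in both directions.
    Each unordered pair is reported once, with names in alphabetical order."""
    found = []
    for (a, b), value in cdbc.items():
        if a < b and value >= threshold and cdbc.get((b, a), 0) >= threshold:
            found.append((a, b))
    return sorted(found)


if __name__ == "__main__":
    # Made-up pairwise CDBC values.
    table = {("View", "Doc"): 5, ("Doc", "View"): 4, ("View", "Brush"): 1, ("Printer", "Doc"): 2}
    print(mutual_strong_pairs(table, threshold=3))  # prints: [('Doc', 'View')]
```

Servers with many clients are the candidates for the stability check of Hypothesis 2; pairs returned by `mutual_strong_pairs` are the ones Hypothesis 3 flags.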
Hypotheses Validation
4.4 Tight Class Cohesion (TCC) [Biem 95] is defined as the relative number of directly connected methods, i.e., the number of directly connected method pairs divided by the total number of method pairs. Two methods are directly connected if they use at least one common data member of the class. In this definition a data member is considered to be "of the class" if and only if it is declared in that class, meaning that inherited data members are not counted for this metric. This is justified by the fact that this metric is a measure of cohesion, while the use of inherited data members is a matter of coupling. For TCC, the outliers are the classes with the lowest values.
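A minimal TCC sketch in Python, assuming each method is described by the set of data members it accesses directly. Inherited data members are discarded by intersecting with the members declared in the class, as required above. Which methods to include (for instance only public ones) and whether to follow accesses made through called methods are not addressed here; returning `None` for classes with fewer than two methods is an arbitrary choice of this sketch.

```python
from itertools import combinations
from typing import Dict, Optional, Set


def tcc(method_attrs: Dict[str, Set[str]], own_attrs: Set[str]) -> Optional[float]:
    """Tight Class Cohesion: directly connected method pairs / all method pairs."""
    methods = list(method_attrs)
    n = len(methods)
    if n < 2:
        return None  # no method pairs, so TCC is undefined
    # Keep only data members declared in the class itself (no inherited ones).
    used = {m: method_attrs[m] & own_attrs for m in methods}
    # Two methods are directly connected if they share at least one data member.
    connected = sum(1 for a, b in combinations(methods, 2) if used[a] & used[b])
    return connected / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Hypothetical class: 4 of its 6 method pairs share a data member.
    account = {
        "deposit": {"balance"},
        "withdraw": {"balance", "limit"},
        "rename": {"name"},
        "describe": {"name", "balance"},
    }
    print(round(tcc(account, {"balance", "limit", "name"}), 2))  # prints: 0.67
```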
We have formulated the following hypotheses about this metric:
Hypothesis 1. Classes with low TCC are not cohesive and might be split into two or more classes.
Hypothesis 2. Classes with low TCC incorporate more than one functionality.
| Case | Minimum | Average | Maximum |
|---|---|---|---|
| Case A | 0.02 | 0.28 | 0.67 |
| Case B | 0.07 | 0.45 | 1 |
Hypotheses Validation
Concluding Observation
Compared with Case B and with the case studies analyzed in the past, both the average and the maximum TCC values of Case A are very low. We assume that the reason for this is not a poor structural design, but the strong distorting effect that the large number of "false positive" classes has on the statistical values. This leads us to conclude that, in order to interpret the results of this metric effectively, the "false positives" must first be filtered out.
4.5 Reuse of Ancestors (RA) [Mar 98]. The RA metric measures the real code reuse of an ancestor class A within a derived class C. RA is calculated as the sum of the reuses of class A in all methods of class C, divided by the total number of methods defined in class C. In [Mar 98] two different ways to calculate the reuse at the method level were proposed:
i. the reuse of class A in method m is 1 if m uses at least one member of A, and 0 otherwise (RA-Unitary);
ii. the reuse of class A in method m is the relative number of members of A that are used in method m (RA-Percentage).
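Both method-level variants plug into the same class-level average. In this sketch each method of C is described by the set of ancestor member names it uses, and "relative number" in RA-Percentage is taken to mean relative to the total number of members of A; both are assumptions of this illustration. Returning 0 for a class without methods is likewise a choice of the sketch.

```python
from typing import Callable, Dict, Set

Reuse = Callable[[Set[str], Set[str]], float]


def reuse_unitary(used: Set[str], ancestor_members: Set[str]) -> float:
    """RA-Unitary: 1 if the method uses at least one member of A, else 0."""
    return 1.0 if used & ancestor_members else 0.0


def reuse_percentage(used: Set[str], ancestor_members: Set[str]) -> float:
    """RA-Percentage: fraction of A's members used by the method."""
    if not ancestor_members:
        return 0.0
    return len(used & ancestor_members) / len(ancestor_members)


def ra(derived_methods: Dict[str, Set[str]], ancestor_members: Set[str], reuse: Reuse) -> float:
    """RA(A, C): summed per-method reuse of A, divided by the number of methods of C."""
    if not derived_methods:
        return 0.0  # sketch choice for a class without methods
    total = sum(reuse(used, ancestor_members) for used in derived_methods.values())
    return total / len(derived_methods)


if __name__ == "__main__":
    # Hypothetical ancestor Form and derived PrintableForm.
    form = {"draw", "resize", "title", "print_page"}
    printable = {"print_all": {"draw", "print_page"}, "set_footer": {"footer"}, "draw_footer": {"draw"}}
    print(round(ra(printable, form, reuse_unitary), 2), ra(printable, form, reuse_percentage))  # prints: 0.67 0.25
```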
We have defined the following hypotheses about RA:
Hypothesis 1. Outliers are classes that reuse a lot of code from ancestor classes; since reuse is a form of coupling, the maintenance effort for these classes will be high.
Hypothesis 2. Very high values are a sign of misused subclassing.
Hypothesis 3. Low average RA values are a sign of poor OO design.
| Case | Minimum | Average | Maximum |
|---|---|---|---|
| Case A | 0.02 | 0.43 | 0.60 |
| Case B | 0.10 | 0.45 | 0.90 |
Observations
In Case A, a closer look at the ancestor classes of the class pairs with an RA-Unitary value higher than 0.3 shows that only a small number of ancestors are reused in the derived classes. Talking to the designers, we found that each such ancestor, together with the set of derived classes that use it, forms a semantically related class cluster. This could be important information for the model capture phase of the re-engineering process.
A second observation was made when calculating, for a class, the reuse of all its ancestors (based on RA-Percentage). The majority of the outliers were "light classes", i.e., classes with few methods. This shows that most of the use of ancestor members takes place in a small number of methods. This is normal, as classes derived in order to reuse the code of their ancestors only refine the ancestor class. Classes that do not conform to this rule are likely to be poorly designed.
Hypotheses Validation
In Case B we could not confirm our hypotheses. The outliers do indeed reuse a lot of code from their parents, but all of them were a specific kind of form that could be printed. This was in fact not a case of misused subclassing, but simply of exploiting the advantages of object oriented programming. The outliers were all leaf classes.
In this article we have described Audit-Reengineer, a tool for reengineering large object oriented legacy systems, and we have assessed the suitability of the metrics included in the tool from the perspective of re-engineering. We validated the tool on two medium-sized case studies. The general conclusion is that the metrics included in the tool work very well for model capture. For problem detection, however, we found less evidence of their suitability. A likely reason is that the case studies were well designed and are not legacy systems: they are still being maintained without any particular difficulty. Therefore, our next step in the evaluation of Audit-Reengineer will be to apply it to real legacy systems.
If you have comments or suggestions, email me at anne-marie.sassen@sema.es