Informatica 27 (2003) 425-432

Empirical Assessment of Methods for Software Size Estimation

Aleš Živkovič, Marjan Heričko and Tomaž Kralj
University of Maribor, Faculty of Electrical Engineering and Computer Science, Institute of Informatics
Smetanova 17, SI-2000 Maribor
ales.zivkovic@uni-mb.si, http://lisa.uni-mb.si

Keywords: Software metrics, Function Points, Software Size Estimation, Empirical Analysis

Received: July 20, 2003

In the software industry, many projects fail due to both misjudgment of the project's size and faulty estimates derived from this elementary metric. Several methods for software size estimation exist; the Function Points Analysis (FPA) method, however, is the one most frequently put into practice. After Albrecht introduced the FPA method, several variations evolved. All methods share the same fundamental idea but differ in procedural steps and metric units. Methods are usually compared descriptively. To avoid the weaknesses of a descriptive approach, a mathematical model is defined and used for theoretical comparison. The complexity of the mapping functions prevents detailed comparison, so only general characteristics become evident. The characteristics exposed by formalizing the rules were studied further in different test scenarios using historical data from past projects. The empirical results revealed some limitations of the mapping function and anomalies in the data set used. Possible reasons for the deviations in the data set were also analyzed.

1 Introduction

Software size estimation is a crucial element in a project manager's decision-making process with regard to a project's duration, budget and resources. Different methods have been developed in the past. Albrecht introduced the function point analysis method in 1979 [1]; since then, it has been the subject of many scientific studies [4, 5, 6].
Some modifications have also been made, resulting in new methods such as Feature Points, Full Function Points, Function Weight, Function Bang, Mk II Function Points Analysis, COSMIC-FFP and NESMA. A comparison of different methods based on verbal descriptions lacks the formalism needed to understand and compare them. In this paper, a mathematical foundation for describing the methods is established first; three popular methods are then mapped into the universal form and compared. The mapping functions are compared empirically. The paper is divided into four sections. In the first section, the methods for software size estimation are briefly presented. The subsequent section introduces the formal model for representing software-sizing methods. The third section describes the test scenarios and presents the results. The conclusion and plans for future work can be found in the last section.

1.1 Function Points

The idea behind function points is quite simple [2]. Every information system processes data that is either stored in the application database or obtained from external applications. Four operations are performed on data records: create, read, update and delete. In addition, information systems use several query functions for data retrieval and report construction. Each record consists of several fields that are either of basic data types or are themselves records that can be decomposed further. The FPA method quantifies the number of fields in each record, the distinct operations performed on these records, and the number of these operations that are necessary to perform a business function. The sum over all business functions, multiplied by empirically determined weights, gives the unadjusted function points. The final calculation applies a value adjustment factor (VAF) that measures system complexity. Figure 1 shows an overview of the tasks performed within the scope of the FPA method.
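The calculation outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the counting procedure used in this paper: the weight table and the VAF formula (0.65 plus 0.01 times the sum of 14 general system characteristic ratings, each from 0 to 5) are the standard IFPUG values, which the paper does not list, and the function names and sample counts are hypothetical.

```python
# Standard IFPUG weights per function type and complexity level.
# Assumption: these values come from the IFPUG counting practice, not from this paper.
WEIGHTS = {
    "ILF": {"low": 7, "average": 10, "high": 15},
    "EIF": {"low": 5, "average": 7, "high": 10},
    "EI": {"low": 3, "average": 4, "high": 6},
    "EO": {"low": 4, "average": 5, "high": 7},
    "EQ": {"low": 3, "average": 4, "high": 6},
}


def unadjusted_fp(counts):
    """Sum weight * count over all (function type, complexity) pairs."""
    return sum(WEIGHTS[ftype][level] * n for (ftype, level), n in counts.items())


def value_adjustment_factor(gsc_ratings):
    """VAF from 14 general system characteristics, each rated 0..5 (assumed IFPUG rule)."""
    if len(gsc_ratings) != 14 or any(not 0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    return 0.65 + 0.01 * sum(gsc_ratings)


def adjusted_fp(counts, gsc_ratings):
    """Unadjusted function points scaled by the value adjustment factor."""
    return unadjusted_fp(counts) * value_adjustment_factor(gsc_ratings)


# Hypothetical element counts for a small system.
sample_counts = {
    ("ILF", "low"): 2,
    ("EI", "average"): 3,
    ("EO", "high"): 1,
    ("EQ", "low"): 2,
}
sample_ratings = [3] * 14

print(unadjusted_fp(sample_counts))  # 2*7 + 3*4 + 1*7 + 2*3 = 39
print(round(adjusted_fp(sample_counts, sample_ratings), 2))
```

With all 14 ratings set to 3, the VAF is 1.07, so the adjustment changes the unadjusted count by only a few percent; under the assumed formula the VAF always lies between 0.65 and 1.35.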
[Figure 1: Business Use Case diagram for FPA method. Use cases shown: Identify data functions; Define ILFs (Internal Logical Files); Define EIFs (External Interface Files); Identify transactional functions; Define EIs (External Input); Define EOs (External Output); Define EQs (External Inquiry); Determine VAF (Value Adjustment Factor); Calculate FP.]

1.2 COSMIC-FFP

The COSMIC-FFP method [10] reached standardization in 2003 as ISO/IEC 19761 and is the only method in accordance with ISO/IEC TR 14143 [7]. Its approach to size measurement differs from the original FPA, since data elements do not contribute directly to the size. The focus is on data movements, which are measured in units called Cosmic functional size units (Cfsu). In [10], a conversion factor to function points is given, based on a sample portfolio of 14 applications from two different systems. In general, the conversion factor is close to one when comparing against unadjusted function points. The method distinguishes between four types of data movement (entry, exit, read and write), and the sum of their sizes represents the size of the measured system. Besides the raw measurement rules, the method clearly defines its applicability in different circumstances (e.g. software domain, project phase). Its popularity among practitioners is growing.

1.3 Mark II FPA

In 1988, Charles Symons developed a variation of the FPA method [3], adding several new steps to the measurement process. The additional steps concern the calculation of effort, productivity and the influence of the technical complexity of a specific solution. The functional size itself is calculated as the weighted sum, over all logical transactions, of the input data element types (Ni), the data entity types referenced (Ne) and the output data element types (No). The weights are industry averages: Wi = 0.58, We = 1.66 and Wo = 0.26.
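The weighted sum just described can be written out directly. The sketch below uses the three industry-average weights quoted in the text; the function name and the sample transactions are hypothetical and serve only to show the arithmetic.

```python
# Industry-average Mk II weights as quoted in the text.
W_INPUT = 0.58   # Wi: per input data element type
W_ENTITY = 1.66  # We: per data entity type referenced
W_OUTPUT = 0.26  # Wo: per output data element type


def mk2_functional_size(transactions):
    """Sum Wi*Ni + We*Ne + Wo*No over all logical transactions.

    Each transaction is a tuple (Ni, Ne, No).
    """
    return sum(
        W_INPUT * n_in + W_ENTITY * n_ent + W_OUTPUT * n_out
        for n_in, n_ent, n_out in transactions
    )


# Two hypothetical logical transactions: (Ni, Ne, No).
sample_transactions = [(5, 2, 10), (3, 1, 4)]

# First transaction:  0.58*5 + 1.66*2 + 0.26*10 = 8.82
# Second transaction: 0.58*3 + 1.66*1 + 0.26*4  = 4.44
print(round(mk2_functional_size(sample_transactions), 2))  # 13.26
```

Because each transaction contributes a linear term with no complexity bands or upper bounds, the result grows in proportion to the element counts.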
Compared to the original FPA method, the major difference is that Mk II FPA is a continuous measure with linear characteristics. Therefore, Mk II FPA produces increasingly higher size estimates for projects with more than 400 function points. The primary domain for Mk II FPA is business information systems. If it is applied to other domains, special attention has to be given to components with complex algorithms, since the sizing rules do not take their contribution into account. For use with real-time systems, additional guidelines may be necessary [3].

2 FPC formal model

All methods for software size estimation lack formal foundations in their original descriptions. There have been some attempts [8, 9] to add formalism to functional size measurement. Fetcke's model is applicable to different methods, since it introduces an additional level of abstraction. The approach proposed by Diab et al. is designed for COSMIC-FFP and has a specific purpose. In our research, the model defined by Fetcke is used as a basis and is further refined by the definition of a mapping function.

2.1 Generalized representation

According to measurement theory, every measurement can be represented as a function that maps empirical objects into numerical ones. The FPA method defines a function that maps a software system into a number. That number represents the size of the system. Since the FPA method is technology independent, it introduces its own concept for representing a software system. The abstraction of a software system is data oriented and has two steps:

1. The software documentation is transformed into elements defined by the method.
2. The method elements are mapped into a numerical value representing the size of the system expressed in function points.

The procedure is presented in Figure 2 as a UML activity diagram. It shows a specific example in which the Software Requirements Specifications (SRS) serve as input to the FPA element identification process.
Elements are identified according to rules, and the outcome is a data-oriented abstraction of the software system. The second activity represents the mapping function. Several tables are used to transform the individual element counts into function points. The final result is the number of function points.

[Figure 2: Data abstraction steps for FPA method. Activities: Identification of FPA elements; Map FPA elements to size. Objects: SRS : Documentation; FPA : Rule; Data abstraction of the system; FPA : Element; FPA : Table; FP : Number.]

2.2 Data-oriented abstraction

The methods enumerated in the introduction use different names for the data abstraction elements; the rules for element identification differ, and so do the mapping functions. However, there are similarities that can be described by the following core concepts:

• The user concept covers the interaction between a user and the system.
• The application concept represents the whole system as the object of measurement.
• The transaction concept is the logical representation of the system's functionality. A transaction is the smallest independent unit of interest.
• The data concept deals with the subject of change within the system. The data element is the smallest unit of user observation.
• The type concept simplifies data handling via the abstraction of individual data elements.

On a higher level of abstraction, an application is represented by data and transactional types. The data type is a set of data elements handled within the system. The transactional type is a sequence of logical activities. Fetcke defined seven classes of logical activities [8]:

• Entry activity. The user enters data into the application.
• Exit activity. Data is output to the user.
• Control activity. The user enters control information data.
• Confirm activity. Confirmation data is output to the user.
• Read activity. Data is read from a stored data group type.
• Write activity. Data is written to a stored data group type.
• Calculate activity. New data is calculated from existing data.

Figure 3 shows a UML class diagram of the data-oriented abstraction. Based on this abstract representation, a mapping for a specific method can be made. An application H consists of transactional types and data group types:

$H = (T_1, \ldots, T_t, F_1, \ldots, F_a)$ (E 1)

The transactional type $T_i$ is a vector of activities:

$T_i = (P_{i1}, \ldots, P_{in})$ (E 2)

An activity is further described by four attributes:

• its class $\theta_{ik} \in \{\text{Entry}, \text{Exit}, \text{Control}, \text{Confirm}, \text{Read}, \text{Write}, \text{Calculate}\}$,
• for read and write activities, the data group type referenced $r_{ik}$,
• the set of data elements handled $D_{ik}$ and
• for calculate activities, the set of data elements calculated $C_{ik}$.

In equation (E 3), i runs from 1 to t and identifies the transactional type to which the activity belongs; k runs from 1 to n and identifies the activity within the transaction.

$P_{ik} = (\theta_{ik}, r_{ik}, D_{ik}, C_{ik})$ (E 3)

The data group type $F_j$ is a set:

$F_j = \{(d_{j1}, g_{j1}), \ldots, (d_{jr_j}, g_{jr_j})\}$
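To make the structure of the tuples above more concrete, the following sketch expresses them as Python data classes. It is only an illustration of the notation, not part of Fetcke's model or of this paper: all class and field names are hypothetical, and the second component g of each pair in a data group type is kept as an opaque label because it is not defined in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional, Tuple


class ActivityClass(Enum):
    """The seven activity classes listed above."""
    ENTRY = "Entry"
    EXIT = "Exit"
    CONTROL = "Control"
    CONFIRM = "Confirm"
    READ = "Read"
    WRITE = "Write"
    CALCULATE = "Calculate"


@dataclass(frozen=True)
class Activity:
    """One activity P_ik = (class, referenced group, handled set D, calculated set C)."""
    kind: ActivityClass
    handled: FrozenSet[str] = frozenset()
    referenced: Optional[str] = None           # only for Read and Write
    calculated: FrozenSet[str] = frozenset()   # only for Calculate

    def __post_init__(self):
        needs_ref = self.kind in (ActivityClass.READ, ActivityClass.WRITE)
        if needs_ref != (self.referenced is not None):
            raise ValueError("exactly the Read and Write activities reference a data group type")
        if self.calculated and self.kind is not ActivityClass.CALCULATE:
            raise ValueError("only Calculate activities carry calculated data elements")


# A transactional type T_i is an ordered sequence of activities.
Transaction = Tuple[Activity, ...]


@dataclass
class Application:
    """H: transactional types plus data group types (name -> set of (d, g) pairs)."""
    transactions: Tuple[Transaction, ...]
    data_groups: Dict[str, FrozenSet[Tuple[str, str]]]


# Hypothetical example: a single "add customer" transaction.
add_customer: Transaction = (
    Activity(ActivityClass.ENTRY, handled=frozenset({"name", "address"})),
    Activity(ActivityClass.WRITE, handled=frozenset({"name", "address"}), referenced="Customer"),
    Activity(ActivityClass.CONFIRM, handled=frozenset({"status"})),
)
app = Application(
    transactions=(add_customer,),
    data_groups={"Customer": frozenset({("name", "g1"), ("address", "g1")})},
)
print(len(app.transactions), len(app.transactions[0]))
```

Using a tuple for the transaction preserves the order of activities, which matches the description of a transactional type as a vector, while sets are used where the text speaks of sets of data elements.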