SOURCE CODE ANALYSIS
INFORMATICA 4/1988
UDK 681.3:519.683

Radovan Andrejčič, Univerza v Mariboru
Bojan Pečeit, Iskra-Delta, Ljubljana

In the software life cycle, maintenance is the most expensive phase. Much of its cost comes from different solutions to similar problems, and programmers' source code is a prime example of the variety of algorithms used for similar tasks. Larger programming groups often agree on some kind of rules for writing source code. These rules are nearly always called "programming standards". We analyzed 135 programs from four programming groups (computer centers), each consisting of 2-4 programmers. Statistical methods such as statistical testing, sampling and analysis of variance served as an unbiased judge. Without knowing the groups' internal agreements about programming, we compared the source code of individual programmers, of programmers within groups, and of the groups. Differences between programmers within groups were surprisingly small.

Maintenance is the most expensive phase of the software life cycle. A great deal of cost is caused by the diversity of solutions to similar problems. Larger programming groups almost always form some kind of programming rules, which they call "programming standards". The paper describes an analysis of 135 programs from four programming groups (computer centers) of 2-4 programmers each, based on unbiased statistical tests, sampling, analysis of variance, etc. Without knowledge of the agreements on programming, the program code was compared between programmers, between programmers within a group, and between groups. The unexpectedly small differences between programmers within groups are surprising.

1 Introduction
Many software life-cycle models have been proposed by different authors. They differ in unimportant details, but all of them agree that maintenance is the most expensive phase.
This phase is now the major programming activity, and very soon more programmers will be performing maintenance than development [Jones p. 35]. How can the costs of maintenance be reduced? Few would disagree that quality software is less expensive to maintain. But what is quality software anyway?

2 Software Quality
Software quality is as hard to define as "good driving". It is understood differently from one programmer to another, from one manager to another, and so on. In terms of its best-known facets, software quality can be defined [Arthur] as:

software quality = F(correctness, efficiency, flexibility, integrity, maintainability, portability, reliability, reusability, testability, usability)

where each facet can be further broken down into more criteria. For example, maintainability can be expressed as:

maintainability = F(concision, consistency, modularity, simplicity, instrumentation, self-documentation)

Some of these criteria are easy to measure, others are not. Everyone can explain modularity, but the descriptions vary from one person to another: from equating modularity with structured programming, through equating it with "no-GOTO programming", to a philosophy of cohesion and coupling. And concision and simplicity? The spectrum of answers is nearly unlimited. Different understandings lead to different solutions, and this is very often why programmers spend more time and money understanding another programmer's code than solving the problem.

Achieving uniform coding through exact standards is not realistic: "Many rules do have legitimate exceptions" [Grauer p. 92]. On the other hand, nearly every group of programmers or computer center elaborates its own philosophy of programming, and these guidelines are usually called "programming standards". So some kind of uniformity is possible. But how much?
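The paper leaves F unspecified here. One way to make it concrete is the linear form the paper later cites from Roland, quality = C + sum of W·X, where a term such as maintainability may itself be a linear equation of the same form. The Python sketch below evaluates such a nested model. The weights, the 0-1 scoring scale and the reduced set of facets (only correctness and maintainability at the top level) are hypothetical, chosen for illustration, and are not taken from Arthur or Roland.

```python
from typing import Dict, Optional, Tuple

# A linear model is (constant C, {term name: (weight W, sub-model or None)}).
# A term without a sub-model is a measured metric X; otherwise its value is
# itself computed from a linear equation of the same form.
Model = Tuple[float, Dict[str, Tuple[float, Optional["Model"]]]]


def evaluate(model: Model, measured: Dict[str, float]) -> float:
    """Return C + sum(W_i * X_i), recursing into nested sub-models."""
    constant, terms = model
    total = constant
    for name, (weight, sub_model) in terms.items():
        value = measured[name] if sub_model is None else evaluate(sub_model, measured)
        total += weight * value
    return total


# Hypothetical weights and 0..1 scores, for illustration only.
maintainability: Model = (0.0, {
    "concision": (0.1, None),
    "consistency": (0.2, None),
    "modularity": (0.2, None),
    "simplicity": (0.2, None),
    "instrumentation": (0.1, None),
    "self_documentation": (0.2, None),
})
quality: Model = (0.0, {
    "correctness": (0.5, None),
    "maintainability": (0.5, maintainability),
})
measured = {
    "concision": 0.5, "consistency": 1.0, "modularity": 0.5, "simplicity": 0.5,
    "instrumentation": 0.0, "self_documentation": 1.0, "correctness": 0.9,
}
print(round(evaluate(quality, measured), 3))  # 0.775
```

The nesting mirrors the text: maintainability is one facet of quality, and its own value comes from its criteria. How the weights should be chosen is exactly the open question the section raises.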
3 Source Code Analysis
3.1 Techniques
It is, of course, impossible, or at least too expensive, to extract data from a sample by hand; a tool or tools are needed. Our research of the source code was based on two programs. The first was oriented toward analyzing the WORKING-STORAGE SECTION. Its input was the cross-reference and map listing produced by the compiler. Its results gave information about the distributions of variable descriptions, number of references, number of words in variable names, paragraphs, USAGE clauses, etc. The other program was oriented toward the PROCEDURE DIVISION. It produced a table of usages of the COBOL reserved verbs. Occurrences of each verb were also analyzed within IF statements, and logical operators were counted in detail in both IF and PERFORM UNTIL statements. This program also reported the number of comments, paragraphs and sections, library lines from COPY statements, the total number of verbs, etc.

Both programs, as well as the whole research, were run under the DELTA/V V2.0 operating system. Because the majority of the sample programs were written for the PDP-11 computer with the DELTA/M operating system, a little recoding was sometimes needed. What does this mean for the portability of programs? (This interesting question is not the subject of this paper.)

3.2 Sample
3.2.1 Criterion for a Sample
Collecting and analyzing a sample is not only a technical problem, but also an operational one. A very important question arises immediately: which programs should be included in the sample, every program of an application or just the significant ones? In the first case, the analysis gives an exact answer about the application. But this perfection can hide differences between similar programs; it might show greater similarity than really exists. In our research the second method was used.

3.2.2 Sample Size
Four applications (programming groups) from different computer centers were included in the sample. All of them used the same programming language, COBOL, and each group had formed some kind of its own programming rules. Needless to say, they all swore by structured programming (which was prescribed in their "standards"). In this paper the applications are marked with the letters "A" through "D" and the programmers within a group with numbers. The data in Table 1 have no particular significance; they are presented only to illustrate the size of the sample.

Table 1 - Illustration of the Sample Size

Programmer  Programs  Lines   Lines/program  Std. dev.  Exec. verbs
A1          11        5274    479.45         277.0      2544
A2          10        5517    551.70         331.6      3079
A3          5         1344    268.80         180.0      706
A4          9         4550    505.56         209.4      2908
B1          16        13977   873.56         349.6      5116
B2          17        17706   1041.53        555.2      6984
B3          2         2548    1274           393        1029
C1          15        10599   706.60         195.9      3042
C2          10        6437    643.70         323.1      2187
D1          22        45908   2086.73        617.6      19147
D2          8         15606   1950.75        775.2      6834
D3          10        11149   1114.90        389.3      3518
Group A     35        16685   476.71         281.9      9237
Group B     35        34231   978.03         475.8      13229
Group C     25        17036   681.44         256.4      5229
Group D     40        72663   1816.58        731.2      29499
Total       135       140615  1041.59        731.9      57194

3.3 Analyzing Comment Statements
3.3.1 Importance of Comment Statements
"Although COBOL is often thought of as a self-documenting language, this is only partially true. With a careful choice of words, each statement can indeed be self-documenting, but it cannot explain its own purpose: it merely states its contribution to a technique or algorithm" [Ledin, Kudlick, Ledin p. 97]. Comments are still needed, and they are becoming more and more important, especially now that programs are often not maintained by their original author.
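The comment counts used in the following sections came from the second analysis program described in Section 3.1, whose code is not given in the paper. Below is a minimal Python sketch of that kind of counting. It assumes fixed-format COBOL in which an asterisk or slash in column 7 (the indicator area) marks a comment line, and it computes the source-lines-per-comment ratio (SLC) defined in Section 3.3.2. The function names and the sample program are hypothetical.

```python
from typing import Iterable, Tuple


def count_lines_and_comments(source_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_lines, comment_lines) for fixed-format COBOL source.

    Assumes the reference format: an '*' or '/' in column 7 (the indicator
    area, string index 6) marks the whole line as a comment line.
    """
    total = comments = 0
    for line in source_lines:
        total += 1
        if len(line) > 6 and line[6] in "*/":
            comments += 1
    return total, comments


def source_lines_per_comment(total_lines: int, comment_lines: int) -> float:
    """SLC = total number of lines / number of comments."""
    if comment_lines == 0:
        return float("inf")  # an uncommented program has no finite SLC
    return total_lines / comment_lines


# Hypothetical eight-line program; columns 1-6 hold sequence numbers.
sample = [
    "000100 IDENTIFICATION DIVISION.",
    "000200 PROGRAM-ID. DEMO.",
    "000300* COMPUTES THE MONTHLY TOTALS",
    "000400 PROCEDURE DIVISION.",
    "000500* MAIN CONTROL PARAGRAPH",
    "000600     DISPLAY 'DONE'.",
    "000700     STOP RUN.",
    "000800",
]
total, comments = count_lines_and_comments(sample)
print(total, comments, source_lines_per_comment(total, comments))  # 8 2 4.0
```

Here every line, including a blank one, counts toward the total, which is one possible reading of "total number of lines". A real tool would also have to decide how to treat library lines brought in by COPY statements, which the paper counted separately.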
As Yourdon says: "No programmer, no matter how wise, how experienced, how hard-pressed for time, no matter how well-intentioned, should be forgiven an uncommented program."

3.3.2 Number of Comments per Source Code
The absolute number of comments in a program has no meaning by itself. It needs to be compared with the number of source lines, the number of executable statements, or the number of reserved COBOL verbs. Table 2 presents the number of source lines per comment (SLC), calculated as

$$\mathrm{SLC} = \frac{\text{total number of lines}}{\text{number of comments}}$$

Table 2 - Source Lines per Comment

Programmer  Programs  Comments  Average SLC  Std. dev.  Sum of squares
A1          11        257       20.54        5.76       5003.5
A2          10        318       17.35        3.48       3131.2
A3          5         95        14.14        4.43       1097.7
A4          9         223       20.37        5.90       4046.0
B1          16        1820      7.68         2.52       1044.3
B2          17        2720      6.51         1.54       761.1
B3          2         432       5.9          1.50       74.1
C1          15        2611      4.06         0.49       250.8
C2          10        1387      4.64         1.63       241.7
D1          22        8140      5.64         0.42       702.8
D2          8         2922      5.34         1.22       239.9
D3          10        2418      4.61         0.5        215.1

3.3.3 Differences in Percentage of Comments between Applications
Is there any significant stability in commenting programs? This question was investigated with an analysis of variance. Table 3 analyzes not only the differences between programs and programmers, but also the differences between applications [Andrejcic p. 161].

Table 3 - Analysis of Variance between Applications and Programmers

Source of variation  Degrees of freedom  Sum of squares  Mean square  Calculated F  F (table)
Applications         3                   4506.4          1502.1       37.33         23.70
Programmers          8                   207.3           25.91        2.92          2.663
Programs             123                 1091.2          8.87

The first null hypothesis, that the differences between groups (applications) are not significant, can be rejected outright (F(3; 0.001) = 23.7). The second null hypothesis, that there are no differences between programmers, can also be rejected, but this time the risk is a bit greater: over half a per cent (F(8; 0.01) = 2.663).
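Table 3 has the layout of a two-level nested (hierarchical) analysis of variance: programs nested within programmers, programmers nested within applications. With 4 applications, 12 programmers and 135 programs, this layout gives exactly the 3, 8 and 123 degrees of freedom reported. The Python sketch below shows the computation on made-up data, not on the paper's SLC values. The programmers' F is the programmer mean square over the program mean square, which for Table 3 gives 25.91 / 8.87 ≈ 2.92. The paper does not state which denominator it used for the applications' F, so the sketch assumes the common choice for a random programmer factor, the programmer mean square.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, float, float, Optional[float]]  # (df, SS, MS, F)


def nested_anova(data: Dict[str, Dict[str, List[float]]]) -> Dict[str, Row]:
    """Two-level nested ANOVA: applications > programmers > programs.

    data[application][programmer] is the list of per-program values
    (for example, source lines per comment).
    """
    values = [x for progs in data.values() for xs in progs.values() for x in xs]
    n_total = len(values)
    grand_mean = sum(values) / n_total

    ss_app = ss_prog = ss_within = 0.0
    n_programmers = 0
    for programmers in data.values():
        app_values = [x for xs in programmers.values() for x in xs]
        app_mean = sum(app_values) / len(app_values)
        ss_app += len(app_values) * (app_mean - grand_mean) ** 2
        for xs in programmers.values():
            n_programmers += 1
            prog_mean = sum(xs) / len(xs)
            ss_prog += len(xs) * (prog_mean - app_mean) ** 2
            ss_within += sum((x - prog_mean) ** 2 for x in xs)

    df_app = len(data) - 1
    df_prog = n_programmers - len(data)
    df_within = n_total - n_programmers
    ms_app = ss_app / df_app
    ms_prog = ss_prog / df_prog
    ms_within = ss_within / df_within
    return {
        # Assumption: applications are tested against the programmer mean square.
        "applications": (df_app, ss_app, ms_app, ms_app / ms_prog),
        "programmers": (df_prog, ss_prog, ms_prog, ms_prog / ms_within),
        "programs": (df_within, ss_within, ms_within, None),
    }


# Made-up values for two applications, not data from the paper.
data = {
    "A": {"A1": [20.0, 22.0, 18.0], "A2": [17.0, 19.0]},
    "B": {"B1": [7.0, 8.0], "B2": [6.0, 7.0, 8.0]},
}
for source, (df, ss, ms, f) in nested_anova(data).items():
    print(source, df, round(ss, 2), round(ms, 2), None if f is None else round(f, 2))
```

The three sums of squares add up to the total sum of squares around the grand mean, which is a quick check on any hand calculation of a table like Table 3.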
The significant difference between programmers gave reason for a detailed investigation of the differences between programmers within groups (Table 4).

Table 4 - Analysis of Variance between Programmers within Applications

Source of variation  Application  Degrees of freedom  Sum of squares  Mean square  Calculated F  F(0.1) table
Between programmers  A            3                   184.3           61.4         2.125         2.28
                     B            2                   13.7            6.9          1.502         2.49
                     C            1                   2.0             2.0          1.549         2.88
                     D            2                   3.62            3.6          7.262         2.44
Between programs     A            31                  896.2           28.9
                     B            32                  146.6           4.6
                     C            23                  30.0            1.3
                     D            37                  18.5            0.5

These analyses showed that application "D" was the only one with significant differences between programmers: they are significant at a risk of 10%, but not at a risk of 0.1% (F(2; 0.001) = 7.29). A t-test showed that programmer "D3" wrote more comments than the other two.

The results showed greater stability in writing comments than had been expected. Comparing the average of this sample (9.03 source lines per comment) with previous investigations also gives unexpected results. Al-Jarrah-Torsun (page 344) counted an average of 66.5 source cards per comment card. The difference is so great that it needs no special statistical proof, and the same holds for the 23.82 lines per comment found by Smolej-Korelic.

3.3.4 Correlation between Program Length and Number of Comments
Naturally, longer programs are expected to be more complex and therefore to need more comments. But the previous investigation by Dr. Smolej and Korelic discovered an unexpected negative correlation between comments and the characteristics of complex programs.

Table 5 - Correlation between Program Length and Density of Comments

Programmer  Correlation coefficient  t (calculated)  t (table)
A1           0.160                   0.486           t(0.50; 9)  = 0.703
A2          -0.186                   0.525           t(0.50; 8)  = 0.706
A3          -0.702                   1.707           t(0.10; 3)  = 2.353
A4           0.771                   3.199           t(0.01; 7)  = 3.499
B1          -0.186                   0.708           t(0.40; 14) = 0.868
B2           0.148                   0.580           t(0.50; 15) = 0.691
B3           1.000                   -               -
C1           0.277                   1.038           t(0.20; 13) = 1.350
C2          -0.535                   1.790           t(0.10; 8)  = 1.860
D1           0.304                   1.428           t(0.10; 20) = 1.725
D2           0.705                   2.435           t(0.05; 6)  = 2.447
D3           0.409                   1.269           t(0.20; 8)  = 1.397

According to Table 5, programmer "A4" is the only one for whom it can be assumed that larger programs are less commented. The risk in rejecting the null hypothesis (that the correlation coefficient is not significant) is just over 1%. The next closest result, that of programmer "D2", already carries a risk of over 5%. Four programmers ("A2", "A3", "B1" and "C2") even had a negative coefficient, which means that their larger programs had relatively more comments. Programmer "A3" had a coefficient as large as -0.702, but his sample (5 programs) was too small to reject the null hypothesis. Programmer "C2", with a correlation coefficient of -0.535 and a larger sample (10 programs), came much closer to rejecting it.

3.3.5 Sampling the Contents of Comments
"The mere presence of comments, however, does not ensure a well-documented program, and poor comments are sometimes worse than no comments at all" [Grauer p. 103]. The first rules on how to write comments (explain the reason, do not duplicate the code, etc.) are also known. Their present use can be compared with the first considerations about structured programming in the early 1970s.

How can the quality of comments be established? At least two problems arise. The first is to distinguish good comments from bad ones. This cannot be done automatically: a human observer and arbiter is needed. That causes the second problem: there are too many comments to examine every comment line. Dr. Smolej and Korelic analyzed 238 programs written by 8 programmers from one computer center, with the goal of finding the representative characteristics of an average program.

Sampling was chosen to give an illustration of the quality of comments. Samples of comments were collected from each programmer, and each line was judged subjectively as good or bad. The criteria for a good comment line were easy to satisfy: every line that might be of any help in understanding the code was accepted as good. No additional comparison with the neighbouring lines of source code was done. The AQL was also very lenient: 10%, with a risk of 5%. A MIL-STD-105D double sampling plan was used.

Table 6 - Sampling the Quality of Comments

            Total     First sample        Second sample
Programmer  comments  Size  Ac  Re  Bad   Size  Ac  Re  Bad   Decision
A1          257       20    3   7   4     20    8   9   7     Accept
A2          318       32    5   9   14                        Reject
A3          95        13    2   5   4     13    6   7   10    Reject
A4          223       20    3   7   12                        Reject
B1          1921      80    11  16  46                        Reject
B2          2720      80    11  16  39                        Reject
B3          432       32    5   9   17                        Reject
C1          2610      80    11  16  5                         Accept
C2          1378      80    11  16  7                         Accept
D1          8140      80    11  16  3                         Accept
D2          2923      80    11  16  7                         Accept
D3          2418      80    11  16  6                         Accept

(Ac = acceptance number, Re = rejection number, Bad = comment lines judged bad.)

The results very clearly rejected the programmers with less commented programs and accepted those with better results. Is this a coincidence? Obviously, some groups take great care over this problem, while others do not!

3.4 Analyzing User-Defined Words
Besides correct comments, mnemonically meaningful data names are very important for understanding the data flow. Nearly all authors who deal with programming techniques suggest using as many of the 30 available characters as needed to make the names in a program easy to understand, not only for the original author but for others as well. This may explain why Al-Jarrah-Torsun were surprised to discover that the average user-defined name had "only" 7.81 characters.

Table 7 shows the distributions of all user-defined names, grouped into classes of three lengths. The hyphen is counted like any other character. Only programmers "B1" and "B2" have user-defined names longer than 18 characters.

Table 7 - Distributions of Lengths of the User-Defined Names

Programmer  1-3  4-6   7-9   10-12  13-15  16-18  19-21  22-24  25-27  28-30  Average  Rank
A1          100  334   419   72     22     -      -      -      -      -      6.6      8
A2          66   548   270   153    50     11     -      -      -      -      7.2      6
A3          95   105   82    4      -      -      -      -      -      -      4.9      12
A4          248  333   323   -      -      -      -      -      -      -      5.2      11
B1          96   596   1070  315    255    186    142    105    94     44     10.7     1
B2          19   995   1400  760    301    126    38     1      -      -      8.7      2
B3          15   308   266   58     9      2      -      -      -      -      6.9      7
C1          384  1643  341   76     30     2      -      -      -      -      5.6      9
C2          272  1026  221   76     10     -      -      -      -      -      5.5      10
D1          180  2168  3966  385    1155   21     -      -      -      -      7.9      5
D2          59   813   1269  238    376    16     -      -      -      -      8.0      4
D3          30   731   1072  179    296    13     -      -      -      -      8.2      3

In applications "C" and "D" the similarities are immediately apparent. Not only tests of the mean values and F-tests of the differences between standard deviations, but also the much more demanding Kolmogorov-Smirnov goodness-of-fit test showed that there were no differences in distributions between programmers within these groups. This was especially surprising in application "D", where each programmer's distribution was bimodal: every programmer had more names 13, 14 or 15 characters long than 10, 11 or 12 characters long. The first explanation was that this was caused by COPY statements, but further analyses contradicted this suspicion.

The Kruskal-Wallis procedure [Andrejcic p. 348], applied to the ranks in Table 7 (N = 12 programmers; rank sums 37, 10, 19 and 12 for applications A to D),

$$H = \frac{12}{12(12+1)}\left(\frac{1369}{4} + \frac{100}{3} + \frac{361}{2} + \frac{144}{3}\right) - 3(12+1) = 7.47,$$

compared with χ²(0.05; 3) = 7.81, gave no reason to reject the null hypothesis that there were no significant differences between applications. However, the result was very close to the critical value for a risk of 5%.

4 Conclusion
4.1 Interpretation
The first finding was that the relative number of comments is increasing, compared with the oldest analysis, by Al-Jarrah-Torsun, and the somewhat more recent one by Smolej-Korelic. All applications were produced with an interactive editor, whereas Al-Jarrah-Torsun wrote about punched cards, so economic factors may also influence the density of comments. Not only the density but also its constancy was surprising: unlike in the previous analysis, it was not even affected by program length. The influence of group agreements on the programmers was immediately apparent. This leads to the conclusion that commenting is being given more and more care; it now also has its place in "programming standards".

Sampling of comments gave some disappointing, or at least unexpected, results. It was very easy to distinguish between good and bad comments. The criteria were easy to meet, yet the results rejected the programmers with less commented programs.

4.2 Comment of the Analysis
After the research was finished, each "programming standard" was studied in detail and the results of the analysis were compared with its prescriptions. In applications "C" and "D", detailed programming guidelines about the form of comments were stated, while in the others they were omitted.

Al-Jarrah-Torsun found that the average user-defined name had 7.81 characters, and their expectation ("it was expected to find them to be on average much longer" [Al-Jarrah-Torsun p. 343]) turned out to be unfounded. It seems that an average of about 8 characters is the most common value. The uniformity of the name distributions was greater than expected, and for the sample as a whole there was no reason to contradict the hypothesis that this sample and that of Al-Jarrah-Torsun had statistically equal mean values.

Descriptive estimates of the importance of names were:
B1 - very important
B2 - very important
B3 - very important
C1 - less important
C2 - less important
D1 - very important
D2 - important
D3 - important

It needs to be stated clearly that the goal of this analysis was not to judge the quality of the software. The goal was to find similarities and differences between applications, and between programs within an application. This paper gives only a short illustration of the analysis, so only part of the research is shown; there are, of course, more calculations and comparisons.
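Two of the statistics behind the name-length results can be recomputed from the published figures: the Kruskal-Wallis value of Section 3.4 (7.47), the value without programmer B3 (8.18), and the critical ratio for the whole sample's mean name length. The Python sketch below assumes, as the reconstructed formulas suggest, that each programmer's average name length from Table 7 is one observation and that no tie correction is needed (Table 7 has no tied averages). The function names are hypothetical.

```python
import math
from typing import Dict, List


def ranks_from_values(groups: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Rank all values jointly, 1 = largest. Assumes no ties (true for Table 7)."""
    ordered = sorted((v for vs in groups.values() for v in vs), reverse=True)
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}
    return {g: [rank_of[v] for v in vs] for g, vs in groups.items()}


def kruskal_wallis_h(rank_groups: Dict[str, List[int]]) -> float:
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction."""
    n_total = sum(len(r) for r in rank_groups.values())
    s = sum(sum(r) ** 2 / len(r) for r in rank_groups.values())
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)


def critical_ratio(mean: float, reference: float, std_dev: float, n: int) -> float:
    """|mean - reference| divided by the standard error std_dev / sqrt(n)."""
    return abs(mean - reference) / (std_dev / math.sqrt(n))


# Average name length per programmer, taken from Table 7.
averages = {
    "A": [6.6, 7.2, 4.9, 5.2],
    "B": [10.7, 8.7, 6.9],
    "C": [5.6, 5.5],
    "D": [7.9, 8.0, 8.2],
}
without_b3 = {**averages, "B": [10.7, 8.7]}  # programmer B3 (6.9) removed

print(round(kruskal_wallis_h(ranks_from_values(averages)), 2))    # 7.47
print(round(kruskal_wallis_h(ranks_from_values(without_b3)), 2))  # 8.18
print(round(critical_ratio(7.79, 7.81, 3.62, 27464), 3))          # 0.916
```

With four applications, both H values are compared with χ² at 3 degrees of freedom (7.81 at a risk of 5%). The first value falls just below that limit and the second exceeds it. Ranking from the largest or the smallest average gives the same H, because the statistic depends only on how far each group's mean rank lies from (N + 1)/2.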
An interview with the programmers of application "A" was not possible. The answers were as expected, except those of programmers "B3" and "D1". Programmer "B3" was a beginner and the worst typist. "D3" was also very bad, the worst in his group, but both answered under the impression of the group agreements. If programmer "B3" were excluded, the Kruskal-Wallis procedure

$$H = \frac{12}{11(11+1)}\left(\frac{1156}{4} + \frac{9}{2} + \frac{289}{2} + \frac{144}{3}\right) - 3(11+1) = 8.18$$

would reject the null hypothesis (at a risk of 5%) that there were no differences between applications in the lengths of user-defined names; this would show that statistically significant differences exist. For this reason the correlation between typing speed and the length of user-defined names was not analyzed. As there were nearly no differences between programmers within groups, the results obviously depended more on the agreements than on dexterity.

The suspicion that the uniformity of the distributions was caused by COPY statements in the WORKING-STORAGE SECTION was checked: the number of user-defined names coming from library files was found to be very low.

By comparing each programmer's mean with a constant, it was shown that every programmer's average differed from that of the Al-Jarrah-Torsun sample (7.81), with the greatest risk (3.67) for programmer "D2". Coincidentally, the whole sample together had an average of 7.79 with a standard deviation of 3.62, so the critical ratio (CR) [Andrejcic p. 100] was

$$CR = \frac{|7.79 - 7.81|}{3.62/\sqrt{27464}} = 0.916$$

This analysis neither measures nor estimates the quality of the applications; that is impossible on the basis of a few facets of the state of the source code. It is well known that software quality is designed and determined in the earlier phases of the software life cycle. The quality of the source code is not the most important component of software quality, so it cannot be equated with software quality; it is only one term of the linear equation [Roland]

$$Q = W_1X_1 + W_2X_2 + \dots + W_nX_n + C$$

where the W's are weighting factors and the X's are software metrics, each of which may or may not in turn be given by a linear equation of the same form, and C is a constant. One of these metrics is maintainability, as shown at the beginning.

Uniformity of code can be of great help in eliminating the difficulties and frustrations that arise when programs have different authors. In any case, the analysis showed that differences within groups were very small, but that the "programming standards" were different. Even if they all referred to the same philosophy, these differences are a major reason for different solutions of similar problems. In our opinion this is a subjective factor that needs to be eliminated, and our analysis examines realistic possibilities for achieving this.

References
Radovan Andrejcic: "STATISTIKA PRI KADROVANJU IN IZOBRAŽEVANJU" - VSOD Kranj, 1979
Lowell Jay Arthur: "MEASURING PROGRAMMER PRODUCTIVITY AND SOFTWARE QUALITY" - John Wiley & Sons, 1985
Robert T. Grauer: "STRUCTURED METHODS THROUGH COBOL" - Prentice Hall
Capers Jones: "PROGRAMMING PRODUCTIVITY" - McGraw-Hill, 1986
M. M. Al-Jarrah and I. S. Torsun: "EMPIRICAL ANALYSIS OF COBOL PROGRAMS" - Software Practice and Experience, 9/1979
George Ledin Jr., Michael Kudlick, Victor Ledin: "THE COBOL PROGRAMMER'S BOOK OF RULES" - Belmont, California
Vitomir Smolej and Igor Korelic: "EMPIRIČNA ANALIZA COBOLSKIH PROGRAMOV" - Informatica, Ljubljana, October 1981
John Roland: "SOFTWARE METRICS" - Computer Languages, 6/1986
Francis J. Wall: "STATISTICAL DATA ANALYSIS HANDBOOK" - McGraw-Hill, 1986