THE NORMALIZATION OF RELATIONAL DATABASE INFORMATICA 3/1988 UOK 681.3.06 PROLOG:519.24 Tatjana Welzer Ivan Rozman Jdzsef Gydrk6s University of Maribor After a brief presentation of base and some definitions neded, the aim of tho artiole Is to deal with the normalization of relational database.The implementation of normalization procesa will be shown by PROLOG, tha fifth generatlon language. The ascendanc/ of this language over the languages of the fourth generatlon is that tho program is not defined as a sequence of ateps but as a discription of relations between objects. This enables a loglcal conoluaion in que3tlons that are put in various ways. The built-in databaae of PROLOG will be oxploited as vreli. Po kratki predstavitvi relacijske baze in definicij, ki Jih potrebujemo v nadaljevanju, bomo v članku predstavili proces normalizacije relacijske baze podatkov. Izvedbo procesa normalizacije bomo predstavili s PROLOG-om, jezikom pete generacije. Prednost tega jezika pred jeziki četrte generacije jo v tera, da program ne definiramo kot zaporedje korakov, temvefi kot opis relacij med objekti, kar omogoča logično sklepanje ob različno zastavljenih vprašanjih. Prav tako pa bomo izkoristili tudi PROLOG-ovo vgrajeno bazo podatkov. 1.0 IHTBODOCTION One of the most important modules for various Computer applicatlons is database. The database can be a hierarchlcal database,' a netvork database or a relational database. The latter is more and more often used inatead of the first two databases and is succesafully galning ground on ali flelds off application. Its advantage is above ali in its airaple and user friendly transfer from the paper to the Computer. Successful work wlth the relational database can be ensured oni/ by a thoughtful formulatlon of base. Because of its extensiveness thia formulatlon ahould be aided by a corresponding tool. 2.0 BELATIOMAL CATABASS Relational database is a temporally varlable multitnde of relations /ALAG84/. R(A1,A2 ,An) is the relation over multitudes Dl,D2,...,Dn If the submultitude of Descartes product is aa follovs: R a { Dl X D2 X X Dn } Dl,D2,....,Dn mean the domalns of attributes. The attributes Al,A2,...,An denote the structure of the relation which is called relational scheoma. 2.1.0 RELATIOMAL ALGEBRA Relational algebra is a slraple formal language which enables data manipulation /CODD70/. Relational algebra consists of operations over multitudes (union, Intersaction and dlfference) and speclal operation (projection,seloction,join and division). For our further work, operation of projection is of main Importance. There is the following relation: R(A1,A2 An). X is a submultitude of the roultitude of attributes' X c {Al,A2,...,An}, Y is the complement {Al,A2,...,An}/ X. The relation R(A1,A2,...,An) can be written as follovs; R(X,Y). The operation of projection of the relation R accordlng to attributes X is marked aa follows: R[X] and defined: R[X] = {X I 3y : (x,y) e R(X,Y) } 2.2.0 LOGICAL DEFENOENCES 2.2.1 FONCTIOHAL DEFKNDKNCK There Is relation R(A1,A2,...,An) and submultitudes of the multltude of attributes X C {A1.A2,...,An} and Y gi {A1,A2 An}. R[XY] denotea the projection of the relation R accordlng to attributes from X and Y. The n funotlonal dep«ndance X —> T exi3ts if and only if it is valld for RtXY] in ev6ry moment that there is a fiaictional dependence RrX] --> R[Y]. The functional dependence X —> Y is said to be coMplete if for every real aubmultitudo y( (X' C X) it is valid that X' -/-> Y. If it is valid that X' —> Y thon tho functional dependence X —> Y becanes a partial functional dependence. 2.2.2 KKY R(A1,A2,...,An) forma a relation. X which is a submultitude of the multitude of attributes is said to be the key of the relation R if and only- if the followlng two conditions are fulfilled: (i) X detennines funotionally ali attributes of the relation R, X --> Ai for i = 1,...n. (ii) no real submultitude of the multitude X posaesses this characteristic, for X' C X X' -/-> Aj is valid l Ai, for i = l,...,n, then the key of the given relation is a complete multitude of attributes A1,A2,...,An. 2.3.0 NORMAL TGBKB Normal forms are rules on the joins of attributes into relations in which logical dependences are taken into account /DATE86/. Taklng theae rules into consideration the iregularities (anomalisms) of data input, deletingand updatlng are avoid. 2.3.1 THK riRST HOBMAL FOBM The relation R is in the flrat normal fora if and only if the values in the domains are atomic for every attribute A in the relation R. 2.3.2 THS SECOtID HOSMAL FORM Let X be the multitude of ali attributes R(A1,A2,...,An), which are not a part of the key of the relation R. It is said that the relation R is in the second normal fom if and only if each attribute from X is coopletely functionally dependend on each key of the relation R(Al,A2,...,An). If the relation R(A1,A2,...,An) is not in the second normal form, then their exists a deconpositlon of tho relation R(A1,A2 An) into a multitude of relations which are in the second normal form. The relations obtained in this .way can be united again into a provioua relation by means of the operation of natural Join. There is the relation R(A1,A2,...,An) which does not exist in the second normal form. The relation R can be recorded oquivalently in the form R(X,Y,Z). In this čase X means a multitude of key attributes and Y means a multitude of non-koy attributes. X —> Y forms a partial functional dependence. The multitude Z oovera ali the remaining attributes of the relation R(A1,A2,...,An) which exist neither in the multitude X nor in the multitude Y. The multitude X can be shoun by X = X'X''. Then it is valid: X' —> Y. It is a complete functional dependence. If the relation R(X,Y,Z) is supplanted by the projection BtXZ] and RtX'Y], then the projection R[X'Y] is in the second normal fona because X' —> Y forms full functional dependence. It should be found in. which normal form the projection R[XZ] exists and, if necessary, this projection should be decompoaed in the same way as R{X,Y,Z). The procedure is final because after each decomposition of relation tho relations with less nuraber of attributes are obtained. The previous stateaents can be confirmed by the following demonstration: R[X'Y] *x' R[XZ] = R[X'Y] *x' R[X'X"Z] = = R[X'YX"Z] = R[X'X"YZ] = R[XYZ] = R(X,Y,Z) 3.0 APPLICATION OF PROLOG IN THK NORMALIZATION PROCES 3.1 NOTATION OF RSLATION IN PROLOG The description of structured of indivldual relations is presentod by a relational schema. To model our relational schema the structured analaysis tools of DeMarco /DEMA79/ and Gane and Saraon /GANE79/ (data flowcharts, data dictionary, data store) vere used. The rosult of modeling is the relational schomaa wlch are used for data storage by the relational database management systora. The relation (tho relational schema filled up wlth data) exits, depending on the choico of mothod in the first normal form or unnormalized. The tables obtained were inrespective of their normal form, stored, in tho form of PROLOG atructure. PROLOG's built-in database ensures a baaic mechaniam for data storage and data access. The notation of relation, that ia of the whole relational baso is a simple ono. The data obtained from a relation are record in tho form of PROLOa's facts, consisting of predicates and attributes. The name of relation is vritten in prodicate; the values of attributes obtained from relational scheme are vritten in attributes. After the input of ali data into the relational schema we can see that there is a record on the scroen of the terminal which is equivalent to the record on the paper. This means that copying the relation from the paper to the screen is 1:1 (one-to-one). 3.2 IMPLKMKNTATION 07 PROJECTION BY MEANS OF PROLOG The table into which the required columms were translatod and in which the redundant lines vere deleted is the rosult of operation projection. ' The presonted relations are described by moans of PROLOG and in this. way the program implementing a projection of suitable relation is designed. * Program Projection * proJection(List_of_solutions,Solutions):- flnd_equal(List_of_solutions,Solutions). find_equal([],[]). find_equal([H!T],Solutions):- member(H,T), find_equal(H,Solutions). find_equal([H I T],Solutions):- not_meraeber(H,T), write(T), nI, find_equal(T,[HISolutions]). member(X,[]):-!. memeber(X,C_|Yi):- meraober(X,Y). 14 not_memeber(X,[]):-! . not_memeber(X,[Y!Z]):- X \== Y. not_memeber(X,Z). findall(X,H,_):- asserta(found(mark)), oall(H), asserta(found(X)), fail. findall(_,_,L);- colleot_found([],M), i'= M. collect_found(S,L):- getnext(X), 1 collect.found([X1 S],L). collect_found(L,L). getnext(X):- retract(found(X)), I • f X \== mark. In the vorklng verslon of the program Projectlon the Implementation of the operation is released hy the predicate findall and projoction. ?-findalK[cho3en_atribute3*], name_of_relation*(all_atribute3*),Solutions)), projection(Solutions,M). In the relation chosen ali posalble solutions are looked for, first. Then in the list of solutions (List_of_3olutions) ali redundant lines are excluded so that equal reoords (find_equal) are looked for whlch are not placed (not_member) on the list of final solutions (Solutions). 3.3.0 CHSCKINO THK RELATION IN THE LIOBT OF ITS NOBHAL rOSH 3.3.1 ONN(»iMALIZSD RELATI0N8 According to the definition from Chapter 2.3.1, the relation exist3 in the first normal form if and onl7 If the values in domains are atoralc. That is the values in the domain are not lists or sets of values or oomposite values. In praotice such records are found very often. If there is an unnormalized rolation over wich we want to perform certain operations, then the relation should be translated Into the first normal form. For the normalizatlon of such an unnormalized table the followlng is required: - to check the exi3tence of redundant lines and to exclude them, - to flnd out whether the values of individual attributes in the domain are recorded as multitudes or llsts. If such reoords exi3t, they should be translated into a corresponding form. The program of reoord In PROLOG is based on checking the elementa in the structure of list into whlch the previou3ly written relation has been translated. When the nuraber of elements in individual lists Is found out, the paralel lylng elements are jolned into lines which are then transmited to the screen of the terminal in their normalized form. * In the operation ohosen_atributes, name_of_relation and all_atributes are substituted with the real names of atributes and relation. * Program Normalizatlon * ' normalization(A);- A == tab,!. normalizatlon(A):- nonvar(A), A \== tab, A =.. X, write(A), write('.'), nI, change(X), read(Al), normallzationC.Al). norraalization(A):- read(Al), normalizationCAl}. change([H|Tall]):- P = H. 3pllt(P,Tail,Llst_of_goal3). 3plit(P,TaH,List_of_goal3) : - count_arg(Tail,N), count_el_arg(Tail,K), [HlT]=Tall, Joln.rest([H|T],N,1,S,K,1,Llat_of_goals), form_goals(P,G,List_of_goal3). count_arg([],-1). count_arg([GlT],N) :- count_arg(T,Nl), N is NI + 1. oount_el_arg([H,[HI 1[]]1 T] , 1). count_el_arg([H.[HI|X31 T],K):- count_el_arg([H,X!T],KI), K is KI + 1. Join_re3t(_,_,_,S,K,M,Sl) :- M2 is K + 1, M2 = M, rever3e_list(S,0), S1=0,!. Join_rest(Tail,N,Nl,Llst_of_goal3.K,M,Sl) :- join_el(Tall,[],N,Nl,M,S), Ml is M + 1, join_ro3t(Tail,N,l,[S|Ll3t_of_goals],K,M1,S1). form_goal3(_,_,[]) :- I. forn)_goal3(P,H, [HlITl]) :- reverse_list(Hl,H2), X =..[P,H,H2], tab(30), write(P), write('('),wrlte(H),copy(G2),write(').'), nI, forra_goal3(P,G,Rl). copy([]) :- ! copy([G!R]) :- write(','), write(H), copy(T). join_el(Ll,L2,N,Nl,M,List) ;- N2 is N + 1, N2 = NI, List = L2,I. join_el([H|T],Tl,N,Nl,M,S) :- append_link(T,Nl,M,T2), N2 is NI + 1, Joln_el(CH|T3,[T2|TI],N,N2,M,S). append_link(T,Nl,M,X) :- n link(Nl,T,Arg), n_link(M,Arg,X). n_link(l,[HlT],C) :- C = H, !. n_link(N,[H|T],C) :- NI is N - 1, n_link(Nl,T,C). reverse_list([],[]). reverse_li3t([H|T],L) ;- rever3e_li3t(T,Ll), append(Ll,[G],L). appendC[],L,L). append([H|RT,L.[H:T1]) :- append(T,L,TI). 15'" 3.3.2.0 &TATING THK S£COKD MOSMAL FOBM In Chapter 2.3.0 we pointed out that in inputting, deleting and updating irregularities can occur if nonnal forms are not taken into account. An Irregular input can prevent to add new lines if ali the values for attributes are not knovTO. In deleting the lines the Information on certain attributes can be lost while the updating of one attribute value requires a change in ali relatlons vhere this attribute appeara. 3.3.2.1 8TATING THK LOCAL AND PARCIAL FUHCTIOHAL DKPKHDEHCB According to the definition in chapter 2.2.1, the exi3tence of functional dependence is stated by checKing the values of attributes that form the multitudes X and Y, between »hlch- we vrould like to shate the exi3tence of functional dependence (X —> V). It is valid that for each value x, x C X there is exactly one y, y t Y. To State the functional existenoe by means of PROLOG the program Projection is used first to design the relatlon H[XY]. Then in this relation the condition HtX] —> R[T], according to chapter 2.2.1 is fulfilled. 6y means of a modified predicate find_equal from the program Projection equal x's are searched. If they exlsts then the value of belonging y's we can conclude to state the exl3tence of functional dependence. According to Chapter 2.2.1 the functional dependence can be a full or a partial one. To State this characteriotic the multitude atribute X should be decomposed into real submultltudes (X', X" , X" ',.,..). Then the existence of partial dependence between the real submultltudes (X', X", X'",...) and the multitude of attributes Y should be checked according to the procedure described. A rule to decompose the list (multitude X) into sub-list or elementa (real submultltudes of the multitude X) should be added to the program in PROLOG. The fifth .generation language PROLOG wa3 chosen because of its simple copying of relation into the structure of PROLOG's facts and definition of rules (records of program) which implementod certain actions within the normallzation procesa. To bring the application nearer to the circle of p'otential users (also to those posseasing no knowledge how to program) the application will be transfered to a Peraonal Computer by means of operation aystem DOS (TURBO PROLOG Borland, Inc. will be used). In this development phase, the application offers above ali an aid for good design of relational database which provides the exiatence of successfull Information system. 5.0 LITKRATDRK COOD70 E.F.Codd: A Relational Model of Data for Large Shared Data Banks, Communication of the ACM, Volume 13, No.:6, June 1970 CLOC84 W.F.Clocshin, C.S.Mellish: Programming " in Prolog, Springer-Verlag, Berlin 1984 DKYI84 Deyi Li: A Prolog Database System, Research Studies Press LTD,1984 DATE83 C.J.Date: Database: A Primer, Addiaon- Wesley Publishing Company, 1983 DATE86 C.J.Date: An Introduction to Database System3, Volume 1, Addison - We3ley Publishing Company, 1986 KELL87 R.Keller: Expert System Technology Development and Apppllcation, Vourdon Press, Prontice-Hall Company Englewood Cliffs, NJ 1987 HARC86 C.Marcus: Prolog Programming, Addison- We3ley Publishing Company, 1986 WAH86 B.W.Wah, Guo-Jie Li: Tutorial: Computers for Artlfical Intelligence Applications, IEEE 1986 3.3.3 NOBMALIZATKM OJ RELATION INTO THK SKC(»n) HORMAL FORM The relation which was found out that it wa3 not in the secon normal form should be decomposed into several tables that would meet the condition on seoond normal form. According to the definition in Chapter 2.3.2, the relation R(A1,A2,...,An) is decomposed into tWo tables B[XZ] and Il[X'Y] vhich form the result of the projection. The relation R[X'Y] exists in the seoond normal form because the following condition should be met: X' —> Y forms full functional dependence. To normalize the relation into the second normal form given programs for projection and full functional dependence are used. To check the relation according to its third normal form slmilar atepts (required for the third normal form) should be carried out. 4.0 OONCLDSION The described process of normallzation in PROLOG is performed by IJS PROLOG on the raain frame computer VAX 8800. The looal Information 3ystem of research group wa5 chosen as a model. The data on research group members, thelr aotivitles, Horking results and the tools used by the member are recorded.