Informatica
An International Journal of Computing and Informatics

EDITORIAL BOARDS, PUBLISHING COUNCIL

Informatica is a journal primarily covering the European computer science and informatics community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors.

The necessary coordination is made through the Executive Editors, who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology.

Each author is guaranteed to receive the reviews of his article.
When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article.

Executive Editor - Editor in Chief
Anton P. Železnikar
Volariceva 8, Ljubljana, Slovenia
s51em@lea.hamradio.si
http://lea.hamradio.si/~s51em/

Executive Associate Editor - Managing Editor
Matjaž Gams, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
matjaz.gams@ijs.si
http://dis.ijs.si/mezi/matjaz.html

Executive Associate Editor - Deputy Managing Editor
Mitja Luštrek, Jožef Stefan Institute
mitja.lustrek@ijs.si

Editorial Board
Juan Carlos Augusto (Argentina) Costin Badica (Romania) Vladimir Batagelj (Slovenia) Francesco Bergadano (Italy) Marco Botta (Italy) Pavel Brazdil (Portugal) Andrej Brodnik (Slovenia) Ivan Bruha (Canada) Wray Buntine (Finland) Hubert L. Dreyfus (USA) Jozo Dujmovic (USA) Johann Eder (Austria) Vladimir A. Fomichov (Russia) Maria Ganzha (Poland) Marjan Gušev (Macedonia) Dimitris Kanellopoulos (Greece) Hiroaki Kitano (Japan) Igor Kononenko (Slovenia) Miroslav Kubat (USA) Ante Lauc (Croatia) Jadran Lenarcic (Slovenia) Huan Liu (USA) Suzana Loskovska (Macedonia) Ramon L. de Mantaras (Spain) Angelo Montanari (Italy) Pavol Návrat (Slovakia) Jerzy R. Nawrocki (Poland) Nadja Nedjah (Brazil) Franc Novak (Slovenia) Marcin Paprzycki (USA/Poland) Gert S. Pedersen (Denmark) Ivana Podnar Žarko (Croatia) Karl H.
Pribram (USA) Luc De Raedt (Belgium) Dejan Rakovic (Serbia) Jean Ramaekers (Belgium) Wilhelm Rossak (Germany) Ivan Rozman (Slovenia) Sugata Sanyal (India) Walter Schempp (Germany) Johannes Schwinn (Germany) Zhongzhi Shi (China) Oliviero Stock (Italy) Robert Trappl (Austria) Srikumar Venugopal (Australia) Terry Winograd (USA) Stefan Wrobel (Germany) Konrad Wrona (France) Xindong Wu (USA)

Executive Associate Editor - Technical Editor
Drago Torkar, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 251 93 85
drago.torkar@ijs.si

An Exquisite Mutual Authentication Scheme with Key Agreement Using Smart Card

Chiu-Hsiung Liao
General Education Center, National Chin-Yi University of Technology, Taichung, Taiwan 411, R.O.C.
E-mail: cliao@ncut.edu.tw

Hon-Chan Chen and Ching-Te Wang
Department of Information Management, National Chin-Yi University of Technology, Taichung, Taiwan 411, R.O.C.
E-mail: {chenhc, ctwang}@ncut.edu.tw

Keywords: Diffie-Hellman scheme, transformed identity, authentication, key agreement, password update

Received: June 25, 2008

To access a network system legally, efficiently and securely, the authentication scheme is essential and very important. In this paper, we propose a nonce-based authentication scheme using smart card. We use the Diffie-Hellman scheme to enhance the security of our protocol. To lessen the computation load, the remote system alone performs the exponentiation computation, and it performs it only once. The other computations all involve simple one-way hash functions or exclusive-or operations. No verification table is needed in our protocol. The protocol provides not only mutual authentication between a user and the remote server but also the achievement of key agreement. The protocol also supports convenient password update at the user's terminal. To avoid identity duplication, we introduce the idea of transformed identity in our protocol.
Povzetek: Opisana je nova shema dostopa do omrežja s pomočjo pametne kartice. (A new scheme for network access with the help of a smart card is described.)

1 Introduction

In recent years, people communicate via networks much more frequently than before. The frequency with which network users transmit information and share computing resources increases very quickly. Moreover, with e-commerce being prosperous, people use computers daily to link with servers to ask for service. In these situations, remote authentication and network security become inevitable and very important.

The authentication scheme is an essential part of assuring legitimate, secure and efficient access to a network system. Among authentication schemes, password-based authentication is widely used. But password-based authentication is vulnerable to dictionary attacks [1,2,3,4], i.e. password guessing attacks, because people are inclined to choose easy-to-remember identities or meaningful phrases. As a result, a number of protocols have been proposed to overcome the guessing attacks [1,5-7]. Some of the improved protocols [1,8-12] use public key encryption in authentication. The others [6,11,12,14] use nonces and one-way hash functions. The nonce-based protocol is more secure because the nonce is randomly generated. As for the one-way hash function, it is irreversible. Thus, a protocol using hash functions and nonces is safe and secure.

Recently, some authentication protocols using smart card have been proposed [6,11,12,14]. Using smart card has many merits. Not only can it implement computations and store a wealth of useful information, like identification number, password and basic personal data, but also it is portable. Although a protocol using public key encryption is much more secure, it may incur a burdensome computation load. Therefore, we propose an authentication protocol using the Diffie-Hellman scheme [15] to enhance the security level and efficiency but to reduce the computation load for a smart card. In our method, the smart card is responsible for simple computations and the server is responsible for complicated ones. The proposed scheme also uses the one-way hash function and the exclusive-or operation to maintain security and convenience. To prevent replay attacks and the synchronization problem, we adopt nonces in our scheme instead of using time-stamps. Furthermore, we introduce the design of transformed identity [16] in our scheme to avoid the duplication of identities.

The rest of this paper is organized as follows: Some related schemes are reviewed in Section 2. The proposed authentication scheme is described in Section 3. The security analysis of our scheme is discussed in Section 4. The efficiency and specialities of the proposed scheme are discussed in Section 5. The functionality and performance of the proposed scheme are compared with related schemes and the result is listed in Table 1. Finally, the conclusions are given in Section 6.

2 Reviews of related schemes

In this section, we review some related schemes briefly and closely.

2.1 Chien and Jan's scheme (ROSI scheme)

Chien and Jan proposed a nonce-based authentication scheme using smart card, the Robust and Simple authentication protocol (ROSI) [6], in 2003. The ROSI scheme consists of two phases: "the registration phase" and "the authentication phase". In the scheme, a prospective user, u, selects his identity, IDu, password, PWu, and an initial nonce, Ni. Then, the user transmits these values to the server, S, in the registration phase. After accepting the application, the server stores IDu and h(PWu || Ni) in its database, where the symbol " || " is the string concatenation. The server also uses its secret key to calculate some parameters and stores them in a smart card. Then, the server issues the smart card to the applicant, u.
After the authentication phase, the user and the server can mutually authenticate each other. However, in this scheme, it is necessary to set up a verification table, and a legitimate user cannot update his password conveniently and freely when the security faces potential threats.

2.2 Juang's password authenticated key agreement scheme

In Juang's authenticated key agreement scheme using smart card [12], two phases are included: "the registration phase" and "the login and session key agreement phase". A prospective user submits his identity and password to the server for registration. After getting a smart card, the user can use it to access the server. The user applies his smart card to compute a secret key and uses the key to encrypt a message, which includes a random value and an authentication tag. After receiving the message, the server computes the secret key and decrypts the received message to extract the embedded authentication tag. Then, the server verifies the validity of this tag. In order to attain the shared session key, the user's smart card has to encrypt a forwarding message and decrypt the received message from the server to perform a nonce-checking. In this scheme, we found that the smart card has to encrypt and decrypt several messages by using the cryptographic scheme. In this situation, the smart card has to compute modular exponential operations, which require a large amount of computation. These computations may overload the capability of the smart card.

2.3 Hwang et al.'s remote user authentication scheme

The scheme [14] is comprised of three main phases and an additional one. The main phases are "the registration phase", "the login phase" and "the authentication phase". The additional phase is "the password changing phase", performed at the user's discretion. When a prospective user, u, wants to register with a server, S, he submits his identity, IDu, and a hash value of his password, h(PWu), to the registration center of S.
Then, the center uses the server's secret key, xs, and the hash value of the password to compute a shifted password, PW'u = h(IDu ⊕ xs) ⊕ h(PWu), and stores it, together with the hash function h(·), into a smart card, where " ⊕ " is the exclusive-or operation. Then, the smart card is issued to the user. To access the server, the user connects his smart card to a card reader and keys in his identity and password at the user's terminal. The smart card executes the exclusive-or operation on the shifted password and h(PWu) to attain a crucial parameter, h(IDu ⊕ xs). The smart card then combines this parameter with a time-stamp to compute an authenticating value. Next, the user transmits these values to the server for authentication. On receiving the messages, the server executes the verification procedures and performs the authentication. However, although the scheme can verify a legitimate user, the user and the server cannot achieve mutual authentication and session key agreement. The scheme cannot avoid the time synchronization problem, either.

2.4 Behind the reviews

In reviewing the related schemes, we are motivated to propose an improved scheme. Not only do we supplement the deficiencies, but we also enhance the efficiency and the functionality. In our scheme, the verification table is not required and mutual authentication can be achieved. Furthermore, a user is allowed to select and update his password freely. Finally, the computation cost is reduced in the proposed scheme.

3 The proposed authentication scheme

Our authentication scheme consists of four phases: the registration phase, the login and authentication phase, the key agreement phase and the password update phase. As mentioned before, for the sake of security, we prefer to adopt modular exponentiation in the registration phase. But it is performed only at the remote server to reduce the computation load for the smart card.
The login phase is executed at the user's terminal and the authentication is verified mutually between the user and the server. The key agreement is achieved by the user and the server respectively, and is kept temporarily for mutual communication in the session. As for the password update phase, it is completed only at the user's terminal. To describe our proposed scheme with ease, we use the following symbols and operations:

1. The operator " ⊕ " is the bit-wise exclusive-or operation.
2. The symbol " || " is the string concatenation.
3. The function " h " is a one-way hash function.
4. For the sake of convenience, let the expression " X → Y : M " mean a sender X transmits a message M to a recipient Y.

3.1 The registration phase

The registration phase is performed with the remote server. When a person, u, wants to become a legitimate user of a server, S, he offers an account application to S. The procedure is as follows:

Step 1: u → S : IDu, PWu. Responding to the challenge from the server, the applicant submits his identity, IDu, and password, PWu, to the server for registration via a secure communication channel. Both IDu and PWu are selected by himself freely.

Step 2: After receiving the response, the server first confirms the formats of the submitted identity and password. Then, the server takes note of the registration time, TSu, and archives the user's IDu and related TSu for later authenticating use. Then the server performs the following four processes:

(1) Compute the transformed identity [16], TIDu = TSu || IDu, automatically by itself. The transformed identity, TIDu, can ensure the uniqueness of the identity. At this stage, the applicant only needs to remember his selected identity, IDu, and password, PWu.
(2) Compute Au = h(TIDu ⊕ x), where the parameter x is the secret key of S and is kept confidential.
(3) Compute Bu = (g^Au mod p) ⊕ PWu, where p is a large prime positive integer and g is a primitive element in the Galois field GF(p).
(4) Store the values TSu, Bu and h(·) in a smart card and issue the smart card to the applicant.

3.2 The login and authentication phase

When a legitimate user, u, intends to log in to the server, S, the user's terminal and the server need to mutually authenticate each other.

Step 1: u → S : M1 = {IDu, NTIDu, Cu}. The user, u, connects his smart card to a reader. The smart card challenges the user for his identity, IDu, and password, PWu, which were selected at his application. The smart card automatically performs the following processes:

(1) Generate a nonce, nu. Store the value nu temporarily until the end of the session.
(2) Retrieve the stored registration time to generate the transformed identity, TIDu = TSu || IDu.
(3) Compute NTIDu = TIDu ⊕ nu.
(4) Compute the value Cu = h(Bu ⊕ PWu) ⊕ nu.
(5) Send the message M1 = {IDu, NTIDu, Cu} to the server, S.

Step 2: S → u : M2 = {Du, NTIDs}. After receiving the message M1, S does the following processes:

(1) Retrieve from the database the registration time, TSu, corresponding to the identity, IDu, of the connecting user. If no corresponding user matches, the server terminates the connection. Otherwise, it goes on to the next processes.
(2) Compute TIDu = TSu || IDu and n'u = NTIDu ⊕ TIDu.
(3) Compute Au = h(TIDu ⊕ x) and g^Au mod p, then h(g^Au mod p).
(4) Compute n''u = Cu ⊕ h(g^Au mod p). If n'u = n''u, the received NTIDu was truly sent from u and the parameters n'u and n''u should be the same as nu, which was generated by the smart card at the user's terminal. Hence, the legitimacy of the connecting user is authenticated. See Theorem 1. So, the communication will carry on. On the other hand, if n'u ≠ n''u, the server terminates the connection. Furthermore, the server stores n'u in memory temporarily for later use.
(5) Create a nonce, ns, randomly. Compute Du = Cu ⊕ nu ⊕ ns and NTIDs = TIDu ⊕ ns. Then the server sends the message M2 = {Du, NTIDs} to the connecting user, u.
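The dataflow of the registration phase and of Steps 1 and 2 above can be traced in a short Python sketch. This is a hypothetical toy model, not the authors' implementation: h(·) is played by SHA-256 over big integers, ⊕ is Python's integer XOR, and g, p, x and the user's data are made-up values far too small for real use.

```python
import hashlib
import secrets

def h(n: int) -> int:
    """One-way hash function h(.), modeled here with SHA-256 over integers."""
    data = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Toy public parameters (assumptions for illustration; far too small for real use).
p = 2**127 - 1               # large prime p
g = 5                        # primitive element g of GF(p)
x = secrets.randbits(256)    # server's confidential secret key x

# --- Registration phase (performed at the server) ---
TS_u = b"20080625"                            # registration time TSu (toy value)
ID_u = b"alice"                               # user-chosen identity IDu
PW_u = int.from_bytes(b"alice-pw", "big")     # user-chosen password PWu
TID_u = int.from_bytes(TS_u + ID_u, "big")    # transformed identity TIDu = TSu || IDu
A_u = h(TID_u ^ x)                            # Au = h(TIDu xor x)
B_u = pow(g, A_u, p) ^ PW_u                   # Bu = (g^Au mod p) xor PWu, stored on the card

# --- Login, Step 1: the smart card builds M1 = {IDu, NTIDu, Cu} ---
n_u = secrets.randbits(128)                   # fresh nonce nu
NTID_u = TID_u ^ n_u                          # NTIDu = TIDu xor nu
C_u = h(B_u ^ PW_u) ^ n_u                     # Cu = h(Bu xor PWu) xor nu

# --- Login, Step 2: the server recovers the nonce two ways and cross-checks ---
n1 = NTID_u ^ TID_u                           # n'u  = NTIDu xor TIDu
n2 = C_u ^ h(pow(g, h(TID_u ^ x), p))         # n''u = Cu xor h(g^Au mod p)
assert n1 == n2 == n_u                        # the connecting user is authenticated
```

Because Bu ⊕ PWu equals g^Au mod p, the card itself never needs modular exponentiation; the only pow() calls happen on the server side, matching the division of labor described above.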
Theorem 1: If n'u = n''u, the user, u, is authenticated.

Proof. Since NTIDu = TIDu ⊕ nu, we have n'u = NTIDu ⊕ TIDu = nu. Also, given Bu = (g^Au mod p) ⊕ PWu, we have Cu = h(Bu ⊕ PWu) ⊕ nu = h((g^Au mod p) ⊕ PWu ⊕ PWu) ⊕ nu = h(g^Au mod p) ⊕ nu. Then, n''u = Cu ⊕ h(g^Au mod p) = h(g^Au mod p) ⊕ nu ⊕ h(g^Au mod p) = nu. It follows that n'u = n''u = nu. The nonce, nu, is generated at the user terminal when the user, u, inserts his smart card into a card reader. So it is fresh and unique. It is also embedded in NTIDu and never exposed. No one can impersonate it or pry into it. Both TIDu and NTIDu are unique, and NTIDu can be computed by u only. Once n'u = n''u is proven, we verify that NTIDu was really transmitted by u. Hence, the genuineness of the user, u, is authenticated. □

Step 3: u → S : M3 = {Eu}. When u receives the message M2, he executes the following processes:

(1) Compute n's = NTIDs ⊕ TIDu and n''s = Cu ⊕ nu ⊕ Du. If n's = n''s, the communication goes on. In this situation, both n's and n''s are equal to ns, which was generated by the server. Thus, the server is authenticated. See Theorem 2. On the other hand, if n's ≠ n''s, u ceases the communication. Furthermore, u keeps ns temporarily at the user terminal for later use.
(2) Compute Eu = (Cu ⊕ nu) || (ns + 1). Then u sends the message M3 = {Eu} to S. The parameter ns + 1 is the response to the server.

Theorem 2: The server, S, is authenticated if n's = n''s.

Proof. Since NTIDs = TIDu ⊕ ns, n's = NTIDs ⊕ TIDu = ns. Also, since Du = Cu ⊕ nu ⊕ ns, n''s = Cu ⊕ nu ⊕ Du = ns. Then, we have n's = n''s = ns. The nonce, ns, is immediately generated by S when S verifies the genuineness of the user, u. So ns is fresh and unique. The transformed identity, TIDu, is also unique. Thus, NTIDs is unique and it can be computed by the server only. Furthermore, Du is computed with Cu, nu and ns. A false server cannot forge all of them. Once n's = n''s is proven, the integrity of S is authenticated.
□

Step 4: After receiving the message M3, the server finds Eu in it. Since Bu ⊕ PWu = g^Au mod p, Cu = h(Bu ⊕ PWu) ⊕ nu = h(g^Au mod p) ⊕ nu. Thus, Cu ⊕ nu = h(g^Au mod p). So, Eu = (Cu ⊕ nu) || (ns + 1) = h(g^Au mod p) || (ns + 1), and it is really the string concatenation of h(g^Au mod p) and ns + 1. The server can easily extract ns + 1 from Eu and find ns in there. At this time, the server ensures that the authenticating user does have the nonce, ns. Now, both the user and the server can try for a session key agreement.

3.3 The key agreement phase

After receiving the nonce, ns, sent from the server, the user creates a session key SKu = h((Bu ⊕ PWu) || ns || nu). Once the server ensures that u has the nonce, ns, it generates a session key SKs = h((g^Au mod p) || ns || nu). Since Bu = (g^Au mod p) ⊕ PWu is computed in the registration phase, h((Bu ⊕ PWu) || ns || nu) = h((g^Au mod p) || ns || nu). Thus, SKu = SKs. Therefore, the key agreement is achieved and the session key for the session communication is SK = h((Bu ⊕ PWu) || ns || nu) = h((g^Au mod p) || ns || nu).

3.4 The password update phase

When a user wants to change his password for personal reasons or for the sake of security, he can do so at the user's terminal by performing the following:

Step 1: Insert the smart card into a reader and announce a password update request at the user's terminal.
Step 2: Key in the original password, PWu. The smart card calculates Bu ⊕ PWu.
Step 3: Responding to the challenge of the smart card, the user gives a new password, PW*u. The smart card calculates B*u = (Bu ⊕ PWu) ⊕ PW*u and then replaces Bu with this new B*u. At this time, the password update phase is completed.

4 Security analysis

We are concerned not only with the efficiency and the specialties of our scheme, but also with its security and computational complexity. In this section, we will display the strength of our scheme first, and later we discuss the computational complexity.
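As a preliminary sanity check, the two invariants the preceding phases rest on, SKu = SKs in the key agreement phase and the preservation of Bu ⊕ PWu = g^Au mod p across a password update, can be reproduced in a hypothetical Python sketch (SHA-256 stands in for h, the concatenation h(a || b || c) is modeled by hashing fixed-width fields, and all parameter values are toy assumptions):

```python
import hashlib
import secrets

def h(n: int) -> int:
    """h(.) over an integer, modeled with SHA-256."""
    data = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def h_concat(a: int, b: int, c: int) -> str:
    """h(a || b || c): concatenation modeled by hashing fixed-width 64-byte fields."""
    data = b"".join(v.to_bytes(64, "big") for v in (a, b, c))
    return hashlib.sha256(data).hexdigest()

# Toy parameters (assumptions for illustration only).
p, g = 2**127 - 1, 5
x = secrets.randbits(256)
PW_u = int.from_bytes(b"old-pw", "big")
TID_u = int.from_bytes(b"20080625" + b"alice", "big")
A_u = h(TID_u ^ x)
gAu = pow(g, A_u, p)                    # g^Au mod p, computed once at the server
B_u = gAu ^ PW_u                        # stored on the smart card

# --- Key agreement: both sides derive the same session key ---
n_u, n_s = secrets.randbits(128), secrets.randbits(128)
SK_u = h_concat(B_u ^ PW_u, n_s, n_u)   # user:   h((Bu xor PWu) || ns || nu)
SK_s = h_concat(gAu, n_s, n_u)          # server: h((g^Au mod p) || ns || nu)
assert SK_u == SK_s

# --- Password update at the terminal: B*u = (Bu xor PWu) xor PW*u ---
PW_new = int.from_bytes(b"new-pw", "big")
B_new = (B_u ^ PW_u) ^ PW_new
assert B_new ^ PW_new == gAu            # later logins still recover g^Au mod p
```

The second assertion shows why the update needs no contact with the server: only the XOR mask around g^Au mod p changes, so every later login still recovers the same value.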
The security analysis is listed as follows:

(1) Our scheme can overcome the guessing attacks: The user is allowed to select his own identity and password freely in our scheme, so he is apt to choose an easy-to-remember or meaningful identity and password. In this situation, it seems easy to guess the identity and the password of a legitimate user. However, the construction of transformed identity in our proposed scheme makes the transformed identity an independent entity. This uniqueness can prevent the transformed identity from being duplicated and resists the guessing attacks. Suppose an intruder guesses a legitimate user's identity. The guessed identity cannot be converted into a valid transformed identity without the exact registration time, which is stored in the user's smart card. As a result, an intruder's attempt to access a remote server will be rejected without a valid transformed identity.

(2) Our scheme is capable of resisting the man-in-the-middle attacks: A malicious intruder may intercept or eavesdrop on the communication between a legitimate user, u, and the server, S. After intercepting the message M1 sent by u, he may impersonate u and replay the message to S. Then, he waits for a response message from S. The intruder cannot compute the effective TIDu from the intercepted NTIDu without the nonce, nu, which is generated randomly by the smart card and is never exposed in the communication. Even though the intruder has the response message, M2, from S, he cannot extract the nonce, ns, from NTIDs, which is included in M2, because he has no TIDu at hand. The nonce, ns, is generated by S and is needed to authenticate the server in Step 3(1) of the login and authentication phase. This nonce is also required to achieve the session key agreement. Furthermore, the intruder must respond to the server with ns + 1. Because the nonce, ns, is unavailable, this response cannot be completed, either.
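The XOR masking that this argument relies on can be demonstrated in a few lines: an intercepted NTIDu unmasks the nonce only for a party holding the exact transformed identity. The identities and registration times below are hypothetical:

```python
import secrets

# Transformed identity TIDu = TSu || IDu, known only to u and S (toy values).
TID_u = int.from_bytes(b"20080625" + b"alice", "big")
n_u = secrets.randbits(128)        # nonce generated by the smart card
NTID_u = TID_u ^ n_u               # NTIDu = TIDu xor nu, visible to an intruder

# A guess with the wrong registration time unmasks a wrong "nonce":
TID_guess = int.from_bytes(b"20080626" + b"alice", "big")
assert NTID_u ^ TID_guess != n_u   # the eavesdropper learns nothing useful
assert NTID_u ^ TID_u == n_u       # only the legitimate parties recover nu
```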
Eventually, an illegitimate user will be rejected and the connection terminated. On the other hand, when a malicious person intercepts the message M1, he may pretend to be the server that u is connecting to. However, he has no TIDu, and he cannot compute it because he has no TSu at hand. The intruder has no nonce, nu, either. Thus, he cannot send valid parameters to the user u for authenticating the integrity of the server. The communication terminates when the authentication fails.

(3) An intruder cannot achieve session key agreement: The user's password is never exposed in the transmission. An intruder cannot intercept the password or any information about it. Meanwhile, the parameter Bu is stored in the user's smart card, so no one can access it. Thus, the parameter g^Au mod p cannot be computed from Bu ⊕ PWu. On the other hand, if an intruder intends to compute g^Au mod p directly, he needs to compute Au = h(TIDu ⊕ x) first. But the secret key x of the server is kept confidential; no one can have it. Hence, it is impossible to compute g^Au mod p directly. Therefore, no session key agreement can be achieved without all of g^Au mod p, nu and ns at hand.

(4) An intruder will be confronted with the complexity of the discrete logarithm: The secret key x of the server is protected by the one-way hash function. It is not possible to derive it from Au = h(TIDu ⊕ x). Trying to solve for Au from g^Au mod p is also impossible, because the adversary will be confronted with the difficulty and the complexity of the discrete logarithm problem. Without the secret key x, an adversary cannot pretend to be the server, S, in the communication. The parameter Bu cannot be derived without Au. Thus, an adversary cannot pretend to be the connecting user, u, either.

5 The efficiency and specialties of our scheme

From the procedures of the construction, we point out some merits of our scheme. We are concerned not only with efficiency but also with special properties.
(1) No verification table is needed: Once a prospective user, u, offers his identity, IDu, and password, PWu, in the registration phase, the server, S, takes note of the registration time, TSu, to derive the transformed identity, TIDu. Then, S calculates the parameter Bu and stores it in a smart card. When the legitimate user wants to access the system, he only gives his selected identity to compute the transformed identity and then transmits it to the remote server. The smart card also automatically generates a nonce, nu, to compute the authenticating values, Cu and NTIDu. Then the values are transmitted to the server. It is not necessary for the remote server to set up any verification table of passwords or other personal information.

(2) The transformed identity is unique: The construction of the transformed identity makes the identity unique. A few users could select the same identities, but the transformed identities will eventually be different since our scheme takes the registration time into account. This prevents duplication from happening.

(3) The user's identity and password can be selected freely: Since our proposed scheme uses the transformed identity to discriminate between different users, the original identity is allowed to be selected according to the user's preference. Taking into account the registration time, the proposed scheme converts the selected identity into the transformed identity. The transformed identities will be different from one another even if the selected identities are the same. Thus, a user's identity can be selected freely. The transformed identity is used to compute the parameter Au. Then, g^Au mod p is computed. The parameter Bu is generated by performing the exclusive-or operation on PWu and g^Au mod p. Because Bu is stored in the user's smart card, no one can pry into it. Therefore, the password can also be selected freely.
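The uniqueness argument in points (2) and (3) reduces to a one-line property of the concatenation, sketched here with hypothetical registration times:

```python
# Two users who choose the same identity still receive distinct transformed
# identities, because the registration time TSu is prepended (toy values).
ID = b"alice"
TID_1 = b"2008-06-25 10:00" + b"||" + ID   # TIDu = TSu || IDu, first registrant
TID_2 = b"2008-06-25 10:05" + b"||" + ID   # same identity, later registration
assert TID_1 != TID_2                      # no duplicate transformed identities
```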
(4) The Diffie-Hellman scheme is used: In the registration phase, the server calculates the parameter Bu through the Diffie-Hellman scheme to enhance security. Because the computation of modular exponentiation is burdensome for a smart card, the proposed scheme makes the server execute the operation in order to relieve the smart card of this troublesome implementation and to speed up the computation.

(5) The computations proceed very quickly and the load is low: The modular exponentiation is the only burdensome and time-consuming computation. It is used in the Diffie-Hellman scheme and is performed only once, at the remote server. The other computations, at both the user's terminal and the remote server, are just one-way hash functions, string concatenations and exclusive-or operations. The computations proceed very quickly, and the load is extremely low for either of them. Table 1 demonstrates that the computational complexity is low.

(6) The password can be conveniently updated at the user's terminal: The server needs no password-verification table to check a user's genuineness. The proposed scheme allows a user to update his password at his own terminal. It is convenient and efficient for users.

(7) Mutual authentication is executed: The scheme enables the user and the server to mutually authenticate each other. Theorems 1 and 2 prove the correctness of the mutual authentication between the user and the remote server.

At the end of this section, we compare our proposed scheme with some other schemes on computational complexity and performance. The comparison on computational complexity is listed in Table 1. For an objective view of the performance, we include the following criteria:

Item 1. No verification table needed: At the remote server, a password-verification table is not needed to authenticate the users.

Item 2.
Using unique transformed identity: Describe whether a user can choose his identity according to his preference while preventing duplication.

Item 3. Choosing a password freely: Display whether a scheme allows a user to choose his password freely or not.

Item 4. Mutual authentication: Demonstrate whether a legitimate user and the remote server can mutually authenticate each other or not.

Item 5. Updating the password conveniently: Discuss whether a user can conveniently update his password at the user's terminal or not.

Item 6. Session key agreement: Show whether a scheme can achieve the session key agreement or not.

Item 7. Avoiding the time synchronization problem: Exhibit whether a scheme can avoid the time synchronization problem or not.

The result of the comparisons on the performances is listed in Table 2.

6 The conclusions

We have proposed an exquisite mutual authentication scheme without a verification table of passwords and other users' personal information. The proposed scheme includes session key agreement and convenient password update. Our scheme uses the registration time to create the unique transformed identity in order to discriminate a user from the others efficiently, even if they may choose the same value for their identities. Through the storage of important information in the smart card, the proposed scheme can generate the necessary parameters without exposing the password in transmission. Our scheme can withstand the replay attacks and resist the man-in-the-middle attacks. Moreover, the security of our scheme relies on the intractability of the discrete logarithm because the Diffie-Hellman scheme is used.

References

[1] S.M. Bellovin, M. Merritt, (1993) Augmented encrypted key exchange: A password-based protocol secure against dictionary attacks and password file compromise, Proceedings of the First ACM Conference on Computer & Communications Security, pp.244-250.

[2] Y. Ding, P.
Horster, (1995) Undetectable on-line password guessing attacks, ACM Operating Syst. Rev., pp.77-86.

[3] D.V. Klein, (1990) Foiling the cracker: a survey of, and improvements to, password security, Proceedings of the Second USENIX UNIX Security Workshop, pp.5-14.

[4] R. Morris, K. Thompson, (1979) Password security: a case history, Communications of the ACM, 22(11), pp.594-597.

[5] V. Goyal, V. Kumar, M. Singh, A. Abraham, and S. Sanyal, (2006) A new protocol to counter online dictionary attacks, Computers & Security, 25, pp.114-120.

[6] H.Y. Chien, J.K. Jan, (2003) Robust and simple authentication protocol, Computer Journal, 46, pp.193-201.

[7] C.L. Lin, H.M. Sun, T. Hwang, (2001) Attacks and solutions on strong-password authentication, IEICE Trans. Commun., E84-B, No. 9, pp.2622-2627.

[8] S. Halevi, H. Krawczyk, (1998) Public-key cryptography and password protocols, Proceedings of the 5th ACM Conference on Computer and Communications Security, San Francisco, CA, pp.122-131.

[9] C.C. Chang, W.Y. Liao, (1994) A remote password authentication scheme based upon ElGamal's signature scheme, Computers & Security, Vol. 13, pp.137-144.

[10] C.C. Chang, L.H. Wu, (1990) A password authentication scheme based upon Rabin's public-key cryptosystem, Proceedings of the International Conference on Systems Management '90, Hong Kong, pp.425-429.

[11] M.S. Hwang, L.H. Li, (2000) A new remote user authentication scheme using smart card, IEEE Transactions on Consumer Electronics, 46(1), pp.28-30.

[12] W.S. Juang, (2004) Efficient password authenticated key agreement using smart card, Computers & Security, 23, pp.167-173.

[13] Y.C. Chen, L.Y. Yeh, (2005) An efficient nonce-based authentication scheme with key agreement, Applied Mathematics and Computation, 169, pp.982-994.

[14] M.S. Hwang, C.C. Lee, Y.L. Tang, (2002) A simple remote user authentication scheme, Mathematical and Computer Modelling, 36, pp.103-107.

[15] W. Diffie, M.
Hellman, (1976) New directions in cryptography, IEEE Trans. Inform. Theory, 22, pp. 476-492.
[16] C.T. Wang, C.C. Chang, C.H. Lin, (2004) Using IC cards to remotely login passwords without verification tables, Proceedings of the 18th International Conference on Advanced Information Networking and Applications (AINA), Fukuoka, Japan, pp. 321-326.

| Phase | Registration | Login and Authentication | Key Agreement | Password Update |
| Our scheme | 1Co 1Ha 2⊕ 1ME | 3Co 2Ha 9⊕ 1ME | Yes | Yes |
| Chien et al. [6] | 2Co 3Ha 1⊕ | 5Co 17Ha 10⊕ | No | No |
| Hwang and Li [11] | 1ME 2⊕ | 2Ha 5ME 2MM | No | No |
| Juang [12] | 1Ha 1⊕ | 1Co 4Ha 1⊕ 3En 3De | Yes | No |
| Hwang et al. [14] | 2Ha 2⊕ | 1Ha 2⊕ | No | Yes |

Co: concatenation; Ha: one-way hash function; ⊕: exclusive-or; ME: modular exponentiation; MM: modular multiplication; En: encryption; De: decryption

Table 1: Comparison on computational complexity

| Criterion item | No verification table | Using transformed ID | Choosing PW freely | Mutual authentication | PW update | Session key agreement | Avoiding synchronization |
| Our scheme | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Chien et al.'s [6] | No | No | Yes | Yes | No | No | Yes |
| Hwang and Li's [11] | Yes | No | No | No | No | No | No |
| Juang's [12] | Yes | No | Yes | Yes | No | Yes | Yes |
| Hwang et al.'s [14] | Yes | No | Yes | No | Yes | No | No |

Table 2: The result of the comparisons on performance among schemes

Routing Scalability in Multicore-Based Ad Hoc Networks

Ami Marowka
12 Anna Frank, Ramat-Gan, 52526, Israel
E-mail: amimar2@yahoo.com

Keywords: ad hoc networks, routing speedup, multicore, power management

Received: December 4, 2007

The integration of multicore processors into wireless mobile devices is creating new opportunities to enhance the speed and scalability of message routing in ad hoc networks. In this paper we study the impact of multicore technology on routing speed and node efficiency, and draw conclusions regarding the measures that should be taken to conserve energy and prolong the lifetime of a network.
We formally define three metrics and use them for performance evaluation: Time-to-Destination (T2D), Average Routing Speedup (ARS), and Average Node Efficiency (ANE). The T2D metric is the time a message takes to travel to its destination in a loaded traffic network. ARS measures the average routing speed gained by a multicore-based network over a single-core-based network, and ANE measures the average efficiency of a node, i.e., the number of active cores. These benchmarks show that routing speedup in networks with multicore nodes increases linearly with the number of cores and significantly decreases traffic bottlenecks, while allowing more routings to be executed simultaneously. The average node efficiency, however, decreases linearly with the number of cores per node. Power-aware protocols and energy management techniques should therefore be developed to turn off the unused cores.

Povzetek: An analysis of the connectivity of network nodes is presented.

1 Introduction

The recent emergence of affordable dual-core processors in consumer products will overturn many currently accepted standards for software applications (5). The transition from single-core to multicore CPUs calls for the parallelization of all applications. Developers will be faced with the challenge of designing multi-threaded applications that run efficiently on multiple cores. Dual-core processors are only the beginning. Chip makers are currently working on the next generation of multicore processors, which will contain 4, 8, or 16 cores on a single die. According to the roadmap introduced by Intel (15), dual-core processors are slated to reach mobile devices as well. The integration of multicore processors into wireless communication and mobile computation devices will strengthen the communication infrastructure in general. Ad hoc wireless networks of mobile devices will also become more robust.
An ad hoc network is a self-organized mobile network in which every node is responsible for both computation and communication operations (1; 11). In this paper we address the problem of how to measure the gain in routing speedup in ad hoc wireless networks whose nodes are equipped with multicore processors, in particular when the network is heavily loaded. Moreover, we analyze the efficiency of multicore nodes in the network and show that their energy consumption could be dramatically reduced by adapting existing methods and techniques. We use location-based routing protocols in our traffic simulations (12). Location-based protocols are usually compared and analyzed by means of the "hop count" metric; i.e., the best single-path routing protocol is the one that finds a path from the origin node to the destination node with the fewest hops. In real networks, however, many routing tasks are carried out at the same time. This is known in the literature as a multiple-sessions scenario, in which several routing sessions compete for a node's services. Each message should reach its destination node without waiting in a queue of routing sessions to be served. In such a scenario, we need a metric that quantifies the arrival time of messages at their destinations. This work therefore starts by defining a routing metric, called the Time-to-Destination (T2D), that quantifies the time required for a message to travel along the best path determined by the routing protocol. Then, two other metrics are defined based on the T2D: the Average Routing Speedup (ARS) and the Average Node Efficiency (ANE). The first quantifies the average improvement in message travel time for a multicore network compared to a single-core network. The second measures the average node efficiency in a multicore network, compared to the efficiency of a network with single-core nodes.
These metrics serve to analyze the behavior of routings in our simulations of heavily loaded networks. We derive conclusions regarding the benefits of message routing in multicore networks, and show how power-aware management protocols can reduce energy consumption. Moreover, we analyze the effect of mobility on the delivery success rate in multicore ad hoc networks and show that multicore nodes can improve the dependability of mobile networks. To the best of our knowledge, this is the first work in the open literature discussing routing protocols in ad hoc networks of multicore devices. The design principles behind the speedup and efficiency metrics presented in this work were inspired by the parallel computing literature. However, our evaluation method of measuring the real scalability of routing protocols on scalable networks is novel. In order to explore how protocols scale as the number of nodes and the average node degree increase, our simulator has the capability to generate extendable networks. In the open literature, scalability is evaluated by measuring the performance on different numbers of nodes, such as 10, 20, 30, and so on, where, for example, the 10-node network has a different topology from the 20-node network. Our method for generating planar random networks enables us to generate a 20-node network that is an extension of the 10-node network, and therefore to measure and evaluate real scalability. In Section 4 we describe this method in detail; to the best of our knowledge, we are the first to use it. Work-in-progress versions of this paper were presented in (9; 10). The remainder of the paper is organized as follows. Section 2 describes state-of-the-art work in the area of MIMO ad hoc networks. Section 3 defines the network model. Section 4 defines the metrics described above. Section 5 details the simulations and analyzes the resulting data. Section 6 summarizes our results.
2 State-of-the-Art of MIMO MANET

Multiple-input multiple-output (MIMO) wireless communication systems are the most promising multiple-antenna technology today (16; 17). The advantages of MIMO communication, which exploits the physical channel between many transmit and receive antennas, are currently receiving significant attention (18). The integration of an air-interface technology such as MIMO with a modulation scheme called orthogonal frequency division multiplexing (OFDM) (19) has the potential to lay the foundation for the data-rate and capacity gains that will be needed for years to come. Since multiple data streams are transmitted in parallel from different antennas, there is a linear increase in throughput with every pair of antennas added to the system. MIMO systems do not increase throughput simply by increasing bandwidth; they exploit the spatial dimension by increasing the number of unique spatial paths between the transmitter and receiver. MIMO-OFDM combines OFDM and MIMO techniques, thereby achieving spectral efficiency and increased throughput. A MIMO-OFDM system transmits independent OFDM-modulated data from multiple antennas simultaneously. At the receiver, after OFDM demodulation, MIMO decoding on each of the sub-channels extracts the data from all the transmit antennas on all the sub-channels. The IEEE 802.16e standard incorporates MIMO-OFDMA.

The MIMO capability of antenna arrays has been studied at the physical layer and over a single link. A few research directions have studied MIMO in a multi-hop network from the perspective of higher layers. The research in (20) proposed a scheduling algorithm to offer fair medium access in a network where nodes are equipped with MIMO antennas; the model under study provides a simple abstraction of the physical-layer properties of MIMO antennas. At the routing layer, a routing scheme to exploit MIMO gains is proposed in (21). The idea is to adaptively switch the transmission/reception strategy using MIMO so that the aggregate throughput at the routing layer is increased. At each hop along a route this decision is made dynamically based on network conditions such as node density and traffic load. At the transport layer, TCP performance over MIMO communications was studied in (22). Focusing on the two architectures previously proposed to exploit spatial multiplexing and diversity gains (namely BLAST and STBC), the authors studied how ARQ and packet-combining techniques impact overall TCP performance. Their results indicate that, from the standpoint of TCP performance, the enhanced reliability offered by the diversity gain is preferable to the higher capacities offered by spatial multiplexing.

3 A Multicore Network Model

A highly realistic network model would take into account many complexities, such as control-traffic overhead, traffic congestion, node mobility, the irregular shape of radio coverage areas, and the intermittence of communication due to weather conditions and interference from preexisting infrastructure (power lines, base stations, etc.). Including all these details in the network model, however, would make it extremely complicated and scenario-dependent, which would hamper the derivation of meaningful and sufficiently general analytical results. It has been shown that a simple parameterized model can accurately reflect the simulations (14). The model defined here, which is used in our simulations, therefore makes some widely accepted simplifying assumptions. We formally define a multicore network model as follows:

Definition 1: A multicore network model is an undirected graph G = (V, E, D, M), where:
1. V is the set of nodes;
2. E ⊆ V × V is the set of undirected edges, i.e., (u, v) ∈ E if u is able to transmit to v;
3. D is the average node degree;
4. M is the number of cores in a node;
5. Pi,j is the link probability of retransmission;
6.
For v ∈ V, the node mobility is determined every pause time pau(v) by its velocity vel(v) towards destination des(v).

A realistic multicore model must reflect the potential cost of the retransmissions required to recover from link errors. We use a linear transmission cost function, Ti,j = L / (A · (1 − Pi,j)), where a link is assumed to exist between the node pair (i, j) as long as node j lies within the transmission range of node i, A is the transmission rate, L is the message size, and Pi,j is the packet error probability associated with that link. This cost function captures the cumulative delay expended in reliable data transfer, for both reliable and unreliable link layers.

Node mobility is based on the parameterized random waypoint model (2): a node chooses a destination uniformly at random in the simulated region, chooses a velocity uniformly at random, and then moves to that destination at the chosen velocity. Upon arriving at the chosen waypoint, the node pauses for a period before repeating the same process. In this model, the pause time represents the degree of mobility in a simulation; a longer pause time amounts to more nodes being stationary for more of the simulation.

The nodes communicate using omnidirectional antennas with maximum range r. We assume that all the nodes are equipped with identical transceivers, and further that each M-core node is equipped with M transceivers and M antennas. Each core has multiple wireless channels and identical hardware and software mechanisms. A multicore node can hence transmit multiple packets at the same time to different nodes. The network is assumed to be homogeneous, consisting of nodes with the same transmission range, number of cores, and battery power. Imposing a common transmission range induces a strongly connected communication graph, often called a unit disk graph in the literature on routing protocols. We assume that the traffic in the network is highly loaded, and that routing sessions are issued in a random manner.
In other words, the origin and destination nodes of each routing are chosen randomly, and no a priori knowledge is available regarding future sessions. Each node maintains a queue, so that incoming routing sessions are served on a FIFO basis. For simplicity it is assumed that the queue is large enough to store all arriving messages, so no messages are lost due to overflow. If the network consists of multicore nodes with M cores, then each node can serve M routing requests simultaneously. The model assumes that different networks may have different ratios of computation time to communication time, and that communication and computation overlap. The communication time is the amount of time it takes for a packet transmitted by one node to be received by the next node on the path. The computation time is the amount of time it takes for a packet to be processed, from the time it is received by a node to the time it is transmitted to the next node. This interval includes computationally intensive functions such as signal processing, encoding and decoding, and encrypting and decrypting. It also includes relatively lightweight functions such as next-hop routing decisions and channel access delay. Routing requests in our model are managed by localized, location-based protocols (12). In such protocols the location of the destination node is known, and the distance to neighboring nodes can be estimated on the basis of incoming signal strengths. The routes between nodes are created through a series of localized hop decisions: each node decides which neighbor will receive the message based on its own location, the locations of its neighbors, and the message's destination. In our simulations we use the Most Forward within Radius (MFR) protocol (13), which forwards the message to the neighbor that maximizes its progress toward the destination.
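As an illustration of the two ingredients just described, the sketch below (our own code, not the paper's; all names are illustrative) computes the linear transmission cost of a link and the MFR forwarding choice:

```python
import math

def transmission_cost(msg_bits: float, rate_bps: float, p_err: float) -> float:
    """Expected time to move a message of L bits across one link at rate A
    with link error probability P: T = L / (A * (1 - P)).  The factor
    1 / (1 - P) is the expected number of (re)transmissions until success."""
    if not 0.0 <= p_err < 1.0:
        raise ValueError("link error probability must lie in [0, 1)")
    return msg_bits / (rate_bps * (1.0 - p_err))

def mfr_next_hop(current, neighbors, dest):
    """Most Forward within Radius, sketched: forward to the neighbor whose
    projection onto the line from the current node to the destination
    makes the most forward progress.  Nodes are (x, y) tuples."""
    dx, dy = dest[0] - current[0], dest[1] - current[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return None  # message already at its destination
    ux, uy = dx / norm, dy / norm  # unit vector toward the destination

    def progress(n):
        return (n[0] - current[0]) * ux + (n[1] - current[1]) * uy

    best = max(neighbors, key=progress, default=None)
    return best if best is not None and progress(best) > 0 else None

# A 1 Mb message on an error-free 2 Mbps link takes 0.5 s; a 25% error
# probability inflates that by 1/0.75.
assert transmission_cost(1e6, 2e6, 0.0) == 0.5
assert abs(transmission_cost(1e6, 2e6, 0.25) - 0.5 / 0.75) < 1e-12
# Of three neighbors, (1, 0) makes the most progress toward (3, 0).
assert mfr_next_hop((0, 0), [(1, 0), (0, 1), (-1, 0)], (3, 0)) == (1, 0)
```

Returning None when no neighbor makes forward progress mirrors the situation where a greedy location-based protocol gets stuck and must fall back on a recovery strategy.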
4 Routing Speedup and Efficiency

In measuring the scalability of parallel applications, the most commonly used practical metrics are relative speedup and relative efficiency (7; 8). Relative speedup is the ratio of a parallel application's run time on a single processor to its run time on N processors; it is important to emphasize that the application and its test problem are identical in both situations. Relative efficiency is the relative speedup divided by the number of processors N. Researchers use the speedup metric to check the performance of their applications on multiple platforms; it is a natural choice, since it is dimensionless and captures the relative benefit of solving a problem in parallel. Motivated by these definitions, we define similar metrics for measuring the scalability of routing in heavily loaded, multi-session ad hoc networks. First, we define a time-based routing metric called the Time-to-Destination (T2D). Second, we define the Routing Speedup (RS) and Average Routing Speedup (ARS) metrics. Finally, we define the Node Efficiency (NE) and Average Node Efficiency (ANE) metrics.

The metrics used in simulations of wireless ad hoc networks usually reflect the design goal of the network protocol. Most routing schemes thus use hop count as their metric, where hop count is the number of transmissions along a given route from source to destination. This choice agrees with the assumption that delay is proportional to hop count, which is reasonable when the impact of congestion is not significant. However, this assumption is not warranted for realistic ad hoc network scenarios in which routing sessions are issued simultaneously from many different origin nodes to random destinations. We therefore need a metric that reflects the delay caused by traffic congestion in a wireless network. We formally define the Time-to-Destination metric as follows: Definition 2: Time-to-Destination (T2D).
Assume a unit disk multicore graph G and a route R = v_s, ..., v_d from source node v_s to destination node v_d. The Time-to-Destination (T2D) is the aggregate time taken to route a message from the origin node to the destination node. The T2D metric is also known in the open literature as the end-to-end latency. We assume that different networks have different ratios of computation time to communication time, and that communication and computation overlap. Thus, for a routing with delay times d_s, ..., d_d at each hop, the total time it takes for a message to traverse the route is given by

T2D = Tcomm + Tcomp = Σ_{i=s}^{d} d_i,

where Tcomm and Tcomp are the total communication time and total computation time respectively, and L, the length of the route, is the number of hops from the source node to the destination node.

Definition 3: Routing Speedup (RS). Given a unit disk multicore graph G and a route R = v_s, ..., v_d from source node v_s to destination node v_d, the Routing Speedup (RS) of R is the ratio of the message routing time for a network with single-core nodes to the message routing time for a network with M-core nodes:

RS = T2D_1 / T2D_M,

where the subscripts 1 and M indicate the single-core and M-core networks respectively. Since it is more practical to measure the average speedup in a scenario with multiple sessions, we also define the Average Routing Speedup as follows:

Definition 4: Average Routing Speedup (ARS). The Average Routing Speedup (ARS) is the average speedup over R routings in G:

ARS = (1/R) Σ_{i=1}^{R} RS_i.

The ARS metric is a practical tool for evaluating the speed gained by moving from an ad hoc network with single-core nodes to one with multicore nodes. ARS does not, however, measure the efficiency of a multicore node. (By node efficiency, we mean the average number of cores that are busy per node.)
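Definitions 2-4 condense to a few lines of Python (a sketch of our own; the function names are illustrative, not from the paper):

```python
def t2d(delays):
    """Time-to-Destination (Definition 2): the sum of the per-hop delays
    d_s, ..., d_d along the route, each delay covering that hop's
    communication and computation time."""
    return sum(delays)

def routing_speedup(t2d_single, t2d_multi):
    """RS = T2D_1 / T2D_M (Definition 3)."""
    return t2d_single / t2d_multi

def average_routing_speedup(pairs):
    """ARS (Definition 4): the mean of RS over R routings, where `pairs`
    holds (T2D_1, T2D_M) for each routing."""
    speedups = [routing_speedup(s, m) for s, m in pairs]
    return sum(speedups) / len(speedups)

# A 5-hop route whose per-hop delay halves on multicore nodes is twice as fast.
assert routing_speedup(t2d([2.0] * 5), t2d([1.0] * 5)) == 2.0
assert average_routing_speedup([(10.0, 5.0), (9.0, 3.0)]) == 2.5
```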
As will be shown in the next section, knowing the node efficiency permits a dramatic reduction of power consumption in each node and thus increases the lifetime of the whole network.

Definition 5: Node Efficiency (NE). Assume a unit disk multicore graph G, and R routings executed in G over a period Δt. The Node Efficiency (NE) of a given node during Δt is the ratio of T1comp, its aggregate computation time in the case of a single-core network, to M times TMcomp, its aggregate computation time in the case of an M-core network:

NE = T1comp / (M · TMcomp).

For the same practical reasons mentioned with respect to the RS and ARS metrics, we define the Average Node Efficiency as follows:

Definition 6: Average Node Efficiency (ANE). The Average Node Efficiency (ANE) in G of |V| = n nodes during time Δt is:

ANE = (1/n) Σ_{i=1}^{n} NE_i.

It is important to notice that NE and ANE do not take idle times into account, only those times when the nodes are involved in the routing process. The aim of these metrics is to measure the efficiency of the cores within a node, rather than a given node's dominance over other nodes in the routing process. Although this information is important for intelligent power management and energy conservation, the efficiency metrics we define here focus on what happens inside a multicore node. In the next section we use these metrics to evaluate and analyze the implications of multicore nodes for position-based routing protocols and power management strategies in ad hoc networks.

Figure 1: Illustration of a scalable planar network of 27 nodes with c = 3.

5 Simulator and Simulations

A discrete event simulator was developed in order to monitor, observe, and measure ARS and ANE in multicore ad hoc networks. We generated a database with hundreds of random unit graphs, with values of V and D spanning a wide range of sparse and dense networks.
The results shown in this paper are only a representative sample of the many simulations performed, and each result is averaged over many runs. The network generator first partitions the plane into k regions: an innermost disk whose radius is equal to the transmission range r, and a series of concentric annuli of width r surrounding the disk. The number of nodes in each annulus is proportional to its area. If c nodes are randomly located in the inner disk of a network with k regions, then a total of c · k² nodes will be randomly placed in the entire network. For example, if the innermost region of a network has 3 nodes (c = 3), then 9, 15, and 21 nodes will be located in the first three rings respectively. Figure 1 illustrates a network of 27 nodes for c = 3. The networks are thus generated incrementally, ring after ring. A network with k regions is just an extension of the network with (k − 1) regions. In this way we can calculate the scalability of our protocols exactly. For small networks (up to 50 nodes) we chose c = 3, and for large networks c = 4. The networks were extensions of a base network of 27 nodes (for c = 3) or 36 nodes (for c = 4). The average node degree of the base network was preserved in the extended networks. Usually, the topology of a randomly generated scalable network is not symmetrical like the illustration shown in Figure 1. An example of a randomly generated 4-region network of 48 nodes and average node degree 4 is shown in Figure 2.

Figure 2: A randomly generated 4-region network of 48 nodes with an average node degree of 4.

All the simulations shown in this paper were carried out on a network of 108 nodes, with an average node degree of 7. Measurements were performed for three traffic loads: 100, 1K, and 10K routings. The source and destination nodes of each route were randomly chosen, and all routings were issued simultaneously.
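The incremental construction can be sketched as follows (our own code; function names are illustrative). Ring i of width r has (2i + 1) times the area of the inner disk, hence (2i + 1) · c nodes:

```python
import math, random

def ring_node_counts(c, k):
    """Nodes per region: c in the innermost disk of radius r, and
    (2i + 1) * c in the i-th annulus of width r, whose area is
    (2i + 1) times the disk's.  The k-region total is c * k**2."""
    return [c] + [(2 * i + 1) * c for i in range(1, k)]

def simulated_area(n_nodes, c, r):
    """A(N, R) = (k * R)^2 * pi = (N / c) * R^2 * pi, with N = c * k^2."""
    return (n_nodes / c) * r ** 2 * math.pi

def place_nodes(c, k, r, rng=random):
    """Scatter each region's nodes uniformly over its annulus; a k-region
    network simply extends the (k - 1)-region network by one further ring."""
    nodes = []
    for i, count in enumerate(ring_node_counts(c, k)):
        lo, hi = i * r, (i + 1) * r
        for _ in range(count):
            rho = math.sqrt(rng.uniform(lo ** 2, hi ** 2))  # uniform in area
            theta = rng.uniform(0.0, 2.0 * math.pi)
            nodes.append((rho * math.cos(theta), rho * math.sin(theta)))
    return nodes

# c = 3 reproduces the paper's example: rings of 3, 9, 15, 21 nodes,
# i.e. a 48-node 4-region network whose area is (4R)^2 * pi.
assert ring_node_counts(3, 4) == [3, 9, 15, 21]
assert len(place_nodes(3, 4, 250.0)) == 48
assert abs(simulated_area(48, 3, 250.0) - (4 * 250.0) ** 2 * math.pi) < 1e-6
```

Because each extension only appends a ring, the smaller network is literally a subset of the larger one, which is what makes the scalability measurements comparable across sizes.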
The scalability of routing was tested assuming values of 2, 4, 8, and 16 cores per node. Each simulation shown in this section (except Fig. 5) was performed assuming that the ratio of computation time to communication time is 1. The routing protocol used in our benchmarks is the Most Forward within Radius (MFR) protocol. For the simulations of our multicore network model we used parameter values that are the average values of IEEE 802.11-based interfaces as they appear in (4), summarized in Table 1. Each core is assumed to be able to transmit and receive at 2 Mbps. The packet size was 64 KB and the link probability of retransmission was set to 0.25. For the network mobility it was assumed that all nodes move according to the random waypoint model. First, a node chooses a destination uniformly at random in the simulated region. The area A(N, R) of a simulated region is determined relative to the number of nodes in the network (N = c · k²) and the data transmission range R. Thus, A(N, R) = (k · R)² · π = (N/c) · R² · π. For example, a scalable network of 48 nodes with c = 3 has 4 rings, and thus the area of the network region is (4R)²π. In the simulations we used a data transmission range of 250 m. Next, the node chooses a velocity uniformly at random, with a maximum velocity of 10 m/s, and moves to that destination at the chosen velocity. Upon arriving at the chosen waypoint, the node pauses for a period before repeating the same process. We simulate pause times in the range of 0-10 seconds.

Table 1: Parameter values used in the simulations.
| Communication | Packet size | 64 KB |
| | Retransmission probability | 0.25 |
| Mobility | Simulated area | (N/c) · R² · π |
| | Transmission range | 250 m |
| | Velocity | 0-10 m/s |
| | Pause time | 0-10 s |

Figures 3 and 4 depict the routing scalability that can be expected from a multicore-based network. Figure 3 plots ARS as a function of the number of routings for nodes with 2, 4, 8, and 16 cores.
Figure 4 plots ARS as a function of the number of cores per node for traffic loads of 100, 1K, and 10K routings. We choose to present the same information in two different ways to make all the relationships clearly visible. Analysis of these results leads to the following findings. First, ARS increases as the number of routings increases; this is obvious from Figure 3. For 10K routings the ARS increases linearly with the number of cores, up to values of 1.96, 3.82, 7.26, and 13.27 for 2, 4, 8, and 16 cores respectively. For 100 routings, however, the ARS reaches its maximum value of 1.35 at only 4 cores (Figure 4). Adding more cores does not increase the ARS unless there is more traffic to handle.

Figure 3: Average Routing Speedup as a function of the number of routings.

These encouraging results show that multicore nodes decrease the congestion of loaded networks, increase the routing speedup, and decrease the travel time of messages. Figure 5 plots the ARS as a function of the number of cores per node and the ratio of computation time to communication time. The goal of this paper is to study the impact of various multicore ad hoc networks on routing scalability, so we varied the ratio of computation time to communication time in the range 0.5 to 3. Delays in computation and communication can be incurred from many sources: next-hop routing decisions, channel access delays, transmission delays, traffic load, the sizes of contending packets, the medium access control algorithm used by the nodes, the modulation and symbol rate of the packets, and finally the distance that the packets must travel. Moreover, wireless communication usually requires several network-dependent, computationally intensive functions such as signal processing, encoding and decoding, and encrypting and decrypting. Figure 5 shows that as the ratio of computation time to communication time increases, the degree of speedup for large core numbers improves.
In the case of 16 cores the ARS increases by 34% as the computation time to communication time ratio is varied in the range 0.5 to 3. In the case of 8 cores the improvement is only 8%, and for 2 and 4 cores there is no improvement at all. However, in the case of 2, 4, and 8 cores the ARS is already high. This phenomenon also matches the results shown in Figure 8. The network reaches its load-balancing equilibrium point when the number of cores reaches 16.

Figure 4: Average Routing Speedup as a function of the number of cores.

At this point each node has enough available cores to amortize the increase in the ratio of computation time to communication time. Figures 6, 7, and 8 describe the node efficiency of multicore-based networks. Figure 6 graphs ANE as a function of the number of routings for the cases of 2, 4, 8, and 16 cores. Figure 7 graphs ANE as a function of the number of cores for the cases of 100, 1K, and 10K routings. Once again, we choose to present the same information in two different ways to clarify the relations. Figure 8 is a histogram showing the distribution of routing loads over all nodes for the case of 10K simultaneous, random routings. Analysis of these results reveals the following relationships. The ANE asymptotically increases with the number of routings (Figure 6). For example, in the case of an 8-core network the ANE approaches the values 0.18, 0.54, and 0.70 for 100, 1K, and 10K routings respectively. However, the ANE decreases dramatically as the number of cores increases (Figure 7). For example, in the case of 10K routings the ANE approaches the values 0.88, 0.78, 0.70, and 0.63 for 2, 4, 8, and 16 cores respectively. In other words, as the number of cores increases, more cores remain idle. Figure 8 presents another view of this phenomenon. This histogram shows how the routing load is distributed among the nodes. For example, in the case of a single-core network most nodes (80 of 108) were not busy at least 40% of the time (marked low in the histogram).
18 nodes were busy between 40% and 80% of the time (marked moderate), and 10 nodes were busy more than 80% of the time (marked high). These 10 nodes are the dominant set of the network, through which most of the traffic passes. However, as the number of cores increases, more nodes become busy more of the time.

a.) d(x, y) ≥ 0 and d(x, y) = 0 ⟺ x = y (non-negativity),
b.) d(x, y) = d(y, x) (symmetry),
c.) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

In an arbitrary metric space the distance is measured by values strictly smaller than ∞. By allowing ∞ as a distance we have in effect restricted to bounded metric spaces. While the modest generalization of allowing ∞ as a similarity value does not pose a serious restriction (databases are usually built from finite, and therefore bounded, sets of data), it makes the set [0, ∞], when ordered by the usual ≥ relation, a complete lattice as required in sets with similarity. Meet and join are computed as supremum and infimum, respectively. Note that we turned [0, ∞] upside down, so that the least element is ∞ and the greatest is 0. Hence the metric dA is again a special case of a measure of similarity, because dA(x, x) = 0, which is the greatest element of the complete lattice L[0,∞]. Thus the bounded metric space (A, dA) can be transformed into the set with similarity (A, L[0,∞], dA) and embedded into sets with similarities.

3 Tables, relations, and basic relational operations

A relational database is composed of several relations in the form of two-dimensional tables of rows and columns containing related tuples. The rows (tuples) are called records and the columns (fields in the record) are called attributes. Each attribute has a data type that defines the set of possible values. Thus a relation is a subset of a Cartesian product of sets (value domains).
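The metric-to-similarity passage above, with [0, ∞] turned upside down, can be made concrete in a few lines of Python (our own illustration; nothing here is from the paper):

```python
import math

# The complete lattice L_[0, inf]: the interval [0, inf] with the order
# reversed, so 0 (identical elements) is the TOP and inf is the BOTTOM.
# Meet (infimum in the reversed order) is therefore the numeric maximum,
# and join (supremum in the reversed order) is the numeric minimum.
def meet(values):
    return max(values)

def join(values):
    return min(values)

d = lambda x, y: abs(x - y)  # an ordinary metric on the reals

# d(x, x) = 0 is the greatest element of the lattice, so every element is
# maximally similar to itself, exactly as a measure of similarity requires.
assert d(5, 5) == 0
assert meet([1.0, 4.0, 2.0]) == 4.0   # the "worst" (largest) distance
assert join([1.0, math.inf]) == 1.0   # the "best" (smallest) distance
```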
3.1 Cartesian products and subsets

In order to use sets with similarity instead of (ordinary) sets, we need a suitable notion of relation between sets with similarity. Hence we first need to know how to interpret Cartesian products and subsets of sets with similarity in a natural and effective way.

Definition 4. The Cartesian product of sets with similarity A = (A, LA, pA) and B = (B, LB, pB) is the set with similarity A × B = (A × B, LA × LB, pA×B), where A × B is the Cartesian product of sets, LA × LB is the product of complete lattices, and the measure of similarity pA×B is given by

pA×B((x1, y1), (x2, y2)) = (pA(x1, x2), pB(y1, y2)).

The corresponding canonical projections are (π1, p1) : A × B → A and (π2, p2) : A × B → B, where π1 and π2 are projections of sets, while p1 and p2 are projections of complete lattices. This interpretation of Cartesian products of sets with similarity is sound, since a product of complete lattices is a complete lattice (14) and pA×B satisfies the condition of being a measure of similarity:

pA×B((x, y), (x, y)) = (pA(x, x), pB(y, y)) = (1A, 1B) = 1A×B,

where 1A is the greatest element of LA, 1B is the greatest element of LB, and 1A×B is the greatest element of the complete lattice LA × LB.

Further, we have decided to consider only those subobjects or substructures I of the set with similarity A = (A, LA, pA) whose similarity measure is induced by the structure of A. That is, the underlying set is a subset of A, but the measure of similarity and the corresponding lattice are inherited from A. Even though the domain of the measure of similarity has changed from A × A to I × I, we will keep the notation pA and write I = (I, LA, pA).

Definition 5. A subset of the set with similarity A = (A, LA, pA) is a set with similarity I = (I, LA, pA), where I ⊆ A. Subsets of sets with similarity will also be called induced subobjects.
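Definition 4 pairs up the component measures of similarity; a minimal sketch (our own toy measures, not the paper's):

```python
def product_similarity(p_a, p_b):
    """Similarity on a Cartesian product (Definition 4, sketched): the
    product measure returns the pair of component similarities, and the
    product lattice is ordered componentwise."""
    def p_axb(pair1, pair2):
        (x1, y1), (x2, y2) = pair1, pair2
        return (p_a(x1, x2), p_b(y1, y2))
    return p_axb

p_len = lambda s, t: abs(len(s) - len(t))  # toy distance-style measure on strings
p_num = lambda x, y: abs(x - y)            # toy distance-style measure on numbers
p = product_similarity(p_len, p_num)

# p_AxB((x, y), (x, y)) = (1_A, 1_B), the greatest element of the product
# lattice; with distances ordered in reverse that is the pair of zeros.
assert p(("ab", 3), ("ab", 3)) == (0, 0)
assert p(("ab", 3), ("abcd", 5)) == (2, 2)
```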
3.2 Relations and basic relational operations
The family of subsets of A, denoted by IndSub(A), is essentially just the power set P(A).
Theorem 1. The induced subobjects of a set with similarity A = (A, L_A, ρ_A) form a complete Boolean algebra equivalent to P(A), in which all the basic relational operations can be properly interpreted.
The formal proof is given in [7]. The Boolean lattice IndSub(A) is ordered by the usual subset relation, where the least element is the empty subobject ∅ = (∅, L_A, ρ_A) and the greatest element is A. Hence selection, union, and difference are calculated as usual (A1, A2 ⊆ A):
σ_F(A1, L_A, ρ_A) = (σ_F(A1), L_A, ρ_A),
(A1, L_A, ρ_A) ∪ (A2, L_A, ρ_A) = (A1 ∪ A2, L_A, ρ_A),
(A1, L_A, ρ_A) − (A2, L_A, ρ_A) = (A1 − A2, L_A, ρ_A).
Moreover, Cartesian products, projections, selections, unions, and differences of induced subobjects satisfy all the abstract properties that are axiomatized by relational calculus [15; 16]. A relation between two objects of the category of similarities, namely A = (A, L_A, ρ_A) and B = (B, L_B, ρ_B), is now determined by a subset R ⊆ A × B, which induces a subobject (R, L_A × L_B, ρ_{A×B}) of the Cartesian product A × B. Hence tables and answers to queries are modeled as induced subobjects.
4 Similarity of relations
Sets with similarity enjoy additional constructions which do not exist at the level of underlying sets. For instance, a suitable notion of similarity between induced subobjects can be defined. In the case of a reflexive set, which is equipped with a reflexive relation, the corresponding lattice is the complete lattice from definition 3. The measures of similarity are defined as follows:
• The measure of similarity ρ_Strings is the Damerau-Levenshtein distance [12], given as the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters.
Since the length of the strings is bounded by 30, the Damerau-Levenshtein distance is at least 0 (the greatest element of the lattice L_Strings) and at most 30 (the least element of the lattice L_Strings).
• The measure of similarity d_Cities calculates the similarity of two cities as their air distance, given as the Euclidean distance (in kilometres) between the Gauss-Krüger coordinates of the city centers (we have used the tool from http://www2.arnes.si/).
The measure of similarity corresponding to the Cartesian product USERS = NAME × NICK × CITY of the given sets with similarity is defined in accordance with definition 4:
ρ_Users((p1, n1, c1), (p2, n2, c2)) = (ρ_Strings(p1, p2), ρ_Strings(n1, n2), d_Cities(c1, c2)),
where (p1, n1, c1) and (p2, n2, c2) are two rows of the given table instance, i.e., elements of the Cartesian product of sets Users = Strings × Strings × Cities. For instance, the similarity between the first rows of the two tables 1 and 2 is equal to (5, 5, 18.3), but the similarity of the last two rows of table 2 is equal to (9, 5, 189.9). Clearly, (9, 5, 189.9) ≥ (5, 5, 18.3) componentwise, which means that the first pair of rows is more similar than the second one, i.e., the similarity value of the first pair is higher in the complete lattice L_Users = L_Strings × L_Strings × L_Cities.
Furthermore, the measure of similarity ρ between induced subobjects A and B of the Cartesian product USERS can be defined in accordance with theorem 2:
ρ(A, B) = (⋀_{x∈A} ⋁_{y∈B} ρ_Users(x, y)) ∧ (⋀_{y∈B} ⋁_{x∈A} ρ_Users(x, y)),
where all the meets and joins are computed in the complete lattice L_Users (cf. the Egli-Milner ordering and the Hausdorff metric). Hence the similarity between the given table instances is equal to (9, 4, 106.9). The similarity would certainly decrease if one of the tables were extended with users living far from Slovenia and/or using much longer names or nicks.
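The two-sided meet/join formula above can be sketched in Python. The row similarity used here (componentwise absolute differences on numeric tuples) is a simplified stand-in of our own for ρ_Users, not the paper's Damerau-Levenshtein and air-distance measures; under the reversed order, joins are componentwise minima and meets are componentwise maxima:

```python
# Sketch of the table-similarity formula
#   rho(A,B) = (MEET_{x in A} JOIN_{y in B} rho(x,y))  MEET
#              (MEET_{y in B} JOIN_{x in A} rho(x,y))
# over a toy row similarity (componentwise absolute differences).
def rho_row(r1, r2):
    return tuple(abs(a - b) for a, b in zip(r1, r2))

def join_rows(sims):
    # join = componentwise minimum (the "most similar"), order being reversed
    return tuple(min(c) for c in zip(*sims))

def meet_rows(sims):
    # meet = componentwise maximum (the "least similar")
    return tuple(max(c) for c in zip(*sims))

def table_similarity(A, B):
    forward = meet_rows([join_rows([rho_row(x, y) for y in B]) for x in A])
    backward = meet_rows([join_rows([rho_row(x, y) for x in A]) for y in B])
    return meet_rows([forward, backward])

A = [(1, 10), (2, 20)]   # two toy table instances, rows = numeric tuples
B = [(1, 12), (5, 20)]
print(table_similarity(A, B))  # -> (3, 2)
```

A table compared with itself yields the top element (all zeros), mirroring ρ(A, A) = 1.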
Note that since L_Users is not linearly ordered, there are also incomparable elements in the lattice. Clearly, if the similarity measures integrated within the sets with similarity were changed, the similarity value between the two table instances would also change and possibly have a different interpretation. Hence the usefulness of the calculated similarity values depends highly on the definitions of the basic similarity measures.
Example 2. Let table 3 contain a portion of the public-transport bus routes in Ljubljana (Slovenia). The relational schema of table 3, corresponding to relation BUSES, is [R : ROUTE, D : DEPARTURE, DT : DTIME, A : ARRIVAL, AT : ATIME], where the domains corresponding to the attributes R, D, DT, A, and AT are the following sets with similarity:
ROUTE = (Buses, L_2, σ_Buses),
DEPARTURE = (Stops, L_Stops, σ_Stops),
DTIME = (Time, L_Time, σ_Time),
ARRIVAL = (Stops, L_Stops, σ_Stops),
ATIME = (Time, L_Time, σ_Time).
Here, Buses and Stops are the sets of bus routes and bus stops in Ljubljana, respectively. Time is the set of all possible times of the form HH:MM, where HH denotes hours written as 00, 01, ..., 23 and MM denotes minutes written as 00, 01, ..., 59. The complete lattice L_Stops is the linearly ordered set {0, 1, ..., M, ∞} of the non-negative integers smaller than the number of all bus stops M, together with the infinity value ∞, under the order relation ≥. The complete lattice L_Time is the linearly ordered set Time with 23:59 being the least element and 00:00 being the greatest element. Moreover, the lattice L_2 is the lattice of Boolean values from definition 2. The measures of similarity are defined as follows:
• The measure of similarity σ_Buses is 1 if the given bus routes are equal and 0 if they are not.
• The measure of similarity σ_Stops calculates the similarity of the given bus stops as the minimum number of bus stops one needs to pass by bus to come from the first bus stop to the second.
If it is impossible to do this, the given similarity value is equal to ∞.

ROUTE | DEPARTURE | DTIME | ARRIVAL | ATIME | Exchangeability
9 (Štep. naselje-Trnovo) | 145 (Emona) | 09:58 | 026 (Konzorcij) | 10:25 | (1, 0, 23:58, 4, 00:25)
5 (Štep. naselje-Podutik) | 145 (Emona) | 10:10 | 025 (Hotel Lev) | 10:26 | (1, 0, 00:10, 6, 00:26)
13 (Sostro-Bežigrad) | 145 (Emona) | 10:11 | 059 (Bavarski dvor) | 10:24 | (1, 0, 00:11, 5, 00:24)
9 (Štep. naselje-Trnovo) | 145 (Emona) | 10:14 | 026 (Konzorcij) | 10:41 | (1, 0, 00:14, 4, 00:41)
6 (Črnuče-Dolgi most) | 058 (Bavarski dvor) | 10:29 | 034 (Hajdrihova) | 10:32 | (1, 6, 00:29, 0, 00:32)
1 (Vižmarje-Mestni log) | 024 (Kolizej) | 10:36 | 034 (Hajdrihova) | 10:49 | (1, 7, 00:36, 0, 00:49)
6 (Črnuče-Dolgi most) | 026 (Konzorcij) | 10:41 | 034 (Hajdrihova) | 10:46 | (1, 8, 00:41, 0, 00:46)
1 (Vižmarje-Mestni log) | 026 (Konzorcij) | 10:44 | 034 (Hajdrihova) | 10:49 | (1, 8, 00:44, 0, 00:49)
6 (Črnuče-Dolgi most) | 026 (Konzorcij) | 10:47 | 034 (Hajdrihova) | 10:54 | (1, 8, 00:47, 0, 00:54)
Table 3: Table of public-transport bus routes in Ljubljana. The last column contains data about the exchangeability of each row with an exact answer to the query given within example 2.

• The measure of similarity σ_Time calculates the similarity of two time moments as their difference (second minus first) in the form HH:MM.
Now consider the query "It is 10 o'clock and I am at the Emona bus stop. Are there any buses to Hajdrihova? I would like to arrive as soon as possible.", written in the language of relational algebra [4]:
σ_{D=Emona ∧ DT=10:00 ∧ A=Hajdrihova ∧ AT=10:00}(BUSES).
The last column of table 3 contains the calculated similarity or exchangeability values of the exact and the possibly relaxed answer (row). Notice that in table 3 there are no buses satisfying all the conditions given by the user, but there are several buses that could be interesting for the user, such as the buses described by the second or the third row.
These have a different destination, which is not a real handicap, since the user could take another bus to come to Hajdrihova, i.e., bus routes 1 and 6, respectively. However, if we would like to suggest a suitable bus or a sequence of buses, we just need to calculate a θ-join of the given relation with the requirement that the arrival bus stop of the first bus and the departure bus stop of the second bus are (basically) the same, i.e., there are no bus stops between them; perhaps one only needs to cross the street.
6 Conclusion
We have defined the mathematical structure of sets with similarity, which allows us to treat the features of richly-structured data, such as order, distance, and similarity, in a theoretically sound and uniform way. The proposed measures of similarity allow us to perform all types of similarity search. In addition, we now briefly discuss possible implementations of the resulting databases enriched with measures of similarity.
Clearly, the user should be able to query approximate or cooperative data from databases without being concerned about the internal structure of the data. Hence some default similarity measures should be integrated within the database. Still, the user should have the opportunity to modify the default notions of similarity if he/she is willing to do so. However, the question that arises is how to store the defined similarity measures. When the size of the data set A is small, the evident way to store a similarity measure ρ_A : A × A → L_A is in tabular form, i.e., as a relation ρ_A ⊆ A × A × L_A. This kind of representation quickly becomes inefficient since it requires space quadratic in the size of A. Fortunately, in most cases the similarity measure can be easily calculated, so there is no need to store it. There are two typical examples of similarity measures that can be computed rather than stored.
First, distance-like similarities are computed from auxiliary data, such as geographic location, duration, and various other features that only require a minimal amount of additional storage. Second, reflexive relations are often defined in terms of deduction rules; e.g., it may be known that the relation is symmetric or transitive. In such cases we only store the base cases in a database and deduce the rest from them. This is precisely the idea behind deductive database languages, such as Datalog.
References
[1] P. Buneman, A. Jung, A. Ohori (1991) Using Powerdomains to Generalize Relational Databases, Theoretical Computer Science 91/1, Elsevier, pp. 23-55.
[2] W. W. Chu, Q. Chen (1994) A Structured Approach for Cooperative Query Answering, IEEE Transactions on Knowledge and Data Engineering 6/5, IEEE Computer Society, pp. 738-749.
[3] W. W. Chu, H. Jung, K. Chiang, M. Minock, G. Chow, C. Larson (1996) CoBase: A Scalable and Extensible Cooperative Information System, Journal of Intelligent Information Systems 6/2-3, Springer US, pp. 223-259.
[4] E. F. Codd (1970) A Relational Model of Data for Large Shared Data Banks, Communications of the ACM 13/6, Association for Computing Machinery, pp. 377-387.
[5] T. Gaasterland, P. Godfrey, J. Minker (1992) An Overview of Cooperative Answering, Journal of Intelligent Information Systems 1/2, Springer US, pp. 123-157.
[6] T. Gaasterland, P. Godfrey, J. Minker (1992) Relaxation as a Platform for Cooperative Answering, Journal of Intelligent Information Systems 1/3-4, Springer US, pp. 293-321.
[7] M. Hajdinjak (2006) Knowledge Representation and Evaluation of Cooperative Spoken Dialogue Systems, Ph.D. thesis, Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia.
[8] M. Hajdinjak, F. Mihelič (2006) The PARADISE Evaluation Framework: Issues and Findings, Computational Linguistics 32/2, MIT Press, pp. 263-272.
[9] G. R. Hjaltason, H.
Samet (2003) Index-Driven Similarity Search in Metric Spaces, ACM Transactions on Database Systems 28/4, Association for Computing Machinery, pp. 517-580. [10] A. Motro (1988) VAGUE: A User Interface to Relational Databases that Permits Vague Queries, ACM Transactions on Office Information Systems 6/3, Association for Computing Machinery, pp. 187-214. [11] A. Motro (1990) FLEX: A Tolerant and Cooperative User Interface to Databases, IEEE Transactions on Knowledge and Data Engineering 2/2, IEEE Computer Society, pp. 231-246. [12] G. Navarro (2001) A guided tour to approximate string matching, ACM Computing Surveys 33/1, Association for Computing Machinery, pp. 31-88. [13] W. Ng (2001) An Extension of the Relational Data Model to Incorporate Ordered Domains, ACM Transactions on Database Systems 26/3, Association for Computing Machinery, pp. 344-383. [14] D. E. Rutherford (1965) Introduction to Lattice Theory, Oliver & Boyd, Edinburgh, London. [15] J. D. Ullman (1988) Principles of Database and Knowledge-Base Systems, Volume I, Computer Science Press Inc., Rockville, Maryland. [16] J. D. Ullman (1989) Principles of Database and Knowledge-Base Systems, Volume II: The New Technologies, Computer Science Press, Inc., Rockville, Maryland. 
Robust H∞ Control of a Doubly Fed Asynchronous Machine
Gherbi Sofiane
Department of Electrical Engineering, Faculty of Science of the Engineer, 20 August 1956 University, Skikda, Algeria
E-mail: sgherbi@gmail.com
Yahmedi Said
Department of Electronics, Faculty of Science of the Engineer, Badji Mokhtar University, Annaba, Algeria
E-mail: sais.yahmedi@carmail.com
Sedraoui Moussa
Department of Electronics, Faculty of Science of the Engineer, Constantine University, road of AIN EL BEY, Constantine, Algeria
E-mail: msedraoui@gmail.com
Keywords: doubly fed asynchronous machine, robust control, H∞ control, LMIs
Received: April 25, 2008
The doubly fed asynchronous machine is among the most used electrical machines due to its low cost and simplicity of construction and maintenance [1]. In this paper we present a method to synthesize a robust controller for the doubly fed asynchronous machine, which is the main component of the wind turbine system (actually the most used model [2]). Indeed, there are different challenges in the control of wind energy systems, and we have to take into account several parameters that perturb the system, such as the wind speed variation, the variation in electrical energy consumption, and the kind of power consumed (active or reactive), etc. The proposed method is based on the H∞ control problem with the linear matrix inequality (LMI) solution of Gahinet-Apkarian [3]; the results show the stability and performance robustness of the system in spite of the perturbations mentioned before.
Povzetek: Opisana je metoda upravljanja motorja vetrnih turbin.
1 Introduction
Among all the renewable energy electricity production systems, wind turbine systems are the most used, especially those based on the doubly fed asynchronous machine. The control of these systems is particularly difficult because of all the uncertainties introduced, such as the wind speed variations, the electrical energy consumption variations, and the system parameter variations. In this paper we focus on the robust control (H∞ controller design method) of the doubly fed asynchronous machine, which is the most used in wind turbine systems due to its low cost and simplicity of construction and maintenance [1].
This paper is organised as follows. Section 2 presents the wind turbine system equipped with the doubly fed asynchronous machine, and then the mathematical electrical equations from which the system is modelled (in state space form) are given. Section 3 presents the H∞ robust controller design method with the LMI solution used to control our system. Section 4 presents a numerical application, with results in both the frequency and time domains. Finally, a conclusion is given in Section 5.
2 System presentation and modelling
Figure 1 represents the wind turbine system: the wind drives the doubly fed asynchronous machine, which exchanges active and reactive power with the electrical network feeding the electrical energy consumption.
Figure 1: The wind turbine system.
The system uses wind power to drive the doubly fed asynchronous machine, which acts as a generator. The output power produced must have the same high quality when it enters the electrical network, i.e., 220 volts amplitude and 50 Hz frequency, with harmonics held to a low level, in spite of wind speed changes and electrical energy consumption in active or reactive power form.
References [4], [5], [6] describe detailed models of wind turbines for simulations; we use the model equipped with doubly fed induction generators (asynchronous machines) (for more details see [7]). The system electrical equations are given in the (d, q) frame orientation. The stator voltage differential equations are:
Vds = Rs·Ids + dΦds/dt − ws·Φqs    (1)
Vqs = Rs·Iqs + dΦqs/dt + ws·Φds    (2)
The rotor voltage differential equations are:
Vdr = Rr·Idr + dΦdr/dt − wr·Φqr    (3)
Vqr = Rr·Iqr + dΦqr/dt + wr·Φdr    (4)
The stator flux vector equations are:
Φds = Ls·Ids + M·Idr    (5)
Φqs = Ls·Iqs + M·Iqr    (6)
The rotor flux vector equations are:
Φdr = Lr·Idr + M·Ids    (7)
Φqr = Lr·Iqr + M·Iqs    (8)
The electromagnetic torque equation is:
Cem = p·(M/Ls)·(Φds·Iqr − Φqs·Idr)    (9)
The mechanical equation is:
Cem = Cr + J·dΩ/dt + f·Ω    (10)
With:
Vds, Vqs: statoric voltage vector components in the d and q axes respectively.
Vdr, Vqr: rotoric voltage vector components in the d and q axes respectively.
Ids, Iqs: statoric current vector components in the d and q axes respectively.
Idr, Iqr: rotoric current vector components in the d and q axes respectively.
Φds, Φqs: statoric flux vector components in the d and q axes respectively.
Φdr, Φqr: rotoric flux vector components in the d and q axes respectively.
Rs, Rr: stator and rotor resistances (of one phase) respectively.
Ls, Lr: stator and rotor cyclic inductances respectively.
ws, wr: statoric and rotoric current pulsations respectively.
M: cyclic mutual inductance.
p: number of pole pairs of the machine.
Cr: resistant torque.
f: viscous friction coefficient.
J: moment of inertia.
Ω: mechanical speed.
2.1 State space model
In order to apply the robust controller design method, we have to put the system model in state space form; we consider the rotoric voltages Vdr, Vqr as the inputs and the statoric voltages Vds, Vqs as the outputs, i.e.
we have to design a controller that acts on the rotoric voltages to keep the output statoric voltages at 220 volts and 50 Hz frequency in spite of the electric network perturbations (demand variations, etc.) and the wind speed variations (see figure 2).
Figure 2: A doubly fed wind turbine system control configuration.
Here u, y and e are the rotoric voltage vector (control vector), the statoric output voltage vector and the error signal between the input reference and the system output, respectively. K and G are the controller and the wind turbine system, respectively. R is the statoric voltage reference vector, and the perturbations are the electric energy demand variations, wind speed variations, etc.
Let us consider x = [Ids Iqs Φdr Φqr]^T as the state vector and u = [Vdr Vqr]^T as the command vector. The stator flux vector is oriented along the d axis of Park's reference frame, so Φqs = 0 and Φds is constant in the steady state.
We use the following doubly fed asynchronous machine parameters:
Rs = 5 Ω; Rr = 1.0113 Ω; M = 0.1346 H; Ls = 0.3409 H; Lr = 0.605 H; wr = 146.6 rad/s; ws = 2π·50 rad/s.
Let w = ws − wr and σ = 1 − M²/(Ls·Lr).
The state space model (11) can be obtained by combining equations (1) to (8) as follows:
ẋ = A·x + B·u
y = C·x + D·u    (11)
where A, B, C and D are constant matrices built from the machine parameters Rs, Rr, Ls, Lr, M, σ and the pulsations w and wr.
3 The H∞ controller design method
It is necessary to recall the basics of a control loop (figure 3), with G the perturbed system.
Figure 3: The control loop with the output multiplicative uncertainties.
The multiplicative uncertainties at the process output, which include all the perturbations that act on the system, are then Δm = G⁻¹·(G′ − G), with G′ = G(I + Δm) the perturbed system. Figure 4 shows the singular values of Δm in the frequency domain; we can see that the uncertainties are smaller at low frequencies and grow at medium and high frequencies, which means a strong perturbation at high frequencies (the transient phase). We also note a peak at w = 260 rad/s; this is due to the fact that the system is highly coupled at this pulsation. We can bound the system uncertainties by the following weighting matrix function:
Wt(jw) = diag( 0.55(0.02jw + 1)/(1 + 0.0001jw), 0.55(0.02jw + 1)/(1 + 0.0001jw) )    (12)
Informatica 32 (2008) 143-150
Figure 5 shows that the singular values of Wt(jw) bound the maximum singular values of the uncertainties in the entire frequency domain. The robust stability condition [11] is then:
σ̄[T(jw)·Wt(jw)] < 1    (13)
or:
σ̄[T(jw)] < 1/σ̄[Wt(jw)]    (14)
where σ̄ is the maximum singular value and T(jw) is the nominal closed loop transfer matrix defined by:
T(jw) = G(jw)·K(jw)·[I + G(jw)·K(jw)]⁻¹    (15)
Condition (13) allows us to guarantee stability robustness. On the other hand, we must also guarantee satisfactory performances (no overshoot, time response, etc.) in the closed loop (performance robustness); this can be done through the performance robustness condition [8]:
σ̄[S(jw)·Wp(jw)] < 1    (16)
or:
σ̄[S(jw)] < 1/σ̄[Wp(jw)]    (17)
where S(jw) is the sensitivity matrix given by:
S(jw) = [I + G(jw)·K(jw)]⁻¹    (18)
Wp(jw) is a weighting matrix function designed to meet the desired performance specifications in the frequency domain; we choose the following matrix function:
Wp(jw) = diag( (0.005jw + 1)/(0.05jw), (0.005jw + 1)/(0.05jw) )    (19)
Figure 6 represents the singular values of Wp(jw) in the frequency domain; one notices that the specifications on the performances
are bigger at low frequencies (integrator frequency behaviour), and this guarantees no static error. The standard problem of H∞ control theory is then:
min_{K stabilising} ‖ [ T(jw)·Wt(jw) ; S(jw)·Wp(jw) ] ‖_∞    (20)
i.e., to find a stabilising controller K that minimises the norm (20), where ‖·‖_∞ is the H-infinity norm.
4 Application
The minimisation problem (20) is solved by using two Riccati equations [9] or with the linear matrix inequality approach. For our system, we use the linear matrix inequality solution (for more details see [10]). The solution (controller) can be obtained via the Matlab instruction hinflmi, available in the LMI Toolbox of Matlab® (The MathWorks, Inc.) [11]. Figures 7 and 8 show the satisfaction of the stability and performance robustness conditions (14) and (17). Figure 9 shows the step responses of the closed loop controlled nominal system for step references on Vds_ref and Vqs_ref, respectively; the outputs Vds and Vqs follow the references with a good time response and no overshoot.
Figure 4: The system uncertainties' maximum singular values.
Figure 5: Maximum singular values of the system uncertainties Δm bounded by the singular values of Wt(jw).
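For readers who want to probe the weighting choices numerically, the scalar diagonal entries of Wt and Wp from (12) and (19) can be evaluated directly (a sketch of ours, not the authors' code; the sampled frequencies are arbitrary). Conditions (14) and (17) then give frequency-dependent upper bounds for σ̄[T] and σ̄[S]:

```python
# Scalar diagonal entries of the weighting functions Wt (12) and Wp (19).
def Wt(w):
    jw = 1j * w
    return 0.55 * (0.02 * jw + 1) / (1 + 0.0001 * jw)

def Wp(w):
    jw = 1j * w
    return (0.005 * jw + 1) / (0.05 * jw)

# Conditions (14) and (17) bound the closed loop at every frequency w:
#   sigma_max[T(jw)] < 1/|Wt(jw)|  and  sigma_max[S(jw)] < 1/|Wp(jw)|.
for w in (0.01, 1.0, 260.0, 1e4):
    print(f"w = {w:8g} rad/s: T bound {1/abs(Wt(w)):10.4f}, "
          f"S bound {1/abs(Wp(w)):10.4f}")

# Wp contains an integrator, so |Wp| blows up as w -> 0, forcing |S| -> 0
# there (hence no static error); Wt grows at high frequency, rolling T off.
assert abs(Wp(0.01)) > abs(Wp(100.0))
assert abs(Wt(1e4)) > abs(Wt(1.0))
```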
Figure 6: Singular values of the weighting performance specification Wp(jw).
Figure 7: Stability robustness condition.
Figure 8: Performance robustness condition.
Figure 9: Step response of the controlled closed loop nominal system.
5 Conclusion
In this paper we deal with the control problem of a wind turbine equipped with a doubly fed asynchronous machine subject to various perturbations and system uncertainties (wind speed variations, electrical energy consumption, system parameter variations, etc.). We show that the H∞ controller design method can be successfully applied to this kind of system, keeping stability and good performances in spite of the perturbations and system uncertainties.
References
[1] G. L. Johnson (2006), 'Wind Energy Systems: Electronic Edition', Manhattan, KS, October 10.
[2] 'AWEA Electrical Guide to Utility Scale Wind Turbines' (2005), The American Wind Energy Association, available at http://www.awea.org.
[3] P. Gahinet, P. Apkarian (1994), 'A Linear Matrix Inequality Approach to H∞ Control', Int. J. of Robust & Nonlinear Control, vol. 4, pp. 421-448.
[4] J. Soens, J. Driesen, R. Belmans (2005), 'Equivalent Transfer Function for a Variable-speed Wind Turbine in Power System Dynamic Simulations', International Journal of Distributed Energy Resources, Vol. 1, No. 2, pp. 111-133.
[5] 'Dynamic Modelling of Doubly-Fed Induction Machine Wind-Generators' (2003), DIgSILENT GmbH Technical Documentation, available at http://www.digsilent.de.
[6] J. Soens, J. Driesen, R.
Belmans (2004), 'Wind Turbine Modelling Approaches for Dynamic Power System Simulations', IEEE Young Researchers Symposium in Electrical Power Engineering - Intelligent Energy Conversion, (CD-ROM), Delft, The Netherlands.
[7] J. Soens, V. Van Thong, J. Driesen, R. Belmans (2003), 'Modelling Wind Turbine Generators for Power System Simulations', European Wind Energy Conference EWEC.
[8] S. Skogestad, I. Postlethwaite (1996), 'Multivariable Feedback Control: Analysis and Design', John Wiley and Sons, pp. 72-75.
[9] J. C. Doyle, K. Glover, P. P. Khargonekar, B. A. Francis (1989), 'State-Space Solutions to Standard H2 and H∞ Control Problems', IEEE Transactions on Automatic Control, Vol. 34, No. 8.
[10] D.-W. Gu, P. Hr. Petkov, M. M. Konstantinov (2005), 'Robust Control Design with MATLAB®', Springer-Verlag London Limited, pp. 27-29.
[11] P. Gahinet, A. Nemirovski, A. J. Laub, M. Chilali (1995), 'LMI Control Toolbox for Use with MATLAB®', User's Guide Version 1, The MathWorks, Inc.
Balancing Load in a Computational Grid Applying Adaptive, Intelligent Colonies of Ants
Mohsen Amini Salehi
Department of Software Engineering, Faculty of Engineering, Islamic Azad University, Mashhad Branch, Iran
E-mail: Amini@mshdiau.ac.ir
Hossein Deldari
Department of Software Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Iran
E-mail: hd@um.ac.ir
Bahare Mokarram Dorri
Management and Planning Organisation of Khorasan, Mashhad, Iran
E-mail: mokarram@mpo-kh.ir
Keywords: grid computing, load balancing, ant colony, Agent-based Resource Management System (ARMS)
Received: July 1, 2007
Load balancing is substantial when developing parallel and distributed computing applications. The emergence of computational grids extends the necessity of this problem. Ant colony is a meta-heuristic method that can be instrumental for grid load balancing. This paper presents an ecosystem of adaptive fuzzy ants.
The ants in this environment can create new ones and may also commit suicide depending on existing conditions. A new concept called Ant level load balancing is presented here for improving the performance of the mechanism. A performance evaluation model is also derived. Then theoretical analyses, which are supported by experimental results, prove that this new mechanism surpasses its predecessor.
Povzetek: Metoda inteligentnih mravelj je uporabljena na problemu razporejanju bremen.
1 Introduction
A computational grid is a hardware and software infrastructure which provides consistent, pervasive and inexpensive access to high end computational capacity. An ideal grid environment should provide access to all the available resources seamlessly and fairly.
The resource manager is an important infrastructural component of a grid computing environment. Its overall aim is to efficiently schedule applications needing utilization of available resources in the grid environment. A grid resource manager provides a mechanism for grid applications to discover and utilize resources in the grid environment. Resource discovery and advertisement offer complementary functions. The discovery is initiated by a grid application to find suitable resources within the grid. Advertisement is initiated by a resource in search of a suitable application that can utilize it. A matchmaker is a grid middleware component which tries to match applications and resources. A matchmaker may be implemented in centralized or distributed ways. As the grid is inherently dynamic and has no boundary [1], the distributed approaches usually show better results [2] and are also more scalable. A good matchmaker (broker) should uniformly distribute the requests along the grid resources with the aid of load balancing methods.
As mentioned in [1], the grid is a highly dynamic environment for which there is no unique administration. Therefore, the grid middleware should compensate for the lack of unique administration.
ARMS is an agent-based resource manager infrastructure for the grid [3, 4]. In ARMS, each agent can act simultaneously as a resource questioner, resource provider, and matchmaker. Details of the design and implementation of ARMS can be found in [2]. In this work, we use ARMS as the experimental platform.
Cosy is a job scheduler which supports job scheduling as well as advanced reservations [5]. It is integrated into ARMS agents to perform global grid management [5]; Cosy needs a load balancer to better utilize available resources. This load balancer is introduced in Section 3.
The rest of the paper is organized as follows: Section 2 introduces the load balancing approaches for grid resource management. In Section 3, ant colony optimization and self-organizing mechanisms for load balancing are discussed. Section 4 describes the proposed mechanism. Performance metrics and simulation results are included in Section 5. Finally, the conclusion of the article is presented as well as future work related to this research.
2 Load balancing
Load balancing algorithms are essentially designed to spread the resources' load equally, thus maximizing their utilization while minimizing the total task execution time [7]. This is crucial in a computational grid where the most pressing issue is to fairly assign jobs to resources. Thus, the difference between the heaviest and the lightest resource load is minimized. A flexible load sharing algorithm is required to be general, adaptable, stable, scalable, fault tolerant, transparent to the application, and to induce minimum overhead to the system [8]. The properties listed above are interdependent. For example, a lengthy delay in processing and communication can affect the algorithm overhead significantly, result in instability and indicate that the algorithm is not scalable.
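As a toy illustration of the balancing objective stated above (minimizing the gap between the heaviest and the lightest resource load), consider the following Python sketch; the node names and load values are invented for the example:

```python
# Hypothetical node loads (made-up values for illustration only).
loads = {"n1": 90, "n2": 10, "n3": 50}

def imbalance(loads):
    # The quantity a load balancer tries to minimize: the difference
    # between the heaviest and the lightest resource load.
    return max(loads.values()) - min(loads.values())

def balance_pair(loads):
    # Move load from the heaviest to the lightest node, splitting it evenly
    # (integer split; the heavier node keeps the odd remainder).
    hi = max(loads, key=loads.get)
    lo = min(loads, key=loads.get)
    total = loads[hi] + loads[lo]
    loads[hi], loads[lo] = total - total // 2, total // 2
    return loads

assert imbalance(loads) == 80
balance_pair(loads)          # n1 and n2 are averaged to 50 each
assert imbalance(loads) == 0
```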
The load balancing process can be defined by three rules: the location, distribution and selection rules [7]. The location rule determines which resource domain will be included in the balancing operation. The domain may be local, i.e. inside the node, or global, i.e. between different nodes. The distribution rule establishes the redistribution of the workload among available resources in the domain, while the selection rule decides whether the load balancing operation can be performed preemptively or not [7].
2.1 Classification of load balancing mechanisms
In general, load balancing mechanisms can be broadly categorized as centralized or decentralized, dynamic or static [10], and periodic or non-periodic [11]. In a centralized algorithm, there is a central scheduler which gathers all load information from the nodes and makes appropriate decisions. However, this approach is not scalable for a vast environment like the grid. In decentralized models, there is usually no specific node known as a server or collector. Instead, all nodes have information about some or all other nodes. This leads to a huge overhead in communication. Furthermore, this information is not very reliable because of the drastic load variation in the grid and the need to update frequently. Static algorithms are not affected by the system state, as their behaviour is predetermined. On the other hand, dynamic algorithms make decisions according to the system state. The state refers to certain types of information, such as the number of jobs waiting in the ready queue, the current job arrival rate, etc. [12]. Dynamic algorithms tend to have better performance than static ones [13]. Some dynamic load balancing algorithms are adaptive; in other words, their policies are modifiable as the system state changes. Via this approach, methods adjust their activities based on system feedback [13].
3 Related works

Swarm intelligence [14] is inspired by the behaviour of insects such as wasps, ants or honey bees. Ants, for example, have little intelligence relative to their hostile and dynamic environment [15]. Nevertheless, they perform remarkable activities, such as organizing their dead in cemeteries and foraging for food. In fact, ants communicate indirectly through the chemical substances they deposit [16]. This ability of ants has been applied to several heuristic problems, such as optimal routing in telecommunication networks [15], coordinating robots, sorting [17], and especially load balancing [6, 9, 18, 19]. Messor [20] is the main contribution in the load balancing context.

3.1 Messor

Messor is a grid computing system implemented on top of the Anthill framework [18]. Ants in this system can be in the Search-Max or the Search-Min state. In the Search-Max state, an ant wanders around randomly until it finds an overloaded node. The ant then switches to the Search-Min state to find an underloaded node. After these states, the ant balances the overloaded and the underloaded node that it found. Once an ant encounters a node, it retains information about the nodes visited. Other ants which visit this node can use this information to perform more efficiently. However, given the dynamism of the grid, this information does not stay reliable for long and may even cause erroneous decision-making by other ants.

3.2 Self-organizing agents for grid load balancing

In [6], J. Cao et al. propose a self-organizing load balancing mechanism using ants in ARMS. As this mechanism is simple but inefficient, we call it the "seminal approach". The main purpose of this study is the optimization of this seminal mechanism. Thus, we propose a modified mechanism based on a swarm of intelligent ants that uniformly balances the load throughout the grid.
In this mechanism, an ant always wanders '2m + 1' steps to finally balance a pair of nodes, one overloaded and one underloaded. As stated in [6], the efficiency of the mechanism depends strongly on the number of cooperating ants (n) as well as on their step count (m). If a loop includes only a few steps, the ant will initiate the load balancing process frequently, while if the ant starts with a larger m, the frequency of load balancing decreases. This implies that the ant's step count should be determined according to the system load. However, with this method, the number of ants and the number of their steps are defined by the user and do not change during the load balancing process. In fact, having the user define the number of ants and their wandering steps is impractical in an environment such as the grid, where users have no background knowledge and the ultimate goal is to offer a transparent, powerful computing service to end users. Considering the above shortcomings, we propose a new mechanism that adapts to environmental conditions and produces better results. The next section describes the proposed method.

4 Proposed method

In the new mechanism, we propose an echo system of intelligent ants which react proportionally to their conditions. Interactions between these intelligent, autonomous ants result in load balancing throughout the grid. In this echo system, ants are created on demand and achieve load balancing during their adaptive lives. They may bear offspring when they sense that the system is drastically unbalanced, and commit suicide when they detect equilibrium in the environment. These ants examine every node visited during their steps and record node specifications for future decision making. Moreover, every ant in the new mechanism hops 'm' steps (the value of 'm' is determined adaptively) instead of '2m + 1'.
At the end of the 'm' steps, 'k' overloaded nodes are equalized with 'k' underloaded nodes, in contrast to one overloaded node with one underloaded node in the previous method. This results in earlier convergence with fewer ants and less communication overhead. In the next sections, the proposed method is described in more detail.

4.1 Creating ants

If a node realizes that it is overloaded, it can create a new ant taking only a few steps, to balance the load as quickly as possible. Actually, as noted in [2], neighbouring agents in ARMS exchange their load status periodically. If a node's load exceeds the average of its neighbours for several periods of time, and the node has not been visited by any ant during this time, then the node itself creates a new ant to balance its load over a wider area. An agent can estimate load in several ways to determine whether a node is overloaded. For the sake of comparison with similar methods, the number of jobs waiting in a node is taken as the criterion for load measurement.

4.2 Decision-making

Every ant is allocated a memory space in which it records specifications of the environment while it wanders. The memory space is divided into an underloaded list (Min List) and an overloaded list (Max List). In the former, the ant saves the specifications of the underloaded nodes visited; in the latter, those of the overloaded nodes visited. At every step, the ant randomly selects one of the current node's neighbours.

4.2.1 Deciding algorithm

After entering a node, the ant first checks its memory to determine whether it has already visited this node. If not, the ant can assess the condition of the node, i.e. overloaded, underloaded or in equilibrium, using the knowledge it has acquired from the environment.
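A minimal sketch of the ant's per-step bookkeeping as we read it (the data layout and names are our own, and the fuzzy classification described next is stubbed here with plain thresholds, 1.2 and 0.8, which are illustrative assumptions):

```python
import random

class Ant:
    def __init__(self, steps):
        self.remaining_steps = steps
        self.visited = set()  # nodes already seen by this ant
        self.min_list = {}    # node -> load, underloaded nodes
        self.max_list = {}    # node -> load, overloaded nodes
        self.load_sum = 0     # for the average load the ant keeps
        self.load_count = 0

    def classify(self, load):
        """Stub for the fuzzy decision: compare against the average
        load this ant has observed so far (thresholds are made up)."""
        avg = self.load_sum / self.load_count
        if load > 1.2 * avg:
            return "Max"
        if load < 0.8 * avg:
            return "Min"
        return "Avg"

    def step(self, node, load, neighbours):
        """Visit a node, record it if unseen, then hop to a random neighbour."""
        if node not in self.visited:
            self.visited.add(node)
            self.load_sum += load
            self.load_count += 1
            state = self.classify(load)
            if state == "Max":
                self.max_list[node] = load
            elif state == "Min":
                self.min_list[node] = load
        self.remaining_steps -= 1
        return random.choice(neighbours)
```

After `remaining_steps` reaches zero, the recorded Max and Min lists are what the final k-to-k balancing step operates on.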
As the load quantity of a node is a linguistic variable and the state of the node is determined relative to system conditions, decision making is performed adaptively by applying fuzzy logic [21, 22]. To make a decision, the ant feeds the node's current workload and the remaining steps as two inputs into the fuzzy inference system. The ant then determines the state of the node, i.e. Max, Avg or Min. The total average of the loads visited is kept as the ant's internal knowledge about the environment; the ant uses it to build the membership functions of the node's workload, as shown in Figure 1.a. The membership functions of Remain steps, and of Decide as the output, are based on a threshold and are presented in Figures 1.b and 1.c.

Figure 1: Membership functions of fuzzy sets: a) the node's load, b) remaining steps, c) decide.

The inference system can be expressed as the following relation:

R : Load<L, ML, MH, H> * RmStep<F, A, V> -> Decide<Min, Avg, Max>   (1)

where L, ML, MH, H in Figure 1.a indicate Low, Medium Low, Medium High and High, respectively, and F, A, V in Figure 1.b indicate Few, Average and Very. Thus, the ant can make a proper decision. If the result is "Max" or "Min", the node's specifications are added to the ant's Max-list or Min-list, respectively. Subsequently, the corresponding counter for Max, Min or Avg is increased by one. These counters also represent the ant's knowledge about the environment; how this knowledge is employed is explained in the next sections.

4.2.2 Ant level load balancing

In the subtle behaviour of ants and their interactions, we can see that when two ants face each other, they stop for a moment and touch antennae, probably to recognize members of their own colony. This is what inspired the first use of ant level load balancing. With respect to the system structure, it is probable that two or more ants meet each other on the same node.
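Returning to the classification step of Section 4.2.1, the Load * RmStep -> Decide inference can be sketched with ordinary triangular membership functions. The shapes, break-points and rule weighting below are illustrative guesses of ours, not the paper's exact Figure 1:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def decide(load, rm_step, total_avg, max_load, threshold):
    """Fuzzy-style decision: grade the node's load against the ant's
    observed average, weight by how many steps remain, and pick the
    strongest conclusion (Min / Avg / Max)."""
    lo = tri(load, -total_avg, 0, total_avg)            # low load
    mid = tri(load, 0, total_avg, max_load)             # medium load
    hi = tri(load, total_avg, max_load, 2 * max_load)   # high load
    few = max(0.0, 1 - rm_step / threshold)             # few steps remain
    # Toy rule base: high load -> Max, low load -> Min, otherwise Avg;
    # few remaining steps reinforce committing to Max/Min now.
    scores = {"Min": lo * (1 + few), "Avg": mid, "Max": hi * (1 + few)}
    return max(scores, key=scores.get)
```

For instance, with an observed average of 50, a maximum of 100 and a step threshold of 10, a node carrying load 90 with one step left is classified "Max", while a node carrying load 10 is classified "Min".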
As mentioned earlier, each of these ants may have gathered the specifications of some overloaded and underloaded nodes. The amount of information is not necessarily the same for each ant: for example, one ant holds the specifications of four overloaded and two underloaded nodes, while the other holds two overloaded and six underloaded nodes at the same position. In this situation, the ants extend their knowledge by exchanging it. We call this "ant level load balancing". In the aforementioned example, after ant level load balancing between the two co-positioned ants, each ant holds the specifications of three overloaded and four underloaded nodes in its memory. This leads to better performance in the last step, when the ants want to balance the load of 'k' overloaded with 'k' underloaded nodes. This operation can be applied to more than two ants. Actually, when two or more co-positioned ants exchange their knowledge, they extend their movement radius to a bigger domain, thus improving their awareness of the environment. Another idea is taken from the pheromone deposits ants leave while travelling, which other ants use to pursue them. This is applied in most ant colony optimization problems [23, 24]. There is, however, a subtle difference between these two applications. In the former, the information retained by the ant may become invalid over time. This problem can be solved by evaporation [23, 24]. Evaporation, however, is not applicable in some cases, e.g. in the grid, where load information varies frequently. In the latter application, on the other hand, the knowledge exchanged is completely reliable.

4.2.3 How new ants are created

In special conditions, particularly when its life span is long, the ant's memory may fill up even though the ant is still encountering nodes which are overloaded or underloaded. In this situation, if a node is overloaded, the ant bears a new ant with a predefined number of steps.
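The ant level load balancing exchange of Section 4.2.2, in which two co-positioned ants pool and re-split their Max/Min knowledge, can be sketched as follows. The even-split policy is our reading of the example above (four plus two overloaded and two plus six underloaded nodes becoming three and four per ant); the data layout is ours:

```python
def exchange(ant_a, ant_b):
    """Pool two co-positioned ants' node lists and split them evenly,
    so both leave with the same view of the neighbourhood.
    Each ant is a dict with 'max' and 'min' lists of node ids."""
    for key in ("max", "min"):
        pooled = ant_a[key] + ant_b[key]
        half = len(pooled) // 2
        ant_a[key], ant_b[key] = pooled[:half], pooled[half:]

a = {"max": ["n1", "n2", "n3", "n4"], "min": ["n5", "n6"]}
b = {"max": ["n7", "n8"], "min": ["n9", "n10", "n11", "n12", "n13", "n14"]}
exchange(a, b)
print(len(a["max"]), len(a["min"]))  # 3 4, as in the example above
```

Unlike pheromone trails, nothing here needs to evaporate: the exchanged entries are first-hand observations, so they are as fresh as either ant's own memory.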
If it encounters an underloaded node, the ant immediately replaces the entry with the biggest load in its underloaded list with this node's specification. This results in better balancing performance and adaptability to the environment. Here, adaptability translates into automatically increasing the number of ants whenever there is an abundance of overloaded nodes.

4.3 Load balancing, starting a new itineration

When its journey ends, the ant has to start a balancing operation between the overloaded (Max) and underloaded (Min) elements gathered during its roaming. At this stage, the numbers of elements in the Max-list and the Min-list are close to each other (because of ant level load balancing). To achieve load balancing, the ant recommends the underloaded nodes to the overloaded nodes and vice versa. In this way, the load is dispersed equally among the underloaded and overloaded nodes. After load balancing, the ant should reinitialize itself to begin a new itineration. One of the fields that must be reinitialized is the ant's step count. However, as stated in the previous sections, the ant's step count (m) must be commensurate with system conditions [6]. Therefore, if most of the visited nodes were underloaded or in equilibrium, the ant should prolong its wandering, i.e. decrease the load balancing frequency, and vice versa. Doing this requires knowledge about the environment, based on the numbers of overloaded, underloaded and equilibrium nodes visited during the last itineration. Because of the power of fuzzy logic in adapting among several parameters of a problem [22], and since the step count (m) can be regarded as a linguistic variable, e.g. short, medium or long, it is rational to use fuzzy logic to determine the step count of the next itineration.
Actually, this is an adaptive fuzzy controller which determines the step count of the next itineration (NextM, for short) based on the numbers of overloaded, underloaded and equilibrium nodes visited, along with the step count of the last itineration (LastM). In other words, the numbers of overloaded, underloaded and equilibrium nodes encountered during the LastM steps indicate the recent condition of the environment, while LastM itself reports the lifetime history of the ant. The membership functions of the fuzzy sets are shown in Figure 2.

Figure 2: Membership functions of fuzzy sets: a) Last m, b) Next m, c) counts of Max, Min and Average nodes visited.

Here TL, L, M, H, TH denote Too Low, Low, Medium, High and Too High in Figures 2.a and 2.b, and L, M, H denote Low, Medium and High in Figure 2.c. This fuzzy system can be expressed as a relation and a corresponding function as follows:

R : MaxCount<L, M, H> * MinCount<L, M, H> * AvgCount<L, M, H> * LastM<TL, L, M, H, TH> -> NextM<TL, L, M, H, TH, Dead>   (2)

f(x) = \frac{\sum_{r=1}^{135} y^r \prod_{i=1}^{4} \mu_r(x_i)}{\sum_{r=1}^{135} \prod_{i=1}^{4} \mu_r(x_i)}   (3)

where x_i denotes the i-th input into the system, y^r is the centre of the specific membership function declared in rule r, and \mu_r(x_i) indicates the membership value of the i-th input in the membership functions of the r-th rule. This inference system has 4 inputs and 135 rules, as stated in (3). In this system, a large number of underloaded and, especially, equilibrium elements indicates an equilibrium state. Consequently, NextM should be prolonged, thus lowering the load balancing frequency. One can say that, as an ant's step count extends to extreme values, its effect tends to zero. Based on this premise, we can conclude that an ant with an overly long step count does not have any influence on the system balance; rather, it merely imposes its communication overhead on the system. In this situation, the ant must commit suicide.
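The defuzzification in (3) is a standard weighted average of rule centres. Stripped of the 135-rule table (which we do not reproduce), the computation looks like this sketch with a toy two-rule, one-input system:

```python
def defuzzify(rules, inputs):
    """Weighted-average (centre-of-gravity style) defuzzification, as in (3).
    Each rule is (centre, memberships), where memberships holds one function
    per input; a rule's firing strength is the product over its inputs."""
    num = den = 0.0
    for centre, memberships in rules:
        strength = 1.0
        for mu, x in zip(memberships, inputs):
            strength *= mu(x)
        num += centre * strength
        den += strength
    return num / den if den else 0.0

# Toy system: rule "low -> NextM 10" and rule "high -> NextM 50".
low = lambda x: max(0.0, 1 - x)   # membership of 'low'
high = lambda x: max(0.0, x)      # membership of 'high'
rules = [(10.0, [low]), (50.0, [high])]
print(defuzzify(rules, [0.25]))   # 20.0, i.e. 10*0.75 + 50*0.25
```

In the paper's controller the four inputs are MaxCount, MinCount, AvgCount and LastM, and the rule centres lie on the NextM axis, including the "Dead" region that terminates the ant.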
This is the last ring of the echo system: if NextM fires in the "Dead" membership function, the ant does not start any new itineration. The diagrams in Figure 3 exhibit the ant's behaviour in different environmental conditions. Figure 3.a shows the relation between LastM, the number of overloaded nodes visited and the output, while Figure 3.b illustrates the relation between LastM, the number of equilibrium nodes visited and the output.

Figure 3: Schematic view of the adaptive determination of the next itineration step count: a) LastM - MaxCount - output, b) LastM - AvgCount - output.

5 Performance evaluation

In this section, several common statistics are investigated which show the performance of the mechanism.

5.1 Efficiency

To prove that the new mechanism increases efficiency, it should be compared with the mechanism described in [4]. First, we introduce some of the most important criteria in load balancing. Let P be the number of agents and W_{pk} (p = 1, 2, ..., P) the workload of agent p at step k. The average workload is:

\bar{W}_k = \frac{1}{P} \sum_{p=1}^{P} W_{pk}   (4)

The mean square deviation of W_{pk}, describing the load balancing level of the system, is defined as:

L_k = \frac{1}{P} \sum_{p=1}^{P} (\bar{W}_k - W_{pk})^2   (5)

Finally, the system load balancing efficiency e_k is defined as:

e_k = \frac{L_0 - L_k}{C_k}   (6)

where e_k is the efficiency at step k and C_k is the total number of agent connections used to achieve the load balancing level L_k. To compare the efficiency of the two mechanisms, we should consider e_{knew} / e_{kTrad}. As L_0 indicates the load balancing level at the beginning of the load balancing process and is equal for both the new and the seminal mechanisms, we shall discuss the value of L_k. For the sake of simplicity, assume that every balanced node reaches the average workload \bar{W}_k after the balancing process and no longer requires balancing, i.e.
\bar{W}_k - W_{pk} = 0   (7)

On the other hand, after the k-th stage, if the memory space considered for overloaded and underloaded elements equals 'a' (a > 2), then ka elements have been balanced:

L_{knew} = \frac{1}{P} \sum_{p=1}^{P-ka} (\bar{W}_k - W_{pk})^2   (8)

while in the seminal approach we have:

L_{kTrad} = \frac{1}{P} \sum_{p=1}^{P-2k} (\bar{W}_k - W_{pk})^2   (9)

If we suppose that a > 2, we can conclude:

P - 2k > P - ka   (10)

After k stages, the difference in the number of balanced nodes between the two mechanisms is:

(P - 2k) - (P - ka) = k(a - 2)   (11)

Then:

L_{kTrad} = \frac{1}{P} \left( \sum_{p=1}^{P-ka} (\bar{W}_k - W_{pk})^2 + \sum_{p=P-ka+1}^{P-2k} (\bar{W}_k - W_{pk})^2 \right)   (12)

L_{knew} = \frac{1}{P} \sum_{p=1}^{P-ka} (\bar{W}_k - W_{pk})^2   (13)

L_{kTrad} > L_{knew} \Rightarrow \frac{L_{knew}}{L_{kTrad}} < 1   (14)

With respect to (14), and since each ant in the seminal mechanism wanders 2m + 1 steps instead of m, so that C_{kTrad} is roughly twice C_{knew}, we have:

\frac{e_{knew}}{e_{kTrad}} = \frac{2(L_0 - L_{knew})}{L_0 - L_{kTrad}} > 2   (15)

An unlimited increase of the ant's memory, however, leads to occupying too much bandwidth as well as increasing the processing time. Actually, there is a trade-off between the step count (S) and the memory allocated to each ant (a). If a<