© Acta hydrotechnica 24/40 (2006), 21–43, Ljubljana. ISSN 1581-0267
UDK/UDC: 519.61/.64:556.55
Review scientific paper
Received: 15. 1. 2008
Accepted: 5. 8. 2008

AUTOMATED MODELLING OF LAKES FROM DATA AND EXPERT KNOWLEDGE: EVALUATION OF APPLICATIONS
(Slovenian title: AVTOMATIZIRANO MODELIRANJE JEZER Z UPORABO PODATKOV IN EKSPERTNEGA ZNANJA: EVALVACIJA APLIKACIJ)

Nataša ATANASOVA
Ecological models of lakes are useful tools for better understanding ecosystem behaviour, for lake management and policy making, and for testing and evaluating engineering solutions. Building such a model is difficult because of the complexity of these ecosystems. It is therefore reasonable to use as many modelling approaches as possible to construct a reliable model of the observed domain. This paper evaluates an automated modelling method called Lagramge, which combines the two basic approaches: the data-driven (inductive) approach and the knowledge-driven (deductive) approach. The method supports the introduction of domain knowledge into the procedure of equation discovery from measured data, where the domain knowledge is provided in the form of a modelling knowledge library. Four applications of the method, i.e. Lake Glumsø, Lake Bled, Lake Kasumigaura, and Greifensee, comprise different modelling tasks for Lagramge, each resulting in a model of the observed domain that is specific in both structure and parameter values. The models are evaluated in terms of their descriptive power and their performance (goodness of fit to the measurements). Despite some limitations, the method can be used successfully in complex domains, both for model discovery and for other scientific discoveries, such as identifying dynamic patterns in the observed system, i.e. the dynamic structure of the ecosystem.

Key words: lakes, automated modelling, Lagramge, modelling knowledge library, conceptual modelling, data-driven modelling.

1. INTRODUCTION

Lakes are complex and dynamic ecosystems.
In many cases their behaviour is difficult to predict, which hampers their management. Ecological models of lakes are valued tools that are still used primarily for scientific purposes, but they are also useful in environmental management and policy making, as well as in testing and evaluating engineering solutions. Many such models have been developed and published in the literature. A comprehensive database of ecological models can be found at http://dino.wiz.uni-kassel.de/ecobas.html (Benz & Voigt, 1996; Benz & Knorrenschild, 1997). The models in the database are documented under a unifying documentation system called ECOBAS (Benz & Hoch, 1997; Hoch et al., 1998; Benz et al., 2001). Despite their ubiquity, building such models remains very demanding: the modeller often struggles to understand some of the processes in the system in the first place. Therefore, using as many modelling approaches as possible is strongly encouraged.

Figure 1 presents the basic modelling paradigm (Kompare, 1995). It shows the constant interaction between basic theory, measurements (observations of the system) and the model. Depending on the starting point of the modelling procedure in this diagram, we distinguish between two basic approaches: the data-driven (inductive) approach and the knowledge-driven (deductive) approach.

In the knowledge-driven (deductive) approach, models are constructed from basic physical, chemical and biological principles.
They are therefore transparent and understandable to domain experts, which explains their popularity (e.g., Jørgensen & Bendoricchio, 2001; DeAngelis, 1992; Chapra, 1997). However, as already stated, ecosystems are complex and our knowledge of them is incomplete. Thus, the equations in mathematical models are adapted to our incomplete knowledge. This results in many unknown parameters in the models on the one hand, and in a set of possible mathematical model structures for a single process on the other. In other words, there is no unique model (one that is adequate, but not necessarily the ‘correct’ one) for a specific system. The quality of the resulting models therefore depends greatly on the modeller's skills and experience.

[Figure 1. Modelling paradigm, adapted from Kompare (1995). Diagram elements: measurements (where the inductive approach starts), system identification, induction, model (black-box, semi-transparent, transparent), hypothesis, theory (where the deductive approach starts), and design of experiments to prove the model, hypothesis or theory.]
In contrast to knowledge-driven modelling, data-driven modelling builds models from observations. It comprises statistical methods as well as methods from artificial intelligence, such as data mining. This approach makes it possible to tackle various ecological problems without introducing any domain knowledge into the process of model construction; the measurements alone drive the procedure. As a result, many of these methods, statistical methods in particular, produce so-called ‘black-box’ models, which domain experts cannot explain, or can explain only vaguely. Yet machine learning (ML), a branch of artificial intelligence, includes algorithms that produce so-called semi-transparent models, which a domain expert can partly understand and explain. Successful applications of different machine learning techniques in ecology can be found, for example, in Kompare (1995), Kompare et al. (2001) and Todorovski et al. (1998).

Aware of the benefits and drawbacks of the knowledge-driven and data-driven approaches, Džeroski & Todorovski (2003) developed a method, Lagramge, that brings together the good properties of both: transparency, simplicity and accuracy. The method supports the introduction of domain knowledge into the procedure of equation discovery from measured data. Domain knowledge is introduced in the form of generic processes. To this end, Todorovski (2003) developed a formalism for encoding domain knowledge into a modelling library. Using this formalism, Atanasova et al. (2006a) built a knowledge library for modelling lake ecosystems.
The knowledge in the library covers modelling of the food web (or nutrient cycling) in a lake with ordinary differential equations that follow the mass conservation principle. It is formalized in terms of: (1) a taxonomy of variable types, (2) basic generic processes that govern the behaviour of aquatic ecosystems, (3) alternative models (formulations) of the basic processes, and (4) knowledge of how to combine models of individual processes into a model of the entire ecosystem.

Using the library and the automated modelling method Lagramge, models were constructed for several real-world domains: Lake Bled, Slovenia (Atanasova et al., 2006b), Lake Kasumigaura, Japan (Atanasova et al., 2006c), Lake Glumsø, Denmark (Atanasova et al., 2007), the Lagoon of Venice, Italy (Atanasova, 2005), and Greifensee, Switzerland (Atanasova et al., 2005). The aim of this paper is to give an overview of these applications and to evaluate the method in the light of its applicability in ecological modelling. Finally, directions for the further development of Lagramge are presented.

2. AUTOMATED MODELLING METHOD LAGRAMGE

2.1 MODELLING PROCEDURE

In the theoretical approach, the modelling procedure typically begins with problem identification and data collection (Figure 2, left).
[Figure 2. Comparison of two approaches to process-based modelling. Left: knowledge-based approach (problem identification and data collection; conceptual model (CM) of the system; selection of mathematical expressions for the processes in the CM; mathematical model with constant parameters; verification; calibration; simulation of the model and comparison with measured data; validation, i.e. simulation on an evaluation (test) data set; model accepted). Right: hybrid approach, i.e. a combination of the knowledge-based and data-driven approaches (conceptual models of the system and modelling task specifications; automated compilation of mathematical structures for each CM from the modelling knowledge library; candidate mathematical models for each CM; calibration of all model structures for each CM; best conceptual and mathematical model, i.e. the model that best fits the measurements or the test data).]

Conceptual modelling, or system identification, is the next step, in which a simplified structure of the system is determined by selecting (1) the relevant variables in the system and (2) the relevant biogeochemical processes that connect these variables. Note that choosing an appropriate conceptual model of the observed system is a non-trivial task that often requires considering and testing different alternatives. Once the conceptual model is selected, mathematical modelling comes into play: each selected process is expressed with a suitable mathematical formulation.
Here too, we should note that each process can be described by more than one (valid) mathematical expression. The result of this step is a mathematical model of the system, composed of equations with constant parameters. The values of the constant parameters are measured, adopted from the literature, or calibrated against the given measurements.

The next three steps serve to test and improve the ecological model. Verification is defined as "a demonstration that the modeling formalism is correct" (Rykiel, 1996). Here we establish that the calculations, inputs and computer code are correct. However, the models that use these calculations are based on parameters whose values are not completely known, so they cannot be taken as absolute truth (Oreskes et al., 1994). Calibration (parameter identification) is defined as "the estimation and adjustment of model parameters and constants to improve the agreement between model output and a data set" (Rykiel, 1996). Calibration can occur as part of either verification or validation. Finally, validation refers to the model's performance in the domain for which it was built (Rykiel, 1996). Validation compares simulated system output with real system observations, using data not used in model development.

2.2 THEORETICAL VS. LAGRAMGE APPROACH

In the theoretical approach, all modelling steps are performed iteratively. The modeller constantly intervenes by changing the conceptual and/or mathematical model and repeating the remaining modelling steps for each change. The loop starts with the verification of the model (see Figure 2, left). After formulating a mathematical model for the selected conceptual model, we perform verification, i.e. we check the correctness of the model formulation and behaviour.
If the test is passed, we simulate the model and compare the simulated data with the measurements. If this comparison is unsatisfactory, there are three options for improving the model: calibration, modification of the conceptual model, or modification of the mathematical formulation of the existing conceptual model. Each option leads to a new mathematical model with constant parameters, which is again verified and simulated. This loop is repeated until a satisfactory agreement between the simulated and the measured data is achieved.

Three questions determine the number of repetitions of the modelling procedure: (1) has the modeller selected the correct conceptual model, (2) has the modeller selected the correct mathematical structure for the selected conceptual model, and (3) have the model parameters been estimated optimally? These questions can be addressed systematically using an automated modelling methodology, Lagramge, which is based on equation discovery. In contrast to the knowledge-based approach, which requires many interventions by the modeller, Lagramge performs all of the modelling steps automatically, except for conceptual modelling. Still, we can give Lagramge several candidate conceptual models at once, and it develops the mathematical model for each of them in a single iteration (see Figure 2, right). Thus, for a given conceptual model, Lagramge discovers the mathematical model, i.e. its structure and parameter values.
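The automated search over model structures and parameter values described above can be illustrated with a minimal sketch. It is not Lagramge's implementation: the two-process toy library, the formulations in it, the names `GROWTH`, `LOSS`, `simulate`, `mse` and `discover`, and the synthetic data are all assumptions made for illustration, and an exhaustive grid search stands in for proper parameter optimisation. Each candidate structure is simulated and scored by mean squared error, and the best-fitting structure and parameter values are returned.

```python
from itertools import product

# Toy "library": hypothetical alternative formulations for two processes.
# Names and formulas are illustrative, not taken from the Lagramge library.
GROWTH = {
    "linear": lambda ps, k: k * ps,        # growth proportional to nutrient
    "monod": lambda ps, k: ps / (ps + k),  # saturating nutrient limitation
}
LOSS = {
    "constant": lambda phyto, m: m,         # constant specific loss rate
    "density": lambda phyto, m: m * phyto,  # density-dependent loss rate
}


def simulate(growth, loss, k, m, ps_series, phyto0, dt=0.1):
    """Euler integration of d(phyto)/dt = (growth - loss) * phyto."""
    phyto, trajectory = phyto0, []
    for ps in ps_series:
        trajectory.append(phyto)
        phyto += dt * (growth(ps, k) - loss(phyto, m)) * phyto
    return trajectory


def mse(simulated, observed):
    """Mean squared error between simulated and observed values."""
    return sum((s - o) * (s - o) for s, o in zip(simulated, observed)) / len(observed)


def discover(ps_series, observed, grid):
    """Try every structure and parameter combination; keep the best fit."""
    best = None
    for (g_name, growth_fn), (l_name, loss_fn) in product(GROWTH.items(), LOSS.items()):
        for k, m in product(grid, grid):
            simulated = simulate(growth_fn, loss_fn, k, m, ps_series, observed[0])
            error = mse(simulated, observed)
            if best is None or error < best[0]:
                best = (error, g_name, l_name, k, m)
    return best


# Synthetic "measurements" generated from a known structure and parameters.
ps_series = [0.2 + 0.1 * i for i in range(20)]
observed = simulate(GROWTH["monod"], LOSS["constant"], 0.5, 0.1, ps_series, 1.0)

best = discover(ps_series, observed, grid=[0.1, 0.5, 1.0])
print(best[1:])  # expected: ('monod', 'constant', 0.5, 0.1)
```

Because the synthetic data were generated by one of the candidates, the search recovers that structure exactly. With real measurements no candidate would fit perfectly, and the ranking would depend on the chosen error measure.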
To do so, Lagramge relies on (1) the knowledge library, in which general modelling knowledge is encoded, (2) the modelling task specification, which corresponds to the conceptual model of the system and in which the user specifies the important variables and processes in the observed system, and (3) time series data for the specified variables. Given the modelling task specification and the measurements, Lagramge performs a heuristic search in the knowledge library. The result of this search is a list of specific mathematical model structures that can be used to model the processes in the task specification, i.e. a space of candidate model structures for the given conceptual model. At the same time, Lagramge fits each candidate model structure to the submitted data by non-linear optimisation of its constant parameters. The resulting models are evaluated with one of two heuristic functions. The first is the mean squared error (MSE), which measures the discrepancy between the measured data and the data obtained by simulating the model. The second is the minimum description length (MDL) function, which also takes model complexity into account and introduces a preference for simpler models. The model with the lowest error (MSE or MDL) is considered the best mathematical model for the given conceptual model (task specification) and data set.

The use of Lagramge from the user's perspective is presented through the case studies, in most detail for Lake Glumsø (Section 3.1).

3.
APPLICATIONS OF LAGRAMGE: DOMAINS, DATA, AND EXPERIMENTS

3.1 LAKE GLUMSØ

Lake Glumsø (Jørgensen et al., 1986) is situated in a sub-glacial valley in Denmark. It is a shallow lake with an average depth of about 2 m and a surface area of 266,000 m². For several years it received waste water from a community of 3000 inhabitants. The waste water was mechanically and biologically treated (secondary treatment), i.e. organic carbon was removed but the nutrients nitrogen and phosphorus were not. An additional nutrient load comes from the lake's watershed (10.9 km²), which is mainly agricultural with almost no industry. The high nitrogen and phosphorus concentrations in the treated waste water caused hypereutrophication. The lake has no submerged vegetation, probably because of the low transparency of the water and the oxygen deficit at the bottom of the lake.

Our task is to find a suitable model of the lake with Lagramge. As stated previously, Lagramge uses a modelling knowledge library to compose mathematically correct models. To use this knowledge for building models, the user first specifies the modelling task and then provides the measured data. The task specification includes a declaration of the variables and ecological processes relevant to the system being modelled. The modelling task specification for Lake Glumsø is presented in Table 1. Lines 1 to 6 declare the variables and their types: ns (dissolved inorganic nitrogen), ps (dissolved inorganic phosphorus), phyto (phytoplankton, expressed as Chl-a), zoo (zooplankton), temp (temperature), and light (radiation).

Table 1. Modelling task specification for Lake Glumsø.
1: variable Inorganic ns
2: variable Inorganic ps
3: system variable Primary_producer phyto
4: variable Animal zoo
5: variable Temperature temp
6: variable Light light
7: process PP_growth(phyto, {ps}, {temp}, {light}) p1
8: process Feeds_on(zoo, {phyto}, {temp}) p3
9: process Respiration_PP(phyto, {temp}, {ps}, {light}) resp0
10: process Sedimentation(phyto, {temp}) sed0

The processes are declared in lines 7 to 10. Phytoplankton growth is described in line 7: the process is named PP_growth and has four arguments. The first is the name of the phytoplankton state variable. The arguments in curly brackets, i.e. {ps}, {light} and {temp}, specify how the process is influenced and limited by nutrients, light and temperature, respectively. The process Feeds_on (line 8) stands for (1) the loss of phytoplankton to grazing by zooplankton and (2) the growth of zooplankton (zoo). Its optional arguments are food (phyto) and temperature (temp), which means that zooplankton growth may or may not be influenced by the available food (one or more phytoplankton species) and by temperature. The remaining processes, Respiration_PP and Sedimentation, are declared in the same way (lines 9 and 10).

Given this specification, Lagramge searches the modelling knowledge library for suitable mathematical models of the specified processes and combines them into a model of the specified system variable, here phytoplankton. Because the library holds alternative models for each process, the result is a set of candidate phytoplankton model structures, all of which are calibrated against the measured data in the next step.

The data set for Lake Glumsø contains time series for the variables in the task specification, i.e. temp, ns, ps, phyto, and zoo. We used two years of daily measurements (provided by Jørgensen, 2004), from April 1973 to April 1974 and from October 1974 to October 1975. The experiment was aimed at discovering and validating a model for predicting phytoplankton concentration: the equation was discovered on the data of one year and validated on the data of the other (Atanasova et al., 2007).
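The way alternative process formulations multiply into candidate model structures can be sketched in a few lines of Python. This is only an illustration of the combinatorial idea described for Lake Glumsø, not Lagramge's actual search procedure. The library contents and the formulation names (for example monod_limited) are hypothetical; only the four process names come from Table 1.

```python
from itertools import product

# Hypothetical library: alternative formulations for each declared process.
# The formulation names are illustrative and are not taken from Lagramge.
LIBRARY = {
    "PP_growth": ["monod_limited", "exponential"],
    "Feeds_on": ["linear_grazing", "saturated_grazing", "sigmoid_grazing"],
    "Respiration_PP": ["first_order", "temperature_dependent"],
    "Sedimentation": ["first_order", "temperature_dependent"],
}


def candidate_structures(processes, library):
    """Return every combination of one formulation per declared process."""
    alternatives = [library[name] for name in processes]
    return [dict(zip(processes, combo)) for combo in product(*alternatives)]


# Processes declared in lines 7 to 10 of Table 1.
declared = ["PP_growth", "Feeds_on", "Respiration_PP", "Sedimentation"]
structures = candidate_structures(declared, LIBRARY)
print(len(structures))  # 2 * 3 * 2 * 2 = 24
```

Each entry in structures would then be calibrated against the measurements. The count is the product of the numbers of alternatives, so adding a single alternative to any one process enlarges the whole set of candidates.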
3.2 LAKE BLED

Lake Bled is a typical subalpine lake of glacial-tectonic origin. It occupies an area of 1.4 km², with a maximum depth of 30.1 m and an average depth of 17.9 m (Sketelj & Rejic, 1958; Rismal, 1980; Remec-Rekar, 1995). A sunken reef divides the lake into two basins, eastern and western. The data about the lake comprise long-term measurements (from 1987 to 2002) of physical, chemical and biological parameters, but only the data from 1995 to 2002 are consistent and suitable for model induction with Lagramge. Samples are taken once a month at the deepest point of each basin, every two metres from the surface to the bottom. The data used for modelling are: temperature, light, inflow data (flow rates and nutrient concentrations), outflow from the lake, dissolved inorganic nutrients in the lake (phosphorus, nitrate nitrogen and silica), total phytoplankton biomass, and the zooplankton species Daphnia hyalina.

Three experiments were performed on Lake Bled (Atanasova et al., 2006b). The first was aimed at discovering a model that describes the long-term phytoplankton dynamics of the lake. Data from 1995 to 2001 were used for model identification (model structure and parameters), and the data of 2002 for validation. The resulting model fitted the measurements poorly, so we conjectured that the structure of the lake changes too much from year to year to be described by a single model. The second experiment tested this hypothesis: we applied Lagramge to each year's data separately and obtained a different model for each year. All of these models fitted the measurements well except the one for 1996. In the final experiment, we aimed at discovering a model with three system variables (phosphorus, phytoplankton, and zooplankton), i.e. a basic food web, from the data of that year (1996). Because the space of candidate models is large and the non-linear optimization demanding, the search space was strongly restricted (Atanasova et al., 2006b).
3.3 LAKE KASUMIGAURA

Lake Kasumigaura is a shallow lake in Japan with an average depth of 4 m. It has a volume of 662 million m³ and a surface area of 220 km². The lake is hypereutrophic, with blue-green algal blooms in summer and autumn and frequently high abundances of Microcystis and Oscillatoria. The lake's data set comprises measurements from 1986 to 1992. The following data are measured: water temperature, global radiation, dissolved inorganic phosphorus, nitrate, ammonium, silica, total phytoplankton measured as chlorophyll-a (chl-a), phytoplankton species (Microcystis, Oscillatoria, Scenedesmus and Synedra rumpens), and the zooplankton group Cladocera. The measurements were used as linearly interpolated values between the actual measured values. The actual sampling frequency is presumed to be monthly. Zooplankton was measured only until 1989.

Using the data and introducing expert knowledge into the procedure of model discovery, we designed and conducted the following experiments for discovering a chl-a model (Atanasova et al., 2006c):
- Discover a chl-a model for each year separately. This experiment asked whether a generic model structure can be found for all years from 1986 to 1992, with only the parameter values optimised for each year, or whether each year requires its own model structure. We tested each year-specific model on the remaining years to find out whether a generic model exists for all the years measured. Algal grazing by zooplankton was not included in this experiment, as zooplankton data were only available for the years 1986 to 1989.
- Discover one chl-a model by learning on the entire data set, i.e. from 1986 to 1991, with 1992 used for validation. This experiment asked whether and how the length of the training data set influences the model. Is it better to learn on one representative year or on the entire data set, although the latter may contain a lot of noise? Algal grazing by zooplankton was not included in this experiment, as zooplankton data were only available for the years 1986 to 1989.
- Discover a chl-a model that includes algal grazing by zooplankton. The data from 1986 to 1988 were used for learning and the data of 1989 for validation.

3.4 GREIFENSEE

Greifensee is located in Switzerland. It has a watershed area of 163 km², and maximum and average depths of 32 m and 18 m, respectively. The surface area of the lake is 8.5 km² and its volume is 148 million m³. In the 1960s the lake was highly eutrophic, with average phosphate concentrations of over 500 mg/m³. The lake began to recover in the 1970s, after measures were taken to improve the water quality, but it is still eutrophic.

The lake's data set comprises time series for the period from 1988 to 1999 of the inputs to the lake and of chemical, physical, and biological data in the lake. The input data include daily measurements of two river inflows, Aabach-Mönchaltdorf and Aabach-Niederuster. The measurements include flow rates, temperature, pH, oxygen, inorganic forms of nitrogen and phosphorus, and total nitrogen and phosphorus. Meteorological data include daily measurements of global solar radiation, obtained from the Swiss Meteorological Institute (MeteoSchweiz). Chemical and physical variables in the lake are measured monthly, whereas biological data in the lake are measured monthly to weekly (obtained from the Swiss Federal Institute of Aquatic Science and Technology, Eawag).
The experiments were aimed at discovering a simple lake model for predicting the state variables that describe the trophic state of the lake, i.e. the concentrations of dissolved phosphorus and chlorophyll-a (Atanasova et al., 2005).

4. EVALUATION OF THE EXPERIMENTS

The evaluation of all the experiments described above is summarized in Table 2. The experiments are presented with the domain characteristics, the data sets available, the applicability and type of the model built for the domain, and the model performance. All experiments were designed to allow validation of the models: one portion of each data set was used for training (calibrating, optimizing) the models, and another portion was left out for validation.

The models from each experiment were evaluated by two criteria. First, the ten best-fitting models were ranked with the error measures built into Lagramge, MSE and MDL (see section 2). These models were then simulated, and the best of them was chosen by an expert's visual assessment of the simulations. Note that the model with the lowest error (MSE or MDL) is not necessarily the best according to the domain expert.

Consider the best model for Lake Glumsø (equation (1)). This model is fully in line with expert knowledge about modelling phytoplankton dynamics, even though it was discovered with a machine learning (ML) tool, Lagramge. The phytoplankton model is composed of known formulations of the ecological processes. The first term in the equation represents phytoplankton growth, limited by nutrients (Monod model), temperature (linear function), and light (bell-shaped photoinhibition curve). The following terms stand for respiration (first-order kinetics), sedimentation (first-order, influenced by temperature), and grazing by zooplankton (the last term).
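The term-by-term description above can be made concrete with a minimal Python sketch. It is not equation (1): the functional forms are common textbook choices assumed here for illustration (a Steele-type light curve, a temperature factor relative to a reference temperature, grazing linear in zoo and phyto), and every parameter name and value in params is hypothetical.

```python
import math


def monod(nutrient, half_saturation):
    """Monod nutrient limitation, between 0 and 1."""
    return nutrient / (nutrient + half_saturation)


def linear_temperature(temp, temp_ref):
    """Linear temperature influence (assumed form): 1.0 at temp_ref."""
    return temp / temp_ref


def light_inhibition(light, light_opt):
    """Steele-type curve (assumed form): peaks at 1.0 when light == light_opt."""
    ratio = light / light_opt
    return ratio * math.exp(1.0 - ratio)


def dphyto_dt(phyto, ps, temp, light, zoo, p):
    """Rate of change of phytoplankton: growth minus the three loss terms."""
    growth = (p["g_max"] * monod(ps, p["k_ps"])
              * linear_temperature(temp, p["t_ref"])
              * light_inhibition(light, p["l_opt"]) * phyto)
    respiration = p["r"] * phyto                                  # first order
    sedimentation = p["s"] * linear_temperature(temp, p["t_ref"]) * phyto
    grazing = p["gr"] * zoo * phyto                               # assumed form
    return growth - respiration - sedimentation - grazing


# Hypothetical parameter values, chosen only to exercise the function.
params = {"g_max": 1.0, "k_ps": 0.01, "t_ref": 18.0, "l_opt": 100.0,
          "r": 0.1, "s": 0.05, "gr": 0.5}
rate = dphyto_dt(phyto=0.2, ps=0.01, temp=18.0, light=100.0, zoo=0.0, p=params)
print(rate)
```

In a calibration run, values such as those in params would be the quantities fitted to the measurements, while the choice among alternative forms of each term would define the model structure.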
The simulation on the training data, i.e. the data on which the model was calibrated, is shown on the left-hand side of Figure 3. The data measured from April 1973 to April 1974 were used to validate the model on an unseen test set; this is shown on the right-hand side of Figure 3.

[Equation (1): dphyto/dt as a function of ps, temp, light, zoo and phyto. The typeset expression is not recoverable from this copy. Legible constants: 116.6, 0.36, 0.18, 0.01, 0.0012, 15.4, 18, 4, 0.14, 1.13, 0.007 and 0.44.] (1)

[Figure 3: two panels showing phytoplankton concentration [mgChla/l] over time, model versus measured. Left panel: time axis labelled from Oct74 to Jan76. Right panel: time axis labelled from Apr73 to Apr74.]

Figure 3. Phytoplankton model performance on Lake Glumsø data. Left: performance on the training set. Right: validation on unseen data.
Each model received two marks: one for its performance on the training data and one for its performance on the validation data. The marks used are: poor (very bad or no fit to the measurements), fairly good (acceptable fit to the measurements), good (good fit to the measurements), and very good (nearly perfect fit to the measurements). Several conclusions can be drawn from these marks. Probably the most obvious is the connection between the goodness of the models and the quality of the domain data. For Lake Glumsø a data set of frequent (daily) measurements was available, which led to successful model induction and validation on unseen data. The evaluation results are discussed further in the following section.

Table 2. Evaluation of Lagramge's applications. Marks: temp (water temperature), light (light intensity), ps (inorganic dissolved phosphorus), no (nitrate), zoo (zooplankton concentration), phyto (phytoplankton concentration), chl-a (chlorophyll-a concentration), nutrients inflows (inflows of nutrients into the lake, i.e. flow rates and concentrations).
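The ranking step used in the evaluation (keep the ten best-fitting candidates, then hand them to an expert) can be sketched as follows. MSE is computed in the usual way. The complexity penalty is a simple stand-in for a measure that prefers simpler models and is not Lagramge's MDL formula. The function names, the weight argument and the toy candidates are all assumptions made for illustration.

```python
def mse(simulated, measured):
    """Mean squared error between a simulation and the measurements."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)


def penalised_error(error, n_parameters, weight):
    """Illustrative complexity penalty; NOT the MDL measure used by Lagramge."""
    return error + weight * n_parameters


def top_models(candidates, measured, k=10, weight=0.0):
    """Return the k best (score, name) pairs, lowest score first."""
    scored = [
        (penalised_error(mse(c["simulated"], measured), c["n_parameters"], weight),
         c["name"])
        for c in candidates
    ]
    return sorted(scored)[:k]


# Made-up measurements and candidate simulations, for demonstration only.
measured = [1.0, 2.0, 3.0]
candidates = [
    {"name": "complex", "simulated": [1.0, 2.0, 3.0], "n_parameters": 8},
    {"name": "simple", "simulated": [1.1, 2.0, 2.9], "n_parameters": 2},
]
by_mse = top_models(candidates, measured)
by_penalised = top_models(candidates, measured, weight=0.01)
print([name for _, name in by_mse], [name for _, name in by_penalised])
```

With weight=0.0 the ranking is by MSE alone and the model with more parameters comes first. A non-zero weight can reverse the order, which is one reason the final choice among the top candidates is left to the expert.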
5. DISCUSSION

In this paper we evaluate an approach to automated modelling (AM) of lakes as a solid unifying framework for both handcrafting ecological models and inducing them automatically from measured data. This is enabled by encoding the existing modelling knowledge into a library of generic variables, constants, and processes. Given a specification of an observed system, Lagramge uses the knowledge from the library to transform the task specification into specific model structures for the observed system. The structures are then optimized, i.e. fitted to the given measurements of the system variables. With other ML tools the model quality depends mostly on the data used for model building. With Lagramge, more factors influence the resulting models: (1) the domain knowledge encoded in the library, (2) the quality and quantity of the observed data, (3) the complexity of the ecosystem, and (4) the expert knowledge introduced in the procedure of model induction, i.e. the modelling task specification. This is why Lagramge can be used not only for model induction from data but also for discovering or confirming specific scientific findings in the domains of interest. Note that the models discovered by Lagramge fully follow the background theoretical knowledge, i.e. they are mathematically correct.

So far, the applications of the automated modelling tool Lagramge cover model discovery in the domain of lake modelling. More specifically, they include two major modelling tasks: (1) discovery of phytoplankton models and (2) discovery of food web models, i.e. nutrient–phytoplankton–zooplankton.

Influence of data quality. As with any other modelling method (not just ML), the success of Lagramge depends strongly on the quality and quantity of the data. This is confirmed in practically all applications of Lagramge. Lake Glumsø, with a data set of daily measurements, was the most successfully identified in terms of validation on unseen data. However, the data set is rather short, and further validation on additional measurements is needed to confirm the validity of the model. Of the other cases discussed here, Lake Greifensee has the most complete and most frequently measured data. The result is a model that was successfully identified and validated on a longer period of data (four years). Moreover, Lagramge managed to discover a model of three equations, which is a much more difficult modelling task than in the other experiments, where only single-equation models were discovered successfully. The validation shows good model behaviour in terms of both the degree of fit to the measurements and long-term stability. The stability is especially important because of the complexity of the observed system.

The influence of ecosystem complexity. It is well known that natural ecosystems tend to adapt to changes in the environment, and thus change their structure over time (Jørgensen, 1999). Lakes typically show repeating yearly patterns (e.g. algal blooms in spring), so it is reasonable to expect that a single model structure can describe the behaviour of the ecosystem from year to year. A closer look shows, however, that in some ecosystems the patterns do not repeat closely enough for a single model structure to describe the long-term behaviour satisfactorily. For example, algal blooms may occur at about the same time every year, but they are not necessarily caused by the same algal species. The experiments on Lakes Bled and Kasumigaura confirm this. In the case of Lake Bled, Lagramge failed to discover a long-term phytoplankton model of satisfactory accuracy. Compared with the measured data, the simulations showed the correct annual trends, but the accuracy (the amplitudes) was very low (Atanasova et al., 2006b). In contrast, the models discovered by fitting structures to yearly data showed a nearly perfect fit. However, these models differ from one another in both structure and parameter values, and they could not be satisfactorily validated on unseen data. The structural dynamics of Lake Bled also showed in the third experiment, where a food-web model of three equations was discovered on the data from 1996. The model performed well on the training data (1996) but not on the rest of the data set. Similar results were obtained on Lake Kasumigaura, where discovering yearly models was very successful. Unlike for Lake Bled, however, the validation of some of these models was more successful.
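The year-by-year design discussed here (fit a model on one year, test it on each of the other years) can be sketched as a small cross-year error table. The sketch is generic: fit and error are passed in as callables. The stand-ins used below (a constant model equal to the yearly mean, scored by mean squared error) and the numbers in yearly are invented for illustration and have nothing to do with the lake data.

```python
def cross_year_errors(yearly_data, fit, error):
    """Fit one model per year and score it on every other year."""
    table = {}
    for train_year, train_series in yearly_data.items():
        model = fit(train_series)
        table[train_year] = {
            test_year: error(model, test_series)
            for test_year, test_series in yearly_data.items()
            if test_year != train_year
        }
    return table


def fit_mean(series):
    """Toy stand-in for model induction: the 'model' is the series mean."""
    return sum(series) / len(series)


def mse_against_constant(model, series):
    """Mean squared error of a constant prediction against a series."""
    return sum((model - x) ** 2 for x in series) / len(series)


# Invented values, two samples per year, for demonstration only.
yearly = {1986: [1.0, 3.0], 1987: [2.0, 2.0], 1988: [4.0, 6.0]}
errors = cross_year_errors(yearly, fit_mean, mse_against_constant)
for train_year, row in errors.items():
    print(train_year, row)
```

A row with uniformly low errors would suggest a model that transfers across years. Rows that are low for only some test years correspond to the situation reported for the yearly models, where structures and parameters differ from year to year.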
Influence of expert knowledge. By introducing expert knowledge into the modelling task, we specify the influences and processes that, in the expert's opinion, take place (or are the most important) in the observed system. Adding or ‘forgetting’ a specific process can greatly influence the structure and quality of the resulting models.
In the case of Lake Kasumigaura, a model of total phytoplankton was first identified without the grazing of zooplankton on phytoplankton, and then with this process taken into account. In the first case the models showed a negligible influence of nutrients on phytoplankton growth, whereas in the second this influence was well captured in the equations. Thus, the structure of the ecosystem model changed depending on how the background knowledge was specified.

Limitations in using Lagramge for model induction. The main constraint is the computational intensity of the method. Recall that in each run Lagramge performs non-linear optimization of tens or hundreds of model structures, depending on the modelling task specification. This can push the user towards simpler task specifications in order to avoid a lengthy search for complex model structures. It is also evident from the presented applications: all of them, except Greifensee and partly Lake Bled, were limited to the discovery of a single phytoplankton equation. However, the method is developing towards exploiting more computational power, e.g. parallel computing, so it is reasonable to expect that this obstacle will be at least partly removed.

Another limitation is that the more background knowledge is introduced (in terms of alternative formulations of generic processes), the larger the search space, i.e. the number of candidate model structures. This, in turn, makes the discovery of the “right” model more difficult, in the sense that larger quantities of good-quality data are needed to distinguish it from the remaining candidate models. One should therefore design the task specification (and, to an extent, the knowledge library) carefully, matching it to the available knowledge and measurements, so as to identify the most relevant processes and formulations for the task at hand.
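The growth of the search space with the number of alternative formulations can be illustrated with a small sketch. It assumes, as a simplification, that a candidate model structure is obtained by choosing exactly one formulation for each generic process; the process and formulation names below are hypothetical and are not taken from the actual Lagramge knowledge library, whose search space is further shaped by the task specification.

```python
from itertools import product

# Hypothetical library: each generic process maps to its alternative formulations.
# The names are illustrative only and do not come from the Lagramge library.
library = {
    "growth": ["linear", "monod", "exponential"],
    "nutrient_limitation": ["monod", "monod_squared"],
    "respiration": ["constant", "temperature_dependent"],
    "grazing": ["none", "linear", "monod"],
}


def candidate_structures(lib):
    """Return every combination that picks one formulation per process."""
    names = list(lib)
    return [dict(zip(names, choice)) for choice in product(*lib.values())]


def search_space_size(lib):
    """Number of candidate structures: product of the alternatives per process."""
    size = 1
    for alternatives in lib.values():
        size *= len(alternatives)
    return size


structures = candidate_structures(library)
print(len(structures))  # 3 * 2 * 2 * 3 = 36

# One extra alternative for a single process grows the space multiplicatively.
extended = dict(library, respiration=library["respiration"] + ["quadratic"])
print(search_space_size(extended))  # 3 * 2 * 3 * 3 = 54
```

Under this assumption each candidate structure still requires its own non-linear parameter fit, so a single added formulation here means 18 more optimization problems per run, which is consistent with the advice to keep the task specification close to the available knowledge and data.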
6. CONCLUSIONS

The AM method Lagramge, which uses a hybrid approach to modelling, i.e. learning from data and expert knowledge, was applied to several real-world lakes. The applications comprise model construction for (1) system identification, (2) prediction of eutrophication, and (3) discovery or confirmation of dynamic ecosystem structures. Several experiments with Lagramge were conducted for each domain.
All experiments were evaluated to investigate the applicability of the method in ecological modelling. Despite some constraints, such as its high computational demands, the method can be used successfully in complex domains, both for model discovery and for other scientific discoveries, such as identifying dynamic patterns in the observed system, i.e. the dynamic structure of the ecosystem.

The limitations listed in the previous section point to directions for the further development of Lagramge. One of the major goals is to enable the discovery of structurally dynamic models and easier identification of generic models. To spread this approach to automated modelling among experts, further work also aims at developing a graphical user interface for model construction.

REFERENCES

Atanasova, N., Mieleitner, J., Džeroski, S., Todorovski, L., Kompare, B. (2005). Development of a lake model using data and expert knowledge – case study: Greifensee. In: Markič, O., Gams, M., Kordeš, U., Heričko, M., Mladenić, D., Grobelnik, M., Rozman, I., Rajkovič, V., Urbančič, T., Bernik, M., Bohanec, M. (eds.), Zbornik 8. mednarodne multikonference Informacijska družba IS 2005, 11. do 17. oktober 2005 (Informacijska družba). Ljubljana: Institut "Jožef Stefan", 216–219.
Atanasova, N., Todorovski, L., Džeroski, S., Kompare, B. (2006a). Constructing a library of domain knowledge for automated modelling of aquatic ecosystems. Ecological Modelling 194(1–3), 14–36.
Atanasova, N., Todorovski, L., Džeroski, S., Remec-Rekar, Š., Recknagel, F., Kompare, B. (2006b). Automated modelling of a food web in lake Bled using measured data and a library of domain knowledge. Ecological Modelling 194(1–3), 37–48.
Atanasova, N., Todorovski, L., Džeroski, S., Recknagel, F., Kompare, B. (2006c). Computational Assemblage of Ordinary Differential Equations for Chlorophyll-a Using a Lake Process Equation Library and Measured Data of Lake Kasumigaura. In: Recknagel, F. (ed.), Ecological Informatics, 2nd ed. Springer-Verlag, Berlin, New York, 1–485.
Atanasova, N., Todorovski, L., Džeroski, S., Kompare, B. (2007). Application of automated model discovery from data and expert knowledge to a real-world domain: Lake Glumsø. Ecological Modelling 212(1–2), 92–98 (published 2008).
Benz, J., Hoch, R. (1997). Ein Modelldokumentationssystem. ASIM Simulationstechnik, 11. Symposium in Dortmund, 232 p.
Benz, J., Knorrenschild, M. (1997). Call for a common model documentation etiquette. Ecological Modelling 97(1–2), 141–143.
Benz, J., Voigt, K. (1996). Aufbau eines Systems zur strukturierten Suche von Informationsquellen für den Umweltschutz im Internet. Informatik für den Umweltschutz, 10. Symposium, 241 p.
Benz, J., Hoch, R., Legovic, T. (2001). ECOBAS – modelling and documentation. Ecological Modelling 138(1–3), 3–15.
Chapra, S. C. (1997). Surface Water-Quality Modeling. McGraw-Hill, New York.
DeAngelis, D. L. (1992). Dynamics of Nutrient Cycling and Food Webs. Chapman & Hall, London, UK.
Džeroski, S., Todorovski, L. (2003). Learning population dynamics models from data and domain knowledge. Ecological Modelling 170(2–3), 129–140.
Hoch, R., Gabele, T., Benz, J. (1998). Towards a standard for documentation of mathematical models in ecology. Ecological Modelling 113(1–3), 3–12.
Jørgensen, S. E., Bendoricchio, G. (2001). Fundamentals of Ecological Modelling. Elsevier Science, Oxford, UK.
Jørgensen, S. E. (1999). State-of-the-art of ecological modelling with emphasis on development of structural dynamic models. Ecological Modelling 120(2–3), 75–96.
Kompare, B. (1995). The Use of Artificial Intelligence in Ecological Modelling. FGG, Ljubljana; Royal Danish School of Pharmacy, Copenhagen.
Kompare, B., Todorovski, L., Džeroski, S. (2001). Modelling and prediction of phytoplankton growth with equation discovery: case study – Lake Glumsø, Denmark. Verh. Internat. Verein. Limnol. 27, 3626–3631.
Oreskes, N., Shrader-Frechette, K., Belitz, K. (1994). Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646.
Rismal, M. (1980). Presoja posameznih metod za sanacijo Blejskega jezera (Assessment of different methods for Lake Bled remediation). Gradbeni vestnik 29(2–3), 34–46 (in Slovenian).
Remec-Rekar, S. (1995). Življenjska strategija in absorbcija fosforja pri nekaterih fitoplanktonskih vrstah Blejskega jezera-123 (Life strategy and phosphorus absorption in some phytoplankton species in Lake Bled). University of Ljubljana, Ljubljana (in Slovenian).
Rykiel, E. J., Jr. (1996). Testing ecological models: the meaning of validation. Ecological Modelling 90, 229–244.
Sketelj, J., Rejic, M. (1958). Preliminary account on the examination of Lake Bled. Gradbeni vestnik 61–64.
Todorovski, L. (2003). Using Domain Knowledge for Automated Modeling of Dynamic Systems with Equation Discovery. Fakulteta za računalništvo in informatiko, University of Ljubljana, Ljubljana, Slovenia.
Todorovski, L., Džeroski, S., Kompare, B. (1998). Modelling and prediction of phytoplankton growth with equation discovery. Ecological Modelling 113(1–3), 71–81.
Author's Address
Asst. Dr. Nataša Atanasova
Univerza v Ljubljani – University of Ljubljana
Fakulteta za gradbeništvo in geodezijo – Faculty of Civil and Geodetic Engineering
Jamova 2, SI-1000 Ljubljana, Slovenia
E-mail: natanaso@fgg.uni-lj.si