MATCOS-13
Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science
University of Primorska Press

Edited by Miklos Kresz and Andrej Brodnik

Reviewers and Programme Committee
Andrej Brodnik ■ Ljubljana, Koper, Slovenia, co-chair
Gabor Galambos ■ Szeged, Hungary
Gabriel Istrate ■ Timisoara, Romania
Hans Kellerer ■ Graz, Austria
Miklos Kresz ■ Szeged, Hungary
Silvano Martello ■ Bologna, Italy
Andras Recski ■ Budapest, Hungary
Gerhard Reinelt ■ Heidelberg, Germany
Jiri Sgall ■ Prague, Czech Republic
Borut Žalik ■ Maribor, Slovenia

Steering Committee
Andrej Brodnik ■ Ljubljana, Koper, Slovenia
Gabor Galambos ■ Szeged, Hungary
Miklos Kresz ■ Szeged, Hungary
Gerhard Reinelt ■ Heidelberg, Germany

Organizing Committee
Janez Žibert, chair
Tine Šukljan
Marko Grgurovič

Published by University of Primorska Press, Titov trg 4, SI-6000 Koper
Editor-in-Chief: Jonatan Vinkler
Managing Editor: Alen Ježovnik
Koper, 2016

ISBN 978-961-6984-20-1 (pdf) ■ www.hippocampus.si/isbn/978-961-6984-20-1.pdf
ISBN 978-961-6984-21-8 (html) ■ www.hippocampus.si/isbn/978-961-6984-21-8/index.html
© 2016 Založba Univerze na Primorskem

Preface

Dear reader,

In cooperation between the University of Primorska and the University of Szeged, we decided in 2010 to organize a conference reflecting both the theoretical and the applied aspects of computer science. The Mini-Conference on Applied Theoretical Computer Science (MATCOS-10) was a two-day event within which a student session was organized. This occasion gave distinguished Master students and early-stage PhD students the opportunity to present their preliminary results and ongoing research. Our initiative turned out to be very successful; the full papers of 9 student talks were contributed to the post-proceedings of that event.

Continuing the tradition, the Middle-European Conference on Applied Theoretical Computer Science (MATCOS-13) also secured a place for high-level student research. This post-proceedings volume is devoted to publishing the carefully reviewed full papers presented at MATCOS-13, held on October 10th and 11th, 2013 in Koper, Slovenia. The topics of the contributions cover a wide range of theory and application, reflecting the scope of the conference: algorithmic solutions for life science problems, artificial intelligence methods for games and computer vision, compiler construction, advanced data structures, application-oriented scheduling, and numerical simulation. All papers present promising results, demonstrating the excellent research attitude of the contributors and their potential for a scientific career.

The high standard of the presentations in the regular session convinced us to collect selected papers for a special issue of the international journal Informatica, published as number 3 of volume 39, 2015.

The success both in organizing the meeting and in publishing the best results of the conference motivates us to make this event a regular one. Our plan is to organize the next conference in October 2016 in Koper, again with regular and student sessions.

MATCOS-13 was smoothly organized, and we are very grateful to Janez Žibert, chair of the Organizing Committee, and to "his team". Special thanks to Professor Silvano Martello for accepting our invitation as a keynote speaker and giving his excellent talk. The success of the conference would not have been possible without the aid of the programme committee, and the professional work provided by the University of Primorska Press in the publishing process was also indispensable. Thank you all for your support.

Miklós Krész, chair of the student conference

CIP – Kataložni zapis o publikaciji (Cataloguing-in-Publication record)
Narodna in univerzitetna knjižnica, Ljubljana
004(082)(0.034.2)
MINI-Conference on Applied Theoretical Computer Science (2013 ; Koper)
MATCOS-13 [Elektronski vir] : proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science / edited by Miklos Kresz and Andrej Brodnik. - El. knjiga. - Koper : University of Primorska Press, 2016
Access mode (URL): http://www.hippocampus.si/isbn/978-961-6984-20-1.pdf
Access mode (URL): http://www.hippocampus.si/isbn/978-961-6984-21-8/index.html
ISBN 978-961-6984-20-1 (pdf)
ISBN 978-961-6984-21-8 (html)
1. Gl. stv. nasl. 2. Krész, Miklós
283625728

Contents

Preface ◆ II
Efficient Implementation of Algorithms for Solving Subgraph Isomorphism Problem in Cheminformatics ◆ Mónika Vigula ◆ 5–8
Adaptive Algorithms For Dynamic Programming With Applications to Bioinformatics ◆ Marko Grgurovič ◆ 9–11
An Algorithm for Recognition of Position Repetition in Chess ◆ Nándor Németh ◆ 13–16
Comparison Between a Cache-oblivious Range Query Data Structure and a Quadtree ◆ Tine Šukljan ◆ 17–19
A New Method for Transforming Algorithm into VHDL by Starting from a Haskell Functional Language Description ◆ Gergely Suba and Péter Arató ◆ 21–24
A System for Parallel Execution of Data-flow Graphs ◆ Andrej Bukošek ◆ 25–28
An Algorithmic Framework for Real-Time Rescheduling in Public Bus Transportation ◆ Balázs Dávid and János Balogh ◆ 29–33
Traffic Sign Symbol Recognition with the D2 Shape Function ◆ Simon Mezgec and Peter Rogelj ◆ 35–38
On Numerical Simulation of Filtration Gas Combustion Processes on the Shared Memory Machines ◆ Tatyana Kandryukova ◆ 39–43

Efficient implementation of algorithms for solving subgraph isomorphism problem in cheminformatics

Mónika Vigula
Eötvös Loránd University
Budapest, Hungary
vigula.monika@gmail.com

ABSTRACT
In a typical task in cheminformatics [1], we have to retrieve from a given database all target molecules that contain a query molecule as substructure. Representing the molecular structures as labeled graphs, this task can be formulated as the subgraph isomorphism problem, which is a well-known NP-complete problem.

In a typical task, a database containing a lot of molecules (mostly hundreds of thousands or millions) and a query molecule are given. We have to find all the target molecules from the database containing the query molecule as substructure (see Figure 1). In the first step, we represent the molecules as undirected, labeled graphs. Then, we check for each target molecule whether it contains the query molecule or not. As this is a well-known NP-complete problem, screening methods are used as a preliminary step. Our purpose is to identify quickly as many of those molecules from the database as possible that cannot contain the query as substructure.

In our work, we have studied three algorithms solving the subgraph isomorphism problem. We have implemented the
Ullmann algorithm [5] and the VF2 algorithm [2] along with Only the remaining molecules are to be examined by the the atom re-ordering step of QuickSI [4] and additional actual substructure search algorithm. heuristics. The time and memory requirements of these algorithms were also evaluated. We have determined the combination of heuristics resulted in the best performance on real data sets as well. Keywords cheminformatics, subgraph isomorphism problem, substruc- ture search Supervisors Krisztián Tichler1 and Péter Kovács2 1. INTRODUCTION Cheminformatics is a rapidly growing field which implies interesting challenges for information technology beyond chemical background. It helps us to select pharmaceutical research directions and reduce the numbers of costly research tests. We try to predict the possible outcome of an experiment using our mathematical and information technological knowledge. In a typical task, a database containing a lot of molecules Figure 1: Our motivation 1Eötvös Loránd University, Hungary, Institute of Informat- This problem has been studied extensively for decades. One ics, ktichler@inf.elte.hu 2 of the most common solution methods is the algorithm pro- Eötvös Loránd University, Hungary, Institute of Informat- ics, kpeter@inf.elte.hu posed by J. R. Ullmann [5] in 1976, the VF2 algorithm [2] published by Cordella et. al. in 2001 is also widely used, while the QuickSI algorithm [4] is a quite new algorithm, it was published in 2008. Our aim was to overview algorithms related to the subgraph isomorphism problem and implement some of these meth- ods efficiently using heuristics that exploit characteristics of molecular graphs to improve performance (e.g., bounded degree, vertex and edge labels). In some applications, it is also necessary to compute all possible mappings between matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 5 the query and the target structures, thus we also considered (e.g., C, N, O ) and bonds (e.g., simple, double, triple) as this problem. We tested the correctness and efficiency of our well. implementations on real data sets. In another screening method, two matrices are calculated for 2. THE SUBGRAPH ISOMORPHISM each molecule. Columns and rows correspond to frequent atom types3 (e.g., C, N, O ) and bond types (e.g., simple, PROBLEM double), respectively. Initially, all matrix elements are zero. Definition 1. A labeled graph is defined as a 6-tuple The bonds of the molecules are examined sequentially. If G = (V, E, ΣV , ΣE, lV , lE) where V is the set of vertices, E the current bond connects two atoms of frequent type, then is the set of undirected edges, ΣV and ΣE are sets of vertex the two elements of the matrix are increased by one. It can and edge labels, and lV : V → ΣV , lE : E → ΣE denote easily be proved that each element of the query’s matrix label functions mapping a vertex or an edge to a label, re- is less than or equal to the corresponding element in the spectively. target’s matrix if query ⊆ target. Definition 2. A graph G1 = (V1, E1, ΣV , Σ , l , l ) 1 E1 V1 E1 is subgraph isomorphic to G Simple example molecules are shown in Figure 3. The cor- 2 = (V2, E2, ΣV , Σ , l , l ) 2 E2 V2 E2 (denoted by G responding matrices are shown in Table 1. The elements of 1 ⊆ G2) if there exists an injective function f : V the matrices of the query and the target are separated by a 1 → V2 satisfying colon. As the query’s matrix contains non-zero elements and 1. 
∀u∈V1 (lV (u) = l (f (u))) 1 V2 all elements are zero in the target’s matrix, query * target 2. ∀u, v∈V holds. 1 ((u, v) ∈ E1 ⇒ (f (u), f (v)) ∈ E2) 3. ∀u, v∈V1 ((u, v)∈E1 ⇒ lE ((u, v)) = l ((f (u), f (v)))) 1 E2 Notice that G2 does not necessarily contain G1 as induced subgraph. A simple example is shown in Figure 2. A possible mapping is f (1) = 14 f (5) = 5 f (9) = 12 f (2) = 2 f (6) = 8 f (10) = 3 f (3) = 1 f (7) = 4 f (4) = 6 f (8) = 11 Figure 3: Example molecules to screening C N O F P S . . .  single 4 : 0 1 : 0 1 : 0 0 : 0 0 : 0 0 : 0 . . .  double 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 . . .   triple 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 . . . Table 1: Matrix Figure 2: An example where G2 contains G1 as sub- 4. ALGORITHMS structure We have studied three well-known substructure search algo- Note furthermore, that hydrogen is usually not represented rithms, the Ullmann algorithm [5], the VF2 [2], and the in molecules as its presence can be deduced from our chem- atom-reordering step of the QuickSI [4]. Although they ical knowledge. are all backtracking algorithms, they build the backtrack- ing trees in different ways. Furthermore, QuickSI rearranges In cheminformatics, query molecule, target molecule, atoms, the atoms of the query molecule in the first step to achieve bonds and substructure are usually used instead of G better performance. Each node of the backtracking tree rep- 1, G2, vertices, edges and subgraph, respectively, therefore these resents a partial isomorphism from the query to the target notions are used in the rest of the paper. molecule. As this tree can be exponential in size, the pur- pose of the algorithms is to filter out nodes that can not be extended to achieve a subgraph isomorphism function. 3. SCREENING Therefore, different feasibility functions are applied by the A preprocessing step, called screening, is used to exclude as methods. many molecules as possible from the target database that cannot contain the query molecule as substructure. Our aim The Ullmann algorithm maintains a boolean matrix (de- was not to study and improve the existing screening meth- noted by M ) representing the possible branches at the dif- ods, therefore only some simple conditions were checked ferent depths. Mi,j is true if the current partial mapping before running the implemented substructure search algo- can be extended by adding f (i) = j to achieve a subgraph rithms. If query ⊆ target then the atom/bond count of the query is less than or equal to the atom/bond count of the 3The frequent atom types were determined based on a public target. Similar inequalities hold for various types of atoms molecule set of National Cancer Institute [3]. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 6 isomorphism between the query and the target. This ma- trix is refined after each step: the algorithm checks for each pair of i and j where Mi,j is true whether the non-mapped neighbors of the i-th atom in the query can be mapped to a neighbor of the j-th atom in the target. If this condition does not hold, then Mi,j is set to false and a new refine phase is performed. The VF2 algorithm maintains a neighbor set for each molecule (N bq, N bt) which contains those atoms that are not Figure 4: The parent heuristic involved in the current partial mapping but have a neighbor that is already involved. The algorithm distinguishes the following two cases. frequency of the different atom and bond types in the given database. 
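As an illustration of the count-matrix screening test described in Section 3, below is a minimal sketch in Python. It assumes a molecule is given simply as a list of (atom type, atom type, bond type) triples; this input format and the function names are illustrative choices of ours, not the paper's implementation, while the frequent atom types and bond types follow the columns and rows of Table 1.

from collections import Counter

# Illustrative frequent types; the paper derives them from an NCI molecule set.
FREQUENT_ATOMS = {"C", "N", "O", "F", "P", "S"}
BOND_TYPES = {"single", "double", "triple"}

def count_matrix(bonds):
    """Count bonds per (bond type, frequent atom type), as in the screening step.

    `bonds` is an iterable of (atom_type_1, atom_type_2, bond_type) triples.
    A bond between two frequent atoms increments two matrix entries, one for
    each endpoint, as described in Section 3.
    """
    m = Counter()
    for a1, a2, btype in bonds:
        if btype in BOND_TYPES and a1 in FREQUENT_ATOMS and a2 in FREQUENT_ATOMS:
            m[(btype, a1)] += 1
            m[(btype, a2)] += 1
    return m

def may_contain(query_bonds, target_bonds):
    """Necessary condition: every query entry <= the corresponding target entry.

    If this fails, the target cannot contain the query as substructure and can
    be discarded before running the expensive substructure search algorithm.
    """
    q, t = count_matrix(query_bonds), count_matrix(target_bonds)
    return all(t[key] >= cnt for key, cnt in q.items())

# Example in the spirit of Figure 3: a query with C-N and C-O single bonds is
# screened out against a target whose count matrix is all zeros.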
They suggested mapping the rare atoms before the frequent ones while keeping connectivity. This new atom 1. N b order needs to be calculated only once before any search q 6= ∅. In this case, the next atom to be mapped is the element of this set with the minimum index. against the database. This atom can be mapped only to an atom from N bt. Before the algorithm extends the current partial map- Although this method has been shown to reduce search time, ping by adding f (i) = j, it checks if the atom and the it has some limitations. In its original form, it is applicable bond types match and |N b only for connected query graphs. However, as in many real- q | ≤ |N bt| holds for the new neighbor sets. world applications, the query structure may consist of mul- tiple components, we have extended this method to be ap- 2. N bq = ∅. In this case, the non-mapped query atom plicable for such queries. For example, an isolated atom can with the minimum index is considered, which can be be mapped to any atom of the target having the same atom mapped to any non-used atom of the target. Before type; therefore it is beneficial to consider isolated atoms last. the extension of the current partial mapping, only the Furthermore, if we apply this atom ordering technique, then atom types are checked. the calculation of parent atoms should also be adjusted. 5. HEURISTICS 5.3 Ullmann Algorithm In this section, we briefly introduce the heuristics we have We have successfully applied the aforementioned two heuris- developed to improve upon the considered algorithms. tics in the Ullmann algorithm. They both substantially de- crease its running time. 5.1 Parent heuristic Definition 3. Suppose the atoms of the query are indexed In addition, we could also decrease the memory requirement by positive integers and a partial mapping is given. Let of the algorithm. Its straightforward implementation re- q quires O(n3) space (where n = max{|V1|, |V2|}), which can i (i ∈ {1, . . . , |V1|}) be an arbitrary non-mapped atom of the query molecule. Define the parent atom of q be reduced to O(n2) by saving only the modified matrix el- i (denoted by parent(q ements at each depth instead of saving the entire boolean i)) as its mapped neighbor atom with the minimum index (if it has any). compatibility matrix. Because every element can be set from true to false only once under a node in a backtracking tree, Notice that at least one atom in each component does not at most O(n2) positions need to be saved in total. have a parent atom (i.e., the atom mapped first in its com- ponent). Furthermore, note that the parent atoms are not Another improvement exploits that the compatibility matrix modified by extensions of the current partial mapping. The contains exactly one true value in the first d rows, where d key observation that we exploited is that a query atom qi can denotes the current search depth. Therefore, the matrix only be mapped to those atoms of target that are adjacent refinement step can be started at the (d + 1)-th row. to the image of parent(qi). Search algorithms can effectively take advantage of this property in case of molecular graphs, 5.4 VF2 because they are very sparse. We have also devised a few improvements for the VF2 algo- rithm. First, the candidate sets are not pre-calculated and A simple example is shown in Figure 4 to demonstrate how not stored in memory. If the next candidate is needed, it this heuristic can be applied. Suppose that the current map- can be calculated on the fly. 
Additionally, if only the first ping is f (1) = 4, f (2) = 5, and f (3) is under consideration. mapping is required, then there is no need to calculate those The parent atoms are indicated by arrows pointing from an candidates that might not be used in a later step. atom to its parent atom. As f (parent(3)) = 4, which has two neighbors that are not mapped yet, namely 2 and 6, the Although comparing the cardinality of neighbor sets im- only possible assignments are f (3) = 2 and f (3) = 6. proves the performance of the original algorithm, we found that it becomes superfluous when the parent heuristic is also 5.2 Atom order in the query molecule applied. In fact, this new heuristic developed by us seems H. Shang et. al. introduced a new method in [4] to obtain to be clearly superior to the original technique. better performance of search algorithm. Their idea is to rearrange the atoms of the query molecule according to the Apart from that, the atom reordering heuristic is also ap- matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 7 plied to further reduce the running time of the algorithm. Ullmann alg. VF2 Moreover, an additional improvement is based on the obser- vation that the order of the atoms in the query can be fixed first all first all in the first step, thereby avoiding to find the atom with the small queries minimum index at each step. 0.004 s 0.005 s 0.001 s 0.002 s and targets small queries 6. TEST CASES, SPEED AND MEMORY and large tar- 0.067 s 0.136 s 0.009 s 0.013 s RESULTS gets large queries We have compared the implemented algorithms combined and large tar- 0.27 s 1.084 s 0.01 s 0.027 s with all the improvements mentioned above in different as- gets pects. We have studied the time requirement for finding either the first or all substructure mappings for molecules Table 3: The search time for 1000 query-target pairs having different size. Our test suite consists of three in- at finding the first/all mapping stance sets: (1) small queries and small targets4; (2) small queries and large targets, and (3) large queries and large targets. The test molecules were selected from a public 7. CONCLUSION molecule database of National Cancer Institute [3]. The This paper presents our results for solving the subgraph iso- different test cases are summarized in Table 2. The first morphism problem. We have implemented the Ullmann al- and second row of each cell corresponds to the queries and gorithm, the VF2 algorithm, and the atom-reordering step of targets, respectively. Some of the 20 small, frequent test QuickSI along with various heuristics that we devised to im- molecules are shown in Figure 5. prove upon them. We have compared the memory and time requirements of the implementations on real-world molecu- The heuristics mentioned above have decreased the search lar graphs having different size. The search time has been time of the Ullmann algorithm and VF2 by 35-40%. The reduced by 35-40% by applying the developed heuristics. running time can be further reduced by 10% if the algo- rithms are implemented in iterative way instead of recursive 8. ACKNOWLEDGMENTS way. Finding all mappings required between 1.5 and 3 times I would like to express my gratitude to Krisztián Tichler as much running time as finding only the first mapping. and Péter Kovács, my supervisors for their patient guidance, enthusiastic encouragement and useful critiques of this re- Our results show that the VF2 algorithm is typically much search work. 
I am grateful to István Fekete for introducing faster than the Ullmann algorithm. We found that the re- me to cheminformatics and for his continuous support. I am finement of the initial boolean matrix used by the Ullmann obliged to staff members of ChemAxon Ltd. for the valuable algorithm takes about 70% of the total search time. information provided by them in their respective fields. The reduced search time for 1000 query-target pairs are The research project presented in this paper is supported shown in Table 3. by the European Union and co-financed by the European Social Fund (grant agreement no. TÁMOP 4.2.2./B-10/1- Query molecules 2010-0030). small large Marvin was used for drawing, displaying and characterizing 20 small molecules chemical structures, substructures, and reactions, Marvin cules small – 6.0.5, 2013, ChemAxon (http://www.chemaxon.com). molecules having 11-15 atoms mole 20 small molecules large, similar 9. REFERENCES rget large a [1] N. Brown. Chemoinformatics – an introduction for T molecules at least 76 atoms molecules computer scientists. ACM Computing Surveys, pages Table 2: Test cases 8:1–8:38, 2009. [2] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. An Improved Algorithm for Matching Large Graphs. In: 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Cuen, pages 149–159, 2001. [3] National Cancer Institute. NCI Open Database Compounds, 2000. http://cactus.nci.nih.gov/download/nci/. [4] H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming Figure 5: Some small query molecules Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. Proceedings of the VLDB Endowment, pages 364–375, 2008. [5] J. R. Ullmann. An algorithm for subgraph 4In cheminformatics, molecules having at most 15 non- isomorphism. Journal of the ACM, 23:31–42, 1976. hydrogen atoms are considered as small molecules. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 8 Adaptive Algorithms For Dynamic Programming With Applications to Bioinformatics [Extended Abstract] Marko Grgurovič University of Primorska Titov Trg 4 Koper, Slovenia marko.grgurovic@student.upr.si ABSTRACT It can be shown that certain dynamic programming algo- In this paper we study a simple computation that lies at rithms, including algorithms for the edit distance problem the core of many dynamic programming algorithms, perhaps and algorithms used in geology [4] and in speech recogni- most notably those for computing the edit distance between tion [3], can be reduced to this simple computation. The two strings. Past solutions have assumed properties of the straightforward algorithm for computing Eq. 1 takes O(n) input such as convexity and concavity and would not work time per i, which amounts to a total of O(n2) time over all on general inputs. We obtain an algorithm that combines i. the previous algorithms in a way that makes no assumptions on the input, yet its running time depends on a measure of Previous algorithms for this problem focused on specific “sortedness” present in the input. This measure turns out to types of the function g(·) (usually referred to as the gap be very natural, and if the input comes from a function, it function), e.g. those that satisfy the quadrangle inequal- corresponds to the number of inflection points of the input ity [1, 5, 2] or the inverse quadrangle inequality [1]. None of function. 
An immediate consequence of the result is that these algorithms produce the correct answer when g(·) does if the input comes from a polynomial of degree d, then the not satisfy their assumptions. In this paper, we develop a running time of the algorithm can be upper bounded by combination of these approaches that works for all inputs, a function of d. The new algorithm extends the previous but has a running time that depends on the function g(·) in results to a wider family of functions and is never worse a natural way. than either special cases. All logarithms in this paper are in base two. Categories and Subject Descriptors 2. THE CONVEX AND CONCAVE CASE F.2.2 [Theory of Computation]: Nonnumerical Algorithms and Problems In this section we describe the algorithm of [1] for the convex case. We do not explicitly consider the concave case, since it is essentially analogous. General Terms Algorithms, Theory Consider the case when g(·) is a convex function, that is a function which increases at an increasing rate. In other Keywords words, given a < b the following residues are implicitly sorted: Dynamic programming, edit distance g(b + c) − g(a + c) ≤ g(b + c0) − g(a + c0) for 0 ≤ c ≤ c0. 1. INTRODUCTION We consider the following simple computation: given a se- Now consider the line 1, 2, ..., n. We will assign to each quence of values X1, ..., Xn and a function g(k) for 1 ≤ k ≤ point i on this line the value Y [i], i.e. the combination that n, compute for all 1 ≤ i ≤ n: achieves the minimum for a given i in Eq. 1. Observe that we can represent Y [i] with Xk, since given some Xk and Yi = min1≤k i. Proof. Let Xj be the element that achieves the mini- matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 9 mum for i + 1. Then: elements in block bi. Observe that 1 ≤ B ≤ n/2, since two elements form either an ascending or descending sequence, Xk + g(i + 1 − k) ≥ Xj + g(i + 1 − j) and in the case B = 1 the “function” is either convex or Xk − Xj ≥ g(i + c − j) − g(i + c − k). concave. This step takes O(n) time, and allows us to par- tition G into disjoint intervals where the property of sorted Which holds for all c ≥ 1, since the residues on the right-residues holds. hand side at most decrease. The algorithm works by moving through the list L and for What follows from Lemma 2.1 is that each element from X each block (start, end, asc), it finds the optimal combinations occupies at most one interval on our line i = 1...n. of Yi = Xk + Gi−k whenever i − k is inside the interval (start, end). Once these combinations have been found, they We would also like to know, given two elements X are stored as Yi if they are lower than the current value and k and Xj and j < k, for which gap values will X the algorithm moves onto the next block, where the same k be smaller than Xj . This leads to the inequality: process is repeated. Observe that these subproblems can be solved by a slight modification of the algorithms from Xk + g(a) ≤ Xj + g(b) ∀b, a : b − a = k − j. [1], where we use the convex algorithm if asc = 1 and the concave algorithm otherwise. Once we solve the subproblem Rearranging this gives us in a block, we will not revisit it and so the space can be X reused. The algorithm requires O(n) extra space, which k − Xj ≤ g(k + c) − g(j + c). matches the space requirements of [1]. We can now find the maximal value of c for which the in- equality still holds in O(lg n) by employing binary search. 
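To make the computation at the core of the paper concrete, here is a minimal sketch of the straightforward quadratic evaluation that the convex and concave algorithms accelerate. Since Eq. (1) is truncated in the extracted text, the recurrence is assumed to have the form Y_i = min over 1 <= k < i of (X_k + g(i - k)), which is how the surrounding definitions and the later expression X_k + G_{i-k} read; the function name and the 0-based indexing are ours.

def compute_y(x, g):
    """Straightforward O(n^2) evaluation of the assumed recurrence
    Y[i] = min over k < i of (X[k] + g(i - k)).

    x is 0-indexed here: x[0] plays the role of X_1 and y[i] of Y_{i+1} in the
    paper's 1-based notation. g(d) must be defined for gap sizes d = 1..n-1.
    y[0] has no valid predecessor and is left as None.
    """
    n = len(x)
    y = [None] * n
    for i in range(1, n):
        y[i] = min(x[k] + g(i - k) for k in range(i))
    return y

# Example with a convex gap function g(d) = d*d:
# compute_y([5, 3, 8, 1, 7], lambda d: d * d)

The convex-case algorithm described above avoids recomputing this inner minimum from scratch: it maintains, for each candidate X_k, the interval of positions on the line where that candidate is currently optimal, and updates the rightmost intervals using the binary search just described.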
The time required to solve the subproblem on block bi comes from the algorithm described in Section 2. In our case, the Now we are ready to describe the algorithm. First we assign binary search is performed on bi, so the time is O(n lg |bi|). X1 as the interval that covers the entire line. Then, the al- Over all subproblems the time becomes: gorithm works by traversing the list of candidates X2, ..., Xn and comparing the current candidate X B k with the element X X O(n lg |bi|). j , which corresponds to the rightmost interval on the line. One can now determine for which values of i the new candi- i=1 date is better, and the interval can be updated. In the case Turning the sum of logarithms into the logarithm of the that the new candidate is better than the previous one for product we get: the entire interval, we simply remove the old candidate, and B repeat the process on the rightmost remaining interval. Y O(n lg( |bi|)). i=1 In order to analyze the time required by the algorithm, we note that each new candidate from X will perform at least Assume B is fixed. Since the logarithm is a monotonically one binary search, which takes O(lg n) time. Multiple binary increasing function, the time is maximized when QB |b i=1 i| is searches may be performed once intervals are removed from maximized. Recall that the geometric mean is less than or the rightmost end of the line, but observe that each element equal to the arithmetic mean. Then we have: X is only added once, so after being removed it is never B reinserted. Therefore, the algorithm takes O(n lg n) time. Y ( |bi|)1/B ≤ n/B i=1 3. ALGORITHM B Y Observe that the function g only takes on integral inputs. ( |bi|) ≤ (n/B)B. Therefore, we can consider a more general computation: i=1 given a sequence of values X1, ..., Xn and a sequence of val- Thus, we can upper bound the time by O(Bn lg( n )). Re- ues G B 1, ..., Gn for 1 ≤ k ≤ n, compute for all 1 ≤ i ≤ n: gardless of G, the time is never worse than the straight- Y forward O(n2) algorithm, and achieves the O(n lg n) bound i = min1≤k 0. Arge et al.[4] presented a contruction method for the layout. The basic idea is to store all the points, sorted by the x- 2. CACHE-OBLIVIOUS DATA STRUCTURE coordinate, in the leafs of a balanced binary search tree, For the cache-oblivious data structure we chose the data stored implicitly using the van-Emde-Boas layout. All the structure from [4], as it’s the simplest and doesn’t have as- internal nodes carry additional information about the actual sumptions about B, and has the same running time. We points in the subtree. Then using the sweep line approach; first describe the data structure for two-sided queries and starting from −∞, we sweep a horizontal line upwards across how to construct it. After that we describe the construction the plane. Each time the sweep line is at point p, we traverse of the data structure for four-sided queries. the tree leaf-to-root and update all the nodes on the path. After this step, we have to check in the root of the tree, if 2.1 Data structure for two-sided queries there exists a sparse query. If it does, we have to traverse a root-to-leaf path (specified by the information in the internal The structure consists of two parts: a sequence L of length nodes), to find the point, we use to create the tree Y. The O(N ), called the layout, which stores the points of S, with whole process was proved[4] to take O(N log N ) memory the possibility of duplicate entries; and a balanced binary B transfers. 
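Returning briefly to the adaptive dynamic-programming paper above: the O(Bn lg(n/B)) bound derived there rests on splitting the sequence G into B blocks inside which the convex or concave algorithm applies. The sketch below shows one natural reading of that decomposition, namely maximal pieces whose consecutive differences are monotone (non-decreasing for a convex piece, non-increasing for a concave one); the paper phrases the blocks via sorted residues, so the exact boundary conventions may differ, and the function name is ours.

def convex_concave_blocks(g):
    """Partition the sequence G into maximal convex/concave pieces.

    Returns (start, end, convex) triples with end exclusive, over element
    indices; adjacent pieces share their boundary element. The number of
    pieces plays the role of the sortedness measure B: the convex algorithm
    is run on convex pieces and the concave algorithm on concave ones.
    """
    n = len(g)
    if n <= 2:
        return [(0, n, True)]
    d = [g[i + 1] - g[i] for i in range(n - 1)]
    blocks, start, direction = [], 0, 0   # direction: +1 convex, -1 concave, 0 unknown
    for i in range(1, len(d)):
        step = (d[i] > d[i - 1]) - (d[i] < d[i - 1])
        if direction == 0:
            direction = step
        elif step != 0 and step != direction:
            # The second differences change sign: close the current piece here.
            blocks.append((start, i + 1, direction >= 0))
            start, direction = i, 0
    blocks.append((start, n, direction >= 0))
    return blocks

# Example: convex_concave_blocks([1, 2, 4, 7, 5, 4]) splits the input into two
# pieces, one covering elements 0..3 and one covering elements 3..5.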
search tree Y on a subset of the y-coordinates of the points in S, stored implicitly in the van-Emde-Boas layout. Each 2.2 Data structure for four-sided queries leaf in the tree stores a pointer to an element of L. To answer a query (−∞, x] × [y, ∞), a search for y in Y is done and We first explain how to construct a data structure for three- then the pointer is followed into L. Then a scan forward sided queries, as it is a step to construct the data structure over L is performed until an x-coordinate greater than x is for four-sided queries. Our three-sided structure consists of found. a balanced binary tree T with the points in S stored at the leaves, sorted by their x-coordinates. T is laid out in The key point of the data structure is the way the points memory using the van-Emde-Boas layout, so that a root-to- are stored in the layout L. Assume that we are prepared leaf path can be traversed cache-obliviously in O(log N ) B to scan through αT points for α > 1, when answering a memory transfers. For each internal node v in T , let Sv be query with output T . We will call it a dense if we scan at the points of S stored in the subtree rooted at v. We store most αT points, and sparse otherwise. Consider the mini- the points in Sv in two secondary structures, Lv and Rv, mum y-coordinate y associated with v. Lv is a structure for answering two-sided 1 such that there exists a sparse query (−∞, x] × [y queries of the form (∞, x] × [y, ∞) (with the x-opening to 1, ∞). Let S0 be a sequence sorted by the x- coordinate, with y coosrdinate less than y the left); Rv is a structure for answering two-sided queries 1. As there are no sparse queries with the y-coordinate less than y of the form [x, ∞) × [y, ∞) (with the x-opening to the right). 1, we can answer those queries efficiently. As we repeat the step with the remaining points, we get a concatenation of sequences Each point is stored in two linear space structures on each S of the O levels of the tree T , the structure uses O(N log N ) 0, S1, ..., Sk of points, with which we can answer all the 2 queries efficiently. The problem with this approach is that space. To answer the query [xl, xr] × [yb, ∞), we take the the space can be more than linear, since the worst case size first node v such that xl is contained in the subtree rooted is Θ(N ). at he left child l, and xr is contained in the subtree rooted at the right child r. Then we query Rl with the query range To reduce the size of the layout we need to store the se- [xl, ∞) × [yb, ∞) and Lr with the query range (inf ty, xr] × quences in such a way, that every sequence S [yb, ∞). i is identical with Si+1 having a suffix valuse as large as possible. We take a point with the minimum y-coordinate, such that there ex-To construct the data structure for four-sided queries, we ists a sparse query (−∞, x need to apply the same method again. We store all the i] × [yi, ∞). Instead of storing all the points with y-coordinate less than y points, this time sorted by the y-coordinate, in a balanced i, we store only the points that has the x-coordinate less than x binary tree, stored using the van-Emde-Boas layout. For i too. With this improvement it can be proved that the size of the layout L each internal node v, let Sv represent the points from S is linear: stored in the subtree rooted in v. We store the points in Sv in two secondary structures Uv and Bv, associated with v. Uv is a structure for answering a three-sided query of the Lemma 1. [4]Layout L uses at most α N = O(N ) space. 
form [xl, xr]×[y, ∞); Bv is a structure for answering a three- α−1 sided query of the form [xl, xr] × [y, −∞). This construction To answer the two-sided queries we create, in addition to adds another factor of O(log2N ) to the space complexity, the leayout L, a binary search tree Y over the y-coordinates to get the final O(N log2 N ) space complexity of the data 2 we used to construct L, and store the whole tree using the structure for four-sided queries. van-Emde-Boas layout. Each leaf stores a pointer to the start of the sequence it produced in L.With both, the tree 3. RESULTS Y and the layout L, we can answer two-sided range queries As we implemented the mentioned data structures, we no- in O(log N T /B). ticed that the construction algorithm rely heavily on root- B matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 18 Table 1: Comparison between the cache-oblivious data structure (C-O) and the quadtree (QT). Construction time Query time Size Number of points C-O QT C-O QT C-O QT 100.000 30 s 0,06 s 5 ms 43 ms ∼1 GB 2MB 200.000 69 s 0,12 s 28 ms 90 ms ∼2,5 GB 4MB 1.000.000 7,5 min 0,7 s 132 ms 453 ms ∼7 GB 24MB 2.000.000 17 min 1.5 s 413 ms 951 ms ∼16 GB 50MB 10.000.000 ∼2 h 22 s 2817 ms 6259 ms ∼80 GB 300MB 100.000.000 ∼8 h 115 s 32523 ms 341282 ms ∼950 GB 3,5GB to-leaf and leaf-to-root traversals. Each computation to space. So there is actually a trade-off between the speed of find the index of the parent/child of a specific node takes the queries and the space. O(log N ) time, so we decided to precompute the indexes 2 for all the topologies of the trees, which drastically reduced 5. REFERENCES the construction time. [1] P. Afshani, C. Hamilton, and N. Zeh. Cache-oblivious range reporting with optimal queries requires We decided to compare the cache-oblivious data structure superlinear space. In SCG ’09: Proceedings of the 25th with a quadtree, probably the most used data structure annual symposium on Computational geometry. ACM for range queries in the RAM model. We compared the Request Permissions, June 2009. query time, construction time and the size of the data struc- [2] P. K. Agarwal, L. Arge, A. Danner, and ture. We tested on problems of different sizes, varying from B. Holland-Minkley. Cache-oblivious data structures for 100.000 to 100.000.000 points. All the coordinates were ran- orthogonal range searching. pages 237–245, 2003. dom 32-bits decimal numbers. All the queries were of type [3] A. Aggarwal and J. Vitter. The input/output (−∞, ∞) × (−∞, ∞), so all the points had to be returned. complexity of sorting and related problems. All the tests were run five times and the average results are Communications of the ACM, 31(9):1116–1127, 1988. shown in Table 1. All the tests were run on a Core 2 Duo @ [4] L. Arge and N. Zeh. Simple and semi-dynamic 2.53 GHz and 4GB of RAM. structures for cache-oblivious planar orthogonal range searching. In SCG ’06: Proceedings of the From the results of the tests, it can be seen that the cache- twenty-second annual symposium on Computational oblivious is much faster in answering the queries. The shaded geometry. ACM Request Permissions, June 2006. cells in Table 1 show, that the data structure for that specific size was already bigger than the available main memory, so [5] M. A. Bender, E. D. Demaine, and M. Farach-Colton. swapping occured. Note, that because of high locality for Cache-oblivious B-trees. SIAM J. 
Comput., the cache-oblivious data structure, swapping didn’t affect it 35(2):341–358, 2005. as much as it affected the quadtree. [6] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-Oblivious Algorithms. In On the other side, the size of the whole cache-oblivious data FOCS ’99: Proceedings of the 40th Annual Symposium structure is much bigger than the quadtree. The main reason on Foundations of Computer Science. IEEE Computer is that even if the key ingredient of the cache-oblivious data Society, Oct. 1999. structure (the two-sided data structure) has linear space [7] H. Prokop. Cache-oblivious algorithms. Master thesis, complexity, we need a lot of them (O(log2 N ) exactly). When Massachusetts Institute of Technology, Cambridge. 2 N is getting bigger, this factor is not negligible. The difference in the construction time is probably of the least importance, as the cache-oblivious data structure is a static data structure, so it can be precomputed. Despite that, it is interesting to notice, that 2/3 of the construction time is sorting the points, as the four-sided data structure needs the points sorted by their y-coordinate, the three-sided needs them sorted by their x-coordinate. So for every three-sided data structure we need to sort all the points. The same happens for every two-sided data structure, as it needs them sorted by their x coordinate. 4. CONCLUSIONS This is the first imeplementation of this data structure to the author’s knowledge. It can be seen from the results, that the cache-oblivious data structure, once constructed, is not affected by the swapping of blocks made by the operating system, as the theory suggested. On the other side, to achieve that, the implemented solution takes a lot more matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 19 matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 20 A new method for transforming algorithm into VHDL by starting from a Haskell functional language description ∗ Gergely Suba Prof. Dr. Péter Arató Budapest University of Technology and Budapest University of Technology and Economics Economics 2 Magyar tudósok körútja 2 Magyar tudósok körútja Budapest, Hungary, 1117 Budapest, Hungary, 1117 sugergo@iit.bme.hu arato@iit.bme.hu ABSTRACT 1. INTRODUCTION In the field of computer engineering, there are a lot of prob-Although the speed of the conventional processors is growing lems that are too time-consuming like biological or physical continually, there are a lot of problems that are too time- calculations and that’s why they have to be implemented consuming like biological or physical computing and that’s using special hardware structures. Usually, these hardware why they have to be implemented in a more efficient hard- structures are described in HDL (Hardware Description Lan- ware structure and sometimes in special hardware. Usually, guage). Developing in HDL languages is not as efficient as these hardware structures are described in HDL (Hardware it would be in case of software languages due to its low level Description Language). These languages are not so well-structures. known by the mathematicians and software engineers, they usually write the algorithms in general-purpose program- The aim of this work is to implement and test a method ming languages. 
There is a great difference between using (a compiler program), where the starting point is a code HDL or software languages, and the chance to transform written in the functional language Haskell and the output these two representations into each other is rather small. To is the same algorithm in VHDL (a kind of HDL) language. overcome this gap, there is plenty of research that are very The main advantage of the novel method presented in this important parts of the so called System Level Synthesis. paper is that it generates a synthesizable VHDL description from Haskell code automatically for FPGA implementation. Hardware synthesis starting with a functional software lan- guage has some advantages compared to a hardware descrip- The method introduced in this paper can solve two sepa- tion language. Besides, modifying the hardware when the rate problems: 1) running algorithms effectively in FPGA, algorithm changes is easier, because a code written in func- 2) development of digital hardware implementing a speci- tional language can be adapted more directly than one writ- fied function. We demonstrate the efficiency of the method ten in a hardware description language. Transforming an through practical examples like a part of the MP3 decoding algorithm to HDL can be easily automated by the novel com- algorithm. piler program introduced in this paper. The program code written in functional language can be run in PC, therefore it can be tested easily. In contrast to this, the hardware de-Categories and Subject Descriptors scription languages require a complex simulation for testing, C.3 [Special Purpose and Application-based Systems]: where the inputs and outputs are handled as digital signals, Real-time and embedded systems; D.2.2 [Software Engi- which is an additional complication. neering]: Design Tools and Techniques Based on the above arguments, the aim of this work is to Keywords implement and test a method (a compiler program), where the starting point is a code written in a functional language HLS, FPGA, Haskell, VHDL, dataflow graph and the output is the same algorithm in HDL language. ∗ Functional languages have benefits in case of digital signal Prof. Dr. Péter Arató is the supervisor. Department of Control Engineering and Information Tech- processing and it fits the capabilities of the hardware world nology, Faculty of Electrical Engineering and Informatics better than the imperative ones. I focus on the Haskell [9] functional language, which is being developed dynamically, and it supports the research work well. The main advantage of the method presented in this paper is that it generates the synthesizable VHDL [6] description from Haskell code automatically for FPGA implementation. Even a pipeline optimization tool can be involved by the pro- cedure such as the high level synthesis tool PIPE, developed at the Department of Control Engineering and Information Technology in BME. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 21 We demonstrate the efficiency of the method through prac-Algorithm tical examples like a PID controller and a part of the MP3 writtenhinhHaskell User decoding algorithm. Compiler OperationSet parameter For the method, we defined the following requirements: Compiler FOpMap Haskell-EOG 1. compile Haskell to VHDL code automatically compiler 2. notify the user about the error details in case the code cannot be compiled HLShoptimization EOG 3. 
make the compiler program modular, so that its parts can be used separately OpVhdlMap EOG-VHDL 4. ensure that pipeline optimization methods can be in- compiler cluded if necessary VhdlModules VHDLhcode 5. the operation set should be dynamically extendable, Externalhsoftware without modifying the compiler program FPGAhsynthesizer Among some functional languages (Lava [3], µF P [12], Ruby [5], SASL [4]) that can use to generate HDL code, the most Hardware similar project is the CλasH (CAES Language for Synchronous Hardware) [2, 7], nevertheless it meets only the first two FPGAhconfiguration requirements from the above list. However, the purpose of CλasH is different from the method introduced in this paper. Figure 1: The structure of the compiler method Although, CλasH is based on Haskell syntax, it is a hardware description language, in contrast to our method where the purpose is a language without hardware concepts. CλasH also has some disadvantages compared with the method in known homogeneous synchronous dataflow (HSDF) [8, 11] this paper. It has no pipeline optimization stage, and it can graph, where the vertices are the operations and the edges not include external optimization methods, as it doesn’t use are dataflow links. In this representation, HLS optimization dataflow graphs as intermediate representation. The opera- stage can be included, such as the HLS system PIPE [1]. tion set is constant, thus new operations can not be added After the optional optimization, the EOG-VHDL compiler dynamically. as backend part will produce the expected VHDL output. From VHDL, an external FPGA development software can The method and the compiler program introduced in this synthesize the FPGA bitstream, which can result in a real paper meets Requirement 1, as it compiles the code auto- FPGA application. matically. In our method the GHC frontend is used to pre- process the source code that includes the parser, lexer and 2.1 Haskell-EOG compiler the generation of the abstract syntax tree. Therefore, re- The frontend of our compiler takes the source code written porting of the compiler errors (Requirement 2) is ensured in Haskell, and produces the EOG via the successive stages by GHC. Our method is based on modular structure: it is introduced in this section. divided into frontend and backend parts, and these parts consist of inner stages as it will be detailed later (Require-Stage 1. Produce the AST Core ment 3). Between the frontend and backend, the interface The first steps are to parse and analyze the code, and pro- is a dataflow graph, which is an intermediate representation duce the abstract syntax tree (AST). These steps are made of the algorithm. Most of the pipeline optimization sys- by the GHC frontend, which produces the GHC Core [13] tems are based on dataflow graphs, therefore these stages tree. This Core tree mostly consist of the following type of can be included in our system (Requirement 4). An exter- nodes: Lit (constant literal), Var (using of variable), App nal database is used as operation set, therefore expanding (lambda application, a.k.a. function calling), Lam (lambda the set of operations dynamically is simple (Requirement 5). abstraction, a.k.a. function body) and Case (branching). In the further steps of the frontend pipeline these nodes have to 2. THE PROPOSED COMPILER be converted to EOG nodes and edges by successive graph ARCHITECTURE transformations. In this section the architecture of our Haskell to VHDL com- piler system will be introduced. Stage 2. 
Eliminate of the branching The second step is to eliminate all of the branching (Case) The structure of the compiler method is shown in Figure nodes from the Core. During the elimination every branch 1. As it can be seen, the structure is divided into frontend is to be converted to a special function calling (App) node, and backend parts. The first is the Haskell-EOG compiler, which performs the branching as a black box. This step is a which produces the elementary operation graph (EOG) [1] as so-called Core2Core transformation, as its input and output intermediate dataflow representation. EOG is a kind of well are Core trees with the same structure. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 22 Stage 3. Produce the operation dependency tree Elementary operation graph The operation dependency tree (ODT, here which is an in- termediate data structure between the Core and the EOG OperationSet representation) consists of the operation nodes and the de- Preprocess Make pendency edges between the operations. Two types of nodes EOG ModuleList OpVhdlMap are defined in this tree: operation nodes and pointer nodes. VhdlModules An operation node has as many children as the number of its operands. A pointer node is always leaf and it can be Operations ModulList considered as a reference to an operation node. If an oper- ation node has a pointer node child, it will mean the same function as its child would be the operation node referenced by the given pointer node. Produce Produce Produce Operations Signals Components TopModule Producing the ODT is the largest part of the Haskell-EOG Template compiler. The main concept is supercompilation [10], where Generate the idea is to travel the Core tree in the same order as the VHDL CPU would process the statements of this functional code. During this graph traversal, the following conversions are performed to produce the ODT (as the output tree) from VHDL top module the Core description: Figure 2: EOG-VHDL compiler • Var is transformed to pointer node in ODT • Lit is transformed to constant operation node in ODT operations will result in component declarations in VHDL • App (if it is elementary) is transformed to elemen- (it is generated by the Produce Components task). Each tary operation in ODT EOG operation will result in component instantiation and • App (if it is complex): the body of the function must each dataflow link will result in signal declarations, which be inlined, and the nodes contained by the inlined tree interconnect the component instantiations. These parts are has to be processed as every other node generated by the Produce Operations and Produce Sig- nals tasks in Figure 2. • Lam can be eliminated without restriction • Case nodes have already been eliminated in Stage 2 Finally, the Generate VHDL task creates the output VHDL (the top module), from the produced VHDL parts discussed Stage 4. Produce the EOG before. The final step in the Haskell-EOG compiler is to transform the ODT to elementary operation graph, which is a sim- The EOG-VHDL compiler produces a single, structural VHDL pler process, than the previous stages were. Every opera- file, which contains only certain types of language structures: tion node in ODT will also be operation in EOG, and every component and signal declarations and component instan- dataflow link will also be link in EOG. The difference is that tiations. 
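To illustrate the shape of the backend output described above, here is a toy Python sketch that emits a structural VHDL top module from a small in-memory EOG. The graph encoding, the port-naming convention, and the single-output assumption are ours, not the paper's; in the real system the component interfaces come from the OperationSet database, and one signal is declared per dataflow link, whereas this sketch reuses one signal per producing operation to handle fan-out. It is only meant to mirror the Produce Signals, Produce Operations and Generate VHDL tasks at a glance.

def eog_to_vhdl(entity, operations, links, width=32):
    """Toy structural-VHDL emitter for an elementary operation graph (EOG).

    operations: list of (instance_name, component_name) pairs.
    links:      list of (src_instance, dst_instance, dst_port_index) triples.
    One signal is declared per producing operation and one component
    instantiation per operation. Component declarations are omitted.
    """
    sig = lambda op: f"s_{op}"
    lines = [f"entity {entity} is end {entity};",
             f"architecture structural of {entity} is"]
    # Signals carrying the dataflow between component instantiations.
    for op, _ in operations:
        lines.append(f"  signal {sig(op)} : std_logic_vector({width - 1} downto 0);")
    lines.append("begin")
    # One component instantiation per EOG operation, wired through the signals.
    for op, component in operations:
        ins = [f"in{port} => {sig(src)}" for src, dst, port in links if dst == op]
        ports = ", ".join(ins + [f"out0 => {sig(op)}"])
        lines.append(f"  {op}_inst : {component} port map ({ports});")
    lines.append("end structural;")
    return "\n".join(lines)

# Example: print(eog_to_vhdl("adder_top",
#                            [("a", "const_op"), ("b", "const_op"), ("sum", "add_op")],
#                            [("a", "sum", 0), ("b", "sum", 1)]))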
The behavior of the components are defined in the pointer nodes must be resolved, and it has to be substi- VHDL files separately component-by-component (these files tuted by the operation node referenced by the pointer node. are stored in the OperationSet). In practice, if the end of a link is a pointer node, this end has to be replaced by the referenced operation node. After 3. MULTI-RATE EXTENSION this transformation, the dataflow structure is obtained, not The EOG, previously used as an interface between the com- necessarily being a tree anymore. piler frontend and backend, is a single-rate dataflow graph, which is a huge restriction. EOG can only represent dataflow, 2.2 EOG-VHDL compiler where each operation is performed once during one restart The architecture of the EOG-VHDL compiler, which is the of the system (in other words: it is performed once for each backend of the whole method and system introduced in this input of the system). Although, certain loops can be im- paper, can be seen in Figure 2. First, the EOG goes through plemented in that case too, it is not an efficient way due to a preprocess task (Preprocess EOG in Figure 2), which the increasing number of operation nodes. (for example, if collects all of the information about the EOG nodes, then a loop iterates between 1 and 5, the operations of the loop saves it into the store Operations. Another task (Make body have to be replicated 5 times) For overcome this re- ModuleList in Figure 2) creates the store ModuleList, striction, a modified EOG is introduced, where groups of which will contain all of the necessary information about operations can be performed in different number of times the modules used in the input EOG. These modules have during one restart period. to exist in the external OperationSet, because the Make ModuleList task loads the information from that for each The extended version of EOG introduced in this section is operation. called multi-rate EOG (MR-EOG). In MR-EOG, groups of operations (blocks) can be defined, where each block con- From the two stores created before, the produce tasks will tains one or more operations and optionally other block(s). generate the parts of the VHDL output. All of the used In this way the blocks are built up in a hierarchy, and the matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 23 top of the hierarchy is a block denoted by TOP in further. Program Num. of Num. of Num. of operations links blocks i 1 PID controller 29 36 0 2 MP3 SFB 93 133 6 a g h j 3 MP3 ssum 19 25 1 k 4 MP3 uconv 32 45 0 b e f 5 MP3 usum 11 16 1 c d Table 1: Summary of the compiler tests Figure 3: MR-EOG Haskell code automatically, 2) it reports in case of any com- piler errors, 3) the structure of the compiler is modular (it An example can be seen in Figure 3. Here the block TOP has a frontend and a backend part), thus the parts could contains the operations a, i, j and k, and two inner blocks. be used separately in other projects. 4) external pipeline One of that contains the operations g and h, and the other optimization stage can be included, since it has a dataflow contains b, e and f, and its inner block contains the opera- graph as intermediate representation. 5) the operation set tions c and d. is able to be extended, because they are from an external database separated from the compiler program itself. 
For the new multi-rate representation, there is an important purpose: ensure that the general algorithms performed in The method, including its extended variant for multi-rate dataflow graphs (such as pipeline optimization algorithms) tasks, was tested with practical applications, such as a PID can also be used in MR-EOG. The solution is running the controller and an MP3 synthesis filter bank algorithm. algorithms recursively, starting with the innermost block. Since the innermost block does not contain inner blocks, the 6. REFERENCES methods developed for a single-rate dataflow graph can be [1] Peter Arato, Visegrady Tamas, and Istvan Jankovits. performed without restriction. If the algorithm finish with High Level Synthesis of Pipelined Datapaths. John a given block, the block has to be substituted by a virtual Wiley & Sons, Inc., New York, NY, USA, 2001. operation. The execution time of this operation will be the [2] C. P. R. Baaij. Cλash: From haskell to hardware. same as the execution time of the whole block was. Master’s thesis, Univ. of Twente, December 2009. [3] Per Bjesse, Koen Claessen, Mary Sheeran, and When a block contains only operations (including virtual Satnam Singh. Lava: hardware design in haskell. operations), the same algorithm can be run on it. In the SIGPLAN Not., 34:174–184, September 1998. end, even the block TOP is performed, and the algorithm is [4] Simon Frankau. Hardware synthesis from a finished for the whole MR-EOG hierarchy. stream-processing functional language, 2004. [5] Shaori Guo and Wayne Luk. Compiling ruby into 4. SOME CHARACTERISTIC TASKS FOR fpgas. In Proceedings of the 5th International TESTING Workshop on Field-Programmable Logic and We implemented the compiler program based on the method Applications, FPL ’95, pages 188–197, London, UK, introduced above in Haskell language. For testing the func- 1995. Springer-Verlag. tionality, two characteristic tasks (written also in Haskell) [6] ISO/IEC. Ieee 1076-2008: Ieee standard vhdl language were chosen. The first one is a PID controller, which is a reference manual. Technical report, Institute of single-rate algorithm, therefore the method written in Sec- Electrical and Electronics Engineers, Computer tion 2 can even compile this code to generate the desired Society, 26. January 2009. VHDL. The second one is the synthesis filter bank (SFB) [7] M. Kooijman. Haskell as a higher order structural part of the MP3 decoder algorithm. Since it contains sev- hardware description language, December 2009. eral loops, it can only be represented by multi-rate types of [8] E.A. Lee and D.G. Messerschmitt. Synchronous data dataflow graphs efficiently. The multi-rate extension of the flow. Proceedings of the IEEE, 75(9):1235 – 1245, sept. method detailed in Section 3 was used to compile this task. 1987. [9] Simon Marlow. Haskell 2010 language report, 2010. A brief summary of the functional tests are shown in Table [10] Neil Mitchell. Rethinking supercompilation. SIGPLAN 1. Row 1 shows the characteristic numbers of the PID con- Not., 45:309–320, September 2010. troller, while Row 2 shows the same for the MP3 SFB. The [11] K.K. Parhi. Algorithm transformation techniques for subtasks of the MP3 SFB can be seen in Row 3, 4 and 5 concurrent processors. Proceedings of the IEEE, separately. 77(12):1879 –1895, dec 1989. [12] Mary Sheeran. ufp, an algebraic vlsi design language - 5. CONCLUSIONS phd thesis, 1983. A compiler program is developed on base of the novel method [13] Andrew Tolmach. 
An external representation for the introduced in this paper. The method meets the require- ghc core language, 2009. ments that was set before: 1) it generate the VHDL from matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 24 A system for parallel execution of data-flow graphs Andrej Bukošek FRI, University of Ljubljana Večna pot 113, 1000 Ljubljana, Slovenia andrej.bukosek@gmail.com ABSTRACT 1. INTRODUCTION This paper describes the system we developed for performing Computer generation and manipulation of images is becom- arbitrary operations on data in parallel using a data-flow ing more and more prevalent in the film industry [2]. From graph [3]. compositing of complex rendered scenes with live actors to simple colour grading, such tasks are accomplished with spe- Each operation is implemented in a dynamically loadable cialised software that allows its users to create a data-flow module or using a domain-specific language, which was de-graph of operations, which are applied to the footage. signed specifically for this purpose. We also implemented a compiler for this language. Our domain-specific language Our goal was to create a generalised and more flexible soft- is functional and strongly typed. We designed its type sys- ware package for performing arbitrary operations on any tem to be modular. Every data type is implemented in an data using a data-flow approach. external dynamically loadable module, which the compiler loads during its initialisation. Each module contains func- The core functionality of the system is the parallel execution tions for generating an intermediate representation for the of data-flow graphs. Each node in a graph represents an LLVM system, which optimises it and translates it into ma- operation on data, which passes through its edges. chine code. To ease creation of new graph operations, we developed a As an example usage of our system, we developed operations domain-specific language. Operations can also be imple- for image manipulation, compositing, and rendering of 3D mented in any language that supports compiling code into scenes. Such a set of operations is commonly used in the shared objects with a standard C ABI (these plugins are film industry for the creation of special effects. then loaded at runtime). Our domain-specific language is functional and strongly typed. An interesting feature is its We also implemented a renderer based on the path tracing type system. Each type is implemented in an external shared algorithm, which creates an image from the description of object (plugin), which enables users to add custom types and a 3D scene. This method is based on a physically-correct parsers for constant values. simulation of light bouncing around the scene. As a practical usage of this system, we also implemented a Categories and Subject Descriptors variety of operations and types. A particularly interesting operation is a 3D scene renderer, which employs the Monte D.1.3 [Programming Techniques]: Parallel programming; Carlo path tracing algorithm to generate physically-correct D.3.2 [Programming Languages]: Functional languages; images of 3D scenes [4, 5]. D.3.2 [Programming Languages]: Specialized applica- tion languages; D.3.4 [Programming Languages]: Com- pilers; E.1 [Data Structures]: Graphs; I.3.4 [Computer 2. 
SYSTEM ARCHITECTURE Graphics]: Graphics packages; I.3.7 [Computer Graph- ics]: Raytracing, Animation; I.4.0 [Image Processing and Figure 1 shows an overview of the system components. Computer Vision]: Image processing software General Terms Task scheduler I/O modules Design, Languages, Algorithms Script Keywords Gr Oper aph executor ation modules data-flow graphs, parallelisation, domain-specific language, compositing, rendering, computer graphics User interface Compiler Type modules Supervisors Dr. Andrej Brodnik, Dr. Borut Robič Figure 1: System architecture. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 25 In the most common use case, the system accepts a script Each type module can also define a parser hook for con-written in our domain-specific language. This script de- stants. While looking for constants, the lexer gives full con-scribes the data-flow graph (nodes and connections) using trol to such hooks. This means that the syntax of constant standard function calls (this could be achieved in any other values can be completely arbitrary. One could even include language if the system was converted into a library). an interpreter for another programming language and use that to generate a constant by running a program. First, the system loads and initialises I/O and operation modules. This is followed by initialisation of the compiler, 3.2 Grammar which also loads the type modules. Then the script is com- The grammar for our language in BNF [1]: piled into machine code in memory and executed. The script forms the data-flow graph to be executed in parallel. The hidentifierexpr i ::= hidentifier i | hidentifier i ‘(’ hexpressioni* ‘)’ graph is executed after script execution finishes. hconstexpr i ::= hconst i hparenexpr i ::= ‘(’ hexpressioni ‘)’ The main advantage of having external modules for I/O, hifexpr i ::= ‘if’ hexpressioni ‘then’ hexpressioni operations, and types is greater flexibility, as the main pro- ‘else’ hexpressioni gram doesn’t need to be rewritten or recompiled to support hforexpr i ::= ‘for’ htypei hidentifier i ‘=’ hexpressioni ‘,’ new file formats, operations, and data types. hexpressioni [‘,’ hexpressioni] ‘do’ hexpressioni hwithexpr i ::= ‘with’ htypei hidentifier i [‘=’ hexpressioni] The subsequent sections of this paper contain more detailed (‘,’ htypei hidentifier i [‘=’ hexpressioni])* ‘do’ hexpressioni explanations of the various subsystems. hprimaryi ::= hidentifierexpr i | hconstexpr i | hparenexpr i 3. OUR DOMAIN-SPECIFIC LANGUAGE | hifexpr i | hforexpr i The main purpose of the language we created is to make | hwithexpr i developing new graph operations easier, especially for people hunaryi ::= hprimaryi | hunopi hunaryi who aren’t professional programmers. Its secondary purpose hbinoprhsi ::= (hopi hunaryi)* is to provide a language in which users can describe the hexpressioni ::= hunaryi hbinoprhsi data-flow graph, however, this functionality could easily be hprototypei ::= htypei hidentifier i ‘(’ [htypei hidentifier i [‘,’ accomplished using an existing language. htypei hidentifier i]*] ‘)’ hfunc definitioni ::= ‘func’ hprototypei hexpressioni Our language is functional, strongly typed and compiled into hextern definitioni ::= ‘extern’ hprototypei machine code for greater performance. 
hprogrami ::= (hfunc definitioni | hextern definitioni)* Optimisations and machine code generation are provided by The symbol hidentifier i represents the name of a variable the LLVM framework (http://www.llvm.org/). The com-or function. It can contain alphanumeric characters and piler we developed compiles source code into LLVM inter- underscores, but cannot start with a digit. mediate representation, on which we run various LLVM op- timisation passes and finally generate native machine code hconsti represents a constant value, which can be parsed by for the architecture of the machine the program is running any of the type modules. When encountering this symbol, on. The code is compiled directly to memory, no temporary the lexer goes through all the defined hooks in type modules files are created during compilation. until it finds one that accepts the constant. htypei represents a type keyword. Each type module defines 3.1 Type system a unique keyword for its type. All types in the language are implemented as external shared objects (plugins). Each plugin module implements functions Comments in the language begin with a # sign and continue for generating code for operators (e.g. adding two real num- until the end of the line. bers, multiplying a matrix with a vector, etc.) and parsing constants (optional). The code generation functions actually The precedence of operators is shown in table 1. generate LLVM intermediate representation instructions. Operators Precedence Apart from operators, each module can also generate code ; 1 (binds weakest) for various library functions for that type (e.g. functions = 2 for dot and cross product in the case of a 3D vector type). or 6 One of the library functions for composite types (e.g. vectors xor 7 and matrices) is also the constructor for that type. The and 8 constructor function allocates memory for the structure and ==, <> 9 sets its members to the values passed to the constructor. <, >, <=, >= 10 +, - 20 One of the advantages of generating code for operators and *, / 40 (binds strongest) functions (as opposed to simply implementing them and gen- Table 1: Precedence of binary operators. erating function calls instead) is that this enables further optimisations. Currently, only two unary operators are implemented — unary minus (-) and logical negation (not). matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 26 3.3 Code examples into the global data-flow graph. Nodes that output con- All constructs in our language are functional expressions and stants have an additional argument in their constructor. In-can be combined in the same way as arithmetic expressions. dividual nodes can be connected using the function The language is not whitespace sensitive. connect(node1, "outputName", node2, "inputName"), which connects output port outputName on node1 to input Let’s look at how we might calculate the n-th Fibonacci port inputName on node2. 
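As a rough illustration of how constant parsing can be delegated to type modules, the Python sketch below keeps a registry of per-type parser hooks and lets a lexer try them in turn until one accepts the text. The hook signature, the int/real examples and the return convention are invented for this sketch; the actual modules are shared objects loaded at compiler initialisation, and their code-generation functions emit LLVM intermediate representation rather than Python values.

```python
from typing import Callable, List, Optional, Tuple

# A parser hook receives the remaining source text and either returns
# (parsed value, number of characters consumed) or None if it rejects it.
ConstHook = Callable[[str], Optional[Tuple[object, int]]]

type_hooks: List[Tuple[str, ConstHook]] = []    # (type keyword, hook)

def register_type(keyword: str, hook: ConstHook) -> None:
    """Called once per type module when the compiler initialises."""
    type_hooks.append((keyword, hook))

def parse_real(src: str):
    n = 0
    while n < len(src) and (src[n].isdigit() or src[n] == "."):
        n += 1
    return (float(src[:n]), n) if "." in src[:n] else None

def parse_int(src: str):
    n = 0
    while n < len(src) and src[n].isdigit():
        n += 1
    return (int(src[:n]), n) if n else None

register_type("real", parse_real)   # tried first, accepts e.g. "0.05"
register_type("int", parse_int)     # accepts e.g. "42"

def lex_constant(src: str):
    """The lexer hands full control to each hook until one accepts the text."""
    for keyword, hook in type_hooks:
        result = hook(src)
        if result is not None:
            value, consumed = result
            return keyword, value, src[consumed:]
    raise SyntaxError("no type module accepted the constant")

print(lex_constant("0.05, r2 = Real(0.10)"))   # ('real', 0.05, ', r2 = Real(0.10)')
```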
number in our language: To get a better idea of how this works, let’s take a look at a func int fib(int n) if n < 3 then 1 else fib(n-1) + fib(n-2) real-world example: func int main() An iterative version using real numbers would look like this: with node s1, node s2, node s3, node s4, node s5, node c1, node r1, node r2, func real fibr(real n) node li1, node li2, node ck1, with real a = 1.0, real b = 1.0, real c do node cb1, node si1 (for real i = 3.0, i < n do do ( c = a + b; # Create constant generators a = b; s1 = String("bg.png"); b = c); s2 = String("fg.png"); b # this is the value that the function returns c1 = Color(color(0.0, 1.0, 0.0, 1.0)); r1 = Real(0.05); r2 = Real(0.10); Variables are declared with the keyword with and are valid s3 = String("over"); in the expression following the keyword do. The for loop is s4 = String("f"); enclosed in parentheses due to the priority of the ; operator. s5 = String("out.png"); This operator behaves similarly to , (comma) in C. # Create operations li1 = LoadImage(); 4. DATA-FLOW GRAPHS li2 = LoadImage(); ck1 = ChromaKey(); Data-flow graphs consist of nodes, which represent opera- cb1 = Combine(); si1 = SaveImage(); tions, and edges, which facilitate the flow of data between nodes. In our implementation, the graphs are directed and # Connect ports acyclic, as this is powerful enough to represent all current connect(s1, "const", li1, "fileName"); connect(s2, "const", li2, "fileName"); use cases and simplifies parallelisation. connect(c1, "const", ck1, "keyColor"); connect(r1, "const", ck1, "tolNear"); connect(r2, "const", ck1, "tolFar"); connect(li2, "imageOut", ck1, "imageIn"); connect(s3, "const", cb1, "mode"); connect(s4, "const", cb1, "clipTo"); connect(ck1, "imageOut", cb1, "fgImage"); in2 connect(li1, "imageOut", cb1, "bgImage"); in1 in3 connect(cb1, "imageOut", si1, "imageIn"); connect(s5, "const", si1, "fileName"); Operation ); 0 out1 out2 This script generates the graph shown in Figure 3. String: String: Color: Figure 2: Node structure. Real: Real: bg.png fg.png rgb(0,1,0) 0.05 0.10 const const const const const Each node can have input and output ports (see Figure 2). An input port is merely a pointer to an output port of an- fileName fileName other node. All input ports must be connected to something. LoadImage LoadImage Output ports also contain the data or results of the opera- imageOut imageOut tion of its node. Output ports can remain unconnected. imageIn keyColor tolNear tolFar ChromaK String: String: ey over f Nodes without any input ports are usually generators of con- imageOut const const stants, while nodes without output ports usually save or dis- play the results. fgImage mode clipTo String: Combine bgImage out.png Each port has a data type associated with it. If the types imageOut of an input and an output port don’t match, no connection const between them can be made. imageIn fileName SaveImage 5. GRAPH CREATION Figure 3: Graph example (image compositing). Graphs can be created using our domain-specific language. Every operation gets its own constructor, which returns a The generated graph loads two images (bg.png and fg.png), node handle and automatically adds the newly created node performs chroma keying on the foreground image, overlays the result over the background image, and saves the final composite into out.png. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 27 6. GRAPH EXECUTION Figure 5 shows an example image generated with our system. 
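The node-and-port model used by the graph-construction script (constructors return node handles, and connect joins a named output port to a named input port of matching type) can be summarised with a small sketch. This is a schematic Python stand-in, not the system's API; the Node class and the type keywords "string" and "image" are assumptions for the example, while the port names mirror the compositing script above.

```python
class Node:
    """A data-flow node with typed input and output ports.

    An input port is just a reference to another node's output port; output
    ports will hold the data produced by the node's operation.
    """
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.in_types = inputs                     # {port name: type keyword}
        self.out_types = outputs
        self.in_links = {}                         # port -> (source node, output port)
        self.out_data = {port: None for port in outputs}

def connect(src: Node, out_port: str, dst: Node, in_port: str) -> None:
    """Connect an output port to an input port, refusing mismatched types."""
    if src.out_types[out_port] != dst.in_types[in_port]:
        raise TypeError(f"type mismatch: {src.name}.{out_port} -> {dst.name}.{in_port}")
    dst.in_links[in_port] = (src, out_port)

# A fragment of the compositing example: a constant string feeding LoadImage.
s1  = Node("String:bg.png", inputs={}, outputs={"const": "string"})
li1 = Node("LoadImage", inputs={"fileName": "string"}, outputs={"imageOut": "image"})
connect(s1, "const", li1, "fileName")
print(li1.in_links["fileName"][1])                 # the 'const' port of s1
```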
During execution, each operation gets a parameter, which Rendering and compositing of elements in the picture was tells it for which moment in time (t ∈ made entirely using our system. N0) it should per- form that operation. This is useful in graphs that generate animations, in which case t represents the frame of the ani- mation. The currently implemented ways of executing a data-flow graph are: • single execution — executes the graph only once for a given t • multiple execution — executes the graph for many given t’s • continuous execution — executes the graph until interrupted by the user (t starts at 0 and rises mono- tonically). To execute the graph, we convert operations within it into tasks for our batch job scheduler and set their priorities ac-Figure 5: George, I think we’re lost. cording to the dependencies from the graph. 8. CONCLUSION The main goal was to create a generalised and flexible soft- 4 ware package for performing arbitrary operations on any data using a data-flow approach. To this end we developed a system for parallel execution of data-flow graphs with its 3 own domain-specific language, created to facilitate the im- plementation of graph operations. As an example of the sys- 2 tem’s flexibility, we implemented various compositing, ren- dering, and general image processing operations [3]. 1 Although the system is already in a usable state, many fu- 0 ture improvements could be made. Implementing a graph- ical user interface for creating graphs would be most ben- Figure 4: Decomposing the graph into levels. eficial, as it would make the system more accessible and easier to use. The scheduler could be improved to handle distributed scheduling. As shown in Figure 4, we can decompose the graph into levels. Operations which share the same level are mutually The system gets more useful as more operations are added. independent and can be executed in parallel. Possible areas for future development of operations include: digital signal processing, computer vision, data mining, and The parallel schedule is generated using a greedy approach, hardware control. Utilising operations from these areas, the starting at the nodes with no outputs and working upward system could collect data from sensors, analyse it, and visu- to satisfy dependencies. alise the results. It could also be used to control a robot or industrial machinery. The batch job scheduler is currently custom-made. It spawns as many threads as there are processor cores in the system. 9. REFERENCES These threads then obtain work from a global priority queue, [1] J. W. Backus, F. L. Bauer, J. Green, C. Katz, which contains the tasks. Threads with higher priority have J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, to finish before those with a lower one can be run. B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. Revised report on the algorithm language ALGOL 60. Communications of the ACM, 6(1):1–17, 7. OPERATIONS Jan. 1963. Implemented operations, sorted by area of usage: [2] R. Brinkmann. The art and science of digital compositing. Morgan Kaufmann Publishers Inc., 1999. • compositing [2] — CorrectGamma, Combine, Pre- [3] A. Bukošek. A system for parallel execution of multiply, Unpremultiply, ToneMap, ChromaKey, Dif- data-flow graphs. FRI, University of Ljubljana, 2013. fKey, Resize, Crop, AffineTransform, Convolve2D • [4] M. Pharr and G. Humphreys. 
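The level decomposition of Figure 4 and the level-by-level parallel execution can be sketched as follows. This simplified Python version assigns each node the length of its longest path to a sink, runs the levels from upstream to downstream, and uses a generic thread pool in place of the custom priority-queue scheduler described in the text; the toy graph at the end is an assumption for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
import os

def assign_levels(successors):
    """Level of a node = longest path to a node with no outputs (a sink).
    Nodes on the same level cannot depend on each other."""
    memo = {}
    def level(node):
        if node not in memo:
            nxt = successors.get(node, [])
            memo[node] = 0 if not nxt else 1 + max(level(s) for s in nxt)
        return memo[node]
    for node in successors:
        level(node)
    return memo

def run_graph(successors, work):
    """Execute the graph level by level; a level's tasks run in parallel."""
    levels = assign_levels(successors)
    by_level = defaultdict(list)
    for node, lvl in levels.items():
        by_level[lvl].append(node)
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        for lvl in sorted(by_level, reverse=True):      # upstream levels first
            list(pool.map(work, by_level[lvl]))

# Toy graph: two loaders feed a combine node, which feeds a save node.
edges = {"load_a": ["combine"], "load_b": ["combine"],
         "combine": ["save"], "save": []}
run_graph(edges, lambda node: print("running", node))
```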
Physically Based rendering [4, 5] — Sphere, LoadTriangleMesh, Cam-Rendering, Second Edition: From Theory To era, LoadSpectrum, BRDFMaterial, LambertianBRDF, Implementation. Morgan Kaufmann Publishers Inc., AshikhminBRDF, ApplyEmission, ApplyNormalMap, 2010. SceneGraphNode, AddChild, PathTrace • general — LoadImage, SaveImage, LoadFrame, Save- [5] P. Shirley and R. K. Morley. Realistic Ray Tracing, 2nd Frame edition. A. K. Peters, Ltd., 2003. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 28 An Algorithmic Framework for Real-Time Rescheduling in Public Bus Transportation ∗ Balázs Dávid János Balogh University of Szeged University of Szeged davidb@jgypk.u-szeged.hu balogh@jgypk.u-szeged.hu ABSTRACT complicated disruptions, but this solution is usually really In this paper, we describe an algorithmic framework for the expensive. Operators might not be able to solve compli- vehicle rescheduling problem. This framework is based on cated disruptions in short time, but a decision support sys- problems arising in the operative planning of real-world pub- tem could provide them with helpful suggestions in such a lic transportation companies. case. Categories and Subject Descriptors The goal of our paper is to introduce a solution framework H.4 [[]: Information systems applications]; H.4.2 [[]: Types for the rescheduling problem in public transportation, which of Systems]: Decision support (e.g., MIS), Logistics could be used in aiding company operators responsible for managing disruptions. An important requirement for this General Terms framework is to guide the solution process regardless of the applied solution method. As the problem has to be solved Vehicle scheduling, Disruption management, Public trans- in almost real time, one of the main requirements for such port a system is to provide suggestions quickly. Optimization systems are already present for long-term planning [2, 11, 1. INTRODUCTION 12], but these are all built around given solution methods. Based on the available transportation data, the vehicle- and However, the only paper to our knowledge that deals with a driver schedules of a transportation company are created in system for bus rescheduling is the one by Li et al. [8]. advance. A unique schedule is created for each day (or day- type) of the planning period in such a process. A daily sched-Although there are published mathematical models for the ule of a company is made up of vehicle and driver duties. A problem [6, 10], these cannot be solved in short enough time. duty is considered as a series of tasks, which the correspond-This leads to the design of efficient heuristic methods [6, 9], ing vehicle and driver has to complete in the given order. which are able to provide quick solutions. The system we The most important tasks of these duties are the timetabled present in this paper is designed to work with any kind of trips, which come from the timetable of the company and solution method, and gives results for the problem based on have to be executed. needs of the operators, which they can control through dif- ferent parameters. The heuristic methods we use to demon- However, there are several difficulties that can arise while strate the system were published in [6]. executing such a schedule in real-life: a problem with the vehicle itself, sickness of the driver, or other such things can The outline of our paper is the following: first, we give a make the pre-planned schedule infeasible. 
These unforeseen quick overview of the rescheduling problem and disruption difficulties are called disruptions. If a disruption arises, a management. We present the basic regulations, and give new feasible schedule has to be created, usually by resche- a list of these that can be regarded in a flexible manner duling the old one. through parameters and penalizing. Based on these design thoughts, we introduce the suggested framework itself. Fi- Disruptions have to be managed almost immediately, which nally, we give some ideas of solution methods that we pro- is usually the task of a company operator. Companies usu- pose for the framework. ally have a backup vehicle with which they address more ∗supervisor 2. TRANSPORTATION SCHEDULING AND DISRUPTION MANAGEMENT The daily schedule of a transportation company consists of several duties. Every duty is executed by a driver, and also has a corresponding vehicle. Each duty is a series of tasks that have to be carried out in the given order. The most common tasks are the ones corresponding to the trips of the timetable, and the so-called deadhead trips, which are re- sponsible for moving the empty vehicle from one location to another. There can also be several different vehicle specific matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 29 tasks (eg. parking, refueling) and driver specific tasks (eg. give a list of the most important rules in both groups. Note, breaks, administration). that these are only the most common regulations. Other ones might arise depending on a specific company or coun- 2.1 Disruption management try. When a disruption happens, usually one of the vehicles in service becomes unavailable for a period of time. This leads • Vehicle regulations to the pre-planned schedule becoming infeasible, as one or more of the tasks are not executed by a vehicle anymore. – Vehicle depot and trip compatibility : As we men- These trips are addressed from now on as disrupted trips. tioned earlier, vehicles can be classified into de- A vehicle schedule becomes infeasible if it contains such pots. Trips can only be serviced by a fixed set of trips. Companies usually have a backup vehicle and driver depots, and this should be respected when creat- ready, which are dedicated to such situation, but this solu- ing a new schedule to manage the disruption. tion might not be the best one. – Vehicle type and trip characteristics: Similarly to depot compatibility, some trips can only be exe- Moreover, most real-life cases have a so-called multiple depot cuted by vehicles of a certain type. For example, problem. There are several different depot locations and/or trips that are carried out between different cities, vehicle types, and every trip can only be serviced by vehicles or have a long distance, must have a bus with belonging to a set of pre-determined depots. Because of special equipment (eg. air conditioning). this fact, our problem is NP-hard in the case of 2 or more depots. The vehicle rescheduling problem can be reduced • Driver regulations to the vehicle scheduling problem, which was proven to be – Maximum driving time: Each driver has a maxi- NP-hard by Bertossi et al. [1]. However, the problem itself mum daily driving time, which they can not ex- should be solved as quickly as possible, and order should be ceed. restored with a new feasible solution. – Driver breaks: After given time periods, drivers have to be assigned breaks. 
Moreover, these breaks 2.2 Related work have to be assigned at specific geographical loca- To our knowledge, the literature regarding bus disruption tions. management is scarce. Disruption management connected – Maximum working time: Similarly to driving time, to different transportation fields has been researched for a the daily working time of drivers is also maxi- longer period of time. mized. This does not equal driving time, as driver duties have other events as well, which do not re- The earliest papers regarding disruption management were quire a vehicle (eg. administration). published about the airline industry. An overview paper by Clausen et al. can be read in [4, 3]. A bit younger field is disruption management is railway transportation. Some Besides these regulations, some structural modifications should results regarding this area can be read in [7]. also be considered in the schedule. For an illustrative exam- ple, refer to Figure 1. Both airline and railway disruption management differ from bus disruption management in their underlying structure. While buses are quite easy to move around with the help of deadhead trips between different geographical locations, the deadheading of airplanes is mainly prohibited by the high arising cost, and railway deadheading is subject to the underlying limited rail capacities. As we have mentioned, the problem of bus rescheduling as defined above was only considered by Li et al. In their pa- pers, they studied the single depot BRP. They give a quasi- assignment model and an auction algorithm for the prob- lem in [9], and a network flow model is described in [10] which is solved with the help of Lagrangian relaxation. In [8], they also describe a possible decision-support system for this problem, which is illustrated with the help of a small real-world instance. 2.3 Real-life criteria Figure 1: An illustrative example In a real-life application, there are several rules and con- straints that make the problem more complicated. There On the above figure, a disrupted daily schedule can be seen, are regulations that cannot be violated, while the violation of with a disruption represented by a vertical line. There is others should be penalized. As we mentioned earlier, we can one disrupted trip J0. Depending on some circumstances, define daily schedules of both vehicles and drivers, thus we we can give several solutions for the problem. Here are a classify our rules into two groups. In this subsection, we will few examples: matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 30 • If trip J0 is compatible with duty F1, and there is • An integer parameter that gives the maximum number enough time to insert it to the available gap (together of feasible schedules which can be modified. with any necessary deadhead trips), then we can solve • the problem. A parameter that limits the maximum length of the newly introduced deadhead trips. • If we want to insert J0 to duty F2, we have to remove • trip J A parameter, which gives the latest point in time, by 2. There are several different places we can insert this newly removed trip. It might fit into the gaps in the end of which the algorithms should not modify any duties F more feasible schedules. 1 or F5 (if compatible, and there is enough time for deadheads). We might also be able to insert • A parameter on the number of suggestions (feasible them to duties F3 or F4. However this would mean we solutions). 
have to remove trip J3,2 or J4 respectively, and finding a new duty for them. • A parameter on the maximum running time. • Following the above logic, there are several different scenarios that utlizie removing trip from a duty, and As it can be seen in the list of proposed parameters, there inserting it to another one. are none that correspond driver rules. Driver regulations are very strict, and most of them are defined by the EU, • We might delay trip J3,2 in duty F4. This can give us and cannot be violated by any means. Depot compatibility a big enough gap to insert our trip J0 (if it is allowed and vehicle type correspondence might also be strict, but we by compatibility and deadheads). decided to let the operator of the system decide about their violation. As it can be seen from the above examples, there are some other constraints as well that are connected to the original 3. THE SOLUTION FRAMEWORK duties or tasks (eg. modifying starting time, or removing a In the previous sections, we presented our basic ideas behind trip from its original duty). the methodology for the problem. We described a framework that does not depend on the solution algorithm it executes, A solution method for this problem need to know the ex- and can be controlled through a list of different parameters. tent to which it can violate the constraints we introduced In Figure 2, we give a layout of the different parts of the above, and it also needs to know the hardness of the con- system. straint. This information can be given easily with the use of parameters. 2.4 Parameters In this subsection, we introduce parameters for the differ- ent rules and constraints given in the previous subsections. These parameters need to be considered by any algorithm solving the bus rescheduling problem: Figure 2: The structure of the framework • A binary parameter that allows the violation of depot- compatibilities. A penalty parameter for each viola- The input for the system consists of two parts. One part tion of depot-compatibility is also needed. is the list of parameteres, that we described in the pre- vious section. The other part is the problem data itself, • A binary parameter that allows the violation of ve- containing the problem specific data tables. The type and hicle type correspondence. A penalty parameter also structure of these tables of course can vary between differ- has to be introduced for each violation of vehicle type ent implementations of such a system, but it is important correspondence. that it contains the disrupted schedule and the disrupted • trips. For the remainder of the paper, we will refer to A binary parameter that allows the introduction of the pair of {disruptedschedule, setof disruptedtrips} as a lateness. A penalty parameter also has to be intro- conf iguration. duced per 1 unit (minute) of lateness. • A binary parameter that allows the movement of trips The input determines the starting configuration of our prob- between feasible schedules. A penalty parameter is lem, which is a schedule that contains feasible duties, and a also needed that gives the cost of each such move. set which contains the disrupted trips, that are not executed currently by any vehicle. A configuration is supposed to be • An integer parameter that limits the maximum amount feasible, if its set of trips is empty, and all the duties in its of lateness which can introduced by the algorithms to schedule are feasible and do not violate any regulations or the schedule. parameters. 
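One possible way to picture the parameter list and the {disrupted schedule, set of disrupted trips} configuration is a pair of plain records, sketched below in Python. All field names, types and default values are assumptions for illustration, as is the placeholder duty check; the point is only that a solver-independent framework can hand such a record to any rescheduling algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ReschedulingParameters:
    # Binary switches and their penalties (assumed field names).
    allow_depot_violation: bool = False
    depot_violation_penalty: float = 0.0
    allow_type_violation: bool = False
    type_violation_penalty: float = 0.0
    allow_lateness: bool = False
    lateness_penalty_per_minute: float = 0.0
    allow_trip_moves: bool = True
    trip_move_cost: float = 0.0
    # Limits controlled by the operator.
    max_total_lateness: int = 0          # minutes over the whole schedule
    max_lateness_per_event: int = 0
    max_lateness_per_duty: int = 0
    max_modified_duties: int = 1
    max_deadhead_length: int = 0         # length of newly introduced deadheads
    latest_modification_time: int = 0    # no changes after this point in time
    number_of_suggestions: int = 3
    max_running_time_s: float = 60.0

@dataclass
class Duty:
    trips: List[str] = field(default_factory=list)

    def respects(self, params: "ReschedulingParameters") -> bool:
        # Placeholder: a real check would cover depot and vehicle-type
        # compatibility, driving time, breaks, working time and lateness limits.
        return True

@dataclass
class Configuration:
    schedule: List[Duty]
    disrupted_trips: Set[str] = field(default_factory=set)

    def is_feasible(self, params: ReschedulingParameters) -> bool:
        """Feasible when no disrupted trip is left unassigned and every duty
        respects the regulations and the limits given by the parameters."""
        return not self.disrupted_trips and all(
            duty.respects(params) for duty in self.schedule)

params = ReschedulingParameters(allow_lateness=True, lateness_penalty_per_minute=2.0)
print(Configuration(schedule=[Duty(["t1", "t2"])]).is_feasible(params))   # True
```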
• An integer parameter that limits the maximum amount of lateness an algorithm can introduce to a single event. Once all the input is read, it is then transferred to the main module of the system, which we call the Rescheduling Black • An integer parameter that limits the maximum amount Box (RBB). This is the part that carries out the solution of lateness an algorithm can introduce to one duty. process, and consists of two parts: matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 31 • The Solver Library (SL) manages the different solution The second method is a local search heuristic (where a tabu methods that are built into the system. The system list can also be applied effectively). This method creates can have any number of implemented solution meth- a (probably infeasible) pseudo-schedule from the disrupted ods, and the desired method can be invoked by pa- trips, without any regards to the regulations. In each it- rameter setup. If there is a possibility to parallelize eration step of the search, either a move or a swap opera- solution processes, multiple methods can also be exe- tion is executed. A trip is either moved from a schedule to cuted at once. another, or is swapped with events from another schedule. One important rule is that trips cannot be moved onto the • The output of a solution method is sent to the Solu- pseudo-schedule. The local search finds a feasible solution, tion Collector (SC). This sub-module is responsible for if the pseudo-schedule is empty. managing feasible solutions. If more solution methods are running in parallel, all of them send their results Test results of the framework in real life have been promis- to the SC. The SC then filters any duplicate solutions, ing. All instances have a short running time (they all finish and also gives an ordering of the remaining ones based under 1 minute even for bigger instances). It can also be their cost. seen, that these methods will find multiple feasible solution while exploring there solution space. Because of the SC in the system, the solutions can all be saved, and the ones that Once all solution algorithms finish their execution, or the de-best suit the parameters of the operator can be chosen at sired maximum runtime is reached, the SC returns a number the end. of feasible solutions. The desired number can be given by the operator as an input (number of suggestions), and their Besides the real instances, we also tested our system on ran- order is determined mainly by their cost, but they can also domly generated data. In Table 1 and Table 2, we present be filtered considering the different parameters (eg. ask for the results of the above methods. The tables give the in-ones with no lateness, if possible). The operator receives stance name, the number of original trips in the schedule, this data, and can use these to decide on how to solve the the number of the disrupted trips, and the running time arising problem. of the recursive and local search methods on seconds. All instances in Table 1 are generated using the method in [5]. 4. A REAL LIFE APPLICATION In this section, we briefly describe a real-life application of Table 1: Solution on random instances from [6]. the above system. As we mentioned earlier, solution for the Instance Depots Trips Disrupted Rec. Loc. rescheduling problem has to be adequatly quick, as the order trips (s) (s) in transportation has to be restored as soon as possible. 
random1 2 12 1 0.02 0.001 Because of this, the implemented heuristic methods must random2 4 100 1 0.05 0.004 have a running time of at most a couple of seconds to be random3 4 500 1 0.08 0.01 acceptable. random4 4 800 1 0.08 0.05 Also, because of the flexibility of the SC module, it is useful to provide solution methods which find multiple feasible so-We also present two real-life test results from the city of lutions during their runtime. This gives the operator more Szeged, Hungary. The smaller instance is only for a district suggestion options to choose from. A simple approach that of the city on a workday, while the bigger instance is that of we implemented in our system is a naive search, which ba-a Saturday. sically finds all the possible trivial insertions in the starting configuration (if any), and inserts them to the solution col-lector. If there is any trivial insertion, it is highly likely that Table 2: Solution on real-life instances. it will be the cheapest solution with regards to any of the parameters. Instance Depots Trips Disrupted Rec. Loc. trips (s) (s) In one of our previous papers, we formulated a mathematical szeged small 4 206 2 0.399 0.548 model for the bus rescheduling problem [6]. The size of the szeged sat 4 1983 1 0.037 0.059 problem is big, and it cannot be solved efficiently even for smaller random instances. However, we also proposed two solution heuristics in the same paper. These methods have As it can be seen from the above tables, all test results have also been implemented to the system. a good running time for both of our heuristics. The methods also returned several suggestions for the test cases. The first method is a recursive search, which uses the ini- tial configuration as an input, and inserts a disrupted trip 5. CONCLUSIONS AND FUTURE WORK into one of the schedules in each step. If the disrupted trip The long-term plans of public transportation companies are cannot be inserted trivially, then other trips are moved from disrupted on a daily basis. These disruptions have to be the schedule to the disrupted set. To avoid exponential ex- addressed as soon as possible. In this paper, we introduced plosion, we gave a depth limit to the recursive calls, and a decision support framework for the rescheduling problem also have a function that cuts certain branches of the search in public bus transportation. tree. The recursive search finds a feasible solution, if the disrupted set is empty. We analyzed both vehicle and driver regulations for the daily matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 32 schedules, and determined the parameters that have to be Symposium on Operational Research in Slovenia considered during the rescheduling process. The framework SOR’11, pages 341–345, 2011. we described is independent of the solution method, and is also able to run different solution methods in parallel. It can also store and return multiple solutions, which gives the flexibility to the company operator to choose according to his or her needs. 6. ACKNOWLEDGMENTS This work was partially supported by the European Union and co-funded by the European Social Fund through project HPC (grant no.: TÁMOP-4.2.2.C-11/1/KONV-2012-0010). 7. REFERENCES [1] A. A. Bertossi, P. Carraresi, and G. G. Gallo. On some matching problems arising in vehicle scheduling models. Networks, 17(3):271–281, 1987. [2] J. Békési, A. Brodnik, D. Pash, and M. Krész. 
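A rough sketch of the depth-limited recursive insertion search described above is given below, in Python. The helpers can_insert and conflicting_trips are hypothetical placeholders for the real compatibility, deadhead and regulation checks, and the data layout (duties as lists of trip identifiers) is an assumption; every feasible configuration found along the way would be handed to the Solution Collector.

```python
def can_insert(trip, trips):
    """Hypothetical placeholder: the real test covers depot and vehicle-type
    compatibility, timing gaps and any deadhead trips that must be added."""
    return len(trips) < 5                   # assumed capacity, illustration only

def conflicting_trips(trip, trips):
    """Hypothetical placeholder: trips whose removal could make room for `trip`."""
    return trips[:1]

def recursive_search(schedule, disrupted, depth, collected, max_depth=3):
    """Depth-limited recursive insertion of disrupted trips.

    schedule  : dict duty_id -> list of trips (the current duties)
    disrupted : set of trips currently not served by any vehicle
    collected : receives every feasible schedule found (cf. the Solution Collector)
    """
    if not disrupted:                       # the configuration became feasible
        collected.append({d: list(t) for d, t in schedule.items()})
        return
    if depth >= max_depth:                  # cut to avoid exponential explosion
        return
    trip = next(iter(disrupted))
    for duty_id, trips in schedule.items():
        if can_insert(trip, trips):         # trivial insertion
            trips.append(trip)
            recursive_search(schedule, disrupted - {trip}, depth + 1, collected)
            trips.remove(trip)
        else:                               # move a conflicting trip to the disrupted set
            for removed in conflicting_trips(trip, trips):
                trips.remove(removed)
                trips.append(trip)
                recursive_search(schedule, (disrupted - {trip}) | {removed},
                                 depth + 1, collected)
                trips.remove(trip)
                trips.append(removed)

found = []
recursive_search({"F1": ["t1"], "F2": ["t2", "t3"]}, {"J0"}, 0, found)
print(len(found), "feasible suggestions")
```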
An integrated framework for bus logistic management: case studies. Logistik Management, pages 389–411, 2009. [3] J. Clausen, A. Larsen, J. J. Larsen, and N. J. Rezanova. Disruption management in the airline industry-concepts, models and methods. Computers & Operations Research, 37(5):809–821, 2010. [4] J. Clausen, A. Larsen, and J. Larsen. Disruption management in the airline industry - concepts, models and methods. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2005. [5] B. Dávid and M. Krész. Application oriented variable fixing methods for the multiple depot vehicle scheduling problem. Acta Cybernetica, 21(1):53–73, 2013. [6] B. Dávid and M. Krész. A model and fast heuristics for the multiple depot bus rescheduling problem. PATAT 2014: Proceedings of the 10th International Conference of the Practice and Theory of Automated Timetabling, pages 128–141, 2014. [7] J. Jespersen-Groth, D. Potthoff, J. Clausen, D. Huisman, L. G. Kroon, G. Maróti, , and M. N. Nielsen. Disruption management in passenger railway transportation. Technical report, Erasmus University Rotterdam, 2007. [8] J.-Q. Li, D. Borenstein, and P. B. Mirchandani. A decision support system for the single-depot vehicle rescheduling problem. Computers & Operations Research, 34(4):1008–1032, 2007. [9] J.-Q. Li, P. B. Mirchandani, and D. Borenstein. The vehicle rescheduling problem: Model and algorithms. Networks, 50(3):211–229, 2007. [10] J.-Q. Li, P. B. Mirchandani, and D. Borenstein. A lagrangian heuristic for the real-time vehicle rescheduling problem. Transportation Research Part E: Logistics and Transportation Review, 45(3):419–433, 2009. [11] K. Nurmi, J. Kyngäs, and G. Post. Driver rostering for bus transit companies. Engineering Letters, 19(2):125–132, 2011. [12] A. Tóth and M. Krész. A flexible framework for driver scheduling. Proceedings of the 11th International matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 33 matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 34 Traffic Sign Symbol Recognition with the D2 Shape Function Simon Mezgec and Peter Rogelj Faculty of Mathematics, Natural Sciences and Information Technologies University of Primorska Koper, Slovenia simon.mezgec@student.upr.si, peter.rogelj@upr.si ABSTRACT Piccioli presented a robust method for traffic sign detec- This paper describes our application of a novel method in tion and recognition [10]. One year later, de la Escalera the field of traffic sign recognition - the D2 shape function. developed an algorithm that focuses on the recognition of We first give an overview of the advances and research in traffic signs - the algorithm uses neural networks for the this field. We then describe the D2 shape function that was classification of traffic signs [4]. De la Escalera built on originally used to classify 3D models of various objects - that work in his 2003 article where he used a genetic algo- because of its robustness we propose its use in the field of rithm for the detection step, which allowed further invari- traffic sign recognition. We describe how the program that ance [3]. 
In 2000, Pacl´ık proposed his own algorithm for traf-was developed using this method works and present the re- fic sign classification using the Laplace probability method sults on a testing set of self-acquired speed limit traffic signs, and an Expectation-Maximization algorithm to maximize evaluate its performance and compare it to the performance the likelihood function [9]. In 2004, Fang developed a traf- of statistical moments. fic sign detection and recognition system that is based on a computational model of human visual recognition pro- General Terms cessing [5]. Two years later, Gao improved upon the hu- man vision model recognition using separate models for ex- Algorithms tracting color and shape information, respectively [6]. In 2007, Maldonado-Bascón described a traffic sign detection Keywords and recognition system that is based on support vector ma- Symbol recognition, traffic signs, D2 shape function chines which is invariant even to partial occlusions [7]. In the same year, Cyganek also used support vector machines in his 1. INTRODUCTION system for traffic sign detection, whereas the recognition is While driving a vehicle, the driver has to be constantly aware performed using neural networks [2]. Finally, in 2009, Baró of a number of factors, one of which are traffic signs. These presented a system that performs traffic sign detection with signs can have specific properties, such as color and shape, the boosted detectors cascade and traffic sign recognition which allow their recognition with the use of Computer Vi- with a forest of optimal tree structures that are embedded sion. So it is no surprise that this is one of the important in the ECOC (Error-Correcting Output Code) matrix [1], steps towards the automatization of driving vehicles. Once whereas in the next year, Ruta added a tracking procedure the traffic sign is detected, we would usually like to know between the steps of traffic sign detection and recognition - what kind of symbol (if any) the traffic sign contains. Recog-this tracker predicts the position and scale of the sign can- nition is useful for many purposes: the driver could forget didate to reduce computation. That system uses Color Dis- what the last speed limit he drove by was, an inexperienced tance Transform for the classification of traffic signs [11]. driver could see a traffic sign for the first time and he would like to know its meaning etc. It is therefore apparent that In this paper we present an alternative method for traf- traffic sign recognition has a real-world use. fic sign recognition, based on the D2 shape function that promises high robustness required for such applications. The There have been many different methods proposed in the method is presented in the next Section and its practical ap- field of traffic sign detection and recognition in the last two plication in Section 3. decades. Let us mention some implementations that at- tracted the most attention from the community. In 1996, 2. METHOD DESCRIPTION For the recognition of traffic sign symbols we use the D2 shape function - a method first described by Osada in his 2001 article [8] - it is therefore a fairly new Computer Vision method. The function samples random pairs of points and creates a histogram of distances between these point-pairs. Figure 1 shows how the D2 shape function creates the dis- tance histogram between pairs of points on a cup [12]. 
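The core of the D2 shape function is exactly what the preceding paragraph describes: sample many random point pairs on the shape and build a histogram of their mutual distances. A minimal NumPy sketch of this idea for a 2-D point set follows; the sample count, bin count and the toy square shape are assumptions made only for the example.

```python
import numpy as np

def d2_histogram(points: np.ndarray, n_pairs: int = 100_000, bins: int = 20,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    """D2 shape descriptor: normalised histogram of distances between
    randomly sampled point pairs.

    points : (N, 2) array of coordinates lying on the shape.
    """
    i = rng.integers(0, len(points), size=n_pairs)
    j = rng.integers(0, len(points), size=n_pairs)
    dist = np.linalg.norm(points[i] - points[j], axis=1)
    dist = dist / dist.max()                     # scale invariance
    hist, _ = np.histogram(dist, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                     # compare histograms, not counts

# Example: the pixels of a filled square produce a characteristic signature.
ys, xs = np.mgrid[0:40, 0:40]
square = np.column_stack([xs.ravel(), ys.ravel()])
print(d2_histogram(square)[:5])
```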
The idea behind the function is that with a large enough matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 35 Figure 2: On the left we have the original segmented image. We then perform two morphological opera- tions - first erosion (middle image) and then dila- tion (right image). The result is the symbol with the original size but without the noise. 3.1 Preliminaries First we read the image, change its color space from RGB to Figure 1: The distances between randomly gener- HSV and perform the segmentation of the image by using the ated pairs of points are converted to a distance his- mean of all the brightness values - if a pixel has a brightness togram with the appropriate use of normalization. value smaller than the mean brightness of the image, we set The histogram represents the D2 shape function for the corresponding pixel of the binary image to 1, otherwise the cup and therefore defines the object descrip- to 0. Once the image is successfully segmented, we have to tion.1 make sure that aside from the traffic sign symbol we did not detect any anomalies that are not part of the symbol. These errors can occur due to different light reflections, dirt number of sampled point-pairs on the object we get such a or dust on the camera lens as well as on the traffic sign etc. distribution of the distance histogram that is specific for the object, which means that we are able to recognize that ob-We use two morphological operations for noise reduction - ject. In our case we compare the distance histogram with the erosion and dilation. With erosion, we shrink all the areas in histograms created from each of the images in the learning the image and as a result the noise is removed, but we also set. The procedure itself is therefore simple - we randomly get smaller areas of the detected traffic sign symbol. Because generate a large number of point-pairs on the surface of the we only want to delete the noise and preserve the symbol object and compute distances between them. It is worth itself, we then use the morphological operation of dilation noting that aside from the D2 shape function, Osada intro- which expands the symbol to its primary size. Figure 2 duced other randomized shape functions [8]: shows the effect of these operations. • A3: computing the angle between three randomly gen- In this paper we focus on speed limit traffic signs, where erated points on the surface of the 3D model of the the symbol can be divided into numerals that are recog- object, nized separately, which is why the next step is searching • D1: computing the distance between a fixed point and for connected components. This search is important for the a randomly generated point on the surface of the ob- recognition of numerical symbols, because with it, we can ject, divide the number into numerals and since the numerals do not overlap, we can safely assume that one connected com- • D3: computing the square root of the surface of the ponent represents one numeral. The reasoning behind this triangle, defined by three randomly generated points search is that we can achieve greater recognition accuracy on the surface of the object, by recognizing each of the numerals than by recognizing the • D4: computing the cube root of the volume of the whole number at once because we do not have to model ad- tetrahedron, defined by four randomly generated points ditional dependencies between the numerals which can be on the surface of the object. 
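The preprocessing just outlined (threshold at the mean brightness, then erosion followed by dilation, then a search for connected components) can be sketched as below. The program itself is written in Matlab; this is an approximate Python/SciPy equivalent with an assumed 3x3 structuring element and a random stand-in image.

```python
import numpy as np
from scipy import ndimage

def segment_symbol(brightness: np.ndarray, struct_size: int = 3) -> np.ndarray:
    """Binary segmentation of a traffic-sign symbol from a brightness image.

    Pixels darker than the mean brightness are treated as symbol (1), the rest
    as background (0); erosion followed by dilation removes small noise while
    restoring the symbol to roughly its original size.
    """
    binary = brightness < brightness.mean()
    structure = np.ones((struct_size, struct_size), dtype=bool)
    cleaned = ndimage.binary_erosion(binary, structure=structure)
    return ndimage.binary_dilation(cleaned, structure=structure)

def connected_numerals(mask: np.ndarray):
    """Label connected components; each component is taken to be one numeral."""
    labels, count = ndimage.label(mask)
    return [labels == k for k in range(1, count + 1)]

img = np.random.default_rng(0).random((38, 31))      # stand-in brightness values
print(len(connected_numerals(segment_symbol(img))), "candidate regions")
```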
different from traffic sign to traffic sign - the most obvious of these is the distance between the numerals in the num-Out of all these functions, the D2 shape function turned out ber. In the case of non-numerical symbols, the components to be the most robust [12], which is why we decided to use would be recognized separately as well. it for our application in traffic sign symbol recognition. 3.2 Computing the D2 shape function 3. PROGRAM DESCRIPTION At the beginning of the D2 shape function computation we The program for traffic sign symbol recognition was devel- visit each of the connected components (except for the first oped in the Matlab programming language and in this Sec- one which represents the background). The first step is the tion we explain how it works. Here we do not focus on traffic generation of random x and y coordinates between values sign detection and assume that traffic signs are already seg- 1 and the image height for the x coordinate and between mented and symbols cropped out of them. The detection 1 and the image width for the y coordinate - the number and cropping is performed within the program for traffic of generated coordinates is defined by an input parameter. sign detection which was previously developed and is not a Then we go through the array of generated coordinates and part of the traffic sign recognition program. for each pair of coordinates we check if the pixel with that coordinates lies on the numeral that is currently being rec- 1The image is taken from Wohlkinger’s paper [12]. ognized. If it does, we copy this pair into a new coordinate matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 36 Figure 3: Learning set of images with numerals in the standard font of Slovenian traffic signs. Each numeral is saved in its own image. Figure 4: Testing set of speed limit traffic signs that array where we keep only the pixels that lie on the numeral. was used for the definition of the parameters and testing the performance of the traffic sign symbol After that there is the main part of the D2 shape func- recognition program. tion computation. We first check if we managed to find a sufficient number of coordinate pairs from pixels positioned on the numeral - if we did, we continue with the computa- distances between pairs of bins at the same positions. We tion and if we did not, we abort the computation for the use the 1-nearest neighbor method to classify the numerals recognition of the current region and continue with the next in the image because we only have one sample per numeral one. The reason for this check is further insurance from the and one numeral represents one class. presence of noise - if we did not find a sufficient number of coordinate pairs, then we can conclude that the detected 4. RESULTS area is too small and consider it being noise. For the definition of the parameters and testing the perfor- mance of the traffic sign symbol recognition program, we Then we visit the first half of the array with randomly gen- acquired a testing set of 46 speed limit traffic signs. Figure erated x and y coordinates and in each step we compute the 4 shows this testing set of traffic signs. It contains images of Euclidean distance between a pixel in the first half of the traffic signs taken under different conditions - different light-array and a corresponding pixel in the second half. 
When ing, orientation of the sign, size of the sign, quality of the we are done with the computation, we sort the distances by image etc. This difference between the conditions of each size and normalize them so they all have values from 0 to image was achieved on purpose - to prove the robustness of 1 - we do that by dividing every distance with the maxi- the program. mum distance obtained in the analyzed region. The latter is necessary for the comparison of images with different sizes. The procedure of defining the parameters was performed As a result we therefore return a matrix of these normalized with the help of auxiliary functions and the final values of and sorted distances for each of the detected numerals. the parameters are saved in a configuration file. They are as follow: 3.3 Learning and comparing the input image • number of randomly generated pixels for the D2 shape with the learning set function computation: 100000, Apart from the main function of the traffic sign symbol • percentage of randomly generated pixels that have to recognition program, we use two more auxiliary functions lie on the symbol: 0.1 (10%), to properly recognize the traffic sign - one for teaching the • number of learning iterations for the images from the program on the images from the learning set and one for learning set: 5, the recognition of the input image using the images from • number of iterations of comparison between the input the learning set. Figure 3 shows the learning data set. It image and images from the learning set: 10, is important to note that the font for Slovenian traffic signs • number of bins in distance histograms: 20. is standardized which means that one sample per numeral is sufficient. We will describe the learning process and the Once the parameters were defined, we ran the traffic sign comparison in this Section. symbol recognition program on the testing set of traffic signs to test the performance of the program. The final result is For the learning, we go through the 10 images from the learn- 42 of 46 accurately recognized speed limit traffic signs, which ing set and for each of them we run the symbol recognition is approximately a 91.3% accuracy. code as many times as is defined by the input parameter and we then average the results. For the comparison of the 5. COMPARISON WITH STATISTICAL input image with the learning set, we repeat the recogni- tion multiple times so that we increase the probability of MOMENTS an accurate recognition - we then return the result that was When developing the program for traffic sign recognition, recognized in most of the repetitions. It should be noted we first tried using statistical moments as the recognition that we convert distances into histograms, which is needed method. In particular, we used a combination of normalized because a direct comparison of distances would provide in- central moments, which are invariant to the position as well accurate results as the distance values are too specific. We as to the size of the object, and Hu moments, which are fur- compare each bin of the distance histogram of the current ther invariant to rotation. 
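Classification then amounts to comparing the query numeral's distance histogram with the ten learned histograms and taking the nearest one under bin-wise Euclidean distance. A small sketch follows, assuming the histograms are already available as equal-length vectors and using made-up template values.

```python
import numpy as np

def classify_numeral(query_hist: np.ndarray, learned: dict) -> str:
    """1-nearest-neighbour classification of a D2 distance histogram.

    learned maps a class label ('0'..'9') to its averaged training histogram;
    one sample per numeral suffices because the traffic-sign font is fixed.
    """
    best_label, best_dist = None, float("inf")
    for label, hist in learned.items():
        dist = np.linalg.norm(query_hist - hist)   # bin-wise Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy example with 20-bin histograms (values here are made up).
rng = np.random.default_rng(1)
templates = {str(d): rng.random(20) for d in range(10)}
query = templates["5"] + 0.01 * rng.random(20)     # a slightly perturbed '5'
print(classify_numeral(query, templates))          # -> '5'
```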
The framework of the prototype numeral on the input image with the histogram of the cur- is very similar to the program described in Section 3, and rent image from the learning set and compute the Euclidean the learning and testing sets of traffic sign images are the matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 37 same, so the only difference is in the method that was used conclude that the D2 shape function seems well suited for for each prototype. We can therefore compare the results of this field of Computer Vision. the two traffic sign recognition programs. 7. REFERENCES After testing the statistical moments prototype, it quickly [1] Xavier Baró, Sergio Escalera, Jordi Vitrià, Oriol Pujol, became obvious that it would not be a good fit for traffic and Petia Radeva. Traffic Sign Recognition Using sign recognition - we tested the program several times, every Evolutionary Adaboost Detection and Forest-ECOC time with different maximum orders of statistical moments Classification. IEEE Transactions on Intelligent (from 5 all the way up to 40). The problem was that the Transportation Systems, 10(1):113–126, March 2009. calculated values of statistical moments did not yield any [2] Bogus law Cyganek. Circular road signs recognition discernible patterns, which is the basis for object recogni- with soft classifiers. Integrated Computer-Aided tion. The prototype thus behaved unpredictably and often Engineering, 14(4):323–343, 2007. recognized a false symbol. After using the same program [3] Arturo de la Escalera, José M. Armingol, and Mario framework to develop a prototype using the D2 shape func- Mata. Traffic sign recognition and analysis for tion and getting promising results, we concluded that the intelligent vehicles. Image and Vision Computing, fault was not in the way we developed the program or in 21(3):247–258, March 2003. the set of acquired images, but in the method used. We [4] Arturo de la Escalera, Luis E. Moreno, Miguel A. therefore abandoned statistical moments and moved on to Salichs, and José M. Armingol. Road Traffic Sign search for more suitable methods, until we found the D2 Detection and Classification. IEEE Transactions on shape function. Industrial Electronics, 44(6):848–859, December 1997. While statistical moments have a wide range of use, we found [5] Chiung-Yao Fang, Chiou-Shann Fuh, Pei-Shan Yen, the D2 shape function to be much more suitable for traffic Shen Cherng, and Sei-Wang Chen. Road traffic sign sign recognition. detection and classification. Computer Vision and Image Understanding, 96(2):237–268, November 2004. [6] Xiaohong W. Gao, Lubov N. Podladchikova, 6. CONCLUSION Dmitry G. Shaposhnikov, Keum-Shik Hong, and Shape functions up until now were not used in traffic sign Natalia A. Shevtsova. Recognition of traffic signs recognition - we presented a prototype using the D2 shape based on their colour and shape features extracted function that recognizes traffic signs. It should be noted using human vision models. Journal of Visual that while the testing set, presented in Section 4, is a col- Communication and Image Representation, lection of traffic sign images that are as varied as possible, 17(4):675–685, August 2006. therefore maximizing the robustness of the traffic sign recog- [7] Saturnino Maldonado-Bascón, Sergio Lafuente-Arroyo, nition program, the set is still quite small - only 46 images. Pedro Gil-Jiménez, and Hilario Gómez-Moreno. 
6. CONCLUSION
Shape functions have until now not been used in traffic sign recognition - we presented a prototype that recognizes traffic signs using the D2 shape function. It should be noted that while the testing set, presented in Section 4, is a collection of traffic sign images that are as varied as possible, thereby maximizing the robustness of the traffic sign recognition program, the set is still quite small - only 46 images. The problem is that there is no publicly available repository or collection of Slovenian traffic sign images, so the set had to be acquired by hand, which is very time-consuming when we want to capture as many different traffic signs as possible. That is why we focused only on speed limit traffic signs, but we could easily expand the scope to all traffic signs, as the D2 shape function is not limited to any number of components. To add a new type of traffic sign, all we would need to do is add the corresponding traffic sign symbol to the learning set.

As described in Section 4, the final performance of the program is 91.3% accurately recognized speed limit traffic signs, which is an encouraging result and an incentive for further use of the D2 shape function in the field of traffic sign recognition and possibly other fields too. It is a robust procedure which also works on very small and low quality images - the smallest image used for testing had a resolution of 38 × 31 pixels and the program correctly recognized it. It is worth noting that this method is much simpler than traditional methods, in particular compared to the solutions mentioned in the Introduction, where methods like neural networks and support vector machines were used. In terms of performance we cannot directly compare it to these solutions, since we do not have a testing set that would be large enough, but we did get a sense that it is a robust method. As a final note, the D2 shape function performs much better than statistical moments for traffic sign recognition, as described in Section 5. Considering our results, we can conclude that the D2 shape function seems well suited for this field of computer vision.

On numerical simulation of filtration gas combustion processes on the shared memory machines

Tatyana Kandryukova
Institute of Computational Mathematics and Mathematical Geophysics SB RAS
630090, prospect Akademika Lavrentjeva, 6, Novosibirsk, Russia
kandryukova@labchem.sscc.ru

ABSTRACT
This work is devoted to the search for efficient algorithms for the simulation of filtration gas combustion processes. In particular, a two-level parallel algorithm based on an explicit finite difference scheme using an adaptive mesh is constructed. Two ways of parallelization, namely the direct usage of OpenMP directives and a special distribution of data across threads, are carried out. It is numerically shown that the latter provides a significant performance advantage.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Concurrent Programming; G.1 [Numerical Analysis]: Miscellaneous; J.2 [Physical Sciences and Engineering]: Mathematics and statistics, Physics, Chemistry

General Terms
Algorithms, Performance

Keywords
Combustion wave, explicit difference schemes, adaptive grids, parallelization on shared memory, OpenMP-threads

Supervisor
Yuri Laevsky, Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, prospect Akademika Lavrentjeva, 6, Novosibirsk, Russia

1. INTRODUCTION
The phenomenon of filtration gas combustion (FGC) was discovered in the 1970s and has still not been studied fully. Meanwhile, knowledge of the properties of FGC processes is essential for solving many problems of chemical technology, ecology, fire safety, etc. In particular, the low-speed mode of the filtration gas combustion process, which is under consideration here, is the physical basis for the construction of modern industrial flame arresters, environmentally friendly burners and other devices. Therefore the numerical simulation of FGC is a problem of current interest with important practical applications.

From the physical point of view, the process is the propagation of a region of gaseous exothermic reaction in a chemically inert porous medium, as gaseous reactants are supplied into the region of chemical transformation [2]. It proceeds as follows. Let there be a tube filled with a porous material, measuring about 10 cm. From one edge of it a combustible gas mixture is supplied at a rate v. The mixture is then ignited, resulting in a flat combustion front, which can either be stationary or move in either direction (at a rate u), depending on the model parameters (see Figure 1). This occurs due to heat recuperation. In the heating zone the fresh gas mixture gets hot, and the chemical process occurs in the reaction zone, which causes heat release. In the relaxation zone the high-temperature products of combustion exchange heat with the porous solid, and the heat then transfers through it back to the heating zone by means of thermal conduction.

Figure 1: The scheme of the physical model of FGC

The main difficulty in the simulation of this process is its very different scales; it is especially hard to deal with the kinetic-diffusion distinction. In this study the simplest case is considered, when the model is one-dimensional, and still it takes hours to simulate a process that lasts less than half a minute. It does not seem possible to proceed to the multidimensional case under such conditions. At present, great expectations for dealing with such tasks are related to the appearance of multi-processor machines and the development of parallel methods.

This work is an expansion of [3] and is devoted to the analysis of the possibilities to improve the performance of some computational models of FGC in the low-speed mode. In particular, the introduction of an adaptive nested grid [5] and parallelization on shared memory [1] are discussed.
All the constructed algorithms provide solutions of the problem that coincide with the experimental data. A special method of parallelization on machines with shared memory was developed and applied to the constructed algorithms; it reduces the computational time greatly, not only compared to the sequential implementation, but also compared to the classical way of parallelization with the direct usage of OpenMP pragmas.

2. MATHEMATICAL BASIS
2.1 Mathematical model
The simplest one-dimensional FGC model in the enthalpy formulation includes three equations:

\frac{\partial T}{\partial t} = a_s \frac{\partial^2 T}{\partial x^2} + \alpha_s\left(H - T - \frac{Q}{c_g}\eta\right), \quad (1)

\frac{\partial H}{\partial t} = a_g \frac{\partial^2 H}{\partial x^2} - v \frac{\partial H}{\partial x} + \alpha_g\left(T - H + \frac{Q}{c_g}\eta\right), \quad (2)

\frac{\partial \eta}{\partial t} = a_g \frac{\partial^2 \eta}{\partial x^2} - v \frac{\partial \eta}{\partial x} - W(\eta, H). \quad (3)

Here a_i = \lambda_i/(c_i\rho_i) is the thermal diffusivity of the i-th phase; c_i, \rho_i, \lambda_i are, respectively, the specific heat at constant pressure, density and thermal conductivity of the i-th phase (i = s for the porous solid, i = g for the gas); \alpha_s = \alpha/((1-m)c_s\rho_s) and \alpha_g = \alpha/(m c_g\rho_g), where m is the porosity and \alpha is the interphase heat transfer rate; v is the flow rate of the combustible mixture; T \equiv T_s; c_g H = c_g T_g + Q\eta is the full gas enthalpy, where T_i is the temperature of the i-th phase and Q is the energy release of the reaction; W(\eta, H) = k_0\,\eta\,e^{-E/(R(H - Q\eta/c_g))} is the chemical reaction rate according to the Arrhenius law; \eta is the relative concentration of the reactive component of the combustible mixture; k_0 is the pre-exponential factor; E is the activation energy; R is the universal gas constant. It is worth noting that the speed u of the wave is a priori unknown.

The Cauchy problem is stated by adding Dirichlet boundary conditions on the left edge and Neumann conditions on the right. The initial data correspond to a preheated porous medium.

2.2 Implementation on regular and adaptive meshes
As there is no analytical solution of the problem, the suitability of all the constructed algorithms is estimated by comparison with the solution obtained on a regular fine mesh by using the explicit difference scheme:

T_i^{n+1} = T_i^n + \tau\left(a_s\frac{T_{i-1}^n - 2T_i^n + T_{i+1}^n}{h^2} + \alpha_s\left(H_i^n - T_i^n - \frac{Q}{c_g}\eta_i^n\right)\right), \quad i = 1,\dots,M, \quad (4)

H_i^{n+1} = H_i^n + \tau\left(a_g\frac{H_{i-1}^n - 2H_i^n + H_{i+1}^n}{h^2} - v\frac{H_i^n - H_{i-1}^n}{h} + \alpha_g\left(T_i^n - H_i^n + \frac{Q}{c_g}\eta_i^n\right)\right), \quad i = 1,\dots,M, \quad (5)

\eta_i^{n+1} = \eta_i^n + \tau\left(a_g\frac{\eta_{i-1}^n - 2\eta_i^n + \eta_{i+1}^n}{h^2} - v\frac{\eta_i^n - \eta_{i-1}^n}{h} - W(\eta_i^n, H_i^n)\right), \quad i = 1,\dots,M. \quad (6)

Here M is the number of spatial steps, whose size h is chosen from the condition

\frac{\max_i |T_i^N(h) - T_i^N(2h)|}{\max_i T_i^N(2h)} < 0.02, \quad (7)

where N is the number of time steps. The size of the time step \tau must satisfy a stability condition for explicit difference schemes, which is theoretically unknown and is fitted experimentally. Table 1 gives the values of K(h) and \tau(h) corresponding to different values of h; here L = 0.1 is the length of the tube.

h       L·2^{-10}   L·2^{-11}   L·2^{-12}   L·2^{-13}   L·2^{-14}
τ(h)    2^{-16}     2^{-17}     2^{-19}     2^{-21}     2^{-23}
K(h)    0.159       0.096       0.053       0.028       0.014

Table 1: Values of K(h) and τ(h) corresponding to different values of h

Meanwhile, away from the flame front the solution functions are quite smooth and do not need such small steps (see the graphic example in Figure 2).

Figure 2: Graph example of the temperature of the solid T, obtained numerically
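For concreteness, the following is a minimal sketch of one explicit time step of scheme (4)-(6), written under our own assumptions (C++, parameter names of our choosing, boundary nodes left untouched); it is an illustration, not the author's implementation.

#include <cmath>
#include <vector>

struct Params {
    double as, ag;            // thermal diffusivities a_s, a_g
    double alpha_s, alpha_g;  // interphase heat exchange coefficients
    double v;                 // flow rate of the combustible mixture
    double Q, cg;             // energy release and gas specific heat
    double k0, E, R;          // Arrhenius parameters
};

// Reaction rate W(eta, H) = k0 * eta * exp(-E / (R * (H - Q*eta/cg))).
double W(double eta, double H, const Params& p) {
    return p.k0 * eta * std::exp(-p.E / (p.R * (H - p.Q * eta / p.cg)));
}

// One explicit step of (4)-(6) on a uniform grid of M+1 nodes; nodes 0 and M
// carry the boundary conditions and are not updated in this sketch.
void explicit_step(std::vector<double>& T, std::vector<double>& H,
                   std::vector<double>& eta, double tau, double h,
                   const Params& p) {
    const std::size_t M = T.size() - 1;
    const std::vector<double> Tn = T, Hn = H, en = eta;   // copies of layer n
    for (std::size_t i = 1; i < M; ++i) {
        const double lapT = (Tn[i-1] - 2.0 * Tn[i] + Tn[i+1]) / (h * h);
        const double lapH = (Hn[i-1] - 2.0 * Hn[i] + Hn[i+1]) / (h * h);
        const double lapE = (en[i-1] - 2.0 * en[i] + en[i+1]) / (h * h);
        T[i]   = Tn[i] + tau * (p.as * lapT
                 + p.alpha_s * (Hn[i] - Tn[i] - p.Q * en[i] / p.cg));        // (4)
        H[i]   = Hn[i] + tau * (p.ag * lapH - p.v * (Hn[i] - Hn[i-1]) / h
                 + p.alpha_g * (Tn[i] - Hn[i] + p.Q * en[i] / p.cg));        // (5)
        eta[i] = en[i] + tau * (p.ag * lapE - p.v * (en[i] - en[i-1]) / h
                 - W(en[i], Hn[i], p));                                      // (6)
    }
}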
Thus, one way to speed up the computation is the introduction of an adaptive grid. It should depend on the solution from the previous time step and be concentrated in the area of the chemical transformation. This is carried out as follows. Let there be a full-length regular coarse spatial grid. Implementation of one time step of the difference scheme on it, with the corresponding big time step, provides the boundary conditions on the next time layer for the small task, which is stated in the vicinity of the chemical reaction zone similarly to the large one. For this enclosed task a denser time-space mesh is used. The initial data and boundary conditions at every time layer are obtained by means of linear interpolation from the coarse mesh. After the execution of all the steps of the subproblem, the values of the coarse-grid solution in the nodes used for interpolation are replaced with the corresponding values from the embedded problem (see Figure 3, where n is the number of the current time layer of the general problem, p is the number of time steps of the subproblem, and i_0 is the number of the node where W attains its maximum).

Figure 3: The scheme of the two-level algorithm

The dense grid moves according to the propagation of the combustion front: as soon as the maximum point of the function W moves by one spatial step of the coarse grid, the fine mesh moves by the same distance (see Figure 4). This approach allows us to take big spatial and time steps outside the chemical reaction zone. The numerical experiments show that to achieve the same accuracy grade one may use a coarse grid with a spatial step four times bigger than that of the fine grid. That implies about sixteen times bigger time steps. It is thus obvious that the usage of such a two-level time mesh is indispensable in a problem like the one in question.

Figure 4: The scheme of the movement of the adaptive fine grid

3. PARALLELIZATION
3.1 Classical way
The simplest and most obvious way to implement the program on a multiprocessor node is the application of OpenMP pragmas to the inner loops of the program. All the arrays in this case are shared and all the threads work to fill each of them. As a result, data of different types are held in the storage of different threads and the threads have to communicate intensively. In the case of the regular mesh this concerns the separate computations of the exponent values and of the values of η in the scheme equations in which they are used, as well as the move from one time layer to another. The situation is far more complicated for the algorithm with the enclosed fine mesh, because in this case there are many bottlenecks besides the mentioned ones, related to the repeated transitions from the coarse mesh to the fine one and back. Such parallelism provides some acceleration for the execution of the regular mesh algorithm, but it even leads to a slowdown of the computation when the adaptive mesh is used.

3.2 Data distribution
To reduce the time spent on the exchange of data flows, we propose to distribute the data between the threads. In the case of the regular mesh we mean the following procedure. The spatial grid is divided into z pieces, where z is the number of threads used, and each thread gets control over one of the pieces: thread number zero deals with nodes [0, ..., M/z], the next one with [M/z, ..., 2M/z], and so on; the last one deals with nodes [(z-1)M/z, ..., M]. Each thread creates its local arrays, fills them in accordance with the difference scheme, and outputs the results on its assigned grid nodes. On a regular grid we thus obtain practically independent tasks exchanging only boundary conditions.
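The contrast between the two approaches can be illustrated with a short sketch for a generic explicit update on the regular mesh. It reflects our own reading of the description above (C++ with OpenMP, a single model array, function names of our choosing, halo exchange through small shared buffers); it is not the author's code.

#include <omp.h>
#include <vector>

// Classical way: shared arrays, an OpenMP pragma on the inner loop of every step.
void step_classical(const std::vector<double>& un, std::vector<double>& u, double c) {
    const int M = static_cast<int>(un.size()) - 1;
    #pragma omp parallel for
    for (int i = 1; i < M; ++i)
        u[i] = un[i] + c * (un[i-1] - 2.0 * un[i] + un[i+1]);
}

// Data distribution: each thread owns one contiguous block of interior nodes,
// keeps it in thread-local storage and exchanges only two halo values per step.
void run_distributed(std::vector<double>& u, double c, int nsteps) {
    const int M = static_cast<int>(u.size()) - 1;
    std::vector<double> left_halo(omp_get_max_threads() + 2),
                        right_halo(omp_get_max_threads() + 2);
    #pragma omp parallel
    {
        const int z  = omp_get_num_threads();
        const int id = omp_get_thread_num();
        const int lo = 1 + id * (M - 1) / z;        // first owned node
        const int hi = 1 + (id + 1) * (M - 1) / z;  // one past the last owned node
        std::vector<double> loc(u.begin() + lo - 1, u.begin() + hi + 1); // block + halos
        for (int s = 0; s < nsteps; ++s) {
            // publish own boundary values for the neighbouring threads
            left_halo[id + 1]  = loc[1];
            right_halo[id + 1] = loc[loc.size() - 2];
            #pragma omp barrier
            // read the neighbours' boundaries (global boundary nodes stay fixed)
            if (id > 0)     loc.front() = right_halo[id];
            if (id < z - 1) loc.back()  = left_halo[id + 2];
            const std::vector<double> old = loc;
            for (std::size_t j = 1; j + 1 < loc.size(); ++j)
                loc[j] = old[j] + c * (old[j-1] - 2.0 * old[j] + old[j+1]);
            #pragma omp barrier
        }
        // write the owned block back to the shared array
        for (int i = lo; i < hi; ++i) u[i] = loc[i - lo + 1];
    }
}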
In the case of adaptive grids, communication becomes more complicated because of the need to transfer information from the nodes of the fine mesh to those of the coarse one. In this case the data distribution algorithm has a few stages. After n time steps of the general task we have data distributed across z threads, [0, ..., Mc/z], [Mc/z, ..., 2Mc/z], ..., [(z-1)Mc/z, ..., Mc], and shared arrays of size Mf + 1, where Mc and Mf are the numbers of nodes of the coarse and fine meshes respectively. The first stage is the implementation of one time step on the coarse grid by every thread separately (see Figure 5). The second stage prepares the data for the inner problem. It is assumed that the position of the fine grid with respect to the coarse one is known and that i_left and i_right are the numbers of the nodes forming the edges of the enclosed grid. Initial data obtained by interpolation from layer n of the coarse grid are entered into the shared arrays and then redistributed across the z threads, forming layer (n + 0/p) of the enclosed problem (Figure 6). The enclosed problem is then solved in parallel by analogy with the first stage, i.e. steps (n + 1/p), ..., (n + p/p) are implemented by every thread separately (Figure 7). The last stage is the opposite of the second one. The solution obtained on layer (n + p/p) of the inner problem is entered into the shared array and then the necessary data are transmitted to the appropriate threads according to the data distribution on the coarse grid. Besides that, the position of the adaptive grid is determined at this step.

Figure 5: The first stage of the data distribution algorithm in the case of the adaptive grid

Figure 6: The second stage of the data distribution algorithm in the case of the adaptive grid

Figure 7: The third stage of the data distribution algorithm in the case of the adaptive grid

For more detailed information about the construction of the adaptive grid and the implementation of the respective algorithm with the data distribution, see [4].
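As an illustration of the second stage, the sketch below fills the enclosed fine grid by linear interpolation of the coarse-grid solution at layer n; the array layout and the refinement factor are our assumptions, not details taken from [4].

#include <vector>

// Fill the fine enclosed grid covering coarse nodes [i_left, i_right] with
// values linearly interpolated from the coarse layer-n solution. 'refine' is
// the assumed number of fine cells per coarse cell.
std::vector<double> interpolate_to_fine(const std::vector<double>& coarse,
                                        int i_left, int i_right, int refine) {
    std::vector<double> fine((i_right - i_left) * refine + 1);
    for (int i = i_left; i < i_right; ++i) {
        for (int k = 0; k < refine; ++k) {
            const double t = static_cast<double>(k) / refine;
            fine[(i - i_left) * refine + k] =
                (1.0 - t) * coarse[i] + t * coarse[i + 1];
        }
    }
    fine.back() = coarse[i_right];   // right edge of the enclosed grid
    return fine;
}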
4. RESULTS
Numerical experiments were carried out for the model with tube length L = 0.1 m and physical time t = 15.0 s. All the calculations were performed either on the NKS-30T cluster of the Siberian Supercomputer Center (SSCC), each node of which contains two Intel Xeon X5670 processors (12M cache, 2.93 GHz, 6.40 GT/s Intel QPI, 6 cores), or on MVS-10P of the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC), each node of which contains two Xeon E5-2690 processors (20M cache, 2.90 GHz, 8.00 GT/s Intel QPI, 8 cores). Hyper-threading is turned off and the KMP_AFFINITY environment variable is set to "compact".

Figure 8 shows a bar chart of the computational time of each algorithm executed on NKS-30T of SSCC. First of all, the importance of the introduction of the adaptive grid in the case of sequentially running programs is well seen from this picture: the use of this technique reduces the computational time by a factor of 22 when only one thread works.

Figure 8: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution, C - adaptive mesh and direct usage of OpenMP pragmas, D - adaptive mesh and parallelism based on the data distribution; obtained with NKS-30T of SSCC

Secondly, one can easily compare the efficiency of the two methods of parallelization. In the case of the regular mesh, the direct application of OpenMP directives provides some improvement for numbers of threads less than or equal to 6. As the number of threads grows, the amount of data per thread becomes smaller while the number of data transmissions increases greatly, so for more than 6 threads the parallel computation based on the direct usage of OpenMP pragmas slows down. The proposed data distribution algorithm, in contrast, shows rather good scalability in the number of threads up to z = 12 in this case. The same data are presented in Table 2.

number of threads z    1      2      4      6      8      12
A                      400    214    137    112    143    166
B                      -      260    125    83.0   68.3   46.1
C                      17.8   13.7   12.8   14.0   25.4   35.3
D                      -      10.4   6.69   5.82   7.82   8.69

Table 2: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution, C - adaptive mesh and direct usage of OpenMP pragmas, D - adaptive mesh and parallelism based on the data distribution; obtained with NKS-30T of SSCC

Analogous results for the regular mesh obtained on the MVS-10P cluster are presented in Figure 9 and Table 3. It is seen that the same calculations take a little less time, but the general situation is the same up to 16 threads. As concerns the algorithm with the enclosed problem, the first method of parallelization is inadmissible, since the more threads are used the longer the computation lasts. At the same time, for the algorithm with the enclosed problem even the proposed method does not provide good scalability, though it has a better effect than the usage of OpenMP pragmas. So there is a possibility that with a greater number of threads the usage of the adaptive grid might become ineffective.

Figure 9: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution; obtained with MVS-10P of JSCC RAS

number of threads z    1      2      4      6      8      12     16
A                      323    178    112    90.2   77.4   113    115
B                      -      203    103    68.2   54.2   38.7   33.5

Table 3: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution; obtained with MVS-10P of JSCC RAS
5. CONCLUSIONS
The appearance of supercomputers and the elaboration of parallel technology have given renewed impetus to the further development of numerical methods and have opened the door to the modeling of complex tasks. The problem of FGC is one of them. The specifics of its solutions make the usage of an adaptive mesh extremely efficient for the numerical simulation of FGC processes in the case of a sequential implementation. However, one should pay special attention to the parallel realization of such an algorithm, since an unfortunate parallelization might not only show unsatisfactory scalability but even increase the computational time with an increasing number of threads. At the same time, the fact that the number of cores on a computational node constantly grows and the influence of parallelism increases implies the need to construct new algorithms permitting almost perfect scaling in the number of threads.

In this paper a special approach to shared memory parallelism based on the distribution of data across threads has been proposed. It has been applied to the algorithms with both the regular and the adaptive grids. The results of the comparison of this method with the classical one, where OpenMP directives are applied to all the internal loops of the program, have shown that the proposed method is more efficient for the task in question and is suitable for the parallel implementation of the algorithm with the embedded fine mesh. All calculations were performed for the problem with characteristic dimensions and empirically chosen parameters of the mathematical model. The solutions produced by all the constructed algorithms have the required accuracy grade of approximation and correspond to the physical data. Meanwhile, the computational time is reduced by a factor of 10 in the case of the regular mesh and by a factor of 3 in the case of the adaptive one by the use of the proposed method of parallelization.

6. ACKNOWLEDGMENTS
This work was supported by RFBR (12-01-31046, 13-01-00019, 13-05-12051).

7. REFERENCES
[1] The OpenMP API specification for parallel programming. http://openmp.org.
[2] V. S. Babkin and Y. M. Laevskii. Seepage gas combustion. Combustion, Explosion, and Shock Waves, 23(5):531–547, September-October 1987.
[3] T. A. Kandryukova and Y. M. Laevsky. Numerical simulation of filtration gas combustion. In Proceedings of the 11th International Conference on Mathematical and Numerical Aspects of Waves, pages 43–44, June 2013.
[4] T. A. Kandryukova and Y. M. Laevsky. Simulating the filtration combustion of gases on multi-core computers. Journal of Applied and Industrial Mathematics, 8(2):218–226, April 2014.
[5] Y. M. Laevsky and L. V. Yausheva. Simulation of filtrational gas combustion processes in nonhomogeneous porous media. Numerical Analysis and Applications, 2(2):140–153, April 2009.