MATCOS-13
Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science
University of Primorska Press

Edited by Miklos Kresz and Andrej Brodnik

Reviewers and Programme Committee
Andrej Brodnik ■ Ljubljana, Koper, Slovenia, co-chair
Gabor Galambos ■ Szeged, Hungary
Gabriel Istrate ■ Timisoara, Romania
Hans Kellerer ■ Graz, Austria
Miklos Kresz ■ Szeged, Hungary
Silvano Martello ■ Bologna, Italy
Andras Recski ■ Budapest, Hungary
Gerhard Reinelt ■ Heidelberg, Germany
Jiri Sgall ■ Prague, Czech Republic
Borut Žalik ■ Maribor, Slovenia

Steering Committee
Andrej Brodnik ■ Ljubljana, Koper, Slovenia
Gabor Galambos ■ Szeged, Hungary
Miklos Kresz ■ Szeged, Hungary
Gerhard Reinelt ■ Heidelberg, Germany

Organizing Committee
Janez Žibert, chair
Tine Šukljan
Marko Grgurovič

Published by University of Primorska Press, Titov trg 4, SI-6000 Koper
Editor-in-Chief: Jonatan Vinkler
Managing Editor: Alen Ježovnik
Koper, 2016

ISBN 978-961-6984-20-1 (pdf) ■ www.hippocampus.si/isbn/978-961-6984-20-1.pdf
ISBN 978-961-6984-21-8 (html) ■ www.hippocampus.si/isbn/978-961-6984-21-8/index.html
© 2016 Založba Univerze na Primorskem

Preface

Dear reader,

In cooperation between the University of Primorska and the University of Szeged, we decided in 2010 to organize a conference reflecting both the theoretical and the applied aspects of computer science. The Mini-Conference on Applied Theoretical Computer Science (MATCOS-10) was a two-day event within which a student session was organized. This occasion gave distinguished Master students and early-stage PhD students the opportunity to present their preliminary results and ongoing research. Our initiative turned out to be very successful; the full papers of 9 student talks were contributed to the post-proceedings of that event.

Continuing the tradition, the Middle-European Conference on Applied Theoretical Computer Science (MATCOS-13) also secured a place for high-level student research. This post-proceedings volume is devoted to publishing the carefully reviewed full papers presented at MATCOS-13, held on October 10th and 11th, 2013 in Koper, Slovenia. The topics of the contributions cover a wide range of theory and application, reflecting the scope of the conference: algorithmic solutions for life science problems, artificial intelligence methods for games and computer vision, compiler construction, advanced data structures, application-oriented scheduling, and numerical simulation. All papers present promising results, demonstrating the excellent research attitude of the contributors and their potential for a scientific career.

The high standard of the presentations in the regular session convinced us to collect selected papers for a special issue of the international journal Informatica, published as number 3 of volume 39, 2015.

The success both in organizing the meeting and in publishing the best results of the conference motivates us to make this event a regular one. Our plan is to organize the next conference in October 2016 in Koper, again with regular and student sessions.

MATCOS-13 was smoothly organized, and we are very grateful to Janez Žibert, chair of the Organizing Committee, and to "his team". Special thanks to Professor Silvano Martello for accepting our invitation as a keynote speaker and giving his excellent talk. The success of the conference would not have been possible without the aid of the programme committee, and the professional work provided by the University of Primorska Press in the publishing process was also indispensable. Thank you all for your support.

Miklós Krész, chair of the student conference

CIP – Kataložni zapis o publikaciji (Cataloguing-in-Publication record)
Narodna in univerzitetna knjižnica, Ljubljana
004(082)(0.034.2)
MINI-Conference on Applied Theoretical Computer Science (2013 ; Koper)
MATCOS-13 [Elektronski vir] : proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science / edited by Miklos Kresz and Andrej Brodnik. - El. knjiga. - Koper : University of Primorska Press, 2016
Access mode (URL): http://www.hippocampus.si/isbn/978-961-6984-20-1.pdf
Access mode (URL): http://www.hippocampus.si/isbn/978-961-6984-21-8/index.html
ISBN 978-961-6984-20-1 (pdf)
ISBN 978-961-6984-21-8 (html)
1. Gl. stv. nasl. 2. Krész, Miklós
283625728

Contents

Preface ◆ II
Efficient Implementation of Algorithms for Solving Subgraph Isomorphism Problem in Cheminformatics ◆ Mónika Vigula ◆ 5–8
Adaptive Algorithms For Dynamic Programming With Applications to Bioinformatics ◆ Marko Grgurovič ◆ 9–11
An Algorithm for Recognition of Position Repetition in Chess ◆ Nándor Németh ◆ 13–16
Comparison Between a Cache-oblivious Range Query Data Structure and a Quadtree ◆ Tine Šukljan ◆ 17–19
A New Method for Transforming Algorithm into VHDL by Starting from a Haskell Functional Language Description ◆ Gergely Suba and Péter Arató ◆ 21–24
A System for Parallel Execution of Data-flow Graphs ◆ Andrej Bukošek ◆ 25–28
An Algorithmic Framework for Real-Time Rescheduling in Public Bus Transportation ◆ Balázs Dávid and János Balogh ◆ 29–33
Traffic Sign Symbol Recognition with the D2 Shape Function ◆ Simon Mezgec and Peter Rogelj ◆ 35–38
On Numerical Simulation of Filtration Gas Combustion Processes on the Shared Memory Machines ◆ Tatyana Kandryukova ◆ 39–43

Efficient implementation of algorithms for solving subgraph isomorphism problem in cheminformatics

Mónika Vigula
Eötvös Loránd University
Budapest, Hungary
vigula.monika@gmail.com

ABSTRACT
In a typical task in cheminformatics [1], we have to retrieve from a given database all target molecules that contain a query molecule as substructure. Representing the molecular structures as labeled graphs, this task can be formulated as the subgraph isomorphism problem, which is a well-known NP-complete problem.

In a typical task, a database containing a lot of molecules (mostly hundreds of thousands or millions) and a query molecule are given. We have to find all the target molecules from the database containing the query molecule as substructure (see Figure 1). In the first step, we represent the molecules as undirected, labeled graphs. Then, we check for each target molecule whether it contains the query molecule or not. As this is a well-known NP-complete problem, screening methods are used as a preliminary step. Our purpose is to identify quickly as many of those molecules from the database as possible that cannot contain the query as substructure.

In our work, we have studied three algorithms solving the subgraph isomorphism problem. We have implemented the
Ullmann algorithm [5] and the VF2 algorithm [2] along with Only the remaining molecules are to be examined by the the atom re-ordering step of QuickSI [4] and additional actual substructure search algorithm. heuristics. The time and memory requirements of these algorithms were also evaluated. We have determined the combination of heuristics resulted in the best performance on real data sets as well. Keywords cheminformatics, subgraph isomorphism problem, substruc- ture search Supervisors Krisztián Tichler1 and Péter Kovács2 1. INTRODUCTION Cheminformatics is a rapidly growing field which implies interesting challenges for information technology beyond chemical background. It helps us to select pharmaceutical research directions and reduce the numbers of costly research tests. We try to predict the possible outcome of an experiment using our mathematical and information technological knowledge. In a typical task, a database containing a lot of molecules Figure 1: Our motivation 1Eötvös Loránd University, Hungary, Institute of Informat- This problem has been studied extensively for decades. One ics, ktichler@inf.elte.hu 2 of the most common solution methods is the algorithm pro- Eötvös Loránd University, Hungary, Institute of Informat- ics, kpeter@inf.elte.hu posed by J. R. Ullmann [5] in 1976, the VF2 algorithm [2] published by Cordella et. al. in 2001 is also widely used, while the QuickSI algorithm [4] is a quite new algorithm, it was published in 2008. Our aim was to overview algorithms related to the subgraph isomorphism problem and implement some of these meth- ods efficiently using heuristics that exploit characteristics of molecular graphs to improve performance (e.g., bounded degree, vertex and edge labels). In some applications, it is also necessary to compute all possible mappings between matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 5 the query and the target structures, thus we also considered (e.g., C, N, O ) and bonds (e.g., simple, double, triple) as this problem. We tested the correctness and efficiency of our well. implementations on real data sets. In another screening method, two matrices are calculated for 2. THE SUBGRAPH ISOMORPHISM each molecule. Columns and rows correspond to frequent atom types3 (e.g., C, N, O ) and bond types (e.g., simple, PROBLEM double), respectively. Initially, all matrix elements are zero. Definition 1. A labeled graph is defined as a 6-tuple The bonds of the molecules are examined sequentially. If G = (V, E, ΣV , ΣE, lV , lE) where V is the set of vertices, E the current bond connects two atoms of frequent type, then is the set of undirected edges, ΣV and ΣE are sets of vertex the two elements of the matrix are increased by one. It can and edge labels, and lV : V → ΣV , lE : E → ΣE denote easily be proved that each element of the query’s matrix label functions mapping a vertex or an edge to a label, re- is less than or equal to the corresponding element in the spectively. target’s matrix if query ⊆ target. Definition 2. A graph G1 = (V1, E1, ΣV , Σ , l , l ) 1 E1 V1 E1 is subgraph isomorphic to G Simple example molecules are shown in Figure 3. The cor- 2 = (V2, E2, ΣV , Σ , l , l ) 2 E2 V2 E2 (denoted by G responding matrices are shown in Table 1. The elements of 1 ⊆ G2) if there exists an injective function f : V the matrices of the query and the target are separated by a 1 → V2 satisfying colon. As the query’s matrix contains non-zero elements and 1. 
∀u∈V1 (lV (u) = l (f (u))) 1 V2 all elements are zero in the target’s matrix, query * target 2. ∀u, v∈V holds. 1 ((u, v) ∈ E1 ⇒ (f (u), f (v)) ∈ E2) 3. ∀u, v∈V1 ((u, v)∈E1 ⇒ lE ((u, v)) = l ((f (u), f (v)))) 1 E2 Notice that G2 does not necessarily contain G1 as induced subgraph. A simple example is shown in Figure 2. A possible mapping is f (1) = 14 f (5) = 5 f (9) = 12 f (2) = 2 f (6) = 8 f (10) = 3 f (3) = 1 f (7) = 4 f (4) = 6 f (8) = 11 Figure 3: Example molecules to screening C N O F P S . . .  single 4 : 0 1 : 0 1 : 0 0 : 0 0 : 0 0 : 0 . . .  double 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 . . .   triple 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 0 : 0 . . . Table 1: Matrix Figure 2: An example where G2 contains G1 as sub- 4. ALGORITHMS structure We have studied three well-known substructure search algo- Note furthermore, that hydrogen is usually not represented rithms, the Ullmann algorithm [5], the VF2 [2], and the in molecules as its presence can be deduced from our chem- atom-reordering step of the QuickSI [4]. Although they ical knowledge. are all backtracking algorithms, they build the backtrack- ing trees in different ways. Furthermore, QuickSI rearranges In cheminformatics, query molecule, target molecule, atoms, the atoms of the query molecule in the first step to achieve bonds and substructure are usually used instead of G better performance. Each node of the backtracking tree rep- 1, G2, vertices, edges and subgraph, respectively, therefore these resents a partial isomorphism from the query to the target notions are used in the rest of the paper. molecule. As this tree can be exponential in size, the pur- pose of the algorithms is to filter out nodes that can not be extended to achieve a subgraph isomorphism function. 3. SCREENING Therefore, different feasibility functions are applied by the A preprocessing step, called screening, is used to exclude as methods. many molecules as possible from the target database that cannot contain the query molecule as substructure. Our aim The Ullmann algorithm maintains a boolean matrix (de- was not to study and improve the existing screening meth- noted by M ) representing the possible branches at the dif- ods, therefore only some simple conditions were checked ferent depths. Mi,j is true if the current partial mapping before running the implemented substructure search algo- can be extended by adding f (i) = j to achieve a subgraph rithms. If query ⊆ target then the atom/bond count of the query is less than or equal to the atom/bond count of the 3The frequent atom types were determined based on a public target. Similar inequalities hold for various types of atoms molecule set of National Cancer Institute [3]. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 6 isomorphism between the query and the target. This ma- trix is refined after each step: the algorithm checks for each pair of i and j where Mi,j is true whether the non-mapped neighbors of the i-th atom in the query can be mapped to a neighbor of the j-th atom in the target. If this condition does not hold, then Mi,j is set to false and a new refine phase is performed. The VF2 algorithm maintains a neighbor set for each molecule (N bq, N bt) which contains those atoms that are not Figure 4: The parent heuristic involved in the current partial mapping but have a neighbor that is already involved. The algorithm distinguishes the following two cases. frequency of the different atom and bond types in the given database. 
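As an illustration of the count-matrix screening test described in Section 3, below is a minimal sketch in Python. It assumes a molecule is given simply as a list of (atom type, atom type, bond type) triples; this input format and the function names are illustrative choices of ours, not the paper's implementation, while the frequent atom types and bond types follow the columns and rows of Table 1.

from collections import Counter

# Illustrative frequent types; the paper derives them from an NCI molecule set.
FREQUENT_ATOMS = {"C", "N", "O", "F", "P", "S"}
BOND_TYPES = {"single", "double", "triple"}

def count_matrix(bonds):
    """Count bonds per (bond type, frequent atom type), as in the screening step.

    `bonds` is an iterable of (atom_type_1, atom_type_2, bond_type) triples.
    A bond between two frequent atoms increments two matrix entries, one for
    each endpoint, as described in Section 3.
    """
    m = Counter()
    for a1, a2, btype in bonds:
        if btype in BOND_TYPES and a1 in FREQUENT_ATOMS and a2 in FREQUENT_ATOMS:
            m[(btype, a1)] += 1
            m[(btype, a2)] += 1
    return m

def may_contain(query_bonds, target_bonds):
    """Necessary condition: every query entry <= the corresponding target entry.

    If this fails, the target cannot contain the query as substructure and can
    be discarded before running the expensive substructure search algorithm.
    """
    q, t = count_matrix(query_bonds), count_matrix(target_bonds)
    return all(t[key] >= cnt for key, cnt in q.items())

# Example in the spirit of Figure 3: a query with C-N and C-O single bonds is
# screened out against a target whose count matrix is all zeros.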
They suggested mapping the rare atoms before the frequent ones while keeping connectivity. This new atom 1. N b order needs to be calculated only once before any search q 6= ∅. In this case, the next atom to be mapped is the element of this set with the minimum index. against the database. This atom can be mapped only to an atom from N bt. Before the algorithm extends the current partial map- Although this method has been shown to reduce search time, ping by adding f (i) = j, it checks if the atom and the it has some limitations. In its original form, it is applicable bond types match and |N b only for connected query graphs. However, as in many real- q | ≤ |N bt| holds for the new neighbor sets. world applications, the query structure may consist of mul- tiple components, we have extended this method to be ap- 2. N bq = ∅. In this case, the non-mapped query atom plicable for such queries. For example, an isolated atom can with the minimum index is considered, which can be be mapped to any atom of the target having the same atom mapped to any non-used atom of the target. Before type; therefore it is beneficial to consider isolated atoms last. the extension of the current partial mapping, only the Furthermore, if we apply this atom ordering technique, then atom types are checked. the calculation of parent atoms should also be adjusted. 5. HEURISTICS 5.3 Ullmann Algorithm In this section, we briefly introduce the heuristics we have We have successfully applied the aforementioned two heuris- developed to improve upon the considered algorithms. tics in the Ullmann algorithm. They both substantially de- crease its running time. 5.1 Parent heuristic Definition 3. Suppose the atoms of the query are indexed In addition, we could also decrease the memory requirement by positive integers and a partial mapping is given. Let of the algorithm. Its straightforward implementation re- q quires O(n3) space (where n = max{|V1|, |V2|}), which can i (i ∈ {1, . . . , |V1|}) be an arbitrary non-mapped atom of the query molecule. Define the parent atom of q be reduced to O(n2) by saving only the modified matrix el- i (denoted by parent(q ements at each depth instead of saving the entire boolean i)) as its mapped neighbor atom with the minimum index (if it has any). compatibility matrix. Because every element can be set from true to false only once under a node in a backtracking tree, Notice that at least one atom in each component does not at most O(n2) positions need to be saved in total. have a parent atom (i.e., the atom mapped first in its com- ponent). Furthermore, note that the parent atoms are not Another improvement exploits that the compatibility matrix modified by extensions of the current partial mapping. The contains exactly one true value in the first d rows, where d key observation that we exploited is that a query atom qi can denotes the current search depth. Therefore, the matrix only be mapped to those atoms of target that are adjacent refinement step can be started at the (d + 1)-th row. to the image of parent(qi). Search algorithms can effectively take advantage of this property in case of molecular graphs, 5.4 VF2 because they are very sparse. We have also devised a few improvements for the VF2 algo- rithm. First, the candidate sets are not pre-calculated and A simple example is shown in Figure 4 to demonstrate how not stored in memory. If the next candidate is needed, it this heuristic can be applied. Suppose that the current map- can be calculated on the fly. 
Additionally, if only the first ping is f (1) = 4, f (2) = 5, and f (3) is under consideration. mapping is required, then there is no need to calculate those The parent atoms are indicated by arrows pointing from an candidates that might not be used in a later step. atom to its parent atom. As f (parent(3)) = 4, which has two neighbors that are not mapped yet, namely 2 and 6, the Although comparing the cardinality of neighbor sets im- only possible assignments are f (3) = 2 and f (3) = 6. proves the performance of the original algorithm, we found that it becomes superfluous when the parent heuristic is also 5.2 Atom order in the query molecule applied. In fact, this new heuristic developed by us seems H. Shang et. al. introduced a new method in [4] to obtain to be clearly superior to the original technique. better performance of search algorithm. Their idea is to rearrange the atoms of the query molecule according to the Apart from that, the atom reordering heuristic is also ap- matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 7 plied to further reduce the running time of the algorithm. Ullmann alg. VF2 Moreover, an additional improvement is based on the obser- vation that the order of the atoms in the query can be fixed first all first all in the first step, thereby avoiding to find the atom with the small queries minimum index at each step. 0.004 s 0.005 s 0.001 s 0.002 s and targets small queries 6. TEST CASES, SPEED AND MEMORY and large tar- 0.067 s 0.136 s 0.009 s 0.013 s RESULTS gets large queries We have compared the implemented algorithms combined and large tar- 0.27 s 1.084 s 0.01 s 0.027 s with all the improvements mentioned above in different as- gets pects. We have studied the time requirement for finding either the first or all substructure mappings for molecules Table 3: The search time for 1000 query-target pairs having different size. Our test suite consists of three in- at finding the first/all mapping stance sets: (1) small queries and small targets4; (2) small queries and large targets, and (3) large queries and large targets. The test molecules were selected from a public 7. CONCLUSION molecule database of National Cancer Institute [3]. The This paper presents our results for solving the subgraph iso- different test cases are summarized in Table 2. The first morphism problem. We have implemented the Ullmann al- and second row of each cell corresponds to the queries and gorithm, the VF2 algorithm, and the atom-reordering step of targets, respectively. Some of the 20 small, frequent test QuickSI along with various heuristics that we devised to im- molecules are shown in Figure 5. prove upon them. We have compared the memory and time requirements of the implementations on real-world molecu- The heuristics mentioned above have decreased the search lar graphs having different size. The search time has been time of the Ullmann algorithm and VF2 by 35-40%. The reduced by 35-40% by applying the developed heuristics. running time can be further reduced by 10% if the algo- rithms are implemented in iterative way instead of recursive 8. ACKNOWLEDGMENTS way. Finding all mappings required between 1.5 and 3 times I would like to express my gratitude to Krisztián Tichler as much running time as finding only the first mapping. and Péter Kovács, my supervisors for their patient guidance, enthusiastic encouragement and useful critiques of this re- Our results show that the VF2 algorithm is typically much search work. 
I am grateful to István Fekete for introducing faster than the Ullmann algorithm. We found that the re- me to cheminformatics and for his continuous support. I am finement of the initial boolean matrix used by the Ullmann obliged to staff members of ChemAxon Ltd. for the valuable algorithm takes about 70% of the total search time. information provided by them in their respective fields. The reduced search time for 1000 query-target pairs are The research project presented in this paper is supported shown in Table 3. by the European Union and co-financed by the European Social Fund (grant agreement no. TÁMOP 4.2.2./B-10/1- Query molecules 2010-0030). small large Marvin was used for drawing, displaying and characterizing 20 small molecules chemical structures, substructures, and reactions, Marvin cules small – 6.0.5, 2013, ChemAxon (http://www.chemaxon.com). molecules having 11-15 atoms mole 20 small molecules large, similar 9. REFERENCES rget large a [1] N. Brown. Chemoinformatics – an introduction for T molecules at least 76 atoms molecules computer scientists. ACM Computing Surveys, pages Table 2: Test cases 8:1–8:38, 2009. [2] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. An Improved Algorithm for Matching Large Graphs. In: 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Cuen, pages 149–159, 2001. [3] National Cancer Institute. NCI Open Database Compounds, 2000. http://cactus.nci.nih.gov/download/nci/. [4] H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming Figure 5: Some small query molecules Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism. Proceedings of the VLDB Endowment, pages 364–375, 2008. [5] J. R. Ullmann. An algorithm for subgraph 4In cheminformatics, molecules having at most 15 non- isomorphism. Journal of the ACM, 23:31–42, 1976. hydrogen atoms are considered as small molecules. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 8 Adaptive Algorithms For Dynamic Programming With Applications to Bioinformatics [Extended Abstract] Marko Grgurovič University of Primorska Titov Trg 4 Koper, Slovenia marko.grgurovic@student.upr.si ABSTRACT It can be shown that certain dynamic programming algo- In this paper we study a simple computation that lies at rithms, including algorithms for the edit distance problem the core of many dynamic programming algorithms, perhaps and algorithms used in geology [4] and in speech recogni- most notably those for computing the edit distance between tion [3], can be reduced to this simple computation. The two strings. Past solutions have assumed properties of the straightforward algorithm for computing Eq. 1 takes O(n) input such as convexity and concavity and would not work time per i, which amounts to a total of O(n2) time over all on general inputs. We obtain an algorithm that combines i. the previous algorithms in a way that makes no assumptions on the input, yet its running time depends on a measure of Previous algorithms for this problem focused on specific “sortedness” present in the input. This measure turns out to types of the function g(·) (usually referred to as the gap be very natural, and if the input comes from a function, it function), e.g. those that satisfy the quadrangle inequal- corresponds to the number of inflection points of the input ity [1, 5, 2] or the inverse quadrangle inequality [1]. None of function. 
An immediate consequence of the result is that these algorithms produce the correct answer when g(·) does if the input comes from a polynomial of degree d, then the not satisfy their assumptions. In this paper, we develop a running time of the algorithm can be upper bounded by combination of these approaches that works for all inputs, a function of d. The new algorithm extends the previous but has a running time that depends on the function g(·) in results to a wider family of functions and is never worse a natural way. than either special cases. All logarithms in this paper are in base two. Categories and Subject Descriptors 2. THE CONVEX AND CONCAVE CASE F.2.2 [Theory of Computation]: Nonnumerical Algorithms and Problems In this section we describe the algorithm of [1] for the convex case. We do not explicitly consider the concave case, since it is essentially analogous. General Terms Algorithms, Theory Consider the case when g(·) is a convex function, that is a function which increases at an increasing rate. In other Keywords words, given a < b the following residues are implicitly sorted: Dynamic programming, edit distance g(b + c) − g(a + c) ≤ g(b + c0) − g(a + c0) for 0 ≤ c ≤ c0. 1. INTRODUCTION We consider the following simple computation: given a se- Now consider the line 1, 2, ..., n. We will assign to each quence of values X1, ..., Xn and a function g(k) for 1 ≤ k ≤ point i on this line the value Y [i], i.e. the combination that n, compute for all 1 ≤ i ≤ n: achieves the minimum for a given i in Eq. 1. Observe that we can represent Y [i] with Xk, since given some Xk and Yi = min1≤k i. Proof. Let Xj be the element that achieves the mini- matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 9 mum for i + 1. Then: elements in block bi. Observe that 1 ≤ B ≤ n/2, since two elements form either an ascending or descending sequence, Xk + g(i + 1 − k) ≥ Xj + g(i + 1 − j) and in the case B = 1 the “function” is either convex or Xk − Xj ≥ g(i + c − j) − g(i + c − k). concave. This step takes O(n) time, and allows us to par- tition G into disjoint intervals where the property of sorted Which holds for all c ≥ 1, since the residues on the right-residues holds. hand side at most decrease. The algorithm works by moving through the list L and for What follows from Lemma 2.1 is that each element from X each block (start, end, asc), it finds the optimal combinations occupies at most one interval on our line i = 1...n. of Yi = Xk + Gi−k whenever i − k is inside the interval (start, end). Once these combinations have been found, they We would also like to know, given two elements X are stored as Yi if they are lower than the current value and k and Xj and j < k, for which gap values will X the algorithm moves onto the next block, where the same k be smaller than Xj . This leads to the inequality: process is repeated. Observe that these subproblems can be solved by a slight modification of the algorithms from Xk + g(a) ≤ Xj + g(b) ∀b, a : b − a = k − j. [1], where we use the convex algorithm if asc = 1 and the concave algorithm otherwise. Once we solve the subproblem Rearranging this gives us in a block, we will not revisit it and so the space can be X reused. The algorithm requires O(n) extra space, which k − Xj ≤ g(k + c) − g(j + c). matches the space requirements of [1]. We can now find the maximal value of c for which the in- equality still holds in O(lg n) by employing binary search. 
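To make the computation at the core of the paper concrete, here is a minimal sketch of the straightforward quadratic evaluation that the convex and concave algorithms accelerate. Since Eq. (1) is truncated in the extracted text, the recurrence is assumed to have the form Y_i = min over 1 <= k < i of (X_k + g(i - k)), which is how the surrounding definitions and the later expression X_k + G_{i-k} read; the function name and the 0-based indexing are ours.

def compute_y(x, g):
    """Straightforward O(n^2) evaluation of the assumed recurrence
    Y[i] = min over k < i of (X[k] + g(i - k)).

    x is 0-indexed here: x[0] plays the role of X_1 and y[i] of Y_{i+1} in the
    paper's 1-based notation. g(d) must be defined for gap sizes d = 1..n-1.
    y[0] has no valid predecessor and is left as None.
    """
    n = len(x)
    y = [None] * n
    for i in range(1, n):
        y[i] = min(x[k] + g(i - k) for k in range(i))
    return y

# Example with a convex gap function g(d) = d*d:
# compute_y([5, 3, 8, 1, 7], lambda d: d * d)

The convex-case algorithm described above avoids recomputing this inner minimum from scratch: it maintains, for each candidate X_k, the interval of positions on the line where that candidate is currently optimal, and updates the rightmost intervals using the binary search just described.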
The time required to solve the subproblem on block bi comes from the algorithm described in Section 2. In our case, the Now we are ready to describe the algorithm. First we assign binary search is performed on bi, so the time is O(n lg |bi|). X1 as the interval that covers the entire line. Then, the al- Over all subproblems the time becomes: gorithm works by traversing the list of candidates X2, ..., Xn and comparing the current candidate X B k with the element X X O(n lg |bi|). j , which corresponds to the rightmost interval on the line. One can now determine for which values of i the new candi- i=1 date is better, and the interval can be updated. In the case Turning the sum of logarithms into the logarithm of the that the new candidate is better than the previous one for product we get: the entire interval, we simply remove the old candidate, and B repeat the process on the rightmost remaining interval. Y O(n lg( |bi|)). i=1 In order to analyze the time required by the algorithm, we note that each new candidate from X will perform at least Assume B is fixed. Since the logarithm is a monotonically one binary search, which takes O(lg n) time. Multiple binary increasing function, the time is maximized when QB |b i=1 i| is searches may be performed once intervals are removed from maximized. Recall that the geometric mean is less than or the rightmost end of the line, but observe that each element equal to the arithmetic mean. Then we have: X is only added once, so after being removed it is never B reinserted. Therefore, the algorithm takes O(n lg n) time. Y ( |bi|)1/B ≤ n/B i=1 3. ALGORITHM B Y Observe that the function g only takes on integral inputs. ( |bi|) ≤ (n/B)B. Therefore, we can consider a more general computation: i=1 given a sequence of values X1, ..., Xn and a sequence of val- Thus, we can upper bound the time by O(Bn lg( n )). Re- ues G B 1, ..., Gn for 1 ≤ k ≤ n, compute for all 1 ≤ i ≤ n: gardless of G, the time is never worse than the straight- Y forward O(n2) algorithm, and achieves the O(n lg n) bound i = min1≤k 0. Arge et al.[4] presented a contruction method for the layout. The basic idea is to store all the points, sorted by the x- 2. CACHE-OBLIVIOUS DATA STRUCTURE coordinate, in the leafs of a balanced binary search tree, For the cache-oblivious data structure we chose the data stored implicitly using the van-Emde-Boas layout. All the structure from [4], as it’s the simplest and doesn’t have as- internal nodes carry additional information about the actual sumptions about B, and has the same running time. We points in the subtree. Then using the sweep line approach; first describe the data structure for two-sided queries and starting from −∞, we sweep a horizontal line upwards across how to construct it. After that we describe the construction the plane. Each time the sweep line is at point p, we traverse of the data structure for four-sided queries. the tree leaf-to-root and update all the nodes on the path. After this step, we have to check in the root of the tree, if 2.1 Data structure for two-sided queries there exists a sparse query. If it does, we have to traverse a root-to-leaf path (specified by the information in the internal The structure consists of two parts: a sequence L of length nodes), to find the point, we use to create the tree Y. The O(N ), called the layout, which stores the points of S, with whole process was proved[4] to take O(N log N ) memory the possibility of duplicate entries; and a balanced binary B transfers. 
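Returning briefly to the adaptive dynamic-programming paper above: the O(Bn lg(n/B)) bound derived there rests on splitting the sequence G into B blocks inside which the convex or concave algorithm applies. The sketch below shows one natural reading of that decomposition, namely maximal pieces whose consecutive differences are monotone (non-decreasing for a convex piece, non-increasing for a concave one); the paper phrases the blocks via sorted residues, so the exact boundary conventions may differ, and the function name is ours.

def convex_concave_blocks(g):
    """Partition the sequence G into maximal convex/concave pieces.

    Returns (start, end, convex) triples with end exclusive, over element
    indices; adjacent pieces share their boundary element. The number of
    pieces plays the role of the sortedness measure B: the convex algorithm
    is run on convex pieces and the concave algorithm on concave ones.
    """
    n = len(g)
    if n <= 2:
        return [(0, n, True)]
    d = [g[i + 1] - g[i] for i in range(n - 1)]
    blocks, start, direction = [], 0, 0   # direction: +1 convex, -1 concave, 0 unknown
    for i in range(1, len(d)):
        step = (d[i] > d[i - 1]) - (d[i] < d[i - 1])
        if direction == 0:
            direction = step
        elif step != 0 and step != direction:
            # The second differences change sign: close the current piece here.
            blocks.append((start, i + 1, direction >= 0))
            start, direction = i, 0
    blocks.append((start, n, direction >= 0))
    return blocks

# Example: convex_concave_blocks([1, 2, 4, 7, 5, 4]) splits the input into two
# pieces, one covering elements 0..3 and one covering elements 3..5.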
search tree Y on a subset of the y-coordinates of the points in S, stored implicitly in the van-Emde-Boas layout. Each 2.2 Data structure for four-sided queries leaf in the tree stores a pointer to an element of L. To answer a query (−∞, x] × [y, ∞), a search for y in Y is done and We first explain how to construct a data structure for three- then the pointer is followed into L. Then a scan forward sided queries, as it is a step to construct the data structure over L is performed until an x-coordinate greater than x is for four-sided queries. Our three-sided structure consists of found. a balanced binary tree T with the points in S stored at the leaves, sorted by their x-coordinates. T is laid out in The key point of the data structure is the way the points memory using the van-Emde-Boas layout, so that a root-to- are stored in the layout L. Assume that we are prepared leaf path can be traversed cache-obliviously in O(log N ) B to scan through αT points for α > 1, when answering a memory transfers. For each internal node v in T , let Sv be query with output T . We will call it a dense if we scan at the points of S stored in the subtree rooted at v. We store most αT points, and sparse otherwise. Consider the mini- the points in Sv in two secondary structures, Lv and Rv, mum y-coordinate y associated with v. Lv is a structure for answering two-sided 1 such that there exists a sparse query (−∞, x] × [y queries of the form (∞, x] × [y, ∞) (with the x-opening to 1, ∞). Let S0 be a sequence sorted by the x- coordinate, with y coosrdinate less than y the left); Rv is a structure for answering two-sided queries 1. As there are no sparse queries with the y-coordinate less than y of the form [x, ∞) × [y, ∞) (with the x-opening to the right). 1, we can answer those queries efficiently. As we repeat the step with the remaining points, we get a concatenation of sequences Each point is stored in two linear space structures on each S of the O levels of the tree T , the structure uses O(N log N ) 0, S1, ..., Sk of points, with which we can answer all the 2 queries efficiently. The problem with this approach is that space. To answer the query [xl, xr] × [yb, ∞), we take the the space can be more than linear, since the worst case size first node v such that xl is contained in the subtree rooted is Θ(N ). at he left child l, and xr is contained in the subtree rooted at the right child r. Then we query Rl with the query range To reduce the size of the layout we need to store the se- [xl, ∞) × [yb, ∞) and Lr with the query range (inf ty, xr] × quences in such a way, that every sequence S [yb, ∞). i is identical with Si+1 having a suffix valuse as large as possible. We take a point with the minimum y-coordinate, such that there ex-To construct the data structure for four-sided queries, we ists a sparse query (−∞, x need to apply the same method again. We store all the i] × [yi, ∞). Instead of storing all the points with y-coordinate less than y points, this time sorted by the y-coordinate, in a balanced i, we store only the points that has the x-coordinate less than x binary tree, stored using the van-Emde-Boas layout. For i too. With this improvement it can be proved that the size of the layout L each internal node v, let Sv represent the points from S is linear: stored in the subtree rooted in v. We store the points in Sv in two secondary structures Uv and Bv, associated with v. Uv is a structure for answering a three-sided query of the Lemma 1. [4]Layout L uses at most α N = O(N ) space. 
form [xl, xr]×[y, ∞); Bv is a structure for answering a three- α−1 sided query of the form [xl, xr] × [y, −∞). This construction To answer the two-sided queries we create, in addition to adds another factor of O(log2N ) to the space complexity, the leayout L, a binary search tree Y over the y-coordinates to get the final O(N log2 N ) space complexity of the data 2 we used to construct L, and store the whole tree using the structure for four-sided queries. van-Emde-Boas layout. Each leaf stores a pointer to the start of the sequence it produced in L.With both, the tree 3. RESULTS Y and the layout L, we can answer two-sided range queries As we implemented the mentioned data structures, we no- in O(log N T /B). ticed that the construction algorithm rely heavily on root- B matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 18 Table 1: Comparison between the cache-oblivious data structure (C-O) and the quadtree (QT). Construction time Query time Size Number of points C-O QT C-O QT C-O QT 100.000 30 s 0,06 s 5 ms 43 ms ∼1 GB 2MB 200.000 69 s 0,12 s 28 ms 90 ms ∼2,5 GB 4MB 1.000.000 7,5 min 0,7 s 132 ms 453 ms ∼7 GB 24MB 2.000.000 17 min 1.5 s 413 ms 951 ms ∼16 GB 50MB 10.000.000 ∼2 h 22 s 2817 ms 6259 ms ∼80 GB 300MB 100.000.000 ∼8 h 115 s 32523 ms 341282 ms ∼950 GB 3,5GB to-leaf and leaf-to-root traversals. Each computation to space. So there is actually a trade-off between the speed of find the index of the parent/child of a specific node takes the queries and the space. O(log N ) time, so we decided to precompute the indexes 2 for all the topologies of the trees, which drastically reduced 5. REFERENCES the construction time. [1] P. Afshani, C. Hamilton, and N. Zeh. Cache-oblivious range reporting with optimal queries requires We decided to compare the cache-oblivious data structure superlinear space. In SCG ’09: Proceedings of the 25th with a quadtree, probably the most used data structure annual symposium on Computational geometry. ACM for range queries in the RAM model. We compared the Request Permissions, June 2009. query time, construction time and the size of the data struc- [2] P. K. Agarwal, L. Arge, A. Danner, and ture. We tested on problems of different sizes, varying from B. Holland-Minkley. Cache-oblivious data structures for 100.000 to 100.000.000 points. All the coordinates were ran- orthogonal range searching. pages 237–245, 2003. dom 32-bits decimal numbers. All the queries were of type [3] A. Aggarwal and J. Vitter. The input/output (−∞, ∞) × (−∞, ∞), so all the points had to be returned. complexity of sorting and related problems. All the tests were run five times and the average results are Communications of the ACM, 31(9):1116–1127, 1988. shown in Table 1. All the tests were run on a Core 2 Duo @ [4] L. Arge and N. Zeh. Simple and semi-dynamic 2.53 GHz and 4GB of RAM. structures for cache-oblivious planar orthogonal range searching. In SCG ’06: Proceedings of the From the results of the tests, it can be seen that the cache- twenty-second annual symposium on Computational oblivious is much faster in answering the queries. The shaded geometry. ACM Request Permissions, June 2006. cells in Table 1 show, that the data structure for that specific size was already bigger than the available main memory, so [5] M. A. Bender, E. D. Demaine, and M. Farach-Colton. swapping occured. Note, that because of high locality for Cache-oblivious B-trees. SIAM J. 
Comput., the cache-oblivious data structure, swapping didn’t affect it 35(2):341–358, 2005. as much as it affected the quadtree. [6] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-Oblivious Algorithms. In On the other side, the size of the whole cache-oblivious data FOCS ’99: Proceedings of the 40th Annual Symposium structure is much bigger than the quadtree. The main reason on Foundations of Computer Science. IEEE Computer is that even if the key ingredient of the cache-oblivious data Society, Oct. 1999. structure (the two-sided data structure) has linear space [7] H. Prokop. Cache-oblivious algorithms. Master thesis, complexity, we need a lot of them (O(log2 N ) exactly). When Massachusetts Institute of Technology, Cambridge. 2 N is getting bigger, this factor is not negligible. The difference in the construction time is probably of the least importance, as the cache-oblivious data structure is a static data structure, so it can be precomputed. Despite that, it is interesting to notice, that 2/3 of the construction time is sorting the points, as the four-sided data structure needs the points sorted by their y-coordinate, the three-sided needs them sorted by their x-coordinate. So for every three-sided data structure we need to sort all the points. The same happens for every two-sided data structure, as it needs them sorted by their x coordinate. 4. CONCLUSIONS This is the first imeplementation of this data structure to the author’s knowledge. It can be seen from the results, that the cache-oblivious data structure, once constructed, is not affected by the swapping of blocks made by the operating system, as the theory suggested. On the other side, to achieve that, the implemented solution takes a lot more matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 19 matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 20 A new method for transforming algorithm into VHDL by starting from a Haskell functional language description ∗ Gergely Suba Prof. Dr. Péter Arató Budapest University of Technology and Budapest University of Technology and Economics Economics 2 Magyar tudósok körútja 2 Magyar tudósok körútja Budapest, Hungary, 1117 Budapest, Hungary, 1117 sugergo@iit.bme.hu arato@iit.bme.hu ABSTRACT 1. INTRODUCTION In the field of computer engineering, there are a lot of prob-Although the speed of the conventional processors is growing lems that are too time-consuming like biological or physical continually, there are a lot of problems that are too time- calculations and that’s why they have to be implemented consuming like biological or physical computing and that’s using special hardware structures. Usually, these hardware why they have to be implemented in a more efficient hard- structures are described in HDL (Hardware Description Lan- ware structure and sometimes in special hardware. Usually, guage). Developing in HDL languages is not as efficient as these hardware structures are described in HDL (Hardware it would be in case of software languages due to its low level Description Language). These languages are not so well-structures. known by the mathematicians and software engineers, they usually write the algorithms in general-purpose program- The aim of this work is to implement and test a method ming languages. 
There is a great difference between using (a compiler program), where the starting point is a code HDL or software languages, and the chance to transform written in the functional language Haskell and the output these two representations into each other is rather small. To is the same algorithm in VHDL (a kind of HDL) language. overcome this gap, there is plenty of research that are very The main advantage of the novel method presented in this important parts of the so called System Level Synthesis. paper is that it generates a synthesizable VHDL description from Haskell code automatically for FPGA implementation. Hardware synthesis starting with a functional software lan- guage has some advantages compared to a hardware descrip- The method introduced in this paper can solve two sepa- tion language. Besides, modifying the hardware when the rate problems: 1) running algorithms effectively in FPGA, algorithm changes is easier, because a code written in func- 2) development of digital hardware implementing a speci- tional language can be adapted more directly than one writ- fied function. We demonstrate the efficiency of the method ten in a hardware description language. Transforming an through practical examples like a part of the MP3 decoding algorithm to HDL can be easily automated by the novel com- algorithm. piler program introduced in this paper. The program code written in functional language can be run in PC, therefore it can be tested easily. In contrast to this, the hardware de-Categories and Subject Descriptors scription languages require a complex simulation for testing, C.3 [Special Purpose and Application-based Systems]: where the inputs and outputs are handled as digital signals, Real-time and embedded systems; D.2.2 [Software Engi- which is an additional complication. neering]: Design Tools and Techniques Based on the above arguments, the aim of this work is to Keywords implement and test a method (a compiler program), where the starting point is a code written in a functional language HLS, FPGA, Haskell, VHDL, dataflow graph and the output is the same algorithm in HDL language. ∗ Functional languages have benefits in case of digital signal Prof. Dr. Péter Arató is the supervisor. Department of Control Engineering and Information Tech- processing and it fits the capabilities of the hardware world nology, Faculty of Electrical Engineering and Informatics better than the imperative ones. I focus on the Haskell [9] functional language, which is being developed dynamically, and it supports the research work well. The main advantage of the method presented in this paper is that it generates the synthesizable VHDL [6] description from Haskell code automatically for FPGA implementation. Even a pipeline optimization tool can be involved by the pro- cedure such as the high level synthesis tool PIPE, developed at the Department of Control Engineering and Information Technology in BME. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 21 We demonstrate the efficiency of the method through prac-Algorithm tical examples like a PID controller and a part of the MP3 writtenhinhHaskell User decoding algorithm. Compiler OperationSet parameter For the method, we defined the following requirements: Compiler FOpMap Haskell-EOG 1. compile Haskell to VHDL code automatically compiler 2. notify the user about the error details in case the code cannot be compiled HLShoptimization EOG 3. 
make the compiler program modular, so that its parts can be used separately OpVhdlMap EOG-VHDL 4. ensure that pipeline optimization methods can be in- compiler cluded if necessary VhdlModules VHDLhcode 5. the operation set should be dynamically extendable, Externalhsoftware without modifying the compiler program FPGAhsynthesizer Among some functional languages (Lava [3], µF P [12], Ruby [5], SASL [4]) that can use to generate HDL code, the most Hardware similar project is the CλasH (CAES Language for Synchronous Hardware) [2, 7], nevertheless it meets only the first two FPGAhconfiguration requirements from the above list. However, the purpose of CλasH is different from the method introduced in this paper. Figure 1: The structure of the compiler method Although, CλasH is based on Haskell syntax, it is a hardware description language, in contrast to our method where the purpose is a language without hardware concepts. CλasH also has some disadvantages compared with the method in known homogeneous synchronous dataflow (HSDF) [8, 11] this paper. It has no pipeline optimization stage, and it can graph, where the vertices are the operations and the edges not include external optimization methods, as it doesn’t use are dataflow links. In this representation, HLS optimization dataflow graphs as intermediate representation. The opera- stage can be included, such as the HLS system PIPE [1]. tion set is constant, thus new operations can not be added After the optional optimization, the EOG-VHDL compiler dynamically. as backend part will produce the expected VHDL output. From VHDL, an external FPGA development software can The method and the compiler program introduced in this synthesize the FPGA bitstream, which can result in a real paper meets Requirement 1, as it compiles the code auto- FPGA application. matically. In our method the GHC frontend is used to pre- process the source code that includes the parser, lexer and 2.1 Haskell-EOG compiler the generation of the abstract syntax tree. Therefore, re- The frontend of our compiler takes the source code written porting of the compiler errors (Requirement 2) is ensured in Haskell, and produces the EOG via the successive stages by GHC. Our method is based on modular structure: it is introduced in this section. divided into frontend and backend parts, and these parts consist of inner stages as it will be detailed later (Require-Stage 1. Produce the AST Core ment 3). Between the frontend and backend, the interface The first steps are to parse and analyze the code, and pro- is a dataflow graph, which is an intermediate representation duce the abstract syntax tree (AST). These steps are made of the algorithm. Most of the pipeline optimization sys- by the GHC frontend, which produces the GHC Core [13] tems are based on dataflow graphs, therefore these stages tree. This Core tree mostly consist of the following type of can be included in our system (Requirement 4). An exter- nodes: Lit (constant literal), Var (using of variable), App nal database is used as operation set, therefore expanding (lambda application, a.k.a. function calling), Lam (lambda the set of operations dynamically is simple (Requirement 5). abstraction, a.k.a. function body) and Case (branching). In the further steps of the frontend pipeline these nodes have to 2. THE PROPOSED COMPILER be converted to EOG nodes and edges by successive graph ARCHITECTURE transformations. In this section the architecture of our Haskell to VHDL com- piler system will be introduced. Stage 2. 
Eliminate of the branching The second step is to eliminate all of the branching (Case) The structure of the compiler method is shown in Figure nodes from the Core. During the elimination every branch 1. As it can be seen, the structure is divided into frontend is to be converted to a special function calling (App) node, and backend parts. The first is the Haskell-EOG compiler, which performs the branching as a black box. This step is a which produces the elementary operation graph (EOG) [1] as so-called Core2Core transformation, as its input and output intermediate dataflow representation. EOG is a kind of well are Core trees with the same structure. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 22 Stage 3. Produce the operation dependency tree Elementary operation graph The operation dependency tree (ODT, here which is an in- termediate data structure between the Core and the EOG OperationSet representation) consists of the operation nodes and the de- Preprocess Make pendency edges between the operations. Two types of nodes EOG ModuleList OpVhdlMap are defined in this tree: operation nodes and pointer nodes. VhdlModules An operation node has as many children as the number of its operands. A pointer node is always leaf and it can be Operations ModulList considered as a reference to an operation node. If an oper- ation node has a pointer node child, it will mean the same function as its child would be the operation node referenced by the given pointer node. Produce Produce Produce Operations Signals Components TopModule Producing the ODT is the largest part of the Haskell-EOG Template compiler. The main concept is supercompilation [10], where Generate the idea is to travel the Core tree in the same order as the VHDL CPU would process the statements of this functional code. During this graph traversal, the following conversions are performed to produce the ODT (as the output tree) from VHDL top module the Core description: Figure 2: EOG-VHDL compiler • Var is transformed to pointer node in ODT • Lit is transformed to constant operation node in ODT operations will result in component declarations in VHDL • App (if it is elementary) is transformed to elemen- (it is generated by the Produce Components task). Each tary operation in ODT EOG operation will result in component instantiation and • App (if it is complex): the body of the function must each dataflow link will result in signal declarations, which be inlined, and the nodes contained by the inlined tree interconnect the component instantiations. These parts are has to be processed as every other node generated by the Produce Operations and Produce Sig- nals tasks in Figure 2. • Lam can be eliminated without restriction • Case nodes have already been eliminated in Stage 2 Finally, the Generate VHDL task creates the output VHDL (the top module), from the produced VHDL parts discussed Stage 4. Produce the EOG before. The final step in the Haskell-EOG compiler is to transform the ODT to elementary operation graph, which is a sim- The EOG-VHDL compiler produces a single, structural VHDL pler process, than the previous stages were. Every opera- file, which contains only certain types of language structures: tion node in ODT will also be operation in EOG, and every component and signal declarations and component instan- dataflow link will also be link in EOG. The difference is that tiations. 
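To illustrate the shape of the backend output described above, here is a toy Python sketch that emits a structural VHDL top module from a small in-memory EOG. The graph encoding, the port-naming convention, and the single-output assumption are ours, not the paper's; in the real system the component interfaces come from the OperationSet database, and one signal is declared per dataflow link, whereas this sketch reuses one signal per producing operation to handle fan-out. It is only meant to mirror the Produce Signals, Produce Operations and Generate VHDL tasks at a glance.

def eog_to_vhdl(entity, operations, links, width=32):
    """Toy structural-VHDL emitter for an elementary operation graph (EOG).

    operations: list of (instance_name, component_name) pairs.
    links:      list of (src_instance, dst_instance, dst_port_index) triples.
    One signal is declared per producing operation and one component
    instantiation per operation. Component declarations are omitted.
    """
    sig = lambda op: f"s_{op}"
    lines = [f"entity {entity} is end {entity};",
             f"architecture structural of {entity} is"]
    # Signals carrying the dataflow between component instantiations.
    for op, _ in operations:
        lines.append(f"  signal {sig(op)} : std_logic_vector({width - 1} downto 0);")
    lines.append("begin")
    # One component instantiation per EOG operation, wired through the signals.
    for op, component in operations:
        ins = [f"in{port} => {sig(src)}" for src, dst, port in links if dst == op]
        ports = ", ".join(ins + [f"out0 => {sig(op)}"])
        lines.append(f"  {op}_inst : {component} port map ({ports});")
    lines.append("end structural;")
    return "\n".join(lines)

# Example: print(eog_to_vhdl("adder_top",
#                            [("a", "const_op"), ("b", "const_op"), ("sum", "add_op")],
#                            [("a", "sum", 0), ("b", "sum", 1)]))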
The behavior of the components are defined in the pointer nodes must be resolved, and it has to be substi- VHDL files separately component-by-component (these files tuted by the operation node referenced by the pointer node. are stored in the OperationSet). In practice, if the end of a link is a pointer node, this end has to be replaced by the referenced operation node. After 3. MULTI-RATE EXTENSION this transformation, the dataflow structure is obtained, not The EOG, previously used as an interface between the com- necessarily being a tree anymore. piler frontend and backend, is a single-rate dataflow graph, which is a huge restriction. EOG can only represent dataflow, 2.2 EOG-VHDL compiler where each operation is performed once during one restart The architecture of the EOG-VHDL compiler, which is the of the system (in other words: it is performed once for each backend of the whole method and system introduced in this input of the system). Although, certain loops can be im- paper, can be seen in Figure 2. First, the EOG goes through plemented in that case too, it is not an efficient way due to a preprocess task (Preprocess EOG in Figure 2), which the increasing number of operation nodes. (for example, if collects all of the information about the EOG nodes, then a loop iterates between 1 and 5, the operations of the loop saves it into the store Operations. Another task (Make body have to be replicated 5 times) For overcome this re- ModuleList in Figure 2) creates the store ModuleList, striction, a modified EOG is introduced, where groups of which will contain all of the necessary information about operations can be performed in different number of times the modules used in the input EOG. These modules have during one restart period. to exist in the external OperationSet, because the Make ModuleList task loads the information from that for each The extended version of EOG introduced in this section is operation. called multi-rate EOG (MR-EOG). In MR-EOG, groups of operations (blocks) can be defined, where each block con- From the two stores created before, the produce tasks will tains one or more operations and optionally other block(s). generate the parts of the VHDL output. All of the used In this way the blocks are built up in a hierarchy, and the matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 23 top of the hierarchy is a block denoted by TOP in further. Program Num. of Num. of Num. of operations links blocks i 1 PID controller 29 36 0 2 MP3 SFB 93 133 6 a g h j 3 MP3 ssum 19 25 1 k 4 MP3 uconv 32 45 0 b e f 5 MP3 usum 11 16 1 c d Table 1: Summary of the compiler tests Figure 3: MR-EOG Haskell code automatically, 2) it reports in case of any com- piler errors, 3) the structure of the compiler is modular (it An example can be seen in Figure 3. Here the block TOP has a frontend and a backend part), thus the parts could contains the operations a, i, j and k, and two inner blocks. be used separately in other projects. 4) external pipeline One of that contains the operations g and h, and the other optimization stage can be included, since it has a dataflow contains b, e and f, and its inner block contains the opera- graph as intermediate representation. 5) the operation set tions c and d. is able to be extended, because they are from an external database separated from the compiler program itself. 
For the new multi-rate representation, there is an important purpose: ensure that the general algorithms performed in The method, including its extended variant for multi-rate dataflow graphs (such as pipeline optimization algorithms) tasks, was tested with practical applications, such as a PID can also be used in MR-EOG. The solution is running the controller and an MP3 synthesis filter bank algorithm. algorithms recursively, starting with the innermost block. Since the innermost block does not contain inner blocks, the 6. REFERENCES methods developed for a single-rate dataflow graph can be [1] Peter Arato, Visegrady Tamas, and Istvan Jankovits. performed without restriction. If the algorithm finish with High Level Synthesis of Pipelined Datapaths. John a given block, the block has to be substituted by a virtual Wiley & Sons, Inc., New York, NY, USA, 2001. operation. The execution time of this operation will be the [2] C. P. R. Baaij. Cλash: From haskell to hardware. same as the execution time of the whole block was. Master’s thesis, Univ. of Twente, December 2009. [3] Per Bjesse, Koen Claessen, Mary Sheeran, and When a block contains only operations (including virtual Satnam Singh. Lava: hardware design in haskell. operations), the same algorithm can be run on it. In the SIGPLAN Not., 34:174–184, September 1998. end, even the block TOP is performed, and the algorithm is [4] Simon Frankau. Hardware synthesis from a finished for the whole MR-EOG hierarchy. stream-processing functional language, 2004. [5] Shaori Guo and Wayne Luk. Compiling ruby into 4. SOME CHARACTERISTIC TASKS FOR fpgas. In Proceedings of the 5th International TESTING Workshop on Field-Programmable Logic and We implemented the compiler program based on the method Applications, FPL ’95, pages 188–197, London, UK, introduced above in Haskell language. For testing the func- 1995. Springer-Verlag. tionality, two characteristic tasks (written also in Haskell) [6] ISO/IEC. Ieee 1076-2008: Ieee standard vhdl language were chosen. The first one is a PID controller, which is a reference manual. Technical report, Institute of single-rate algorithm, therefore the method written in Sec- Electrical and Electronics Engineers, Computer tion 2 can even compile this code to generate the desired Society, 26. January 2009. VHDL. The second one is the synthesis filter bank (SFB) [7] M. Kooijman. Haskell as a higher order structural part of the MP3 decoder algorithm. Since it contains sev- hardware description language, December 2009. eral loops, it can only be represented by multi-rate types of [8] E.A. Lee and D.G. Messerschmitt. Synchronous data dataflow graphs efficiently. The multi-rate extension of the flow. Proceedings of the IEEE, 75(9):1235 – 1245, sept. method detailed in Section 3 was used to compile this task. 1987. [9] Simon Marlow. Haskell 2010 language report, 2010. A brief summary of the functional tests are shown in Table [10] Neil Mitchell. Rethinking supercompilation. SIGPLAN 1. Row 1 shows the characteristic numbers of the PID con- Not., 45:309–320, September 2010. troller, while Row 2 shows the same for the MP3 SFB. The [11] K.K. Parhi. Algorithm transformation techniques for subtasks of the MP3 SFB can be seen in Row 3, 4 and 5 concurrent processors. Proceedings of the IEEE, separately. 77(12):1879 –1895, dec 1989. [12] Mary Sheeran. ufp, an algebraic vlsi design language - 5. CONCLUSIONS phd thesis, 1983. A compiler program is developed on base of the novel method [13] Andrew Tolmach. 
An external representation for the introduced in this paper. The method meets the require- ghc core language, 2009. ments that was set before: 1) it generate the VHDL from matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 24 A system for parallel execution of data-flow graphs Andrej Bukošek FRI, University of Ljubljana Večna pot 113, 1000 Ljubljana, Slovenia andrej.bukosek@gmail.com ABSTRACT 1. INTRODUCTION This paper describes the system we developed for performing Computer generation and manipulation of images is becom- arbitrary operations on data in parallel using a data-flow ing more and more prevalent in the film industry [2]. From graph [3]. compositing of complex rendered scenes with live actors to simple colour grading, such tasks are accomplished with spe- Each operation is implemented in a dynamically loadable cialised software that allows its users to create a data-flow module or using a domain-specific language, which was de-graph of operations, which are applied to the footage. signed specifically for this purpose. We also implemented a compiler for this language. Our domain-specific language Our goal was to create a generalised and more flexible soft- is functional and strongly typed. We designed its type sys- ware package for performing arbitrary operations on any tem to be modular. Every data type is implemented in an data using a data-flow approach. external dynamically loadable module, which the compiler loads during its initialisation. Each module contains func- The core functionality of the system is the parallel execution tions for generating an intermediate representation for the of data-flow graphs. Each node in a graph represents an LLVM system, which optimises it and translates it into ma- operation on data, which passes through its edges. chine code. To ease creation of new graph operations, we developed a As an example usage of our system, we developed operations domain-specific language. Operations can also be imple- for image manipulation, compositing, and rendering of 3D mented in any language that supports compiling code into scenes. Such a set of operations is commonly used in the shared objects with a standard C ABI (these plugins are film industry for the creation of special effects. then loaded at runtime). Our domain-specific language is functional and strongly typed. An interesting feature is its We also implemented a renderer based on the path tracing type system. Each type is implemented in an external shared algorithm, which creates an image from the description of object (plugin), which enables users to add custom types and a 3D scene. This method is based on a physically-correct parsers for constant values. simulation of light bouncing around the scene. As a practical usage of this system, we also implemented a Categories and Subject Descriptors variety of operations and types. A particularly interesting operation is a 3D scene renderer, which employs the Monte D.1.3 [Programming Techniques]: Parallel programming; Carlo path tracing algorithm to generate physically-correct D.3.2 [Programming Languages]: Functional languages; images of 3D scenes [4, 5]. D.3.2 [Programming Languages]: Specialized applica- tion languages; D.3.4 [Programming Languages]: Com- pilers; E.1 [Data Structures]: Graphs; I.3.4 [Computer 2. 
SYSTEM ARCHITECTURE Graphics]: Graphics packages; I.3.7 [Computer Graph- ics]: Raytracing, Animation; I.4.0 [Image Processing and Figure 1 shows an overview of the system components. Computer Vision]: Image processing software General Terms Task scheduler I/O modules Design, Languages, Algorithms Script Keywords Gr Oper aph executor ation modules data-flow graphs, parallelisation, domain-specific language, compositing, rendering, computer graphics User interface Compiler Type modules Supervisors Dr. Andrej Brodnik, Dr. Borut Robič Figure 1: System architecture. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 25 In the most common use case, the system accepts a script Each type module can also define a parser hook for con-written in our domain-specific language. This script de- stants. While looking for constants, the lexer gives full con-scribes the data-flow graph (nodes and connections) using trol to such hooks. This means that the syntax of constant standard function calls (this could be achieved in any other values can be completely arbitrary. One could even include language if the system was converted into a library). an interpreter for another programming language and use that to generate a constant by running a program. First, the system loads and initialises I/O and operation modules. This is followed by initialisation of the compiler, 3.2 Grammar which also loads the type modules. Then the script is com- The grammar for our language in BNF [1]: piled into machine code in memory and executed. The script forms the data-flow graph to be executed in parallel. The hidentifierexpr i ::= hidentifier i | hidentifier i ‘(’ hexpressioni* ‘)’ graph is executed after script execution finishes. hconstexpr i ::= hconst i hparenexpr i ::= ‘(’ hexpressioni ‘)’ The main advantage of having external modules for I/O, hifexpr i ::= ‘if’ hexpressioni ‘then’ hexpressioni operations, and types is greater flexibility, as the main pro- ‘else’ hexpressioni gram doesn’t need to be rewritten or recompiled to support hforexpr i ::= ‘for’ htypei hidentifier i ‘=’ hexpressioni ‘,’ new file formats, operations, and data types. hexpressioni [‘,’ hexpressioni] ‘do’ hexpressioni hwithexpr i ::= ‘with’ htypei hidentifier i [‘=’ hexpressioni] The subsequent sections of this paper contain more detailed (‘,’ htypei hidentifier i [‘=’ hexpressioni])* ‘do’ hexpressioni explanations of the various subsystems. hprimaryi ::= hidentifierexpr i | hconstexpr i | hparenexpr i 3. OUR DOMAIN-SPECIFIC LANGUAGE | hifexpr i | hforexpr i The main purpose of the language we created is to make | hwithexpr i developing new graph operations easier, especially for people hunaryi ::= hprimaryi | hunopi hunaryi who aren’t professional programmers. Its secondary purpose hbinoprhsi ::= (hopi hunaryi)* is to provide a language in which users can describe the hexpressioni ::= hunaryi hbinoprhsi data-flow graph, however, this functionality could easily be hprototypei ::= htypei hidentifier i ‘(’ [htypei hidentifier i [‘,’ accomplished using an existing language. htypei hidentifier i]*] ‘)’ hfunc definitioni ::= ‘func’ hprototypei hexpressioni Our language is functional, strongly typed and compiled into hextern definitioni ::= ‘extern’ hprototypei machine code for greater performance. 
hprogrami ::= (hfunc definitioni | hextern definitioni)* Optimisations and machine code generation are provided by The symbol hidentifier i represents the name of a variable the LLVM framework (http://www.llvm.org/). The com-or function. It can contain alphanumeric characters and piler we developed compiles source code into LLVM inter- underscores, but cannot start with a digit. mediate representation, on which we run various LLVM op- timisation passes and finally generate native machine code hconsti represents a constant value, which can be parsed by for the architecture of the machine the program is running any of the type modules. When encountering this symbol, on. The code is compiled directly to memory, no temporary the lexer goes through all the defined hooks in type modules files are created during compilation. until it finds one that accepts the constant. htypei represents a type keyword. Each type module defines 3.1 Type system a unique keyword for its type. All types in the language are implemented as external shared objects (plugins). Each plugin module implements functions Comments in the language begin with a # sign and continue for generating code for operators (e.g. adding two real num- until the end of the line. bers, multiplying a matrix with a vector, etc.) and parsing constants (optional). The code generation functions actually The precedence of operators is shown in table 1. generate LLVM intermediate representation instructions. Operators Precedence Apart from operators, each module can also generate code ; 1 (binds weakest) for various library functions for that type (e.g. functions = 2 for dot and cross product in the case of a 3D vector type). or 6 One of the library functions for composite types (e.g. vectors xor 7 and matrices) is also the constructor for that type. The and 8 constructor function allocates memory for the structure and ==, <> 9 sets its members to the values passed to the constructor. <, >, <=, >= 10 +, - 20 One of the advantages of generating code for operators and *, / 40 (binds strongest) functions (as opposed to simply implementing them and gen- Table 1: Precedence of binary operators. erating function calls instead) is that this enables further optimisations. Currently, only two unary operators are implemented — unary minus (-) and logical negation (not). matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 26 3.3 Code examples into the global data-flow graph. Nodes that output con- All constructs in our language are functional expressions and stants have an additional argument in their constructor. In-can be combined in the same way as arithmetic expressions. dividual nodes can be connected using the function The language is not whitespace sensitive. connect(node1, "outputName", node2, "inputName"), which connects output port outputName on node1 to input Let’s look at how we might calculate the n-th Fibonacci port inputName on node2. 
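As a rough illustration of how constant parsing can be delegated to type modules, the Python sketch below keeps a registry of per-type parser hooks and lets a lexer try them in turn until one accepts the text. The hook signature, the int/real examples and the return convention are invented for this sketch; the actual modules are shared objects loaded at compiler initialisation, and their code-generation functions emit LLVM intermediate representation rather than Python values.

```python
from typing import Callable, List, Optional, Tuple

# A parser hook receives the remaining source text and either returns
# (parsed value, number of characters consumed) or None if it rejects it.
ConstHook = Callable[[str], Optional[Tuple[object, int]]]

type_hooks: List[Tuple[str, ConstHook]] = []    # (type keyword, hook)

def register_type(keyword: str, hook: ConstHook) -> None:
    """Called once per type module when the compiler initialises."""
    type_hooks.append((keyword, hook))

def parse_real(src: str):
    n = 0
    while n < len(src) and (src[n].isdigit() or src[n] == "."):
        n += 1
    return (float(src[:n]), n) if "." in src[:n] else None

def parse_int(src: str):
    n = 0
    while n < len(src) and src[n].isdigit():
        n += 1
    return (int(src[:n]), n) if n else None

register_type("real", parse_real)   # tried first, accepts e.g. "0.05"
register_type("int", parse_int)     # accepts e.g. "42"

def lex_constant(src: str):
    """The lexer hands full control to each hook until one accepts the text."""
    for keyword, hook in type_hooks:
        result = hook(src)
        if result is not None:
            value, consumed = result
            return keyword, value, src[consumed:]
    raise SyntaxError("no type module accepted the constant")

print(lex_constant("0.05, r2 = Real(0.10)"))   # ('real', 0.05, ', r2 = Real(0.10)')
```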
number in our language: To get a better idea of how this works, let’s take a look at a func int fib(int n) if n < 3 then 1 else fib(n-1) + fib(n-2) real-world example: func int main() An iterative version using real numbers would look like this: with node s1, node s2, node s3, node s4, node s5, node c1, node r1, node r2, func real fibr(real n) node li1, node li2, node ck1, with real a = 1.0, real b = 1.0, real c do node cb1, node si1 (for real i = 3.0, i < n do do ( c = a + b; # Create constant generators a = b; s1 = String("bg.png"); b = c); s2 = String("fg.png"); b # this is the value that the function returns c1 = Color(color(0.0, 1.0, 0.0, 1.0)); r1 = Real(0.05); r2 = Real(0.10); Variables are declared with the keyword with and are valid s3 = String("over"); in the expression following the keyword do. The for loop is s4 = String("f"); enclosed in parentheses due to the priority of the ; operator. s5 = String("out.png"); This operator behaves similarly to , (comma) in C. # Create operations li1 = LoadImage(); 4. DATA-FLOW GRAPHS li2 = LoadImage(); ck1 = ChromaKey(); Data-flow graphs consist of nodes, which represent opera- cb1 = Combine(); si1 = SaveImage(); tions, and edges, which facilitate the flow of data between nodes. In our implementation, the graphs are directed and # Connect ports acyclic, as this is powerful enough to represent all current connect(s1, "const", li1, "fileName"); connect(s2, "const", li2, "fileName"); use cases and simplifies parallelisation. connect(c1, "const", ck1, "keyColor"); connect(r1, "const", ck1, "tolNear"); connect(r2, "const", ck1, "tolFar"); connect(li2, "imageOut", ck1, "imageIn"); connect(s3, "const", cb1, "mode"); connect(s4, "const", cb1, "clipTo"); connect(ck1, "imageOut", cb1, "fgImage"); in2 connect(li1, "imageOut", cb1, "bgImage"); in1 in3 connect(cb1, "imageOut", si1, "imageIn"); connect(s5, "const", si1, "fileName"); Operation ); 0 out1 out2 This script generates the graph shown in Figure 3. String: String: Color: Figure 2: Node structure. Real: Real: bg.png fg.png rgb(0,1,0) 0.05 0.10 const const const const const Each node can have input and output ports (see Figure 2). An input port is merely a pointer to an output port of an- fileName fileName other node. All input ports must be connected to something. LoadImage LoadImage Output ports also contain the data or results of the opera- imageOut imageOut tion of its node. Output ports can remain unconnected. imageIn keyColor tolNear tolFar ChromaK String: String: ey over f Nodes without any input ports are usually generators of con- imageOut const const stants, while nodes without output ports usually save or dis- play the results. fgImage mode clipTo String: Combine bgImage out.png Each port has a data type associated with it. If the types imageOut of an input and an output port don’t match, no connection const between them can be made. imageIn fileName SaveImage 5. GRAPH CREATION Figure 3: Graph example (image compositing). Graphs can be created using our domain-specific language. Every operation gets its own constructor, which returns a The generated graph loads two images (bg.png and fg.png), node handle and automatically adds the newly created node performs chroma keying on the foreground image, overlays the result over the background image, and saves the final composite into out.png. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 27 6. GRAPH EXECUTION Figure 5 shows an example image generated with our system. 
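The node-and-port model used by the graph-construction script (constructors return node handles, and connect joins a named output port to a named input port of matching type) can be summarised with a small sketch. This is a schematic Python stand-in, not the system's API; the Node class and the type keywords "string" and "image" are assumptions for the example, while the port names mirror the compositing script above.

```python
class Node:
    """A data-flow node with typed input and output ports.

    An input port is just a reference to another node's output port; output
    ports will hold the data produced by the node's operation.
    """
    def __init__(self, name, inputs, outputs):
        self.name = name
        self.in_types = inputs                     # {port name: type keyword}
        self.out_types = outputs
        self.in_links = {}                         # port -> (source node, output port)
        self.out_data = {port: None for port in outputs}

def connect(src: Node, out_port: str, dst: Node, in_port: str) -> None:
    """Connect an output port to an input port, refusing mismatched types."""
    if src.out_types[out_port] != dst.in_types[in_port]:
        raise TypeError(f"type mismatch: {src.name}.{out_port} -> {dst.name}.{in_port}")
    dst.in_links[in_port] = (src, out_port)

# A fragment of the compositing example: a constant string feeding LoadImage.
s1  = Node("String:bg.png", inputs={}, outputs={"const": "string"})
li1 = Node("LoadImage", inputs={"fileName": "string"}, outputs={"imageOut": "image"})
connect(s1, "const", li1, "fileName")
print(li1.in_links["fileName"][1])                 # the 'const' port of s1
```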
During execution, each operation gets a parameter, which Rendering and compositing of elements in the picture was tells it for which moment in time (t ∈ made entirely using our system. N0) it should per- form that operation. This is useful in graphs that generate animations, in which case t represents the frame of the ani- mation. The currently implemented ways of executing a data-flow graph are: • single execution — executes the graph only once for a given t • multiple execution — executes the graph for many given t’s • continuous execution — executes the graph until interrupted by the user (t starts at 0 and rises mono- tonically). To execute the graph, we convert operations within it into tasks for our batch job scheduler and set their priorities ac-Figure 5: George, I think we’re lost. cording to the dependencies from the graph. 8. CONCLUSION The main goal was to create a generalised and flexible soft- 4 ware package for performing arbitrary operations on any data using a data-flow approach. To this end we developed a system for parallel execution of data-flow graphs with its 3 own domain-specific language, created to facilitate the im- plementation of graph operations. As an example of the sys- 2 tem’s flexibility, we implemented various compositing, ren- dering, and general image processing operations [3]. 1 Although the system is already in a usable state, many fu- 0 ture improvements could be made. Implementing a graph- ical user interface for creating graphs would be most ben- Figure 4: Decomposing the graph into levels. eficial, as it would make the system more accessible and easier to use. The scheduler could be improved to handle distributed scheduling. As shown in Figure 4, we can decompose the graph into levels. Operations which share the same level are mutually The system gets more useful as more operations are added. independent and can be executed in parallel. Possible areas for future development of operations include: digital signal processing, computer vision, data mining, and The parallel schedule is generated using a greedy approach, hardware control. Utilising operations from these areas, the starting at the nodes with no outputs and working upward system could collect data from sensors, analyse it, and visu- to satisfy dependencies. alise the results. It could also be used to control a robot or industrial machinery. The batch job scheduler is currently custom-made. It spawns as many threads as there are processor cores in the system. 9. REFERENCES These threads then obtain work from a global priority queue, [1] J. W. Backus, F. L. Bauer, J. Green, C. Katz, which contains the tasks. Threads with higher priority have J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, to finish before those with a lower one can be run. B. Vauquois, J. H. Wegstein, A. van Wijngaarden, and M. Woodger. Revised report on the algorithm language ALGOL 60. Communications of the ACM, 6(1):1–17, 7. OPERATIONS Jan. 1963. Implemented operations, sorted by area of usage: [2] R. Brinkmann. The art and science of digital compositing. Morgan Kaufmann Publishers Inc., 1999. • compositing [2] — CorrectGamma, Combine, Pre- [3] A. Bukošek. A system for parallel execution of multiply, Unpremultiply, ToneMap, ChromaKey, Dif- data-flow graphs. FRI, University of Ljubljana, 2013. fKey, Resize, Crop, AffineTransform, Convolve2D • [4] M. Pharr and G. Humphreys. 
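The level decomposition of Figure 4 and the level-by-level parallel execution can be sketched as follows. This simplified Python version assigns each node the length of its longest path to a sink, runs the levels from upstream to downstream, and uses a generic thread pool in place of the custom priority-queue scheduler described in the text; the toy graph at the end is an assumption for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
import os

def assign_levels(successors):
    """Level of a node = longest path to a node with no outputs (a sink).
    Nodes on the same level cannot depend on each other."""
    memo = {}
    def level(node):
        if node not in memo:
            nxt = successors.get(node, [])
            memo[node] = 0 if not nxt else 1 + max(level(s) for s in nxt)
        return memo[node]
    for node in successors:
        level(node)
    return memo

def run_graph(successors, work):
    """Execute the graph level by level; a level's tasks run in parallel."""
    levels = assign_levels(successors)
    by_level = defaultdict(list)
    for node, lvl in levels.items():
        by_level[lvl].append(node)
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        for lvl in sorted(by_level, reverse=True):      # upstream levels first
            list(pool.map(work, by_level[lvl]))

# Toy graph: two loaders feed a combine node, which feeds a save node.
edges = {"load_a": ["combine"], "load_b": ["combine"],
         "combine": ["save"], "save": []}
run_graph(edges, lambda node: print("running", node))
```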
Physically Based rendering [4, 5] — Sphere, LoadTriangleMesh, Cam-Rendering, Second Edition: From Theory To era, LoadSpectrum, BRDFMaterial, LambertianBRDF, Implementation. Morgan Kaufmann Publishers Inc., AshikhminBRDF, ApplyEmission, ApplyNormalMap, 2010. SceneGraphNode, AddChild, PathTrace • general — LoadImage, SaveImage, LoadFrame, Save- [5] P. Shirley and R. K. Morley. Realistic Ray Tracing, 2nd Frame edition. A. K. Peters, Ltd., 2003. matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 28 An Algorithmic Framework for Real-Time Rescheduling in Public Bus Transportation ∗ Balázs Dávid János Balogh University of Szeged University of Szeged davidb@jgypk.u-szeged.hu balogh@jgypk.u-szeged.hu ABSTRACT complicated disruptions, but this solution is usually really In this paper, we describe an algorithmic framework for the expensive. Operators might not be able to solve compli- vehicle rescheduling problem. This framework is based on cated disruptions in short time, but a decision support sys- problems arising in the operative planning of real-world pub- tem could provide them with helpful suggestions in such a lic transportation companies. case. Categories and Subject Descriptors The goal of our paper is to introduce a solution framework H.4 [[]: Information systems applications]; H.4.2 [[]: Types for the rescheduling problem in public transportation, which of Systems]: Decision support (e.g., MIS), Logistics could be used in aiding company operators responsible for managing disruptions. An important requirement for this General Terms framework is to guide the solution process regardless of the applied solution method. As the problem has to be solved Vehicle scheduling, Disruption management, Public trans- in almost real time, one of the main requirements for such port a system is to provide suggestions quickly. Optimization systems are already present for long-term planning [2, 11, 1. INTRODUCTION 12], but these are all built around given solution methods. Based on the available transportation data, the vehicle- and However, the only paper to our knowledge that deals with a driver schedules of a transportation company are created in system for bus rescheduling is the one by Li et al. [8]. advance. A unique schedule is created for each day (or day- type) of the planning period in such a process. A daily sched-Although there are published mathematical models for the ule of a company is made up of vehicle and driver duties. A problem [6, 10], these cannot be solved in short enough time. duty is considered as a series of tasks, which the correspond-This leads to the design of efficient heuristic methods [6, 9], ing vehicle and driver has to complete in the given order. which are able to provide quick solutions. The system we The most important tasks of these duties are the timetabled present in this paper is designed to work with any kind of trips, which come from the timetable of the company and solution method, and gives results for the problem based on have to be executed. needs of the operators, which they can control through dif- ferent parameters. The heuristic methods we use to demon- However, there are several difficulties that can arise while strate the system were published in [6]. executing such a schedule in real-life: a problem with the vehicle itself, sickness of the driver, or other such things can The outline of our paper is the following: first, we give a make the pre-planned schedule infeasible. 
These unforeseen quick overview of the rescheduling problem and disruption difficulties are called disruptions. If a disruption arises, a management. We present the basic regulations, and give new feasible schedule has to be created, usually by resche- a list of these that can be regarded in a flexible manner duling the old one. through parameters and penalizing. Based on these design thoughts, we introduce the suggested framework itself. Fi- Disruptions have to be managed almost immediately, which nally, we give some ideas of solution methods that we pro- is usually the task of a company operator. Companies usu- pose for the framework. ally have a backup vehicle with which they address more ∗supervisor 2. TRANSPORTATION SCHEDULING AND DISRUPTION MANAGEMENT The daily schedule of a transportation company consists of several duties. Every duty is executed by a driver, and also has a corresponding vehicle. Each duty is a series of tasks that have to be carried out in the given order. The most common tasks are the ones corresponding to the trips of the timetable, and the so-called deadhead trips, which are re- sponsible for moving the empty vehicle from one location to another. There can also be several different vehicle specific matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 29 tasks (eg. parking, refueling) and driver specific tasks (eg. give a list of the most important rules in both groups. Note, breaks, administration). that these are only the most common regulations. Other ones might arise depending on a specific company or coun- 2.1 Disruption management try. When a disruption happens, usually one of the vehicles in service becomes unavailable for a period of time. This leads • Vehicle regulations to the pre-planned schedule becoming infeasible, as one or more of the tasks are not executed by a vehicle anymore. – Vehicle depot and trip compatibility : As we men- These trips are addressed from now on as disrupted trips. tioned earlier, vehicles can be classified into de- A vehicle schedule becomes infeasible if it contains such pots. Trips can only be serviced by a fixed set of trips. Companies usually have a backup vehicle and driver depots, and this should be respected when creat- ready, which are dedicated to such situation, but this solu- ing a new schedule to manage the disruption. tion might not be the best one. – Vehicle type and trip characteristics: Similarly to depot compatibility, some trips can only be exe- Moreover, most real-life cases have a so-called multiple depot cuted by vehicles of a certain type. For example, problem. There are several different depot locations and/or trips that are carried out between different cities, vehicle types, and every trip can only be serviced by vehicles or have a long distance, must have a bus with belonging to a set of pre-determined depots. Because of special equipment (eg. air conditioning). this fact, our problem is NP-hard in the case of 2 or more depots. The vehicle rescheduling problem can be reduced • Driver regulations to the vehicle scheduling problem, which was proven to be – Maximum driving time: Each driver has a maxi- NP-hard by Bertossi et al. [1]. However, the problem itself mum daily driving time, which they can not ex- should be solved as quickly as possible, and order should be ceed. restored with a new feasible solution. – Driver breaks: After given time periods, drivers have to be assigned breaks. 
Moreover, these breaks 2.2 Related work have to be assigned at specific geographical loca- To our knowledge, the literature regarding bus disruption tions. management is scarce. Disruption management connected – Maximum working time: Similarly to driving time, to different transportation fields has been researched for a the daily working time of drivers is also maxi- longer period of time. mized. This does not equal driving time, as driver duties have other events as well, which do not re- The earliest papers regarding disruption management were quire a vehicle (eg. administration). published about the airline industry. An overview paper by Clausen et al. can be read in [4, 3]. A bit younger field is disruption management is railway transportation. Some Besides these regulations, some structural modifications should results regarding this area can be read in [7]. also be considered in the schedule. For an illustrative exam- ple, refer to Figure 1. Both airline and railway disruption management differ from bus disruption management in their underlying structure. While buses are quite easy to move around with the help of deadhead trips between different geographical locations, the deadheading of airplanes is mainly prohibited by the high arising cost, and railway deadheading is subject to the underlying limited rail capacities. As we have mentioned, the problem of bus rescheduling as defined above was only considered by Li et al. In their pa- pers, they studied the single depot BRP. They give a quasi- assignment model and an auction algorithm for the prob- lem in [9], and a network flow model is described in [10] which is solved with the help of Lagrangian relaxation. In [8], they also describe a possible decision-support system for this problem, which is illustrated with the help of a small real-world instance. 2.3 Real-life criteria Figure 1: An illustrative example In a real-life application, there are several rules and con- straints that make the problem more complicated. There On the above figure, a disrupted daily schedule can be seen, are regulations that cannot be violated, while the violation of with a disruption represented by a vertical line. There is others should be penalized. As we mentioned earlier, we can one disrupted trip J0. Depending on some circumstances, define daily schedules of both vehicles and drivers, thus we we can give several solutions for the problem. Here are a classify our rules into two groups. In this subsection, we will few examples: matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 30 • If trip J0 is compatible with duty F1, and there is • An integer parameter that gives the maximum number enough time to insert it to the available gap (together of feasible schedules which can be modified. with any necessary deadhead trips), then we can solve • the problem. A parameter that limits the maximum length of the newly introduced deadhead trips. • If we want to insert J0 to duty F2, we have to remove • trip J A parameter, which gives the latest point in time, by 2. There are several different places we can insert this newly removed trip. It might fit into the gaps in the end of which the algorithms should not modify any duties F more feasible schedules. 1 or F5 (if compatible, and there is enough time for deadheads). We might also be able to insert • A parameter on the number of suggestions (feasible them to duties F3 or F4. However this would mean we solutions). 
have to remove trip J3,2 or J4 respectively, and finding a new duty for them. • A parameter on the maximum running time. • Following the above logic, there are several different scenarios that utlizie removing trip from a duty, and As it can be seen in the list of proposed parameters, there inserting it to another one. are none that correspond driver rules. Driver regulations are very strict, and most of them are defined by the EU, • We might delay trip J3,2 in duty F4. This can give us and cannot be violated by any means. Depot compatibility a big enough gap to insert our trip J0 (if it is allowed and vehicle type correspondence might also be strict, but we by compatibility and deadheads). decided to let the operator of the system decide about their violation. As it can be seen from the above examples, there are some other constraints as well that are connected to the original 3. THE SOLUTION FRAMEWORK duties or tasks (eg. modifying starting time, or removing a In the previous sections, we presented our basic ideas behind trip from its original duty). the methodology for the problem. We described a framework that does not depend on the solution algorithm it executes, A solution method for this problem need to know the ex- and can be controlled through a list of different parameters. tent to which it can violate the constraints we introduced In Figure 2, we give a layout of the different parts of the above, and it also needs to know the hardness of the con- system. straint. This information can be given easily with the use of parameters. 2.4 Parameters In this subsection, we introduce parameters for the differ- ent rules and constraints given in the previous subsections. These parameters need to be considered by any algorithm solving the bus rescheduling problem: Figure 2: The structure of the framework • A binary parameter that allows the violation of depot- compatibilities. A penalty parameter for each viola- The input for the system consists of two parts. One part tion of depot-compatibility is also needed. is the list of parameteres, that we described in the pre- vious section. The other part is the problem data itself, • A binary parameter that allows the violation of ve- containing the problem specific data tables. The type and hicle type correspondence. A penalty parameter also structure of these tables of course can vary between differ- has to be introduced for each violation of vehicle type ent implementations of such a system, but it is important correspondence. that it contains the disrupted schedule and the disrupted • trips. For the remainder of the paper, we will refer to A binary parameter that allows the introduction of the pair of {disruptedschedule, setof disruptedtrips} as a lateness. A penalty parameter also has to be intro- conf iguration. duced per 1 unit (minute) of lateness. • A binary parameter that allows the movement of trips The input determines the starting configuration of our prob- between feasible schedules. A penalty parameter is lem, which is a schedule that contains feasible duties, and a also needed that gives the cost of each such move. set which contains the disrupted trips, that are not executed currently by any vehicle. A configuration is supposed to be • An integer parameter that limits the maximum amount feasible, if its set of trips is empty, and all the duties in its of lateness which can introduced by the algorithms to schedule are feasible and do not violate any regulations or the schedule. parameters. 
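One possible way to picture the parameter list and the {disrupted schedule, set of disrupted trips} configuration is a pair of plain records, sketched below in Python. All field names, types and default values are assumptions for illustration, as is the placeholder duty check; the point is only that a solver-independent framework can hand such a record to any rescheduling algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ReschedulingParameters:
    # Binary switches and their penalties (assumed field names).
    allow_depot_violation: bool = False
    depot_violation_penalty: float = 0.0
    allow_type_violation: bool = False
    type_violation_penalty: float = 0.0
    allow_lateness: bool = False
    lateness_penalty_per_minute: float = 0.0
    allow_trip_moves: bool = True
    trip_move_cost: float = 0.0
    # Limits controlled by the operator.
    max_total_lateness: int = 0          # minutes over the whole schedule
    max_lateness_per_event: int = 0
    max_lateness_per_duty: int = 0
    max_modified_duties: int = 1
    max_deadhead_length: int = 0         # length of newly introduced deadheads
    latest_modification_time: int = 0    # no changes after this point in time
    number_of_suggestions: int = 3
    max_running_time_s: float = 60.0

@dataclass
class Duty:
    trips: List[str] = field(default_factory=list)

    def respects(self, params: "ReschedulingParameters") -> bool:
        # Placeholder: a real check would cover depot and vehicle-type
        # compatibility, driving time, breaks, working time and lateness limits.
        return True

@dataclass
class Configuration:
    schedule: List[Duty]
    disrupted_trips: Set[str] = field(default_factory=set)

    def is_feasible(self, params: ReschedulingParameters) -> bool:
        """Feasible when no disrupted trip is left unassigned and every duty
        respects the regulations and the limits given by the parameters."""
        return not self.disrupted_trips and all(
            duty.respects(params) for duty in self.schedule)

params = ReschedulingParameters(allow_lateness=True, lateness_penalty_per_minute=2.0)
print(Configuration(schedule=[Duty(["t1", "t2"])]).is_feasible(params))   # True
```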
• An integer parameter that limits the maximum amount of lateness an algorithm can introduce to a single event. Once all the input is read, it is then transferred to the main module of the system, which we call the Rescheduling Black • An integer parameter that limits the maximum amount Box (RBB). This is the part that carries out the solution of lateness an algorithm can introduce to one duty. process, and consists of two parts: matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 31 • The Solver Library (SL) manages the different solution The second method is a local search heuristic (where a tabu methods that are built into the system. The system list can also be applied effectively). This method creates can have any number of implemented solution meth- a (probably infeasible) pseudo-schedule from the disrupted ods, and the desired method can be invoked by pa- trips, without any regards to the regulations. In each it- rameter setup. If there is a possibility to parallelize eration step of the search, either a move or a swap opera- solution processes, multiple methods can also be exe- tion is executed. A trip is either moved from a schedule to cuted at once. another, or is swapped with events from another schedule. One important rule is that trips cannot be moved onto the • The output of a solution method is sent to the Solu- pseudo-schedule. The local search finds a feasible solution, tion Collector (SC). This sub-module is responsible for if the pseudo-schedule is empty. managing feasible solutions. If more solution methods are running in parallel, all of them send their results Test results of the framework in real life have been promis- to the SC. The SC then filters any duplicate solutions, ing. All instances have a short running time (they all finish and also gives an ordering of the remaining ones based under 1 minute even for bigger instances). It can also be their cost. seen, that these methods will find multiple feasible solution while exploring there solution space. Because of the SC in the system, the solutions can all be saved, and the ones that Once all solution algorithms finish their execution, or the de-best suit the parameters of the operator can be chosen at sired maximum runtime is reached, the SC returns a number the end. of feasible solutions. The desired number can be given by the operator as an input (number of suggestions), and their Besides the real instances, we also tested our system on ran- order is determined mainly by their cost, but they can also domly generated data. In Table 1 and Table 2, we present be filtered considering the different parameters (eg. ask for the results of the above methods. The tables give the in-ones with no lateness, if possible). The operator receives stance name, the number of original trips in the schedule, this data, and can use these to decide on how to solve the the number of the disrupted trips, and the running time arising problem. of the recursive and local search methods on seconds. All instances in Table 1 are generated using the method in [5]. 4. A REAL LIFE APPLICATION In this section, we briefly describe a real-life application of Table 1: Solution on random instances from [6]. the above system. As we mentioned earlier, solution for the Instance Depots Trips Disrupted Rec. Loc. rescheduling problem has to be adequatly quick, as the order trips (s) (s) in transportation has to be restored as soon as possible. 
random1 2 12 1 0.02 0.001 Because of this, the implemented heuristic methods must random2 4 100 1 0.05 0.004 have a running time of at most a couple of seconds to be random3 4 500 1 0.08 0.01 acceptable. random4 4 800 1 0.08 0.05 Also, because of the flexibility of the SC module, it is useful to provide solution methods which find multiple feasible so-We also present two real-life test results from the city of lutions during their runtime. This gives the operator more Szeged, Hungary. The smaller instance is only for a district suggestion options to choose from. A simple approach that of the city on a workday, while the bigger instance is that of we implemented in our system is a naive search, which ba-a Saturday. sically finds all the possible trivial insertions in the starting configuration (if any), and inserts them to the solution col-lector. If there is any trivial insertion, it is highly likely that Table 2: Solution on real-life instances. it will be the cheapest solution with regards to any of the parameters. Instance Depots Trips Disrupted Rec. Loc. trips (s) (s) In one of our previous papers, we formulated a mathematical szeged small 4 206 2 0.399 0.548 model for the bus rescheduling problem [6]. The size of the szeged sat 4 1983 1 0.037 0.059 problem is big, and it cannot be solved efficiently even for smaller random instances. However, we also proposed two solution heuristics in the same paper. These methods have As it can be seen from the above tables, all test results have also been implemented to the system. a good running time for both of our heuristics. The methods also returned several suggestions for the test cases. The first method is a recursive search, which uses the ini- tial configuration as an input, and inserts a disrupted trip 5. CONCLUSIONS AND FUTURE WORK into one of the schedules in each step. If the disrupted trip The long-term plans of public transportation companies are cannot be inserted trivially, then other trips are moved from disrupted on a daily basis. These disruptions have to be the schedule to the disrupted set. To avoid exponential ex- addressed as soon as possible. In this paper, we introduced plosion, we gave a depth limit to the recursive calls, and a decision support framework for the rescheduling problem also have a function that cuts certain branches of the search in public bus transportation. tree. The recursive search finds a feasible solution, if the disrupted set is empty. We analyzed both vehicle and driver regulations for the daily matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 32 schedules, and determined the parameters that have to be Symposium on Operational Research in Slovenia considered during the rescheduling process. The framework SOR’11, pages 341–345, 2011. we described is independent of the solution method, and is also able to run different solution methods in parallel. It can also store and return multiple solutions, which gives the flexibility to the company operator to choose according to his or her needs. 6. ACKNOWLEDGMENTS This work was partially supported by the European Union and co-funded by the European Social Fund through project HPC (grant no.: TÁMOP-4.2.2.C-11/1/KONV-2012-0010). 7. REFERENCES [1] A. A. Bertossi, P. Carraresi, and G. G. Gallo. On some matching problems arising in vehicle scheduling models. Networks, 17(3):271–281, 1987. [2] J. Békési, A. Brodnik, D. Pash, and M. Krész. 
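A rough sketch of the depth-limited recursive insertion search described above is given below, in Python. The helpers can_insert and conflicting_trips are hypothetical placeholders for the real compatibility, deadhead and regulation checks, and the data layout (duties as lists of trip identifiers) is an assumption; every feasible configuration found along the way would be handed to the Solution Collector.

```python
def can_insert(trip, trips):
    """Hypothetical placeholder: the real test covers depot and vehicle-type
    compatibility, timing gaps and any deadhead trips that must be added."""
    return len(trips) < 5                   # assumed capacity, illustration only

def conflicting_trips(trip, trips):
    """Hypothetical placeholder: trips whose removal could make room for `trip`."""
    return trips[:1]

def recursive_search(schedule, disrupted, depth, collected, max_depth=3):
    """Depth-limited recursive insertion of disrupted trips.

    schedule  : dict duty_id -> list of trips (the current duties)
    disrupted : set of trips currently not served by any vehicle
    collected : receives every feasible schedule found (cf. the Solution Collector)
    """
    if not disrupted:                       # the configuration became feasible
        collected.append({d: list(t) for d, t in schedule.items()})
        return
    if depth >= max_depth:                  # cut to avoid exponential explosion
        return
    trip = next(iter(disrupted))
    for duty_id, trips in schedule.items():
        if can_insert(trip, trips):         # trivial insertion
            trips.append(trip)
            recursive_search(schedule, disrupted - {trip}, depth + 1, collected)
            trips.remove(trip)
        else:                               # move a conflicting trip to the disrupted set
            for removed in conflicting_trips(trip, trips):
                trips.remove(removed)
                trips.append(trip)
                recursive_search(schedule, (disrupted - {trip}) | {removed},
                                 depth + 1, collected)
                trips.remove(trip)
                trips.append(removed)

found = []
recursive_search({"F1": ["t1"], "F2": ["t2", "t3"]}, {"J0"}, 0, found)
print(len(found), "feasible suggestions")
```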
An integrated framework for bus logistic management: case studies. Logistik Management, pages 389–411, 2009. [3] J. Clausen, A. Larsen, J. J. Larsen, and N. J. Rezanova. Disruption management in the airline industry-concepts, models and methods. Computers & Operations Research, 37(5):809–821, 2010. [4] J. Clausen, A. Larsen, and J. Larsen. Disruption management in the airline industry - concepts, models and methods. Technical report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2005. [5] B. Dávid and M. Krész. Application oriented variable fixing methods for the multiple depot vehicle scheduling problem. Acta Cybernetica, 21(1):53–73, 2013. [6] B. Dávid and M. Krész. A model and fast heuristics for the multiple depot bus rescheduling problem. PATAT 2014: Proceedings of the 10th International Conference of the Practice and Theory of Automated Timetabling, pages 128–141, 2014. [7] J. Jespersen-Groth, D. Potthoff, J. Clausen, D. Huisman, L. G. Kroon, G. Maróti, , and M. N. Nielsen. Disruption management in passenger railway transportation. Technical report, Erasmus University Rotterdam, 2007. [8] J.-Q. Li, D. Borenstein, and P. B. Mirchandani. A decision support system for the single-depot vehicle rescheduling problem. Computers & Operations Research, 34(4):1008–1032, 2007. [9] J.-Q. Li, P. B. Mirchandani, and D. Borenstein. The vehicle rescheduling problem: Model and algorithms. Networks, 50(3):211–229, 2007. [10] J.-Q. Li, P. B. Mirchandani, and D. Borenstein. A lagrangian heuristic for the real-time vehicle rescheduling problem. Transportation Research Part E: Logistics and Transportation Review, 45(3):419–433, 2009. [11] K. Nurmi, J. Kyngäs, and G. Post. Driver rostering for bus transit companies. Engineering Letters, 19(2):125–132, 2011. [12] A. Tóth and M. Krész. A flexible framework for driver scheduling. Proceedings of the 11th International matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 33 matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 34 Traffic Sign Symbol Recognition with the D2 Shape Function Simon Mezgec and Peter Rogelj Faculty of Mathematics, Natural Sciences and Information Technologies University of Primorska Koper, Slovenia simon.mezgec@student.upr.si, peter.rogelj@upr.si ABSTRACT Piccioli presented a robust method for traffic sign detec- This paper describes our application of a novel method in tion and recognition [10]. One year later, de la Escalera the field of traffic sign recognition - the D2 shape function. developed an algorithm that focuses on the recognition of We first give an overview of the advances and research in traffic signs - the algorithm uses neural networks for the this field. We then describe the D2 shape function that was classification of traffic signs [4]. De la Escalera built on originally used to classify 3D models of various objects - that work in his 2003 article where he used a genetic algo- because of its robustness we propose its use in the field of rithm for the detection step, which allowed further invari- traffic sign recognition. We describe how the program that ance [3]. 
In 2000, Pacl´ık proposed his own algorithm for traf-was developed using this method works and present the re- fic sign classification using the Laplace probability method sults on a testing set of self-acquired speed limit traffic signs, and an Expectation-Maximization algorithm to maximize evaluate its performance and compare it to the performance the likelihood function [9]. In 2004, Fang developed a traf- of statistical moments. fic sign detection and recognition system that is based on a computational model of human visual recognition pro- General Terms cessing [5]. Two years later, Gao improved upon the hu- man vision model recognition using separate models for ex- Algorithms tracting color and shape information, respectively [6]. In 2007, Maldonado-Bascón described a traffic sign detection Keywords and recognition system that is based on support vector ma- Symbol recognition, traffic signs, D2 shape function chines which is invariant even to partial occlusions [7]. In the same year, Cyganek also used support vector machines in his 1. INTRODUCTION system for traffic sign detection, whereas the recognition is While driving a vehicle, the driver has to be constantly aware performed using neural networks [2]. Finally, in 2009, Baró of a number of factors, one of which are traffic signs. These presented a system that performs traffic sign detection with signs can have specific properties, such as color and shape, the boosted detectors cascade and traffic sign recognition which allow their recognition with the use of Computer Vi- with a forest of optimal tree structures that are embedded sion. So it is no surprise that this is one of the important in the ECOC (Error-Correcting Output Code) matrix [1], steps towards the automatization of driving vehicles. Once whereas in the next year, Ruta added a tracking procedure the traffic sign is detected, we would usually like to know between the steps of traffic sign detection and recognition - what kind of symbol (if any) the traffic sign contains. Recog-this tracker predicts the position and scale of the sign can- nition is useful for many purposes: the driver could forget didate to reduce computation. That system uses Color Dis- what the last speed limit he drove by was, an inexperienced tance Transform for the classification of traffic signs [11]. driver could see a traffic sign for the first time and he would like to know its meaning etc. It is therefore apparent that In this paper we present an alternative method for traf- traffic sign recognition has a real-world use. fic sign recognition, based on the D2 shape function that promises high robustness required for such applications. The There have been many different methods proposed in the method is presented in the next Section and its practical ap- field of traffic sign detection and recognition in the last two plication in Section 3. decades. Let us mention some implementations that at- tracted the most attention from the community. In 1996, 2. METHOD DESCRIPTION For the recognition of traffic sign symbols we use the D2 shape function - a method first described by Osada in his 2001 article [8] - it is therefore a fairly new Computer Vision method. The function samples random pairs of points and creates a histogram of distances between these point-pairs. Figure 1 shows how the D2 shape function creates the dis- tance histogram between pairs of points on a cup [12]. 
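The core of the D2 shape function is exactly what the preceding paragraph describes: sample many random point pairs on the shape and build a histogram of their mutual distances. A minimal NumPy sketch of this idea for a 2-D point set follows; the sample count, bin count and the toy square shape are assumptions made only for the example.

```python
import numpy as np

def d2_histogram(points: np.ndarray, n_pairs: int = 100_000, bins: int = 20,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    """D2 shape descriptor: normalised histogram of distances between
    randomly sampled point pairs.

    points : (N, 2) array of coordinates lying on the shape.
    """
    i = rng.integers(0, len(points), size=n_pairs)
    j = rng.integers(0, len(points), size=n_pairs)
    dist = np.linalg.norm(points[i] - points[j], axis=1)
    dist = dist / dist.max()                     # scale invariance
    hist, _ = np.histogram(dist, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                     # compare histograms, not counts

# Example: the pixels of a filled square produce a characteristic signature.
ys, xs = np.mgrid[0:40, 0:40]
square = np.column_stack([xs.ravel(), ys.ravel()])
print(d2_histogram(square)[:5])
```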
The idea behind the function is that with a large enough matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 35 Figure 2: On the left we have the original segmented image. We then perform two morphological opera- tions - first erosion (middle image) and then dila- tion (right image). The result is the symbol with the original size but without the noise. 3.1 Preliminaries First we read the image, change its color space from RGB to Figure 1: The distances between randomly gener- HSV and perform the segmentation of the image by using the ated pairs of points are converted to a distance his- mean of all the brightness values - if a pixel has a brightness togram with the appropriate use of normalization. value smaller than the mean brightness of the image, we set The histogram represents the D2 shape function for the corresponding pixel of the binary image to 1, otherwise the cup and therefore defines the object descrip- to 0. Once the image is successfully segmented, we have to tion.1 make sure that aside from the traffic sign symbol we did not detect any anomalies that are not part of the symbol. These errors can occur due to different light reflections, dirt number of sampled point-pairs on the object we get such a or dust on the camera lens as well as on the traffic sign etc. distribution of the distance histogram that is specific for the object, which means that we are able to recognize that ob-We use two morphological operations for noise reduction - ject. In our case we compare the distance histogram with the erosion and dilation. With erosion, we shrink all the areas in histograms created from each of the images in the learning the image and as a result the noise is removed, but we also set. The procedure itself is therefore simple - we randomly get smaller areas of the detected traffic sign symbol. Because generate a large number of point-pairs on the surface of the we only want to delete the noise and preserve the symbol object and compute distances between them. It is worth itself, we then use the morphological operation of dilation noting that aside from the D2 shape function, Osada intro- which expands the symbol to its primary size. Figure 2 duced other randomized shape functions [8]: shows the effect of these operations. • A3: computing the angle between three randomly gen- In this paper we focus on speed limit traffic signs, where erated points on the surface of the 3D model of the the symbol can be divided into numerals that are recog- object, nized separately, which is why the next step is searching • D1: computing the distance between a fixed point and for connected components. This search is important for the a randomly generated point on the surface of the ob- recognition of numerical symbols, because with it, we can ject, divide the number into numerals and since the numerals do not overlap, we can safely assume that one connected com- • D3: computing the square root of the surface of the ponent represents one numeral. The reasoning behind this triangle, defined by three randomly generated points search is that we can achieve greater recognition accuracy on the surface of the object, by recognizing each of the numerals than by recognizing the • D4: computing the cube root of the volume of the whole number at once because we do not have to model ad- tetrahedron, defined by four randomly generated points ditional dependencies between the numerals which can be on the surface of the object. 
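The preprocessing just outlined (threshold at the mean brightness, then erosion followed by dilation, then a search for connected components) can be sketched as below. The program itself is written in Matlab; this is an approximate Python/SciPy equivalent with an assumed 3x3 structuring element and a random stand-in image.

```python
import numpy as np
from scipy import ndimage

def segment_symbol(brightness: np.ndarray, struct_size: int = 3) -> np.ndarray:
    """Binary segmentation of a traffic-sign symbol from a brightness image.

    Pixels darker than the mean brightness are treated as symbol (1), the rest
    as background (0); erosion followed by dilation removes small noise while
    restoring the symbol to roughly its original size.
    """
    binary = brightness < brightness.mean()
    structure = np.ones((struct_size, struct_size), dtype=bool)
    cleaned = ndimage.binary_erosion(binary, structure=structure)
    return ndimage.binary_dilation(cleaned, structure=structure)

def connected_numerals(mask: np.ndarray):
    """Label connected components; each component is taken to be one numeral."""
    labels, count = ndimage.label(mask)
    return [labels == k for k in range(1, count + 1)]

img = np.random.default_rng(0).random((38, 31))      # stand-in brightness values
print(len(connected_numerals(segment_symbol(img))), "candidate regions")
```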
different from traffic sign to traffic sign - the most obvious of these is the distance between the numerals in the num-Out of all these functions, the D2 shape function turned out ber. In the case of non-numerical symbols, the components to be the most robust [12], which is why we decided to use would be recognized separately as well. it for our application in traffic sign symbol recognition. 3.2 Computing the D2 shape function 3. PROGRAM DESCRIPTION At the beginning of the D2 shape function computation we The program for traffic sign symbol recognition was devel- visit each of the connected components (except for the first oped in the Matlab programming language and in this Sec- one which represents the background). The first step is the tion we explain how it works. Here we do not focus on traffic generation of random x and y coordinates between values sign detection and assume that traffic signs are already seg- 1 and the image height for the x coordinate and between mented and symbols cropped out of them. The detection 1 and the image width for the y coordinate - the number and cropping is performed within the program for traffic of generated coordinates is defined by an input parameter. sign detection which was previously developed and is not a Then we go through the array of generated coordinates and part of the traffic sign recognition program. for each pair of coordinates we check if the pixel with that coordinates lies on the numeral that is currently being rec- 1The image is taken from Wohlkinger’s paper [12]. ognized. If it does, we copy this pair into a new coordinate matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 36 Figure 3: Learning set of images with numerals in the standard font of Slovenian traffic signs. Each numeral is saved in its own image. Figure 4: Testing set of speed limit traffic signs that array where we keep only the pixels that lie on the numeral. was used for the definition of the parameters and testing the performance of the traffic sign symbol After that there is the main part of the D2 shape func- recognition program. tion computation. We first check if we managed to find a sufficient number of coordinate pairs from pixels positioned on the numeral - if we did, we continue with the computa- distances between pairs of bins at the same positions. We tion and if we did not, we abort the computation for the use the 1-nearest neighbor method to classify the numerals recognition of the current region and continue with the next in the image because we only have one sample per numeral one. The reason for this check is further insurance from the and one numeral represents one class. presence of noise - if we did not find a sufficient number of coordinate pairs, then we can conclude that the detected 4. RESULTS area is too small and consider it being noise. For the definition of the parameters and testing the perfor- mance of the traffic sign symbol recognition program, we Then we visit the first half of the array with randomly gen- acquired a testing set of 46 speed limit traffic signs. Figure erated x and y coordinates and in each step we compute the 4 shows this testing set of traffic signs. It contains images of Euclidean distance between a pixel in the first half of the traffic signs taken under different conditions - different light-array and a corresponding pixel in the second half. 
When ing, orientation of the sign, size of the sign, quality of the we are done with the computation, we sort the distances by image etc. This difference between the conditions of each size and normalize them so they all have values from 0 to image was achieved on purpose - to prove the robustness of 1 - we do that by dividing every distance with the maxi- the program. mum distance obtained in the analyzed region. The latter is necessary for the comparison of images with different sizes. The procedure of defining the parameters was performed As a result we therefore return a matrix of these normalized with the help of auxiliary functions and the final values of and sorted distances for each of the detected numerals. the parameters are saved in a configuration file. They are as follow: 3.3 Learning and comparing the input image • number of randomly generated pixels for the D2 shape with the learning set function computation: 100000, Apart from the main function of the traffic sign symbol • percentage of randomly generated pixels that have to recognition program, we use two more auxiliary functions lie on the symbol: 0.1 (10%), to properly recognize the traffic sign - one for teaching the • number of learning iterations for the images from the program on the images from the learning set and one for learning set: 5, the recognition of the input image using the images from • number of iterations of comparison between the input the learning set. Figure 3 shows the learning data set. It image and images from the learning set: 10, is important to note that the font for Slovenian traffic signs • number of bins in distance histograms: 20. is standardized which means that one sample per numeral is sufficient. We will describe the learning process and the Once the parameters were defined, we ran the traffic sign comparison in this Section. symbol recognition program on the testing set of traffic signs to test the performance of the program. The final result is For the learning, we go through the 10 images from the learn- 42 of 46 accurately recognized speed limit traffic signs, which ing set and for each of them we run the symbol recognition is approximately a 91.3% accuracy. code as many times as is defined by the input parameter and we then average the results. For the comparison of the 5. COMPARISON WITH STATISTICAL input image with the learning set, we repeat the recogni- tion multiple times so that we increase the probability of MOMENTS an accurate recognition - we then return the result that was When developing the program for traffic sign recognition, recognized in most of the repetitions. It should be noted we first tried using statistical moments as the recognition that we convert distances into histograms, which is needed method. In particular, we used a combination of normalized because a direct comparison of distances would provide in- central moments, which are invariant to the position as well accurate results as the distance values are too specific. We as to the size of the object, and Hu moments, which are fur- compare each bin of the distance histogram of the current ther invariant to rotation. 
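Classification then amounts to comparing the query numeral's distance histogram with the ten learned histograms and taking the nearest one under bin-wise Euclidean distance. A small sketch follows, assuming the histograms are already available as equal-length vectors and using made-up template values.

```python
import numpy as np

def classify_numeral(query_hist: np.ndarray, learned: dict) -> str:
    """1-nearest-neighbour classification of a D2 distance histogram.

    learned maps a class label ('0'..'9') to its averaged training histogram;
    one sample per numeral suffices because the traffic-sign font is fixed.
    """
    best_label, best_dist = None, float("inf")
    for label, hist in learned.items():
        dist = np.linalg.norm(query_hist - hist)   # bin-wise Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy example with 20-bin histograms (values here are made up).
rng = np.random.default_rng(1)
templates = {str(d): rng.random(20) for d in range(10)}
query = templates["5"] + 0.01 * rng.random(20)     # a slightly perturbed '5'
print(classify_numeral(query, templates))          # -> '5'
```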
The framework of the prototype numeral on the input image with the histogram of the cur- is very similar to the program described in Section 3, and rent image from the learning set and compute the Euclidean the learning and testing sets of traffic sign images are the matcos-13 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science Koper, Slovenia, 10-11 October 37 same, so the only difference is in the method that was used conclude that the D2 shape function seems well suited for for each prototype. We can therefore compare the results of this field of Computer Vision. the two traffic sign recognition programs. 7. REFERENCES After testing the statistical moments prototype, it quickly [1] Xavier Baró, Sergio Escalera, Jordi Vitrià, Oriol Pujol, became obvious that it would not be a good fit for traffic and Petia Radeva. Traffic Sign Recognition Using sign recognition - we tested the program several times, every Evolutionary Adaboost Detection and Forest-ECOC time with different maximum orders of statistical moments Classification. IEEE Transactions on Intelligent (from 5 all the way up to 40). The problem was that the Transportation Systems, 10(1):113–126, March 2009. calculated values of statistical moments did not yield any [2] Bogus law Cyganek. Circular road signs recognition discernible patterns, which is the basis for object recogni- with soft classifiers. Integrated Computer-Aided tion. The prototype thus behaved unpredictably and often Engineering, 14(4):323–343, 2007. recognized a false symbol. After using the same program [3] Arturo de la Escalera, José M. Armingol, and Mario framework to develop a prototype using the D2 shape func- Mata. Traffic sign recognition and analysis for tion and getting promising results, we concluded that the intelligent vehicles. Image and Vision Computing, fault was not in the way we developed the program or in 21(3):247–258, March 2003. the set of acquired images, but in the method used. We [4] Arturo de la Escalera, Luis E. Moreno, Miguel A. therefore abandoned statistical moments and moved on to Salichs, and José M. Armingol. Road Traffic Sign search for more suitable methods, until we found the D2 Detection and Classification. IEEE Transactions on shape function. Industrial Electronics, 44(6):848–859, December 1997. While statistical moments have a wide range of use, we found [5] Chiung-Yao Fang, Chiou-Shann Fuh, Pei-Shan Yen, the D2 shape function to be much more suitable for traffic Shen Cherng, and Sei-Wang Chen. Road traffic sign sign recognition. detection and classification. Computer Vision and Image Understanding, 96(2):237–268, November 2004. [6] Xiaohong W. Gao, Lubov N. Podladchikova, 6. CONCLUSION Dmitry G. Shaposhnikov, Keum-Shik Hong, and Shape functions up until now were not used in traffic sign Natalia A. Shevtsova. Recognition of traffic signs recognition - we presented a prototype using the D2 shape based on their colour and shape features extracted function that recognizes traffic signs. It should be noted using human vision models. Journal of Visual that while the testing set, presented in Section 4, is a col- Communication and Image Representation, lection of traffic sign images that are as varied as possible, 17(4):675–685, August 2006. therefore maximizing the robustness of the traffic sign recog- [7] Saturnino Maldonado-Bascón, Sergio Lafuente-Arroyo, nition program, the set is still quite small - only 46 images. Pedro Gil-Jiménez, and Hilario Gómez-Moreno. 
6. CONCLUSION
Shape functions have until now not been used in traffic sign recognition - we presented a prototype that recognizes traffic signs using the D2 shape function. It should be noted that while the testing set, presented in Section 4, is a collection of traffic sign images that are as varied as possible, thereby maximizing the robustness of the traffic sign recognition program, the set is still quite small - only 46 images. The problem is that there is no publicly available repository or collection of Slovenian traffic sign images, so the set had to be acquired by hand, which is very time-consuming when we want to capture as many different traffic signs as possible. That is why we focused only on speed limit traffic signs, but we could easily expand the scope to all traffic signs, as the D2 shape function is not limited to any number of components. To add a new type of traffic sign, all we would need to do is add the corresponding traffic sign symbol to the learning set.

As described in Section 4, the final performance of the program is 91.3% accurately recognized speed limit traffic signs, which is an encouraging result and an incentive for further use of the D2 shape function in the field of traffic sign recognition and possibly other fields too. It is a robust procedure which also works on very small and low quality images - the smallest image used for testing had a resolution of 38 × 31 pixels and the program correctly recognized it. It is worth noting that this method is much simpler than traditional methods, in particular compared to the solutions mentioned in the Introduction, where methods like neural networks and support vector machines were used. In terms of performance we cannot directly compare it to these solutions, since we do not have a testing set that would be large enough, but we did get a sense that it is a robust method. As a final note, the D2 shape function performs much better than statistical moments for traffic sign recognition, as described in Section 5. Considering our results, we can conclude that the D2 shape function seems well suited for this field of computer vision.

On numerical simulation of filtration gas combustion processes on the shared memory machines

Tatyana Kandryukova
Institute of Computational Mathematics and Mathematical Geophysics SB RAS
630090, prospect Akademika Lavrentjeva, 6, Novosibirsk, Russia
kandryukova@labchem.sscc.ru

ABSTRACT
This work is devoted to the search for efficient algorithms for the simulation of filtration gas combustion processes. In particular, a two-level parallel algorithm based on an explicit finite difference scheme using an adaptive mesh is constructed. Two ways of parallelization, namely the direct usage of OpenMP directives and a special distribution of data across threads, are carried out. It is numerically shown that the latter provides a significant performance advantage.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Concurrent Programming; G.1 [Numerical Analysis]: Miscellaneous; J.2 [Physical Sciences and Engineering]: Mathematics and statistics, Physics, Chemistry

General Terms
Algorithms, Performance

Keywords
Combustion wave, explicit difference schemes, adaptive grids, parallelization on shared memory, OpenMP-threads

Supervisor
Yuri Laevsky, Institute of Computational Mathematics and Mathematical Geophysics SB RAS, 630090, prospect Akademika Lavrentjeva, 6, Novosibirsk, Russia

1. INTRODUCTION
The phenomenon of filtration gas combustion (FGC) was discovered in the 1970s and has still not been studied fully. Meanwhile, knowledge of the properties of FGC processes is essential for solving many problems of chemical technology, ecology, fire safety, etc. In particular, the low-speed mode of the filtration gas combustion process, which is under consideration here, is the physical basis for the construction of modern industrial flame arresters, environmentally friendly burners and other devices. Therefore the numerical simulation of FGC is a problem of current interest with important practical applications.

From the physical point of view, the process is the propagation of a region of gaseous exothermic reaction in a chemically inert porous medium, as gaseous reactants are supplied into the region of chemical transformation [2]. It proceeds as follows. Let there be a tube filled with a porous material, measuring about 10 cm. From one edge of it a combustible gas mixture is supplied at a rate v. The mixture is then ignited, resulting in a flat combustion front, which can either be stationary or move in either direction (at a rate u), depending on the model parameters (see Figure 1). This occurs due to heat recuperation. In the heating zone the fresh gas mixture gets hot, and the chemical process occurs in the reaction zone, which causes heat release. In the relaxation zone the high-temperature products of combustion exchange heat with the porous solid, and the heat then transfers through it back to the heating zone by means of thermal conduction.

Figure 1: The scheme of the physical model of FGC

The main difficulty in the simulation of this process is its very different scales; it is especially hard to deal with the kinetic-diffusion distinction. In this study the simplest case is considered, when the model is one-dimensional, and still it takes hours to simulate a process that lasts less than half a minute. It does not seem possible to proceed to the multidimensional case under such conditions. At present, great expectations for dealing with such tasks are related to the appearance of multi-processor machines and the development of parallel methods.

This work is an expansion of [3] and is devoted to the analysis of the possibilities to improve the performance of some computational models of FGC in the low-speed mode. In particular, the introduction of an adaptive nested grid [5] and parallelization on shared memory [1] are discussed.
All the constructed algorithms provide solutions of the problem that coincide with the experimental data. A special method of parallelization on machines with shared memory was developed and applied to the constructed algorithms; it reduces the computational time greatly, not only compared to the sequential implementation, but also compared to the classical way of parallelization with the direct usage of OpenMP pragmas.

2. MATHEMATICAL BASIS
2.1 Mathematical model
The simplest one-dimensional FGC model in the enthalpy formulation includes three equations:

\frac{\partial T}{\partial t} = a_s \frac{\partial^2 T}{\partial x^2} + \alpha_s\left(H - T - \frac{Q}{c_g}\eta\right), \quad (1)

\frac{\partial H}{\partial t} = a_g \frac{\partial^2 H}{\partial x^2} - v \frac{\partial H}{\partial x} + \alpha_g\left(T - H + \frac{Q}{c_g}\eta\right), \quad (2)

\frac{\partial \eta}{\partial t} = a_g \frac{\partial^2 \eta}{\partial x^2} - v \frac{\partial \eta}{\partial x} - W(\eta, H). \quad (3)

Here a_i = \lambda_i/(c_i\rho_i) is the thermal diffusivity of the i-th phase; c_i, \rho_i, \lambda_i are, respectively, the specific heat at constant pressure, density and thermal conductivity of the i-th phase (i = s for the porous solid, i = g for the gas); \alpha_s = \alpha/((1-m)c_s\rho_s) and \alpha_g = \alpha/(m c_g\rho_g), where m is the porosity and \alpha is the interphase heat transfer rate; v is the flow rate of the combustible mixture; T \equiv T_s; c_g H = c_g T_g + Q\eta is the full gas enthalpy, where T_i is the temperature of the i-th phase and Q is the energy release of the reaction; W(\eta, H) = k_0\,\eta\,e^{-E/(R(H - Q\eta/c_g))} is the chemical reaction rate according to the Arrhenius law; \eta is the relative concentration of the reactive component of the combustible mixture; k_0 is the pre-exponential factor; E is the activation energy; R is the universal gas constant. It is worth noting that the speed u of the wave is a priori unknown.

The Cauchy problem is stated by adding Dirichlet boundary conditions on the left edge and Neumann conditions on the right. The initial data correspond to a preheated porous medium.

2.2 Implementation on regular and adaptive meshes
As there is no analytical solution of the problem, the suitability of all the constructed algorithms is estimated by comparison with the solution obtained on a regular fine mesh by using the explicit difference scheme:

T_i^{n+1} = T_i^n + \tau\left(a_s\frac{T_{i-1}^n - 2T_i^n + T_{i+1}^n}{h^2} + \alpha_s\left(H_i^n - T_i^n - \frac{Q}{c_g}\eta_i^n\right)\right), \quad i = 1,\dots,M, \quad (4)

H_i^{n+1} = H_i^n + \tau\left(a_g\frac{H_{i-1}^n - 2H_i^n + H_{i+1}^n}{h^2} - v\frac{H_i^n - H_{i-1}^n}{h} + \alpha_g\left(T_i^n - H_i^n + \frac{Q}{c_g}\eta_i^n\right)\right), \quad i = 1,\dots,M, \quad (5)

\eta_i^{n+1} = \eta_i^n + \tau\left(a_g\frac{\eta_{i-1}^n - 2\eta_i^n + \eta_{i+1}^n}{h^2} - v\frac{\eta_i^n - \eta_{i-1}^n}{h} - W(\eta_i^n, H_i^n)\right), \quad i = 1,\dots,M. \quad (6)

Here M is the number of spatial steps, whose size h is chosen from the condition

\frac{\max_i |T_i^N(h) - T_i^N(2h)|}{\max_i T_i^N(2h)} < 0.02, \quad (7)

where N is the number of time steps. The size of the time step \tau must satisfy a stability condition for explicit difference schemes, which is theoretically unknown and is fitted experimentally. Table 1 gives the values of K(h) and \tau(h) corresponding to different values of h; here L = 0.1 is the length of the tube.

h       L·2^{-10}   L·2^{-11}   L·2^{-12}   L·2^{-13}   L·2^{-14}
τ(h)    2^{-16}     2^{-17}     2^{-19}     2^{-21}     2^{-23}
K(h)    0.159       0.096       0.053       0.028       0.014

Table 1: Values of K(h) and τ(h) corresponding to different values of h

Meanwhile, away from the flame front the solution functions are quite smooth and do not need such small steps (see the graphic example in Figure 2).

Figure 2: Graph example of the temperature of the solid T, obtained numerically
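For concreteness, the following is a minimal sketch of one explicit time step of scheme (4)-(6), written under our own assumptions (C++, parameter names of our choosing, boundary nodes left untouched); it is an illustration, not the author's implementation.

#include <cmath>
#include <vector>

struct Params {
    double as, ag;            // thermal diffusivities a_s, a_g
    double alpha_s, alpha_g;  // interphase heat exchange coefficients
    double v;                 // flow rate of the combustible mixture
    double Q, cg;             // energy release and gas specific heat
    double k0, E, R;          // Arrhenius parameters
};

// Reaction rate W(eta, H) = k0 * eta * exp(-E / (R * (H - Q*eta/cg))).
double W(double eta, double H, const Params& p) {
    return p.k0 * eta * std::exp(-p.E / (p.R * (H - p.Q * eta / p.cg)));
}

// One explicit step of (4)-(6) on a uniform grid of M+1 nodes; nodes 0 and M
// carry the boundary conditions and are not updated in this sketch.
void explicit_step(std::vector<double>& T, std::vector<double>& H,
                   std::vector<double>& eta, double tau, double h,
                   const Params& p) {
    const std::size_t M = T.size() - 1;
    const std::vector<double> Tn = T, Hn = H, en = eta;   // copies of layer n
    for (std::size_t i = 1; i < M; ++i) {
        const double lapT = (Tn[i-1] - 2.0 * Tn[i] + Tn[i+1]) / (h * h);
        const double lapH = (Hn[i-1] - 2.0 * Hn[i] + Hn[i+1]) / (h * h);
        const double lapE = (en[i-1] - 2.0 * en[i] + en[i+1]) / (h * h);
        T[i]   = Tn[i] + tau * (p.as * lapT
                 + p.alpha_s * (Hn[i] - Tn[i] - p.Q * en[i] / p.cg));        // (4)
        H[i]   = Hn[i] + tau * (p.ag * lapH - p.v * (Hn[i] - Hn[i-1]) / h
                 + p.alpha_g * (Tn[i] - Hn[i] + p.Q * en[i] / p.cg));        // (5)
        eta[i] = en[i] + tau * (p.ag * lapE - p.v * (en[i] - en[i-1]) / h
                 - W(en[i], Hn[i], p));                                      // (6)
    }
}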
Thus, one way to speed up the computation is the introduction of an adaptive grid. It should depend on the solution from the previous time step and be concentrated in the area of the chemical transformation. This is carried out as follows. Let there be a full-length regular coarse spatial grid. Implementation of one time step of the difference scheme on it, with the corresponding big time step, provides the boundary conditions on the next time layer for the small task, which is stated in the vicinity of the chemical reaction zone similarly to the large one. For this enclosed task a denser time-space mesh is used. The initial data and boundary conditions at every time layer are obtained by means of linear interpolation from the coarse mesh. After the execution of all the steps of the subproblem, the values of the coarse-grid solution in the nodes used for interpolation are replaced with the corresponding values from the embedded problem (see Figure 3, where n is the number of the current time layer of the general problem, p is the number of time steps of the subproblem, and i_0 is the number of the node where W attains its maximum).

Figure 3: The scheme of the two-level algorithm

The dense grid moves according to the propagation of the combustion front: as soon as the maximum point of the function W moves by one spatial step of the coarse grid, the fine mesh moves by the same distance (see Figure 4). This approach allows us to take big spatial and time steps outside the chemical reaction zone. The numerical experiments show that to achieve the same accuracy grade one may use a coarse grid with a spatial step four times bigger than that of the fine grid. That implies about sixteen times bigger time steps. It is thus obvious that the usage of such a two-level time mesh is indispensable in a problem like the one in question.

Figure 4: The scheme of the movement of the adaptive fine grid

3. PARALLELIZATION
3.1 Classical way
The simplest and most obvious way to implement the program on a multiprocessor node is the application of OpenMP pragmas to the inner loops of the program. All the arrays in this case are shared and all the threads work to fill each of them. As a result, data of different types are held in the storage of different threads and the threads have to communicate intensively. In the case of the regular mesh this concerns the separate computations of the exponent values and of the values of η in the scheme equations in which they are used, as well as the move from one time layer to another. The situation is far more complicated for the algorithm with the enclosed fine mesh, because in this case there are many bottlenecks besides the mentioned ones, related to the repeated transitions from the coarse mesh to the fine one and back. Such parallelism provides some acceleration for the execution of the regular mesh algorithm, but it even leads to a slowdown of the computation when the adaptive mesh is used.

3.2 Data distribution
To reduce the time spent on the exchange of data flows, we propose to distribute the data between the threads. In the case of the regular mesh we mean the following procedure. The spatial grid is divided into z pieces, where z is the number of threads used, and each thread gets control over one of the pieces: thread number zero deals with nodes [0, ..., M/z], the next one with [M/z, ..., 2M/z], and so on; the last one deals with nodes [(z-1)M/z, ..., M]. Each thread creates its local arrays, fills them in accordance with the difference scheme, and outputs the results on its assigned grid nodes. On a regular grid we thus obtain practically independent tasks exchanging only boundary conditions.
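The contrast between the two approaches can be illustrated with a short sketch for a generic explicit update on the regular mesh. It reflects our own reading of the description above (C++ with OpenMP, a single model array, function names of our choosing, halo exchange through small shared buffers); it is not the author's code.

#include <omp.h>
#include <vector>

// Classical way: shared arrays, an OpenMP pragma on the inner loop of every step.
void step_classical(const std::vector<double>& un, std::vector<double>& u, double c) {
    const int M = static_cast<int>(un.size()) - 1;
    #pragma omp parallel for
    for (int i = 1; i < M; ++i)
        u[i] = un[i] + c * (un[i-1] - 2.0 * un[i] + un[i+1]);
}

// Data distribution: each thread owns one contiguous block of interior nodes,
// keeps it in thread-local storage and exchanges only two halo values per step.
void run_distributed(std::vector<double>& u, double c, int nsteps) {
    const int M = static_cast<int>(u.size()) - 1;
    std::vector<double> left_halo(omp_get_max_threads() + 2),
                        right_halo(omp_get_max_threads() + 2);
    #pragma omp parallel
    {
        const int z  = omp_get_num_threads();
        const int id = omp_get_thread_num();
        const int lo = 1 + id * (M - 1) / z;        // first owned node
        const int hi = 1 + (id + 1) * (M - 1) / z;  // one past the last owned node
        std::vector<double> loc(u.begin() + lo - 1, u.begin() + hi + 1); // block + halos
        for (int s = 0; s < nsteps; ++s) {
            // publish own boundary values for the neighbouring threads
            left_halo[id + 1]  = loc[1];
            right_halo[id + 1] = loc[loc.size() - 2];
            #pragma omp barrier
            // read the neighbours' boundaries (global boundary nodes stay fixed)
            if (id > 0)     loc.front() = right_halo[id];
            if (id < z - 1) loc.back()  = left_halo[id + 2];
            const std::vector<double> old = loc;
            for (std::size_t j = 1; j + 1 < loc.size(); ++j)
                loc[j] = old[j] + c * (old[j-1] - 2.0 * old[j] + old[j+1]);
            #pragma omp barrier
        }
        // write the owned block back to the shared array
        for (int i = lo; i < hi; ++i) u[i] = loc[i - lo + 1];
    }
}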
In the case of adaptive grids, communication becomes more complicated because of the need to transfer information from the nodes of the fine mesh to those of the coarse one. In this case the data distribution algorithm has a few stages. After n time steps of the general task we have data distributed across z threads, [0, ..., Mc/z], [Mc/z, ..., 2Mc/z], ..., [(z-1)Mc/z, ..., Mc], and shared arrays of size Mf + 1, where Mc and Mf are the numbers of nodes of the coarse and fine meshes respectively. The first stage is the implementation of one time step on the coarse grid by every thread separately (see Figure 5). The second stage prepares the data for the inner problem. It is assumed that the position of the fine grid with respect to the coarse one is known and that i_left and i_right are the numbers of the nodes forming the edges of the enclosed grid. Initial data obtained by interpolation from layer n of the coarse grid are entered into the shared arrays and then redistributed across the z threads, forming layer (n + 0/p) of the enclosed problem (Figure 6). The enclosed problem is then solved in parallel by analogy with the first stage, i.e. steps (n + 1/p), ..., (n + p/p) are implemented by every thread separately (Figure 7). The last stage is the opposite of the second one. The solution obtained on layer (n + p/p) of the inner problem is entered into the shared array and then the necessary data are transmitted to the appropriate threads according to the data distribution on the coarse grid. Besides that, the position of the adaptive grid is determined at this step.

Figure 5: The first stage of the data distribution algorithm in the case of the adaptive grid

Figure 6: The second stage of the data distribution algorithm in the case of the adaptive grid

Figure 7: The third stage of the data distribution algorithm in the case of the adaptive grid

For more detailed information about the construction of the adaptive grid and the implementation of the respective algorithm with the data distribution, see [4].
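As an illustration of the second stage, the sketch below fills the enclosed fine grid by linear interpolation of the coarse-grid solution at layer n; the array layout and the refinement factor are our assumptions, not details taken from [4].

#include <vector>

// Fill the fine enclosed grid covering coarse nodes [i_left, i_right] with
// values linearly interpolated from the coarse layer-n solution. 'refine' is
// the assumed number of fine cells per coarse cell.
std::vector<double> interpolate_to_fine(const std::vector<double>& coarse,
                                        int i_left, int i_right, int refine) {
    std::vector<double> fine((i_right - i_left) * refine + 1);
    for (int i = i_left; i < i_right; ++i) {
        for (int k = 0; k < refine; ++k) {
            const double t = static_cast<double>(k) / refine;
            fine[(i - i_left) * refine + k] =
                (1.0 - t) * coarse[i] + t * coarse[i + 1];
        }
    }
    fine.back() = coarse[i_right];   // right edge of the enclosed grid
    return fine;
}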
4. RESULTS
Numerical experiments were carried out for the model with tube length L = 0.1 m and physical time t = 15.0 s. All the calculations were performed either on the NKS-30T cluster of the Siberian Supercomputer Center (SSCC), each node of which contains two Intel Xeon X5670 processors (12M cache, 2.93 GHz, 6.40 GT/s Intel QPI, 6 cores), or on MVS-10P of the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC), each node of which contains two Xeon E5-2690 processors (20M cache, 2.90 GHz, 8.00 GT/s Intel QPI, 8 cores). Hyper-threading is turned off and the KMP_AFFINITY environment variable is set to "compact".

Figure 8 shows a bar chart of the computational time of each algorithm executed on NKS-30T of SSCC. First of all, the importance of the introduction of the adaptive grid in the case of sequentially running programs is well seen from this picture: the use of this technique reduces the computational time by a factor of 22 when only one thread works.

Figure 8: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution, C - adaptive mesh and direct usage of OpenMP pragmas, D - adaptive mesh and parallelism based on the data distribution; obtained with NKS-30T of SSCC

Secondly, one can easily compare the efficiency of the two methods of parallelization. In the case of the regular mesh, the direct application of OpenMP directives provides some improvement for numbers of threads less than or equal to 6. As the number of threads grows, the amount of data per thread becomes smaller while the number of data transmissions increases greatly, so for more than 6 threads the parallel computation based on the direct usage of OpenMP pragmas slows down. The proposed data distribution algorithm, in contrast, shows rather good scalability in the number of threads up to z = 12 in this case. The same data are presented in Table 2.

number of threads z    1      2      4      6      8      12
A                      400    214    137    112    143    166
B                      -      260    125    83.0   68.3   46.1
C                      17.8   13.7   12.8   14.0   25.4   35.3
D                      -      10.4   6.69   5.82   7.82   8.69

Table 2: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution, C - adaptive mesh and direct usage of OpenMP pragmas, D - adaptive mesh and parallelism based on the data distribution; obtained with NKS-30T of SSCC

Analogous results for the regular mesh obtained on the MVS-10P cluster are presented in Figure 9 and Table 3. It is seen that the same calculations take a little less time, but the general situation is the same up to 16 threads. As concerns the algorithm with the enclosed problem, the first method of parallelization is inadmissible, since the more threads are used the longer the computation lasts. At the same time, for the algorithm with the enclosed problem even the proposed method does not provide good scalability, though it has a better effect than the usage of OpenMP pragmas. So there is a possibility that with a greater number of threads the usage of the adaptive grid might become ineffective.

Figure 9: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution; obtained with MVS-10P of JSCC RAS

number of threads z    1      2      4      6      8      12     16
A                      323    178    112    90.2   77.4   113    115
B                      -      203    103    68.2   54.2   38.7   33.5

Table 3: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and parallelism based on the data distribution; obtained with MVS-10P of JSCC RAS
5. CONCLUSIONS
The appearance of supercomputers and the elaboration of parallel technology have given renewed impetus to the further development of numerical methods and have opened the door to the modeling of complex tasks. The problem of FGC is one of them. The specifics of its solutions make the usage of an adaptive mesh extremely efficient for the numerical simulation of FGC processes in the case of a sequential implementation. However, one should pay special attention to the parallel realization of such an algorithm, since an unfortunate parallelization might not only show unsatisfactory scalability but even increase the computational time with an increasing number of threads. At the same time, the fact that the number of cores on a computational node constantly grows and the influence of parallelism increases implies the need to construct new algorithms permitting almost perfect scaling in the number of threads.

In this paper a special approach to shared memory parallelism based on the distribution of data across threads has been proposed. It has been applied to the algorithms with both the regular and the adaptive grids. The results of the comparison of this method with the classical one, where OpenMP directives are applied to all the internal loops of the program, have shown that the proposed method is more efficient for the task in question and is suitable for the parallel implementation of the algorithm with the embedded fine mesh. All calculations were performed for the problem with characteristic dimensions and empirically chosen parameters of the mathematical model. The solutions produced by all the constructed algorithms have the required accuracy grade of approximation and correspond to the physical data. Meanwhile, the computational time is reduced by a factor of 10 in the case of the regular mesh and by a factor of 3 in the case of the adaptive one by the use of the proposed method of parallelization.

6. ACKNOWLEDGMENTS
This work was supported by RFBR (12-01-31046, 13-01-00019, 13-05-12051).

7. REFERENCES
[1] The OpenMP API specification for parallel programming. http://openmp.org.
[2] V. S. Babkin and Y. M. Laevskii. Seepage gas combustion. Combustion, Explosion, and Shock Waves, 23(5):531–547, September-October 1987.
[3] T. A. Kandryukova and Y. M. Laevsky. Numerical simulation of filtration gas combustion. In Proceedings of the 11th International Conference on Mathematical and Numerical Aspects of Waves, pages 43–44, June 2013.
[4] T. A. Kandryukova and Y. M. Laevsky. Simulating the filtration combustion of gases on multi-core computers. Journal of Applied and Industrial Mathematics, 8(2):218–226, April 2014.
[5] Y. M. Laevsky and L. V. Yausheva. Simulation of filtrational gas combustion processes in nonhomogeneous porous media. Numerical Analysis and Applications, 2(2):140–153, April 2009.