https://doi.org/10.31449/inf.v45i2.3110    Informatica 45 (2021) 205–212

Statistics-Based Chain Code Compression with Decreased Sensitivity to Shape Artefacts

David Podgorelec, Andrej Nerat and Borut Žalik
University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, SI-2000 Maribor, Slovenia
E-mail: david.podgorelec@um.si

Keywords: chain code, data compression, dynamic programming, pseudo-statistical model, shape artefact

Received: April 3, 2020

Chain codes compactly represent raster curves, but there is still a lot of room for improvement by means of data compression. Several statistics-based chain code compression techniques assign shorter extra codes to frequent pairs of consecutive symbols. Here we systematically extend this concept to patterns of up to k > 2 symbols. A curve may be represented by any of the exponentially many overlapped chains of codes, and a dynamic programming approach is proposed to determine the optimal chain. We also propose the utilization of multiple averaged hard-coded pseudo-statistical models, since the exact statistical models of individual curves are often huge, and they can also differ significantly from each other. A competitive compression efficiency is assured in this manner and, as a pleasant side effect, this efficiency is less affected by the curve's shape, rasterization algorithm, noise, and image resolution than in other contemporary methods, which surprisingly pay no attention to this problem at all.

Povzetek: In this article, we present a new method for the statistical compression of chain codes, which assigns special codes to frequent strings of up to k symbols. The optimal among exponentially many solutions is selected by dynamic programming. The method uses several averaged pseudo-statistical models, which are not stored together with the curve. Compared to other contemporary methods, it achieves a competitive compression rate, while being less sensitive to the curve's shape, the particularities of the rasterization algorithm, noise, and image resolution.

1 Introduction

Chain codes compactly represent curves in raster images. More than half a century ago, Freeman [3] used symbols α_i ∈ [0 .. 7] to represent each curve pixel p_i with the azimuth direction (α_i · 45°) from its predecessor p_{i−1}, measured anticlockwise from the positive x-axis (Fig. 1a). Each symbol is then coded with 3 bits. Alternatively, only 2 bits per pixel are required if the representation relies on 4-connectivity, i.e. the azimuth of p_i − p_{i−1} is (α_i · 90°), where α_i ∈ [0 .. 3] (Fig. 1b). Several alternative chain code representations were introduced later, but the concept remains the same as in the pioneering Freeman chain codes in eight (F8) or four (F4) directions: symbols from a relatively small alphabet are assigned to subsequent primitives along a curve. In different representations, a primitive may refer to a curve pixel (as in F8 or F4), a vertex between the considered curve pixel and adjacent pixels (Vertex Chain Code – VCC [2] or Three-Orthogonal chain code – 3OT [10]), an edge separating the curve pixel from a background pixel (Differential Chain Code – DCC [9]), or a rectangular cell of pixels (in the quasi-lossless representation from [9]). Meanwhile, a symbol models some local geometric relation, e.g. the relative position of the observed primitive with respect to the previous one. In other words, it represents a command describing how to navigate from one primitive to the adjacent one along the curve.
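To make this concrete, the following minimal Python sketch (not part of the original paper; the function name and the direction table are illustrative) derives the F8 chain of Fig. 1a from a list of 8-connected pixel coordinates.

```python
# Illustrative F8 coder: each step to the next 8-connected pixel is mapped to
# the symbol whose azimuth (symbol * 45 degrees, anticlockwise from +x)
# matches the step direction, as in Fig. 1a.

F8_SYMBOL = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def f8_chain(pixels):
    """Return the F8 chain of a list of (x, y) coordinates of consecutive pixels."""
    return [F8_SYMBOL[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(pixels, pixels[1:])]

print(f8_chain([(0, 0), (1, 0), (2, 1), (3, 1)]))   # [0, 1, 0]
```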
All these basic chain code representations describe a raster curve efficiently, as they use only 2 or 3 bits per primitive instead of coding grid coordinates with, for example, 2 × 16 bits per pixel. Nevertheless, numerous successful methods have been proposed to additionally compress raster curves.

Figure 1: Freeman chain codes in 8 and 4 directions.

Statistical (Huffman or arithmetic) coding is often utilized when the symbols' probability distribution is significantly non-uniform. Further advances in statistics-based approaches were achieved by introducing extra codes for frequent pairs of primitives [7, 8], or by the utilization of multiple statistical models in so-called context-based approaches [1], where the statistical model for coding a considered symbol is conditioned on the context of M (typically 1 or 2) previous symbols. On the other hand, non-statistical approaches perform various string transformations [11, 12], e.g. the Burrows-Wheeler Transform (BWT) and/or the Move-To-Front Transform (MTFT), to increase the number of 0's and prepare the data for efficient run-length encoding (RLE) and/or binary arithmetic coding (BAC). BWT rearranges the sequence into runs of equal symbols, while MTFT utilizes local correlations to replace the data values with their indices from a typically smaller repertoire.

In this paper, we introduce a new statistics-based approach where extra codes may represent patterns of up to k = O(1) symbols. Our aim was to achieve a competitive compression efficiency, but an interesting pleasant side effect was encountered and brought into focus during the method development and testing. Namely, the impacts of the curve shape, image resolution, rasterization algorithm, noise, and geometric transformations on the compression ratio are significantly reduced in comparison to other contemporary methods. This topic has so far been addressed indirectly within the context-based approaches and, partially, in the non-statistical approaches, while it was completely neglected in other related works.

Section 2 illustrates the overall idea of the proposed approach. Section 3 describes the preparation and utilization of multiple averaged hard-coded pseudo-statistical models, crucial for the minimization of the mentioned impacts. Section 4 experimentally confirms the compression efficiency and the reduced dependence on the curve's artefacts. Finally, Section 5 summarizes the presented work and discusses future research challenges.

Figure 2: Freeman's chain difference coding.

2 New statistics-based method

Some years after F8 and F4, Freeman also proposed the chain-difference coding (CDC) [4]. A pixel p_i is encoded with the angle difference ∠(p_i − p_{i−1}, p_{i−1} − p_{i−2}). Unlike F8, where all 8 symbols have practically the same probabilities, the 0° angle difference is usually much more frequent than the other 7 symbols (Fig. 2). All 8 symbols in Fig. 1a have probabilities either 6/43 ≈ 13.95% or 7/43 ≈ 16.28%, and the probabilities of the four symbols in Fig. 1b are either 16/63 ≈ 25.40% or 15/63 ≈ 23.81%. On the other hand, the probability of symbol 0 in Fig. 2 is 32/42 ≈ 76.19%, while the probabilities of the other 7 symbols are all below 10%. Such a non-uniform distribution provides a good basis for statistical coding. However, some tens of bits must be spent to store the best-fitted statistical model (BFSM) for an individual curve, which is, particularly with shorter curves, not negligible.
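For illustration, a CDC chain can be obtained from an F8 chain by differencing consecutive directions; the sketch below (illustrative code, assuming turns are folded into the range (−180°, 180°]) shows the idea and also why oblique segments produce frequent 45°, −45° pairs.

```python
# Illustrative chain-difference coder: each symbol is the turn between two
# consecutive F8 directions, folded into the range (-180, 180] degrees.

def cdc_chain(f8):
    diffs = []
    for prev, cur in zip(f8, f8[1:]):
        d = ((cur - prev) % 8) * 45     # turn in [0, 315] degrees
        if d > 180:
            d -= 360                    # fold to a signed turn
        diffs.append(d)
    return diffs

# Along an oblique (non-45-degree) segment, runs of 0-degree symbols are
# periodically interrupted by 45, -45 pairs (cf. the C_DDCC extra codes).
print(cdc_chain([0, 1, 0, 0, 1, 0]))    # [45, -45, 0, 45, -45]
```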
Liu and Žalik [6] presented the directional difference chain coding (DDCC), where the CDC BFSMs of over 1000 training curves are averaged into a suboptimal hard-coded statistical model (HCSM), which is then used for compression in non-training use cases. Some years later, the compressed DDCC (C_DDCC) [7] was introduced, where three extra codes, for the usually frequent pairs 45°, −45° and −45°, 45° and for patterns of 12 to 27 zeros, were added into the HCSM. The two pairs were identified intuitively, as they periodically interrupt sequences of 0° symbols along oblique line segments (except those with slopes 45° or −45°).

Here we take a step forward by systematically extending the DDCC coding scheme with extra codes for patterns of up to k = O(1) symbols, k ≥ 2. Furthermore, we group "similar" training curves into classes and derive HCSMs separately for each class. Although this concept looks straightforward, several non-trivial issues must be considered to achieve a feasible implementation and a competitive compression efficiency. These issues can be structured into two separate phases.

1. The training phase is performed by an expert/developer in order to calibrate the algorithm for future use. A representative repertoire of training curves is first provided, and the BFSM for each curve is extracted. Features for the multicriteria classification of training curves are selected (intuitively in the current implementation), and the training curves are then assigned to the classes. HCSMs are derived afterwards by separately averaging the BFSMs within the classes. The detailed description follows in Section 3.

2. The exploitation phase is run by end-users in order to compress non-training curves from concrete use cases. An input curve is first analysed to determine the feature values needed to heuristically select the most appropriate of the stored HCSMs. The chosen HCSM is then utilized to compress the curve with Huffman coding. The main challenge in designing this phase is the strategy for the determination of the optimal sequence (chain) of codes, which is emphasized after the definitions.

2.1 Definitions

Trail T_{i,j} = v_i, ..., v_j, i ≤ j, is a sequence of adjacent pixels (or corresponding graph vertices) along a raster curve. The length of the trail (in pixels) is h(i, j) = j − i + 1. The trail T_{1,n} = v_1, ..., v_n corresponds to the entire raster curve of length n. Trail decomposition splits the trail into one or more nested trails, whose concatenation reassembles the original trail. A trail T_{u,v} is nested in T_{i,j} if i ≤ u ≤ v ≤ j.

Symbol is a chain-code command aimed to be assigned to a single pixel along a raster curve.

Pattern (of symbols) π_{i,j} = α_i, ..., α_j, i ≤ j, is a sequence of symbols aimed to be assigned to the pixels of a trail of the same length h(i, j).

Dynamic programming graph is an edge-weighted graph (G, w), where G = (V, E) is a directed graph, V = {v_1, ..., v_{n+1}} is a vertex set, E = {e_{i,j}} is an edge set, given by pairs of vertices e_{i,j} = (v_i, v_j), i < j, and w : E → ℕ is a weight function. Vertices v_1, ..., v_n correspond to pixels along the raster curve, edge e_{i,j} represents a trail v_i, ..., v_{j−1}, and the weight w_{i,j} of an edge e_{i,j} is the bit length of the corresponding Huffman code. An auxiliary end vertex v_{n+1} does not represent any curve pixel and, thus, there is no need to assign a symbol to it.
However, this vertex enables the introduction of edges e_{i,n+1}, i ≤ n, corresponding to trails T_{i,n} = v_i, ..., v_n.

IN(i) is the set of start vertices of all graph edges with the end vertex v_i: v_j ∈ IN(i) ⇔ e_{j,i} ∈ E. Vertex v_j is a predecessor of v_i, and the latter is a successor of v_j.

OUT(i) is the set of end vertices of all graph edges with the start vertex v_i: v_j ∈ OUT(i) ⇔ e_{i,j} ∈ E.

Extra code is a Huffman code which replaces a pattern of two or more symbols in order to save some bits.

p(π_{i,j}) is the probability of a pattern π_{i,j} = α_i, ..., α_j in a considered statistical model. If the latter corresponds to the BFSM of a curve described with T_{1,n} = v_1, ..., v_n, then we get equation (1):

p(π_{i,j}) = f(π_{i,j}) / (n − h(i, j) + 1),    (1)

where f(π_{i,j}) is the number of appearances of π_{i,j} in the pattern assigned to v_1, ..., v_n. However, p(π_{i,j}) in some HCSM is obtained by averaging the corresponding probabilities from all participating BFSMs.

Note that an edge e_{i,j}, i < j − 1, is added into the graph only if an extra code exists for the corresponding pattern assigned to T_{i,j−1}. On the other hand, edges e_{i,i+1} = (v_i, v_{i+1}) correspond to single-pixel trails T_{i,i} = v_i, and they are unconditionally added to the graph. This assures that the algorithm parsing the curve pixels will always reach the end vertex v_{n+1}, as any other vertex has at least one successor, i.e. i ≤ n ⇒ |OUT(i)| ≥ 1.

2.2 Exploitation phase

The existing chain code techniques construct the chain of codes with a greedy algorithm. A raster curve is parsed primitive by primitive, and each of them is immediately coded either alone or as a member of some longer pattern. If different possibilities for coding a primitive exist, the predefined priority is decisive. In C_DDCC, for example, the extra codes for the 45°, −45° and −45°, 45° pairs have higher priority than the corresponding single-pixel codes. However, such priority-based greedy algorithms cannot be simply adjusted to efficiently handle a higher number of extra codes for longer patterns of symbols.

In the proposed approach, each pixel can be coded with its own code or, theoretically, with one of k codes of longer trails. For k = 6, as used in the current implementation and tests (the decision for this value will be explained at the end of Section 3.2), these trails include two pairs, three triplets, and so on, up to six sextets. A longer context of patterns before and behind the considered symbol determines which of the 1 + 2 + … + k = k(k + 1)/2 possibilities (21 for k = 6) shall be used to code the pixel. We therefore have a combinatorial optimization problem where we look for an optimal chain in a large set of multiply overlapped chains. Unlike greedy algorithms, we found dynamic programming capable of providing an optimal choice. Its utilization also mitigates the so-called context dilution problem [1, 7]. Namely, the introduction of extra codes for longer patterns of symbols usually extends the codes of several symbols and other patterns. For example, the introduction of four extra C_DDCC codes for the patterns 0°, ±45° and ±45°, 0° prolongs by 1 bit the codes for 90°, 180°, the RLE of zeros, 135° and/or −135°. Furthermore, examples of chains can be found where individual extra codes do not save any bits. The proposed dynamic programming approach is an adaptation of the so-called exon chaining algorithm from the field of bioinformatics, the simplest of the so-called similarity-based gene prediction approaches [5].
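A minimal sketch of this chaining optimization is given below (illustrative Python, not the authors' implementation); it assumes the dynamic programming graph is given by the incoming edges of each vertex together with their Huffman code lengths, and it implements the recurrence stated formally as equation (2) just below, including the backtracking over memorized predecessors.

```python
# Illustrative sketch of the exon-chaining-style optimization over the
# dynamic programming graph of Section 2.1. Vertices are 1 .. n+1; an entry
# (j, w) in in_edges[i] stands for the edge e_{j,i}, i.e. the trail
# v_j .. v_{i-1} coded with a (possibly extra) Huffman code of w bits.

import math

def optimal_chain(n, in_edges):
    """Return (total_bits, edges) of an optimal chain from v_1 to v_{n+1}."""
    s = [math.inf] * (n + 2)     # s[i]: fewest bits for a chain ending at v_i
    pred = [0] * (n + 2)
    s[1] = 0                     # empty chain before the first pixel
    for i in range(2, n + 2):
        for j, w in in_edges[i]:           # at most k incoming edges (cf. Theorem 2)
            if s[j] + w < s[i]:
                s[i] = s[j] + w
                pred[i] = j
    chain, i = [], n + 1         # backtrack over memorized predecessors
    while i > 1:
        chain.append((pred[i], i))
        i = pred[i]
    return s[n + 1], list(reversed(chain))

# Toy example with n = 4 pixels: single-pixel codes of 3 bits each and one
# extra code of 4 bits for the pattern assigned to the trail v_2, v_3, v_4.
in_edges = {2: [(1, 3)], 3: [(2, 3)], 4: [(3, 3)], 5: [(4, 3), (2, 4)]}
print(optimal_chain(4, in_edges))          # (7, [(1, 2), (2, 5)])
```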
The dynamic programming optimizes the Bellman equation (2), where s_i represents the total bit length of the optimal chain from v_1 to v_{i−1}, 1 < i ≤ n + 1. Additionally, s_1 is set to 0 to enable the recursive calculation of s_2.

s_i = min_{v_j ∈ IN(i)} (s_j + w_{j,i})    (2)

The vertex pred_i ∈ IN(i), which actually contributes to the minimum s_i, is also memorized for each v_i. s_{n+1} represents the total bit length of the overall solution, and the optimal chain itself is then reconstructed by following the vertices pred_i from v_{n+1} backwards to v_1. The bold edges in Fig. 3 represent the optimal chain for the given example. Trails T_{1,2}, T_{3,5}, and T_{6,9} are coded with 4 + 6 + 8 = 18 bits. Note that an equivalent solution with s_{n+1} = 18 exists, where the first trail terminates with v_3, as demonstrated with a pair of dashed edges in Fig. 3.

Figure 3: Dynamic programming graph.

The remarkable performance of the dynamic programming-based optimization is highlighted with Theorems 1 and 2. Although the growth of the number of solution candidates is exponential in the curve length, the algorithm runs in linear time.

Theorem 1. The number of possible decompositions of a trail grows exponentially with the trail length (in pixels) if extra codes for patterns of up to k > 1 symbols are used.

Proof. Let c_i(k) be the number of possible decompositions of T_{1,i}, where the lengths of the nested trails obtained by the decomposition do not exceed k. Each of these decompositions ends with the trail T_{i−l+1,i} of l vertices, 1 ≤ l ≤ min(i, k), preceded by one of the c_{i−l}(k) possible decompositions of T_{1,i−l}. Note that for i < k, there are fewer than k symbols available and, thus, the upper bound for the length of the ending trail is min(i, k). The ending trail T_{i−l+1,i} can span the entire T_{1,i} (when i = l), resulting in an empty preceding T_{1,0}. Unlike the definition of a trail in Section 2.1, we exceptionally allow i > j here. This situation is indicated by c_0(k) = 1. Equation (3) defines the calculation of c_i(k), i > 0.

c_i(k) = Σ_{l=1}^{min(i,k)} c_{i−l}(k), i > 0    (3)

Obviously, c_1(k) = c_0(k) = 1, as the first pixel of T_{1,i} can be preceded by an empty trail only in a single way. Similarly, c_i(1) = 1, since T_{1,i} can be decomposed into single-pixel trails only in a single way. For k = 2, i > 1, equation (4) is obtained.

c_i(2) = c_{i−1}(2) + c_{i−2}(2), i > 1    (4)

Let F_i represent the i-th Fibonacci number. The Fibonacci sequence is defined by F_0 = 0, F_1 = 1, and F_i = F_{i−2} + F_{i−1} for i > 1. This recursive formula gives F_2 = 1, and we may thus match c_0(k) = F_1 and c_1(k) = F_2. Equation (4) then gives: c_2(2) = F_2 + F_1 = F_3, c_3(2) = F_3 + F_2 = F_4, and, in general, c_i(2) = F_i + F_{i−1} = F_{i+1}. As F_{i+1} > F_i, i > 1, we thus get inequality (5).

c_i(2) > F_i, i > 1    (5)

This result can be generalized to k > 2 by using relation (6), which must be proved beforehand.

c_i(k + m) > c_i(k), m ≥ 1, i > 1    (6)

The proof is actually trivial. Due to the transitivity of "is greater than", it suffices to consider m = 1. The key observation is that all the decompositions counted by c_i(k) are also counted by c_i(k + 1), which, however, additionally counts the decompositions with at least one nested trail of length k + 1. Relation (5) may thus be generalized to relation (7).

c_i(k) > F_i, k > 1 ∧ i > 1    (7)

As the Fibonacci sequence F_i has proven exponential growth, we may confirm that the sequence c_i(k) also grows (at least) exponentially for k > 1. Theorem 1 is thus proved. □
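The counting recurrence (3) is easy to check numerically; the following small sketch (illustrative code, not from the paper) reproduces the Fibonacci behaviour of equation (4) for k = 2 and, for k ≥ i, the 2^(i−1) count derived below as equation (8).

```python
# Illustrative check of equation (3): c_i(k) is the number of decompositions
# of a trail of i pixels into nested trails of at most k pixels each.

from functools import lru_cache

@lru_cache(maxsize=None)
def c(i, k):
    if i == 0:
        return 1                                   # empty trail, c_0(k) = 1
    return sum(c(i - l, k) for l in range(1, min(i, k) + 1))

print([c(i, 2) for i in range(1, 9)])   # 1, 2, 3, 5, 8, 13, 21, 34 -> Fibonacci, eq. (4)
print([c(i, 6) for i in range(1, 7)])   # 1, 2, 4, 8, 16, 32        -> 2^(i-1), eq. (8)
```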
Note that we assumed in the theorem that all the trails of up to k pixels are represented by edges of the dynamic programming graph, but this is usually not the case due to the statistical model reduction (Section 3.2). The proved exponential growth therefore represents only the theoretical worst case. However, as BFSMs and particularly HCSMs typically contain the majority of the patterns of length 2 (k = 2 suffices for the exponential growth) and also quite a few longer patterns, the expected growth may also be considered exponential.

An interesting finding is that the recursion in equation (3) can be solved easily for k ≥ i. Namely, the substitution c_{i−2}(k) + … + c_0(k) = c_{i−1}(k) transforms c_i(k) = c_{i−1}(k) + … + c_0(k) into c_i(k) = 2 c_{i−1}(k). We may then recursively progress with such substitutions, i.e. 2 c_{i−1}(k) = 2^2 c_{i−2}(k) = … = 2^{i−1} c_1(k), towards equation (8).

c_i(k) = 2^{i−1}, k ≥ i > 0    (8)

The result in equation (8) is expected, as any decomposition is obtained by breaking the trail apart at some interruption points between successive pixels. Since k ≥ i, there are no limitations on the length of the nested trails obtained by the decomposition. All the combinations, from i single-pixel trails to a single trail spanning the entire T_{1,i}, are valid. There are i − 1 interruption points in a trail with i symbols and thus 2^{i−1} possible decompositions.

Theorem 2. Optimal chain detection, based on the dynamic programming and the utilization of extra codes for patterns of up to k = O(1) symbols, runs in Θ(n) time, where n is the curve length in pixels.

Proof. The cardinality |IN(i)|, 1 < i ≤ n, cannot exceed k, as each v_i may only represent the end of a trail (edge) of length between 1 and k. The upper bound for the time complexity of calculating s_i, 1 < i ≤ n, is thus O(k n) = O(n) if k = O(1). The lower bound, however, is achieved if the statistical model contains only single-pixel symbols. But even in this case, linear time is needed to parse the dynamic programming graph. The Θ(n) time complexity is thus proved. □

3 Training phase

The linear performance proved in Theorem 2 is not the only reason for limiting the length of patterns with attached extra codes to k = O(1) symbols. This also reduces the size of the statistical model, which has a mitigating effect on the context dilution problem. In the proposed study, k = 6 has been chosen among different considered values. The reasons for this decision shall be explained in Section 3.2. Even so, the statistical model derived from the basic DDCC scheme can theoretically contain 8 + 8^2 + ... + 8^6 = 299,592 entries. Although many of these patterns never appear in practice, and even if we manage to further reduce the size of the statistical model (to some tens of entries in practice), the only practical possibility is to use an averaged statistical model (or more of them). Its derivation requires careful consideration of the following important issues.

3.1 Training set

In the reported C_DDCC tests [7], the compression ratio relative to F8 varies only slightly (between 0.46 and 0.55). This may lead to the conclusion that the derived HCSM serves well for all use cases. Furthermore, similar conclusions can be adopted for practically all existing methods, no matter whether they belong to the statistics-based or the non-statistical approaches, and whether, in the first case, they use a HCSM or BFSMs.
However, we must be aware that the training sets and testing use cases in presentations of these methods usually follow some curve creation and rasterization methodology and, thus, they share some evident common artefacts. In the C_DDCC tests, for example, there was a high probability of shorter sequences of 0° symbols, relatively high probabilities of 45°, −45° and −45°, 45° pairs, and rather low probabilities of 90° symbols. In our method, we may expect an even bigger impact of the curve's shape on the compression efficiency, as the distributions of longer patterns from a bigger repertoire can vary considerably from curve to curve. An averaged statistical model can thus deviate significantly from both the BFSMs of the individual training curves used to construct it in the training phase, and the distributions of patterns used to code the testing curves in the exploitation phase. Since the latter directly affects the compression efficiency, we decided to use multiple averaged statistical models and, consequently, to classify the training curves and testing use cases with regard to some chosen measurable artefacts. In this manner, the method gains generality, as the compression efficiency becomes less dependent on the curve creation and rasterization methodology.

Figure 4: Different levels of forcing the 4-connectivity.

To provide an adequate training set and a relevant mixture of testing use cases, we have implemented a tool with functionalities of image rotation and scaling, manual inversion of the binary values of selected pixels, and extraction of the boundary chain of a presented binary object. In this last operation, the parameter Force-4-connectivity controls the amount of 90° symbols along oblique edges and, thus, simulates different rasterization methodologies. Value 0% (Fig. 4a) means that the boundary chain consists only of pixels which share edges with the object's exterior. On the other hand, value 100% (Fig. 4b) adds into the chain all the pixels which are vertex-connected with the object's exterior. Such a pixel is 4-connected with both adjacent chain pixels. In Fig. 4c, half of the possible pixels of this kind (coloured grey) are randomly chosen and inserted into the chain. Finally, a special scenario is supported (Fig. 4d), where a 4-connected pixel is only inserted if it represents a concave vertex between a horizontal and a vertical edge (each at least two pixels long).

Figure 5: Examples of training and testing objects.

Basic shapes from the training set and use cases are shown in Fig. 5. They were mostly inherited from the tests made in [7, 11, 12]. Objects from the first two rows were used for testing (see Table 1), while the others belong to the training set. Altogether, we used 50 basic shapes, i.e., 30 in the training set and 20 test cases. A variety of instances of these shapes in different orientations and scales, ranging between 150 and 20,000 boundary curve pixels, and with different levels of forcing the 4-connectivity, were utilized in the experiments. There are 500 shapes in the training set.

3.2 Statistical model reduction

The first step towards reducing the huge amount of data in each BFSM and mitigating the context dilution effect was already made by limiting k to O(1) symbols. We also do not have to consider patterns with probability 0. Furthermore, we may set even stronger conditions for the probability p(π_{i,j}) of a pattern to be accepted in a BFSM.
Namely, a pattern π_{i,j} = α_i, ..., α_j is inserted in the statistical model only if p(π_{i,j}) is higher than the product of probabilities (weighted with w_2) of any sequence of shorter patterns whose concatenation forms π_{i,j}. To prevent the insertion of too low probabilities, we use an additional threshold w_1. The following statement considers patterns of length l = 3.

if p(π_{1,3}) > max(w_1, w_2 · max(p(π_{1,1}) p(π_{2,2}) p(π_{3,3}), p(π_{1,1}) p(π_{2,3}), p(π_{1,2}) p(π_{3,3})))
then insert (π_{1,3}, w_3 · 3 · p(π_{1,3})) into the BFSM.

As the patterns of lengths 2, 4, 5 and 6 must also be considered, as well as eventual future extensions, we generate all the concatenations algorithmically. A concatenation is obtained by breaking the pattern apart at some interruption points between successive symbols. There are l − 1 possible interruption points in a pattern of l symbols and thus 2^{l−1} − 1 possible concatenations. Here, the subtracted 1 represents the non-interrupted pattern. For patterns of lengths 2 to 6, we thus must test 1 + 3 + 7 + 15 + 31 = 57 products. Obviously, the method must first evaluate shorter patterns, as their probabilities are used in the acceptance criteria for longer ones. As mentioned at the end of Section 2.1, single-pixel symbols are unconditionally included in a BFSM.

Note that the weights w_1, w_2, and w_3 offer a lot of possibilities for experimentation. They were also crucial for the decision to use patterns of up to k = 6 symbols in our tests. As the probabilities usually decrease with the pattern length (with possible exceptions), the value of w_1 must be decreased if k = 7 is used instead of k = 6. However, this causes additional shorter patterns of lengths 6, 5, 4, etc. to also be accepted into a BFSM, increasing the size of the BFSM and emphasizing the context dilution effect in a negative way. On the other hand, this problem appears less evident when comparing k = 5 and k = 6. Although we have not performed a complete sensitivity analysis yet, the decision for k = 6 seems a reasonably good choice, confirmed by the results in Section 4.

3.3 Statistical vs. pseudo-statistical model

We do not wish (nor are we able) to split the probabilities of symbols and patterns among some longer patterns, as this would lead to the priority-based greedy approach, which we intentionally try to avoid. Each symbol consequently contributes to the probabilities of all the patterns which include it. Strictly speaking, we use weighted probabilities (multiplied by w_3 · l) to reward longer patterns by assigning shorter codes to them. The sum of such weighted probabilities in a model may be as high as (1 + 2 + 3 + 4 + 5 + 6) · w_3 = 21 w_3. It is, however, lower, because the patterns are added selectively, but it still exceeds 1. We apparently do not deal with true statistical models but with pseudo-statistical models instead. We shall use the acronyms BFPSM and HCPSM instead of BFSM and HCSM from this point on. Nevertheless, all weighted "pseudo-probabilities" are involved in a single Huffman tree construction.

3.4 Averaging pseudo-statistical models

Averaging is a two-stage process. During the extraction and reduction of the BFPSM of a considered training curve, several simply assessed curve artefacts are computed. These are then utilized for the multicriteria classification, which assigns the curve into one of the pre-defined classes.
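Purely as an illustration of this classification step (not the authors' code: the three criteria and the single thresholds follow the description given next and the threshold values reported in Section 4, while the names and the assumed angle-difference chain representation are this sketch's own assumptions), it could look as follows.

```python
# Illustrative sketch of the three-criteria classification of a curve given
# its chain of angle-difference symbols (degrees, as in Fig. 2). The
# thresholds are those reported for the training set in Section 4; all
# names and the assumed input representation belong to this sketch only.

TURN_WEIGHT = {0: 0, 45: 1, 90: 2, 135: 3, 180: 4}

def class_index(chain, thresholds=(0.92, 0.12, 0.295)):
    """Return the 3-bit class index (0..7); the first criterion is the most
    significant bit, as in Section 3.4."""
    n = len(chain)
    avg_turn = sum(TURN_WEIGHT[abs(a)] for a in chain) / n
    p_pair = sum(1 for a, b in zip(chain, chain[1:])
                 if abs(a) == 45 and b == -a) / max(n - 1, 1)
    p_90 = sum(1 for a in chain if abs(a) == 90) / n
    bits = (avg_turn > thresholds[0], p_pair > thresholds[1], p_90 > thresholds[2])
    return (bits[0] << 2) | (bits[1] << 1) | int(bits[2])

print(class_index([0, 45, -45, 0, 0, 90, 0, 45, -45, 0]))   # 2, i.e. class 010(2)
```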
From all the assessed features that will in future be used to algorithmically select optimal classification criteria, we currently use the three intuitively chosen criteria listed below, each with a single threshold.

• Average turn per pixel. Each 45° symbol contributes 1 to this value, each 90° symbol 2, each 135° symbol 3, and each 180° symbol 4. The sum is then divided by the curve length in pixels. This feature separates smooth curves from more winding and noisy ones. It is negatively correlated with the probabilities of 0° symbols and their longer runs.
• Probability of 45°, −45° pairs, which is higher in curves with oblique segments than in those with mostly axis-aligned and/or ideally diagonal segments.
• Probability of 90° symbols, which is usually higher in images of man-made objects than in natural objects.

Three single-threshold criteria result in 8 classes with binary indices from 000 to 111, where the first bit represents the first criterion, and the third bit refers to the last criterion. 0's signify values below the thresholds, and 1's those above the thresholds. In the current setting, the thresholds were computed by averaging the described quantities over the BFPSMs of all training curves. It turns out that the classes with indices 010(2) and 101(2) are nearly twice as populated as the others. In our training set with 500 shapes, there are 112 shapes in class 101(2) and 99 shapes in class 010(2), while the remaining six classes contain between 37 and 55 shapes. The testing use cases are also distributed in a similar way. This deviation can be explained by a suboptimal training set, suboptimal threshold selection and suboptimal classification criteria, which are all among the most important challenges for our future work. However, we may immediately establish that the currently used criteria are all correlated with the Force-4-connectivity value. Firstly, all additional 4-connected pixels are coded with 90° symbols and thus increase the third criterion value. Secondly, such pixels are often inserted in the middle of 45°, −45° pairs, changing them into 90°, 90°, 0° triplets. Finally, a 45°, −45° pair contributes 2 to the first criterion (1 per pixel), while a 90°, 90°, 0° triplet contributes 4 (1.33 per pixel). The first and the last criterion are thus positively correlated, and there is a negative correlation between them and the second one. The indices 010(2) and 101(2) of the above-average populated classes also confirm this finding, as the second bit is in both cases the inverse of the other two.

In the second stage, after the training curves are classified (into 8 classes in the current setting), the HCPSMs are derived by separately averaging the BFPSMs within each class. However, the BFPSMs in a particular class may still differ significantly from each other, although expectedly (and confirmed by the testing results) not as much as the BFPSMs from different classes. Consequently, the HCPSMs must also be reduced by using the same acceptance criteria as in the BFPSM reduction (Section 3.2).

As we are aware that the current classification is not optimal, we try to mitigate the impact of wrongly classified training curves by using soft borders between the classes. This means that averaging in an observed class also considers weighted probabilities from the BFPSMs of all "adjacent" classes, distinct in one criterion from the considered one.
For example, classes 001, 010 and 100 are adjacent to class 000, while, e.g., 011 is not. In the tests presented in Section 4, the probabilities are weighted in such a manner that the BFPSMs from the observed class contribute two thirds to the corresponding HCPSM, and those from the three adjacent classes contribute a third (a ninth each).

4 Results

In this section, we compare some typical results of the proposed method and some state-of-the-art (SOTA) chain code compression methods. 3OT, VCC, C_DDCC, and three variants of MTFT+ARLE (Move-To-Front Transform + Adaptive Run-Length Encoding) [11], i.e., MTFT+ARLE VCC, MTFT+ARLE 3OT, and MTFT+ARLE NAD (four-symbol Normalised Angle-Difference chain code) [11], were used in the tests. The training set and use cases from Section 3.1 were used, and the weights w_1, w_2 and w_3 for the pseudo-statistical model reduction (see Section 3.2) were set to 0.02, 1.0 and 1.0, respectively. As already stressed and explained, the length of the patterns to be considered is limited to k = 6. The classification thresholds (Section 3.4), computed for the utilized training set, were initialized to 0.92 for the average turn, 0.12 for p(45°, −45°), and 0.295 for p(90°).

Object     Transform (scale %, rotation °, 4-conn. %)   Pixels   bpp (SOTA)   bpp (new method)
Basic ("user friendly") shapes
Bird       100, 0, 0                                    4080     1.11 (1)     1.03
Butterfly  100, 0, 0                                    1122     1.45 (1)     1.33
Car        100, 0, 0                                    541      1.48 (1)     1.25
Circle     100, 0, 0                                    1831     1.13 (2)     0.99
Horse      100, 0, 0                                    2143     1.51 (3)     1.39
Shuttle    100, 0, 0                                    969      1.19 (1)     1.08
Spider     100, 0, 0                                    1770     1.20 (2)     1.04
Square     100, 0, 0                                    1088     0.30 (2)     0.35
Sophisticated instances
Bird       10, 50, 70                                   671      1.60 (3)     1.31
Butterfly  140, 45, 100                                 2681     1.68 (3)     1.21
Car        200, 33, 50                                  1472     1.84 (3)     1.49
Circle     20, 0, 0                                     308      1.39 (2)     1.06
Horse      50, 15, 20                                   1284     1.93 (1)     1.51
Shuttle    100, 30, 0                                   980      1.31 (1)     0.94
Spider     120, 45, 25                                  2218     1.31 (1)     1.08
Square     100, 70, 30                                  1228     0.75 (3)     0.62

Table 1: Test cases and compression results [bpp]. The listed best SOTA results were obtained by C_DDCC (1), MTFT+ARLE NAD (2), or MTFT+ARLE VCC (3).

Table 1 shows the results for pairs of different instances of eight objects from the top two rows in Fig. 5. Basic "user friendly" shapes refer to smooth, noiseless instances as usually employed in testing the state-of-the-art (SOTA) chain code compression methods. The "sophisticated" instances were generated by transforming the basic ones with a scaling factor, rotation angle, and/or amount of additional 4-connectivity pixels different from 100%, 0°, and 0%, respectively (see column Transform). The column bpp (SOTA) shows the efficiency, in bits per pixel (bpp), of the best of the compared SOTA methods. Comparison of the last two columns reveals that the new method is superior in most cases. The only exception is the basic axis-aligned square, where all three MTFT+ARLE variants and also C_DDCC benefit substantially from long runs of 0's. The ratios between the efficiencies of the new and the best SOTA method are given in columns A and B of Table 2, separately for the basic and the transformed instances. The new algorithm is mostly 10 to 15% more efficient than SOTA in the basic configurations, and an additional 10% more efficient in the sophisticated cases. Columns C and D show the ratios between the efficiencies for the sophisticated and the adequate basic configurations. SOTA is considered in column C, and the new method in column D.
The results confirm that sophisticated curve artefacts affect the SOTA methods (the average ratio of 1.22 means 22% lower efficiency compared to the basic shapes) much more than the new method (average ratio 1.06). The latter even achieves better compression of some transformed shapes (butterfly and shuttle) in comparison to the basic ones. It also surpasses SOTA in the transformed square example, where all the considered methods achieve significantly worse results (omitted from the above average ratios) than in the axis-aligned instance.

Object     A      B      C      D
Bird       0.93   0.82   1.44   1.27
Butterfly  0.92   0.72   1.16   0.91
Car        0.84   0.81   1.24   1.19
Circle     0.88   0.76   1.23   1.07
Horse      0.92   0.78   1.28   1.08
Shuttle    0.91   0.72   1.10   0.87
Spider     0.87   0.82   1.09   1.04
Square     1.17   0.83   2.50   1.77

Table 2: Analysis of the compression results.

5 Conclusions

In this paper, we introduce a new statistics-based chain code compression methodology by using multiple averaged pseudo-statistical models correlated with some measurable curve artefacts, and by heuristically selecting the most appropriate of these models prior to the compression. Furthermore, the introduced models contain extra codes for systematically selected patterns of up to k symbols (k = 6 in the presented tests), and the dynamic programming approach replaces the common greedy method in order to determine the optimal chain of patterns. The early results are promising, but there is plenty of work left to ultimately affirm the proposed methodology.

The methodology incorporates the training phase and the exploitation phase. The former obviously associates this research with machine learning, but the classification of the training curves with respect to three intuitively pre-selected and even mutually correlated criteria is quite far from this paradigm. However, one of our future goals is to adapt the introduced methodology to other basic chain code representations (VCC, 3OT, F4, F8, and NAD), which shall certainly require more advanced and adjustable feature extraction, learning and selection, leading to optimized classification algorithms. This goal also requires an extensive sensitivity analysis by varying the number and values of the classification thresholds, the weights in the pattern acceptance criteria, etc. Other future goals include:

• comparison to modern non-statistical methods on both "standard" and less "user-friendly" cases,
• improving the training set and preparing a rich repertoire of benchmarks,
• utilization of arithmetic coding instead of Huffman codes, and
• inclusion of RLE codes for longer patterns of 0's.

6 Acknowledgements

The authors acknowledge the financial support from the Slovenian Research Agency (Research Core Funding No. P2-0041). We are also grateful to our former colleague Denis Špelič for his contribution to the implementation.

7 References

[1] Akimov A.; Kolesnikov A.; Fränti P. (2007). Lossless compression of map contours by context tree modeling of chain codes. Pattern Recognition, Elsevier Science, vol. 40, iss. 3, pp. 944–952. https://doi.org/10.1007/11499145_33
[2] Bribiesca E. (1999). A new chain code. Pattern Recognition, Elsevier Science, vol. 32, iss. 2, pp. 235–251. https://doi.org/10.1016/s0031-3203(98)00132-0
[3] Freeman H. (1961). On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, IEEE, vol. EC-10, iss. 2, pp. 260–268. https://doi.org/10.1109/tec.1961.5219197
[4] Freeman H. (1974). Computer processing of line drawing images. ACM Computing Surveys, ACM, vol. 6, iss. 1, pp. 57–97. https://doi.org/10.1145/356625.356627
[5] Jones N. C.; Pevzner P. A. (2004). An Introduction to Bioinformatics Algorithms. The MIT Press.
[6] Liu Y. K.; Žalik B. (2005). An efficient chain code with Huffman coding. Pattern Recognition, Elsevier Science, vol. 38, iss. 4, pp. 553–557. https://doi.org/10.1016/j.patcog.2004.08.017
[7] Liu Y. K.; Žalik B.; Wang P.-J.; Podgorelec D. (2012). Directional difference chain codes with quasi-lossless compression and run-length encoding. Signal Processing: Image Communication, Elsevier Science, vol. 27, iss. 9, pp. 973–984. https://doi.org/10.1016/j.image.2012.07.008
[8] Liu Y. K.; Wei W.; Wang P.-J.; Žalik B. (2007). Compressed vertex chain codes. Pattern Recognition, Elsevier Science, vol. 40, iss. 11, pp. 2908–2913. https://doi.org/10.1016/j.patcog.2007.03.001
[9] Nunes P.; Marqués F.; Pereira F.; Gasull A. (2000). A contour based approach to binary shape coding using multiple grid chain code. Signal Processing: Image Communication, Elsevier Science, vol. 15, iss. 7-8, pp. 585–599. https://doi.org/10.1016/s0923-5965(99)00041-7
[10] Sánchez-Cruz H.; Bribiesca E.; Rodríguez-Dagnino R. M. (2007). Efficiency of chain codes to represent binary objects. Pattern Recognition, Elsevier Science, vol. 40, iss. 6, pp. 1660–1674. https://doi.org/10.1016/j.patcog.2006.10.013
[11] Žalik B.; Lukač N. (2014). Chain code lossless compression using Move-To-Front transform and adaptive Run-Length Encoding. Signal Processing: Image Communication, Elsevier Science, vol. 29, iss. 1, pp. 96–106. https://doi.org/10.1016/j.image.2013.09.002
[12] Žalik B.; Mongus D.; Lukač N.; Rizman Žalik K. (2018). Efficient chain code compression with interpolative coding. Information Sciences, Elsevier Science, vol. 439–440, pp. 39–49. https://doi.org/10.1016/j.ins.2018.01.045