IMFM, Institute of Mathematics, Physics and Mechanics, Jadranska 19, 1000 Ljubljana, Slovenia
Preprint series, Vol. 48 (2010), 1119
ISSN 2232-2094

ASYMPTOTIC NUMBER OF ISOMETRIC GENERALIZED FIBONACCI CUBES

Sandi Klavžar, Sergey Shpectorov

Ljubljana, May 21, 2010

Asymptotic number of isometric generalized Fibonacci cubes

Sandi Klavžar
Faculty of Mathematics and Physics, University of Ljubljana, Slovenia
and
Faculty of Natural Sciences and Mathematics, University of Maribor, Slovenia
e-mail: sandi.klavzar@fmf.uni-lj.si

Sergey Shpectorov
School of Mathematics, University of Birmingham, United Kingdom
e-mail: s.shpectorov@bham.ac.uk

Abstract

For a binary word f, let Q_d(f) be the subgraph of the d-dimensional cube Q_d induced on the set of all words that do not contain f as a factor. Let G_n be the set of words f of length n that are good in the sense that Q_d(f) is isometric in Q_d for all d. It is proved that lim_{n→∞} |G_n|/2^n exists. Estimates show that the limit is close to 0.08, that is, about eight percent of all words are good.

Key words: hypercube; generalized Fibonacci cube; isometric embedding; combinatorics on words

AMS subject classifications: 05A16, 05C12, 68R15

1 Introduction

The n-cube Q_n, also known as the n-dimensional hypercube, has the binary strings of length n as its vertices. Hence it is not surprising that the n-cubes form one of the most studied classes of graphs; that they are one of the fundamental models used in computer science, in particular for designing interconnection networks; and that they are applicable elsewhere. The edges of Q_n intrinsically capture the (binary) Hamming distance between the binary strings. Consequently, hypercubes form one of the central classes in metric graph theory.

However, a serious limitation of hypercubes for practical (and theoretical) purposes is that they are very rare: there is only one such graph for each power of two. Hence different modifications of hypercubes and different subgraphs of hypercubes were proposed such that the important properties (for whatever purpose) of hypercubes are preserved. The most important such class of graphs, introduced by Graham and Pollak [5] back in 1971, is the class of partial cubes, the graphs defined as isometric subgraphs of hypercubes. These graphs have metric properties analogous to those of hypercubes and can thus be equally well used for routing and similar tasks.

Fibonacci cubes, introduced by Hsu [8] in 1993, are the graphs obtained from n-cubes by removing all vertices that contain two consecutive ones. In this way significantly smaller graphs than hypercubes are obtained, but they still have nice "interconnection" properties, cf. [8]. Moreover, Fibonacci cubes are partial cubes, see [10], and have been studied from numerous points of view [6, 8, 11, 14, 17, 18].
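The name comes from the well-known fact that the number of binary strings of length d avoiding two consecutive ones is the Fibonacci number F_{d+2} (with F_1 = F_2 = 1). The following minimal Python sketch (ours, not part of the paper; the helper name vertices is our own choice) lists the vertices of Q_d(f) for a forbidden factor f and confirms the Fibonacci counts for f = 11.

```python
from itertools import product

def vertices(d, f):
    """Binary words of length d (as strings) that avoid the factor f."""
    return [w for w in (''.join(b) for b in product('01', repeat=d)) if f not in w]

# |V(Q_d(11))| = F_{d+2}: 2, 3, 5, 8, 13, 21, 34 for d = 1, ..., 7.
counts = [len(vertices(d, '11')) for d in range(1, 8)]
print(counts)                       # [2, 3, 5, 8, 13, 21, 34]
assert all(counts[i] + counts[i + 1] == counts[i + 2] for i in range(len(counts) - 2))
```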
Instead of removing all vertices that contain the factor 11, one may remove all vertices that contain an arbitrary fixed factor f. We say that the word f is good if the subgraph Q_d(f), obtained from Q_d by removing the vertices with factor f, is isometric in Q_d for all d. Note that if Q_d(f) is not isometric in Q_d then the same holds for all d' > d, hence for a bad word f the subgraph Q_d(f) can be isometric in Q_d only for a finite number of dimensions d. Denoting by G_n the set of good words of length n and by B_n the set of bad words of length n, we are interested in the asymptotic behavior of |G_n|/2^n and |B_n|/2^n. In other words, if the words are long, is the number of good/bad words negligible compared to the number of all words? We prove that neither of these is the case by demonstrating that there is a considerable number of both good and bad words. In fact, we show that, as the length n goes to infinity, the proportion of good words has a limit strictly between 0 and 1 (and we also provide more precise estimates for the limit). Thus the generalized Fibonacci cubes Q_d(f) for good words f constitute a large new explicit family of partial cubes; see [1, 2, 3, 4, 15] for a sample of other classes of partial cubes.

In order to estimate the limit proportion of good/bad words we will study r-error overlaps of words and demonstrate that, in particular, 2-error overlaps play a crucial role. Note that a similar concept has been studied before: the words that admit a 0-error overlap are known in the literature as non-bifix-free words, also called bordered words, see [7, 13]. The numbers of such words of length n are given as sequence A094536 in [16], while A003000 of [16] gives the numbers of bifix-free words.

The rest of the paper is organized as follows. In the next section the concepts needed in this paper are introduced and some preliminary observations are given. In Section 3 we introduce words with an r-error overlap. We focus on the words called stutters, defined as words that have an r-error overlap, r ≤ 2, of length at least half the length of the word. We show that the proportion of stutters among all words becomes negligible for large n. Then, in Section 4, we prove that the density of the set of all words of length n having a 2-error overlap converges to a limit value α. In the following section we prove that every bad word has a 2-error overlap, and from this fact we deduce that the density of bad words converges to α as well. In Section 6 we then prove that α lies between 0.919975 and 0.924156.

2 Preliminaries

Let B = {0, 1} and call the elements of B bits. An element of B^d is called a binary word (or simply a word) of length d. A word f is a factor of a word x if f appears as a sequence of |f| consecutive bits of x. We will use product notation for words, meaning concatenation; for example, 1^s means 11⋯1, the word of length s.

Recall that all words of length d ≥ 1 are the vertices of the d-dimensional cube Q_d, where two words are adjacent whenever they differ in exactly one position. For a word f of length n, let Q_d(f) be the subgraph of Q_d induced on the set of all words of length d that do not contain the factor f, see [9]. The graph Q_d(f) is called the generalized Fibonacci cube defined by the forbidden word f. In this notation, Q_d(11), d ≥ 1, are the Fibonacci cubes. (We note that the name "generalized Fibonacci cubes" was earlier used for the restricted family Q_d(1^s), s ≥ 1, see [12].)

We will consider the usual shortest path distance and write d_G(u, v) for the distance between u and v in a graph G. Recall that a subgraph H of a graph G is called isometric if the distance between any two vertices of H is independent of whether it is computed in H or in the entire graph G. We will write H ↪ G to denote that H is an isometric subgraph of G.

Call a word f bad if there exists a dimension d such that Q_d(f) is not isometric in Q_d. For instance, it is proved in [9] that (10)^s 1 (s ≥ 1) and 1^r 0^s (r, s ≥ 2) are bad words.
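For small dimensions the definition of good and bad words can be tested directly by brute force. The following Python sketch (ours; avoiding_words and is_isometric are our own helper names) builds Q_d(f) and compares distances inside it with Hamming distances; for example it reports that Q_4(11) is isometric in Q_4 while Q_4(101) is not, in line with the examples above.

```python
from itertools import product
from collections import deque

def avoiding_words(d, f):
    """Binary words of length d (as strings) that avoid the factor f."""
    return [w for w in (''.join(b) for b in product('01', repeat=d)) if f not in w]

def is_isometric(d, f):
    """Return True iff Q_d(f) is an isometric subgraph of Q_d, i.e. iff for every
    pair of avoiding words the distance inside Q_d(f) equals their Hamming distance."""
    verts = avoiding_words(d, f)
    vset = set(verts)

    def neighbours(w):
        return [w[:i] + ('1' if w[i] == '0' else '0') + w[i + 1:] for i in range(d)]

    for u in verts:
        dist = {u: 0}                      # BFS distances inside Q_d(f)
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in neighbours(x):
                if y in vset and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for v in verts:
            if dist.get(v) != sum(a != b for a, b in zip(u, v)):
                return False
    return True

print(is_isometric(4, '11'))    # True  (Fibonacci cube)
print(is_isometric(4, '101'))   # False (101 = (10)^1 1 is a bad word)
```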
Lemma 2.1 Suppose that Q_d(f) is not isometric in Q_d for some dimension d. Then Q_{d'}(f) is not isometric in Q_{d'} for all d' > d.

Proof. Let d' = d + r, r ≥ 1. Since Q_d(f) is not isometric in Q_d, there exist vertices u, v of Q_d(f) such that d_{Q_d(f)}(u, v) > d_{Q_d}(u, v). If the first bit of f is 1, set x = 0^r u and y = 0^r v, otherwise set x = 1^r u and y = 1^r v. Then x, y ∈ Q_{d'}(f) and

d_{Q_{d'}(f)}(x, y) = d_{Q_d(f)}(u, v) > d_{Q_d}(u, v) = d_{Q_{d'}}(x, y),

hence Q_{d'}(f) is not isometric in Q_{d'}. ■

So for every bad f there exists a smallest dimension d = d(f) for which non-isometricity holds. If we are not interested in the exact value of d(f), we can always speak of a sufficiently large dimension d.

The word f is called good if it is not bad. That is, f is good if Q_d(f) ↪ Q_d for all dimensions d. For instance, 1^s, 10^s, and (10)^s are good words for any s ≥ 1, see [9]; hence in particular the Fibonacci cube Q_d(11) isometrically embeds into Q_d. Set finally

G_n = {f ∈ B^n : f is good}  and  B_n = {f ∈ B^n : f is bad}.

3 r-Error overlaps and stutters

For a word f of some length n, let f_{s,k} be the factor of f of length k starting from position s + 1. Here, clearly, k ≤ n and s ∈ {0, 1, ..., n − k}. For example, b_k(f) := f_{0,k} is the beginning of f of length k. Similarly, e_k(f) := f_{n−k,k} is the end part of f of the same length k. Suppose that, for some k with 1 ≤ k < n, the words b_k(f) and e_k(f) agree in all but r positions. Then we say that f has an r-error overlap of length k. If f has an r-error overlap for some length k then we simply say that f has an r-error overlap. We say that f is a stutter if f has an r-error overlap of length k, where r ≤ 2 and k ≥ n/2. Let S_n be the set of all stutters of length n.

Lemma 3.1 The proportion of stutters among all words of length n tends to zero as n goes to infinity. That is, lim_{n→∞} |S_n|/2^n = 0.

Proof. Suppose n/2 ≤ k < n and let s(k) be the number of stutters f ∈ S_n that have an r-error overlap of length exactly k for some r ≤ 2. It is easy to see that such a word f is fully specified by the r locations of the "errors" within b_k(f) and by the bits in the last n − k positions of f. The number of choices of r ≤ 2 positions within the k positions of b_k(f) is 1 + k + k(k−1)/2 = (k^2 + k + 2)/2. Hence s(k) ≤ (k^2 + k + 2) 2^{n−k−1}. We can now estimate the number of stutters from above as

|S_n| ≤ Σ_{k=⌈n/2⌉}^{n−1} s(k) ≤ Σ_{k=⌈n/2⌉}^{n−1} (k^2 + k + 2) 2^{n−k−1} ≤ n (2n^2) 2^{n/2−1} = n^3 2^{n/2}.

Therefore the density of stutters tends to zero, since |S_n|/2^n ≤ n^3 2^{n/2}/2^n = n^3 2^{−n/2}. ■

4 2-Error overlaps

In this section we study the density of the set T_n of all words of length n having a 2-error overlap. We say that a word f ∈ T_n is split if f has a 2-error overlap of length k ≤ n/2. In this case the beginning and the end part of f that realize the 2-error overlap are disjoint. Let T_n^s be the set of all split words from T_n.

Lemma 4.1 If f ∈ T_n \ T_n^s then f is a stutter.

Proof. Follows directly from the definitions. ■

The next result allows us to conclude that the set of split words with a 2-error overlap has a limit density as the length n goes to infinity.

Proposition 4.2 We have |T_{n+1}^s| ≥ 2 |T_n^s|.

Proof. Consider the mapping φ : B^{n+1} → B^n defined by erasing the bit in position ⌊n/2⌋ + 1 = ⌈(n+1)/2⌉. Clearly, if f ∈ B^{n+1} and φ(f) has a 2-error overlap of some length k ≤ n/2, then f also has a 2-error overlap of the same length k. In particular, φ^{−1}(T_n^s) ⊆ T_{n+1}^s. Since every fiber of this mapping has size two, the claim follows. ■

Corollary 4.3 The sequence α_n = |T_n^s|/2^n is monotonically increasing and bounded from above by 1. In particular, it has a limit α ≤ 1.
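The densities α_n can be computed by exhaustive enumeration for small n. The sketch below (ours; exponential in n and meant only as an illustration of the definition) should reproduce the first entries of Table 1, for example α_4 = 0.25.

```python
from itertools import product

def has_split_2_error_overlap(f):
    """True iff some prefix b_k(f) and suffix e_k(f) with k <= n/2
    differ in exactly two positions (a 'split' 2-error overlap)."""
    n = len(f)
    for k in range(2, n // 2 + 1):
        mismatches = sum(a != b for a, b in zip(f[:k], f[n - k:]))
        if mismatches == 2:
            return True
    return False

def alpha(n):
    """Density alpha_n = |T_n^s| / 2^n of split words among all words of length n."""
    words = (''.join(b) for b in product('01', repeat=n))
    return sum(map(has_split_2_error_overlap, words)) / 2 ** n

print(alpha(4))   # 0.25, the first entry of Table 1
print(alpha(6))   # 0.53125
```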
In fact, this number α is also the limit density of the words with a 2-error overlap.

Corollary 4.4 The sequence β_n = |T_n|/2^n converges to the same limit value α.

Proof. Let γ_n = |S_n|/2^n. According to Lemma 3.1, the sequence γ_n tends to zero. Hence both α_n and α_n + γ_n converge to the same limit α. On the other hand, clearly α_n ≤ β_n, since T_n^s ⊆ T_n, and also β_n ≤ α_n + γ_n by Lemma 4.1. So the claim follows. ■

5 Density of bad words

In this section we establish that the density of the set of bad words also tends to the same limit value α. For this purpose the following result is crucial.

Theorem 5.1 If f is bad then f has a 2-error overlap.

Proof. Suppose f is bad and choose d sufficiently large, so that Q_d(f) is not isometric in Q_d. Let w and w' be vertices of Q_d(f) such that d_{Q_d(f)}(w, w') > d_{Q_d}(w, w'). We will assume that m = d_{Q_d}(w, w') is as small as possible. Clearly, m ≥ 2. Let i_1 < ⋯ < i_m be the positions in which w and w' differ. [...]

Note that, by Theorem 5.1, every word without a 2-error overlap is good. For instance, 1^s, 10^s, and (10)^s have no 2-error overlap and are therefore good. To prove these facts much more work was needed in [9].

According to Theorem 5.1, B_n ⊆ T_n. We next show that in fact B_n nearly coincides with T_n.

Proposition 5.2 If f has a 2-error overlap and it is good, then f is a stutter.

Proof. Let n be the length of f and suppose it has a 2-error overlap of length k. Then b_k(f) and e_k(f) disagree in some two positions i and j (out of k). Set d = 2n − k and let w, w' ∈ B^d be defined as follows. Let w be b_{n−k}(f) followed by f(i) and, similarly, let w' be the same b_{n−k}(f) followed by f(j); here u(p) denotes the word obtained from a word u by complementing its bit in position p. It is easy to see that w and w' disagree in positions n − k + i and n − k + j. So they are at distance two in Q_d. Notice, furthermore, that the two common neighbours of w and w' in Q_d are the words w(n − k + i) = w'(n − k + j) and w(n − k + j) = w'(n − k + i). Both these words contain f; indeed, w(n − k + i) contains f as its end part (suffix of length n), while w(n − k + j) contains f as its beginning (prefix of length n). In particular, the common neighbours of w and w' are not contained in Q_d(f). Since f is good, Q_d(f) ↪ Q_d, which means that either w ∉ Q_d(f) or w' ∉ Q_d(f).

We can now show that f is a stutter. In view of the symmetry between i and j (and hence between w and w') we may assume that w ∉ Q_d(f), that is, w contains a factor f. This factor overlaps with either b_n(w) or with e_n(w) (or with both) in s ≥ n/2 positions. As both of these words differ from f in one position, we conclude that f has an r-error overlap of length s with r ≤ 1 (in fact, one can see that r = 1). As s ≥ n/2, we have that f is a stutter, as claimed. ■

As a corollary we obtain that the limit density of bad words exists and is equal to the same α as in the preceding section. Let δ_n = |B_n|/2^n.

Corollary 5.3 The density δ_n converges to α, that is, lim_{n→∞} δ_n = α.

Proof. By Theorem 5.1, δ_n ≤ β_n. On the other hand, by Proposition 5.2, δ_n ≥ β_n − γ_n. Since γ_n tends to zero, it follows that the sequence δ_n has the same limit as β_n, namely α. ■
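The construction used in the proof of Proposition 5.2 is explicit and easy to reproduce. The following sketch (our illustration; the helper names are ours and positions are 0-based) builds the pair w, w' from a 2-error overlap of f. For the bad word 101 it yields the vertices 1001 and 1111 of Q_4(101), whose two common neighbours in Q_4 both contain 101, so the distance between them inside Q_4(101) exceeds two; this is the kind of distance-2 witness discussed in the remark below.

```python
def two_error_overlaps(f):
    """All (k, [i, j]) such that the prefix b_k(f) and the suffix e_k(f)
    differ in exactly the two (0-based) positions i and j."""
    n = len(f)
    return [(k, diff) for k in range(2, n)
            if len(diff := [p for p in range(k) if f[p] != f[n - k + p]]) == 2]

def flip(word, p):
    """Complement the bit of 'word' in (0-based) position p."""
    return word[:p] + ('1' if word[p] == '0' else '0') + word[p + 1:]

def witness_pair(f, k, i, j):
    """The pair w, w' from the proof of Proposition 5.2: w = b_{n-k}(f) f(i),
    w' = b_{n-k}(f) f(j), both of length d = 2*len(f) - k."""
    prefix = f[:len(f) - k]
    return prefix + flip(f, i), prefix + flip(f, j)

f = '101'                          # (10)^1 1, a bad word by [9]
k, (i, j) = two_error_overlaps(f)[0]
w, w2 = witness_pair(f, k, i, j)
print(w, w2)                       # 1001 1111: both avoid 101, Hamming distance 2
diff = [p for p in range(len(w)) if w[p] != w2[p]]
print([flip(w, p) for p in diff])  # ['1101', '1011']: the common neighbours,
                                   # each containing 101 as a factor
```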
Before we leave this section, we remark that the proof of Proposition 5.2 has a further consequence. Suppose f is a bad word, but it is not a stutter. By Theorem 5.1, f has a 2-error overlap of some length k. Set d = 2n − k and select vertices w and w' as in the proof of Proposition 5.2. Since f is not a stutter, the final argument in that proof shows that both w and w' are contained in Q_d(f), while, again as in the proof, the two common neighbours of w and w' are not. That is, we have 2 = d_{Q_d}(w, w') < d_{Q_d(f)}(w, w'). This means that the badness of nearly every bad word f can be established in a cube Q_d, with d < 2n, by a pair of vertices at distance two from each other. This confirms observations from [9] where good and bad words of small length were considered. Namely, in most of the cases for which non-isometricity was established, the distance condition was shown to fail for distance 2. However, there exist bad words f for which distance 2 is preserved in Q_d(f) but Q_d(f) is still not isometric in Q_d.

6 Estimates for the limit value α

We now derive some estimates for the limit density α. Estimates from below are easy: we can use the fact that the sequence α_n (the densities of split words with a 2-error overlap) is monotonically increasing. Furthermore, it is easy to see that α_{2k+1} = α_{2k}, so we only need to consider even n. Manifestly, α_4 = 0.25. With the use of a computer a few further values were found, see Table 1. In particular, α_30 = 0.919975..., and hence we can say that α ≥ 0.919975. So over 90% of all long words are bad. In fact, the data in Table 1 seem to suggest that the value of α is close to 0.92.

n     |T_n^s|       α_n = |T_n^s|/2^n
4     4             0.250000
6     34            0.531250
8     182           0.710938
10    830           0.810547
12    3518          0.858887
14    14538         0.887329
16    59074         0.901398
18    238534        0.909935
20    958714        0.914301
22    3845886       0.916931
24    15408114      0.918395
26    61689006      0.919238
28    246881258     0.919704
30    987815218     0.919975

Table 1: Some values of α_n

The starting values of the sequence β_n (involving the words of length n having a 2-error overlap) are shown in Table 2. By Corollary 4.4 this sequence also converges to α. Note, however, that it is not monotone.

To get an upper bound for α we use similar ideas as above. Again it suffices to consider words of even length n = 2k. Let U_n be the number of nonsplit words of length n. If w is such a word, then inserting two new bits in the middle produces a word of length n + 2 which is either again nonsplit or has a 2-error overlap of length exactly k + 1. The number of words of the latter sort is 2^{k+1} · k(k+1)/2, because we can choose the first k + 1 bits arbitrarily and then the second half must be the same as the first half but with two of the k + 1 positions changed. Therefore we can write

4 U_n ≤ U_{n+2} + 2^{k+1} · k(k+1)/2.

Switching to the densities ν_n = U_n/2^n (notice that ν_n = 1 − α_n) we get

ν_n ≤ ν_{n+2} + k(k+1)/(2·2^{k+1}),

which implies that

ν_{n+2} ≥ ν_n − k(k+1)/(2·2^{k+1}).

Since α_n = 1 − ν_n we get

α_{n+2} ≤ α_n + k(k+1)/(2·2^{k+1}).

Combining these relations from n to n + 2m we get

α_{n+2m} ≤ α_n + Σ_{i=k}^{k+m−1} i(i+1)/(2·2^{i+1}).

Now we let m go to infinity and obtain the following estimate for the limit α:

α ≤ α_n + Σ_{i=k}^{∞} i(i+1)/(2·2^{i+1}).

Using a computer with k = 15 we get that the sum here is at most 0.004181. Together with the value α_30 = 0.919975 from Table 1 this yields the upper bound 0.924156. Hence we have the following result:

Theorem 6.1 The value of the limit α is between 0.919975 and 0.924156.

As a consequence we see that for large n the number of good words is approximately 8% of all words of that length.
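The numerical part of this upper bound is easy to check. The sketch below (ours) evaluates the tail of the series for k = 15 and adds it to the value of α_30 taken from Table 1.

```python
def tail(k, terms=200):
    """Partial sum of sum_{i >= k} i*(i+1) / (2 * 2**(i+1)); the series
    decreases geometrically, so 200 terms are far more than enough."""
    return sum(i * (i + 1) / (2 * 2 ** (i + 1)) for i in range(k, k + terms))

alpha_30 = 987815218 / 2 ** 30        # = 0.919975..., last entry of Table 1
print(tail(15))                       # about 0.004181
print(alpha_30 + tail(15))            # about 0.924156, the bound in Theorem 6.1
```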
n     |T_n|        β_n = |T_n|/2^n
4     8            0.500000
5     22           0.687500
6     46           0.718750
7     98           0.765625
8     210          0.820313
9     430          0.839844
10    886          0.865234
11    1790         0.874023
12    3638         0.888184
13    7350         0.897217
14    14830        0.905151
15    29758        0.908142
16    59802        0.912506
17    119802       0.914017
18    240362       0.916908
19    480966       0.917370
20    963302       0.918676
21    1927382      0.919047
22    3857746      0.919758
23    7715446      0.919753
24    15437078     0.920122
25    30873042     0.920088
26    61759618     0.920290
27    123512490    0.920240
28    247051278    0.920338
29    494077866    0.920292
30    988213906    0.920346

Table 2: Some values of β_n

Acknowledgements

We are grateful to Ciril Petr and Marko Petkovšek for the computation of the data in Tables 1 and 2, and to Marko also for pointing out the connection with bifix-free words. The work of the first author was supported by the Research Grant P1-0297 of the Ministry of Higher Education, Science and Technology of Slovenia. He is also grateful to the University of Birmingham for the hospitality during his stay there.

References

[1] L. Beaudou, S. Gravier, K. Meslem, Isometric embeddings of subdivided complete graphs in the hypercube, SIAM J. Discrete Math. 22 (2008) 1226-1238.

[2] B. Brešar, Characterizing almost-median graphs, European J. Combin. 28 (2007) 916-920.

[3] B. Brešar, W. Imrich, S. Klavžar, H. M. Mulder, R. Škrekovski, Tiled partial cubes, J. Graph Theory 40 (2002) 91-103.

[4] D. Eppstein, Cubic partial cubes from simplicial arrangements, Electron. J. Combin. 13 (2006) #R79, 14 pp.

[5] R. L. Graham, H. Pollak, On the addressing problem for loop switching, Bell System Tech. J. 50 (1971) 2495-2519.

[6] P. Gregor, Recursive fault-tolerance of Fibonacci cube in hypercubes, Discrete Math. 306 (2006) 1327-1341.

[7] L. J. Guibas, A. M. Odlyzko, Periods in strings, J. Combin. Theory Ser. A 30 (1981) 19-42.

[8] W.-J. Hsu, Fibonacci cubes—a new interconnection topology, IEEE Trans. Parallel Distrib. Syst. 4 (1993) 3-12.

[9] A. Ilić, S. Klavžar, Y. Rho, Generalized Fibonacci cubes, manuscript, 2009.

[10] S. Klavžar, On median nature and enumerative properties of Fibonacci-like cubes, Discrete Math. 299 (2005) 145-153.

[11] S. Klavžar, P. Žigert, Fibonacci cubes are the resonance graphs of Fibonaccenes, Fibonacci Quart. 43 (2005) 269-276.

[12] J. Liu, W.-J. Hsu, M. J. Chung, Generalized Fibonacci cubes are mostly Hamiltonian, J. Graph Theory 18 (1994) 817-829.

[13] M. Lothaire, Algebraic Combinatorics on Words, Encyclopedia of Mathematics and its Applications 90, Cambridge University Press, Cambridge, 2002.

[14] E. Munarini, N. Zagaglia Salvi, Structural and enumerative properties of the Fibonacci cubes, Discrete Math. 255 (2002) 317-324.

[15] N. Polat, Netlike partial cubes I. General properties, Discrete Math. 307 (2007) 2704-2722.

[16] N. J. A. Sloane, The On-Line Encyclopedia of Integer Sequences, http://www.research.att.com/~njas/sequences/

[17] A. Taranenko, A. Vesel, Fast recognition of Fibonacci cubes, Algorithmica 49 (2007) 81-93.

[18] H. Zhang, L. Ou, H. Yao, Fibonacci-like cubes as Z-transformation graphs, Discrete Math. 309 (2009) 1284-1293.