ARS MATHEMATICA CONTEMPORANEA
Also available at http://amc-journal.eu
ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.)
Ars Math. Contemp. 10 (2016) 393-410

On convergence of binomial means, and an application to finite Markov chains*

David Gajser
IMFM, Jadranska 19, 1000 Ljubljana, Slovenia, and Department of Mathematics, FMF, University of Ljubljana, Slovenia

Received 15 July 2014, accepted 21 December 2015, published online 13 April 2016

*This work is partially funded by the Slovenian Research Agency, Research Program P1-0297.

Abstract

For a sequence $\{a_n\}_{n\ge 0}$ of real numbers, we define the sequence of its arithmetic means $\{a^*_n\}_{n\ge 0}$ as the sequence of averages of the first $n+1$ elements of $\{a_n\}_{n\ge 0}$. For a parameter $0 < p < 1$, we define the sequence of $p$-binomial means $\{a^p_n\}_{n\ge 0}$ of the sequence $\{a_n\}_{n\ge 0}$ as the sequence of $p$-binomially weighted averages of the first $n+1$ elements of $\{a_n\}_{n\ge 0}$. We compare the convergence of the sequences $\{a_n\}_{n\ge 0}$, $\{a^*_n\}_{n\ge 0}$ and $\{a^p_n\}_{n\ge 0}$ for various $0 < p < 1$, i.e., we analyze when the convergence of one sequence implies the convergence of another. While the sequence $\{a^*_n\}_{n\ge 0}$, known also as the sequence of Cesàro means of a sequence, is well studied in the literature, results about $\{a^p_n\}_{n\ge 0}$ are hard to find. Our main result shows that, if $\{a_n\}_{n\ge 0}$ is a sequence of non-negative real numbers such that $\{a^p_n\}_{n\ge 0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$ for some $0 < p < 1$, then $\{a^*_n\}_{n\ge 0}$ also converges to $a$. We give an application of this result to finite Markov chains.

Keywords: Sequence, convergence, Cesàro mean, binomial mean, finite Markov chain.
Math. Subj. Class.: 00A05

1 Introduction

For a sequence $\{a_n\}_{n\ge 0}$ of real numbers and for a parameter $0 < p < 1$, define the sequence of its arithmetic means $\{a^*_n\}_{n\ge 0}$ and the sequence of its $p$-binomial means $\{a^p_n\}_{n\ge 0}$ as
\[ a^*_n = \frac{1}{n+1}\sum_{i=0}^{n} a_i \qquad\text{and}\qquad a^p_n = \sum_{i=0}^{n}\binom{n}{i}p^i q^{n-i} a_i, \]
E-mail address: david.gajser@fmf.uni-lj.si (David Gajser)
This work is licensed under http://creativecommons.org/licenses/by/3.0/

where $q = 1-p$. We see that $a^*_n$ is a uniformly weighted average of the numbers $a_0, a_1, \ldots, a_n$, and $a^p_n$ is a binomially weighted average of the numbers $a_0, a_1, \ldots, a_n$. In this article, we analyse the relationship between the convergence of the sequences $\{a_n\}_{n\ge0}$, $\{a^*_n\}_{n\ge0}$ and $\{a^p_n\}_{n\ge0}$. Our results are presented in the following table.

|                   | $\{a_n\}$ | $\{a^{p_1}_n\}$ | $\{a^{p_2}_n\}$       | $\{a^*_n\}$           |
| $\{a_n\}$         |           | $\Longrightarrow$ | $\Longrightarrow$   | $\Longrightarrow$     |
| $\{a^{p_1}_n\}$   | $\not\Longrightarrow$ |     | ? $a_n\ge0$: $\Longrightarrow$ | $a_n\ge0$: $\Longrightarrow$ |
| $\{a^{p_2}_n\}$   | $\not\Longrightarrow$ | $\Longrightarrow$ |         | $a_n\ge0$: $\Longrightarrow$ |
| $\{a^*_n\}$       | $\not\Longrightarrow$ | $\not\Longrightarrow$ | $\not\Longrightarrow$ | |

Table 1: The table shows whether the convergence of a sequence in the leftmost column implies the convergence of a sequence in the first row, for $0 < p_1 < p_2 < 1$. The symbol $\Longrightarrow$ means that the implication holds, and the symbol $\not\Longrightarrow$ means that there is a counterexample with $a_n \in \{0,1\}$ for all $n \in \mathbb{N}$. If there is a condition before $\Longrightarrow$, the implication does not hold in general, but it holds if the condition is true. If there is a ? before the condition, the implication does not hold in general, but we do not know whether the condition is the right one (an open problem).

The sequence $\{a^*_n\}_{n\ge0}$ is also known as the sequence of Cesàro means and is well studied in the literature [1, 4]. On the other hand, information about the convergence of $p$-binomial means is hard to find; indeed, the notion of $p$-binomial means is coined especially for the purpose of this article. However, there are a few definitions that are close to ours [1, 4, 5]. First, we have to mention the Hausdorff means [1, 4]: the $p$-binomial means as well as the arithmetic means are special cases of them. Unfortunately, the Hausdorff means are a bit too general for our purposes, in the sense that the known results that are useful for this paper can be quite easily proven in our special cases.
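As a concrete illustration (a Python sketch, not part of the original paper), both means can be computed directly from the definitions above; for the divergent 0-1 sequence $1, 0, 1, 0, \ldots$ used later in Proposition 3.2, both mean sequences approach $\frac12$:

```python
# A numeric sketch (not part of the paper): the arithmetic means a_n^*
# and the p-binomial means a_n^p, computed directly from the definitions.
from math import comb

def arithmetic_mean(a, n):
    """a_n^* = (1/(n+1)) * sum_{i=0}^{n} a_i."""
    return sum(a[: n + 1]) / (n + 1)

def binomial_mean(a, n, p):
    """a_n^p = sum_{i=0}^{n} C(n,i) p^i (1-p)^(n-i) a_i."""
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) * a[i] for i in range(n + 1))

# The 0-1 sequence 1, 0, 1, 0, ... diverges, yet (cf. Proposition 3.2)
# both of its mean sequences converge to 1/2.
a = [1 - (n % 2) for n in range(501)]
print(arithmetic_mean(a, 500))     # ~ 0.5
print(binomial_mean(a, 500, 0.3))  # ~ 0.5
```

This matches the last row of Table 1: convergence of the means never forces convergence of the sequence itself.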
One of the closest notions to the $p$-binomial mean is the $k$-binomial transform [5]:
\[ b_n = \sum_{i=0}^{n}\binom{n}{i}k^n a_i, \]
which coincides with $\{a^p_n\}_{n\ge0}$ for $k = p = 0.5$, but is different for other $p$ and $k$. Another similar definition is given by the Euler means [4, pages 70, 71]:
\[ a^E_n = \frac{1}{2^{n+1}}\sum_{i=0}^{n}\binom{n+1}{i+1}a_i. \]

Some results, like the first row and the first column of Table 1, are not hard to prove (Section 3). Other results (Sections 4 and 5) require more careful ideas. This is true especially for the main result of this paper, Theorem 5.1, which proves, using the notation from Table 1, that $\{a^p_n\}_{n\ge0} \Longrightarrow \{a^*_n\}_{n\ge0}$ under the condition $a_n \ge 0$. In Section 6 we give an application of this theorem to finite Markov chains.

2 Preliminaries

Let $\mathbb{N}$, $\mathbb{R}^+$ and $\mathbb{R}^+_0$ be the sets of non-negative integers, positive real numbers and non-negative real numbers, respectively. For $a \in \mathbb{R}$, let $\lfloor a\rfloor$ be the greatest integer not greater than $a$ and let $\lceil a\rceil$ be the smallest integer not smaller than $a$. We allow a limit of a sequence to be infinite, and we write $a < \infty$ (which means exactly $a \in \mathbb{R}$) to emphasize that $a$ is finite. For functions $f, g : \mathbb{N}\to\mathbb{R}^+_0$ we say that

• $f(n) = O(g(n))$ if there is some $C > 0$ such that $f(n) \le Cg(n)$ for all sufficiently large $n$,

• $f(n) = \Theta(g(n))$ if there are some $C_1, C_2 > 0$ such that $C_1 g(n) \le f(n) \le C_2 g(n)$ for all sufficiently large $n$,

• $f(n) = o(g(n))$ if $g(n)$ is non-zero for all large enough $n$ and $\lim_{n\to\infty}\frac{f(n)}{g(n)} = 0$.

The following lemma will be useful later.

Lemma 2.1. Let $u : \mathbb{N}\to\mathbb{R}\setminus\{0\}$ and $k : \mathbb{N}\to\mathbb{R}$ be functions such that $\lim_{n\to\infty} u(n)k(n) = \lim_{n\to\infty} u(n) = 0$. Then
\[ \lim_{n\to\infty}\frac{(1+u(n))^{k(n)/u(n)}}{e^{k(n)}} = 1. \]

Proof. Because $e^x = \sum_{i\ge0}\frac{x^i}{i!}$ and $e^x \ge 1+x$, there is an analytic function $g : \mathbb{R}\to\mathbb{R}^+_0$ such that $e^x = 1 + x + g(x)x^2$ and $g(0) = \frac12$.
Hence, if we omit writing the argument of the functions $u$ and $k$,
\[ \lim_{n\to\infty}\frac{(1+u)^{k/u}}{e^k} = \lim_{n\to\infty}\left(\frac{e^u - g(u)u^2}{e^u}\right)^{k/u} = \lim_{n\to\infty}\left[\left(1-\frac{g(u)u^2}{e^u}\right)^{\frac{e^u}{g(u)u^2}}\right]^{\frac{u\,k\,g(u)}{e^u}}. \]
Because $\lim_{n\to\infty}\frac{g(u)u^2}{e^u} = 0$ and because $\lim_{x\to0}(1-x)^{1/x} = e^{-1}$, we have
\[ \lim_{n\to\infty}\left(1-\frac{g(u)u^2}{e^u}\right)^{\frac{e^u}{g(u)u^2}} = e^{-1}. \]
From
\[ \lim_{n\to\infty}\frac{u\,k\,g(u)}{e^u} = 0, \]
the result follows. □

Some properties of the probability mass function of the binomial distribution

Let $X$ be a random variable having a binomial distribution with parameters $p \in (0,1)$ and $n \in \mathbb{N}$. For $q = 1-p$ and $i \in \mathbb{Z}$, we have by definition
\[ \Pr[X=i] = B^i_n(p) = \begin{cases}\binom{n}{i}p^i q^{n-i} & \text{if } 0\le i\le n,\\ 0 & \text{else.}\end{cases} \]

In this subsection, we state and mathematically ground some properties that can be seen from a graph of the binomial distribution (see Fig. 1). The results are nice, some of them folklore, but the proofs are technical.

Figure 1: Binomial distribution with $n = 300$ and $p = 0.2$ (red), $p = 0.5$ (green), $p = 0.7$ (blue). The graphs show $B^i_{300}(p)$ with respect to $i$.

It is well known (see any basic probability book) that the expected value of $X$ is $E(X) = pn$. First, we will prove that the "peak" of the probability mass function is also roughly at $pn$.

Lemma 2.2. For $p \in (0,1)$, $n \in \mathbb{N}$ and $0 \le i \le n$,
\[ B^i_n(p) \ge B^{i-1}_n(p) \iff i \le (n+1)p. \]

Proof. The expression
\[ \frac{B^i_n(p)}{B^{i-1}_n(p)} = \frac{(n-i+1)p}{i(1-p)} \]
is at least 1 iff $i \le p(n+1)$. □

Next, we state a Chernoff bound, proven in [3, inequalities (6) and (7)], which explains why the probability mass function of the binomial distribution "disappears" (see Fig. 1) when $i$ is far enough from $pn$.

Theorem 2.3. Let $X$ be a binomially distributed random variable with parameters $p \in (0,1)$ and $n \in \mathbb{N}$. Then for each $\delta \in (0,1)$,
\[ \Pr[|X - np| \ge np\delta] \le 2e^{-\delta^2 np/3}. \]

We will only use the following corollary of the theorem. It is not difficult to prove and the proof is omitted.

Corollary 2.4. For $p \in (0,1)$, let $\alpha : \mathbb{N}\to\mathbb{R}^+$ be some function such that $\alpha(n) \le p\sqrt{n}$ for all $n$.
Then, for all $n \in \mathbb{N}$, it holds
\[ \sum_{i:\,|i-np|\ge\sqrt{n}\,\alpha(n)} B^i_n(p) \le 2e^{-\alpha^2(n)/(3p)}. \]

This corollary also tells us that, for large $n$, roughly all of the mass is gathered in an $O(\sqrt{n})$ neighborhood of $np$. What is more, the next lemma implies that in an $o(\sqrt{n})$ neighborhood of $np$, $B^i_n(p)$ does not change a lot.

Lemma 2.5. Let $p \in (0,1)$ be a parameter and let $\beta : \mathbb{N}\to\mathbb{R}$ be a function such that $|\beta(n)| = O(\sqrt{n})$ and $\lim_{n\to\infty}|\beta(n)| = \infty$. Then, for all large enough $n$, it holds
\[ \frac{B^{\lfloor np\rfloor}_n(p)}{B^{\lfloor np\rfloor-\lfloor\beta(n)\rfloor}_n(p)} \le e^{\frac{1}{p(1-p)}\cdot\frac{\lfloor\beta(n)\rfloor^2}{n}}. \]

Proof. For all large enough $n$ for which $\beta(n) > 0$, we have
\[ \frac{B^{\lfloor np\rfloor}_n(p)}{B^{\lfloor np\rfloor-\lfloor\beta(n)\rfloor}_n(p)} = \frac{\binom{n}{\lfloor np\rfloor}\,p^{\lfloor\beta(n)\rfloor}}{\binom{n}{\lfloor np\rfloor-\lfloor\beta(n)\rfloor}\,(1-p)^{\lfloor\beta(n)\rfloor}} = \prod_{i=0}^{\lfloor\beta(n)\rfloor-1}\frac{(n-\lfloor np\rfloor+\lfloor\beta(n)\rfloor-i)p}{(\lfloor np\rfloor-i)(1-p)} \tag{2.1} \]
\[ \le \prod_{i=0}^{\lfloor\beta(n)\rfloor-1}\left(1+\frac{1}{p(1-p)}\cdot\frac{\lfloor\beta(n)\rfloor}{n}\right). \]
In the last inequality we used the fact that
\[ \frac{(n-\lfloor np\rfloor+\lfloor\beta(n)\rfloor-i)p}{(\lfloor np\rfloor-i)(1-p)} \le 1+\frac{1}{p(1-p)}\cdot\frac{\lfloor\beta(n)\rfloor}{n} \]
holds for large enough $n$, which is true because it is equivalent to
\[ (np-\lfloor np\rfloor) + (\lfloor\beta(n)\rfloor-i)p + i(1-p) + \frac{i\lfloor\beta(n)\rfloor}{pn} \le \frac{\lfloor np\rfloor}{np}\,\lfloor\beta(n)\rfloor, \]
where

• $np - \lfloor np\rfloor < 1$,

• $(\lfloor\beta(n)\rfloor-i)p + i(1-p) \le \lfloor\beta(n)\rfloor\cdot\max\{p,1-p\}$, since $i \le \lfloor\beta(n)\rfloor$, and

• $\frac{i\lfloor\beta(n)\rfloor}{pn} \le \frac{\lfloor\beta(n)\rfloor^2}{pn} = O(1)$, since $\beta(n) = O(\sqrt{n})$,

while the right-hand side is roughly $\lfloor\beta(n)\rfloor \to \infty$ and $\max\{p,1-p\} < 1$. Using the fact that $1+x \le e^x$ for all $x \in \mathbb{R}$, we see that
\[ \frac{B^{\lfloor np\rfloor}_n(p)}{B^{\lfloor np\rfloor-\lfloor\beta(n)\rfloor}_n(p)} \le \prod_{i=0}^{\lfloor\beta(n)\rfloor-1} e^{\frac{1}{p(1-p)}\cdot\frac{\lfloor\beta(n)\rfloor}{n}} = e^{\frac{1}{p(1-p)}\cdot\frac{\lfloor\beta(n)\rfloor^2}{n}}. \]

For all large enough $n$ for which $\beta(n) < 0$, we write $b(n) = |\lfloor\beta(n)\rfloor|$ and we have
\[ \frac{B^{\lfloor np\rfloor}_n(p)}{B^{\lfloor np\rfloor+b(n)}_n(p)} = \frac{\binom{n}{\lfloor np\rfloor}(1-p)^{b(n)}}{\binom{n}{\lfloor np\rfloor+b(n)}\,p^{b(n)}} = \prod_{i=0}^{b(n)-1}\frac{(\lfloor np\rfloor+b(n)-i)(1-p)}{(n-\lfloor np\rfloor-i)p} \]
\[ \le \prod_{i=0}^{b(n)-1}\frac{(np+b(n)-i)(1-p)}{(n(1-p)-i)p} \le \prod_{i=0}^{b(n)-1}\frac{(n-\lfloor n(1-p)\rfloor+b(n)-i)(1-p)}{(\lfloor n(1-p)\rfloor-i)p}, \]
which is the same as (2.1) in the case $\beta(n) > 0$, only with $p$ and $1-p$ interchanged. □

Now we know that the values of $B^i_n(p)$ around the peaks in Fig. 1 are close to the value of the peak. The next lemma tells us that the peak of $B^i_n(p)$ is asymptotically $\frac{1}{\sqrt{2\pi p(1-p)n}}$.

Lemma 2.6.
For $0 < p < 1$, it holds
\[ \lim_{n\to\infty}\sqrt{2\pi p(1-p)n}\;B^{\lfloor np\rfloor}_n(p) = 1. \]

Proof. Using Stirling's approximation
\[ \lim_{n\to\infty}\frac{n!}{\sqrt{2\pi n}\left(\frac{n}{e}\right)^n} = 1, \]
we see that
\[ \lim_{n\to\infty}\sqrt{2\pi p(1-p)n}\;B^{\lfloor np\rfloor}_n(p) = \lim_{n\to\infty}\sqrt{2\pi p(1-p)n}\cdot\frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^n p^{\lfloor np\rfloor}(1-p)^{n-\lfloor np\rfloor}}{\sqrt{2\pi\lfloor np\rfloor}\left(\frac{\lfloor np\rfloor}{e}\right)^{\lfloor np\rfloor}\cdot\sqrt{2\pi(n-\lfloor np\rfloor)}\left(\frac{n-\lfloor np\rfloor}{e}\right)^{n-\lfloor np\rfloor}} \]
\[ = \lim_{n\to\infty}\frac{n^n\,p^{\lfloor np\rfloor}(1-p)^{n-\lfloor np\rfloor}}{\lfloor np\rfloor^{\lfloor np\rfloor}\,(n-\lfloor np\rfloor)^{n-\lfloor np\rfloor}} = \lim_{n\to\infty}\left(\frac{np}{\lfloor np\rfloor}\right)^{\lfloor np\rfloor}\left(\frac{n-np}{n-\lfloor np\rfloor}\right)^{n-\lfloor np\rfloor} \]
\[ = \lim_{n\to\infty}\left(1+\frac{np-\lfloor np\rfloor}{\lfloor np\rfloor}\right)^{\lfloor np\rfloor}\left(1-\frac{np-\lfloor np\rfloor}{n-\lfloor np\rfloor}\right)^{n-\lfloor np\rfloor} = \lim_{n\to\infty}e^{np-\lfloor np\rfloor}\cdot e^{-(np-\lfloor np\rfloor)} = 1, \]
where the last line follows by Lemma 2.1. □

3 Comparing convergence of $\{a_n\}_{n\ge0}$ with convergence of $\{a^p_n\}_{n\ge0}$ and $\{a^*_n\}_{n\ge0}$

In this section we show that the convergence of $\{a_n\}_{n\ge0}$ implies the convergence of $\{a^p_n\}_{n\ge0}$ and $\{a^*_n\}_{n\ge0}$ to the same limit. It is well known [4] that if $\{a_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$, then so does $\{a^*_n\}_{n\ge0}$. The next theorem tells us that in this case, $\{a^p_n\}_{n\ge0}$ also converges to the same limit.

Theorem 3.1. If $\{a_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$, then $\{a^*_n\}_{n\ge0}$ and $\{a^p_n\}_{n\ge0}$ converge to $a$ for all $0 < p < 1$.

Proof. The case $a = \infty$ is straightforward to handle, so suppose $a < \infty$. Take any $\varepsilon > 0$ and $N$ such that $|a_n - a| < \varepsilon$ for all $n > N$. Then, for $n > N$,
\[ |a^*_n - a| = \left|\frac{1}{n+1}\sum_{i=0}^{n}(a_i - a)\right| \le \frac{1}{n+1}\sum_{i=0}^{n}|a_i - a| \le \frac{1}{n+1}\sum_{i=0}^{N}|a_i - a| + \frac{1}{n+1}\,\varepsilon(n-N). \]
The last line converges to $\varepsilon$ when $n$ goes to infinity, which implies that $\{a^*_n\}_{n\ge0}$ converges to $a$.

To prove the convergence of binomial means, denote $q = 1-p$. For $n > N$, we get
\[ |a^p_n - a| = \left|\sum_{i=0}^{n}\binom{n}{i}p^i q^{n-i}(a_i - a)\right| \le \sum_{i=0}^{N}\binom{n}{i}p^i q^{n-i}|a_i - a| + \varepsilon\sum_{i=N+1}^{n}\binom{n}{i}p^i q^{n-i}. \]
The first sum converges to 0 as $n\to\infty$, because each of its (finitely many) terms does: for fixed $i$, $\binom{n}{i}p^i q^{n-i} \le n^i q^{n-i} \to 0$. The second sum is at most $\varepsilon$. Hence, $\{a^p_n\}_{n\ge0}$ also converges to $a$. □

One does not need to go searching for strange examples to see that the convergence of $\{a^*_n\}_{n\ge0}$ or $\{a^p_n\}_{n\ge0}$ does not imply the convergence of $\{a_n\}_{n\ge0}$. We state this as a proposition.

Proposition 3.2. There exists a sequence $\{a_n\}_{n\ge0}$ of zeros and ones that does not converge, whereas $\{a^*_n\}_{n\ge0}$ and $\{a^p_n\}_{n\ge0}$ converge for all $0 < p < 1$.

Proof. Define
\[ a_n = \begin{cases}0 & \text{if } n \text{ is odd},\\ 1 & \text{if } n \text{ is even}.\end{cases} \]
Then $\{a_n\}_{n\ge0}$ does not converge and $\{a^*_n\}_{n\ge0}$ converges to $\frac12$, as can easily be verified. Next, we will prove that $\{a^p_n\}_{n\ge0}$ also converges to $\frac12$. First, we see that, for $0 < p < 1$ and $q = 1-p$, the value of $q-p$ is strictly between $-1$ and $1$, thus $(q-p)^n$ converges to 0 when $n$ goes to infinity. Hence,
\[ \sum_{i \text{ even}}\binom{n}{i}p^i q^{n-i} - \sum_{i \text{ odd}}\binom{n}{i}p^i q^{n-i} = (q-p)^n \]
converges to 0. Because
\[ \sum_{i \text{ even}}\binom{n}{i}p^i q^{n-i} + \sum_{i \text{ odd}}\binom{n}{i}p^i q^{n-i} = 1, \]
we have that $\{a^p_n\}_{n\ge0}$ converges to $\frac12$. □

4 Comparing convergence of binomial means

In this section we compare the convergence of the sequences $\{a^p_n\}_{n\ge0}$ for different parameters $p \in (0,1)$. We will see that if $0 < p_1 < p_2 < 1$, then the convergence of $\{a^{p_2}_n\}_{n\ge0}$ implies the convergence of $\{a^{p_1}_n\}_{n\ge0}$ to the same limit, while the convergence of $\{a^{p_1}_n\}_{n\ge0}$ does not imply the convergence of $\{a^{p_2}_n\}_{n\ge0}$ in general. We leave as an open problem whether it does for $a_n \ge 0$.

First, let us prove the main lemma of this section, which tells us that the sequence of $p_2$-binomial means of the sequence of $p_1$-binomial means of some sequence is the sequence of $(p_1p_2)$-binomial means of the starting sequence.

Lemma 4.1. For $0 < p_1, p_2 < 1$ and a sequence $\{a_n\}_{n\ge0}$, let $\{b_n\}_{n\ge0}$ be the sequence of $p_1$-binomial means of $\{a_n\}_{n\ge0}$, i.e., $b_n = a^{p_1}_n$ for all $n$. Then $b^{p_2}_n = a^{p_1p_2}_n$ for all $n$.

Proof. Denote $q_1 = 1-p_1$ and $q_2 = 1-p_2$. We change the order of summation, use $\binom{n}{j}\binom{j}{i} = \binom{n}{i}\binom{n-i}{j-i}$ for $i \le j$, and replace $j$ by $k = j-i$:
\[ b^{p_2}_n = \sum_{j=0}^{n}\binom{n}{j}p_2^j q_2^{n-j}\sum_{i=0}^{j}\binom{j}{i}p_1^i q_1^{j-i} a_i = \sum_{i=0}^{n}a_i\binom{n}{i}(p_1p_2)^i\sum_{j=i}^{n}\binom{n-i}{j-i}(q_1p_2)^{j-i}q_2^{n-j} \]
\[ = \sum_{i=0}^{n}a_i\binom{n}{i}(p_1p_2)^i\sum_{k=0}^{n-i}\binom{n-i}{k}(q_1p_2)^k q_2^{n-i-k} = \sum_{i=0}^{n}a_i\binom{n}{i}(p_1p_2)^i(q_1p_2+q_2)^{n-i} \]
\[ = \sum_{i=0}^{n}a_i\binom{n}{i}(p_1p_2)^i(1-p_1p_2)^{n-i}, \]
since $q_1p_2 + q_2 = (1-p_1)p_2 + 1 - p_2 = 1 - p_1p_2$. The last line equals $a^{p_1p_2}_n$. □

The next theorem is now trivial to prove.

Theorem 4.2. For $0 < p_1 < p_2 < 1$ and a sequence $\{a_n\}_{n\ge0}$, if $\{a^{p_2}_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$, then $\{a^{p_1}_n\}_{n\ge0}$ also converges to $a$.

Proof. From Lemma 4.1 (applied with the parameters $p_2$ and $p_1/p_2$) we know that $\{a^{p_1}_n\}_{n\ge0}$ is the sequence of $(p_1/p_2)$-binomial means of the sequence $\{a^{p_2}_n\}_{n\ge0}$.
By Theorem 3.1, it converges to $a$. □

The next proposition tells us that the condition $0 < p_1 < p_2 < 1$ in the above theorem cannot be left out in general.

Proposition 4.3. For $0 < p_1 < p_2 < 1$, there exists a sequence $\{a_n\}_{n\ge0}$ such that $\{a^{p_1}_n\}_{n\ge0}$ converges to 0, but $\{a^{p_2}_n\}_{n\ge0}$ does not converge.

Proof. Denote $q_1 = 1-p_1$ and define $\{a_n\}_{n\ge0}$ as $a_n = \alpha^n$ for some parameter $\alpha \in \mathbb{R}$. If $\alpha > -1$, $\{a_n\}_{n\ge0}$ converges (possibly to $\infty$), so let us examine the case $\alpha < -1$. In this case we have
\[ a^{p_1}_n = \sum_{i=0}^{n}\binom{n}{i}\alpha^i p_1^i q_1^{n-i} = (\alpha p_1 + q_1)^n = (p_1(\alpha-1)+1)^n, \]
which converges iff $p_1 < \frac{2}{1-\alpha}$. So we can choose such an $\alpha$ that $p_1 < \frac{2}{1-\alpha} < p_2$, i.e., $1-\frac{2}{p_1} < \alpha < 1-\frac{2}{p_2}$. It follows that $\{a^{p_1}_n\}_{n\ge0}$ converges to 0, but $\{a^{p_2}_n\}_{n\ge0}$ does not converge. □

The sequence $\{a_n\}_{n\ge0}$ in the above proof grows very rapidly in absolute value, and the sign of its elements alternates. We think that this is not a coincidence, and we state the following open problem.

Open problem 4.4. Let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers. Is it true that, for all $0 < p_1, p_2 < 1$, the sequence $\{a^{p_1}_n\}_{n\ge0}$ converges to $a$ iff $\{a^{p_2}_n\}_{n\ge0}$ converges to $a$? If the answer is no, is there a counterexample where $a_n \in \{0,1\}$?

Note that the condition $a_n \ge 0$ is also required for the main result of the paper, Theorem 5.1. If the answer to Open problem 4.4 were yes, then we would only have to prove Theorem 5.1 in a special case, e.g. for $p = \frac12$. The (possibly negative) answer would also make this paper more complete (see Table 1). In the rest of this section we will try to give some insight into this problem, and we will present some reasons why we think it is hard.

Suppose we have $0 < p_1 < p_2 < 1$ and a sequence $\{a_n\}_{n\ge0}$ of non-negative real numbers such that $\{a^{p_1}_n\}_{n\ge0}$ converges to $a \in \mathbb{R}$ (the case when $\{a^{p_2}_n\}_{n\ge0}$ converges is covered by Theorem 4.2). The next lemma implies that $\{a_n\}_{n\ge0}$ has a relatively low upper bound on how fast its elements can increase, ruling out too large local extremes.

Lemma 4.5.
Let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers and let $0 < p < 1$. If $\{a^p_n\}_{n\ge0}$ converges to $a < \infty$, then $a_n = O(\sqrt{n})$.

Proof. We know that $a^p_n \ge a_{\lfloor np\rfloor}B^{\lfloor np\rfloor}_n(p)$, where $B^{\lfloor np\rfloor}_n(p) \approx \frac{1}{\sqrt{2\pi p(1-p)n}}$ by Lemma 2.6 and $a^p_n \approx a$ for large $n$. Hence, $a_{\lfloor np\rfloor} = O(\sqrt{n})$. □

To see whether $\{a^{p_2}_n\}_{n\ge0}$ converges, it makes sense to compare $a^{p_1}_{\lceil n/p_1\rceil}$ with $a^{p_2}_{\lceil n/p_2\rceil}$, since the peaks of the "weights" $B^i_{\lceil n/p_1\rceil}(p_1)$ and $B^i_{\lceil n/p_2\rceil}(p_2)$ (roughly) coincide at $i = n$ (see Fig. 2). Now the troublesome thing is that, for large $n$, the peaks are not of the same height; rather, they differ by a factor
\[ \sqrt{\frac{1-p_2}{1-p_1}} \]
by Lemma 2.6. Because the weights $B^i_{\lceil n/p_1\rceil}(p_1)$ and $B^i_{\lceil n/p_2\rceil}(p_2)$ are (really) influential only in an $O(\sqrt{n})$ neighborhood of $n$ (Corollary 2.4 and Lemma 2.5), where the $p_1$-weights are only slightly "downtrodden" $p_2$-weights, it seems that the convergence of $\{a^{p_1}_n\}_{n\ge0}$ could imply the convergence of $\{a^{p_2}_n\}_{n\ge0}$.

Figure 2: The graphs show $B^i_{\lceil n/p_1\rceil}(p_1)$ and $B^i_{\lceil n/p_2\rceil}(p_2)$ with respect to $i$ in the neighborhood of $n$ for $n = 300$, $p_1 = 0.4$ (red) and $p_2 = 0.7$ (green).

On the other hand, one could take $a_n = 0$ for all $n$ except for some $n$ where there would be outliers of height $\Theta(\sqrt{n})$. Those outliers would be so far away from each other that the weights $B^i_n(p_1)$ could "notice" two consecutive outliers, while the weights $B^i_n(p_2)$, which are slimmer, could not (in Fig. 2, the two outliers could be at 280 and 320). Then $\{a^{p_1}_n\}_{n\ge0}$ could converge because there would be a small difference between the case when the weights $B^i_n(p_1)$ amplify one outlier and the case when they "notice" two outliers (these two events seem to be the most opposite), while $\{a^{p_2}_n\}_{n\ge0}$ would not converge. From the Chernoff bound (Corollary 2.4) and from Lemma 2.5 it follows that the (horizontal) distance between outliers should be roughly $C\sqrt{n}$ for some $C$. What $C$ would be the most appropriate?
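The composition behaviour behind Theorem 4.2 is easy to check numerically. The following Python sketch (not part of the original paper) verifies Lemma 4.1 on an arbitrary random non-negative sequence:

```python
# Numeric check (a sketch, not from the paper) of Lemma 4.1:
# the p2-binomial means of the p1-binomial means of {a_n}
# coincide with the (p1*p2)-binomial means of {a_n}.
from math import comb
import random

def binomial_means(a, p):
    """Return the list of p-binomial means a_0^p, ..., a_{len(a)-1}^p."""
    q = 1 - p
    return [sum(comb(n, i) * p**i * q**(n - i) * a[i] for i in range(n + 1))
            for n in range(len(a))]

random.seed(0)
a = [random.random() for _ in range(40)]   # arbitrary non-negative sequence
p1, p2 = 0.6, 0.5

lhs = binomial_means(binomial_means(a, p1), p2)  # b_n^{p2} with b_n = a_n^{p1}
rhs = binomial_means(a, p1 * p2)                 # a_n^{p1 p2}
print(max(abs(x - y) for x, y in zip(lhs, rhs))) # ~ 0 (rounding error only)
```

The agreement up to floating-point rounding reflects the identity $q_1p_2 + q_2 = 1 - p_1p_2$ used in the proof of Lemma 4.1.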
5 Comparing convergence of $\{a^p_n\}_{n\ge0}$ with convergence of $\{a^*_n\}_{n\ge0}$

This section contains the main result of this paper, which is formulated in the next theorem. The proof will be given later.

Theorem 5.1. Let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers such that $\{a^p_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$ for some $0 < p < 1$. Then $\{a^*_n\}_{n\ge0}$ converges to $a$.

An example of how this theorem can be used is given in Section 6. Here we give an example where $\{a^p_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$ for all $0 < p < 1$, but $\{a^*_n\}_{n\ge0}$ does not converge.

Proposition 5.2. For the sequence $\{a_n\}_{n\ge0}$ given by $a_n = (-1)^n n$, the sequence $\{a^p_n\}_{n\ge0}$ converges to 0 for all $0 < p < 1$, and $\{a^*_n\}_{n\ge0}$ does not converge.

Proof. Take $0 < p < 1$ and denote $q = 1-p$. It holds
\[ a^p_n = \sum_{i=0}^{n}(-1)^i\,i\binom{n}{i}p^i q^{n-i} = -np\sum_{i=1}^{n}(-1)^{i-1}\binom{n-1}{i-1}p^{i-1}q^{(n-1)-(i-1)} = -np(q-p)^{n-1}. \]
Because $q-p$ is strictly between $-1$ and $1$, $\{a^p_n\}_{n\ge0}$ converges to 0. However, induction shows that $a^*_{2n+1} = -\frac12$ and $a^*_{2n} = \frac{n}{2n+1}$, which implies that $\{a^*_n\}_{n\ge0}$ does not converge. □

Next, we show that we cannot interchange $\{a^p_n\}_{n\ge0}$ and $\{a^*_n\}_{n\ge0}$ in Theorem 5.1.

Proposition 5.3. There exists a sequence $\{a_n\}_{n\ge0}$ of zeros and ones such that $\{a^*_n\}_{n\ge0}$ converges to 0 and $\{a^p_n\}_{n\ge0}$ diverges for all $0 < p < 1$.

Proof. Define
\[ a_n = \begin{cases}1 & \text{if there is some } k\in\mathbb{N} \text{ such that } |n-2^{2k}| \le 2^k k,\\ 0 & \text{else.}\end{cases} \]
So $\{a_n\}_{n\ge0}$ has islets of ones in a sea of zeros. The size of an islet at position $N = 2^{2k}$ is $\Theta(\sqrt{N}\log(N))$ and the distance between two consecutive islets near position $N$ is $\Theta(N)$. It is easy to see that the sequence $\{a^*_n\}_{n\ge0}$ converges to zero. Now let $0 < p < 1$. By the Chernoff bound (Corollary 2.4) we see that $B^i_n(p)$ is concentrated around $i = \lfloor np\rfloor$ and that, for $|i-np| > \sqrt{n}\log(n)$, roughly nothing is left. It is easy (but tedious) to show formally that $\{a^p_{\lfloor 2^{2k}/p\rfloor}\}_{k\ge0}$ converges to 1 and that $\{a^p_{\lfloor 2^{2k-1}/p\rfloor}\}_{k\ge0}$ converges to 0, which implies that $\{a^p_n\}_{n\ge0}$ diverges. □

Now we go for the proof of Theorem 5.1. First, for a sequence $\{a_n\}_{n\ge0}$ and $0 < p < 1$, consider the sequence $\{a^{p*}_n\}_{n\ge0}$ of arithmetic means of the sequence $\{a^p_n\}_{n\ge0}$. We get
\[ a^{p*}_n = \frac{1}{n+1}\sum_{j=0}^{n}a^p_j = \frac{1}{n+1}\sum_{j=0}^{n}\sum_{i=0}^{j}\binom{j}{i}p^i q^{j-i}a_i = \frac{1}{n+1}\sum_{i=0}^{n}a_i\sum_{j=i}^{n}\binom{j}{i}p^i q^{j-i}, \]
where $q = 1-p$. It makes sense to define the weights $w^i_n(p) = \sum_{j=i}^{n}\binom{j}{i}p^i q^{j-i}$, so that it holds
\[ a^{p*}_n = \frac{1}{n+1}\sum_{i=0}^{n}w^i_n(p)\,a_i. \]

Figure 3: The graph shows $w^i_{300}(0.3)$ with respect to $i$. We see a steep slope at $i = 90$ plunging from height approximately $\frac{1}{0.3}$ to 0.

We can see in Fig. 3 that the weights $w^i_n(p)$ have a very specific shape. They are very close to $\frac1p$ for $i < np - \varepsilon(n)$ and very close to 0 for $i > np + \varepsilon(n)$, for some small $\varepsilon(n)$. Such a shape can be well described using the next lemma (and its corollary), which gives another way to compute $w^i_n(p)$.

Lemma 5.4. For $0 < p < 1$ and $0 \le i \le n$, it holds
\[ w^i_n(p) = \frac{1}{p}\sum_{j=i+1}^{n+1}B^j_{n+1}(p). \]
In particular, $w^i_n(p) \le \frac1p$ for all $i$, and $w^i_n(p)$ decreases with $i$.

Proof. The term $p\binom{j}{i}p^i q^{j-i}$ is the probability that the $(i+1)$-st success in a sequence of independent Bernoulli trials with success probability $p$ occurs at trial $j+1$. Summing over $j = i, \ldots, n$ gives the probability that among the first $n+1$ trials there are at least $i+1$ successes, i.e., $p\,w^i_n(p) = \sum_{j=i+1}^{n+1}B^j_{n+1}(p)$. □

Now the following corollary holds.

Corollary 5.5. For $0 < p < 1$ and $\varepsilon(n) = \sqrt{n}\log(n)$, it holds
\[ w^{\lfloor np-\varepsilon(n)\rfloor}_n(p) \ge \frac1p - n^{-\Theta(\log(n))} \qquad\text{and}\qquad w^{\lfloor np+\varepsilon(n)\rfloor}_n(p) \le n^{-\Theta(\log(n))}. \]

Proof. Use the Chernoff bound (Corollary 2.4) on the expression for $w^i_n(p)$ from Lemma 5.4. □

For $0 < p < 1$, a sequence $\{a_n\}_{n\ge0}$ and $\varepsilon(n) = \sqrt{n}\log(n)$, define the sequences $\{\check{a}_n(p)\}_{n\ge0}$, $\{\hat{a}_n(p)\}_{n\ge0}$ and $\{\acute{a}_n(p)\}_{n\ge0}$ as
\[ \check{a}_n(p) = \sum_{i=0}^{\lfloor pn-\varepsilon(n)\rfloor}w^i_n(p)\,a_i,\qquad \hat{a}_n(p) = \sum_{i=\lfloor pn-\varepsilon(n)\rfloor+1}^{\lfloor pn+\varepsilon(n)\rfloor-1}w^i_n(p)\,a_i,\qquad \acute{a}_n(p) = \sum_{i=\lfloor pn+\varepsilon(n)\rfloor}^{n}w^i_n(p)\,a_i. \]
Hence, we have
\[ a^{p*}_n = \frac{1}{n+1}\left(\check{a}_n(p) + \hat{a}_n(p) + \acute{a}_n(p)\right). \]
From Corollary 5.5 we see that the weights in $\check{a}_n(p)$ are very close to $\frac1p$, which suggests that $\frac{1}{n+1}\check{a}_n(p)$ can be very close to $a^*_{\lfloor pn\rfloor}$ (see Lemma 5.8 below). From the same corollary we see that $\frac{1}{n+1}\acute{a}_n(p)$ can be very close to 0 (see Lemma 5.7 below). And because we have a sum of only $\Theta(\varepsilon(n))$ elements in $\hat{a}_n(p)$, $\frac{1}{n+1}\hat{a}_n(p)$ could also be very close to 0 (see Lemma 5.6 below).

We have just described the main idea of the proof of the main theorem, which we give next. It uses the three lemmas just mentioned (one about $\check{a}_n(p)$, one about $\hat{a}_n(p)$ and one about $\acute{a}_n(p)$), which will be proven later.

Proof of Theorem 5.1. Suppose that $a_n \ge 0$ for all $n$ and that $\{a^p_n\}_{n\ge0}$ converges to $a \in \mathbb{R}\cup\{\infty\}$ for some $0 < p < 1$. We know that this implies the convergence of $\{a^{p*}_n\}_{n\ge0}$ to $a$ (Theorem 3.1).

First, we deal with the case $a = \infty$. We can use the fact that $w^i_n(p) \le \frac1p$ for all $i$ (see Lemma 5.4), which gives
\[ a^{p*}_n = \frac{1}{n+1}\sum_{i=0}^{n}w^i_n(p)\,a_i \le \frac{1}{n+1}\sum_{i=0}^{n}\frac1p\,a_i = \frac1p\,a^*_n. \]
Hence, $\{a^*_n\}_{n\ge0}$ converges to $a = \infty$.

In the case $a < \infty$, we can use Lemma 5.6 and Lemma 5.7 to see that $\left\{\frac{1}{n+1}\hat{a}_n(p)\right\}_{n\ge0}$ and $\left\{\frac{1}{n+1}\acute{a}_n(p)\right\}_{n\ge0}$ converge to 0. Hence, $\left\{\frac{1}{n+1}\check{a}_n(p)\right\}_{n\ge0}$ converges to $a$. Lemma 5.8 tells us that in this case $\{a^*_n\}_{n\ge0}$ also converges to $a$. □

Now we state and prove the remaining lemmas.

Lemma 5.6. Let $0 < p < 1$ and let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers such that $\{a^p_n\}_{n\ge0}$ converges to $a < \infty$. Then $\left\{\frac{1}{n+1}\hat{a}_n(p)\right\}_{n\ge0}$ converges to 0.

Proof. Fix $\varepsilon > 0$, define $\delta(n) = \lfloor\log^2(n)\rfloor$ and let $k : \mathbb{N}\to\mathbb{N}$ be such that
\[ pn - \varepsilon(n) \le k(n) \le pn + \varepsilon(n) - \delta(n) \]
holds for all $n$. We claim that
\[ \sum_{i=k(n)}^{k(n)+\delta(n)} a_i = O(\sqrt{n}), \]
where the constant behind the $O$ is independent of $k$.
To prove this, define $N = N(n) = \left\lceil\frac{k(n)}{p}\right\rceil$. It follows that $N = n \pm \Theta(\varepsilon(n))$. Note that, for large enough $n$,
\[ \sum_{i=k(n)}^{k(n)+\delta(n)} a_i B^i_N(p) \le \sum_{i=0}^{N} a_i B^i_N(p) \le a + \varepsilon, \]
because $\{a^p_n\}_{n\ge0}$ converges to $a$. From Lemma 2.5, which bounds the coefficients $B^i_N(p)$ around $i = pN$, it follows that, for all $k(n) \le i \le k(n)+\delta(n)$,
\[ B^i_N(p) \ge e^{-o(1)}B^{\lfloor Np\rfloor}_N(p). \]
Using $N = n \pm \Theta(\varepsilon(n))$ and the bound $B^{\lfloor Np\rfloor}_N(p) = \Theta\!\left(\frac{1}{\sqrt{N}}\right)$ from Lemma 2.6, we get
\[ \sum_{i=k(n)}^{k(n)+\delta(n)} a_i \le (a+\varepsilon)\,e^{o(1)}\,\Theta(\sqrt{n}) = O(\sqrt{n}). \]

Next, we can see that
\[ \sum_{i=\lfloor pn-\varepsilon(n)\rfloor+1}^{\lfloor pn+\varepsilon(n)\rfloor-1} a_i = O\!\left(\frac{n}{\log(n)}\right): \]
just partition the sum on the left-hand side into $\Theta\!\left(\frac{\varepsilon(n)}{\delta(n)}\right)$ sums of at most $\delta(n)$ elements. Then we have
\[ \sum_{i=\lfloor pn-\varepsilon(n)\rfloor+1}^{\lfloor pn+\varepsilon(n)\rfloor-1} a_i = \Theta\!\left(\frac{\varepsilon(n)}{\delta(n)}\right)\cdot O(\sqrt{n}) = O\!\left(\frac{n}{\log(n)}\right). \]
Now using $w^i_n(p) \le \frac1p$ from Lemma 5.4, we get
\[ \frac{1}{n+1}\hat{a}_n(p) \le \frac{1}{(n+1)p}\sum_{i=\lfloor pn-\varepsilon(n)\rfloor+1}^{\lfloor pn+\varepsilon(n)\rfloor-1} a_i = \frac{1}{(n+1)p}\,O\!\left(\frac{n}{\log(n)}\right), \]
which implies the convergence of $\left\{\frac{1}{n+1}\hat{a}_n(p)\right\}_{n\ge0}$ to 0. □

Lemma 5.7. Let $0 < p < 1$ and let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers such that $\{a^p_n\}_{n\ge0}$ converges to $a < \infty$. Then $\left\{\frac{1}{n+1}\acute{a}_n(p)\right\}_{n\ge0}$ converges to 0.

Proof. From Lemma 5.4 we see that the weights $w^i_n(p)$ decrease with $i$, so
\[ \frac{1}{n+1}\acute{a}_n(p) \le \frac{1}{n+1}\,w^{\lfloor pn+\varepsilon(n)\rfloor}_n(p)\sum_{i=\lfloor pn+\varepsilon(n)\rfloor}^{n} a_i. \]
Corollary 5.5 gives us $w^{\lfloor pn+\varepsilon(n)\rfloor}_n(p) \le n^{-\Theta(\log(n))}$, while Lemma 4.5 implies $a_i = O(\sqrt{i})$, so the sum is $O(n^{3/2})$. Hence, $\left\{\frac{1}{n+1}\acute{a}_n(p)\right\}_{n\ge0}$ converges to 0. □

Lemma 5.8. Let $0 < p < 1$ and let $\{a_n\}_{n\ge0}$ be a sequence of non-negative real numbers such that $\left\{\frac{1}{n+1}\check{a}_n(p)\right\}_{n\ge0}$ converges to $a < \infty$. Then $\{a^*_n\}_{n\ge0}$ converges to $a$.

Proof. Because the weights $w^i_n(p)$ are bounded from above by $\frac1p$ (Lemma 5.4), we have
\[ \frac{\check{a}_n(p)}{n+1}\cdot\frac{(n+1)p}{\lfloor pn-\varepsilon(n)\rfloor+1} \le a^*_{\lfloor pn-\varepsilon(n)\rfloor}, \]
where the left side converges to $a$. Because the weights $w^i_n(p)$ decrease with $i$ (Lemma 5.4) and because $w^{\lfloor pn-\varepsilon(n)\rfloor}_n(p) \ge \frac1p - n^{-\Theta(\log(n))}$ (Corollary 5.5), we have
\[ a^*_{\lfloor pn-\varepsilon(n)\rfloor} \le \frac{\check{a}_n(p)}{n+1}\cdot\frac{n+1}{\left(\frac1p - n^{-\Theta(\log(n))}\right)\left(\lfloor pn-\varepsilon(n)\rfloor+1\right)}, \]
where the right side converges to $a$.
Hence, $a^*_{\lfloor pn-\varepsilon(n)\rfloor}$ is sandwiched between two sequences that converge to $a$. It follows that $\{a^*_n\}_{n\ge0}$ converges to $a$. □

6 Application of Theorem 5.1: a limit theorem for finite Markov chains

For a stochastic matrix¹ $P$, define the sequence $\{P_n\}_{n\ge0}$ as $P_n = P^n$. As in the one-dimensional case, we define the sequence $\{P^*_n\}_{n\ge0}$ as $P^*_n = \frac{1}{n+1}\sum_{i=0}^{n}P_i$. We say that $\{P_n\}_{n\ge0}$ converges to $A$ if, for all possible pairs $(i,j)$, the sequence of $(i,j)$-th elements of $P_n$ converges to the $(i,j)$-th element of $A$. In this section, we will prove the following theorem.

Theorem 6.1. For any finite stochastic matrix $P$, the sequence $\{P^*_n\}_{n\ge0}$ converges to some stochastic matrix $A$ such that $AP = PA = A$.

¹A stochastic matrix is a (possibly infinite) square matrix that has non-negative real entries and for which all rows sum to 1. Each stochastic matrix represents the transition probabilities of some discrete Markov chain. No prior knowledge of Markov chains is needed for this paper.

This theorem is nothing new in the theory of Markov chains. Actually, it also holds for (countably) infinite transition matrices $P$. Although we did not find it formulated this way in the literature, it can be easily deduced from known results. The hardest thing to show is the convergence of $\{P^*_n\}_{n\ge0}$ [2, page 32]. After we have it, we can continue as in the proof of Theorem 6.1 below.

We will give a short proof of Theorem 6.1, using only linear algebra and Theorem 5.1. First, we prove a result from linear algebra.

Lemma 6.2. Let $P$ be a finite stochastic matrix and let $\tilde{P} = \frac12(P + I)$. Then

a) for all eigenvalues $\lambda$ of $P$, it holds $|\lambda| \le 1$,

b) for all eigenvalues $\lambda$ of $\tilde{P}$ for which $|\lambda| = 1$, it holds $\lambda = 1$,

c) the algebraic and geometric multiplicities of the eigenvalue 1 of $\tilde{P}$ are the same.

Proof. Since a product and a convex combination of stochastic matrices is a stochastic matrix, $P^n$ and $\tilde{P}^n$ are stochastic matrices for each $n \in \mathbb{N}$.
First, we will prove by contradiction that, for all eigenvalues $\lambda$ of $P$, it holds $|\lambda| \le 1$. Suppose that there is some eigenvalue $\lambda$ of $P$ such that $|\lambda| > 1$. Let $w$ be the corresponding eigenvector and let its $i$-th component be non-zero. Then $|(P^n w)_i| = |\lambda|^n\cdot|w_i|$, where the right side converges to $\infty$ and the left side is bounded by $\max_j|w_j|$ (since $P^n$ is a stochastic matrix). This gives a contradiction. Hence, for all eigenvalues $\lambda$ of $P$, it holds $|\lambda| \le 1$. Because $\tilde{P}$ is also stochastic, the same holds for $\tilde{P}$.

We see that we can get all eigenvalues of $\tilde{P}$ by adding 1 to the eigenvalues of $P$ and dividing by 2. Because $P$ has all its eigenvalues in the unit disc around 0, $\tilde{P}$ has all its eigenvalues in the disc of radius $\frac12$ centered at $\frac12$. Hence, for all eigenvalues $\lambda$ of $\tilde{P}$ for which $|\lambda| = 1$, it holds $\lambda = 1$.

For the last claim of the lemma, suppose that the algebraic and geometric multiplicities of the eigenvalue 1 of $\tilde{P}$ are not the same. Then, by the Jordan decomposition, there is an eigenvector $v$ for the eigenvalue 1 and a vector $w$ such that $\tilde{P}w = v + w$. Then, for each $n \in \mathbb{N}$, we have $\tilde{P}^n w = nv + w$. Because $v$ has at least one non-zero component and because all components of $\tilde{P}^n w$ are bounded in absolute value by $\max_j|w_j|$, we have come to a contradiction. Hence, the algebraic and geometric multiplicities of the eigenvalue 1 of $\tilde{P}$ are the same. □

Proof of Theorem 6.1. For the matrix $\tilde{P} = \frac12(P+I)$, let $\tilde{P} = XJX^{-1}$ be its Jordan decomposition. From Lemma 6.2 a) and b) it follows that the diagonal of $J$ consists only of ones and of entries of absolute value strictly less than one. From Lemma 6.2 c) it follows that the Jordan blocks for the eigenvalue 1 are all $1\times1$. It follows that $J^n$ converges to some matrix $J_0$ with only zero entries and some ones on the diagonal. Hence, $\tilde{P}^n$ converges to $A = XJ_0X^{-1}$. Since $\tilde{P}^n$ is a stochastic matrix for all $n$, the same is true for $A$.

Using
\[ \tilde{P}^n = \left(\frac{P+I}{2}\right)^n = \sum_{i=0}^{n}\binom{n}{i}\left(\frac12\right)^i\left(\frac12\right)^{n-i}P^i, \]
we see that $\{\tilde{P}^n\}_{n\ge0}$ is just the sequence of 0.5-binomial means of the sequence $\{P_n\}_{n\ge0}$; since the entries of the matrices $P_n$ are non-negative, by Theorem 5.1 (applied entry-wise) $\{P^*_n\}_{n\ge0}$ also converges to $A$.
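As a numerical aside (a Python sketch, not part of the proof; the 2-state chain below is an arbitrary illustrative example), one can watch both $\tilde{P}^n$ and the Cesàro means $P^*_n$ approach the same matrix $A$ even when $P^n$ itself oscillates:

```python
# Numeric sketch (not part of the proof): for a small stochastic matrix P,
# the powers of P~ = (P + I)/2 and the Cesaro means P*_n approach the
# same limit A, even though P^n itself oscillates.
# The 2x2 chain here is an arbitrary illustrative example.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.0, 1.0], [1.0, 0.0]]            # period-2 chain: P^n oscillates
I = [[1.0, 0.0], [0.0, 1.0]]
P_tilde = [[(P[i][j] + I[i][j]) / 2 for j in range(2)] for i in range(2)]

# Cesaro mean P*_n = (1/(n+1)) * (I + P + ... + P^n)
n = 2000
acc, power = [[0.0, 0.0], [0.0, 0.0]], I
for _ in range(n + 1):
    acc = [[acc[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    power = mat_mul(power, P)
cesaro = [[acc[i][j] / (n + 1) for j in range(2)] for i in range(2)]

# P~^n via repeated multiplication
tilde_pow = I
for _ in range(n):
    tilde_pow = mat_mul(tilde_pow, P_tilde)

print(cesaro)     # ~ [[0.5, 0.5], [0.5, 0.5]]
print(tilde_pow)  # ~ [[0.5, 0.5], [0.5, 0.5]]
```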
Thus, we have
\[ AP = \left(\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}P^i\right)P = \lim_{n\to\infty}\frac{n+2}{n+1}\left(\frac{1}{n+2}\sum_{i=0}^{n+1}P^i - \frac{I}{n+2}\right) = A. \]
The same argument also shows $PA = A$. □

An application of Theorem 6.1 in formal language theory. The following application was suggested by an anonymous reviewer. To each formal language $L \subseteq \Sigma^*$, where $\Sigma$ is a finite alphabet, we can assign the sequence
\[ f_n(L) = \frac{|L\cap\Sigma^n|}{|\Sigma^n|} \]
of relative frequencies of words of length $n$ in $L$. If this sequence is convergent, then its limit can be taken as a measure for the size of $L$, which provides interesting information about $L$. Unfortunately, the sequence $f_n(L)$ can be divergent even if $L$ is a regular language, such as, for example, the language $E$ of all words of even length. But using Theorem 6.1 we can show that $f^*_n(L)$ converges for every regular $L$, as follows. If $L$ is regular, it is recognised by some deterministic finite automaton $(Q, q_0, F, \Sigma, \delta)$, where $Q = \{q_0, q_1, \ldots, q_{m-1}\}$ is the set of states, $q_0 \in Q$ is the starting state, $F \subseteq Q$ is the set of final states, and $\delta : Q\times\Sigma\to Q$ is the transition function. Define the matrix $T \in \mathbb{Q}^{m\times m}$ with elements
\[ T_{i,j} = \frac{|\{a\in\Sigma :\ \delta(q_i,a) = q_j\}|}{|\Sigma|},\qquad i,j = 0,1,\ldots,m-1. \]
Then $T$ is stochastic and $f_n(L) = \sum_{q_j\in F}(T^n)_{0,j}$, so by Theorem 6.1, $f^*_n(L)$ is convergent and we can define $\mu(L) = \lim_{n\to\infty}f^*_n(L)$ to be the (finitely additive) measure of $L$. For example, returning to the language $E$ of words of even length, we find that $\mu(E) = 0.5$.

Acknowledgements. The author wishes to thank the anonymous reviewer for a very constructive review, and Sergio Cabello, Matjaž Konvalinka and Marko Petkovšek for valuable comments and suggestions. All figures were made with Mathematica 8.0.1.0.

References

[1] J. Boos and P. Cass, Classical and Modern Methods in Summability, Oxford University Press, 2000.
[2] K. L. Chung, Markov Chains with Stationary Transition Probabilities, Springer, 1960.
[3] T. Hagerup and C. Rüb, A guided tour of Chernoff bounds, Inf. Process. Lett.
33 (1990), 305-308, doi:10.1016/0020-0190(90)90214-I, http://dblp.uni-trier.de/db/journals/ipl/ipl33.html#HagerupR90.
[4] G. H. Hardy, Divergent Series, Clarendon Press, 1st edition, 1956.
[5] M. Z. Spivey and L. L. Steil, The k-binomial transforms and the Hankel transform, J. Integer Seq. 9 (2006).