ARS MATHEMATICA CONTEMPORANEA ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.) ARS MATHEMATICA CONTEMPORANEA 14 (2018) 129-156 https://doi.org/10.26493/1855-3974.996.675 (Also available at http://amc-journal.eu) On the distribution of subtree orders of a tree* We investigate the distribution of the number of vertices of a randomly chosen subtree of a tree. Specifically, it is proven that this distribution is close to a Gaussian distribution in an explicitly quantifiable way if the tree has sufficiently many leaves and no long branchless paths. We also show that the conditions are satisfied asymptotically almost surely for random trees. If the conditions are violated, however, we exhibit by means of explicit counterexamples that many other (non-Gaussian) distributions can occur in the limit. These examples also show that our conditions are essentially best possible. Keywords: Subtrees, normal distribution, homeomorphically irreducible trees, random trees. Math. Subj. Class.: 05C05 1 Introduction By a subtree of a tree, we mean any nonempty connected subgraph; obviously, such a subgraph is again a tree. The distribution of the number of vertices of a randomly chosen subtree of a tree was first studied by Jamison in two papers [5,6], in which he investigates the average subtree order of a tree, i.e. the mean number of vertices of a subtree. Among his main results is the fact that the average order of subtrees of an n-vertex tree is at least (n+2)/3, with equality only for the path. The problems that Jamison proposed in his papers received considerable attention recently [4,14,16], as did other aspects of subtrees in trees, specifically extremal problems, whose study was initiated by Szekely and Wang [12,13]. Jamison's question whether the average order is always at least n/2 for homeomorphically irreducible trees, i.e. trees without vertices of degree 2, was only answered (affirmatively) very recently by Vince and Wang [14], who also showed that the average subtree order of such a tree is less than 3n/4. *We would like to thank an anonymous referee for many useful comments. t This material is based upon work supported by the National Research Foundation of South Africa under grant number 96236. E-mail addresses: naina@sun.ac.za (Dimbinaina Ralaivaosaona), swagner@sun.ac.za (Stephan Wagner) Dimbinaina Ralaivaosaona, Stephan Wagner t Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa Received 17 December 2015, accepted 15 March 2017, published online 22 July 2017 Abstract ©® This work is licensed under http://creativecommons.org/licenses/by/3.0/ 130 Ars Math. Contemp. 14 (2018) 117-128 Many other of Jamison's questions remain open to date. A question of his that was also discussed in the 2011 edition of the Combinatorics REGS [7] reads as follows: Question 1.1. Given a tree T of order n, let sk (T) denote the number of subtrees of order k. When is it true that the numbers s2 (T),... ,sn (T) form a unimodal list (weakly increasing at first, then weakly decreasing)? In particular, is it unimodal when T has no vertices of degree 2? It should be noted here that s1(T) = n and s2 (T) = n - 1 for every tree T of order n, so s1(T) cannot be included if a unimodal list is to be obtained. The question seems to be fairly hard, and we will not actually answer it in this paper. However, we provide a related result: if a tree has sufficiently many leaves and no long branchless paths (this will be made more precise later), then the distribution of the subtree orders is close to a Gaussian distribution in an explicitly quantifiable way. In particular, this is the case for trees without vertices of degree 2. Moreover, the conditions we impose are usually satisfied: for random trees, they are valid asymptotically almost surely. Asymptotic normality of the distribution does of course not imply unimodality, nor the other way around, but the two are clearly connected, so our result provides evidence that the answer to Question 1.1 might be affirmative. It should also be pointed out that our main result parallels a classic theorem of Godsil [3] on matchings: if G1, G2,... is a sequence of graphs, then the distribution of the size of matchings in Gn (suitably renormalised) converges to a Gaussian distribution, provided that the variance tends to infinity. See [8] for a recent extension. Godsil's theorem is based on properties of the matching polynomial, in particular the fact that all its zeros are real. Indeed, it is well known that a polynomial with positive coefficients and only real zeros has log-concave (and thus unimodal) coefficients, so Question 1.1 could be answered affirmatively if all zeros of the polynomial n ESk(T )uk k=2 were real for every T. This "subtree polynomial" was already considered by Jamison himself in [5]. More recently, Yan and Yeh [18] studied a weighted version, and Martin et al. [9] considered a bivariate generalisation involving the number of leaves. Unfortunately, the subtree polynomial does not have real roots only as the matching polynomial does, so the situation for subtrees of trees turns out to be more intricate than for matchings of graphs. As a simple concrete counterexample, consider the star S4 with four vertices: we have 4 E sk(S4)uk =4u + 3u2 + 3u3 + u4, k= i a polynomial with two non-real roots. Even if the first coefficient is removed, we get 4 E sk(S4)uk = 3u2 + 3u3 + u4, k=2 which still has two non-real roots. D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 131 However, we obtain a central limit theorem for the distribution of subtree orders analogous to Godsil's theorem under some technical conditions. Our approach is of a rather different nature, and we hope that it might also prove useful to deal with other problems, such as a conjecture of Alavi, Malde, Schwenk and Erdos [1] concerning the independence polynomial of trees that parallels Question 1.1. Our main theorem can be stated as follows: Theorem 1.2. Let T1,T2,... be a sequence of trees such that \Tn\, i.e. the number of vertices of Tn, goes to infinity, the proportion of leaves among all vertices is bounded below by a positive constant, and the length of the longest branchless path in Tn is at most \Tn \1/2 e for some fixed e (and sufficiently large n). Then the order distribution of the subtrees of Tn (suitably renormalised) converges weakly to a Gaussian distribution. It is easy to find both examples and counterexamples for the normal distribution: for instance, if Tn is an n-vertex star, then the distribution of the subtree orders is essentially a binomial distribution, which converges to a Gaussian law. On the other hand, if one considers the sequence of n-vertex paths, then the limit distribution is quite different. This and other examples and counterexamples will be discussed in Section 2, where we also show that the technical conditions of Theorem 1.2 are indeed important and also essentially best possible. The main part of the paper is organised as follows: we first obtain some auxiliary results and prove two versions of our main theorem (a central and a local limit theorem, see Theorem 4.8 and Theorem 4.10 respectively) for rooted trees in Section 4 before passing on to unrooted trees in Section 5. Rooted trees are more accessible because one can use a recursive approach, and we will see that an appropriate root can always be chosen in such a way that most subtrees contain the root. In Section 6, we will see that in the "generic" case of random trees, the conditions of our main theorem are satisfied, so that the Gaussian distribution is indeed the typical limit distribution of subtree orders. Notation. Throughout this paper, we make frequent use of the Vinogradov symbol C interchangeably with the O-notation: f (T) < g(T) or f (T) = O(g(T)) means that f (T) < Kg(T) for a suitable positive constant K and all (sufficiently large) trees T. If further variables are included in an O-term, the constant K is always independent of those, unless mentioned otherwise. 2 Examples and counterexamples For a tree T, we let S(T) denote the set of all subtrees of T, i.e. all connected induced subgraphs of T. The polynomial associated with this set, which we call the subtree polynomial of T, is denoted by S(T,u): S(T,u)= ^ u|T|. t eS(T) The total number of subtrees is clearly S(T, 1), for which we will simply write S(T). Our goal will be to prove central and local limit theorems for the coefficients of this polynomial. Note also that S„(T, 1) = S(T, u) I is the total number of vertices in T's subtrees, so Su(T, 1)/S(T) is the mean subtree order. Likewise, the variance is given by Suu(T, 1) + Su (T, 1) f Su(T, 1)\2 S(T) V S(T) J . (.) 132 Ars Math. Contemp. 14 (2018) 117-128 Before we get to the proof of the main theorem, let us briefly discuss some examples and counterexamples to illustrate its statement. 2.1 The star If T = Sn is a star of order n, then every subtree either consists of the centre and an arbitrary set of leaves, or it is a single leaf. Thus we have and in particular S(T) = 2n 1 + n - 1. We see that the distribution of subtree orders is essentially a binomial distribution, which gives us a Gaussian distribution in the limit. 2.2 The path The distribution of subtree orders of a path Pn turns out to be quite different: every subtree is again a path and uniquely characterised by its endpoints. We obtain If we divide the subtree orders by n and take the limit, we obtain a distribution whose density is given by f (t) = 2(1 - t) on the interval [0,1]. The examples that we consider in the following are all constructed by suitably combining paths and stars. Depending on how this is done, a variety of different limit distributions can be obtained. Of course, there does not even have to be a limit distribution at all: this is not the case, for example, if we consider a sequence of trees of increasing orders, alternating between paths and stars. 2.3 The broom The simplest possible combination of a star and a path is the broom, consisting of a path of k vertices and i leaves attached to one of its ends (the "centre" of the broom, denoted v in Figure 1). Here, the limit as k + i ^ to depends very much on the relative sizes of k and i. If k is fixed, then there is very little difference to a star, and we obtain a Gaussian limit distribution. On the other hand, if i is fixed, then we have essentially the same order distribution as for a path (and exactly the same in the limit). As soon as i grows faster than log2 k, almost all subtrees contain the broom's centre v (i.e., the proportion of such subtrees tends to 1). This is because there are k2e subtrees containing it, as opposed to O(k2 + i) not containing it. Subtrees containing the centre v have a distribution that is a convolution of a binomial distribution (stemming from the leaves attached to v) and a discrete uniform distribution (stemming from the path). In the limit, the distribution with greater variance dominates. Since the variances are of order k2 and i respectively, we have three phases: (i) k2/i ^ 0: the leaves dominate, and a suitably renormalised version of the order distribution converges to a normal distribution. (ii) k2/i ^ a > 0: the limit distribution is a convolution of a (continuous) uniform distribution and a Gaussian distribution. k=2 n S(Pn,u) = ^(n - k + 1 )uk. k=1 D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 133 (iii) k2/i ^ to (but k/2£ ^ 0): the long path dominates, and the renormalised order distribution converges to a uniform distribution. k vertices t leaves Figure 1: The broom. 2.4 The extended star Figure 2 shows an extended star, obtained by attaching d (> 3) paths of length k to a common vertex v. For fixed d, we obtain (by the same argument as in the previous example) a convolution of d uniform distributions in the limit as k ^ to. As soon as d also tends to infinity, however, the limit is Gaussian again (showing that the conditions of Theorem 1.2 are important, but not strictly necessary). A LLA d paths of lengh k Figure 2: The extended star. 2.5 A discontinuous limit distribution By suitably choosing the parameters of a double-star (see Figure 3), we can even obtain a discontinuous limit distribution. Such a tree consists of a path of length k and leaves attached to the two endpoints vi and v2 (i and r leaves, respectively). We set i = 3n, r = n + c for some constant c, and k = 2n. The same argument that we used for the broom shows that almost all subtrees contain vi in this case. The probability that v2 is contained as well is easily found to be 2c/(1 + 2c) in the limit. In this case, the subtree order is 2n + O(n). Otherwise, it essentially follows a discrete uniform distribution on the interval [1, 2n] (the leaves attached to v1 only playing a minor role). So if we divide the subtree orders by 2n, we obtain a limit distribution that is a mix of the uniform distribution on [0,1] and a point measure at 1, which means that its distribution function has a discontinuity at 1. We remark that another choice of parameters is interesting as well: if we set i = r = 3n and k = 2n, then almost all subtrees contain both v1 and v2 (and the probability that this is not the case is as low as O(4-n)). Thus the subtree order distribution is essentially a convolution of two binomial distributions, and the variance is O(n). This shows that 134 Ars Math. Contemp. 14 (2018) 117-128 £ leaves length k r leaves Figure 3: The double-star. the variance of the subtree order distribution can be as low (in order of magnitude) as the logarithm of the order of the underlying tree, and we conjecture that it cannot be less, i.e. that (2.1) is bounded below by K log |T| for some positive constant K. On the other hand, the order of magnitude of the variance can be as high as |T|2, as the example of the path shows. 2.6 Short branchless paths are insufficient The two conditions of Theorem 1.2 (short branchless paths, many leaves) ensure that the trees Tn are not too "path-like". However, as we exhibit now, neither of the two conditions suffices on its own to ensure a Gaussian limit distribution. The broom is a simple example showing that even a proportion of leaves tending to 1 may not be enough: if we choose k and I such that I = ak2 for some fixed constant a, then we obtain a convolution Gaussian-Uniform in the limit rather than a pure normal distribution. This example also explains why \/|Tn| is the threshold for the length of branchless paths. Finding a counterexample that satisfies the condition on paths, but does not have sufficiently many leaves, is a little bit more complicated. It can be constructed as follows (see Figure 4): fix positive constants a, P,y such that P < a < 1, a + y =1 and 2a > p + 7. Start with a central vertex v, which is connected to I +1 = [nYJ vertices w0,wi,w2,... ,w£ by paths of length [naJ. To each of these vertices except w0, we attach [n/ J leaves. Note that the order of the resulting tree Tn is |Tn | ~ na ■ n< = n, so that there are no branchless paths of length |Tn |1/2 e if e < 1 - a and n is sufficiently large. On the other hand, the number of leaves is L(Tn) ~ n/ ■ nY = o(n) (note, however, that the exponent P + 7 can be made arbitrarily close to 1 with an appropriate choice of a, p, 7). The limit distribution is not Gaussian in this case: the same argument that we used in previous examples shows that v,w1,w2,... ,we and thus also the paths connecting them are part of almost all subtrees. The remaining "random" part is the same as for a broom consisting of a path of length (approximately) na and (approximately) leaves. Since 2a > p + y by our choice, we are in the situation where the limit distribution as n ^ to is uniform. 3 Preliminary results Before we start with the actual proof of our main result, let us fix some notation and prove some auxiliary inequalities. 3.1 Definitions and notation Most of the time, we will be working with rooted trees, since they allow for a recursive approach. Thus we first define an analogue of the polynomial S(T, u) for rooted trees. D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 135 inp j leaves wo v length |_naJ Figure 4: The final counterexample. Consider a tree T with root v0, and let S* (T) be the set of all subtrees of T containing v0. The generating polynomial for subtrees containing the root is denoted by S* (T, u): S *(T, u) = ^ u|T |. t eS'(T) The main reason for considering this polynomial is the fact that it can be computed recursively from the root branches. For a vertex v of T, we let T(v) be the branch of T rooted at v (consisting of v and all its descendants). Suppose that vi, v2,..., vd are the root's children. It is not hard to see that S* (T, u) satisfies the following recursive formula: d S*(T, u) = S*(T(vo),u) = u n (1 + S*(T(vj),u)). (3.1) j=i This follows from the fact that a subtree of T that contains the root v0 induces either the empty tree or a subtree that contains vj in the branch T(vj) for each vj. Notation. For the convenience of the reader we list some further notation that is used throughout this paper: (i) L(T) and L(T) are the set and the number of leaves, respectively. (ii) I(T) and I(T) are the set and the number of interior vertices, respectively. (iii) By a branchless path or 2-path, we mean a path in which all vertices, except for the endpoints, have degree 2. We let P (T) denote the maximum length of a 2-path of T. Moreover, we use c0, c1, c2,... to denote absolute constants (that do not depend on the specific tree or any of its parameters). 3.2 Two inequalities We begin with the following simple but useful lemma, which provides two inequalities that will be used repeatedly in the following section. 136 Ars Math. Contemp. 14 (2018) 117-128 Lemma 3.1. If T is a rooted tree with |T | > 2, then S'(T) > 2l(t) and L(T) > JT^. Proof. Every subset A of L(T) gives rise to a subtree obtained as the union of all paths connecting the leaves in A to the root. If A is empty, we take the subtree consisting only of the root as the corresponding subtree. This proves the first inequality. For the proof of the second inequality, we let V2 (T) be the number of non-root vertices of degree 2 and V>3 (T) the number of non-root vertices of degree at least 3. Consider all maximal 2-paths (not containing the root as an inner vertex in case that the root has degree 2). To each such path, we can uniquely associate its endpoint that is further away from the root, which is either a leaf or a (non-root) vertex of degree at least 3. Thus there are L(T) + V>3(T) such paths. Since the total number of edges, which is |T| - 1, is at most P (T) times the number of maximal 2-paths, we obtain (L(T) + V>3(T))P(T) > |T| - 1. On the other hand, the handshake lemma gives us 2(L(T) + V2(T) + V>3(T)) = 2(|T| - 1) > L(T) + 2V2(T) + 3V>3(T) + 1, the final 1 being the trivial lower bound for the root degree. Thus L(T) > V>3(T) +1, and consequently 2L(T)P(T) > (L(T) + V>3(T) + 1)P(T) > (L(T) + V>3(T))P(T) + 1 > |T|, which is equivalent to the second inequality in the statement of the lemma. □ 4 Rooted trees 4.1 The moment generating function In order to prove the central limit theorem for the order distribution of subtrees, we study the associated moment generating function, first only for rooted trees. Note that S'(T,u) = S '(T,u) = 1 y^ |t | S'(T, 1) S'(T) S'(T) ^ U t es* is the probability generating function for the order of a randomly chosen subtree of T that contains the root. Likewise, S'(T, ef') S'(T) is the moment generating function. For our purposes, it turns out to be useful to consider an auxiliary function, denoted F(T, t). We define it recursively by F(T, t) = log(1 + el) if |T| = 1 and F (T, t) = £ F (T (vj ),t) + f (T,t), (4.1) where f (T,t)=t+log(1 + S^). (4.2) D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 137 Here and in the following, log will always denote the principal branch of the logarithm. In view of (3.1), we have 1 + S *(T,e4) = eF (T-4), (4.3) as can be seen by a simple induction. As a first step, we show that S*(T, e4) is bounded away from 0 if t is sufficiently small, so that we can actually take the logarithm in (4.2). Lemma 4.1. There exist absolute constants S > 0 and c0 > 0 with the following property: if T is a tree such that the lengths of the 2-paths of T are all bounded above by some positive integer P (which can be a function of T), then we have |1 + S*(T,e4)|> eCoL(T) (4.4) whenever |t| < P. Moreover, the function f (T, t) as defined in (4.2) is analytic in the disk defined by the inequality |t| < P. Remark 4.2. It is important to bear in mind that t is complex in this context. If we were to consider only real values of t, it would e.g. be trivial that |1 + S*(T, e4)| = 1 + S*(T, e4) > 1. Proof. We will show that the statements of the lemma hold for the following explicit constants: S = 0.001 and c0 = 0.012. So we assume throughout this proof that S is as defined above. We show first that the inequality (4.4) implies analyticity of the function f (T, t) in (4.2). Note that 1 + = (1 - 1 S •(T,et) V 1 + S*(T,et) If |1 + S*(T, e4)| > eCoL(T) > ec0, then 1 - 1+S.(T et) lies inside the disk with centre 1 and radius e-Co. Thus the reciprocal (1 - 1+S.1(T et)) 1 lies inside the disk with centre ^e-2co and radius 1 . The principal branch of the logarithm is an analytic function inside this disk, so f (T, t) is analytic. Now we prove (4.4). Let P be an arbitrary positive integer, and let T be a tree such that no 2-path of T has length greater than P. The lemma is satisfied for |T| = 1: in this case, we have S* (T, e4) = e4, and it is easily verified that |1 + S*(T, e4)| = |1 + e4| > 1 + e-<5 > eCo holds for |t| < S. If 2 < |T| < 12P, then for every subtree t of T we have |t||r| < 12S. It follows that Re(et|T|) > e-mcos(12S). Therefore, I et|T 1 1+ et|T| > 1+ ^ Re (e t£s»(t) t£s»(t) > e-125 cos(12S)(1 + S*(T)). (4.5) 138 Ars Math. Contemp. 14 (2018) 117-128 Applying Lemma 3.1 to estimate the right side of (4.5), we have |1 + S*(T,et)| = 1+ ^ et|T| > e-m cos(12£)2i(T) t eS"(T) > eco2L(T)-1 > ecoeco(L(T)-1) = ecoL(T). So the proof is complete in this case, and we assume from now on that |T| > 12P. For each vertex v of T, we define m(v,t) = |1 + S *(T (v),et)|. Let v1; v2,... be v's children. Using (3.1), we find that for |t| < P, m(v,t) > e-S/P n m(vj ,t) - 1. (4.6) j Let A be the set of vertices in "small branches", defined formally as the set of all vertices w of T for which |T(w)| < 12P. Thus for every w e A, the bound in (4.5) applies to the branch T(w), and we have m(w,t) = 1+ ^ et|T 1 > e-m cos(12J)(1 + S*(T(w))). (4.7) t eS" (T (w)) We define m0 = 2e-m cos(12^) « 1.976, so we can deduce from (4.7) that for every w e A m(w,t) > mo. (4.8) The rest of the proof is divided into two parts: in the first part, we prove that m(v, t) cannot be too small when v is outside of A. In the second part, we use recursion (4.6) to complete the proof of (4.4). Part 1: We claim that m(v, t) > 3P for all v e T \ A. Assume that the claim is not true, and let w e T \ A be a counterexample (i.e., m(w, t) < 3P) with maximum distance from the root. In addition, let w0 = w, w1,..., wr be the longest sequence of vertices (possibly, r = 0) such that none of these vertices lies in A, wj+1 is wj's only child for 0 < j < r, and wr has either more than one child or a single child that lies in A. Now consider two different cases: (i) Suppose that all of wr's children, which we denote by x1, x2,..., xd, lie in A. Since wr e A, we have |T(wr)| > 12P, so at least one of these children is the root of a branch of order at least 12P/d. Without loss of generality, |T(x1)| > 12P/d. We have m(x1,t) > m;0 (1 + s*(t (X1))) > m;0 (1 + |t (X1)|) > m;0 • ^ by (4.5); the inequality S*(T(x1)) > |T(x1)| simply follows from the fact that we can associate the path from the root, which is also a subtree, to each vertex. Moreover, we know that m(x2, t),..., m(xd, t) > m0 by (4.8). Now (4.6) gives us i S/P mo 12P d-1 ! « S/P m0 D m(wr,t) > e • • • m0 — 1 = 6e • • P — 1. D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 139 Using the numerical values of S and m0, one easily verifies that md - d for all d - 1 and 6e-5/P > 6e-5 > If. Hence, . 11P 1 9P m(wr,t) >--1 > -. ( r, ) - 2 > 2 (ii) Otherwise, wr has at least 2 children xf, x2,..., xd, at least one of which (without loss of generality, xf) does not lie in A. By our choice of w as a counterexample to our claim with maximum distance from the root, we have m(xf, t) — 3P. Moreover, m(xj, t) — min(3P, m0) = m0 for all other children (the lower bound 3P applies if Xj ^ A, the lower bound m0 otherwise). It follows that m(wr, t) - e-5/P • 3P • md-1 - 1 - 3m0e-5/P • P - 1 - y P - 1. Again, we obtain n 9P m(wr, t) — y . Now note that w0, wf,..., wr is a branchless path, so that r < P by definition. We apply (4.6) repeatedly to wr-f, wr-2,..., w0 = w to obtain r-f /n„-5 ) r , ,s " _5fc/P . -5 r—f /0 —5 \ i(w0,t) - (e-5/P)rm(wr,t) e-5k/P - e-5m(wr ,t) - P - ( —--1] P. The last expression is greater than 3P by our choice of S, and we reach a contradiction. So the claim is proven. Part 2: Now we complete the proof of (4.4). Taking the logarithm of inequality (4.6), we obtain logm(v,t)+log(l+ ) - Vlogm(vj,t) - -. (4.9) \ m(v, t) y P Note that the set A can be written as a disjoint union of the vertex sets of certain trees T(yf),T(y2),... rooted at yf,y2,.... Iterating (4.9) from the root v0 to the vertices yf, y2,... and applying (4.7) and Lemma 3.1 yields 1 \ S|T \ A| iog m(vo,t) - V log m(yj,t) - V log(1 + mVt)) ■-(v,tW P j v£T \A v V ' U - E log (^S*(T(yj))) - V log (1+ 1 ^ ST \ A| 2 V \»3>>) ^ t> i m(v,tw P - elog (^2L(T<«j») - E *(i + mj^j) 1 S| T \ A| ■-(v,tW P £T \A v V ' // Furthermore, since m0 < 2 and the trees T(yf), T(y2),... contain all leaves of T, we have E log (m2L(T^))) - E log (mL(T))) jj = log(mo) E L(T(yj)) =log(mo)L(T). 140 Ars Math. Contemp. 14 (2018) 117-128 Now recall that m(v, t) > 3P for all v ^ A, which gives us £ A - 0+mb)+^ * (1+3P)+P > \ A Putting these bounds together, we obtain * ( 3+') ^ logm(vo,t) > log(mo)L(T) - ( IT. 3 P From Lemma 3.1, we know that L(T) > ( ) > 2P Hence, we finally have log | 1 + S*(T,t)| = logm(vo,t) > ^log(mo) - | - 2^ L(T). The proof of (4.4) is completed by applying the exponential function on both sides of the latter inequality and by noting that the constant log(m0 ) -§ - 2J is greater than c0 = 0.012 (defined at the beginning of the proof). □ We have shown that f (T,t) and consequently F(T, t) can be regarded as complex analytic functions in a disk around zero, so F (T, t) admits a Taylor expansion near zero, which we are now going to investigate further. By (4.3), we have and m(t ) = |F (T,t) d2 dt2 = SU(T ) t=o 1 + S^ (T ) SMM(T ) , ,/rm , 2 I ff2(T) = -2F(T,t) + M(T) - m2(T), t=o 1 + S^(T ) where we use S^(T) as a shorthand for S£(T, 1) = JuS*(T, u)| 1 in the same way as S*(T), and SU„(T) is defined analogously for the second derivative. The intuition behind the notation ^(T) and a2 (T) is that these two quantities are essentially the average order of subtrees in S• (T) and the variance respectively, if we include an additional dummy subtree of order 0 in the count (compare also the considerations at the beginning of Section 2). This is asymptotically irrelevant and simplifies the following calculations. For the rest of this section, we let 7 be a fixed positive real number, let P be a positive integer that represents an upper bound on the length of all 2-paths in T, and set A 5 2P where 5 = 0.001 is as defined in the proof of Lemma 4.1. It also follows from Lemma 4.1 that for every vertex v in T, the function F(T(v), t) is analytic in the disk centred at zero with radius 2A. So we can define the quantity r(T ) = sup o<|i| Moreover, the recursion (4.1) also yields t2 F(T, t) - F(T, 0) - v(T)t - a2(T) - = = S (f (T (vj ),t) - F (T (vj), 0) - p(T (vj ))t - a2(T (vj)) j t2 + f (T, t) - f (T, 0) - f '(T, t)t - f "(T, t) -, so by the triangle inequality f (T, t) - f (T, 0) - f '(T, 0)t - f ''(T, 0) f r(T) P1+7. In addition, we know that the lengths of all branchless paths in T(v) are bounded above by P since T(v) is a branch of T, so by Lemma 3.1 we have Therefore, L(T(v)) > ITM > 2PY. sup |f'''(T(v),t)| « A-3e-^« P3(1+Y)e-1, |t| e-5 cos(5)S*(T(v)), (4.15) <4 z — P D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 143 which in turn is strictly greater than 1 by the choice we have made for S and by the fact that S• (T(v)) > 2 since v G I(T). Now for any t such that |t| < A, let C(t, R) be the circle centred at t with radius R = 2\t(v)\ . Note that C(t, R) lies in the region of analyticity of the function f (T(v), z), since if z G C(t, R), we have SSS |z|<|t| + |z -1|< 2PS+7 + . Thus, by Cauchy's integral formula, we have 3! f '"(T (v),i) = ^/ Jc f (T (v),z) - z dz, IC(t,R) (z - t)4 from which we deduce the bound |f'"(T(v), t)| < 48 S-3|T(v)|3 sup |f (T(v), z) - z|. zeC(t,R) The right side can be estimated using (4.15): sup |f (T(v),z) - z| = sup zeC(t,R) zeC(t,R) log(1 + s^mbz) ) < i S- (T (v)) uniformly for |t| < A. Therefore, we obtain sup |f'''(T(v),t)|<< S-jTnr |t| A|T | for some fixed constant A > 0. We have |T(v)|2 a2(T) >>|T| n(v) S- uel(T )nB S -(T (v))- (4.17) The implied constant only depends on A. Proof. Iterating (4.11) (and noting that a2(T) = 1 > 0 if |T| = 1), we obtain ^2(t) > £ n(v)+ E n(v) S- m(t (v))2 ue£(T ) vei(T ) S-(T (v)) > E n(v)+ n(v) m(T (v))2 ve£(T ) vel(T )nB S -(T (v))- 144 Ars Math. Contemp. 14 (2018) 117-128 It was shown by Jamison in [5] that the average cardinality of a subtree containing the root of a rooted tree of order n is at least (n + 1)/2, so SU (T (v)) IT (v)l + 1 S'(T(v)) - 2 ' which implies that (T( )) = SU(T(v)) 1 IT(v)| + 1 1 = |T(v)| M( ( )) S'(T(v)) 1 + S'(T(v))-1 - 2 1 + |T(v)|-1 2 ' So it remains to show that ^ n(v) > |T|. (4.18) veL(T) To this end, we define a set of "exceptional branches" in such a way that n( v) is bounded below by an explicit constant unless v lies in one of these branches. Choose two constants P € (0,1) and K > (A/2)-1/^, and let z1; z2,..., zM be the vertices that satisfy L(T(zj)) < |T(zj)|1-^ and |T(zj)| - K and are closest to the root with this property (in the sense that no vertex on the path from the root to zj satisfies both inequalities). We set Ej = T(zj) and let E be the union of all Ej. Now take any leaf v that does not lie in E, and let v' be its ancestor closest to the root that satisfies |T(v') | < K (possibly, v' = v). Now we split the product that defines n(v) as follows: n(v) = n 1 1 + S '(T (w))-1 weV(v) v v " n i+s'(t(w))-i n 1 + S'(T (w))-1 11 1 + S'(T (w))-1' weP(v)\P(v') v v we?(v') v v " There are at most K vertices in P(v) \ P(v') since the set P(v) \ P(v') lies entirely in T(v'). In addition, for every w we have the trivial bound 1 + S' (T(w))-1 < 2. Therefore, TT _1_> 2-k J-J- 1 + S'(T (w))-1 - . we?(v)\?(v') v v yy Furthermore, for every vertex w on the path from the root to v', we must have |T(w) | — K by the choice of v', and L(T(w)) > |T(w)|1-^ since v does not lie in E. Recall from Lemma 3.1 that S'(T(w)) - 2L(T(w)). Hence we have n(v) - 2-k n (1 + 2-L(T(w)))-1 we? (v') - 2-k n (1+2-it(w)|-)-1 we? (v') - 2-k n (1+2-ji-")-1 j>K 1 D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 145 Note that the infinite product converges. So we can deduce that n(v) is bounded below by a constant that only depends on fi and K unless v e E. Consequently, E n(v) >\L(T) \ E\. (4.19) veL(T) We will see that E cannot contain more than half of the leaves. We may assume that E is non-empty, for otherwise this statement is trivial. So let us assume that EL(Ej) > 2\T\. j=i By the definition of the branches E1, E2,..., EM, this gives us M ]T|Ej> -|T|. A, \Ej> j=i On the other hand, since E1, E2,..., EM are pairwise disjoint, we also have M l!\ ElEj l<|T |. j=i Since we are assuming that E is non-empty, we have M = 0. Hence, by Jensen's inequality, A M /vM |E-|\1 P f|T| A\T\ < E \Ej^ A/r ' \Ej \ ' ^ AJ ' \T \ It follows that - |T |< e |Ej < m( < m (M M > (A/2)1/^\T\. On the other hand, each Ej contains at least K vertices, so we have M \T\ > E \Ej \ > MK. j=1 Combining the last two inequalities, we obtain K < (A/2)-1/^, (4.20) which contradicts the choice of K. This means that \E\ < L(T)/2, so (4.19) finally yields E n(v) > L(T) >\T\, ve£(T) which completes the proof. Note that the implied constant does indeed only depend on A (and our choice of fi and K, which was arbitrary). □ To make use of the previous lemma, we also need to bound n(v) from below for v e I(T) n B, which is achieved by the following lemma: 146 Ars Math. Contemp. 14 (2018) 117-128 Lemma 4.5. For every vertex v G T and every vertex v' G P(v), we have |T (v)| n(v) > n(v') 2|T(v')| Proof. The statement is void if v is the root v0, so we assume from now on that v is not the root. Let v' = w0, w1, w2,..., wk = v be the vertices of the path connecting v' and v (which form part of the path connecting v0 and v). By definition, we have n(v) = t-1 S*(T (wj)) n(v') /=01 + s*(T(wj)). Clearly, S*(T(w/)) > 1 + S*(T(wj+1)) for j = 0,1,..., k - 1; iterating further, we obtain S*(T(w/)) > k - j + S* (T(v)). So we have, for j = 0, 1, . . . , k - 1, S*(T(w/)) ^ k - j + S*(T(v)) 1 + S*(T(w/)) " 1 + k - j + S*(T(v)), and it follows that n(v^ T-r k - j + S*(T(v)) = 1 + S* (T(v)) s*(T(v)) n(v') > M; 1 + k - j + S*(T(v)) 1 + k + S*(T(v)) " k + S*(T(v)). j=0 Now we consider two cases: (i) First, if S*(T(v)) > k then n(v^ S* (T(v)) > ^ |T(v) n(v') - k + S*(T(v)^ 2- 2|T(v')| (ii) Otherwise, if S*(T(v)) < k, then n(v) > S*(T(v)^ |T(v)| n(v'^ 2k - 2|T(v')|' The last inequality holds because S*(T(v)) > |T(v)| and |T(v')| > k (the latter since T(v') contains the k +1 vertices w0, w1,..., wk). Hence, the lemma follows. □ The bound on n(v) is now used to bound r(T) in terms of a2(T). Lemma 4.6. Suppose that L(T) > A|T | for a fixed constant A > 0. We have r(T) < P1+7a2(T). The constant implied in this estimate depends on A and 7, but nothing else, in particular not on P. D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 147 Proof. Recall that B consists of all vertices w for which T(w) < P1+7. We write B as the disjoint union of branches T(y1), T(y2),. .. If v lies on the path connecting the root v0 and one of the y, then by definition we have |T(v)| >P 1+Y. By Lemma 3.1, this implies L(T(v)) > ^ >_^_= i|T(v)|y/(i+y) L(T (v)) - 2P " 2|T(v)|i/(i+Y) 2|T (v)| . Using this inequality, we can argue as in the proof of (4.19) that n(yj) is bounded below by an absolute constant for every j. Applying Lemma 4.5, we deduce that for v G T(yj), n(v) »#i-|T | | T (y, )| " P ■ Therefore, n(v) S'(T(v)) ^ n(v) c.( S*(T(v)) ^ ^ 'S'(T(v)) vel(T)nB v j vel(T(yj)) v v > p-, E iTMP S'(T (v))" vel(T)ns v v The desired statement now follows from Lemma 4.3 and Lemma 4.4. □ As a consequence of Lemma 4.6, we now obtain the required information on the Taylor expansion of F(T, t). Proposition 4.7. Let S = 0.001 be as previously defined, and let A, 7 > 0 be fixed constants. If L(T) > A|T|, then we have t2 F (T, t) = F (T, 0) + M(T )t + ct2(T ) - + O (P (T )1+Y a2(T ) |t|3) (4.21) for |t| < 2P(y)i+Y, where the constant implied in the O-term only depends on A and 7. Proof. This statement follows directly from Lemma 4.6 and (4.10). □ 4.2 Central limit theorem We are now ready to prove the central limit theorem for the order distribution of subtrees. Theorem 4.8. Let T1, T2,... be a sequence of rooted trees such that |Tn| ^ to as n ^ to and the following two conditions are satisfied for all sufficiently large n: (i) P(Tn) < |Tn|1 — for some constant e > 0, (ii) L(Tn) > A|Tn| for some constant A > 0. 2 148 Ars Math. Contemp. 14 (2018) 117-128 Then the distribution of the random variable X*, defined as the order of a randomly chosen subtree of Tn containing the root, is asymptotically Gaussian. More precisely, if $*(x) denotes the distribution function of the renormalised random variable Y' = X - MTO *(Tn) ' then we have the following estimate for the speed of convergence: sup ^(x) - 1 V2n. 2/2dt = O (|T„|- (4.22) for every positive constant a < e/3. The constant implied in the O-term only depends on a and A. Proof. For ease of notation, we drop the dependence on n. Recall that the moment generating function of X* = Xn* is E e \ = S*(T, e*) S*(T) . Instead of working directly with X*, we use the modified random variable X* = X* that also includes an empty dummy subtree. The moment generating function of this random variable is given by E fe^) = 1 + S*(T,et), V J 1 + S*(T) , and if Y* = Y„* = (X* - ^(T„))/a(T„) is the associated renormalised random variable, it is easy to see that the distribution functions of Y* and of Y* differ only by very little: (x) - $*(x)| < 1 1 + S '(T ) (4.23) for all x G R, so it is sufficient to prove the estimate for instead of $'. The condition L(T) > A|T| implies -2(T) » |T| by Lemma 4.4, in particular a(T) ^ œ as |T | ^ œ. The moment generating function of the renormalised random variable Y * is E (etY*) = e^(T>E (eiX*/ff(T>) m(t )t = exp r(T ) + FT, r(T ) - F(T, 0) . The expansion in Proposition 4.7 gives us t FT and thus 'a(T ) E F(T, 0) + MTh , t^ fPiIl1+7,,,3 -(T)t +2+ ^ ^Tj |t|3 tY " = exp — + O P (T ) 1+Y 2 V vlT (4.24) x t e x 2 3 D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 149 if Itl < — 2P(T)1+'y. Note that we can choose 7 freely here (the choice affects the O-constant, though). The condition P(T) — |T|1 — allows us to choose 7 in such a way that P (T ) 1+Y Therefore, E tY ' -)• 0. et2/2 for any fixed t as n ^ to, which would already prove a central limit theorem. For the precise error estimate, we use the following Berry-Esseen type inequality [10, Theorem 5.1]: sup xe R $*(x) - 1 v^n. 2/2dt — C1 ,-M -M VT (t) - e-t2/2 dt+MM for certain absolute constants c1, c2, where (t) = i eity(y) = E (eitY*) . J — TO In view of (4.24), we have Vt(t) - e-t2/2 < |t|3e 3e-t2/2 P (T )1+Y if |t|3 = O /P (T )1+^ .Therefore, sup xR (x) - v^n. 2/2dt / p (t )1+7 1 V VT + M for any M satisfying M3 = O (-/jTÏ/P (T)1+^ . We choose M P (T )1+Y 1/3 and y in such a way that 1 M (P(T)m 1/L |(1+y)(1/2-^ 1/3 = y(1 Iv^j ^ vm J 11 -2e)/6-e/3 — |t — . Note finally that the difference between $*(x) and $*(x) is uniformly bounded above by S• (T) —1 in view of (4.23). Since S• (T) > |T| > |T|a, this completes the proof. □ x t e x 1 t e 150 Ars Math. Contemp. 14 (2018) 117-128 4.3 Local limit theorem Now that we have established a central limit theorem, it is natural to ask whether a local limit theorem for single coefficients of S* (T, u) also holds. To be precise, given a sequence of rooted trees T1, T2,... satisfying both properties of Theorem 4.8, can we give an estimate for the number of subtrees of order k, for values of k around the mean ^(T„)? In this section, we show that it is indeed possible to obtain such a result. Before we come to the proof, an estimate for |S* (T, u) | when u lies on the unit circle is required. This is precisely what we state in the next lemma. Lemma 4.9. Let A, 7 > 0 be fixed constants, and suppose that L(T) > A|TThere exist constants ¿1,03,04 depending on A, 7 such that, with A ¿1 2P (T )1+y' we have |1 + S*(T,ejt)| < j e-c3t2a(T )2 if t e [-A1, Ai], 1 + S*(T) <| e-c4t2|T| forall t G [-n,n]. Proof. The bound corresponding to |t| < A1 follows easily from Proposition 4.7 for sufficiently small S1 < S (= 0.001). Thus it suffices to prove the second bound. Recall that we have d S*(T,e") = H(1 + S *(T (vj ),e")) j=i if v1, v2,..., vd are the root's children, and consequently d |1 + S*(T,e")| < 1 + n|1 + S*(T(vj),e")|. (4.25) j=i This motivates the definition of a polynomial R(T, x) (for positive real x) that is similar to S * (T, u): it is given by R(T, x) = x for |T | = 1 and the recursion d R(T(v),x) = 1 + ^ R(T(vj),x). (4.26) j=i In view of (4.25), we have |1 + S*(T,eit)1 < R(T, |1 + e"|) (4.27) and 1 + S*(T, 1) = 1 + S*(T) = R(T, 2). Note that R(T, x) is a polynomial of degree L(T) with positive coefficients. Therefore, it is a strictly increasing function of x, and it admits the trivial lower bound R(T, x) > xL(T) (4.28) for all positive x. We also define the function G(T, x) = log(R(T, x)), which satisfies the recurrence G(T, x) = £ G(T(vj), x) - log (1 - rT^)) , (4.29) D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 151 where G(T, x) = log x (and thus G'(T, x) = x-1) if T only has one vertex. In order to estimate S* (T, eJi) by means of (4.27), we establish a bound for the difference G(T, 2) -G(T, x) for x in the interval [%/2,2]. By the mean value theorem, there exists some y € [x, 2] such that G(T, 2) - G(T, x) = (2 - x)G'(T, y). It is not hard to see from (4.26) that the derivative G'(T, y) satisfies G'(T, y) = RRT(,^,)y- 1 E G'(T(vj), y). (4.30) We essentially use the same argument as in the proof of Lemma 4.4 to bound G' (T, y) from below. Iterating (4.30) starting from the root of T down to the leaves, we obtain, with J 1 if v is the root of T, Uv, y) = yL(T(v)) > 2l(t(v))/2, the same argument that gave us (4.18) now yields G'(T, y) > 2 E £(v,y) >> E ^(v,y) >|T|. ve£(T) ve£(T) This implies that there exists a positive constant c5 such that G(T, x) - G(T, 2) < C5(x - 2)|T|. Equivalently, if %/2 < x < 2, then R(T,x) < eC5(x-2)|T| (431) R(T, 2) < e . (4.31) To complete the proof, recall that (by (4.27)) |1 + S*(T, eJi)| is bounded above by R(T, |1 + e"|) while R(T, 2) = 1 + S*(T). For |t| < n/2, we have |1 + e"| > V2 and 2 |1 + e"|- 2 = 2(cos 2 - 1) <--.t2, 2 n2 thus |1 + S*(T,e")| R(T, |1 + e"|) ^/n^iT < -c^m 1 + S*(T) < R(T, 2) < < if we choose c4 < 2c5/n2. For the case that |t| > n/2, we simply note that R(T, x) is an increasing function of x, so that |1 + S*(T, eit)| R(T, |1 + e"|) R(T, ^2) < e-C5(2-V2)|T| < e-c4i^|T| 1 + S*(T) < R(T, 2) < R(T, 2) < < if we choose c4 < (2 - -\/2)c5/n2. This completes the proof. □ 152 Ars Math. Contemp. 14 (2018) 117-128 Now we have all required ingredients for a local limit theorem. In the following, we let sk (T) denote the number of subtrees of order k in T that contain the root, so that | T| S*(T, u) = Y sk(T)uk. k=1 Theorem 4.10. Suppose that the sequence T1, T2,... of rooted trees satisfies the conditions of Theorem 4.8. If k = ^(T„) + xa(T„), then we have sk (Tn) e-x2/2 S*(Tn) v/2na(Tn)' uniformly for x in any fixed compact interval as n ^ . Proof. Once again, we drop the index n for convenience. By Cauchy's integral formula, the number sk (T) can be expressed as Sk(T^ io,1) (1 + S*MzSl, where C(0,1) is the unit circle. If we set z = eJi, then we obtain 1 sk(T)=2n/n (1 + S*(T,eJi^e iM --iki dt. Choose 7 > 0 and k > 0 in such a way that 7/2 + 3k < e, and set M = |T|K/a(T). We split the integral into two parts: the central part 1 rM / ) — (1 + S*(T, ei()) e-iktdt, 2n J-m v y and the rest. Recall that we are assuming P(T) < |T|1/2-e and that we have already establisheda(T)2 > |T|. Since A1 = ^(T) ,T| 1/2-«-(1/2-e)(1+7^ |T|e-7/2-K M 2P (T)1+Y |T |k ^ | | ^ | | is greater than 1 for sufficiently large |T|, we have M < A1 = 2P(t^+t, so we can apply Proposition 4.7, which gives us, for |t| < M, 1 + S*(T,e") = exp(F (T,it)) = exp (F(T, 0) + V(T)t — ct2(t)^ + O(|T|3k+(1/2-,)(1+t)-1/2)) = exp (F(T, 0) + V(T)t — ct2(t))(1 + O(|T|-(^-y/2-3k))) . We plug in k = ^(T) + xa(T) and obtain 1 r m / ) 1 r m 1 (1 + S*(T,e"))e-iktdt =— eF(T.°)-i-(T)t--2(T)t2/2 dt 2n./-m v y 2n J-M + O |-(£-y/2-3k) jM eF(T,°)--2(T)t2/2 dtj . D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 153 Since we have ,-M /Ml p eF(T,0)-ixa(T)t-CT2(T)t2/2 dt = eF(T,0)-x2/2 + pf eF(T,0)-CT2(T)t2/2 dt\ -M CT(T) WM ' = ^IneF(T,0)-x2/2 + p ^eF(T,0)-a2(T)M2/2^ = eF(T,0)-x2/2 + p ^F(T,0)-|T|2k/2 j , and eF(T'0) = 1 + S* (T), we end up with 1 r / \ e-x /2 / \ - J-m (1 + S*(T, ei4))e-iktdt = S*(T)(1 + p(|T|-(-/2-3K))) . For the remaining integrals, where |t| > M, we use the estimates from Lemma 4.9. For |t| < A1 = 2P(T)1+Y, they give us |1 + S*(T, ei4 1 + S*(T) and for |t| > A1, we get < e-C3M2ct(T)2 = e-C3|T|2 |1 + S*(T,eit)| e-C4A2|T| < e-52C4|T|1e(1e2«)(1+Y)/, e-52C4|T|2«-Y/4 1 + S*(T) M will only contribute to the error term. In summary, we have S'(T) = e ^ /2 (1 +n (| T|-(e-7/2-3«))\ S*(T) V2na(T„)^ + PUJ | which completes the proof. □ Remark 4.11. Theorem 4.10 provides a positive answer to Question 1.1 in an asymptotic sense for large rooted trees (and as we will see in the following section, also unrooted trees) without vertices of degree 2, since both technical conditions are trivially satisfied in this case. 5 Unrooted trees Now that we have established both a central and a local limit theorem for the number of subtrees containing the root of a rooted tree, we would like to carry the results over to unrooted trees as well. This is achieved by means of the following lemma, which guarantees the existence of a vertex that is contained in most subtrees: Lemma 5.1. For every tree T, there exists a vertex v of T such that the proportion of subtrees of T that do not contain v is at most |T|2-L(T)/2. Proof. Let v be a vertex that minimises the sum of the distances to all leaves, i.e. the expression J2weaT) d(v, w) attains its minimum (this is called a "leaf centroid" in [17], 154 Ars Math. Contemp. 14 (2018) 117-128 in analogy to the centroid). Let T, T2,..., Tk be the branches of T, rooted at v, and vi, v2,..., vk the corresponding neighbours of v. The important observation about v is that none of the branches can contain more than half of the leaves: if Tj contains more than L(T)/2 leaves, then we have E we£(T ) d(v, w) > ^^ d(v we£(T ) since d(vj, w) = d(v, w) — 1 if w is in Tj, and d(vj, w) = d(v, w) + 1 otherwise. This would contradict the choice of v. Let t be a subtree of T that does not contain v. It must then be completely contained in some branch Tj. It has a unique vertex closest to v, which we denote by w. We can associate 2|L(T)n(T\Tj)| > 2L(T)/2 subtrees to t that contain v, obtained by adding the path from w to v as well as all non-leaves not contained in Tj and any subset of the |L(T) n (T \ Tj) | leaves that do not lie in Tj. Finally, we root the resulting subtrees at w. Let the total number of subtrees of T be denoted by S(T) and the number of those subtrees not containing v by S°(T). The construction above yields at least 2L(T)/2 rooted subtrees of T associated with every subtree t that does not contain v. The original tree t can be recovered uniquely from such a tree a: it consists of the root w of a and all vertices for which the unique path from v passes through w. Thus our construction is an injection to the set of rooted subtrees of T (whose cardinality is clearly at most |T|S(T)), and we obtain the inequality S°(T) • 2l(t)/2 < |T| • S(T), from which the statement of the lemma follows. □ Our main theorem now follows immediately both in the central and local version: Theorem 5.2. Let T1, T2,... be a sequence of trees such that |Tn| ^ to as n ^ to and the following two conditions are satisfied: (i) P(Tn) < |Tn|1 for some constant e > 0, (ii) L(Tn) > A|T„| for some constant A > 0. Then the distribution of the random variable Xn, defined as the order of a randomly chosen subtree of Tn, is asymptotically Gaussian. More precisely, if $n(x) denotes the distribution function of the renormalised random variable X„ - E(X„) v^xm ' then we have the following estimate for the speed ofconvergence: sup œe R $„(x) - van J— 2/2dt O (|Tn|- (5.1) for any positive constant a < e/3. The constant implied in the O-term only depends on a and A. Moreover, if k = E(Xn) + x^JV(Xn) G N, then we have the local limit theorem P(X„ = k) e-x2/2 V/2nV(X„) ' uniformly for x in any fixed compact interval. ), j 1 a e D. Ralaivaosaona and S. Wagner: On the distribution of subtree orders of a tree 155 Proof. As in the proofs of Theorem 4.8 and Theorem 4.10, we suppress the dependence on n for ease of notation. Choose v as in Lemma 5.1, and let X(v) be the random variable defined as the order of a randomly selected subtree of T containing v. By Lemma 5.1, the total variation distance between the two random variables X = Xn and X(v), which is defined as sup |P(X(v) g A) - P(X g A)|, A is O(|T|/2l(t)/2). In view of our assumption on the number of leaves, this goes to 0 even at an exponential rate. Letting ^(T) and a2(T) be defined as before for the tree T rooted at v, it is also easy to see by the same argument that E(X) = ^(T) + O(1) and V(X) = ct2(T) + O(1) (in fact, both error terms can be made exponentially small). The two statements now follow directly from Theorem 4.8 and Theorem 4.10. □ 6 Random trees The technical conditions of Theorems 4.8, 4.10 and 5.2 are not satisfied for all possible sequences of trees, but they do hold for "generic" (randomly chosen) trees. In fact, it was shown in [11] that the length of the longest branchless path of a random labelled tree of order n is concentrated around log n for large n (with a limit distribution of double exponential type), and the number of leaves of a random labelled tree of order n is concentrated around n/e (with a Gaussian limit distribution, see e.g. [2, Section 3.2.1]). Analogous statements (with different constants) hold for other families of random trees (e.g. random plane trees, random binary trees). If Tn denotes a random labelled tree of order n for n = 1,2,..., then a simple application of the Borel-Cantelli Lemma shows that the conditions of Theorem 5.2 with arbitrary e < 2 and A < 1 are satisfied for all but finitely many T/ almost surely (for both conditions, it is not difficult to obtain bounds for the probability that they are not satisfied that go to 0 faster than any power of n). Thus we obtain the following theorem: Theorem 6.1. Let T1, T2,... be a sequence of uniformly random labelled trees, where the order of Tn is n, let Xn denote the order of a randomly chosen subtree of Tn, and let $n be the distribution function of the renormalised random variable Xn - E(Xn) We have sup 1 iX 2 $n(x)--= e-t2/2dt v2W-to 0 as n ^ to almost surely. Informally, this means that the distribution of subtree orders is close to a Gaussian distribution for almost all trees. We remark that the average subtree order of a random labelled tree Tn of order n was shown to follow a Gaussian limit distribution itself (see [15] for details). X References [1] Y. Alavi, P. J. Malde, A. J. Schwenk and P. Erd6s, The vertex independence sequence of a graph is not constrained, Congr. Numerantium 58 (1987), 15-23, http://www.mta.renyi.hu/ ~p_erdos/198 7-33.pdf. 156 Ars Math. Contemp. 14 (2018) 117-128 [2] M. Drmota, Random Trees, Springer, Vienna, 2009, doi:10.1007/978-3-211-75357-6. [3] C. D. Godsil, Matching behaviour is asymptotically normal, Combinatorica 1 (1981), 369-376, doi:10.1007/bf02579458. [4] J. Haslegrave, Extremal results on average subtree density of series-reduced trees, J. Combin. Theory Ser. B 107 (2014), 26-41, doi:10.1016/j.jctb.2014.02.003. [5] R. E. Jamison, On the average number of nodes in a subtree of a tree, J. Combin. Theory Ser. B 35 (1983), 207-223, doi:10.1016/0095-8956(83)90049-7. [6] R. E. Jamison, Monotonicity of the mean order of subtrees, J. Combin. Theory Ser. B 37 (1984), 70-78, doi:10.1016/0095-8956(84)90046-7. [7] R. E. Jamison, Mean size of subtrees of a tree, 2011, REGS in Combinatorics (University of Illinois), http://www.math.uiuc.edu/~west/regs/meantree.html. [8] J. Kahn, A normal law for matchings, Combinatorica 20 (2000), 339-391, doi:10.1007/ pl00009835. [9] J. L. Martin, M. Morin and J. D. Wagner, On distinguishing trees by their chromatic symmetric functions, J. Combin. Theory Ser. A 115 (2008), 237-253, doi:10.1016/j.jcta.2007.05.008. [10] V. V. Petrov, Limit Theorems of Probability Theory, volume 4 of Oxford Studies in Probability, The Clarendon Press, New York, 1995. [11] H. Prodinger and S. Wagner, Bootstrapping and double-exponential limit laws, Discrete Math. Theor. Comput. Sci. 17 (2015), 123-144, https://www.dmtcs.org/dmtcs-ojs/ index.php/dmtcs/article/view/27 81.1.html. [12] L. A. Szekely and H. Wang, On subtrees of trees, Adv. in Appl. Math. 34 (2005), 138-155, doi:10.1016/j.aam.2004.07.002. [13] L. A. Szekely and H. Wang, Binary trees with the largest number of subtrees, Discrete Appl. Math. 155 (2007), 374-385, doi:10.1016/j.dam.2006.05.008. [14] A. Vince and H. Wang, The average order of a subtree of a tree, J. Combin. Theory Ser. B 100 (2010), 161-170, doi:10.1016/j.jctb.2009.05.006. [15] S. Wagner, Additive tree functionals with small toll functions and subtrees of random trees, in: N. Broutin and L. Devroye (eds.), 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), volume AQ of DMTCS Proc., pp. 67-80, 2012, https://hal.inria.fr/hal-01197234/. [16] S. Wagner and H. Wang, On the local and global means of subtree orders, J. Graph Theory 81 (2016), 154-166, doi:10.1002/jgt.21869. [17] H. Wang, Centroid, leaf-centroid, and internal-centroid, Graphs Combin. 31 (2015), 783-793, doi:10.1007/s00373-013-1401-1. [18] W. Yan and Y.-N. Yeh, Enumeration of subtrees of trees, Theor. Comput. Sci. 369 (2006), 256268, doi:10.1016/j.tcs.2006.09.002.