Also available at http://amc.imfm.si ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.) ARS MATHEMATICA CONTEMPORANEA 2 (2009) 77–92 Vulnerability bounds on the number of spanning tree leaves Gábor Salamon ∗ Department of Computer Science and Information Theory Budapest University of Technology and Economics 1117 Budapest, Magyar tudósok körútja 2., Hungary Received 28 September 2008, accepted 3 February 2008, published online 6 April 2009 Abstract Hamiltonicity and vulnerability of graphs are in a strong connection. A basic necessary condition states that a graph containing a 2-leaf spanning tree (that is, a Hamiltonian path) cannot be split into more than k + 1 components by deleting k of its vertices. In this paper we consider a more general approach and investigate the connection between the number of spanning tree leaves and two vulnerability parameters, namely scattering number sc(G) [10] and cut-asymmetry ca(G) [16]. We prove that any spanning tree of a graph G has at least sc(G)+1 leaves. We also show that if X ⊂ V is a maximum cardinality independent set of G = (V,E), such that the elements of X are all leaves of a particular spanning tree then |X| = ca(G) + 1 = |V | − cvc(G), where cvc(G) is the size of a minimum connected vertex cover of G. As a consequence we obtain a new proof for the following results: any spanning tree with independent leaves provides a 2-approximation for both the MAXIMUM INTERNAL SPANNING TREE [16] and the MINIMUM CONNECTED VERTEX COVER [17] problems. We also consider the opposite point of view by fixing the number of leaves to q and looking for a q-leaf subtree of G that spans a maximum number of vertices. Bermond [2] proved that a 2-connected graph on n vertices always contains a path (a 2-leaf subtree) of length min {n, δ2}, where δ2 is the minimum degree-sum of a 2-element independent set. We generalize this result to obtain a sufficient condition for the existence of a large q-leaf subtree. Keywords: Spanning tree leaves, vulnerability, Hamiltonian path. Math. Subj. Class.: 05C38, 05C85 ∗Research is supported by Grant No. 67651 of the Hungarian National Science Fund (OTKA). E-mail address: gsala@cs.bme.hu (Gábor Salamon) Copyright c© 2009 DMFA 78 Ars Math. Contemp. 2 (2009) 77–92 1 Introduction Hamiltonicity and vulnerability of graphs are closely related [5]. The most basic necessary condition of hamiltonicity claims that a graph containing a Hamiltonian path cannot be split into more than k + 1 components by deleting k of its vertices. This gives the inten- tion for examining the existence of Hamiltonian cycles and paths as a function of different vulnerability parameters, such as e.g. toughness or scattering number. For a survey on this topic the reader is referred to [1]. A closely related approach is to generalize hamiltonicity to explore bounds on the length of the longest cycle or path of a graph. As an example we mention here a generalization of Ore’s theorem [13] given by Bermond [2]. The length of the longest cycle in a 2-connected graph G on n vertices is at least min {n, δ2(G)}, where δ2(G) is the minimum degree of a 2-element independent set of G. As an imme- diate consequence, a 2-connected graph G always has a path (a 2-leaf subtree) of length min {n, δ2(G)}. In Section 6 we generalize the result of Bermond proving that if G is an arbitrary graph on n vertices then there exists a q-leaf subtree of G spanning at least min {n, ρq,2 + q − 1} vertices, where ρq,2 is the minimum degree-sum of the two highest degree vertices of a q-element independent set. This is also a generalization of a result of Broersma and Tuinstra [4] that gives a sufficient degree condition on the existence of a q-leaf spanning tree. The HAMILTONIAN PATH problem is among the first ones turned to be NP-hard [7]. However, its practical importance needs efficient approximation algorithms and heuristics for finding different generalizations of Hamiltonian paths. Several problems of this kind were investigated. Karger et al. [11] dealt with the LONGEST PATH problem and proved that there is no constant factor approximation algo- rithm unless P=NP. Furer and Raghavachari [6] considered the MINIMUM DEGREE SPAN- NING TREE problem and gave an algorithm that approximates the optimal solution within an additive constant of 1. Gargano et al. [9] looked at the MINIMUM BRANCHING SPAN- NING TREE problem, a degree-based generalization of Hamiltonian paths. They proved that it is NP-complete to decide whether a spanning tree with at most k branchings (ver- tices of degree at least 3) exists (for any fix k). However, the approximability properties of this approach are still undiscovered. In this paper we consider the following optimization problems: Problem 1: MINIMUM LEAF SPANNING TREE Input: A connected graph G. Goal: Find a spanning tree of G with a minimum number of leaves (1-degree vertices). Problem 2: MAXIMUM INTERNAL SPANNING TREE Input: A connected graph G. Goal: Find a spanning tree ofG with a maximum number of internal vertices (non-leaves). Clearly, both of these problems are generalizations of the HAMILTONIAN PATH prob- lem, as a Hamiltonian path is, on one hand, a 2-leaf spanning tree, and on the other hand, a spanning tree with n − 2 internal vertices. From the optimization point of view, these problems are equivalent as only the cost function is complemented by counting internal nodes instead of leaves. However, the two problems have different approximability behavior. Using the result of Karger et al. [11], Lu and Ravi [12] proved that there is no constant factor approximation for the MINIMUM LEAF SPANNING TREE problem unless P=NP. G. Salamon: Vulnerability bounds on the number of spanning tree leaves 79 Complementing the cost function yields a better situation as the MAXIMUM INTERNAL SPANNING TREE problem can be constant-factor approximated. A 2-approximation can be done in linear time [16] and a 7/4-approximation in O(n4) time [15]. Both algorithms are based on finding a spanning tree whose leaf-set has a large independent subset. This pushes us to examine the spanning trees with many independent leaves. Our approach uses vulnerability parameters to bound the number of leaves and the number of independent leaves in a spanning tree. We mention here that Prieto and Sloper [14] investigated the MAXIMUM INTERNAL SPANNING TREE problem from a parameterized complexity point of view. They gave an O(n3) kernel showing FPT-membership of the problem. Besides dealing with the MAXIMUM INTERNAL SPANNING TREE problem, we also consider the following problem: Problem 3: MINIMUM CONNECTED VERTEX COVER Input: A connected graph G. Goal: Find a minimum size subset X of V (G) such that X spans a connected subgraph and meets each edge of G. Savage gave a 2-approximation algorithm for this problem [17]. We prove that any spanning tree with independent leaves provides a 2-approximation for both the MAXIMUM INTERNAL SPANNING TREE and the MINIMUM CONNECTED VERTEX COVER problems. Although this does not improve directly the current approximation ratios, we believe that our new approach leads to better approximations in the future. The rest of the paper is organized as follows. In Section 3 we show a connection between scattering number and the minimum number of leaves. In Section 4 we prove that cut-asymmetry exactly determines the size of a maximum independent subset of leaves. Several properties of cut-asymmetry are also presented, among others a sufficient condition for traceability. Namely, we prove that if ca(G) ≤ 1 then G has a Hamiltonian path. We believe that this result has its own interest as cut-asymmetry and scattering number are very similar graph parameters and sc(G) ≤ 1 is a well-known necessary condition of traceability. The algorithmic aspects of our results are presented in Section 5. We show how our approach can be used to directly obtain linear-time 2-approximation algorithms for both the MAXIMUM INTERNAL SPANNING TREE and the MINIMUM CONNECTED VERTEX COVER problems. In Section 6 we deal with large q-leaf subtrees of the input graph (given a fix parameter q). We provide a sufficient degree-based lower bound on the number of nodes that can be spanned by a q-leaf subtree. 2 Notation By a graph G = (V,E) we mean an undirected simple connected graph on n = |V | vertices. We let dG(v) (or d(v)) denote the degree of vertex v in G, and for a set X of vertices dG(X) = ∑ v∈X dG(v). We use eG(X,Y ) to denote the number of edges between vertex disjoint sets X and Y of V (G). NG(v) is the set of neighbors of a vertex v. Let Vi(G) (V≥i(G)) be the set of vertices having degree exactly i (at least i). Let G[X] denote the subgraph of G induced by X . A vertex-set X is called independent if G[X] has no edges. α(G) stands for the size of a maximum cardinality independent set of G. The number of components of a subgraph H is denoted by comp(H). Let T be a spanning tree of G. The edges of T are called tree-edges, all other edges of G are non-tree edges. We denote by PT (x, y) the unique path in T between vertices x 80 Ars Math. Contemp. 2 (2009) 77–92 and y. A vertex in V1(T ) is called a leaf, a vertex in V2(T ) is called a forwarding vertex, and a vertex in V≥3(T ) is called a branching. Branchings and forwarding vertices are the internal vertices of T . For a leaf l of T the branching being closest to l is denoted by b(l). If (l, u) is a non-tree edge then u→l is the neighbor of u along the path PT (u, l). The path PT (l, b(l)) is called the branch of l, and b−(l) is the neighbor of b(l) in PT (l, b(l)). The tree-edges that are not in branches are called trunk-edges. If the leaves of T form an independent set of G then T is called an independence tree. G is called traceable if it has a Hamiltonian path. The minimum number of leaves of a spanning tree of G is denoted by ml(G). (Note that ml(G) = 2 if and only if G is traceable.) For the sake of simplicity we use X + x, and X − x instead of X ∪ {x}, and X \ {x}, respectively. Similarly, if X is a vertex-set of G = (V,E) then G−X denotes G[V −X]. 3 The Scattering Number and the Minimum Number of Leaves Scattering number is a vulnerability parameter which measures how much “structural dam- age” can be caused by removing some “important” vertices of the graph [10, 18]. It is defined as follows: Definition 3.1. [10] The scattering number of a non-complete graph G = (V,E) is sc(G) = max X⊂V,X 6=∅ {comp(G[V \X])− |X| : comp(G[V \X]) ≥ 2} . By definition, sc(Kn) = −∞. This definition immediately implies that sc(G) ≤ 1 if G is traceable. This can be interpreted as follows: Proposition 3.2. If G has a 2-leaf spanning tree (i.e. a Hamiltonian path) then ml(G) ≥ sc(G) + 1. We show that every spanning tree of an arbitrary graph G has at least sc(G) + 1 leaves. Firstly, we prove this for trees. Lemma 3.3. Let T be a tree with q leaves. Then q ≥ sc(T ) + 1. Proof. To prove the upper bound let X = {x1, x2, . . . , xk} be the vertex-set providing the maximum in Definition 3.1. If several such sets exist, let us choose one of minimum cardinality. Thus by definition sc(T ) = comp(T [V \ X]) − |X|. Moreover, each vertex xj ∈ X is a branching of T , that is d(xj) ≥ 3 (for j = 1, 2, . . . , k). Otherwise, if some xj ∈ X had degree less than 3 then for X ′ = X − xj we would have comp(T [V \X ′]) ≥ comp(T [V \X])− 1, and so we should have chosen X ′ instead of X . Now let X0 = ∅ and let Xj = Xj−1 + xj (for j = 1, 2, . . . , k). Furthermore let Tj = T [V \ Xj ] (for j = 0, 1, . . . , k). That is, Tj is the forest obtained from T by removing the vertices of Xj . Then comp(Tj) ≤ comp(Tj−1) + d(xj)− 1, and hence comp(Tk) = comp(T [V \X]) ≤ 1 + ∑ xj∈X d(xj)− |X|. (3.1) G. Salamon: Vulnerability bounds on the number of spanning tree leaves 81 On the other hand, let Y denote the set of branchings not in X , namely Y = V≥3(T ) \ X . Then counting the vertices of T according to their degree we obtain: |V (T )| = q + |V2(T )|+ |X|+ |Y | = q + |V2(T )|+ ∑ i≥3 |Vi(T ) ∩X|+ ∑ i≥3 |Vi(T ) ∩ Y |. (3.2) Counting the total degree of vertices in T we have: 2|V (T )| − 2 = q + 2|V2(T )|+ ∑ i≥3 i|Vi(T ) ∩X|+ ∑ i≥3 i|Vi(T ) ∩ Y |, (3.3) Equations (3.2) and (3.3) yield∑ xj∈X d(xj) = ∑ i≥3 i|Vi(T ) ∩X| = ∑ i≥3 (i− 2)|Vi(T ) ∩X|+ 2|X| = q − 2− ∑ i≥3 (i− 2)|Vi(T ) ∩ Y |+ 2|X| ≤ q − 2 + 2|X|. Using (3.1) this implies sc(T ) = comp(T [V \X])− |X| ≤ q − 1 proving the upper bound on the scattering number. The following theorem generalizes Lemma 3.3 for arbitrary graphs. Note that Theorem 3.4 is also a generalization of Proposition 3.2. Theorem 3.4. If G is a connected graph then ml(G) ≥ sc(G) + 1. Proof. Let T be a minimum leaf spanning tree ofG, that is |V1(T )| = ml(G), and letX be a set maximizing comp(G[V \X])− |X|. Then as comp(G[V \X]) ≤ comp(T [V \X]), by Lemma 3.3 we have: sc(G) = comp(G[V \X])− |X| ≤ comp(T [V \X])− |X| ≤ sc(T ) ≤ |V1(T )| − 1 = ml(G)− 1. The scattering number of a tree T can also be used to form an upper bound on the number of leaves of T as the following theorem shows. Theorem 3.5. Let T be a q-leaf tree on at least 3 vertices. Then q ≤ 2 sc(T ). Proof. We construct a subset X of internal vertices of T such that each component of T [V \X] contains at most one leaf of T . Then we prove that |X| ≤ q/2. Combining these with the definition of scattering number directly yields the theorem. Let us call a spanning tree that has at most one branching a spider. We build a sequence {T1 = T, T2, . . . , Tk} of trees such that Tk is a spider. If, for some i, Ti is not a spider we construct Ti+1 as follows: let xi be a branching of Ti with exactly one trunk-edge 82 Ars Math. Contemp. 2 (2009) 77–92 l xi = b(l) yi TiTi+1 Figure 1: Constructing Ti+1 from Ti by removing the branching xi (xi, yi) incident to it (such an xi exists if Ti is not a spider). Let T ′i be the component of Ti[V (Ti) − xi] that contains yi. If yi is not a leaf of T ′i then Ti+1 = T ′i . Otherwise, Ti+1 is obtained by deleting the branch of yi from T ′i (Figure 1). Observe that the leaves of Ti+1 form a subset of leaves of Ti, and that a leaf l is in V1(Ti) \ V1(Ti+1) if and only if b(l) = xi. If Ti is a spider (thus i = k) then we define xk to be the branching of the spider. (If Tk is a path then xk is chosen to be any internal vertex of Tk.) Let X = {x1, x2, . . . , xk}. Observe that this construction ensures that each leaf of T is in a different component of T [V \X], thus q ≤ comp(T [V \X]). To see that q ≥ 2|X| let bi denote the number of those branches of Ti that end in xi. Then bi = dTi(xi)− 1 and bi ≥ 2 for 1 ≤ i ≤ k − 1. This implies that k∑ i=1 bi ≥ 2k − 1. As xi cuts down exactly bi leaves from Ti, and as Tk is a spider with bk + 1 leaves we have |V1(Ti+1)| = |V1(Ti)| − bi, and |V1(Tk)| = bk + 1. Putting these together gives q = k∑ i=1 bi + 1 ≥ 2k = 2|X|. Thus sc(T ) ≥ comp(T [V \X])− |X| ≥ q − q 2 = q 2 , which finishes the proof. Notice that the bounds of Lemma 3.3 and Theorem 3.5 are sharp. On one hand, if T1 is a q-leaf spider then |V1(T1)| = q = sc(T1) + 1. On the other hand, let T2 be a tree on vertices V (T2) = {v1, v2, . . . , vk, u1, u2, . . . , uk, w1, w2, . . . , wk} having edges E(T2) = {(vi, vi+1)}i=1,2,...,k−1 ∪ {(vi, ui)}i=1,2,...,k ∪ {(vi, wi)}i=1,2,...,k (see Figure 2). Then sc(T2) = k = |V1(T2)| 2 . G. Salamon: Vulnerability bounds on the number of spanning tree leaves 83 v1 u1 w1 u2 v2 w2 wk vk uk Figure 2: A graph proving that the bound of Theorem 3.5 is sharp. These examples show that a difference of factor 2 can arise between the number of leaves and the scattering number of trees. In Section 4 we present cut-asymmetry as a vulnerability parameter which exactly determines the number of leaves when applied on a tree. Finally, we give two basic properties of the minimum leaf spanning trees. They are used throughout the next sections. Lemma 3.6. If T is a minimum leaf spanning tree of G then T is either a Hamiltonian path or an independence tree. Proof. Suppose for a contradiction that T is not a Hamiltonian path and there exist two leaves l1, l2 of T such that (l1, l2) ∈ E(G). The spanning tree T ′ with edge-set E(T ′) = E(T ) + (l1, l2)− (b(l1), b−(l1)) has less leaves than T contradicting the minimality of T (Figure 3(a)). Recall that if T is a tree then x→l1 denotes the successor of x along the path PT (x, l1). Lemma 3.7. Let G be a non-traceable graph, and T be a minimum leaf spanning tree of G. Let l1 be an arbitrary leaf of T , and (l1, x) ∈ E(G) \ E(T ) be a non-tree edge. Then dT (x→l1) = 2 and if l2 6= l1 is a leaf of T then (x→l1 , l2) 6∈ E(G). Proof. Let us consider the spanning tree T ′ with edge-set E(T ′) = E(T ) + (l1, x) − (x→l1 , x). Observe that x→l1 is an internal vertex of T . If dT (x→l1) > 2 then V1(T ′) = V1(T )− l1 and so T ′ has less leaves than T , a contradiction (Figure 3(b)). If dT (x→l1) = 2 then V1(T ′) = V1(T )−l1+x→l1 , that is T and T ′ has the same number of leaves. However, the edge (x→l1 , l2) connects two leaves of T ′ and thus (by Lemma 3.6) neither T ′ nor T is a minimum leaf spanning tree (Figure 3(c)). We have the following trivial corollary of Lemma 3.6: Corollary 3.8. If G is a non-complete connected graph then ml(G) ≤ α(G). 4 The Cut-asymmetry and the Maximum Number of Independent Leaves In the previous section we have seen that scattering number provides both lower and upper bounds on the number of leaves of a tree. In this section we use another vulnerability measure, namely cut-asymmetry. Recall that scattering number shows how much structural 84 Ars Math. Contemp. 2 (2009) 77–92 l1 b−(l1) b(l1) l2 (a) x→l1 x l1 (b) l1 x→l1 l2 x (c) Figure 3: Decreasing the number of leaves by local improvements damage can be caused by removing some individual vertices from the graph. We define cut- asymmetry in a similar way but we count the connected subgraphs (instead of individual vertices) to be removed. We prove that this model exactly characterizes the number of leaves in a tree and can be used to determine the maximum number of independent leaves of a spanning tree of an arbitrary graph. Besides, we point out a connection between cut- asymmetry and the minimum size of a connected vertex cover. Let us recall the definition of cut-asymmetry. Definition 4.1. [16] The cut-asymmetry of a graph G = (V,E) is ca(G) = max X⊂V,X 6=∅ {comp(G[V \X])− comp(G[X])} . This definition immediately implies that ca(G) ≥ max {sc(G), 0}, and that ca(G) ≤ n − 2 (equality holds if and only if G is a star). We mention here some further properties of cut-asymmetry. The following theorem is a consequence of Theorems 4.11 and 4.15, however a direct proof is also given below. Theorem 4.2. ca(G) = 0 if and only if G is either a complete graph or a cycle. Proof. If G is complete or a cycle then its cut-asymmetry is trivially 0. To see the other direction let G be a non-complete graph such that ca(G) = 0. Let Z = {z1, z2, . . . , zk} be an arbitrary independent set. As ca(G) = 0 the graph G[V \ Z] has k components G. Salamon: Vulnerability bounds on the number of spanning tree leaves 85 C1, C2, . . . , Ck. Let us contract each component Ci to a single vertex ci. By this transfor- mation we obtain the connected bipartite graph G′ with classes Z and C = {c1, c2, . . . , ck}. Proposition 4.3. G′ is an even cycle. Proof. For every 1 ≤ j ≤ k we have dG′(zj) = 2. Otherwise if dG′(zj) = 1 for some j then comp(G[Z−zj ]) = k−1 and comp(G[V \(Z−zj)]) = k would imply ca(G) ≥ 1. If dG′(zj) ≥ 3 for some j then comp(G[Z−zj ]) = k−1 and comp(G[V \(Z−zj)]) ≤ k−2 would imply ca(G) ≥ 1. A similar reasoning shows that dG′(cj) = 2 for each 1 ≤ j ≤ k. As a result G′ is a 2-regular connected bipartite graph, that is, an even cycle. Proposition 4.4. If vertex v is in the component Ci (for some i) such that |V (Ci)| ≥ 2 then v has at most one neighbor in Z (in graph G). Proof. Suppose that zj1 ∈ Z and zj2 ∈ Z are both neighbors of such a v. Then comp(G[Z + v]) = k − 1 and comp(G[V \ (Z + v)]) ≥ k implies ca(G) ≥ 1, a contradiction. Proposition 4.5. There is at most one edge in G between vertex zj ∈ Z and component Ci for all i and j. Proof. Suppose that there exist vertices u, v ∈ Ci and a vertex zj ∈ Z (for some i and j), such that both (u, zj) and (v, zj) are edges of G. Then by Proposition 4.4, comp(G[Z − zj+u]) = k as u has no other neighbors in Z. We also have comp(G[V \ (Z−zj+u)]) = k − 1 as zj is neighboring to some other component Ci′ . This implies ca(G) ≥ 1, a contradiction. As a result of Propositions 4.3 and 4.5 we obtain that exactly two edges ofG leave each component Ci. For a fixed i let these edges be (zj1 , ui) and (zj2 , vi) (ui, vi ∈ V (Ci)). (Note that, by Proposition 4.4, ui = vi if and only if Ci has a single vertex.) Suppose that Ci has a vertex xi such thatCi−xi has a ui−vi path. In this case comp(G[Z−zj1 +xi]) = k using Propositions 4.3 and 4.4, and comp(G[V \ (Z − zj1 + xi)]) = k − 1 implying ca(G) ≥ 1, a contradiction. We conclude that component Ci is either a single vertex or a simple path connecting ui and vi. This, together with Proposition 4.3, proves the theorem. Let us recall that sc(G) ≤ 1 is a necessary condition for the existence of a Hamiltonian path. The following theorem shows a sufficient condition for traceability by means of cut- asymmetry. Its proof is postponed here, as the theorem is a direct corollary of Theorems 4.11 and 4.15. Theorem 4.6. If ca(G) ≤ 1 then G has a Hamiltonian path. Unfortunately the traceability of G does not imply a low value of ca(G) as shown by the following theorem. Theorem 4.7. If G = (V,E) is a traceable graph on n vertices then ca(G) ≤ bn−12 c. Moreover, if k is an arbitrary integer such that 0 ≤ k ≤ bn−12 c then there exists a traceable graph G on n vertices for which ca(G) = k. 86 Ars Math. Contemp. 2 (2009) 77–92 Proof. Property 3.2 implies that sc(G) ≤ 1, that is, for any proper subset X ⊂ V we have comp(G[V \X]) ≤ |X|+1. As comp(G[V \X]) ≤ n−|X|we obtain comp(G[V \X]) ≤ bn+12 c. Thus ca(G) ≤ b n−1 2 c. To see the second part let us fix k ≤ bn−12 c. A graph on n vertices whose cut- asymmetry is k is constructed as follows. If k = 0 then according to Theorem 4.2 the graph G = Kn is a good example. So we suppose that k ≥ 1. Let P = x1, x2, . . . , xn be a path of length n. Now we obtain G by adding the edges of the complete graph on x2, x4, . . . , x2k to P . It is easy to check that ca(G) = k. The following result of Salamon and Wiener determines the cut-asymmetry of a tree. Lemma 4.8. [16] If T is a tree then |V1(T )| = ca(T ) + 1. Thus for any tree T we obtain sc(T ) + 1 ≤ |V1(T )| = ca(T ) + 1 ≤ 2 sc(T ). For a connected graph G we have proved so far sc(G) + 1 ≤ min {ml(G), ca(G) + 1} . In what follows we prove that ml(G) ≤ ca(G)+1 holds for all graphs but the complete graph Kn and the cycle Cn. For this purpose we introduce a new parameter called leaf- independence. Definition 4.9. Let G be any connected graph and T be a spanning tree of G. The leaf- independence of T in G (denoted by liG(T )) is the cardinality of a maximum independent subset of V1(T ). The leaf independence li(G) of the graph G is the maximum of liG(T ) over all spanning trees of G. It is important to point out that a spanning tree T can have more than liG(T ) leaves. Indeed, liG(T ) = |V1(T )| if and only if T is an independence tree. Note that li(G) is defined for all connected graphs, however, not all connected graphs have an independence tree. Böhme et al. [3] characterized the graphs that have no independence trees. (These graphs are the complete graph, the cycle and the complete bipartite graph Kn,n.) If an independence tree with αt leaves exists then obviously li(G) ≥ αt. The following conjec- ture would imply that leaf independence is a generalization of the concept of independence trees: Conjecture 4.10. If a graph G has an independence tree then G has an independence tree with li(G) leaves. The definition of leaf-independence directly yields li(G) ≤ α(G) whenever G has at least 3 vertices. Also, the leaf-independence of a tree T (on at least 3 vertices) is equal to the number of its leaves. Thus, by Lemma 4.8, ca(T ) = li(T )− 1. The following theorem states that this equality holds for any connected graph. Theorem 4.11. Let G be a connected graph. Then ca(G) = li(G)− 1. Proof. Let T be a spanning tree of G, such that liG(T ) = li(G). Let Z be a maximum independent set of V1(T ), and let X = V \ Z. Then on one hand comp(G[V \X]) = |V \X| = li(G), G. Salamon: Vulnerability bounds on the number of spanning tree leaves 87 and on the other hand 1 ≤ comp(G[X]) ≤ comp(T [X]) = 1. Therefore ca(G) ≥ comp(G[V \X])− comp(G[X]) = li(G)− 1. To prove the reverse direction let us choose X∗ to be the set giving the maximum in Definition 4.1. If several such sets exist we take one of maximum cardinality. First we show that in this case V \ X∗ is an independent set of G. Assume that this is not the case, and there is an edge (u, v) in G[V \X∗]. Then considering X ′ = X∗ + u we have comp(G[X ′]) ≤ comp(G[X∗]), and comp(G[V \X ′]) ≥ comp(G[V \X∗]). This forms a contradiction as comp(G[V \X ′])−comp(G[X ′]) ≥ comp(G[V \X∗])−comp(G[X∗]), and |X ′| > |X∗|. In the follows we show that X∗ spans a connected subgraph of G. Suppose for a contradiction that G[X∗] has more than one components. Then, since V \ X∗ is an independent set, there must exist two components C1 and C2 of G[X∗], and a vertex u ∈ V \X∗ such that G[V (C1)∪ V (C2) + u] is connected. Hence for the independent set V \X ′ = (V \X∗)− u we have: comp(G[V \X ′]) = |V \X ′| = |V \X∗| − 1 = comp(G[V \X∗])− 1, and comp(G[X ′]) ≤ comp(G[X∗])− 1. Thus comp(G[V \X ′])− comp(G[X ′]) ≥ comp(G[V \X∗])− comp(G[X∗]), that is a contradiction as |X ′| > |X∗|. To finish the proof let T ∗ be a spanning tree of G[X∗]. We then connect each vertex of V \ X∗ to T ∗. Thus we obtain a spanning tree of G in which all vertices of V \ X∗ are leaves. Thus li(G) ≥ |V \X∗| = comp(G[V \X∗])− comp(G[X∗]) + 1 = ca(G) + 1, using that V \X∗ is independent and that X∗ spans a connected subgraph of G. In proof of Theorem 4.11 we have seen that there exists an independent setZ∗ = V \X∗ such that G[V \ Z∗] is connected and that comp(G[Z∗]) − comp(G[V \ Z∗]) = ca(G). This yields us an alternative definition of cut-asymmetry, that is Corollary 4.12. Let Z be a maximum size independent set of V for which G[V \ Z] is connected. Then ca(G) = |Z| − 1. Recall that a connected vertex cover ofG is a vertex-set that spans a connected subgraph and that meets all edges of G. Let us denote by cvc(G) the size of a minimum cardinality connected vertex cover of G. The set Z∗ = V \X∗ of the above proof is a maximum-size vertex-set that is independent and whose complementX∗ spans a connected subgraph. This means that X∗ is a minimum-size connected vertex cover of G, and proves the following: Corollary 4.13. For any connected graph G on n vertices li(G) + cvc(G) = n. 88 Ars Math. Contemp. 2 (2009) 77–92 Note that actually the above arguments also show that the complement of a minimum connected vertex cover is always a maximum size independent subset of leaves of a span- ning tree. A minimum connected vertex cover is NP-hard to compute [8]. Thus Corollary 4.13 and Theorem 4.11 imply that it is also NP-hard to compute both cut-asymmetry and leaf- independence. In what follows we show that the leaf-independence provides an upper bound on the minimum number of leaves in almost all graphs. By Lemma 3.6 we obtain that if G has no Hamiltonian path then the leaves of any minimum leaf spanning tree of G are independent. In this case ml(G) ≤ li(G) is straightforward. If G is traceable then ml(G) = 2 and so ml(G) ≤ li(G) is true if and only if G has a spanning tree with at least two independent leaves. We use a result of Böhme et al. to show that this condition is satisfied whenever G is neither the complete graph Kn nor the cycle Cn. Theorem 4.14. [3] If G is traceable and each Hamiltonian path of G is the part of a Hamiltonian cycle then G is either the complete graph Kn or the cycle Cn or the complete bipartite graph Kn,n. This theorem immediately implies that li(G) ≥ 2 if G is not isomorphic to Kn, Cn or Kn,n. It is easy to see that for these special graphs we have li(Kn) = li(Cn) = 1 and li(Kn,n) = n− 1. Thus we obtain the following theorem. Theorem 4.15. Let G be any connected graph but the complete graph Kn and the cycle Cn. Then ml(G) ≤ li(G). Summarizing the above results we have Corollary 4.16. For any connected graph G but Kn and Cn: sc(G) + 1 ≤ ml(G) ≤ li(G) = ca(G) + 1 ≤ α(G) In the next section we use these inequalities to prove approximation ratios on two NP- hard optimization problems. 5 Algorithmic Aspects In this section we consider two NP-hard optimization problems. The MAXIMUM INTER- NAL SPANNING TREE problem [16] is about finding a spanning tree of the input graph that maximizes the number of internal vertices. The MINIMUM CONNECTED VERTEX COVER problem [17] is about finding a minimum size subset of vertices that spans a con- nected subgraph and meets all edges of the graph. Both cited papers give a linear-time 2- approximation algorithm for their problem under investigation. The common point is that firstly an independence tree is found by a slight modification of depth first search. Then the authors prove that such a tree provides a 2-approximation for their problem. It turned out that this property of independence trees is a direct consequence of Corollary 4.16. Thus we can hope that the approximation ratio of 2 can be improved in the future for some spe- cial graph classes by further investigating the graph parameters present in Corollary 4.16. Currently, the best known approximation factors are 2 for the MINIMUM CONNECTED VERTEX COVER problem [17], and 7/4 for the MAXIMUM INTERNAL SPANNING TREE problem [15]. G. Salamon: Vulnerability bounds on the number of spanning tree leaves 89 We need the following proposition to show that independence trees provide 2-approxi- mation for both problems. Proposition 5.1. If G is a graph on n vertices then 2(n− α(G)) ≥ n− sc(G) Proof. Let X be an independent set of size α(G). Then sc(G) ≥ comp(G[X])− |V \X| = α(G)− (n− α(G)). Subtracting both sides from n yields the proposition. Therefore by Corollary 4.16 we have Corollary 5.2. For any graph G on n nodes but Kn and Cn: 2(n− α(G)) ≥ n− (sc(G) + 1) ≥ n−ml(G) ≥ n− li(G) ≥ n− α(G) Recall that by definition n − ml(G) is the maximum number of internal vertices in a spanning tree, and that by Corollary 4.13 n − li(G) is the cardinality of a minimum size connected vertex cover of G. Now let T be any independence tree of G. Then ml(G) ≤ |V1(T )| ≤ li(G), and Corollary 5.2 shows that n−|V1(T )| ≥ 12 (n−ml(G)) and that n−|V1(T )| ≤ 2(n−li(G)). This means that T is a 2-approximation for both MAXIMUM INTERNAL SPANNING TREE and MINIMUM CONNECTED VERTEX COVER problems. 6 Spanning Many Vertices with a q-Leaf Tree Up to now we were focusing on spanning trees with few leaves. In this section we do the contrary, we fix the number of leaves to q and examine how many vertices can be spanned by a q-leaf subtree. This approach is a generalization of finding a long path in a graph, as a path is just a 2-leaf subtree. Let δq(G) denote the minimum degree-sum of a q-element independent subset of V (G), that is δq(G) = minX {∑ v∈X d(v) : |X| = q,X is independent } . Let us recall Ore’s theorem: Theorem 6.1. [13] LetG be a graph on n vertices. If δ2(G) ≥ n thenG has a Hamiltonian path. For 2-connected graphs, Bermond proved the following generalization: Theorem 6.2. [2] Let G be a 2-connected graph on n vertices. Then there exists a path of length min {n, δ2(G)} in G. Broersma and Tuinstra used a different point of view to generalize Ore’s result. They examined the existence of q-leaf spanning trees and obtained the following: Theorem 6.3. [4] Let G be a graph on n vertices. If δ2 ≥ n − q + 1 for some integer 2 ≤ q ≤ n− 1 then G has a q-leaf spanning tree. In what follows we give a common generalization of the above results. We give a sufficient condition on the existence of a q-leaf subtree that spans many vertices. 90 Ars Math. Contemp. 2 (2009) 77–92 To formulate our statement on subtrees we use ρq,k to denote the minimum degree- sum of the k highest degree vertices of a q-element independent set (2 ≤ k ≤ q). For- mally, ρq,k = minX {∑k i=1 d(xi) : X = {x1, x2, . . . , xq} , d(x1) ≥ d(x2) ≥ · · · ≥ d(xq), X is independent } . Clearly δq(G) = ρq,q and for k ≤ q: δqq ≤ ρq,k k . Theorem 6.4. Let G = (V,E) be a connected graph and let 2 ≤ q < α(G) be an integer. Then G has a subtree with at most q leaves that spans at least min {ρq,2 + q − 1, |V |} vertices of G. Proof. Let T be a maximum cardinality subtree of G with at most q leaves. If T spans G then we are done. Otherwise let R = V (G) \ V (T ) 6= ∅. As T is maximal, none of its leaves is adjacent to R and thus for each leaf l we have dG[V (T )](l) = dG(l). Moreover T must be a minimum leaf spanning tree of G[V (T )]. Suppose that this is not the case and there exists a tree T ′ spanning V (T ) with less than q leaves. Let e be an edge between V (T ) and R. Then T ′ + e is a tree with at most q leaves that spans more vertices than T . Thus T must have all properties of a minimum leaf spanning tree. According to Lemma 3.6, the set L of leaves of T is independent in G. Let l1 and l2 be the first and the second highest degree vertices in L. For the leaf l2 we define the set S = { v : ∃u ∈ V (T ) s.t. (l2, u) ∈ E(G) \ E(T ), v = u→l2 } . By Lemma 3.7 we obtain that the vertices of S are forwarding vertices of T and so by the definition of S we have |S| = dG(l2)− 1. Moreover, also by Lemma 3.7, the vertices of S are not adjacent to l1, that is S and NG(l1) are disjoint. Therefore |V (T )| ≥ |V1(T )|+ |S|+ |NG(l1)| = q + dG(l2)− 1 + dG(l1) ≥ ρq,2(G) + q − 1, using the fact that the leaves of T form a q-size independent set. The above bound is sharp as shown by the complete bipartite graph G = (A∪B,E) = Kδ,n−δ (for any δ = δq q < n/2). To see this, let T be any non-spanning subtree of G having exactly q leaves and t = |V (T )| vertices. If the leaves (being independent in G) are all in B then |E(T )| = t− 1 = eT (A,B) = eT (A,B ∩ V≥2(T )) + q, and each internal vertex of B has at least 2 neighbors in A, so we have 2|B ∩ V≥2(T )| ≤ eT (A,B ∩ V≥2(T )) = t − q − 1. This, combined to t ≤ q + |B ∩ V≥2(T )| + δ results t ≤ 2δ + q − 1 = 2 δqq + q − 1. If the leaves are all in A then we take G′ = G − V1(T ) and a subtree T ′ = T − V1(T ). It is easy to see that T ′ has all of its q′ ≤ q leaves in B and following to the above argument we have |V (T ′)| ≤ 2(δ − q) + q′ − 1 ≤ 2δ − q − 1 and so t = |V (T ′)|+ q ≤ 2δ − 1. As a result, at most 2 δqq + q − 1 ≤ ρq,2(G) + q − 1 vertices of G can be spanned by a subtree of at most q leaves. Putting Corollary 3.8 and Theorem 6.4 together yields the following: Corollary 6.5. Given a connected graph G = (V,E) and an integer 2 ≤ q. If q ≥ α(G) or ρq,2(G) ≥ n− q + 1 then G has a spanning tree with at most q leaves. This is a generalization of the result of Broersma and Tuinstra [4] as ρq,2(G) (if defined) is an upper bound on δ2(G). G. Salamon: Vulnerability bounds on the number of spanning tree leaves 91 7 Concluding Remarks In this paper we discussed the connection between vulnerability of a graph G and the num- ber of leaves of its spanning trees. We gave a lower bound on the number of leaves by means of scattering number sc(G) and an upper bound on the number of independent spanning tree leaves by means of cut-asymmetry ca(G). These bounds are used to prove that any independence tree provides a 2-approximation for both MAXIMUM INTERNAL SPANNING TREE and MINIMUM CONNECTED VERTEX COVER problems. We also gave some new cut-asymmetry related results. We proved that the leaf-set of an arbitrary spanning tree can contain at most ca(G) + 1 independent vertices and that this bound is sharp, that is, li(G) = ca(G) + 1. Clearly any independence tree has at most li(G) leaves. However, it is an open question whether there always exists an independence tree with exactly li(G) leaves. If the answer is positive then the notion of leaf-independence is a generalization of the concept of independence trees. At last, we gave a lower bound—by means of vertex degrees of q-element independent sets—on the number of vertices spanned by a q-leaf subtree. Acknowledgments Author thanks to Gábor Wiener and to the anonymous referees for their valuable comments. References [1] D. Bauer, H. Broersma and E. Schmeichel, Toughness in Graphs – A Survey, Graphs and Combinatorics 22 (2006), 1–35. [2] J-C. Bermond, On Hamiltonian Walks, in: Proc. of the Fifth British Combinatorial Conference, 1975, 41–51. [3] T. Böhme, H. J. Broersma, F. Göbel, A. V. Kostochka and M. Stiebitz, Spanning trees with pairwise nonadjacent end-vertices, Discrete Mathematics 170 (1997), 219–222. [4] H. Broersma and H. Tuinstra, Independence Trees and Hamilton Cycles, Journal of Graph Theory 29 (1998), 227–237. [5] V. Chvátal, Tough graphs and on Hamiltonian circuits, Discrete Mathematics 5 (1973), 215– 228. [6] M. Furer and B. Raghavachari, Approximating the Minimum Degree Spanning Tree to within One from the Optimal Degree, in: Proc. of the 3rd Annual ACM-SIAM Symp. on Discrete Algorithms, 1992, 317–324. [7] M. R. Garey and D. S. Johnson, Computers and intractability: a guide to NP-completeness, Freeman, 1979. [8] M. R. Garey and D. S. Johnson, The rectilinear Steiner tree problem is NP-complete, SIAM Journal of Applied Mathematics 32 (1977), 826–834. [9] L. Gargano, P. Hell, L. Stacho and U. Vaccaro, Spanning Trees with Bounded Number of Branch Vertices, in: Proc. of ICALP’02, Springer LNCS Vol. 2380, 2002, 355–365. [10] H. A. Jung, On a class of posets and the corresponding comparability graphs, Journal on Com- binatorial Theory, Series B 24 (1978), 125–133. [11] D. Karger, R. Motwani and G. D. S. Ramkumar, On Approximating the Longest Path in a Graph, in: Proc. of WADS’93, 1993, 421–432. 92 Ars Math. Contemp. 2 (2009) 77–92 [12] H.-I. Lu and R. Ravi, The Power of Local Optimization: Approximation Algorithms for Maximum-leaf Spanning Tree (DRAFT), Technical report number CS-96-05, Department of Computer Science, Brown University, Providence, Rhode Island, 1996. [13] O. Ore, Note on Hamiltonian Circuits, American Mathematical Monthly 67 (1960), 55. [14] E. Prieto and C. Sloper, Either/or: Using vertex cover structure in designing FPT algorithms – the case of k-internal spanning tree, in: Proc. of WADS 2003, LNCS 2748, 2003, 465–483. [15] G. Salamon, Approximation Algorithms for the Maximum Internal Spanning Tree Problem, in: Proc. of MFCS 2007, LNCS 4708, 2007, 90–102. [16] G. Salamon and G. Wiener, On Finding Spanning Trees with Few Leaves, Information Pro- cessing Letters 105 (2008), 164–169. [17] C. Savage, Depth-first search and the vertex cover problem, Information Processing Letters 14 (1982), 233–235. [18] S. Zhang and Z. Wang, Scattering number in graphs, Networks 37 (2001), 102–106.