Also available at http://amc.imfm.si ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.) ARS MATHEMATICA CONTEMPORANEA 6 (2013) 83–88 On the minimum rainbow subgraph number of a graph Ingo Schiermeyer Institut für Diskrete Mathematik und Algebra, Technische Universität Bergakademie Freiberg, 09596 Freiberg, Germany Received 19 October 2011, accepted 9 April 2012, published online 1 June 2012 Abstract We consider the MINIMUM RAINBOW SUBGRAPH problem (MRS): Given a graph G whose edges are coloured with p colours. Find a subgraph F ⊆ G of minimum order and with p edges such that each colour occurs exactly once. This problem is NP-hard and APX-hard. For a given graph G and an edge colouring c with p colours we define the rainbow subgraph number rs(G, c) to be the order of a minimum rainbow subgraph of G with size p. In this paper we will show lower and upper bounds for the rainbow subgraph number of a graph. Keywords: Edge colouring, rainbow subgraph. Math. Subj. Class.: 05C15, 05C35 1 Introduction and motivation We use [2] for terminology and notation not defined here and consider finite and simple graphs only. Our research was motivated by the following problem from bioinformatics. The prob- lem data consist in a set G of p genotypes g1 , g2 , . . . , gp corresponding to p individuals in a population. Each genotype g is a vector with entries in {0, 1, 2}. Each position where a 2 appears is called ambiguous position. For a genotype g we have to determine a pair of haplotypes hP and hM (hP stands for the paternal haplotype and hM stands for the maternal haplotype), which are binary vectors such that g = hP ⊕ hM . Given two haplotypes h ′ and h ′′, their sum is defined as the vector g = h′ ⊕ h′′ with g[i] = 0, if h′[i] = h′′[i] = 0, g[i] = 1, if h′[i] = h′′[i] = 1 and g[i] = 2, if h′[i] 6= h′′[i]. We say that a setH of haplotypes resolves G if for every g ∈ G there exist h1, h2 ∈ H such that g = h1 ⊕ h2. Given a set G of genotypes, the haplotyping problem consists E-mail address: Ingo.Schiermeyer@tu-freiberg.de (Ingo Schiermeyer) Copyright c© 2013 DMFA Slovenije 84 Ars Math. Contemp. 6 (2013) 83–88 in finding a set H of haplotypes that resolves G. In the Pure Parsimony Haplotyping problem (PPH problem) we are interested in finding a set H of smallest possible cardi- nality. If each genotype has at most k ambiguous positions, then we denote this problem by PPH(k). The PPH problem has been studied in ([3],[4],[7],[9]). Matos Camacho et al. [8] have shown that the PPH(k) can be transformed to a graph problem, the MINIMUM RAINBOW SUBGRAPH problem (MRS). Note that this edge-colou- ring need not be proper. Definition 1.1 (Rainbow subgraph). Let G be a graph with an edge-colouring. A subgraph H of G is called rainbow subgraph if H does not contain two edges of the same colour. Definition 1.2 (Minimum Rainbow Subgraph problem (MRS)). Given a graph G, whose edges are coloured with p colours, find a subgraph F ⊆ G of minimum order and with p edges such that each colour occurs exactly once. For a set G of p genotypes g1 , g2 , . . . , gp we will use p colours 1, 2, . . . , p. For each haplotype we introduce a vertex. If two haplotypes h′ and h′′ resolve a genotype gi (gi = h′ ⊕ h′′ ), then the corresponding vertices will be joined by an edge which receives colour i. If a genotype is resolved by two identical haplotypes, then the corresponding vertex is joined by an edge which is called a loop. In this way we construct a graph G, whose edges are coloured with p colours. Note that this is a proper edge colouring (no vertex is incident with two edges of the same colour), since a haplotype h can be used at most once in a pair of haplotypes, which resolves a genotype g. Furthermore, every set H of haplotypes that resolves G corresponds to a rainbow subgraph F of G. It has been shown in [8] that a graph G containing loops can be transformed into a graph G′ without loops. Hence in the following we may assume that all graphs have no loops. Matos Camacho et al. [8] proved the MRS problem to be NP-hard and APX-hard. In [5] it has been shown that the MRS problem remains NP-hard and APX-hard even for graphs with maximum degree 2. Remark: If we do not consider edge colourings, the analogous problem is known as the (t, f(t)) dense subgraph problem ((t, f(t))-DSP), which asks whether there is a t-vertex subgraph of a given graphGwhich has at least f(t) edges. When f(t) = ( t 2 ) , (t, f(t))-DSP is equivalent to the well-known t-clique problem (cf. [1]). 2 Lower bounds for the rainbow subgraph number Definition 2.1. Let G be a graph and c be its edge colouring with p colours. The rainbow subgraph number of G (with respect to the colouring c) is defined as the order of its mini- mum rainbow subgraph of size p, and denoted by rs(G, c) (or rs(G), when the colouring c is clear from the context). Improved lower bounds for the rainbow subgraph number rs(G) will be of major importance for the design of approximation algorithms with better approximation ratios for the MRS problem (cf. [8, 5]). So far nothing better than the trivial lower bound rs(G) ≥ 2p∆(G) is known. We can improve this lower bound by counting the number of distinct colours among all edges incident to a vertex. I. Schiermeyer: On the minimum rainbow subgraph number of a graph 85 Definition 2.2. Given an edge colouring of a graph G with colours 1, 2, . . . , p, we define c(e) = i, if the edge e has colour i for 1 ≤ i ≤ p. Let cd(v) (colour degree) denote the number of distinct colours among all edges incident to the vertex v and let cd(i) = max{cd(v) | v ∈ V (G) has an incident edge with colour i} be the maximum colour degree for every colour i, 1 ≤ i ≤ p. Using the maximum colour degrees for all colours we can show the following improved lower bound. Proposition 2.3. Let G be a graph, whose edges are coloured with p colours. Then rs(G) ≥ p∑ i=1 2 cd(i) ≥ 2p ∆(G) . Proof. Let F be a minimum rainbow subgraph of order k = rs(G). Then rs(G) = k = ∑ v∈V (F ) dF (v) dF (v) = ∑ e=uw,e∈E(F ) 1 dF (u) + 1 dF (w) ≥ p∑ i=1 2 cd(i) ≥ 2p ∆(G) . The following example shows that this bound is sharp and improves the lower bound of 2p∆(G) significantly. Example 2.4. For p ≥ 4 and ∆ ≥ 2 let G = K1,∆ + Cp−1 (where G + H denotes the disjoint union of two graphsG andH). All edges of the cycle Cp−1 are coloured distinctly, say with colours 1, 2, . . . , p − 1, and all edges of K1,∆ are coloured with colour p. Then rs(G) = p+ 1 = p− 1 + 2 = ∑p i=1 2 cd(i) > 2p ∆(G) . We can further improve this lower bound by counting the number of distinct colours among all edges incident to the endvertices of an edge. For this purpose we define q(i) = min{ 1cd(u) + 1 cd(w) | uw ∈ E(G) and c(uw) = i}. Proposition 2.5. Let G be a graph, whose edges are coloured with p colours. Then rs(G) ≥ p∑ i=1 q(i) ≥ p∑ i=1 2 cd(i) ≥ 2p ∆(G) . Proof. Let F be a minimum rainbow subgraph of order k = rs(G). For every colour i, 1 ≤ i ≤ p, let uiwi be an edge such that 1cd(ui) + 1 cd(wi) = q(i). If uw ∈ E(F ) is an edge with c(uw) = i, then 1cd(u) + 1 cd(w) ≥ 1 cd(ui) + 1cd(wi) = q(i) ≥ 2 · 1 max{cd(ui),cd(wi)} ≥ 2 cd(i) . Therefore, rs(G) = k = ∑ v∈V (F ) dF (v) dF (v) = ∑ e=uw,e∈E(F ) 1 dF (u) + 1 dF (w) ≥ p∑ i=1 q(i) ≥ p∑ i=1 2 cd(i) ≥ 2p ∆(G) . 86 Ars Math. Contemp. 6 (2013) 83–88 The following example shows that this bound is sharp and improves the previous two lower bounds significantly. Example 2.6. Let G ∼= K1,p for some p ≥ 2. Let the edges of G be coloured with p colours. Then cd(i) = p and q(i) = 1 + 1p for 1 ≤ i ≤ p. Thus rs(G) = p + 1 = p · (1 + 1p ) = ∑p i=1 q(i) > 2 = p · 2 p = ∑p i=1 2 cd(i) = 2p ∆(K1,p) . 3 Upper bounds for the rainbow subgraph number First observe that the trivial upper bound rs(G) ≤ 2p is achieved if the rainbow subgraph F is a matching. This upper bound has been improved towards rs(G) ≤ 2p + 1 −∆(G) by Koch [6] for properly edge-coloured graphs and this bound is sharp. For instance, let G = K1,∆ + (p −∆)K2, where p ≥ ∆, and all edges of G are coloured distinctly. Then rs(G) = 2p+ 1−∆(G). Similar to Brooks’ Theorem (cf. [2]) we can characterize all graphs achieving this bound. Theorem 3.1. Let G be a graph with maximum degree ∆ ≥ 2, whose edges are properly coloured with p colours. If rs(G) = 2p+ 1−∆(G), then G has the following properties: 1. G contains a star K1,∆ with center vertex v0 and leaves v1, . . . , v∆ and G[N(v0)] is edgeless. Let c(v0vi) = i for 1 ≤ i ≤ ∆ and H0 ∼= G[N [v0]]. 2. If p > ∆, then let Hi be the subgraph spanned by the edges with colour i for ∆ + 1 ≤ i ≤ p. The subgraphs H∆+1, H∆+2, . . . ,Hp are pairwise vertex-disjoint and V (H0) ∩ V (Hi) = ∅ for ∆ + 1 ≤ i ≤ p. 3. E(Hi, Hj) = ∅ for ∆ + 1 ≤ i < j ≤ p (where E(Hi, Hj) is the set of all edges having one vertex in V (Hi) and the other vertex in V (Hj)). 4. E(vi, Hj) = ∅ for 1 ≤ i ≤ ∆ and ∆ + 1 ≤ j ≤ p (where E(vi, Hj) is the set of all edges incident with vi and a vertex in V (Hj)). 5. If uv ∈ E(Hi) for some ∆ + 1 ≤ i ≤ p, then N(u) ∩N(v) = ∅. 6. N(vi) ∩N(vj) = ∅ for vi ∈ V (Hi), vj ∈ V (Hj),∆ + 1 ≤ i < j ≤ p. Proof. 1. Suppose there is an edge vivj for some 1 ≤ i < j ≤ ∆. If c(vivj) = k for some k with 1 ≤ k ≤ ∆, k 6= i, j, then rs(G) ≤ (∆+1)−1+(2p−2∆) = 2p−∆ < 2p + 1 −∆, a contradiction. If c(vivj) = k for some k with ∆ + 1 ≤ k ≤ p, then rs(G) ≤ (∆ + 1) + (2p− 2∆− 2) = 2p−∆− 1 < 2p+ 1−∆, a contradiction as well. 2. Suppose there are integers i, j with ∆ + 1 ≤ i < j ≤ p and two adjacent edges e, f with c(e) = i, c(f) = j. Then rs(G) ≤ (∆ + 1) + (2p − 2∆ − 1) = 2p − ∆ < 2p + 1 − ∆, a contradiction. Suppose there are integers i, j with 1 ≤ i ≤ ∆,∆ + 1 ≤ j ≤ p and two adjacent edges e, f with c(e) = i, c(f) = j. Then rs(G) ≤ (∆ + 1) + (2p−2∆−1) = 2p−∆ < 2p+ 1−∆, a contradiction as well. 3. Suppose there is an edge vivj with vi ∈ V (Hi), vj ∈ V (Hj),∆ + 1 ≤ i < j ≤ p. Then c(vivj) = k for some 1 ≤ k ≤ ∆. Hence rs(G) ≤ (∆+1)−1+(2p−2∆) = 2p−∆ < 2p+ 1−∆, a contradiction. 4. Suppose there is an edge vivj for two vertices vi ∈ V (H0) and vj ∈ V (Hj),∆+1 ≤ j ≤ p. Then rs(G) ≤ (∆+1)+(2p−2∆−1) = 2p−∆ < 2p+1−∆, a contradiction. I. Schiermeyer: On the minimum rainbow subgraph number of a graph 87 5. Suppose there is an edge uv ∈ E(Hi) for some ∆+1 ≤ i ≤ p with N(u)∩N(v) 6= ∅. By 3. and 4. we conclude that N(u) ∩ N(v) ∩ V (H0) = ∅. Furthermore, for a vertex w ∈ N(u)∩N(v), we have c(uw) = j, c(vw) = k for some 1 ≤ j < k ≤ ∆. Then rs(G) ≤ (∆+1)−2+(2p−2∆+1) = 2p−∆ < 2p+1−∆, a contradiction. 6. SupposeN(vi)∩N(vj) 6= ∅ for two vertices vi ∈ V (Hi), vj ∈ V (Hj),∆+1 ≤ i < j ≤ p.By 3. and 4. we conclude thatN(vi)∩N(vj)∩V (H0) = ∅. Furthermore, for a vertexw ∈ N(vi)∩N(vj),we have c(uw) = k, c(vw) = l for some 1 ≤ k < l ≤ ∆. Then rs(G) ≤ (∆+1)−2+(2p−2∆+1) = 2p−∆ < 2p+1−∆, a contradiction. Another upper bound for the rainbow subgraph number follows from an approach pre- sented in [8]. Observe that two adjacent edges of different colours together have three vertices, whereas two edges of different colours in a matching have four vertices. Based on this obervation the following algorithm has been proposed in [8]. Algorithm Input: A graph G of order n whose edges are coloured with p colours 1. Construct a graph G′ with V (G′) = {v1, v2, . . . , vp} (vi corresponds to colour i) and vivj ∈ E(G′) if there exist two adjacent edges e, f ∈ E(G) with c(e) = i and c(f) = j (c(x) denotes the colour of the edge x). 2. Now compute a maximum matching M of order β(G′) in G′. This can be done in polynomial time. 3. Next construct a graph H with V (H) ⊆ V (G) as follows: For each matching edge of M choose two adjacent edges in G with these two colours. For each vertex of V (G′) not in M choose an edge in G with this colour. In this way we obtain a rainbow subgraph H ⊆ G with |E(H)| = p. Correctness of the algorithm: Edges of the matching correspond to pairs of adjacent edges in the original graph. Colours that are left out by this procedure are added greedily at the end. Claim 3.2. |V (H)| ≤ 2p− β(G′) Proof. For each matching edge of G′ three vertices appear in H. Hence |V (H)| ≤ 3β(G′) + 2(p− 2β(G′)) = 2p− β(G′) Corollary 3.3. rs(G) ≤ 2p− β(G′). Acknowledgement We thank the referees for some valuable comments. 88 Ars Math. Contemp. 6 (2013) 83–88 References [1] Y. Asahiro, R. Hassin and K. Iwama, Complexity of finding dense subgraphs, Discrete Appl. Math. 121 (2002), 15–26. [2] J. A. Bondy and U.S.R. Murty, Graph Theory, Springer, London, 2008. [3] D. Catanzaro and M. Labbé, The pure parsimony haplotyping problem: overview and computa- tional advances, International Transactions in Operational Research 16 (2009), 561–584. [4] D. Gusfield, Haplotype inference by pure Parsimony, in: Proceedings ot the 14th annual confer- ence on Combinatorial pattern matching (CPM’03), Springer, Berlin, Heidelberg, 2003, 144– 155. [5] J. Katrenič and I. Schiermeyer, Improved approximation bounds for the minimum rainbow sub- graph problem, Inf. Process. Lett. 111 (2011), 110–114. [6] M. Koch, Das Population-Haplotyping-Problem: Graphentheoretische Ansätze, Diploma thesis, TU Bergakademie Freiberg, 2008. [7] G. Lancia, M. C. Pinotti and R. Rizzi, Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms, INFORMS Journal on Computing 16 (2004), 348–359. [8] S. Matos Camacho, I. Schiermeyer and Z. Tuza. Approximation algorithms for the minimum rainbow subgraph problem, Discrete Math. 310 (2010), 2666–2670. [9] L. S. Wang, Hapolotype inference by maximum parsimony, Bioinformatics 19 (2003), 1773– 1780.