IMFM Institute of Mathematics, Physics and Mechanics Jadranska 19, 1000 Ljubljana, Slovenia Preprint series Vol. 49 (2011), 1138 ISSN 2232-2094 COVERING A BICHROMATIC POINT SET WITH TWO DISJOINT MONOCHROMATIC DISKS S. Cabello J.M. Dlaz-Banez P. Perez-Lantero Ljubljana, February 23, 2011 CO CO Covering a Bichromatic Point Set with Two Disjoint Monochromatic Disks (N S. Cabello * J.M. Diaz-Banez t P. Perez-Lantero * February 19, 2011 Abstract CO Let P be a set of n points in the plane in general position such that its elements are colored red or blue. We study the problem of finding two disjoint disks Dr and Db such that Dr covers only red points, Db covers only blue points, and the number of elements of P contained in Dr U Db is maximized. We prove that this problem can be solved in O(n11/3 polylog n) time. We also present a randomized algorithm that with high probability returns a (1 — e)-approximation to the optimal solution in O(n4/3e-6 polylog n) time. o 1 Introduction In data mining and classification problems, a natural method for analyzing data is to select prototypes representing different data classes. A standard technique for achieving this is to perform cluster analysis on the training data [DHS01, HSM01]. The clusters can be obtained by using simple geometric shapes such as disks or boxes. Aronov and Har-Peled [AHP08], Eckstein et al. [EHL+02], and Liu et al. [LN03] considered disks and axis-aligned boxes for the selection problem. Aronov and Har-Peled [AHP08] studied the following problem: Given a bicolored set of n points in the plane, find a disk that contains the maximum number of red points without containing any blue point. They propose an algorithm to solve this problem optimally in O(n2 log n) time, and also provide a (1 — ^-approximation algorithm that needs near-linear time. This type of classification is asymmetric in the way red and blue points are treated. In this paper, we consider a symmetric two-class version, where we want to find a witness set for each color. We next formalize the problem. & Let P be a set of n points in the plane such that its elements are colored red or blue. Denote by R (resp. B) the set of red (resp. blue) elements of P. We say that Y C R2 is red (resp. blue) if Y contains only red (resp. blue) elements of P, and Y is monochromatic if it is either red or blue. In this paper we study the following problem: *Department of Mathematics, FMF, University of Ljubljana, sergio.cabello@fmf.uni-lj.si. Partially supported by the Slovenian Research Agency, program P1-0297 ^Departamento Matematica Aplicada II, Universidad de Sevilla, dbanez@us.es. Partially supported by Grant MEC MTM2009-08625 CD ^Departamento Matematica Aplicada II, Universidad de Sevilla, pplantero@us.es. Partially supported u cu by Grant MAEC-AECI and MEC MTM2009-08625 The Two Disjoint Disks problem (2DD-problem): find a red disk Dr and a blue disk Db such that Dr and Db are disjoint and |Dr n R| + |Db n B| is maximized. We provide algorithms to solve the 2DD-problem optimally and approximately. It is easy to see that the 2DD-problem can be solved optimally in O(n5) time. We reduce this running time to O(n11/3 polylog n). This result is described in Section 2. We also provide a randomized approximation scheme that, with probability at least 1 — O(1/n), returns a (1 — ^-approximation to the optimal solution in O(n4/3e-6 polylog n) time. Under the assumption that at least a constant fraction of the points is covered in the optimal solution, the (1 — e)-approximation can be obtained in O(ne-13 polylog n) time. This approximation algorithm is described in Section 3. In our algorithms, we did not try to improve the exponents of e or log n. 00 I CSI 00 We next discuss variants of the 2DD-problem that have been considered previously. If in the 2DD-problem we do not restrict Dr and Db to be disjoint, then we can solve the problem considering each color separately. First, we find the disk covering the maximum number of red points and no blue point, and after that, the disk that contains the maximum number of blue points and no red point. These two problems can be solved optimally in O(n2 log n) time, or a (1 — e)-approximation can be obtained in near-linear time with high probability [AHP08]. Another variant of the 2DD-problem that permits intersection of the disks but penalizes it is the problem of finding disks Dr and Db that maximize |(Dr \ Db) n R| + | (Db \ Dr) n B|. This criterion is considered for axis-aligned boxes in [CDBPL+09]. An optimal solution for this variant can be found in O(n2) time using a key observation and known approaches. Namely, suppose that (Dr, Db) is an optimal solution, and denote by £ the radical axis of Dr and Db. Let ns and n2 be the open half-planes bounded by £ such that ((Dr \Db) nR) C ns and ((Db \ Dr) n B) C n2. Note that (Dr \ Db) n R = ns n R and (Db \ Dr) n R = n2 n B because the pair (Dr,Db) is optimal. Since both half-planes bounded by the line £ are disks with infinite radii, the problem reduces to finding a line £ such that the number of red points to one side of £ plus the number of blue points to the other side is maximized. CO This latter problem is known as the Weak Separation Problem [Cha05, ERvK96, Hou93] and can be solved in O(n2) time in the worst case [Hou93]. Moreover, it was proven in [BDBL+09] that the Weak Separation Problem is 3SUM-hard [GO95]. Another variant of the 2DD-problem is the problem of finding two unit disks Dr and Db with disjoint interiors, but not necessarily monochromatic, such that |Dr n R| + |Db n B| is maximized. Note that in this variant there are two differences with the 2DD-problem: the disks are unitary and do not need to be monochromatic. This variant was considered in [CDBS+08], where an O(n8/3 log2 n)-time algorithm is described. • i-H Notation. Given two points p and q we denote: by pq the straight line segment joining p and q, by £(p, q) the straight line containing both p and q, by bis(p, q) the line perpendicular to pq passing through the midpoint of pq (i.e. the bisector of p and q), and by D(p, q) the disk centered at p with radius equal to the length of pq. Given a region S C M2, let SS denote the boundary of S. a CD General position. We assume general position, that is, there are no four cocircular points in P, neither three collinear points. We relax the definition of our problem by u CO CD allowing the boundary of the red disk (resp. blue disk) to contain one blue point (resp. one red point). A solution to the relaxed problem induces a solution to the original one by shrinking the disks slightly. 2 An exact algorithm In this section we provide an exact O(n11/3 polylog n)-time algorithm to solve the 2DD-problem. First, we will show that we only need to consider a certain type of solutions, thus obtaining a discretization of the problem. Secondly, we will consider a decision version of the problem, where we want to decide if there exists a solution covering a prescribed number of points of each color. Finally, we will discuss how to find an optimal solution to the 2DD-problem. 00 2.1 Discretization 1 It is not hard to see that a simple discretization of our problem in which the boundary of each disk contains three points of P, or two diametrically opposed points, is not possible. In order to obtain an appropriate discretization we will use the following lemmas. Lemma 2.1 If the points p, q, o, p', q', and o' are such that o G bis(p, q), o' G bis(p', q'), and D(o,p) n D(o',p') = 0, then both o and o' can be moved simultaneously along bis(p, q) and bis(p',q'), respectively, so that at every moment D(o,p) n D(o',p') = 0, until o or o' reaches infinity. i Proof. Notice that both p' and q' are in the same half-plane bounded by £(p,q), or both p and q are in the same half-plane bounded by £(p',q'). Assume w.l.o.g. the former case. We prove now that such a movement exists so that o reaches infinity. Denote by n1 and n2 the open half-planes bounded by £(p,q) and suppose w.l.o.g. that p',q' G n1. Refer to Figure 1 a). Let ho be the half-line starting at o such that ho C bis(p, q) and ho n n2 is unbounded. Analogously define ho as the half-line starting at o' such that ho C bis(p', q') and ho n n1 is unbounded. If D(o',p') C n1 then the result follows by moving only o through ho in direction to infinity. Otherwise, let e > 0 be a small enough value and u(o') G ho be the furthest point from o such that the disk D(u(o'),p) is exterior tangent to D(o',p') (see Figure 1 b)). We first move o in direction to u(o') until the distance between o and u(o') is equal to e. After that we simultaneously move o' through ho in direction to infinity and o through ho in such a way that o stays at distance e from u(o'). We stop this simultaneous movement when D(o',p') C n1. At this moment the center o can be moved to infinity. □ CD Lemma 2.2 Let D1 be a disk with center o'. If the points p, q, and o, are such that o G bis(p, q) and D(o,p) n D1 = 0, then o can be moved along bis(p, q), in such a way that D(o,p) fl D\ = 0 at every moment, until one of the next conditions is satisfied: (i) o reaches infinity, or (ii) the segment od contains p or q. a If D1 is a half-plane, then o can be moved to the line perpendicular to 5D1 passing through p or q. U cu CO b ti u CD fc 00 CO 0 o 1 CO ^ CO CO bis(p, q) C0 CD $H CD $H a CD U bis(p, q) D(o',p') bis(p', q') D(o',p') bis(p', q') D(u(o'),p) b) Figure 1: Proof of Lemma 2.1. Proof. The set of centers o € bis(p, q) such that D(o,p) n D1 = 0 is a connected interval of bis(p, q). If this interval is unbounded, then we can move o to infinity, and condition (i) is satisfied. If the interval is bounded, then the line £(p,q) intersects D1. The extremes o1 and o2 of the interval correspond to the two points such that the disks D (o1,p) and D(o2,p) are exterior tangent to D1. Consider w.l.o.g. that p is closer to o' than q. See Figure 2 a). Denote by o3 the intersection point of the line ^(o1,o2) with £(p,o'). Since £(p,q) intersects Di, the point p lies in the segment o'os, and thus the disk D(os,p) does not intersect D\. This implies that 03 is on the segment UT02, and condition (ii) can be satisfied by moving o to o3. Consider now the case where D1 is a half-plane, and assume w.l.o.g. that p is closer to 5D1 than q. See Figure 2 b) The set of centers o € bis(p, q) such that D(o,p) n D1 = 0 is bounded, and thus the second case of the previous analysis applies. □ o 1 V- 03! .02 Di a) b) Figure 2: Proof of Lemma 2.2. a) Case when D1 is a disk. b) Case when D1 is a half-plane. Solutions (Dr,Db) such that \Dr n R\ = 1 can be easily found as follows. We find a blue disk Db that contains the maximum number of blue points by using [AHP08], and set Dr to be a degenerate disk containing a single red point. Similarly, we can treat the case \Db n B| = 1. Thus we assume from now on that |Dr n R\ > 2 and \Db n B\ > 2. p We now proceed with the discretization to the 2DD-problem. Let Dr be the family of the disks D with red interior such that 5D contains exactly three red points, or two red points and one blue point. Let Hr be the family of the closed red half-planes H such that 5H contains two red points. Let the families DB and HB be defined analogously for the blue color. CO CSI Lemma 2.3 For any feasible .solution (D'r,D'b) to the 2DD-problem, there exists another feasible solution (Dr, Db) with D'r n R C Dr n R and D'b n B C Db n B that satisfies one of the next conditions: (a) Dr e DR U Hr and Db e DB. CD (b) Dr e Dr, 5Db contains two blue points, and one blue point in 5Db is on the segment connecting the center of Dr to the the center of Db. 00 (c) Dr e Hr, 5Db contains two blue points, and one blue point in 5Db is on the line perpendicular to 5Dr passing through the center of Db. (d) the cases symmetric to (a)-(c) by exchanging colors. o Proof. We can shrink Dr into a disk D'r contained in Dr that has two red points on its boundary and satisfies Dr' n R = D'r n R. An analogous transformation can be done to obtain Db from Db. Q\ b Assume that Db' is the blue disk centered at o with p and q on its boundary, and that Dr' is the red disk centered at o' with p' and q' on its boundary. Keeping the set of points in the interior of D(o,p) and D(o',p') invariant, we move o and o' as in Lemma 2.1 until D(o,p) belongs to DB U HB or D(o',p') belongs to DR U HR. Up to symmetry, we may assume the latter case: D(o',p') e DR U Hr. We leave Di = Dr = D(o',p') unchanged for the rest of the proof. Keeping the set of points in the interior of D(o,p) unchanged, we can now move o as in Lemma 2.2. If at any moment D(o,p) belongs to Db because its boundary touches a third point we obtain a feasible solution satisfying case (a). If o reaches infinity (condition (i) in Lemma 2.2), then D(o,p) belongs to Hb, and we obtain a solution satisfying case (d) symmetric to (a). If D(o',p') is a disk and o is moved to satisfy condition (ii) in Lemma 2.2, then we obtain case (b). If D(o',p') is a half-plane and o is moved as described in Lemma 2.2, then we obtain case (c). The result follows. □ i CSI CO £ CO CO 2.2 Decision and optimization problem HH We now consider the following decision version of the problem for integers i, j, 2 < i,j < n. • Sh (i,j)-2DD-problem: find a solution (Dr,Db) to the 2DD-problem subject to the constraints |Dr n R| > i and |Db n B| > j; or report that no such solution exists. Note that, if there exists a solution to the (i* ,j*)-2DD-problem, then there is a solution to the (i, j)-2DD-problem whenever i < i* and j < j*. Also, when searching for a solution to the (i,j)-2DD-problem, it is enough to restrict the search to pairs of disks that contain exactly i red points and j blue points. Lemma 2.4 The (i, j)-2DD-problem can be solved in O(n8/3 polylogn) time. i—l o Proof. Let (i) be the set of disks of containing exactly i red points, and let HR(i) be the set of half-planes of Hr containing exactly i red points. Notice that the centers of the disks of Bft(i) are vertices of the (i — 2)th-order or the (i — 1)th-order Voronoi Diagram of R U B, while the half-planes in HR(i) correspond to some of the unbounded edges in the (i — 1)th-order Voronoi diagram. Since the kth-order Voronoi diagram has combinatorial complexity O(k(n — k)) = O(n2) in total [Aur91], the sets DR(i) and HR(i) have at most O(n2) elements. Analogous sets DB(j) and HB(j) can be defined as those covering j blue points. J-l For every blue point q, let SB(q, j) be the set of straight-line segments s such that for each point o of s the disk D(o, q) is blue, covers j blue points, and its boundary contains other blue point distinct from q. The set SB(q, j) is a subset of the edges in the (j — 1)th-order Voronoi diagram of R U B. 00 We next discuss how to construct DR(i) and HR(i) in O(n8/3) time. Using [ABMS98, CSY87], the kth-order Voronoi diagram (k = i — 2, i — 1) for n points can be constructed in O(n2+<5) = O(n8/3) time, for any 5 > 0. Within the same time bound, we can attach to each vertex (resp. edge) of the diagram a label telling how many of its k + 2 (resp. k + 1) closest neighbours are red and how many are blue. Both the vertices of the (i — 2)th-order Voronoi diagram of RUB whose i closest neighbours are red, and the vertices of the (i — 1)th-order Voronoi diagram of RU B whose i + 1 closest neighbours are all red except one that is blue, correspond to the centers of the disks DR(i). The infinite edges of the (i — 1)th-order Voronoi Diagram of R U B whose i neighbours are all red correspond to the half-planes HR(i). CD Similarly, using the (j — 2)th- and the (j — 1)th-order Voronoi diagrams of R U B we can construct the sets DB(j) and HB(j). Furthermore, the sets SB(q, j) for all q € B can also 2 be constructed using the (j — 1)th-order Voronoi diagram. In this case, we have to attach to each edge the two furthest points among its j closest neighbours. Let Vj denote the Furthest Disk Voronoi Diagram of DB(j). The Furthest Disk Voronoi Diagram is the furthest site Voronoi diagram for a set of disks. Given a point p and a disk (i.e. site) D, the distance function used is 0(p, D) = d(p, c(D)) — r(D), where d is the Euclidean distance, and c(D) and r(D) are the center and the radius of D, respectively. The Furthest Disk Voronoi Diagram for m disks can be constructed in O(m log m) time [Rap92], and thus we can construct Vj- in O(n2 log n) time. We further preprocess Vj in O(n2 log n) time in order to support O(log n)-time point location queries [Sno97]. For each blue point q and each edge e of SB(q, j), consider the double wedge obtained by the union of lines that intersect both q and e, and let A(q, e) be wedge that does not contain e. Thus A(q, e) has q as apex and the prolongation of its sides pass through the endpoints of e. (If e is infinite then a side of A(q, e) is parallel to e.) The set A(q, e) has the followring property a point x € TU2 CO the following property: a point x € R2 is in A(q, e) if and only if the ray emanating from x towards q intersects e after q. We now discuss how to decide if there exists a solution to the (i, j)-2DD-problem. We only need to restrict our search to solutions satisfying the properties listed in Lemma 2.3. a Finding if there is a solution satisfying condition (a) can be done as follows. For each disk D in Bft(i) centered at o, we make a point location query in Vj with o in order to obtain the furthest disk D' to o in DB(j). If the disks D and D' are disjoint, then (D, D') is CO 5H a solution satisfying condition (a). If they intersect, then there is no solution satisfying condition (a) for the disk D. A similar point location procedure can be used to detect if there is a solution involving a half-plane from H#(i): we query Vj with the "center" of the half-plane, which we can take to be any symbolic point far enough in the direction perpendicular to the boundary of the half-plane. We spend here O(n2 log n) time in total because we have to make O(|DR(i)| + |HR(i)|) = O(n2) point locations in Vj. Finding if there is a solution satisfying condition (b) can be done as follows. For each disk D in DR(i) centered at o, and each blue point q not in 5D we want to know if the ray emanating from o towards q crosses a segment of SB(q,j) after passing through q. If such intersection o' exists, then (D, D(o',q)) is a solution satisfying condition (b). And vice versa, each solution satisfying condition (b) corresponds to one such intersection. The intersection exists if and only if o belongs to the union IJeeSB(q j) A(q,e). We can test for this intersection over all blue points q and all disks in DR(i) together. We are thus ~ interested in deciding 00 CO does some center of some disk in D#(i) lie in A(q,e)? qeB e€SB (q , j) which is a problem of deciding if there is any incidence between a set of O(n2) points and O(n2) triangles in the plane. This problem can be solved in O((n2)4/3 polylog n) = O(n8/3 polylog n) time using machinery for simplicial range searching [Cha10,Mat93]. Finding if there is a solution satisfying condition (c) can be done similarly to condition (b), as follows. For each half-plane H of H#(i), let v(H) be the vector normal to 5H that is included in H. We want to decide if the vector v(H) is contained inside any of the wedges IJeeSB(q j) A(q,e). Again, this can be solved using range searching techniques in O(n8/3 polylog n) time [Cha10, Mat93]. (This can actually be done faster because it is a 1-dimensional range query.) The solutions satisfying condition (d) can be obtained with symmetric algorithms. □ Theorem 2.5 The 2DD-problem can be .solved in O(n11/3 polylog n) time. CO Proof. We find values for i and j so that there exists a solution to the (i, j)-2DD-problem and i + j is maximized. In fact, we find all the Pareto optimal pairs (i,j) for which the (i, j)-2DD-problem has a feasible solution. We start with i = n — 2 and j = 2. If the (i, j)-2DD-problem has a solution, the value of j is incremented in one. Otherwise, the value of i is decremented by one. This process is continued until i = 1 or j = n — 1. During this process we keep the best feasible solution found to the (i,j)-2DD-problem. The time complexity is O(n11/3 polylog n) because the decision problem of Lemma 2.4 is invoked O(n) times. □ CD V ; 3 An approximation algorithm In this section we provide a (1 — ^-approximation algorithm to the 2DD-problem whose running time is roughly O(n4/3), for any constant e. The algorithm is randomized and successful with high probability. We assume henceforth that e < 1/4. We first consider an approximate version of the (i, j)-2DD-problem, in a precise sense that will be described below. The main idea for this step is using sampling to approximately count the number CO b ti u CD fc 00 CO of points in any disk. However, care has to be taken to restrict the search to feasible solutions in the original scenario. Thus, we have to work with the original sets of points and the sample sets together. Then, we run a search for the optimal pair (i,j), trying values that are exponentially increasing. We did not attempt to reduce the number of logarithmic factors or the dependency on e. 3.1 Random samples Given a finite set A and a parameter p € [0,1], a p-sample of A is a subset of A obtained by taking each element from A with probability p, independently. We will use notation like A to denote such random samples. It is natural to use p-1\D n A\ as an estimator for \D n A\. We first describe the properties of this estimator that we will use. Lemma 3.1 Let Q be set of n points in the plane and a be a given parameter. Let Q be a pa -sample of Q, where pa = min{1, ca-1 e-2 log n} and c> 0 is an appropriate constant. With probability at least 1 — 1/n3 we have 0 o 1 CO £ CO CO CO CD $H CD CO u a CD U cu (i) for all disks D that contain at least a/4 points of Q (1 — e/2)\D n Q\ < p-1 ■ \D n Q\ < (1 + e/2)\D n Q\. (ii) for all disks D containing at most a/4 points of Q p-1 ■ \D n qQ\ < a/2. Proof. This is a standard application of Chernoff bounds [MR96] using the fact that there are at most O(n3) different sets in the family {D n Q \ D is a disk}. Similar arguments are used in [AHP08, dBCHP09]. If pa = 1 then Q = Q and all inequalities hold. Thus, we consider the case pa = ca-1e-2 log n. For each point p € Q, let Xp be the indicator random variable, that takes value 1 if the point p is in the sample Q and value 0 otherwise. Thus, for a disk D we have \d n Q\ = £ Xp peDnQ whose expected value is PD := E [\D n Q\] = £ E [Xp] = \D n Q\ ■ pa. peDnQ Since the variables Xp are independent, we can use Chernoff bounds for \D n CQ\. We distinguish the two cases of the statement: For a fixed disk D with \ D n Q\ > a/4 we have Pr p-1 ■ \D n QQ\ —\D n Q\ > (e/2) ■ \D n Q\ Pr \D n Q\— PD > (e/2) ■ PD < e2) = e-Q(|DnQ|^pa^e2) _ e-Q(|DnQ|^ca-1 lo in) =n -n (c) • For a fixed disk D with |D n Q| < a/4 we consider a disk D' containing D and exactly a/4 points of Q. We can then use the previous case for D' and obtain, with probability at least 1 — n-n(c), 00 00 o 00 £ CO CO p-1 ■ ID n Q| < p-1 ■ |D' n Q| < (1 + e/2)|D' n Q| = (1 + e/2)a/4 < a/2 . Therefore, for a fixed disk, the inequalities of the lemma are true with probability at least 1 — n-n(c). We choose the constant c so that n-n(c) < n-6. Since there are at most n3 distinct sets D n Q over all disks D, the result for all disks follows from the union bound. □ 3.2 Decision problem We now consider the approach to approximately solve the (i, j)-2DD-problem for given values i, j. Throughout this section, we assume that i and j are fixed. We construct a p^-sample R of R and a pj-sample B of B, where pa = min{1, ca-1 e-2 log n} as defined in Lemma 3.1. The samples will be used to approximately count the points contained in a pair of disks. However, we cannot just discard the sets R and B, since they are important to restrict the search to feasible solutions in the original scenario. Thus, we have to work simultaneously with the sets R, B, R, and B. Note that i ■ pi = j ■ pj = ce 2 log n is the average "target" number of points in each disk, where c is the constant in Lemma 3.1. However, there can be some error in the counting because of using the random sample. We thus set k := (1 — e/2) ■ ce-2 log n = (1 — e/2) ■ i ■ pt = (1 — e/2) ■ j ■ pj as the relevant threshold for approximately counting. Consider the following problem: ApproxDecision-2DD-problem: find two disjoint disks Dr and Db subject to Dr n B = Db n R = 0, |Dr n > k, and |Db n B?| > k; or report that no such solution exists. Like before, to solve the ApproxDecision-2DD-problem it is enough to restrict the search to pairs of disks (Dr, Db) with |Dr n RR| = |Db n B3| = k. We will use the following relation between the (i, j)-2DD-problem and the ApproxDecision-2DD-problem. Lemma 3.2 Assume that there is a solution to the (i, j)-2DD-problem. With probability at least 1 — O(1/n3): CD (i) there is a .solution to the ApproxDecision-2DD-problem; (ii) any solution to the ApproxDecision-2DD-problem is a solution to the ((1 — e)i, (1 — e) j) -2DD-problem. Proof. Assume that there is a solution (D*,D£) to the (i, j)-2DD-problem. Because of Lemma 3.1 (i) we have, with probability at least 1 — O(1/n3), |d; n R| > pi ■ (1 — e/2) ■ |d; n R| > pi ■ (1 — e/2) ■ i = k. A similar argument shows that \D* n B\ > k, and we conclude that (D*, D*) is a solution to the ApproxDecision-2DD-problem. This finishes the proof of part (i). Consider now any feasible pair of disks (Dr , Db) that is not a solution to the ((1 — e)i, (1 — ^ e)j)-2DD-problem. We will then show that, with probability at least 1 — O(1/n3), the pair (Dr,Db) is not a solution to the ApproxDecision-2DD-problem. It follows that, with high probability, only solutions to the ((1 — e)i, (1 — e)j)-2DD-problem can be solutions to the ApproxDecision-2DD-problem. £ If (Dr ,Db) is not a solution tothe((1 — e)i, (1 — e)j)-2DD-problem, then \Dr n R\ < (1 — e)i or \Db n B \ < (1 — e)j. Let us assume the former case; the other case is similar. If \Dr n R\ > i/4, then Lemma 3.1 (i) tells that, with probability at least 1 — O(1/n3), \Dr n R\ < Pi ■ (1 + e/2) ■ \Dr n R\ < Pi ■ (1 + e/2) ■ (1 — e)i fc < Pi ■ (1 — e/2)i = k. i—l If \Dr n R\ < i/4, then Lemma 3.1 (ii) tells that, with probability at least 1 — O(1/n3), \Dr n R\ < Pi ■ i/2 < k. In either case, \Dr n R\ < k with high probability, and (Dr,Db) cannot be a solution to the ApproxDecision-2DD-problem. This finishes the proof of part (ii). □ To solve the ApproxDecision-2DD-problem we will use a discretization, analogous to Lemma 2.3. Let Dr be the family of the disks D with red interior such that 5D contains exactly three red points of R, or two red points of R and one blue point of B; and let Hr be the family ofjthe closed red half-planes H such that 5H contains two red points of R. Let the families DB and HB be defined analogously with respect to the points B. Lemma 3.3 If there exists a solution (Dr, Db) to the ApproxDecision-2DD-problem, then there exists another solution (D r, Db) with D'r n R C Dr n R and Db n B C Db n B, satisfying one of the next conditions: HH (a) Dr € Dr U Hr and Db € Db . (b) Dr € DR, £Db contains two blue points of B, and one blue point in 5Db is on the segment connecting the center of Dr to the the center of Db. (c) Dr € itR, £Db contains two blue points of B, and one blue point in 5Db is on the line perpendicular to 5Dr passing through the center of Db. CD (d) the cases symmetric to (a)-(c) by exchanging colors. CD Proof. The proof is similar to the proof of Lemma 2.3, but during the transformation we enforce that the red disk Dr keeps both Dr n B and the elements of R contained in the interior of Dr invariant, while the blue disk Db keeps both Db n R and the elements of B contained in the interior of Db invariant. □ a CD , Lemma 3.4 The ApproxDecision-2DD-problem can be solved in O((k2n+(kn)4/3) polylog n) time. CO c^ Proof. The approach is very similar to the algorithm in Lemma 2.4, and we will thus skim over some of the details. Let DR(k) be the set of disks of ©R containing exactly k red points of R, and let HR(k) be the set^of half-planes of HR containing exactly k red points of R. The centers of the disks of DR(k) are vertices of (k — 2)th-order or the (k — 1)th-order Voronoi Diagram of R U B, while the half-planes in H#(k) correspond to unbounded edges in the (k — 1)th-order Voronoi diagram. Since the kth-order Voronoi diagram has combinatorial complexity O(k(n — k)) = O(kn) in total [Aur91], the sets ©R(k) and HR(k) have at most O(kn) elements. Analogous sets ©B(k) and HB(k) can be defined with respect to the blue color, and they are related to the higher-order Voronoi diagrams of R U B. U ~ For every blue point q € B, let SB(q,k) be the set of straight-line segments s such that for each point o of s the disk^D(o, q) is blue, covers k blue points of B, and its boundary contains other blue point of B distinct from q. The set of^egments SB(q, k) is a subset of the edges in the (k — 1)th-order Voronoi diagram of R U B. 00 - The sets ©R(k) and Hr(k) can be obtained from the (k — 2)th- and the (k — 1)th-order Voronoi Diagrams of R U B. Since computing the kth-order Voronoi diagram of n points can be done in O(k2n log n) time [Lee82], the sets ©R(k) and HR(k) can be obtained in O(k2n log n) time. Similarly, using the (k — 2)th-and the (k — 1)th-order Voronoi diagrams of R U B we can construct the sets DB(k) and SB(q, k) for all q € B in O(k2n log n) time. fi ~ Let Vk denote the Furthest Disk Voronoi Diagram of ©B(k). We can compute Vk and preprocess it for point location queries in O((kn) log(kn)) = O(kn log n) time [Rap92, Sno97], so that any point location in Vk can be answered in O(log n) time. For each blue point q and each edge e of SB(q,k), consider the double wedge obtained by the union of lines that intersect q and e, and let A(q, e) be wedge that does not contain e. We now discuss how to decide if there exists a solution with the properties listed in CO Lemma 2.3 that covers k points of R and k points of B. Finding if there is a solution satisfying condition (a) reduces to point location queries in Vk with the centers of disks from ©R(k) and far enough points representing the "centers" of half-planes from HR(k). We spend here O(knlogn)time in total because we have to make O(|DR(k)| + |HR(k)|) = O(kn) point locations in V"k. CSI CSI CO CD ■ i J-H CD W Finding if there is a solution satisfying condition (b) reduces to deciding does some center of some disk in ©R(k) lie in A(q, e)? qgB eeSs (q , k) This is a problem of deciding if there is any incidence between a set of O(kn) points and a set of O(kn) triangles in the plane, which can be solved in O((kn)4/3 polylog(nk)) = O((kn)4/3 polylog n) time using machinery for simplicial range searching [Cha10, Mat93]. Finding if there is a solution satisfying condition (c) can be done similarly to condition (b). The solutions satisfying condition (d) can be obtained with symmetric algorithms. The algorithm spends O(k2n log n) + O((kn)4/3 polylog n) = O((k2n + (kn)4/3) polylog n) time in total. □ Lemma 3.5 Let co be a constant, and assume that i, j > en/co. Then the ApproxDecision-2DD-problem can be solved in O(k4e-3n polylog n) time with high probability. CD Proof. We reuse the notation and approach in the proof of Lemma 3.4. In this case, with cu high probability R and B have O((n/i)e 2 log n) = O(k/e) points. Since each disk of IDR(k) can contain only red points in its interior, the set IIR(k) contains O(|R|3) ^disks, which is bounded by O((k/e)3) with high probability. Similarly, IDB(k), HR(k), and HB(k) contain O((k/e)3) disks or half-planes with high probability. CO CSI All the steps in the proof of Lemma 3.4 take O(kn log n) time but for the step where we have to decide does some center of some disk in IDR(k) lie in A(q, e)? qeB eeSs (q,k) ti In this scenario, the problem is that of deciding if there is an incidence between a set of O((k/e)3) points and O(kn) triangles. This problem can be solved trivially in O(k4e-3n) time by checking each point against each triangle. The result follows. □ CD fc 00 CO £ CO CO CO We can summarize the results of this section with the following Lemma. Lemma 3.6 There is a randomized algorithm RandAlg that in time O(n4/3e-4 polylog n) returns a feasible solution and has the following property: if there is a .solution to the (i, j) -2DD-problem, with probability at least 1 — O(1/n3) it returns a solution to the ((1 — e)i, (1 — e)j)-2DD-problem. If co is a constant and i, j > en/co then the algorithm RandAlg takes O(e-11 n polylog n) time with high probability & Proof. We construct the p*-sample R or R and the pj-sample B of B, and set k = (1 — e/2)ce 2 log n. We then solve the ApproxDecision-2DD-problem using Lemma 3.4, which takes O((k2n + (kn)4/3) polylogn) = O(((e-2 logn)2n + n4/3(e-2 logn)4/3) polylogn) = O(n4/3e-4 polylog n) time. If we find a solution (Dr ) to the ApproxDecision-2DD-problem, then we return that solution. Otherwise, we return an arbitrary pair of feasible disks. The properties of the algorithm follow from Lemma 3.2. When i,j > en/co, we just use the algorithm from Lemma 3.5 instead of Lemma 3.4, which takes O(k4e-3n polylog n) = O(e-11n polylog n) time because k = O(e-2 log n). □ 3.3 Algorithm to approximate the 2DD-problem. Theorem 3.7 There is a randomized algorithm that computes a (1 — e)-approximation to the 2DD-problem in O(n4/3e-6 polylog n) time. The probability of error is at most O(1/n). ^ Proof. Consider the set C0 T = {[(1+ e)s] | s = 0,1,2,3,...,logi+£n} . Ji -1 Note that T has O(log1+e n) = O(e 1 log n) elements because e-1 ln n e-1 ln n ln(1 + e) 1/(1 + e) lim --= lim ---—--- = lim-= lim- = 1. e^o log1+e n e^o ln n/ ln(1 + e) e^o e e^o r 1 a For any integer i, define ^(i) = (1 + e)Llogl+£ %i , and note that ^(i) € T. We have that ^(i) = (1+ e) Llogi+£ ii < (1+ e)logi+£ * = [i] = i u 00 00 I 00 £ CO CO i 1 + t > —> (1 - e)i. - l + e ' and i—I n ^(i) = (1 + e)Llogi+- iJ > (1+ e)(logi+- i)-1 ^ We thus conclude that (1 - e)i < ^(i) < i. C^ We can now see that among the pairs (i, j) € T2 for which there exists a solution to the (i, j)-2DD-problem, one maximizing i + j is a (1 — ^-approximation to the 2DD-problem. Indeed, if (i* ,j*) is an optimal solution, then there exists a solution to the (0(i*),^(j*))-2DD-problem because ^(i*) < i and ^(j*) < j. Furthermore, a solution to the (0(i* ),^(j* ))-2DD-problem covers ^(i*) + ^(j *) > (1 — e)(i* + j *) points, and is thus a (1 — e)-approximation. Finally, note that (^(i*),^(j*)) € T2. For each pair (i,j) € T2 we use the algorithm RandAlg of Lemma 3.6 to find a feasible solution. Among all the returned solutions, we select the best one by counting the points covered. Let us assume first that there are no errors in all our calls to RandAlg in Lemma 3.6. When we call RandAlg for the value (^(i*),^(j*)), we obtain a solution to the ((1 — e)^(i*), (1 — e)^(j*))-2DD-problem, and thus a solution to the ((1 — e)2i*, (1 — e)2 j * )-2DD-problem. Thus, the best solution found among the O(|T |2) calls to RandAlg is a (1 — e)2-approximation to the optimal solution. Since (1 — e)2 > 1 — 2e, we can achieve a (1 — e')-approximation by taking e = e'/2. 1 To analyze the running time of the algorithm, note that it makes |T|2 = O(e-2 log2 n) calls to the algorithm RandAlg, and thus takes O(e-2 log2 n ■ (n4/3e-4 polylog n)) = O(n4/3e-6 polylog n) time. Selecting the best solution among the |T|2 feasible solutions can be done in O(|T|2n) = O(ne-2 polylogn) time. Since each call to RandAlg (Lemma 3.6) has a probability of error bounded by 1/n3 and |T2| = O(n2) it follows from the union bound that all outputs of RandAlg are correct2 with probability at least 1 — O(1/n). Thus, the algorithm finds a (1 — e)-approximation with probability at least 1 — O(1/n). □ For the special case where the optimal solution covers several points, we can obtain a (1 — e)-approximation in near-linear time. Corollary 3.8 Let c0 be a constant. If the optimal solution to the 2DD-problem covers at least n/c0 points, then we can compute a (1 — e)-approximation to the 2DD-problem in O(ne-13 polylog n) time. The probability of error is at most O(1/n) and the running time of the algorithm is with high probability. Proof. We use the same algorithm as in Theorem 3.7, but only restrict attention to values (i,j) € T2 with i,j > en/c0. Since we can use the second case in Lemma 3.6, each call to RandAlg takes O(e-11 n polylogn) time with high probability. We make |T|2 = O(e-2 log2 n) calls to the algorithm RandAlg, and thus spend a total of O(e-2 log2 n ■ e-11n polylog n) = O(e-13n polylog n) time. □ CD 1We could run a more careful search in T as it was done in Theorem 2.5. However, this would only decrease the running time by a factor of |T| = O(e-1 log n). Also, we could reduce the number of calls to RandAlg by computing first a simple 2-approximation algorithm: choose the best among the disks covering only blue points and the disks covering only red points. If the returned solution covers A points, only pairs (i,j) £ |T|2 with A < i + j < 2A are relevant. However, with the approach that we describe, we obtain (1 — ^-approximations for both i* and j*. This is convenient in several scenarios, for example, when maximizing min{|Db n B|Dr n R|}. 2In fact, the algorithm returns a (1 — e)-approximation unless the call to RandAlg with the value (^(i*),^(j*)) fails. However, with this approach we (1 — e)-approximate both i* and j*. 4 Conclusions O CSI Given a bicolored point set P = R U B, in this paper we have studied the problem of finding two disjoint disks Dr and Db such that Dr (resp. Db) covers only red (resp. blue) points and the number of elements of P contained in Dr U Db is maximized. We gave an exact algorithm running in O(n11/3 polylog n) time based on solving instances of the (i, j)-2DD-problem, which asks for the same disks Dr and Db subject to |Dr n R| > i and |Db n B| > j. The (i, j)-2DD-problem was solved in O(n8/3 polylogn) time for any given values of i and j. Using random samples of R and B to estimate |DnR| and |DnBwhere D is any disk, we presented a randomized (1 — ^-approximation algorithm running in O(n4/3e-6 polylog n) time, where the probability of error is O(1/n). A better, near-linear time algorithm was obtained when the solution covers a constant fraction of the input points. CD fc O It is worth noticing that if we want to solve the max-min version of the problem, that 00 is, to maximize the minimum between |Dr n R| and |Db n B|, then we can consider |Dr n R| = |DbnB|. Therefore, the problem reduces to finding a maximum value of i such that the (i, i)-2DD-problem has solution. This can be done in O(log n ■ n8/3 polylog n) = O(n8/3 polylog n) time using a binary search. Additionally, a (1 — ^-approximation algorithm can also be obtained. Indeed, we can run RandAlg of Lemma 3.6 only for the pairs (i, i) € T, where T is the set in the proof of Theorem 3.7. Since |T| = O(e-1 log n), a randomized (1 — ^-approximation algorithm running in O(e-1 log n ■ n4/3e-4 polylog n) = O(n4/3e-5 polylogn) time is obtained. o c^ References C^ [ABMS98] P. K. Agarwal, M. de Berg, J. Matousek, and O. Schwarzkopf. Constructing levels in arrangements and higher order voronoi diagrams. SIAM J. Comput., 27:654-667, 1998. [AHP08] B. Aronov and S. Har-Peled. On approximating the depth and related problems. SIAM J. Comput., 38:899-921, 2008. [Aur91] F. Aurenhammer. Voronoi diagrams - a survey of a fundamental geometric data structure. ACM Computing Surveys, 23(3):345-405, 1991. [BDBL+09] S. Bereg, J. M. Diaz-Banez, D. Lara, P. Perez-Lantero, C. Seara, and J. Urrutia. Bichromatic discrepancy via convex partitions. In XIII Encuentros de Geometrta Computational, pages 259-265, Zaragoza, Spain, 2009. [CDBPL+09] C. Cortes, J. M. Diaz-Banez, P. Perez-Lantero, C. Seara, J. Urrutia, and I. Ventura. Bichromatic separability with two boxes: A general approach. J. Algorithms, 64:7988, 2009. CD $H CD m [CDBS+08] S. Cabello, J. M. Diaz-Banez, C. Seara, J. A. Sellares, J. Urrutia, and I. Ventura. Covering point sets with two disjoint disks or squares. Comput. Geom. Theory Appl., 40:195-206, 2008. [Cha05] T. M. Chan. Low-dimensional linear programming with violations. SIAM J. Comput., 34:879-893, 2005. • I [Cha10] T. M. Chan. Optimal partition trees. In Proc. of the 26th Annual Symposium on a Computational Geometry, SoCG'10, pages 1-10, 2010. CD [CSY87] R. Cole, M. Sharir, and C. K. Yap. On k-hulls and related problems. SIAM J. Comput., 16:61-77, 1987. CO b u CD fc 00 CO 0 o 1 CO ^ CO CO [dBCHP09] M. de Berg, S. Cabello, and S. Har-Peled. Covering many or few points with unit disks. Theory Comput. Syst., 45(3):446-469, 2009. [DHS01] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, 2 edition, November 2001. [EHL+02] J. Eckstein, P. L. Hammer, Y. Liu, M. Nediak, and B. Simeone. The maximum box problem and its applications to data analysis. Comput. Optim. Appl., 23:285-298, 2002. [ERvK96] H. Everett, J-M. Robert, and M. van Kreveld. An optimal algorithm for the (< k)-levels, with applications to separation and transversal problems. Int. J. Comput. Geometry Appl., 6(3):247-261, 1996. [GO95] A. Gajentaan and M. H. Overmars. On a class of O(n2) problems in computational geometry. Comput. Geom., 5:165-185, 1995. [Hou93] M. E. Houle. Algorithms for weak and wide separation of sets. Discrete Applied Mathematics, 45(2):139-159, 1993. [HSM01] D. J. Hand, P. Smyth, and H. Mannila. Principles of data mining. MIT Press, Cambridge, MA, USA, 2001. [Lee82] D-T. Lee. On k-nearest neighbor voronoi diagrams in the plane. IEEE Trans. Computers, 31(6):478-487, 1982. [LN03] Y. Liu and M. Nediak. Planar case of the maximum box and related problems. In Proc. of the Canadian Conference on Computational Geometry, pages 11-13, August 2003. [Mat93] J. Matousek. Range searching with efficient hierarchical cuttings. Discrete and Computational Geometry, 10:157-182, 1993. [MR96] R. Motwani and P. Raghavan. Randomized algorithms. ACM Comput. Surv., 28:33- 37, 1996. [Rap92] D. Rappaport. A convex hull algorithm for discs, and applications. Comput. Geom. Theory Appl., 1:171-187, 1992. [Sno97] J. Snoeyink. Point location. In Jacob E. Goodman and Joseph O'Rourke, editors, Handbook of discrete and computational geometry, pages 559-574. CRC Press, Inc., Boca Raton, FL, USA, 1997. m CD $H CD m u a CD U