¿^creative , ars mathematica ^commons contemporánea Also available at http://amc-journal.eu ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.) ARS MATHEMATICA CONTEMPORANEA 10 (2016) 1-8 Forbidden configurations: Finding the number predicted by the Anstee-Sali conjecture is NP-hard Miguel Raggi * Centro de Ciencias Matemáticas and Escuela Nacional de Estudios Superiores Universidad Nacional Autónoma de México Received 31 January 2013, accepted 23 June 2014, published online 24 January 2015 Abstract Let F be a (possibly non-simple) hypergraph and let forb(m, F) denote the maximum number of edges a simple hypergraph with m vertices can have if it doesn't contain F as a subhypergraph. A conjecture of Anstee and Sali predicts the asymptotic behaviour of forb(m, F) for fixed F. In this paper we prove that even finding this predicted asymptotic behaviour is an NP-hard problem, meaning that if the Anstee-Sali conjecture were true, finding the asymptotics of forb(m, F) would be NP-hard. Keywords: Forbidden configuration, hypergraph, trace, NP-hard, NP-complete, Anstee-Sali conjecture. Math. Subj. Class.: 05D05, 68R01 1 Introduction This paper considers an extremal problem in hypergraph theory that results as a natural generalization of Turan's famous problem. Some of the most celebrated extremal results are those of Erdos, Stone and Simonovits ([11], [10]). They consider the following problem: Given m G N and a graph F, find the maximum number of edges in a graph on m vertices that avoids having a subgraph isomorphic to F. There are a number of ways to generalize this to hypergraphs. A k-uniform hypergraph is one in which each edge has size k. Some view k-uniform hypergraphs as the most natural generalization of a graph (a graph is a 2-uniform hypergraph) and one might also generalize * Research supported by DGAPA Postdoctoral scholarship, UNAM. E-mail address: mraggi@gmail.com (Miguel Raggi) ©® This work is licensed under http://creativecommons.org/licenses/by/3.0/ 2 Ars Math. Contemp. 10 (2016) 31-44 the forbidden subgraph problem to a forbidden k-uniform subhypergraph problem. There are both asymptotic and exact results (e.g. [9], [13], [12]). Forbidden Configurations is a different (but also natural) generalization that is studied mainly by Richard Anstee and his colaborators. We consider the following problem: Given m G N and a hypergraph F, find the maximum number of edges in a simple hypergraph (i.e. no repeated edges) H on m vertices that avoids having a subhypergraph isomorphic to F. Surveys on the topic can be found in [1] and [14]. We find it convenient to use the language of matrices to describe hypergraphs: Each column of a {0,1}-matrix can be viewed as an incidence vector on the set of rows. Definition 1.1. If a is a column and A a matrix, define A(a, A) to be the multiplicity of a in A. Define a matrix to be simple if it is a {0,1}-matrix with no repeated columns (that is, if A(a, A) < 1 for every column a). Note that an m x n simple matrix corresponds to a simple hypergraph (or set system) on m vertices with n distinct edges, where we allow the "empty edge". Definition 1.2. When A is a {0,1}-matrix, we denote by || A|| the number of columns in A (which is the cardinality of the associated set system). Definition 1.3. Let A and B be {0,1}-matrices with the same number of rows. Define the concatenation [A|B] to be the configuration that results from taking all columns of A together with all columns of B. For t G N, we define the product t ■ A := [ A | A | ••• | A ]. v-V-' t times Our objects of study are {0,1}-matrices with row and column order information stripped from them. Definition 1.4. Two {0,1}-matrices are said to be equivalent if one is a row and column permutation of the other. An equivalence class is called a configuration. Abusing notation, we will commonly use matrices and their corresponding configurations interchangeably. Definition 1.5. For a configuration F and a {0,1}-matrix A (or a configuration A), we say that F is a subconfiguration of A, and write F -< A if there is a representative of F which is a submatrix of A. We say A has no configuration F (or doesn't contain F as a configuration) if F is not a subconfiguration of A. Let Avoid(m, F) denote the set of all simple matrices on m-rows with no subconfiguration F. Our main extremal problem is to compute forb(m, F) = max{||A|| : A G Avoid(m, F)}. Perhaps some examples are useful: forb m m +1, since we can take all columns with at most one 1. M. Raggi: Forbidden configurations: Finding the number predicted by the Anstee-Sali... 3 forb m, = 2, since we may only take the column of 1's and the column of 0's (i.e. the empty set and the complete set). "1 1" forb 1 1 (?) + (?) + (?) > by taking all columns with at most two 1's. The proof that this is indeed the maximum is easy and can be found in [14]. Let Ac denote the {0,1}-complement of A (replace every 0 in A by a 1 and every 1 by a 0). Note that forb(m, F) = forb(m, Fc). Remark 1.6. Let F and G be configurations such that F -< G. Then forb(m, F) < forb(m, G). We say a column a has column sum t if it has exactly t ones. Let 0m denote the column with m rows, all of them zeros. Similarly, let 1m denote the column of m ones. For a set of rows S, we let A|S denote the submatrix of A given by restricting the rows of A to only those in S. An important general result due to Fiiredi applies to simple and to non-simple configurations. Theorem 1.7 (Z. Furedi). Let F be a given k-rowed {0,1}-matrix. Then forb(m, F) is in O(mk). We desire more accurate asymptotic bounds. Anstee and Sali conjectured that the best asymptotic bounds can be achieved with certain product constructions. Definition 1.8. Let A and B be {0,1}-matrices. We define the product A x B by taking each column of A and putting it on top of every column of B. Here is an example of a product: A Note that this is a well defined operation in configurations. We are interested in asymptotic bounds for forb(m, F). Let Im be the m x m identity matrix, be the {0,1}-complement of Im (all ones except for the diagonal) and let Tm be the tower matrix: a matrix corresponding to a maximum chain in the partially ordered set of the power set of the vertices. For example, A B 0 0 1 1 1 1 0 1 1" , B = 1 0" 0 0 0 0 1 1 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 1 T4 01111 0 0 111 00011 00001 Anstee and Sali conjectured that the asymptotically "best" constructions avoiding a single configuration would be products of I, Ic and T. 4 Ars Math. Contemp. 10 (2016) 31-44 Definition 1.9. Let F be a configuration. Let Pr(a, b, c) := Ir x ... x Ir x IC x ... x Z£ x 7T x ... x 7T, V-V-' "-v-' V-v-' a times b times c times Define X(F) to be the largest number such that there exist numbers a, b, c G N with a + b + c = X(F) such that for all r G N, F ^ Pr(a, b, c). Conjecture 1.10. [8] Let F be a configuration. Then forb(m, F) is in ©(mX(F)). Observe that X (F ) is always an integer. Also note that ||Pr (a, b, c)|| = ra+b •(r +1)c G ©(rX(F )), so by taking r = [m/X (F )] (and perhaps deleting a constant number of rows and therefore columns in case X(F) \ m), we have that ||Pr (a, b, c)|| G Q(mX(F)). So the fact that forb(m, F) G Q(mX(F) ) is built into the conjecture. Thus, in order to prove the conjecture, all that would be required would be to prove that forb(m, F) G O(mX(F)) for every F. A disproof could be potentially easier, as only a counterexample would be required. The conjecture has been proven for all k x I configurations F with k G {1,2,3} and many others cases in various papers. The proofs for k = 2 are in [4], for k = 3 in [4], [2], [8]. For I = 2, the conjecture was verified in [5]. For k = 4, all cases either when the conjecture predicts a cubic bound for F or when F is simple were completed in [3]. For k = 4 and F non-simple, there are three boundary cases with quadratic bounds, one of which is established in [6]. For k G {5,6} some results can be found in [7]. Anstee has long conjectured that even finding X(F) given F was not a trivial task, and the specific question of its NP-hardness was conjectured in [1], [14]. In this paper we settle this question: finding X is indeed NP-hard. We also note that one of the decision versions associated with this optimization problem is NP-complete, adding this function to the long list of functions known to be NP-complete, with the interesting plus that this function is conjectured to give the exponent of the asymptotic growth of forb. For relatively small configurations F we have a computer program that yields the answer (relatively) quickly. The source code (in C++) can be freely downloaded from: http://matmor.unam.mx/~mraggi/ This program can compute X (F ) for F having less than ~ 10 rows in just a few minutes. This task takes merely exponential time, not doubly exponential (as it is often the case with forbidden configuration problems). This program was written to perform many tasks other than finding X(F). A description of the algorithms used can be found in [14]. 2 Results There are two natural decision problems associated with X(F): Given F and k as inputs, 1. Is it true that X(F) < k? 2. Is it true that X(F) > k? We prove that the first of the two decision problems is in NP by exhibiting a certificate which can be checked in polynomial time. The main result of this paper is the following: M. Raggi: Forbidden configurations: Finding the number predicted by the Anstee-Sali... 5 Theorem 2.1. Finding X (F) is an NP-hardproblem. In other words, should a polynomial-time algorithm exist for finding X (F ) given F, then every problem in NP could be solved in polynomial time. Furthermore, the problem "given F and k, is X(F) < k?" is in NP. Before proving this theorem, we need a few results. Proposition 2.2. Let F be a configuration with n rows. Then X(F) < n. Proof. Indeed, assume for the sake of contradiction that a, b, and c are such that a + b+c = n +1 and F ^ Pr(a, b, c). Every configuration of Pr(a, b, c) contains [01], therefore any {0,1}-column can be formed with the first n configurations of Pr(a, b, c). The n + 1-th matrix ensures the columns of F with high multiplicity get repeated as many times as needed. □ This observation in particular implies that if a polynomial time algorithm existed for any of the two decision versions of the problem, then we'd have a polynomial time algorithm for finding X(F), which together with Theorem 2.1 would make the decision version of finding X(F) an NP-complete problem. A simple (but surprising) corollary of Conjecture 1.10, if it were true, would be that repeating columns more than twice in F has no effect on the asymptotic behavior of forb(m, F). In other words, assuming the conjecture were true, the multiplicity of a column in a configuration would not affect the asymptotic bound and, asymptotically, it would only matter if a column is not there (has multiplicity 0), appears once (has multiplicity 1), or appears "multiple times" (has multiplicity 2 or more). Formally, Proposition 2.3. Let Ft = [G|t • H] with G and H simple {0,1}-matrices that have no columns in common. Then X(F2) = X(Ft) for all t > 2. In particular, if the conjecture were true, then forb(m, Ft) and forb(m, F2) would have the same asymptotic behavior. Proof. It suffices to show that given t, G, H, a, b and c, there exists an R such that for every r > R, we have F2 = [G|2 • H] ^ Pr(a, b, c) ^ Ft = [G|t • H] ^ Pr(a,b,c). Since F2 -< Ft, we only need to prove that if F2 -< Pr (a, b, c) for some r, then Ft -< PR (a, b, c) for some R. Suppose then F2 is contained in the product Pr (a, b, c) for some r. The idea is to find a subconfiguration of Pr (a, b, c) in which there are some columns with multiplicity 1, and for the columns with multiplicity 2 or more, the multiplicity depends on r, and goes to infinity as r goes to infinity. Then we may take r to be large enough so that the multiplicity of any one column (with multiplicity of 2 or more) is larger than t. Let x be the number of rows of Ft. Notice the following three facts, which include definitions for Ex, EXc and Ej. Ei(x, r) := [(r - x) • 0X | lx] -< I, Eic (x, r) := [(r - x) • 1x | IX] ^ I ET(x,r) := fx Tr . The first and second facts are easy to see; just take any subset of x rows from Ir or The third statement is true by taking the |_r/xj -th row of Tr, the 2 |_r/xj -th row of Tr, etc., 6 Ars Math. Contemp. 10 (2016) 31-44 up to the x|_r/xj -th row. For example, if r = 5 and x = 2, we may take the second and fourth row from T5: 75 = 0 1111 0 0 111 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 T l{2,4} — 001111 0 0 0 0 1 1 — Et (2,5) Note that in the three configurations Ex(x, r), EXc(x, r) and E—(x, r), we have that there are some columns of multiplicity 1 and there are some columns for which their multiplicity can be made as large as we wish by making r large. Formally, let E(x, r) be one of Ex(x, r) or EXc (x, r) or E— (x, r). We have that for every x-rowed column a there are three possibilities: either A(a, E(x, r)) = 0 for all r, or A(a, E(x, r)) = 1 for all r, or lim A(a, E(x, r)) = to. r^w If a is a column for which lim A(a, E(x, r)) = to, we may conclude that there is an R for which A(a, E(x, r)) > t for every r > R. Since F2 is contained in Pr (a, b, c) for some r, the columns in H will have multiplicity at least 2 in some subset of the rows of Pr (a, b, c). We see that Ft is also a subconfiguration of Pfl(a,b,c). □ 3 Proof of the Main Theorem We now prove the main theorem 2.1. Proof. First we prove that the decision problem has a certificate which can be checked in polynomial time. A certificate that indeed X(F) < k would have to be a proof that F -< Pr (a, b, c) for each triple a, 6, c for which a + b + c — k. Note that there are at most a quadratic (with respect to the number of rows) number of a, b, c's which satisfy the equation, since the question has a trivial "yes" answer when k is more than the number of rows (Proposition 2.2). Given F and A configurations, one can easily construct a certificate that a configuration F is indeed a subconfiguration of a configuration A: explicitly state which permutation of F appears in exactly which rows and columns of A. For the case A — Pr (a, b, c), a certificate only needs to specify which rows of F go inside which factors, so at most a quadratic number of these certificates-of-being-a-subconfiguration suffice. We now prove that finding X(F) is NP-hard. Suppose there existed some polynomial-time algorithm that finds X (F) given F. We shall prove that there would then exist a polynomial time algorithm for GRAPH COLORING. Suppose we are given a graph G and we wish to find the minimum number of colors for which there exists a good coloring of the graph. We may assume that no isolated vertices exist. The idea is to construct a 3-part matrix F (G) in which the first two parts ensure there is no T or Ic in a maximum product of the form Pr (a, b, c) with no subconfiguration F (G), and the last part is constructed so that a partition into I 's produces a partition of the vertices of the graph into independent sets and vice-versa. Suppose G has n vertices and e edges. Let M be a large number with M > n + 2 and let S be the incidence matrix of G (i.e., the edges of G are encoded as columns with two M. Raggi: Forbidden configurations: Finding the number predicted by the Anstee-Sali... 7 1's corresponding to the vertices that belong to the edge). Construct the following simple matrix: 1 1M I'M F (G) := 1 Tm S 0 Clearly we can construct F(G) in polynomial time (with respect to the number of vertices of G). We prove now that we have x(G) = X(F) - 2M + 1, which in turn would yield a polynomial time algorithm for GRAPH COLORING, provided we had a polynomial time algorithm for finding X(F). Now, let us study the possibilities for a product of type Pr (a, b, c) that does not have F(G) as a subconfiguration for any r. If b = 0, then we could place all of [11] in the 1c part of Pr (a, b, c), so a + b + c would be at most 1 + M + n (using Proposition 2.2). The same conclusion holds when c = 0. But if we let b = c = 0, Pr (a, 0,0) is just a product of 1's, so let us calculate how many 1's we can multiply together and still not create a subconfiguration F (G). In order for F(G) to be a part of a product of 1's, every row of [1] and [1 |7M] must be in a separate factor 1, since there is no 11 in 1 (and also separate from the rows of [S|0], since we are assuming G has no isolated vertices). Then two rows of the [S|0] part can be in the same 1 if and only if there is no in those two rows, which, in terms of the graph, means there is no edge between those two vertices. On the other hand, a product of x(G) identity matrices contains the incidence matrix of a complete x(G)-multipartite graph. In other words, partitioning [S|0] into 1 's is equivalent to partitioning the vertices of G into independent sets. So if the graph G cannot be colored with x(G) - 1 colors and this is the maximum, this means that X( F) = a = 2M + x(G) - 1 > n + M + 1. Then x(G) = X(F(G)) - 2M + 1. □ Acknowledgements The author would like to thank the anonymous referees for their invaluable comments that doubtlessly improved the presentation of the paper. References [1] R. Anstee, A survey of forbidden configuration results, http://www.combinatorics. org/ojs/index.php/eljc/article/view/DS2 0. [2] R. Anstee, R. Ferguson and A. Sali, Small forbidden configurations II, Electronic Journal of Combinatorics 8 (2001), R4 25pp. [3] R. Anstee and B. Fleming, Two refinements of the bound of Sauer, Perles and Shelah and Vapnik and Chervonenkis, Discrete Mathematics 310 (2010), 3318-3323. [4] R. Anstee, J. Griggs and A. Sali, Small forbidden configurations, Graphs and Combinatorics 13 (1997), 97-118. [5] R. Anstee and P. Keevash, Pairwise intersections and forbidden configurations, European Journal of Combinatorics 27 (2006), 1235-1248. [6] R. Anstee, M. Raggi and A. Sali, Evidence for a forbidden configuration conjecture: One more case solved, Discrete Mathematics 312 (2012), 2720 - 2729, doi:10.1016/j.disc.2012.02.020. [7] R. Anstee, M. Raggi and A. Sali, Forbidden configurations: Boundary cases, European Journal of Combinatorics (2014), 51. 8 Ars Math. Contemp. 10 (2016) 31-44 [8] R. Anstee and A. Sali, Small forbidden configurations IV, Combinatorica 25 (2005), 503-518. [9] D. de Caen and Z. FUredi, The maximum size of 3-uniform hypergraphs not containing a Fano plane, Journal of Combinatorial Theory, Series B 78 (2000), 274-276. [10] P. ErdCs and M. Simonovits, A limit theorem in graph theory., Studia Scientiarum Mathemati-carum Hungarica 1 (1966), 51-57. [11] P. Erdos and A. Stone, On the structure of linear graphs, Bulletin of the American Mathematical Society 52 (1946), 1089-1091. [12] Z. Furedi, Turan type problems, Surveys in Combinatorics (Proc. of the 13th British Combinatorial Conference), ed. A.D. Keedwell, Cambridge Univ. Press. London Math. Soc. Lecture Note Series 166 (1991), 253-300. [13] O. Pikhurko, An exact bound for Turan result for the generalized triangle, Combinatorica 28 (2008), 187-208. [14] M. Raggi, Forbidden Configurations, Ph.D. thesis, University of British Columbia, 2011.