Metodoloˇski zvezki, Vol. 1, No. 2, 2004, 407-418 Relinking Marriages in Genealogies Andrej Mrvar1 and Vladimir Batagelj2 Abstract Genealogies can be represented as graphs in different ways: as Ore graphs, as p-graphs, or as bipartite p-graphs. p-graphs are usually more suitable for analyses. Some approaches to analysis of large genealogies implemented in program Pajek are presented and illustrated with analysis of some large ge-nealogies. 1 Sources of genealogies People collect genealogical data for several different reasons/purposes: • Research on different cultures in history, sociology and anthropology (White et al., 1999), where kinship is taken as a fundamental social relation. • Genealogies of families and/or territorial units, e.g., – Mormons genealogy (MyFamily.com, 2004) ˇ – genealogy of Skofja Loka district (Hawlina, 2004) – genealogy of American presidents (Tompsett, 1993) • Special genealogies – Students and their PHD thesis advisors: Theoretical Computer Science Genealogy (Johnson and Parberry, 1993) – gods (antique). See Hawlina (2004). There also exist many programs for genealogical data entry and maintenance (GIM, Brother’s Keeper, Family Tree Maker,...), but only few analyses can be done using the programs. We use Pajek for analyses and visualization of ge-nealogies. Faculty of Social Sciences, University of Ljubljana, Slovenia; andrej.mrvar@uni-lj.si Faculty of Mathematics and Physics, Univ. of Ljubljana, Slovenia; vladimir.batagelj@uni-lj.si 408 Andrej Mrvar and Vladimir Batagelj 2 GEDCOM standard GEDCOM is a standard for storing genealogical data, which is used to interchange and combine data from different programs, which were used for entering the data. The following lines are extracted from the GEDCOM file of European Royal families. 0 HEAD 0 @I115@ INDI 1 FILE ROYALS.GED 1 NAME William Arthur Philip/Windsor/ 1 TITL Prince 0 @I58@ INDI 1 SEX M 1 NAME Charles Philip Arthur/Windsor/ 1 BIRT 1 TITL Prince 2 DATE 21 JUN 1982 1 SEX M 2 PLAC St.Mary’s Hospital, Paddington 1 BIRT 1 CHR 2 DATE 14 NOV 1948 2 DATE 4 AUG 1982 2 PLAC Buckingham Palace, London 2 PLAC Music Room, Buckingham Palace 1 CHR 1 FAMC @F16@ 2 DATE 15 DEC 1948 ... 2 PLAC Buckingham Palace, Music Room 0 @I116@ INDI 1 FAMS @F16@ 1 NAME Henry Charles Albert/Windsor/ 1 FAMC @F14@ 1 TITL Prince ... 1 SEX M ... 1 BIRT 0 @I65@ INDI 2 DATE 15 SEP 1984 1 NAME Diana Frances /Spencer/ 2 PLAC St.Mary’s Hosp., Paddington 1 TITL Lady 1 FAMC @F16@ 1 SEX F 1 BIRT 0 @F16@ FAM 2 DATE 1 JUL 1961 1 HUSB @I58@ 2 PLAC Park House, Sandringham 1 WIFE @I65@ 1 CHR 1 CHIL @I115@ 2 PLAC Sandringham, Church 1 CHIL @I116@ 1 FAMS @F16@ 1 DIV N 1 FAMC @F78@ 1 MARR ... 2 DATE 29 JUL 1981 2 PLAC St.Paul’s Cathedral, London From data represented in the described way we can generate several graphs as explained in next chapters. 3 Representation of genealogies using networks Genealogies can be represented as networks in different ways: as Ore-graph, as p-graph, and as bipartite p-graph. 3.1 Ore-graph In an Ore graph of genealogy every person is represented by a vertex, marriages are represented with edges and relation is a parent of is represented as arcs pointing from each of the parents to their children. See Figure 1. Relinking Marriages in Genealogies 409 Figure 1: Ore graph. grandfather-f & grandmother-f grandfather-m & grandmother-m Figure 2: p-graph. 3.2 p-graph In a p-graph vertices represent individuals or couples. In case that person is not married yet (s)he is represented by a vertex, otherwise the person is represented with the partner in a common vertex. There are only arcs in p-graphs – they point from children to their parents (Figure 2). The solid arcs represent the relation is a son of and the dotted arcs represent relation is a daughter of. 410 Andrej Mrvar and Vladimir Batagelj (^__jsister-in-law sister .son-in-law father & stepmother \ ^stepmother V,___/grandmother-f Figure 3: Bipartite p-graph. grandmother-m 3.3 Bipartite p-graph A bipartite p-graph has two kinds of vertices – vertices representing couples (rectan-gles) and vertices representing individuals (circles for women and triangles for men) – therefore each married person is involved in two kinds of vertices (or even more if he/she is involved in multiple marriages). Arcs again point from children to their parents (see Figure 3). 3.4 Comparison of different presentations p-graphs and bipartite p-graphs have many advantages (see White et al., 1999): • there are less vertices and lines in p-graphs than in corresponding Ore graphs; • p-graphs are directed, acyclic networks; • every semi-cycle of the p-graph corresponds to a relinking marriage. There exist two types of relinking marriages: – blood marriage: e.g., marriage among brother and sister. – non-blood marriage: e.g., two brothers marry two sisters from another family. • p-graphs are more suitable for analyses. Relinking Marriages in Genealogies 411 Bipartite p-graphs have an additional advantage: we can distinguish between a married uncle and a remarriage of a father (see Figures 2 and 3). This property enables us, for example, to find marriages between half-brothers and half-sisters. 4 Genealogies are sparse networks We will call genealogy regular if every person in it has at most two parents. Genealogies are sparse networks - number of lines is of the same order as the number of vertices. In this section some approximations and bounds on the number of lines in different kinds of regular genealogies are given. For the directed part of an regular Ore genealogy the approximation of the number of arcs A is: |A| = ^ din(v) ?2|V| veV where V is set of vertices, and din(v) input degree of vertex v, din(v) ? 2. Most of the persons are married only once, some are not married. For the undirected part of an Ore genealogy the number of edges (E) is |E| ? -|V| Therefore |L| = |A| + |E| ? 5 |V| p-graphs are almost trees - deviations from trees are caused by relinking marriages. Let us denote the number of vertices of p-graph with |Vp| and the number of multiple marriages with nmult. Then, since |E| equals to the number of couples, |Vp| = |V| - |E| + nmult and therefore |V| ? |Vp| ? |V|-|E| ? -|V| The number of arcs in p-graph is |Ap| = ]T dout(v) ? 2|Vp| veVp where dout(v) is output degree of vertex v. For the number of vertices Vb in a bipartite p-graph, we have |Vb| = |V| + |E| Since |E| ? ±|V| we get |V| ? |Vb| ? 412 Andrej Mrvar and Vladimir Batagelj Table 1: Number of vertices and number of lines in Ore graphs and p-graphs for some large networks. data \V\ \E\ \A\ \V\ Vi nmult 1Vp1 \Ap\ \Ap\ \Vp\ Drame Hawlina Marcus Mazol President Royale 29606 7405 702 2532 2145 17774 8256 2406 215 856 978 7382 41814 9908 919 3347 2223 25822 1.69 1.66 1.62 1.66 1.49 1.87 13937 2808 292 894 282 4441 843 215 20 74 93 1431 22193 5214 507 1750 1260 11823 21862 5306 496 1794 1222 15063 0.99 1.02 0.98 1.03 0.97 1.27 Loka Silba Ragusa Tur Royal92 47956 6427 5999 1269 3010 14154 2217 2002 407 1138 68052 9627 9315 1987 3724 1.71 1.84 1.89 1.89 1.62 21074 2263 2347 549 1003 1426 270 379 94 269 35228 4480 4376 956 2141 36192 5281 5336 1114 2259 1.03 1.18 1.22 1.17 1.06 For the number of arcs Ab we have \Ab\ = \Ap\ - nmult + 2\E\ < 2(\Vp\ + \E\) - nmult = 2\V\ + nmult To check the results we take several large genealogies and look at the corresponding Ore and p-graphs. A comparison of Ore and p-graph is given in Table 1. In the table the following notation is used: • Ore genealogy: \V\ - number of vertices; \E\ - number of edges; \A\ - number of arcs; \L\ = \E\ + \A\ - total number of lines. • p-graph: \Vi\ - number of individuals; nmult - number of multiple marriages; 1Vp1 = 1Vi1 + 1E1- total number of vertices; \Ap\ - number of arcs. p-graphs are usually used also for visual representation of genealogies. Since they are acyclic graphs the vertices can be assigned to levels (see Figure 4 and Figure 5). 5 Relinking index The relinking index is a measure of relinking by marriages among persons belonging to the same families. A special case of relinking is a blood-marriage in which the man and woman from the couple have a common ancestor. Let n denotes number of vertices in p-graph, m number of arcs, k number of weakly connected components, and M number of maximal vertices (vertices having output degree 0, M > 1). If a p-graph is a forest (consists of trees), then m = n-k, or k + m-n = 0. In a regular genealogy, m < 2(n - M) = 2n - 2M. Thus: 0