ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.)
ARS MATHEMATICA CONTEMPORANEA 22 (2022) #P1.01 / 1–22
https://doi.org/10.26493/1855-3974.2467.497 (Also available at http://amc-journal.eu)

Modifications of the Floyd-Warshall algorithm with nearly quadratic expected-time*

Andrej Brodnik †
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia
University of Ljubljana, UL FRI, Večna pot 113, 1000 Ljubljana, Slovenia

Marko Grgurovič
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia

Rok Požar ‡
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia
IMFM, Jadranska 19, 1000 Ljubljana, Slovenia

Received 25 October 2020, accepted 10 April 2021, published online 25 November 2021

Abstract

The paper describes two relatively simple modifications of the well-known Floyd-Warshall algorithm for computing all-pairs shortest paths. The fundamental difference of both modifications in comparison to the Floyd-Warshall algorithm is that the relaxation is performed in a smarter way. We show that the expected-case time complexity of both algorithms is O(n2 log2 n) for the class of complete directed graphs on n vertices with arc weights selected independently at random from the uniform distribution on [0, 1]. The theoretically best known algorithms for this class of graphs are all based on Dijkstra's algorithm and attain a better expected-case bound. However, an empirical evaluation shows that our algorithms are at least competitive in practice with the best known algorithms and, moreover, outperform most of them. The reason for the practical efficiency of the presented algorithms is that they use no priority queue.
*A preliminary version of this work has been published in Shortest Path Solvers: From Software to Wetware, volume 32 of Emergence, Complexity and Computation (2018). The authors would like to thank the reviewer for excellent comments that substantially improved the quality of the paper.
†This work is sponsored in part by the Slovenian Research Agency (research program P2-0359 and research projects J1-2481, J2-2504, and N2-0171).
‡Corresponding author. This work is supported in part by the Slovenian Research Agency (research program P1-0285 and research projects N1-0062, J1-9110, J1-9187, J1-1694, N1-0159, J1-2451).
This work is licensed under https://creativecommons.org/licenses/by/4.0/

Keywords: All-pairs shortest paths, probabilistic analysis.
Math. Subj. Class. (2020): 05C85, 68W40

1 Introduction

Finding shortest paths in graphs is a classic problem in algorithmic graph theory. Given a (directed) graph in which arcs are assigned weights, a shortest path between a pair of vertices is a path that minimizes the sum of the weights of its constituent arcs. The problem also arises frequently in practice, in areas such as bioinformatics, logistics, and VLSI design (for a more comprehensive list of applications see e.g. [2]). Two of the most common variants of the problem are the single-source shortest path problem and the all-pairs shortest path problem (APSP). In the first variant, we search for paths from a fixed vertex to every other vertex, while the APSP asks for a shortest path between every pair of vertices. In this paper we focus exclusively on the APSP variant of the problem.

In general, the APSP can be solved by using the technique of relaxation. The relaxation consists of testing whether we can improve the weight of the shortest path from u to v found so far by going via w, and updating it if necessary.
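In code, this relaxation test can be sketched as follows (an illustrative Python fragment; the dense matrix W and the index convention are ours, not the paper's):

```python
INF = float("inf")

def relax(W, i, k, j):
    """Try to improve the current i->j distance by routing through k.

    W is a dense distance matrix; W[i][j] holds the best weight of an
    i->j path found so far.  Returns True when the relaxation succeeds.
    """
    if W[i][k] + W[k][j] < W[i][j]:
        W[i][j] = W[i][k] + W[k][j]
        return True
    return False

# Toy 3-vertex example: going 0 -> 1 -> 2 is cheaper than the direct arc.
W = [[0, 1, 5],
     [INF, 0, 2],
     [INF, INF, 0]]
relax(W, 0, 1, 2)  # W[0][2] becomes 3
```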
In fact, the number of attempts to perform relaxation corresponds to the time complexity under the RAM model. A trivial textbook relaxation-based solution to the APSP is the dynamic programming Floyd-Warshall algorithm [11], running in O(n3) time on graphs with n vertices. Dijkstra's algorithm [10], which solves the single-source shortest path problem, is also relaxation-based. However, since the order in which the relaxations are performed is greedy, it uses a priority queue as an additional data structure. Obviously, we can solve the APSP by running Dijkstra's algorithm from each vertex of the graph, obtaining an O(mn log n) solution, where m is the number of arcs in the graph, provided we use the binary heap implementation of the priority queue. This is an improvement over the Floyd-Warshall solution for sparse graphs. Asymptotically, we get an even better solution by using Fibonacci heaps instead of binary heaps, yielding O(n2 log n + mn) time complexity. We refer to such approaches, which inherently use some kind of priority queue implementation, as Dijkstra-like.

However, the described solutions to the APSP using Dijkstra's algorithm have at least two limitations. The first one is that all arc weights have to be non-negative. This can be alleviated by using Johnson's approach [15], which reweighs all arcs making their weights non-negative. On such a graph we can then run Dijkstra's algorithm. The second limitation is related to the efficiency of the implementation. Namely, due to the computer architecture, efficient implementations must exploit data locality; i.e. in consecutive memory accesses they try to access memory locations that are "close together". A similar observation is made in [5] for the solutions to the single-source shortest path problem, where the authors show that a Fibonacci heap, although an asymptotically better implementation of a priority queue, in practice underperforms a simple binary heap.
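The Dijkstra-per-source approach described above can be sketched as follows (an illustrative Python fragment using the standard heapq binary heap; the adjacency-dict representation is our own assumption, not from the paper):

```python
import heapq

def apsp_dijkstra(adj):
    """All-pairs shortest paths by running Dijkstra from every vertex.

    adj maps a vertex to a list of (neighbour, weight) pairs; all
    weights must be non-negative.  With a binary heap this runs in
    O(mn log n) overall.  Returns a dict of per-source distance dicts.
    """
    def single_source(src):
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry, already improved
            for v, w in adj.get(u, []):
                nd = d + w  # the relaxation step
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    return {u: single_source(u) for u in adj}

adj = {0: [(1, 1.0), (2, 5.0)], 1: [(2, 2.0)], 2: []}
dist = apsp_dijkstra(adj)  # dist[0][2] == 3.0 via the arc through 1
```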
For dense graphs, a slightly better worst-case running time of O(n3 log log n/ log2 n) over the O(n3)-time Floyd-Warshall algorithm can be achieved by using an efficient matrix multiplication technique [13]. For sparse graphs on n vertices and with m non-negatively weighted arcs, the fastest known solution [20] runs in time O(mn + n2 log log n).

Considering the expected-case running time of APSP algorithms, we can find in the literature a number of good solutions assuming that input instances are generated according to a probability model on the set of complete directed graphs with arc weights. In the uniform model, arc weights are drawn at random, independently of each other, according to a common probability distribution. A more general model is the endpoint-independent model [3, 24], where, for each vertex v, a sequence of n − 1 non-negative arc weights is generated by a deterministic or stochastic process and then randomly permuted and assigned to the outgoing arcs of v. In the vertex potential model [5, 6], arc weights can be both positive and negative. This is a probability model with arbitrary real arc weights but without negative cycles.

In the uniform model with arc weights drawn from the uniform distribution on [0, 1], O(n2 log n) expected time complexity algorithms for solving the APSP were presented by Karger et al. [16] and Demetrescu and Italiano [8, 9], where the latter was inspired by the former. Another algorithm with the same expected time complexity was presented by Brodnik and Grgurovič [4]. Peres et al. [19] improved the Demetrescu and Italiano algorithm to the theoretically optimal O(n2) by replacing the priority queue implementation with a more involved data structure.

E-mail addresses: andrej.brodnik@upr.si (Andrej Brodnik), marko.grgurovic@famnit.upr.si (Marko Grgurovič), rok.pozar@upr.si (Rok Požar)
In the endpoint-independent model, Spira [24] proved an expected-case time bound of O(n2 log2 n), which was improved by several authors. Takaoka and Moffat [25] improved the bound to O(n2 log n log log n). Bloniarz [3] described an algorithm with expected-case running time O(n2 log n log∗ n). Finally, Moffat and Takaoka [18] and Mehlhorn and Priebe [17] improved the running time to O(n2 log n). In the vertex potential model, Cooper et al. [6] gave an algorithm with an expected-case running time of O(n2 log n). All of the above algorithms use a Dijkstra-like approach.

In this paper, we present two modifications of the Floyd-Warshall algorithm, which we name the Tree algorithm and the Hourglass algorithm. A fundamental difference of both modifications in relation to the Floyd-Warshall algorithm is a smarter way of performing the relaxations. This is done by introducing a tree structure that allows us to skip relaxations that do not contribute to the result. The worst-case time complexity of both algorithms remains O(n3); however, in the analysis we show that their expected running time is substantially better. To simplify the analysis, we consider the uniform model, which gives us the following main result.

Theorem 1.1. For complete directed graphs on n vertices with arc weights selected independently at random from the uniform distribution on [0, 1], the Tree algorithm and the Hourglass algorithm both have an expected-case running time of O(n2 log2 n).

The proof of our main result relies on the following well-known properties of the complete directed graph on n vertices with uniformly distributed arc weights on [0, 1]. First, the maximum weight of a shortest path in such a graph is O(log n/n) with high probability; second, the longest shortest path has O(log n) arcs with high probability; and third, the maximum outdegree of the subgraph consisting of all arcs that are shortest paths is O(log n) with high probability.
These properties, together with the observation that if the relaxation on some vertex of the introduced tree structure fails, we can skip relaxations on the entire subtree defined by this vertex (see Lemma 3.1), give the desired result. Since the theoretically best expected-case APSP algorithms are based on Dijkstra's algorithm, it is interesting that a competitive approach can also be obtained by a modification of the Floyd-Warshall algorithm.

To demonstrate the practical competitiveness of our algorithms, we supplement the theoretical analysis with an empirical evaluation. It should be pointed out that all of the algorithms mentioned above with o(n2 log2 n) expected-case running time have a better theoretical bound than ours. Moreover, Brodnik and Grgurovič in [4] show, for the same family of graphs as studied in this paper, the practical superiority of their algorithm over the algorithms of Karger et al. [16] and Demetrescu and Italiano [8, 9], and consequently over the algorithm of Peres et al. [19], since the latter's improvement of the Demetrescu and Italiano solution does not improve the practical efficiency of the original algorithm. Therefore, in the practical evaluation of the Tree and Hourglass algorithms we compare them to the algorithm of Brodnik and Grgurovič [4] only. The reason for the practical efficiency of the presented algorithms is that they use no priority queue. Indeed, the Tree and Hourglass algorithms are simple to implement and use only simple structures such as vectors and arrays, which also exhibit high data locality.

The structure of the paper is the following. Section 2 contains the necessary notation and basic definitions to make the paper self-contained. In Section 3 we describe the Tree and Hourglass algorithms. Properties of certain shortest paths in complete graphs with independently and uniformly distributed arc weights are analyzed in Section 4.
The proof of the main result is presented in Section 5, while Section 6 contains empirical evaluation of the algorithms. In Section 7 we give some concluding remarks and open problems. 2 Preliminaries All logarithms are base e unless explicitly stated otherwise. The model of computation used in algorithm design and analysis is the comparison-addition model, where the only allowed operations on arc weights are comparisons and additions. A digraph (or directed graph) G is a pair (V,A), where V is a non-empty finite set of vertices and A ⊆ V × V a set of arcs. We assume V = {v1, v2, . . . , vn} for some n. The two vertices joined by an arc are called its endvertices. For an arc (u, v) ∈ A, we say that u is its tail. The outdegree of v ∈ V , is the number of arcs in A that have v as their tail. The maximum outdegree in G is denoted by ∆(G). A digraph G′ = (V ′, A′) is a subdigraph of the digraph G = (V,A) if V ′ ⊆ V and A′ ⊆ A. The (vertex-)induced subdigraph with the vertex set S ⊆ V , denoted by G[S], is the subgraph (S,C) of G, where C contains all arcs a ∈ A that have both endvertices in S, that is, C = A ∩ (S × S). The (arc-)induced subdigraph with the arc set B ⊆ A, denoted by G[B], is the subgraph (T,B) of G, where T is the set of all those vertices in V that are endvertices of at least one arc in B. A path P in a digraph G from vP,0 to vP,m is a finite sequence P = vP,0, vP,1, . . . , vP,m of pairwise distinct vertices such that (vP,i, vP,i+1) is an arc of G, for i = 0, 1, . . . ,m− 1. The length of a path P , denoted by |P |, is the number of vertices occurring on P . Any vertex of P other than vP,0 or vP,m is an intermediate vertex. A k-path is a path in which all intermediate vertices belong to the subset {v1, v2, . . . , vk} of vertices for some k. Obviously, a 0-path has no intermediate vertices. A weighted digraph is a digraph G = (V,A) with a weight function w : A → R that assigns each arc a ∈ A a weight w(a). 
A weight function w can be extended to a path P by w(P ) = ∑m−1 i=0 w(vP,i, vP,i+1). A shortest path from u to v, denoted by u ⇝ v, is a path in G whose weight is infimum among all paths from u to v. The distance between two vertices u and v, denoted by DG(u, v), is the weight of a shortest path u ⇝ v in G. A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 5 The diameter of G is maxu,v∈V DG(u, v), that is, the maximum distance between any two vertices in G. Given a subset S ⊆ V , the distance between S and a vertex v in G, denoted by DG(S, v), is DG(S, v) = minu∈S DG(u, v). A shortest k-path from u to v is denoted by u k⇝v. Further, we denote the set of arcs that are shortest k-paths in G by A(k) and the subdigraph G[A(k)] by G(k). Finally, we will need some tools from combinatorics. In the balls-into-bins process M balls are thrown uniformly and independently into N bins. The maximum number of balls in any bin is called the maximum load. Let Li denote the load of bin i, i ∈ {1, 2, . . . , N}. The next lemma, used in Subsection 4.3, provides an upper bound on the maximum load. It is a simplified version of a standard result, c.f. [23], tailored to our present needs. For completeness we provide the proof. Lemma 2.1. If M balls are thrown into N bins where each ball is thrown into a bin chosen uniformly at random, then P(max1≤i≤N Li ≥ e2(M/N + logN)) = O(1/N). Proof. First, we have µ = E(Li) = M/N , i = 1, 2, . . . , N , and we can write each Li as a sum Li = Xi1 + Xi2 + · · · + XiM , where Xij is a random variable taking value 1, if ball j is in bin i, and 0 otherwise. Next, since Li is a sum of independent random variables taking values in {0, 1}, we can apply, for any particular bin i and for every c > 1, the multiplicative Chernoff bound [12], which states that P(Li ≥ cµ) ≤ ( ec−1 c c )µ ≤ ( e c )cµ . We consider two cases, depending on whether µ ≥ logN or not. Let µ ≥ logN . Take c = e2. 
Then, P(Li ≥ e2µ) ≤ ( 1 e )e2µ ≤ ( 1 e )e2 logN = 1 Ne2 ≤ 1 N2 . Consider now µ < logN . Take c = e2 NM logN . Since x −x ≤ ( 1 e )x for all x ≥ e, we have P ( Li ≥ µe2 N M logN ) = P(Li ≥ e2 logN) ≤ ( e c )cµ = (( c e )− ce)eµ ≤ (( 1 e ) c e )eµ = ( 1 e )e2 logN ≤ 1 N2 . Putting everything together, we get that P(Li ≥ e2(µ+ logN)) ≤ P(Li ≥ e2µ | µ ≥ logN) + P(Li ≥ e2 logN | µ < logN) ≤ 1 N2 + 1 N2 = 2 N2 . This, by the union bound, implies that P ( max 1≤i≤N Li ≥ e2(µ+ logN) ) ≤ N∑ i=1 P(Li ≥ e2(µ+ logN)) ≤ N 2 N2 = O(1/N). 6 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 3 Speeding up the Floyd-Warshall algorithm The Floyd-Warshall algorithm [11, 26] as presented in Algorithm 1 is a simple dynamic programming approach to solve APSP on a graph G(V,A) represented by a weight matrix W , where Wij = w(vi, vj) if (vi, vj) ∈ A and ∞ otherwise. Its running time is O(n3) due to three nested for loops. Algorithm 1 FLOYD-WARSHALL(W ) 1 for k := 1 to n do 2 for i := 1 to n do 3 for j := 1 to n do 4 if Wik +Wkj < Wij then ▷ Relaxation 5 Wij := Wik +Wkj 3.1 The Tree algorithm Let us consider iteration k, and let OUTk denote a shortest path tree rooted at vertex vk in G(k−1). Intuitively, one might expect that the relaxation in lines 4-5 would not always succeed in lowering the value of Wij which currently contains the weight w(vi k−1 ⇝ vj). This is precisely the observation that we exploit to arrive at a more efficient algorithm: instead of simply looping through every vertex of V in line 3, we perform the depth-first traversal of OUTk. This permits us to skip iterations which provably cannot lower the current value of Wij . As the following lemma shows, if w(vi k ⇝vj) = w(vi k−1 ⇝ vj), then w(vi k ⇝vy) = w(vi k−1 ⇝ vy) for all vertices vy in the subtree of vj in OUTk. Lemma 3.1. Let vj ∈ V \ {vk} be some non-leaf vertex in OUTk, vy ̸= vj an arbitrary vertex in the subtree of vj in OUTk, and vi ∈ V \ {vk}. Consider the path vi k−1 ⇝ vk k−1 ⇝ vj . 
If w(vi k−1⇝ vk k−1⇝ vj) ≥ w(vi k−1⇝ vj), then w(vi k−1⇝ vk k−1⇝ vy) ≥ w(vi k−1⇝ vy).

Proof. Since vj is neither a leaf nor the root of OUTk, we have j < k, and so vi k−1⇝ vj k−1⇝ vy is a (k − 1)-path between vi and vy. Because vi k−1⇝ vy is a shortest (k − 1)-path between vi and vy, we have

w(vi k−1⇝ vy) ≤ w(vi k−1⇝ vj k−1⇝ vy) = w(vi k−1⇝ vj) + w(vj k−1⇝ vy) ≤ w(vi k−1⇝ vk k−1⇝ vj) + w(vj k−1⇝ vy) = w(vi k−1⇝ vk k−1⇝ vj k−1⇝ vy),

where the last inequality follows by the assumption. Finally, since vy is in the subtree rooted at vj, we have vi k−1⇝ vk k−1⇝ vj k−1⇝ vy = vi k−1⇝ vk k−1⇝ vy, and so the last term is equal to w(vi k−1⇝ vk k−1⇝ vy). This completes the proof.

The pseudocode of the modified Floyd-Warshall algorithm augmented with the tree OUTk, named the Tree algorithm, is given in Algorithm 2. To perform the depth-first search we first construct the tree OUTk in line 3 using CONSTRUCTOUT, given in Algorithm 3. For the construction of the tree OUTk an additional matrix π is used, where πij specifies the penultimate vertex on a shortest k-path from vi to vj (i.e. the vertex "before" vj).1 More precisely, the tree OUTk is obtained from π by making vi a child of vπki for all i ≠ k. This takes O(n) time.

Algorithm 2 TREE(W)
1 Initialize π, an n × n matrix, as πij := i.
2 for k := 1 to n do
3   OUTk := CONSTRUCTOUTk(π)
4   for i := 1 to n do
5     Stack := empty
6     Stack.push(vk)
7     while Stack ≠ empty do
8       vx := Stack.pop()
9       for all children vj of vx in OUTk do
10        if Wik + Wkj < Wij then ▷ Relaxation
11          Wij := Wik + Wkj
12          πij := πkj
13          Stack.push(vj)

Algorithm 3 CONSTRUCTOUTk(π)
1 Initialize n empty trees: T1, . . . , Tn.
2 for i := 1 to n do
3   Ti.Root := vi
4 for i := 1 to n do
5   if i ≠ k then
6     Make Ti a subtree of the root of Tπki.
return Tk

1 Cf. π(k)ij in [7, Sec. 25.2].
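As an illustration, the construction of OUTk from π (Algorithm 3) might look as follows in Python; this is a sketch with 0-based indices and a child-list tree representation, not the authors' implementation:

```python
def construct_out(pi, k, n):
    """Build OUT_k as child-lists from the predecessor matrix pi.

    pi[k][i] is the penultimate vertex on the current shortest k-path
    from v_k to v_i, so v_i becomes a child of v_{pi[k][i]}.  Vertices
    are 0-based here.  Runs in O(n) time.
    """
    children = [[] for _ in range(n)]
    for i in range(n):
        if i != k:
            children[pi[k][i]].append(i)
    return children  # children[x] lists the children of v_x in OUT_k

# Hypothetical 4-vertex example: shortest paths from v0 go
# v0->v1, v0->v1->v2, v0->v2->v3, so pi[0] = [0, 0, 1, 2].
pi = [[0, 0, 1, 2], [1] * 4, [2] * 4, [3] * 4]
out0 = construct_out(pi, 0, 4)  # out0 == [[1], [2], [3], []]
```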
Finally, we replace the iterations in lines 3-5 of Algorithm 1 with the depth-first traversal of OUTk in lines 5-13 of Algorithm 2. Note that if, for a given i and a child vj, the condition in line 10 evaluates to false, we do not traverse the subtree of vj in OUTk.

Corollary 3.2. The Tree algorithm correctly computes all-pairs shortest paths.

Proof. The correctness of the algorithm follows directly from Lemma 3.1.

Time complexity

Let Tk denote the running time of the algorithm TREE(W) in lines 3-13 at iteration k. As already said, line 3 requires O(n) time. To estimate the time complexity of lines 4-13, we charge the vertex vx in line 8 by the number of its children. This pays for lines 9-13. Furthermore, this means that on the one hand leaves are charged nothing, while on the other hand nobody is charged for the root vk. To this end, let SP(k)k be the set of all shortest k-paths that contain vk and end at some vertex in the set {v1, v2, . . . , vk}. Now vx in line 8 is charged at most |SP(k)k| times over all iterations of i. Since the number of children of vx is bounded from above by ∆(OUTk), we can bound Tk from above by

Tk ≤ |SP(k)k| · ∆(OUTk) + O(n). (3.1)

Practical improvement

Observe that in Algorithm 2 the vertices of OUTk are visited in a depth-first search (DFS) order, which is facilitated by using the stack. However, this requires pushing and popping of each vertex, as well as reading of all its children in OUTk. We can avoid this by precomputing two read-only arrays dfs and skip to support the traversal of OUTk. The array dfs consists of the OUTk vertices as visited in the DFS order. On the other hand, the array skip is used to skip an OUTk subtree when the relaxation in line 10 of Algorithm 2 does not succeed. In detail, for a vertex vz, skipz contains the index in dfs of the first vertex after vz in the DFS order that is not a descendant of vz in OUTk.
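A sketch of this precomputation and of the resulting traversal, again in illustrative Python rather than the authors' code (relax_ok stands in for the relaxation test of line 10, and the tree is stored as child-lists):

```python
def precompute_dfs_skip(children, root):
    """Precompute the dfs order and the skip indices for one tree."""
    dfs, skip = [], {}
    def visit(v):
        dfs.append(v)
        for c in children[v]:
            visit(c)
        skip[v] = len(dfs)  # first dfs index past v's entire subtree
    visit(root)
    return dfs, skip

def traverse(dfs, skip, relax_ok):
    """Scan dfs left to right, skipping a subtree when relaxation fails."""
    visited, pos = [], 1  # position 0 is the root itself
    while pos < len(dfs):
        v = dfs[pos]
        visited.append(v)
        pos = pos + 1 if relax_ok(v) else skip[v]
    return visited

children = [[1, 3], [2], [], []]              # v0 -> v1 -> v2, v0 -> v3
dfs, skip = precompute_dfs_skip(children, 0)  # dfs == [0, 1, 2, 3]
order = traverse(dfs, skip, lambda v: v != 1) # v1 fails: its subtree is skipped
# order == [1, 3]
```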
Utilizing the arrays outlined above, we traverse OUTk by scanning dfs in left-to-right order and using skip whenever a relaxation is not made. Consequently, we perform only two read operations per visited vertex. It should be pointed out that the asymptotic time remains the same, as this is solely a technical optimization. 3.2 The Hourglass algorithm We can further improve the Tree algorithm by using another tree. The second tree, denoted by INk, is similar to OUTk, except that it is a shortest path “tree” for paths vi k−1 ⇝ vk for each vi ∈ V \ {vk}. Strictly speaking, this is not a tree, but if we reverse the directions of the arcs, it turns it into a tree with vk as the root. Traversal of INk is used as a replacement of the for loop on variable i in line 4 of Algorithm 2 (in line 2 of Algorithm 1). As the following lemma shows, if w(vi k ⇝vj) = w(vi k−1 ⇝ vj), then w(vy k ⇝vj) = w(vy k−1 ⇝ vj) for all vertices vy in the subtree of vi in INk. Lemma 3.3. Let vi ∈ V \ {vk} be some non-leaf vertex in INk and let vy ̸= vi be an arbi- trary vertex in the subtree of vi in INk, and vj ∈ V \{vk}. Consider the path vi k−1 ⇝ vk k−1 ⇝ vj . If w(vi k−1 ⇝ vk k−1 ⇝ vj) ≥ w(vi k−1 ⇝ vj), then w(vy k−1 ⇝ vk k−1 ⇝ vj) ≥ w(vy k−1 ⇝ vj). Proof. Due to the choice of vi and vy we have: vy k−1 ⇝ vk = vy k−1 ⇝ vi k−1 ⇝ vk. We want to show, that: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vk k−1 ⇝ vj). Observe that i < k, since vi is neither a leaf nor the root of INk. Thus we have: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vj). Putting these together we get the desired inequality: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vk k−1 ⇝ vj). The pseudocode of the modified Floyd-Warshall algorithm augmented with the trees OUTk and INk, named the Hourglass algorithm2, is given in Algorithms 4 and 5. 
To construct INk efficiently, we need to maintain an additional matrix ϕ, where ϕij stores the second vertex on the path from vi to vj (cf. π and πij).2 Algorithm 6 constructs INk similarly to the construction of OUTk, except that we use the entries ϕik instead. The only extra space requirement of the Hourglass algorithm that bears any significance is the matrix ϕ, which does not worsen the space complexity of O(n2). The depth-first traversal of INk is performed by a recursion on each child of vk in line 7 of Algorithm 4. In the recursive step, given in Algorithm 5, we can prune OUTk as follows: if vi is the parent of vy in INk and w(vi k−1⇝ vj) ≤ w(vi k−1⇝ vk k−1⇝ vj), then the subtree of vj can be removed from OUTk while inspecting the subtree of vi in INk. Before the return from the recursion, the tree OUTk is restored to the form in which it was passed as a parameter to the function.

Algorithm 4 HOURGLASS(W)
1 Initialize π, an n × n matrix, as πij := i.
2 Initialize ϕ, an n × n matrix, as ϕij := j.
3 for k := 1 to n do
4   OUTk := CONSTRUCTOUTk(π)
5   INk := CONSTRUCTINk(ϕ)
6   for all children vi of vk in INk do
7     RECURSEIN(W, π, ϕ, INk, OUTk, vi)

Algorithm 5 RECURSEIN(W, π, ϕ, INk, OUTk, vi)
1 Stack := empty
2 Stack.push(vk)
3 while Stack ≠ empty do
4   vx := Stack.pop()
5   for all children vj of vx in OUTk do
6     if Wik + Wkj < Wij then ▷ Relaxation
7       Wij := Wik + Wkj
8       πij := πkj
9       ϕij := ϕik
10      Stack.push(vj)
11    else
12      Remove the subtree of vj from OUTk.
13 for all children vi′ of vi in INk do
14   RECURSEIN(W, π, ϕ, INk, OUTk, vi′)
15 Restore OUTk by reverting the changes done by all iterations of line 12.

In practice, the recursion can be avoided by using an additional stack, which further speeds up an implementation of the algorithm.

2 The hourglass name comes from placing the INk tree atop the OUTk tree, which gives it an hourglass-like shape, with vk at the neck.

Corollary 3.4.
The Hourglass algorithm correctly computes all-pairs shortest paths.

Proof. Observe that lines 5-10 of Algorithm 5 are effectively the same as in Algorithm 2. Line 12 of Algorithm 5 does not affect the correctness of the algorithm due to Lemma 3.3, which states that, for any vi′ that is a child of vi in INk, these comparisons can be skipped, as they cannot lead to shorter paths. However, Lemma 3.3 does not apply to a sibling vi∗ of vi, arising from line 6 of Algorithm 4. Therefore line 15 restores the tree OUTk, which maintains the correctness of the algorithm.

Algorithm 6 CONSTRUCTINk(ϕ)
1 Initialize n empty trees: T1, . . . , Tn.
2 for i := 1 to n do
3   Ti.Root := vi
4 for i := 1 to n do
5   if i ≠ k then
6     Make Ti a subtree of the root of Tϕik.
return Tk

Finally, note that the worst-case time complexity of the Hourglass (and Tree) algorithm remains O(n3). The simplest example of this is when all shortest paths are the arcs themselves, at which point all leaves are children of the root and the tree structure never changes.

4 Properties of shortest k-paths in complete graphs

Let Kn denote a complete digraph on the vertex set V = {v1, v2, . . . , vn}.

4.1 Distances

We assume that the arc weights of Kn are exponential random variables with mean 1 and that all n(n − 1) random arc weights are independent. Due to the memoryless property, it is easier to deal with exponentially distributed arc weights than directly with uniformly distributed arc weights. The aim of this subsection is to show that the diameter of K(k)n, the subdigraph of Kn consisting of all (weighted) arcs that are shortest k-paths in Kn, is O(log n/k) with very high probability. We note, however, that by the same argument as given in the beginning of Subsection 4.3, all results derived in this subsection for exponential arc weights also hold, asymptotically, for [0, 1]-uniformly distributed arc weights as soon as k ≥ log2 n.
We start by considering for a fixed u ∈ V , the maximum distance in K(k)n between u and other vertices in V . To this end, let S = {u, v1, . . . , vk} ⊆ V , and let S = V \ S. We clearly have max v∈V D K (k) n (u, v) ≤ max v∈S DKn[S](u, v) + max v∈S DKn[S×S](S, v), (4.1) that is, the maximum distance in K(k)n between u and other vertices in V is bounded above by the sum of the maximum distance in Kn[S] between u and other vertices in S, and by the maximum distance in Kn[S × S] between S and vertices in S. We note that Kn[S] is a complete digraph on |S| vertices and Kn[S × S] is a complete bipartite digraph with bipartition (S, S). To provide an upper bound on maxv∈S DKn[S](u, v), we use the following result, which follows from the equation (2.8) in the proof of Theorem 1.1 of Janson [14]. A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 11 Theorem 4.1 ([14, Theorem 1.1]). Let u ∈ V be a fixed vertex of Kn. Then for every a > 0, we have P ( max v∈V DKn(u, v) ≥ a log n n ) = O(ean2−a log2 n). Lemma 4.2. Let 8 ≤ k ≤ n, and let S ⊆ V with |S| = k. Then, for a fixed u ∈ S and for any constant c > 0, we have P ( max v∈S DKn[S](u, v) ≥ c log n k ) = O(n2−c/2 log2 n). Proof. By Theorem 4.1, for any a > 0 we have P ( max v∈S DKn[S](u, v) ≥ a log k k ) = O(eak2−a log2 k). Setting a = c log n/ log k we get eak2−a log2 k = ec logn/ log kk2k−c logn/ log k log2 k ≤ (elogn)c/2k2(klogk n)−c log2 k. In the last step we used the fact that 1/ log k ≤ 1/2 for k ≥ 8 and that log n/ log k = logk n. Furthermore, (elogn)c/2k2(klogk n)−c log2 k = nc/2k2n−c log2 k = O(n2−c/2 log2 n), and the result follows. Next, we provide an upper bound on maxv∈S DKn[S×S](S, v). Lemma 4.3. Let 1 ≤ k ≤ n, let S ⊆ V with |S| = k, and let S = V \ S. Then for any constant c > 0, we have P ( max v∈S DKn[S×S](S, v) ≥ c log n k ) = O(n1−c log n). Proof. Let Z = maxv∈S DKn[S×S](S, v). 
Arguing similarly as in the proof of Theorem 1.1 of Janson [14], Z is distributed as n−1∑ j=k Xj , where Xj are independent exponentially distributed random variables with mean 1k(n−j) . First, for any constant c > 0, the Chernoff bound [12] states that P(Z ≥ c log n/k) ≤ e−tc lognE(ektZ). Further, for −∞ < t ≤ 1, we have E(ektZ) = n−1∏ j=k E(ektXj ) = n−1∏ j=k ( 1− t n− j )−1 . 12 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 Using the inequality − log(1 − x) ≤ x + x2 for all 0 ≤ x ≤ 1/2, we can bound, for all 0 < t < 1 and k ≤ j ≤ n− 2, each term (1− t/(n− j))−1 as follows( 1− t n− j )−1 = exp ( − log ( 1− t n− j )) ≤ exp ( t n− j + ( t n− j )2) . This gives us P(Z ≥ c log n/k) ≤ (1− t)−1 exp ( − tc log n+ n−2∑ j=k ( t n− j + ( t n− j )2)) = (1− t)−1 exp(−tc log n+ t log(n− k) +O(1)). Taking t = 1− 1/ log n, we finally get P(Z ≥ c log n/k) ≤ (1/ log n)−1 exp(−c log n+ log n+O(1)) = O(n1−c log n). We are now ready to show that the diameter of K(k)n is O(log n/k) with very high probability. Theorem 4.4. Let 8 ≤ k ≤ n. Then, for any constant c > 0, we have P ( max u,v∈V D K (k) n (u, v) ≥ c log n k ) = O(n3−c/4 log2 n). Proof. Let S = {u, v1, . . . , vk} ⊆ V , let S = V \ S, and write α = c log n/k. Then, by inequality (4.1), we have P ( max v∈V D K (k) n (u, v) ≥ α ) ≤ P ( max v∈S DKn[S](u, v) + max v∈S DKn[S×S](S, v) ≥ α ) ≤ P ( max v∈S DKn[S](u, v) ≥ α 2 ) + P ( max v∈S DKn[S×S](S, v) ≥ α 2 ) . By Lemma 4.2, we have P ( max v∈S DKn[S](u, v) ≥ α 2 ) = O(n2−c/4 log2 n), and, by Lemma 4.3, P ( max u∈S DKn[S×S](S, v) ≥ α 2 ) = O(n1−c/2 log n). Putting everything together, we get P ( max v∈V D K (k) n (u, v) ≥ α ) = O(n2−c/4 log2 n), which, by the union bound, implies P ( max u,v∈V D K (k) n (u, v) ≥ α ) ≤ nP ( max u∈V D K (k) n (v, u) ≥ α ) = O(n3−c/4 log2 n). A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 
13 4.2 Lengths Let all arc weights of Kn be either independent [0, 1]-uniform random variables or inde- pendent exponential random variables with mean 1. In this subsection, we bound the length of the longest shortest k-path in Kn. The proof of our next lemma follows directly from Theorem 1.1 of Addario-Berry et. al [1] on the longest shortest path in Kn. Theorem 4.5 ([1, Theorem 1.1]). The following two properties hold: (i) For every t > 0, we have P ( max u,v∈V |u⇝v| ≥ α∗ log n+ t ) ≤ eα ∗+t/ logne−t, where α∗ ≈ 3.5911 is the unique solution of α logα− α = 1. (ii) E(maxu,v∈V |u⇝v|) = O(log n). Lemma 4.6. The following two properties hold: (i) For every c > 5 and 8 ≤ k ≤ n, we have P(maxu,v∈V |u k ⇝v| ≥ c log n) = O(n2−c/2). (ii) E(maxu,v∈V |u k ⇝v|) = O(log k). Proof. Let S = {v1, v2, . . . , vk}, and let u → w k ⇝z → v be a shortest k-path in Kn. Since w k ⇝z is a shortest path from w to z in Kn[S], we have max u,v∈V |u k⇝v| ≤ max w,z∈S |w k⇝z|+ 2. (4.2) By (i) of Theorem 4.5, for any t > 0, P ( max w,z∈S |w k⇝z| ≥ α∗ log k + t ) ≤ eα ∗+t/ log ke−t, where α∗ ≈ 3.5911 is the unique solution of α logα−α = 1. Using t = (c−α∗) log n−2 gives us P ( max w,z∈S |w k⇝z|+ 2 ≥ α∗ log(k/n) + c log n ) ≤ eα ∗−2/ log k+2e(c−α ∗)( log nlog k −logn) ≤ eα ∗−2/ log k+2(elogn)1/2(α ∗−c) = O(n2−c/2). By inequality (4.2), we have P ( max u,v∈V |u k⇝v| ≥ c log n ) ≤ P ( max w,z∈S |u k⇝v|+ 2 ≥ c log n ) ≤ P ( max w,z∈S |w k⇝z|+ 2 ≥ α∗ log(k/n) + c log n ) , and (i) follows. To prove (ii), we note that, by (ii) of Theorem 4.5, E(maxu,v∈S |u k ⇝v|) = O(log k), and, by inequality (4.2), the result follows. 14 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 4.3 Maximum outdegree Let arc weights of Kn be independent [0, 1]-uniform random variables. Our goal in this subsection is to show that the maximum outdegree of a shortest path tree OUTk in K (k) n is O(log k + (n− k)/k) with high probability for all k ≥ log2 n. Let now S = {v1, v2, . . . , vk} and S = V \ S. 
We can consider $\mathrm{OUT}_k$ as consisting of the subtree $\mathrm{OUT}_k[S]$ to which each vertex from $\bar{S}$ is attached as a leaf. To see how these vertices are attached to $\mathrm{OUT}_k[S]$, let us assume for the moment that the arc weights are exponentially distributed with mean 1. Then it is easy to see that a vertex $v \in \bar{S}$ is attached to the vertex in $S$ with which it forms a shortest arc, say $a_v$, between $S$ and $v$. Let $(K_n[S \times \bar{S}])^*$ be the subdigraph of $K_n[S \times \bar{S}]$ with vertex set $V$ and arc set $\{a_v \mid v \in \bar{S}\}$. By observing that $\mathrm{OUT}_k[S]$ is a subdigraph of the graph $(K_n[S])^{(k)}$ consisting of all arcs that are shortest paths in $K_n[S]$, we have
\[
\Delta(\mathrm{OUT}_k) \le \Delta((K_n[S])^{(k)}) + \Delta((K_n[S \times \bar{S}])^*). \tag{4.3}
\]
To extend the latter bound to the uniform distribution, we use a standard coupling argument as in [1]. Let $U$ be a random variable uniform on $[0,1]$. Then $-\log(1-U)$ is an exponential random variable with mean 1, and so we can couple the exponential arc weights $W'(u,v)$ to uniform arc weights $W(u,v)$ by setting $W'(u,v) = -\log(1 - W(u,v))$. As $x \le -\log(1-x) \le x + 2x^2$ for all $0 \le x \le 1/2$, we have that, for all arcs $(u,v)$ of $K_n$, $|W'(u,v) - W(u,v)| = O((W'(u,v))^2)$, uniformly for all $W'(u,v) \le 1/2$. In particular, if $W'(u,v) \le 12 \log n / k$, say, and $k \ge \log^2 n$, then $|W'(u,v) - W(u,v)| = O(1/\log^2 n)$ for $n$ large enough, and so for a path $P$ with $O(\log n)$ vertices and with $W'(P) \le 12 \log n / k$, we have $|W'(P) - W(P)| = O(1/\log n)$ for $n$ large enough. By Theorem 4.4, with very high probability a shortest $(k-1)$-path in $K_n$ with the exponential arc weights has weight less than $12 \log n / k$, while by (i) of Lemma 4.6, with very high probability it has $O(\log n)$ vertices. It then follows easily that, for all $n$ sufficiently large and $k \ge \log^2 n$, the bound (4.3) holds for the uniform distribution as well.

The following result on the maximum outdegree in the subgraph $(K_n[S])^{(k)}$ of the complete graph $K_n[S]$ on $k$ vertices with $[0,1]$-uniform arc weights can be found in Peres et al. [19].
Lemma 4.7 ([19, Lemma 5.1]). Let $1 \le k \le n$ and let $S \subseteq V$ with $|S| = k$. Then, for every $c > 6$, we have $P(\Delta((K_n[S])^{(k)}) > c \log k) = O(k^{1-c/6})$.

The maximum outdegree in $(K_n[S \times \bar{S}])^*$ is directly related to the maximum load in the balls-into-bins process, which is used in the proof of the following lemma.

Lemma 4.8. Let $1 \le k \le n$, let $S \subseteq V$ with $|S| = k$, and let $\bar{S} = V \setminus S$. Then
\[
P(\Delta((K_n[S \times \bar{S}])^*) \ge e^2((n-k)/k + \log k)) = O(k^{-1}).
\]

Proof. Consider vertices from $S$ as bins and vertices from $\bar{S}$ as balls. For $v \in \bar{S}$, each arc in $S \times v$ is equally likely to be the shortest, so $v$ is thrown into a bin chosen uniformly at random, and the result follows by Lemma 2.1 for $N = k$ and $M = n - k$. $\square$

We are now ready to prove the main result of this subsection.

Theorem 4.9. For every $k \ge \log^2 n$, we have
\[
P\left(\Delta(\mathrm{OUT}_k) \ge (e^2 + 12) \log k + e^2 \frac{n-k}{k}\right) = O(k^{-1}).
\]

Proof. Let $S = \{v_1, v_2, \dots, v_k\}$ and $\bar{S} = V \setminus S$. Further, let us write $\alpha = 12 \log k$ and $\beta = e^2((n-k)/k + \log k)$. By inequality (4.3), for every $k \ge \log^2 n$, we have
\[
P(\Delta(\mathrm{OUT}_k) \ge \alpha + \beta) \le P(\Delta((K_n[S])^{(k)}) + \Delta((K_n[S \times \bar{S}])^*) \ge \alpha + \beta) \le P(\Delta((K_n[S])^{(k)}) \ge \alpha) + P(\Delta((K_n[S \times \bar{S}])^*) \ge \beta).
\]
By Lemma 4.7, we have $P(\Delta((K_n[S])^{(k)}) \ge \alpha) \le 1/k$. Similarly, by Lemma 4.8, we have $P(\Delta((K_n[S \times \bar{S}])^*) \ge \beta) \le 1/k$. Hence, $P(\Delta(\mathrm{OUT}_k) \ge \alpha + \beta) \le 1/k + 1/k = O(1/k)$. $\square$

5 Expected-case analysis

We perform an expected-case analysis of the Tree algorithm for complete directed graphs on $n$ vertices with arc weights selected independently at random from the uniform distribution on $[0,1]$. Recall that $\mathrm{SP}_k^{(k)}$ is the set of all shortest $k$-paths that contain $v_k$ and end at some vertex in the set $\{v_1, v_2, \dots, v_k\}$. We first show that the expected number of paths in $\mathrm{SP}_k^{(k)}$ is $O(n \log k)$.

Lemma 5.1. For each $k = 1, 2, \dots, n$, we have $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$.

Proof.
For $v_i \in V$, let $\mathrm{SP}_i^{(k)}$ denote the set of all shortest $k$-paths that contain $v_i$ and end at some vertex in the set $\{v_1, v_2, \dots, v_k\}$. Note that
\[
\sum_{i=1}^{k} |\mathrm{SP}_i^{(k)}| \le \sum_{i=1}^{n} \sum_{j=1}^{k} |v_i \stackrel{k}{\rightsquigarrow} v_j|.
\]
By symmetry, we have $E(|\mathrm{SP}_i^{(k)}|) = E(|\mathrm{SP}_j^{(k)}|)$ for arbitrary $i, j \in \{1, 2, \dots, k\}$, and hence
\[
k \, E(|\mathrm{SP}_k^{(k)}|) = \sum_{i=1}^{k} E(|\mathrm{SP}_i^{(k)}|) \le \sum_{i=1}^{n} \sum_{j=1}^{k} E(|v_i \stackrel{k}{\rightsquigarrow} v_j|) \le kn \, E\left(\max_{u,v \in V} |u \stackrel{k}{\rightsquigarrow} v|\right).
\]
By (ii) of Lemma 4.6, we get that $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$. $\square$

We are now ready to analyse the expected time of the Tree algorithm.

Theorem 5.2. The Tree algorithm has an expected-case running time of $O(n^2 \log^2 n)$ for complete directed graphs on $n$ vertices with arc weights selected independently at random from the uniform distribution on $[0,1]$.

Proof. To estimate the number of comparisons $T_k$ at iteration $k$, we consider two cases. First, for $k < \log^2 n$ we bound $T_k$ from above by $n^2$. Second, we estimate $E(T_k)$ for $k \ge \log^2 n$. For every $c > 0$, we have
\[
E(T_k) = E(T_k \mid \Delta(\mathrm{OUT}_k) < c) \cdot P(\Delta(\mathrm{OUT}_k) < c) + E(T_k \mid \Delta(\mathrm{OUT}_k) \ge c) \cdot P(\Delta(\mathrm{OUT}_k) \ge c).
\]
Using inequality (3.1) we get
\[
E(T_k \mid \Delta(\mathrm{OUT}_k) < c) \le E(|\mathrm{SP}_k^{(k)}| \cdot \Delta(\mathrm{OUT}_k) + O(n) \mid \Delta(\mathrm{OUT}_k) < c) \le c \cdot E(|\mathrm{SP}_k^{(k)}|) + O(n).
\]
As $T_k$ is always at most $n^2$, we have $E(T_k \mid \Delta(\mathrm{OUT}_k) \ge c) \le n^2$. Further, taking into account that $P(\Delta(\mathrm{OUT}_k) < c) \le 1$, we get
\[
E(T_k) \le c \cdot E(|\mathrm{SP}_k^{(k)}|) + O(n) + n^2 \cdot P(\Delta(\mathrm{OUT}_k) \ge c).
\]
Take $c = (e^2 + 12) \log k + e^2 \frac{n-k}{k}$. Then, by Theorem 4.9, we have $P(\Delta(\mathrm{OUT}_k) \ge c) = O(k^{-1})$. Moreover, by Lemma 5.1, we have $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$, which gives us
\[
E(T_k) = O((e^2 + 12) n \log^2 k + e^2 (n-k) n \log k / k) + O(n) + O(n^2 / k) = O(n \log^2 n + n^2 \log n / k).
\]
Putting everything together, we bound the expected time of the algorithm from above as
\[
E\left(\sum_{k=1}^{n} T_k\right) = \sum_{k=1}^{\log^2 n - 1} E(T_k) + \sum_{k=\log^2 n}^{n} E(T_k) \le \sum_{k=1}^{\log^2 n - 1} n^2 + \sum_{k=\log^2 n}^{n} O(n \log^2 n + n^2 \log n / k) = O(n^2 \log^2 n),
\]
as claimed. $\square$

We conclude the section with a proof of the main theorem.

Proof of Theorem 1.1.
The Hourglass algorithm does not have a worse bound than the Tree variant, so the result follows by Theorem 5.2. $\square$

6 Empirical evaluation

All algorithms were implemented in C++ and compiled using g++ -march=native -O3. The tests were run on an Intel(R) Core(TM) i7-2600 @ 3.40 GHz with 8 GB RAM running Windows 7 64-bit. To make the comparison between Floyd-Warshall and its modified versions fairer, we improved the Floyd-Warshall algorithm with a simple modification that skips combinations of $i$ and $k$ where $W_{ik} = \infty$, consequently reducing the number of relaxations of the algorithm to $R_{FW} \le n^3$.

The experiments were conducted on the following random digraphs: (i) uniform random digraphs with arc weights uniformly distributed on the interval $[0,1]$, and (ii) unweighted random digraphs. In both cases, the digraphs were constructed by first setting the desired vertex count and density. Then a random Hamiltonian cycle was constructed, ensuring the strong connectivity of the digraph. After the cycle was constructed, the remaining $n(n-2)$ arcs were put into a list and randomly permuted, and then added to the digraph until the desired density was reached. Finally, the algorithms were executed on the instance and their running times were recorded. Tests were conducted ten times and averaged, with each test running on a different randomly generated graph.

6.1 Empirical comparison of the number of relaxations

Our motivation when designing the Hourglass and Tree algorithms was to skip relaxations that do not contribute to the result. To verify the theoretical results on the expected number of relaxations in practice, we conducted two experiments in which we counted the number of relaxations performed by the different algorithms. For the first experiment we generated a subfamily of digraphs from (i), mentioned above, consisting of complete digraphs with vertex sets of varying size.
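The instance construction described above (random Hamiltonian cycle for strong connectivity, then randomly permuted remaining arcs up to the target density) can be sketched as follows. This is an illustrative Python sketch with a hypothetical function name; the actual experiments were implemented in C++.

```python
import random

def random_digraph(n, m, weighted=True, seed=0):
    """Build a strongly connected random digraph with n vertices and m arcs
    (n <= m <= n*(n-1)): first a random Hamiltonian cycle, then randomly
    permuted remaining arcs added until the target arc count m is reached.
    Returns a dict mapping arcs (u, v) to weights."""
    rng = random.Random(seed)
    verts = list(range(n))
    rng.shuffle(verts)
    # Random Hamiltonian cycle ensures strong connectivity.
    arcs = {(verts[i], verts[(i + 1) % n]) for i in range(n)}
    # Remaining n*(n-2) arcs in random order.
    rest = [(u, v) for u in range(n) for v in range(n)
            if u != v and (u, v) not in arcs]
    rng.shuffle(rest)
    for a in rest:
        if len(arcs) >= m:
            break
        arcs.add(a)
    # Family (i): uniform [0, 1) weights; family (ii): unweighted (weight 1).
    return {a: (rng.random() if weighted else 1.0) for a in arcs}

g = random_digraph(64, 500)
```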
In contrast, for the second experiment we generated another subfamily of digraphs from (i), now consisting of sparse digraphs with a fixed vertex set and variable arc density. The results of the experiments are presented in plots relative to the number of relaxations performed by the Floyd-Warshall algorithm; i.e., all numbers of relaxations are divided by $R_{FW}$.

The results of the first experiment, in which $R_{FW} = n^3$ since the digraphs are complete, are presented in Figure 1. To relate the theoretical upper bound of $O(n^2 \lg^2 n)$ for the Tree algorithm to the experimental results, we also added the plot of the function $\frac{60 n^2 \lg^2 n}{n^3}$. We chose the constant 60 so that the plots of the Tree algorithm and the added function start at the same initial point, namely at $2^8$ vertices.

Figure 1: Complete digraphs of various sizes with the number of relaxations of the algorithms divided by $n^3$ (plots for Tree, Hourglass, and the reference curve $\frac{60 n^2 \lg^2 n}{n^3}$).

The results of the second experiment for $n = 1024$ vertices and sizes of the arc set varying between $n^2/10$ and $8n^2/10$ are shown in Figure 2.

Figure 2: Digraphs with $n = 1024$ vertices and various arc densities with the number of relaxations of the algorithms divided by $R_{FW}$ (plots for Tree and Hourglass).

In Figure 1 we see a significant reduction in relaxations, which also implies a decrease in the running time of the Tree and Hourglass algorithms. The plot also indicates that the theoretical upper bound for the Tree algorithm is asymptotically tight. The experiments on sparse digraphs (see Figure 2) likewise show a reduction in relaxations as the digraphs become sparser.
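The baseline count $R_{FW}$ used throughout this subsection refers to the Floyd-Warshall variant with the $W_{ik} = \infty$ skip described at the start of Section 6. A minimal sketch of that baseline with a relaxation counter (illustrative Python; the experiments used C++, and the example matrix is our own):

```python
INF = float("inf")

def floyd_warshall_skip(W):
    """Floyd-Warshall with the modification described above: combinations of
    i and k with no finite i -> k distance are skipped, so at most
    R_FW <= n^3 relaxations are performed. W is an n x n matrix with
    W[i][i] == 0. Returns (distance matrix, number of relaxations)."""
    n = len(W)
    D = [row[:] for row in W]
    relaxations = 0
    for k in range(n):
        for i in range(n):
            if D[i][k] == INF:
                continue  # no path i -> k yet; relaxing via k is useless
            for j in range(n):
                relaxations += 1
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D, relaxations

W = [[0, 3, INF],
     [INF, 0, 1],
     [1, INF, 0]]
D, R = floyd_warshall_skip(W)
```

On this instance one inner loop is skipped (row 1 at k = 0), so R = 24 < 27 = n³.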
6.2 Empirical comparison of running times

As discussed in the introduction, we compared the Tree³ and Hourglass algorithms with the Floyd-Warshall [11, 26] and Dijkstra [10] algorithms, as well as with the algorithm of Brodnik and Grgurovič [4], which we refer to as Propagation. These algorithms were chosen because they proved to perform best out of a larger group of algorithms compared in [4]. It should be pointed out that breadth-first search is extremely efficient at solving APSP on unweighted digraphs. However, we did not include breadth-first search in the comparisons, because we consider unweighted graph instances only as worst-case instances of the general shortest path problem (each arc is part of at least one shortest path in such instances).

The algorithms were tested on the graph families (i) and (ii) described at the beginning of this section, with sizes of the vertex set varying between 512 and 4096, and sizes of the arc set varying between $n^{1.1}$ and $n^2$. As the priority queue in the Dijkstra and Propagation algorithms we used pairing heaps, since they are known to perform especially well for APSP in practice [22], even though the amortized complexity of their decrease-key operation is $O(2^{2\sqrt{\lg \lg n}})$ in comparison to $O(1)$ for Fibonacci heaps [21]. We used the implementation of pairing heaps from the Boost library, version 1.55.

³In the tests, we used the implementation of the algorithm with the improvements from Subsection 3.1.

The results for uniform random digraphs, presented in Figure 3, show that both Propagation and Tree outperform the other algorithms at all vertex counts and arc densities. As the size $n$ of the graphs increases, the running time of Hourglass approaches the running time of Tree, but the constant factors still prove too large for Hourglass to prevail despite its more clever exploration strategy.
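For reference, the Dijkstra baseline in the comparison solves APSP by one single-source computation per vertex. The sketch below is illustrative only: it uses Python's standard binary heap with lazy deletion in place of the C++ pairing heaps with decrease-key actually used in the experiments, and the small example graph is our own.

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest distances from s; adj[u] = list of (v, w), w >= 0."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; heapq has no decrease-key, so we skip it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# APSP by running Dijkstra from every vertex.
adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0)], 2: [(0, 1.0)]}
apsp = {s: dijkstra(adj, s) for s in adj}
```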
Moreover, it is interesting to see that the Floyd-Warshall-based Tree and Hourglass outperform Dijkstra on sparse graphs.

Figure 3: Experimental results on family (i) – uniform digraphs (running time in milliseconds versus arc density, for n = 512, 1024, 2048 and 4096 vertices; algorithms: Dijkstra, Propagation, Floyd-Warshall, Tree, Hourglass).

The results for unweighted random digraphs are shown in Figure 4. Interestingly, Tree and Hourglass remain competitive with Dijkstra, and in some instances even outperform it on smaller graphs. In contrast, the performance of Propagation falls short of Dijkstra, because each arc is part of at least one shortest path in these graphs.

Figure 4: Experimental results on family (ii) – unweighted digraphs (running time in milliseconds versus arc density, for n = 512, 1024, 2048 and 4096 vertices; algorithms: Dijkstra, Propagation, Floyd-Warshall, Tree, Hourglass).

7 Conclusion

We theoretically analyzed the Tree algorithm, which is a relatively simple modification of the Floyd-Warshall algorithm. The analysis gives its expected-case time complexity in the uniform model as $O(n^2 \log^2 n)$, which also explains the algorithm's good practical performance presented in Section 6.
We also presented the Hourglass algorithm as a further improvement of the Tree algorithm, but it remains an open question whether its expected-case time complexity in the uniform model is $o(n^2 \log^2 n)$. Next, since both the Tree and Hourglass algorithms allow negative arc weights, it would be interesting to analyze their expected-case running time for a model that permits negative arcs, such as the vertex potential model [5, 6].

Overall, the Tree algorithm is simple to implement and offers very good performance. The Hourglass algorithm has the potential to be even better but probably requires a more complex implementation. It is also worthwhile to note that the space requirement of the Tree algorithm is not worse than that of any algorithm that reports all shortest paths. The Hourglass algorithm requires an additional matrix of size $n^2$.

ORCID iDs

Andrej Brodnik: https://orcid.org/0000-0001-9773-0664
Rok Požar: https://orcid.org/0000-0002-2037-9535

References

[1] L. Addario-Berry, N. Broutin and G. Lugosi, The longest minimum-weight path in a complete graph, Comb. Probab. Comput. 19 (2010), 1–19, doi:10.1017/s0963548309990204.

[2] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice-Hall, Inc., USA, 1993.

[3] P. A. Bloniarz, A shortest-path algorithm with expected time $O(n^2 \log n \log^* n)$, SIAM J. Comput. 12 (1983), 588–600, doi:10.1137/0212039.

[4] A. Brodnik and M. Grgurovič, Solving all-pairs shortest path by single-source computations, Discrete Appl. Math. 231 (2017), 119–130, doi:10.1016/j.dam.2017.03.008.

[5] B. V. Cherkassky, A. V. Goldberg and T. Radzik, Shortest paths algorithms: theory and experimental evaluation, Math. Programming 73 (1996), 129–174, doi:10.1016/0025-5610(95)00021-6.

[6] C. Cooper, A. Frieze, K. Mehlhorn and V.
Priebe, Average-case complexity of shortest-paths problems in the vertex-potential model, Random Struct. Algorithms 16 (2000), 33–46, doi:10.1002/(sici)1098-2418(200001)16:1⟨33::aid-rsa3⟩3.0.co;2-0.

[7] T. H. Cormen, C. Stein, R. L. Rivest and C. E. Leiserson, Introduction to Algorithms, McGraw-Hill Higher Education, 2nd edition, 2001.

[8] C. Demetrescu and G. F. Italiano, A new approach to dynamic all pairs shortest paths, J. ACM 51 (2004), 968–992, doi:10.1145/1039488.1039492.

[9] C. Demetrescu and G. F. Italiano, Experimental analysis of dynamic all pairs shortest path algorithms, ACM Trans. Algorithms 2 (2006), 578–601, doi:10.1145/1198513.1198519.

[10] E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1 (1959), 269–271, doi:10.1007/bf01386390.

[11] R. W. Floyd, Algorithm 97: Shortest path, Commun. ACM 5 (1962), 345, doi:10.1145/367766.368168.

[12] T. Hagerup and C. Rüb, A guided tour of Chernoff bounds, Inf. Process. Lett. 33 (1990), 305–308, doi:10.1016/0020-0190(90)90214-I.

[13] Y. Han and T. Takaoka, An $O(n^3 \log \log n / \log^2 n)$ time algorithm for all pairs shortest paths, in: Proceedings of the 13th Scandinavian Conference on Algorithm Theory (SWAT '12), Springer-Verlag, Berlin, Heidelberg, 2012, pp. 131–141, doi:10.1007/978-3-642-31155-0_12.

[14] S. Janson, One, two and three times $\log n / n$ for paths in a complete graph with random weights, Combin. Probab. Comput. 8 (1999), 347–361, doi:10.1017/s0963548399003892.

[15] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, J. ACM 24 (1977), 1–13, doi:10.1145/321992.321993.

[16] D. Karger, D. Koller and S. J. Phillips, Finding the hidden path: time bounds for all-pairs shortest paths, SIAM J. Comput. 22 (1993), 1199–1217, doi:10.1137/0222071.

[17] K. Mehlhorn and V. Priebe, On the all-pairs shortest-path algorithm of Moffat and Takaoka, Random Struct.
Algorithms 10 (1997), 205–220, doi:10.1002/(sici)1098-2418(199701/03)10:1/2⟨205::aid-rsa11⟩3.3.co;2-J.

[18] A. Moffat and T. Takaoka, An all pairs shortest path algorithm with expected time $O(n^2 \log n)$, SIAM J. Comput. 16 (1987), 1023–1031, doi:10.1137/0216065.

[19] Y. Peres, D. Sotnikov, B. Sudakov and U. Zwick, All-pairs shortest paths in $O(n^2)$ time with high probability, J. ACM 60 (2013), Article No. 26 (25 pages), doi:10.1145/2508028.2505988.

[20] S. Pettie, A new approach to all-pairs shortest paths on real-weighted graphs, Theor. Comput. Sci. 312 (2004), 47–74, doi:10.1016/s0304-3975(03)00402-x.

[21] S. Pettie, Towards a final analysis of pairing heaps, in: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), IEEE Computer Society, Washington, DC, USA, 2005, pp. 174–183, doi:10.1109/sfcs.2005.75.

[22] S. Pettie, V. Ramachandran and S. Sridhar, Experimental evaluation of a new shortest path algorithm, in: Proceedings of the 4th International Workshop on Algorithm Engineering and Experiments (ALENEX '02), Springer-Verlag, Berlin, Heidelberg, 2002, pp. 126–142, doi:10.1007/3-540-45643-0_10.

[23] M. Raab and A. Steger, "Balls into bins" – a simple and tight analysis, in: Randomization and Approximation Techniques in Computer Science (Barcelona, 1998), volume 1518 of Lecture Notes in Computer Science, Springer, Berlin, 1998, pp. 159–170, doi:10.1007/3-540-49543-6_13.

[24] P. M. Spira, A new algorithm for finding all shortest paths in a graph of positive arcs in average time $O(n^2 \log^2 n)$, SIAM J. Comput. 2 (1973), 28–32, doi:10.1137/0202004.

[25] T. Takaoka and A. Moffat, An $O(n^2 \log n \log \log n)$ expected time algorithm for the all shortest distance problem, in: Mathematical Foundations of Computer Science 1980, volume 88 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 1980, pp. 643–655, doi:10.1007/bfb0022539 (proceedings of the 9th Symposium held in Rydzyna, Poland, September 1–5, 1980).

[26] S.
Warshall, A theorem on Boolean matrices, J. ACM 9 (1962), 11–12, doi:10.1145/321105.321107.