ISSN 1855-3966 (printed edn.), ISSN 1855-3974 (electronic edn.)
ARS MATHEMATICA CONTEMPORANEA 22 (2022) #P1.01 / 1–22
https://doi.org/10.26493/1855-3974.2467.497 (Also available at http://amc-journal.eu)

Modifications of the Floyd-Warshall algorithm with nearly quadratic expected-time*

Andrej Brodnik †
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia
University of Ljubljana, UL FRI, Večna pot 113, 1000 Ljubljana, Slovenia

Marko Grgurovič
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia

Rok Požar ‡
University of Primorska, UP FAMNIT, Glagoljaška 8, 6000 Koper, Slovenia
University of Primorska, UP IAM, Muzejski trg 2, 6000 Koper, Slovenia
IMFM, Jadranska 19, 1000 Ljubljana, Slovenia

Received 25 October 2020, accepted 10 April 2021, published online 25 November 2021

Abstract

The paper describes two relatively simple modifications of the well-known Floyd-Warshall algorithm for computing all-pairs shortest paths. The fundamental difference of both modifications in comparison to the Floyd-Warshall algorithm is that the relaxation is performed in a smarter way. We show that the expected-case time complexity of both algorithms is O(n2 log2 n) for the class of complete directed graphs on n vertices with arc weights selected independently at random from the uniform distribution on [0, 1]. The theoretically best known algorithms for this class of graphs are all based on Dijkstra's algorithm and attain a better expected-case bound. However, an empirical evaluation shows that our algorithms are at least competitive in practice with the best known algorithms and, moreover, outperform most of them. The reason for the practical efficiency of the presented algorithms is that they use no priority queue.
*A preliminary version of this work has been published in Shortest Path Solvers: From Software to Wetware, volume 32 of Emergence, Complexity and Computation (2018). The authors would like to thank the reviewer for excellent comments that substantially improved the quality of the paper.
†This work is sponsored in part by the Slovenian Research Agency (research program P2-0359 and research projects J1-2481, J2-2504, and N2-0171).
‡Corresponding author. This work is supported in part by the Slovenian Research Agency (research program P1-0285 and research projects N1-0062, J1-9110, J1-9187, J1-1694, N1-0159, J1-2451).
This work is licensed under https://creativecommons.org/licenses/by/4.0/

Keywords: All-pairs shortest paths, probabilistic analysis.
Math. Subj. Class. (2020): 05C85, 68W40

1 Introduction

Finding shortest paths in graphs is a classic problem in algorithmic graph theory. Given a (directed) graph in which arcs are assigned weights, a shortest path between a pair of vertices is a path that minimizes the sum of the weights of its constituent arcs. The problem also arises frequently in practice, in areas such as bioinformatics, logistics, and VLSI design (for a more comprehensive list of applications see e.g. [2]). Two of the most common variants of the problem are the single-source shortest path problem and the all-pairs shortest path problem (APSP). In the first variant, we search for paths from a fixed vertex to every other vertex, while the APSP asks for a shortest path between every pair of vertices. In this paper we focus exclusively on the APSP variant of the problem.

In general, the APSP can be solved by using the technique of relaxation. The relaxation consists of testing whether we can improve the weight of the shortest path from u to v found so far by going via w, and updating it if necessary.
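In code, this relaxation test can be sketched as follows (an illustrative Python fragment; the dense matrix W and the index convention are ours, not the paper's):

```python
INF = float("inf")

def relax(W, i, k, j):
    """Try to improve the current i->j distance by routing through k.

    W is a dense distance matrix; W[i][j] holds the best weight of an
    i->j path found so far.  Returns True when the relaxation succeeds.
    """
    if W[i][k] + W[k][j] < W[i][j]:
        W[i][j] = W[i][k] + W[k][j]
        return True
    return False

# Toy 3-vertex example: going 0 -> 1 -> 2 is cheaper than the direct arc.
W = [[0, 1, 5],
     [INF, 0, 2],
     [INF, INF, 0]]
relax(W, 0, 1, 2)  # W[0][2] becomes 3
```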
In fact, the number of attempts to perform relaxation corresponds to the time complexity under the RAM model. A trivial textbook relaxation-based solution to the APSP is the dynamic programming Floyd-Warshall algorithm [11], running in O(n3) time on graphs with n vertices. Dijkstra's algorithm [10], which solves the single-source shortest path problem, is also relaxation-based. However, since the order in which the relaxations are performed is greedy, it uses a priority queue as an additional data structure. Obviously, we can solve the APSP by running Dijkstra's algorithm from each vertex of the graph, obtaining an O(mn log n) solution, where m is the number of arcs in the graph, provided we use the binary heap implementation of the priority queue. This is an improvement over the Floyd-Warshall solution for sparse graphs. Asymptotically, we get an even better solution by using Fibonacci heaps instead of binary heaps, yielding O(n2 log n + mn) time complexity. We refer to such approaches, which inherently use some kind of priority queue implementation, as Dijkstra-like.

However, the described solutions to the APSP using Dijkstra's algorithm have at least two limitations. The first one is that all arc weights have to be non-negative. This can be alleviated by using Johnson's approach [15], which reweighs all arcs making their weights non-negative. On such a graph we can then run Dijkstra's algorithm. The second limitation is related to the efficiency of the implementation. Namely, due to the computer architecture, efficient implementations must exploit data locality; i.e. in consecutive memory accesses they try to access memory locations that are "close together". A similar observation is made in [5] for the solutions to the single-source shortest path problem, where the authors show that a Fibonacci heap, although an asymptotically better implementation of a priority queue, in practice underperforms a simple binary heap.
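The Dijkstra-per-source approach described above can be sketched as follows (an illustrative Python fragment using the standard heapq binary heap; the adjacency-dict representation is our own assumption, not from the paper):

```python
import heapq

def apsp_dijkstra(adj):
    """All-pairs shortest paths by running Dijkstra from every vertex.

    adj maps a vertex to a list of (neighbour, weight) pairs; all
    weights must be non-negative.  With a binary heap this runs in
    O(mn log n) overall.  Returns a dict of per-source distance dicts.
    """
    def single_source(src):
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry, already improved
            for v, w in adj.get(u, []):
                nd = d + w  # the relaxation step
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    return {u: single_source(u) for u in adj}

adj = {0: [(1, 1.0), (2, 5.0)], 1: [(2, 2.0)], 2: []}
dist = apsp_dijkstra(adj)  # dist[0][2] == 3.0 via the arc through 1
```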
For dense graphs, a slightly better worst-case running time of O(n3 log log n/ log2 n) over the O(n3)-time Floyd-Warshall algorithm can be achieved by using an efficient matrix multiplication technique [13]. For sparse graphs on n vertices and with m non-negatively weighted arcs, the fastest known solution [20] runs in time O(mn + n2 log log n).

Considering the expected-case running time of APSP algorithms, we can find in the literature a number of good solutions assuming that input instances are generated according to a probability model on the set of complete directed graphs with arc weights. In the uniform model, arc weights are drawn at random, independently of each other, according to a common probability distribution. A more general model is the endpoint-independent model [3, 24], where, for each vertex v, a sequence of n − 1 non-negative arc weights is generated by a deterministic or stochastic process and then randomly permuted and assigned to the outgoing arcs of v. In the vertex potential model [5, 6], arc weights can be both positive and negative. This is a probability model with arbitrary real arc weights but without negative cycles.

In the uniform model with arc weights drawn from the uniform distribution on [0, 1], O(n2 log n) expected time complexity algorithms for solving the APSP were presented by Karger et al. [16] and Demetrescu and Italiano [8, 9], where the latter was inspired by the former. Another algorithm with the same expected time complexity was presented by Brodnik and Grgurovič [4]. Peres et al. [19] improved the Demetrescu and Italiano algorithm to the theoretically optimal O(n2) by replacing the priority queue implementation with a more involved data structure.

E-mail addresses: andrej.brodnik@upr.si (Andrej Brodnik), marko.grgurovic@famnit.upr.si (Marko Grgurovič), rok.pozar@upr.si (Rok Požar)
In the endpoint-independent model, Spira [24] proved an expected-case time bound of O(n2 log2 n), which was improved by several authors. Takaoka and Moffat [25] improved the bound to O(n2 log n log log n). Bloniarz [3] described an algorithm with expected-case running time O(n2 log n log∗ n). Finally, Moffat and Takaoka [18] and Mehlhorn and Priebe [17] improved the running time to O(n2 log n). In the vertex potential model, Cooper et al. [6] gave an algorithm with an expected-case running time of O(n2 log n). All of the above algorithms use a Dijkstra-like approach.

In this paper, we present two modifications of the Floyd-Warshall algorithm, which we name the Tree algorithm and the Hourglass algorithm. A fundamental difference of both modifications in relation to the Floyd-Warshall algorithm is a smarter way of performing the relaxations. This is done by introducing a tree structure that allows us to skip relaxations that do not contribute to the result. The worst-case time complexity of both algorithms remains O(n3); however, in the analysis we show that their expected running time is substantially better. To simplify the analysis, we consider the uniform model, which gives us the following main result.

Theorem 1.1. For complete directed graphs on n vertices with arc weights selected independently at random from the uniform distribution on [0, 1], the Tree algorithm and the Hourglass algorithm both have an expected-case running time of O(n2 log2 n).

The proof of our main result relies on the following well-known properties of the complete directed graph on n vertices with uniformly distributed arc weights on [0, 1]. First, the maximum weight of a shortest path in such a graph is O(log n/n) with high probability; second, the longest shortest path has O(log n) arcs with high probability; and third, the maximum outdegree of the subgraph consisting of all arcs that are shortest paths is O(log n) with high probability.
These properties, together with the observation that if the relaxation on some vertex of the introduced tree structure fails, we can skip relaxations on the entire subtree defined by this vertex (see Lemma 3.1), give the desired result. Since the theoretically best expected-case APSP algorithms are based on Dijkstra's algorithm, it is interesting that a competitive approach can also be obtained by a modification of the Floyd-Warshall algorithm.

To demonstrate the practical competitiveness of our algorithms, we supplement the theoretical analysis with an empirical evaluation. It should be pointed out that all of the algorithms mentioned above with o(n2 log2 n) expected-case running time have a better theoretical bound than ours. Moreover, Brodnik and Grgurovič in [4] show, for the same family of graphs as studied in this paper, the practical superiority of their algorithm over the algorithms of Karger et al. [16] and Demetrescu and Italiano [8, 9], and consequently over the algorithm of Peres et al. [19], since the latter's improvement of the Demetrescu and Italiano solution does not improve the practical efficiency of the original algorithm. Therefore, in the practical evaluation of the Tree and Hourglass algorithms we compare them to the algorithm of Brodnik and Grgurovič [4] only. The reason for the practical efficiency of the presented algorithms is that they use no priority queue. Indeed, the Tree and Hourglass algorithms are simple to implement and use only simple structures such as vectors and arrays, which also exhibit high data locality.

The structure of the paper is the following. Section 2 contains the necessary notation and basic definitions to make the paper self-contained. In Section 3 we describe the Tree and Hourglass algorithms. Properties of certain shortest paths in complete graphs with independently and uniformly distributed arc weights are analyzed in Section 4.
The proof of the main result is presented in Section 5, while Section 6 contains empirical evaluation of the algorithms. In Section 7 we give some concluding remarks and open problems. 2 Preliminaries All logarithms are base e unless explicitly stated otherwise. The model of computation used in algorithm design and analysis is the comparison-addition model, where the only allowed operations on arc weights are comparisons and additions. A digraph (or directed graph) G is a pair (V,A), where V is a non-empty finite set of vertices and A ⊆ V × V a set of arcs. We assume V = {v1, v2, . . . , vn} for some n. The two vertices joined by an arc are called its endvertices. For an arc (u, v) ∈ A, we say that u is its tail. The outdegree of v ∈ V , is the number of arcs in A that have v as their tail. The maximum outdegree in G is denoted by ∆(G). A digraph G′ = (V ′, A′) is a subdigraph of the digraph G = (V,A) if V ′ ⊆ V and A′ ⊆ A. The (vertex-)induced subdigraph with the vertex set S ⊆ V , denoted by G[S], is the subgraph (S,C) of G, where C contains all arcs a ∈ A that have both endvertices in S, that is, C = A ∩ (S × S). The (arc-)induced subdigraph with the arc set B ⊆ A, denoted by G[B], is the subgraph (T,B) of G, where T is the set of all those vertices in V that are endvertices of at least one arc in B. A path P in a digraph G from vP,0 to vP,m is a finite sequence P = vP,0, vP,1, . . . , vP,m of pairwise distinct vertices such that (vP,i, vP,i+1) is an arc of G, for i = 0, 1, . . . ,m− 1. The length of a path P , denoted by |P |, is the number of vertices occurring on P . Any vertex of P other than vP,0 or vP,m is an intermediate vertex. A k-path is a path in which all intermediate vertices belong to the subset {v1, v2, . . . , vk} of vertices for some k. Obviously, a 0-path has no intermediate vertices. A weighted digraph is a digraph G = (V,A) with a weight function w : A → R that assigns each arc a ∈ A a weight w(a). 
A weight function w can be extended to a path P by w(P ) = ∑m−1 i=0 w(vP,i, vP,i+1). A shortest path from u to v, denoted by u ⇝ v, is a path in G whose weight is infimum among all paths from u to v. The distance between two vertices u and v, denoted by DG(u, v), is the weight of a shortest path u ⇝ v in G. A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 5 The diameter of G is maxu,v∈V DG(u, v), that is, the maximum distance between any two vertices in G. Given a subset S ⊆ V , the distance between S and a vertex v in G, denoted by DG(S, v), is DG(S, v) = minu∈S DG(u, v). A shortest k-path from u to v is denoted by u k⇝v. Further, we denote the set of arcs that are shortest k-paths in G by A(k) and the subdigraph G[A(k)] by G(k). Finally, we will need some tools from combinatorics. In the balls-into-bins process M balls are thrown uniformly and independently into N bins. The maximum number of balls in any bin is called the maximum load. Let Li denote the load of bin i, i ∈ {1, 2, . . . , N}. The next lemma, used in Subsection 4.3, provides an upper bound on the maximum load. It is a simplified version of a standard result, c.f. [23], tailored to our present needs. For completeness we provide the proof. Lemma 2.1. If M balls are thrown into N bins where each ball is thrown into a bin chosen uniformly at random, then P(max1≤i≤N Li ≥ e2(M/N + logN)) = O(1/N). Proof. First, we have µ = E(Li) = M/N , i = 1, 2, . . . , N , and we can write each Li as a sum Li = Xi1 + Xi2 + · · · + XiM , where Xij is a random variable taking value 1, if ball j is in bin i, and 0 otherwise. Next, since Li is a sum of independent random variables taking values in {0, 1}, we can apply, for any particular bin i and for every c > 1, the multiplicative Chernoff bound [12], which states that P(Li ≥ cµ) ≤ ( ec−1 c c )µ ≤ ( e c )cµ . We consider two cases, depending on whether µ ≥ logN or not. Let µ ≥ logN . Take c = e2. 
Then, P(Li ≥ e2µ) ≤ ( 1 e )e2µ ≤ ( 1 e )e2 logN = 1 Ne2 ≤ 1 N2 . Consider now µ < logN . Take c = e2 NM logN . Since x −x ≤ ( 1 e )x for all x ≥ e, we have P ( Li ≥ µe2 N M logN ) = P(Li ≥ e2 logN) ≤ ( e c )cµ = (( c e )− ce)eµ ≤ (( 1 e ) c e )eµ = ( 1 e )e2 logN ≤ 1 N2 . Putting everything together, we get that P(Li ≥ e2(µ+ logN)) ≤ P(Li ≥ e2µ | µ ≥ logN) + P(Li ≥ e2 logN | µ < logN) ≤ 1 N2 + 1 N2 = 2 N2 . This, by the union bound, implies that P ( max 1≤i≤N Li ≥ e2(µ+ logN) ) ≤ N∑ i=1 P(Li ≥ e2(µ+ logN)) ≤ N 2 N2 = O(1/N). 6 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 3 Speeding up the Floyd-Warshall algorithm The Floyd-Warshall algorithm [11, 26] as presented in Algorithm 1 is a simple dynamic programming approach to solve APSP on a graph G(V,A) represented by a weight matrix W , where Wij = w(vi, vj) if (vi, vj) ∈ A and ∞ otherwise. Its running time is O(n3) due to three nested for loops. Algorithm 1 FLOYD-WARSHALL(W ) 1 for k := 1 to n do 2 for i := 1 to n do 3 for j := 1 to n do 4 if Wik +Wkj < Wij then ▷ Relaxation 5 Wij := Wik +Wkj 3.1 The Tree algorithm Let us consider iteration k, and let OUTk denote a shortest path tree rooted at vertex vk in G(k−1). Intuitively, one might expect that the relaxation in lines 4-5 would not always succeed in lowering the value of Wij which currently contains the weight w(vi k−1 ⇝ vj). This is precisely the observation that we exploit to arrive at a more efficient algorithm: instead of simply looping through every vertex of V in line 3, we perform the depth-first traversal of OUTk. This permits us to skip iterations which provably cannot lower the current value of Wij . As the following lemma shows, if w(vi k ⇝vj) = w(vi k−1 ⇝ vj), then w(vi k ⇝vy) = w(vi k−1 ⇝ vy) for all vertices vy in the subtree of vj in OUTk. Lemma 3.1. Let vj ∈ V \ {vk} be some non-leaf vertex in OUTk, vy ̸= vj an arbitrary vertex in the subtree of vj in OUTk, and vi ∈ V \ {vk}. Consider the path vi k−1 ⇝ vk k−1 ⇝ vj . 
If w(vi k−1⇝ vk k−1⇝ vj) ≥ w(vi k−1⇝ vj), then w(vi k−1⇝ vk k−1⇝ vy) ≥ w(vi k−1⇝ vy).

Proof. Since vj is neither a leaf nor the root of OUTk, we have j < k, and so vi k−1⇝ vj k−1⇝ vy is a (k − 1)-path between vi and vy. Because vi k−1⇝ vy is a shortest (k − 1)-path between vi and vy, we have

w(vi k−1⇝ vy) ≤ w(vi k−1⇝ vj k−1⇝ vy) = w(vi k−1⇝ vj) + w(vj k−1⇝ vy) ≤ w(vi k−1⇝ vk k−1⇝ vj) + w(vj k−1⇝ vy) = w(vi k−1⇝ vk k−1⇝ vj k−1⇝ vy),

where the last inequality follows by the assumption. Finally, since vy is in the subtree rooted at vj, we have vi k−1⇝ vk k−1⇝ vj k−1⇝ vy = vi k−1⇝ vk k−1⇝ vy, and so the last term is equal to w(vi k−1⇝ vk k−1⇝ vy). This completes the proof.

The pseudocode of the modified Floyd-Warshall algorithm augmented with the tree OUTk, named the Tree algorithm, is given in Algorithm 2. To perform the depth-first search we first construct the tree OUTk in line 3 using CONSTRUCTOUT, given in Algorithm 3. For the construction of the tree OUTk an additional matrix π is used, where πij specifies the penultimate vertex on a shortest k-path from vi to vj (i.e. the vertex "before" vj).1 More precisely, the tree OUTk is obtained from π by making vi a child of vπki for all i ≠ k. This takes O(n) time.

Algorithm 2 TREE(W)
1 Initialize π, an n × n matrix, as πij := i.
2 for k := 1 to n do
3   OUTk := CONSTRUCTOUTk(π)
4   for i := 1 to n do
5     Stack := empty
6     Stack.push(vk)
7     while Stack ≠ empty do
8       vx := Stack.pop()
9       for all children vj of vx in OUTk do
10        if Wik + Wkj < Wij then ▷ Relaxation
11          Wij := Wik + Wkj
12          πij := πkj
13          Stack.push(vj)

Algorithm 3 CONSTRUCTOUTk(π)
1 Initialize n empty trees: T1, . . . , Tn.
2 for i := 1 to n do
3   Ti.Root := vi
4 for i := 1 to n do
5   if i ≠ k then
6     Make Ti a subtree of the root of Tπki.
return Tk

1 Cf. π(k)ij in [7, Sec. 25.2].
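As an illustration, the construction of OUTk from π (Algorithm 3) might look as follows in Python; this is a sketch with 0-based indices and a child-list tree representation, not the authors' implementation:

```python
def construct_out(pi, k, n):
    """Build OUT_k as child-lists from the predecessor matrix pi.

    pi[k][i] is the penultimate vertex on the current shortest k-path
    from v_k to v_i, so v_i becomes a child of v_{pi[k][i]}.  Vertices
    are 0-based here.  Runs in O(n) time.
    """
    children = [[] for _ in range(n)]
    for i in range(n):
        if i != k:
            children[pi[k][i]].append(i)
    return children  # children[x] lists the children of v_x in OUT_k

# Hypothetical 4-vertex example: shortest paths from v0 go
# v0->v1, v0->v1->v2, v0->v2->v3, so pi[0] = [0, 0, 1, 2].
pi = [[0, 0, 1, 2], [1] * 4, [2] * 4, [3] * 4]
out0 = construct_out(pi, 0, 4)  # out0 == [[1], [2], [3], []]
```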
Finally, we replace the iterations in lines 3-5 of Algorithm 1 with the depth-first traversal of OUTk in lines 5-13 of Algorithm 2. Note that if, for a given i and a child vj, the condition in line 10 evaluates to false, we do not traverse the subtree of vj in OUTk.

Corollary 3.2. The Tree algorithm correctly computes all-pairs shortest paths.

Proof. The correctness of the algorithm follows directly from Lemma 3.1.

Time complexity

Let Tk denote the running time of the algorithm TREE(W) in lines 3-13 at iteration k. As already said, line 3 requires O(n) time. To estimate the time complexity of lines 4-13, we charge the vertex vx in line 8 by the number of its children. This pays for lines 9-13. Furthermore, this means that on the one hand leaves are charged nothing, while on the other hand nobody is charged for the root vk. To this end, let SP(k)k be the set of all shortest k-paths that contain vk and end at some vertex in the set {v1, v2, . . . , vk}. Now vx in line 8 is charged at most |SP(k)k| times over all iterations of i. Since the number of children of vx is bounded from above by ∆(OUTk), we can bound Tk from above by

Tk ≤ |SP(k)k| · ∆(OUTk) + O(n). (3.1)

Practical improvement

Observe that in Algorithm 2 the vertices of OUTk are visited in a depth-first search (DFS) order, which is facilitated by using the stack. However, this requires pushing and popping of each vertex, as well as reading of all its children in OUTk. We can avoid this by precomputing two read-only arrays dfs and skip to support the traversal of OUTk. The array dfs consists of the OUTk vertices as visited in the DFS order. On the other hand, the array skip is used to skip an OUTk subtree when the relaxation in line 10 of Algorithm 2 does not succeed. In detail, for a vertex vz, skipz contains the index in dfs of the first vertex after vz in the DFS order that is not a descendant of vz in OUTk.
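A sketch of this precomputation and of the resulting traversal, again in illustrative Python rather than the authors' code (relax_ok stands in for the relaxation test of line 10, and the tree is stored as child-lists):

```python
def precompute_dfs_skip(children, root):
    """Precompute the dfs order and the skip indices for one tree."""
    dfs, skip = [], {}
    def visit(v):
        dfs.append(v)
        for c in children[v]:
            visit(c)
        skip[v] = len(dfs)  # first dfs index past v's entire subtree
    visit(root)
    return dfs, skip

def traverse(dfs, skip, relax_ok):
    """Scan dfs left to right, skipping a subtree when relaxation fails."""
    visited, pos = [], 1  # position 0 is the root itself
    while pos < len(dfs):
        v = dfs[pos]
        visited.append(v)
        pos = pos + 1 if relax_ok(v) else skip[v]
    return visited

children = [[1, 3], [2], [], []]              # v0 -> v1 -> v2, v0 -> v3
dfs, skip = precompute_dfs_skip(children, 0)  # dfs == [0, 1, 2, 3]
order = traverse(dfs, skip, lambda v: v != 1) # v1 fails: its subtree is skipped
# order == [1, 3]
```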
Utilizing the arrays outlined above, we traverse OUTk by scanning dfs in left-to-right order and using skip whenever a relaxation is not made. Consequently, we perform only two read operations per visited vertex. It should be pointed out that the asymptotic time remains the same, as this is solely a technical optimization. 3.2 The Hourglass algorithm We can further improve the Tree algorithm by using another tree. The second tree, denoted by INk, is similar to OUTk, except that it is a shortest path “tree” for paths vi k−1 ⇝ vk for each vi ∈ V \ {vk}. Strictly speaking, this is not a tree, but if we reverse the directions of the arcs, it turns it into a tree with vk as the root. Traversal of INk is used as a replacement of the for loop on variable i in line 4 of Algorithm 2 (in line 2 of Algorithm 1). As the following lemma shows, if w(vi k ⇝vj) = w(vi k−1 ⇝ vj), then w(vy k ⇝vj) = w(vy k−1 ⇝ vj) for all vertices vy in the subtree of vi in INk. Lemma 3.3. Let vi ∈ V \ {vk} be some non-leaf vertex in INk and let vy ̸= vi be an arbi- trary vertex in the subtree of vi in INk, and vj ∈ V \{vk}. Consider the path vi k−1 ⇝ vk k−1 ⇝ vj . If w(vi k−1 ⇝ vk k−1 ⇝ vj) ≥ w(vi k−1 ⇝ vj), then w(vy k−1 ⇝ vk k−1 ⇝ vj) ≥ w(vy k−1 ⇝ vj). Proof. Due to the choice of vi and vy we have: vy k−1 ⇝ vk = vy k−1 ⇝ vi k−1 ⇝ vk. We want to show, that: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vk k−1 ⇝ vj). Observe that i < k, since vi is neither a leaf nor the root of INk. Thus we have: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vj). Putting these together we get the desired inequality: w(vy k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vj) ≤ w(vy k−1 ⇝ vi) + w(vi k−1 ⇝ vk k−1 ⇝ vj). The pseudocode of the modified Floyd-Warshall algorithm augmented with the trees OUTk and INk, named the Hourglass algorithm2, is given in Algorithms 4 and 5. 
To construct INk efficiently, we need to maintain an additional matrix ϕ, where ϕij stores the second vertex on the path from vi to vj (cf. π and πij).2 Algorithm 6 constructs INk similarly to the construction of OUTk, except that we use the entries ϕik instead. The only extra space requirement of the Hourglass algorithm that bears any significance is the matrix ϕ, which does not worsen the space complexity of O(n2). The depth-first traversal of INk is performed by a recursion on each child of vk in line 7 of Algorithm 4. In the recursive step, given in Algorithm 5, we can prune OUTk as follows: if vi is the parent of vy in INk and w(vi k−1⇝ vj) ≤ w(vi k−1⇝ vk k−1⇝ vj), then the subtree of vj can be removed from OUTk while inspecting the subtree of vi in INk. Before the return from the recursion, the tree OUTk is restored to the form in which it was passed as a parameter to the function.

Algorithm 4 HOURGLASS(W)
1 Initialize π, an n × n matrix, as πij := i.
2 Initialize ϕ, an n × n matrix, as ϕij := j.
3 for k := 1 to n do
4   OUTk := CONSTRUCTOUTk(π)
5   INk := CONSTRUCTINk(ϕ)
6   for all children vi of vk in INk do
7     RECURSEIN(W, π, ϕ, INk, OUTk, vi)

Algorithm 5 RECURSEIN(W, π, ϕ, INk, OUTk, vi)
1 Stack := empty
2 Stack.push(vk)
3 while Stack ≠ empty do
4   vx := Stack.pop()
5   for all children vj of vx in OUTk do
6     if Wik + Wkj < Wij then ▷ Relaxation
7       Wij := Wik + Wkj
8       πij := πkj
9       ϕij := ϕik
10      Stack.push(vj)
11    else
12      Remove the subtree of vj from OUTk.
13 for all children vi′ of vi in INk do
14   RECURSEIN(W, π, ϕ, INk, OUTk, vi′)
15 Restore OUTk by reverting the changes done by all iterations of line 12.

In practice, the recursion can be avoided by using an additional stack, which further speeds up an implementation of the algorithm.

2 The hourglass name comes from placing the INk tree atop the OUTk tree, which gives it an hourglass-like shape, with vk at the neck.

Corollary 3.4.
The Hourglass algorithm correctly computes all-pairs shortest paths.

Proof. Observe that lines 5-10 of Algorithm 5 are effectively the same as in Algorithm 2. Line 12 of Algorithm 5 does not affect the correctness of the algorithm due to Lemma 3.3, which states that, for any vi′ that is a child of vi in INk, these comparisons can be skipped, as they cannot lead to shorter paths. However, Lemma 3.3 does not apply to a sibling vi∗ of vi, arising from line 6 of Algorithm 4. Therefore line 15 restores the tree OUTk, which maintains the correctness of the algorithm.

Algorithm 6 CONSTRUCTINk(ϕ)
1 Initialize n empty trees: T1, . . . , Tn.
2 for i := 1 to n do
3   Ti.Root := vi
4 for i := 1 to n do
5   if i ≠ k then
6     Make Ti a subtree of the root of Tϕik.
return Tk

Finally, note that the worst-case time complexity of the Hourglass (and Tree) algorithm remains O(n3). The simplest example of this is when all shortest paths are the arcs themselves, at which point all leaves are children of the root and the tree structure never changes.

4 Properties of shortest k-paths in complete graphs

Let Kn denote a complete digraph on the vertex set V = {v1, v2, . . . , vn}.

4.1 Distances

We assume that the arc weights of Kn are exponential random variables with mean 1 and that all n(n − 1) random arc weights are independent. Due to the memoryless property, it is easier to deal with exponentially distributed arc weights than directly with uniformly distributed arc weights. The aim of this subsection is to show that the diameter of K(k)n, the subdigraph of Kn consisting of all (weighted) arcs that are shortest k-paths in Kn, is O(log n/k) with very high probability. We note, however, that by the same argument as given in the beginning of Subsection 4.3, all results derived in this subsection for exponential arc weights also hold, asymptotically, for [0, 1]-uniformly distributed arc weights as soon as k ≥ log2 n.
We start by considering for a fixed u ∈ V , the maximum distance in K(k)n between u and other vertices in V . To this end, let S = {u, v1, . . . , vk} ⊆ V , and let S = V \ S. We clearly have max v∈V D K (k) n (u, v) ≤ max v∈S DKn[S](u, v) + max v∈S DKn[S×S](S, v), (4.1) that is, the maximum distance in K(k)n between u and other vertices in V is bounded above by the sum of the maximum distance in Kn[S] between u and other vertices in S, and by the maximum distance in Kn[S × S] between S and vertices in S. We note that Kn[S] is a complete digraph on |S| vertices and Kn[S × S] is a complete bipartite digraph with bipartition (S, S). To provide an upper bound on maxv∈S DKn[S](u, v), we use the following result, which follows from the equation (2.8) in the proof of Theorem 1.1 of Janson [14]. A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 11 Theorem 4.1 ([14, Theorem 1.1]). Let u ∈ V be a fixed vertex of Kn. Then for every a > 0, we have P ( max v∈V DKn(u, v) ≥ a log n n ) = O(ean2−a log2 n). Lemma 4.2. Let 8 ≤ k ≤ n, and let S ⊆ V with |S| = k. Then, for a fixed u ∈ S and for any constant c > 0, we have P ( max v∈S DKn[S](u, v) ≥ c log n k ) = O(n2−c/2 log2 n). Proof. By Theorem 4.1, for any a > 0 we have P ( max v∈S DKn[S](u, v) ≥ a log k k ) = O(eak2−a log2 k). Setting a = c log n/ log k we get eak2−a log2 k = ec logn/ log kk2k−c logn/ log k log2 k ≤ (elogn)c/2k2(klogk n)−c log2 k. In the last step we used the fact that 1/ log k ≤ 1/2 for k ≥ 8 and that log n/ log k = logk n. Furthermore, (elogn)c/2k2(klogk n)−c log2 k = nc/2k2n−c log2 k = O(n2−c/2 log2 n), and the result follows. Next, we provide an upper bound on maxv∈S DKn[S×S](S, v). Lemma 4.3. Let 1 ≤ k ≤ n, let S ⊆ V with |S| = k, and let S = V \ S. Then for any constant c > 0, we have P ( max v∈S DKn[S×S](S, v) ≥ c log n k ) = O(n1−c log n). Proof. Let Z = maxv∈S DKn[S×S](S, v). 
Arguing similarly as in the proof of Theorem 1.1 of Janson [14], Z is distributed as n−1∑ j=k Xj , where Xj are independent exponentially distributed random variables with mean 1k(n−j) . First, for any constant c > 0, the Chernoff bound [12] states that P(Z ≥ c log n/k) ≤ e−tc lognE(ektZ). Further, for −∞ < t ≤ 1, we have E(ektZ) = n−1∏ j=k E(ektXj ) = n−1∏ j=k ( 1− t n− j )−1 . 12 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 Using the inequality − log(1 − x) ≤ x + x2 for all 0 ≤ x ≤ 1/2, we can bound, for all 0 < t < 1 and k ≤ j ≤ n− 2, each term (1− t/(n− j))−1 as follows( 1− t n− j )−1 = exp ( − log ( 1− t n− j )) ≤ exp ( t n− j + ( t n− j )2) . This gives us P(Z ≥ c log n/k) ≤ (1− t)−1 exp ( − tc log n+ n−2∑ j=k ( t n− j + ( t n− j )2)) = (1− t)−1 exp(−tc log n+ t log(n− k) +O(1)). Taking t = 1− 1/ log n, we finally get P(Z ≥ c log n/k) ≤ (1/ log n)−1 exp(−c log n+ log n+O(1)) = O(n1−c log n). We are now ready to show that the diameter of K(k)n is O(log n/k) with very high probability. Theorem 4.4. Let 8 ≤ k ≤ n. Then, for any constant c > 0, we have P ( max u,v∈V D K (k) n (u, v) ≥ c log n k ) = O(n3−c/4 log2 n). Proof. Let S = {u, v1, . . . , vk} ⊆ V , let S = V \ S, and write α = c log n/k. Then, by inequality (4.1), we have P ( max v∈V D K (k) n (u, v) ≥ α ) ≤ P ( max v∈S DKn[S](u, v) + max v∈S DKn[S×S](S, v) ≥ α ) ≤ P ( max v∈S DKn[S](u, v) ≥ α 2 ) + P ( max v∈S DKn[S×S](S, v) ≥ α 2 ) . By Lemma 4.2, we have P ( max v∈S DKn[S](u, v) ≥ α 2 ) = O(n2−c/4 log2 n), and, by Lemma 4.3, P ( max u∈S DKn[S×S](S, v) ≥ α 2 ) = O(n1−c/2 log n). Putting everything together, we get P ( max v∈V D K (k) n (u, v) ≥ α ) = O(n2−c/4 log2 n), which, by the union bound, implies P ( max u,v∈V D K (k) n (u, v) ≥ α ) ≤ nP ( max u∈V D K (k) n (v, u) ≥ α ) = O(n3−c/4 log2 n). A. Brodnik et al.: Modifications of the Floyd-Warshall algorithm with nearly quadratic . . . 
13 4.2 Lengths Let all arc weights of Kn be either independent [0, 1]-uniform random variables or inde- pendent exponential random variables with mean 1. In this subsection, we bound the length of the longest shortest k-path in Kn. The proof of our next lemma follows directly from Theorem 1.1 of Addario-Berry et. al [1] on the longest shortest path in Kn. Theorem 4.5 ([1, Theorem 1.1]). The following two properties hold: (i) For every t > 0, we have P ( max u,v∈V |u⇝v| ≥ α∗ log n+ t ) ≤ eα ∗+t/ logne−t, where α∗ ≈ 3.5911 is the unique solution of α logα− α = 1. (ii) E(maxu,v∈V |u⇝v|) = O(log n). Lemma 4.6. The following two properties hold: (i) For every c > 5 and 8 ≤ k ≤ n, we have P(maxu,v∈V |u k ⇝v| ≥ c log n) = O(n2−c/2). (ii) E(maxu,v∈V |u k ⇝v|) = O(log k). Proof. Let S = {v1, v2, . . . , vk}, and let u → w k ⇝z → v be a shortest k-path in Kn. Since w k ⇝z is a shortest path from w to z in Kn[S], we have max u,v∈V |u k⇝v| ≤ max w,z∈S |w k⇝z|+ 2. (4.2) By (i) of Theorem 4.5, for any t > 0, P ( max w,z∈S |w k⇝z| ≥ α∗ log k + t ) ≤ eα ∗+t/ log ke−t, where α∗ ≈ 3.5911 is the unique solution of α logα−α = 1. Using t = (c−α∗) log n−2 gives us P ( max w,z∈S |w k⇝z|+ 2 ≥ α∗ log(k/n) + c log n ) ≤ eα ∗−2/ log k+2e(c−α ∗)( log nlog k −logn) ≤ eα ∗−2/ log k+2(elogn)1/2(α ∗−c) = O(n2−c/2). By inequality (4.2), we have P ( max u,v∈V |u k⇝v| ≥ c log n ) ≤ P ( max w,z∈S |u k⇝v|+ 2 ≥ c log n ) ≤ P ( max w,z∈S |w k⇝z|+ 2 ≥ α∗ log(k/n) + c log n ) , and (i) follows. To prove (ii), we note that, by (ii) of Theorem 4.5, E(maxu,v∈S |u k ⇝v|) = O(log k), and, by inequality (4.2), the result follows. 14 Ars Math. Contemp. 22 (2022) #P1.01 / 1–22 4.3 Maximum outdegree Let arc weights of Kn be independent [0, 1]-uniform random variables. Our goal in this subsection is to show that the maximum outdegree of a shortest path tree OUTk in K (k) n is O(log k + (n− k)/k) with high probability for all k ≥ log2 n. Let now S = {v1, v2, . . . , vk} and S = V \ S. 
We can consider $\mathrm{OUT}_k$ as consisting of the subtree $\mathrm{OUT}_k[S]$ to which each vertex from $\bar{S}$ is attached as a leaf. To see how these vertices are attached to $\mathrm{OUT}_k[S]$, let us assume for the moment that the arc weights are exponentially distributed with mean 1. Then it is easy to see that a vertex $v \in \bar{S}$ is attached to the vertex in $S$ with which it forms a shortest arc, say $a_v$, between $S$ and $v$. Let $(K_n[S \times \bar{S}])^*$ be the subdigraph of $K_n[S \times \bar{S}]$ with vertex set $V$ and arc set $\{a_v \mid v \in \bar{S}\}$. By observing that $\mathrm{OUT}_k[S]$ is a subdigraph of the graph $(K_n[S])^{(k)}$ consisting of all arcs that are shortest paths in $K_n[S]$, we have
\[
\Delta(\mathrm{OUT}_k) \le \Delta((K_n[S])^{(k)}) + \Delta((K_n[S \times \bar{S}])^*). \tag{4.3}
\]
To extend the latter bound to the uniform distribution, we use a standard coupling argument as in [1]. Let $U$ be a random variable uniform on $[0,1]$. Then $-\log(1-U)$ is an exponential random variable with mean 1, and so we can couple the exponential arc weights $W'(u,v)$ to uniform arc weights $W(u,v)$ by setting $W'(u,v) = -\log(1 - W(u,v))$. As $x \le -\log(1-x) \le x + 2x^2$ for all $0 \le x \le 1/2$, we have that, for all arcs $(u,v)$ of $K_n$, $|W'(u,v) - W(u,v)| = O((W'(u,v))^2)$, uniformly for all $W'(u,v) \le 1/2$. In particular, if $W'(u,v) \le 12 \log n / k$, say, and $k \ge \log^2 n$, then $|W'(u,v) - W(u,v)| = O(1/\log^2 n)$ for $n$ large enough, and so for a path $P$ with $O(\log n)$ vertices and with $W'(P) \le 12 \log n / k$, we have $|W'(P) - W(P)| = O(1/\log n)$ for $n$ large enough. By Theorem 4.4, with very high probability a shortest $(k-1)$-path in $K_n$ with the exponential arc weights has weight less than $12 \log n / k$, while by (i) of Lemma 4.6, with very high probability it has $O(\log n)$ vertices. It then follows easily that, for all $n$ sufficiently large and $k \ge \log^2 n$, the bound (4.3) holds for the uniform distribution as well.

The following result on the maximum outdegree in the subgraph $(K_n[S])^{(k)}$ of the complete graph $K_n[S]$ on $k$ vertices with $[0,1]$-uniform arc weights can be found in Peres et al. [19].
Lemma 4.7 ([19, Lemma 5.1]). Let $1 \le k \le n$ and let $S \subseteq V$ with $|S| = k$. Then, for every $c > 6$, we have $P(\Delta((K_n[S])^{(k)}) > c \log k) = O(k^{1-c/6})$.

The maximum outdegree in $(K_n[S \times \bar{S}])^*$ is directly related to the maximum load in the balls-into-bins process, which is used in the proof of the following lemma.

Lemma 4.8. Let $1 \le k \le n$, let $S \subseteq V$ with $|S| = k$, and let $\bar{S} = V \setminus S$. Then
\[
P(\Delta((K_n[S \times \bar{S}])^*) \ge e^2((n-k)/k + \log k)) = O(k^{-1}).
\]

Proof. Consider vertices from $S$ as bins and vertices from $\bar{S}$ as balls. For $v \in \bar{S}$, each arc in $S \times v$ is equally likely to be the shortest, so $v$ is thrown into a bin chosen uniformly at random, and the result follows by Lemma 2.1 for $N = k$ and $M = n - k$. $\square$

We are now ready to prove the main result of this subsection.

Theorem 4.9. For every $k \ge \log^2 n$, we have
\[
P\left(\Delta(\mathrm{OUT}_k) \ge (e^2 + 12) \log k + e^2 \frac{n-k}{k}\right) = O(k^{-1}).
\]

Proof. Let $S = \{v_1, v_2, \dots, v_k\}$ and $\bar{S} = V \setminus S$. Further, let us write $\alpha = 12 \log k$ and $\beta = e^2((n-k)/k + \log k)$. By inequality (4.3), for every $k \ge \log^2 n$, we have
\[
P(\Delta(\mathrm{OUT}_k) \ge \alpha + \beta) \le P(\Delta((K_n[S])^{(k)}) + \Delta((K_n[S \times \bar{S}])^*) \ge \alpha + \beta) \le P(\Delta((K_n[S])^{(k)}) \ge \alpha) + P(\Delta((K_n[S \times \bar{S}])^*) \ge \beta).
\]
By Lemma 4.7, we have $P(\Delta((K_n[S])^{(k)}) \ge \alpha) \le 1/k$. Similarly, by Lemma 4.8, we have $P(\Delta((K_n[S \times \bar{S}])^*) \ge \beta) \le 1/k$. Hence, $P(\Delta(\mathrm{OUT}_k) \ge \alpha + \beta) \le 1/k + 1/k = O(1/k)$. $\square$

5 Expected-case analysis

We perform an expected-case analysis of the Tree algorithm for complete directed graphs on $n$ vertices with arc weights selected independently at random from the uniform distribution on $[0,1]$. Recall that $\mathrm{SP}_k^{(k)}$ is the set of all shortest $k$-paths that contain $v_k$ and end at some vertex in the set $\{v_1, v_2, \dots, v_k\}$. We first show that the expected number of paths in $\mathrm{SP}_k^{(k)}$ is $O(n \log k)$.

Lemma 5.1. For each $k = 1, 2, \dots, n$, we have $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$.

Proof.
For $v_i \in V$, let $\mathrm{SP}_i^{(k)}$ denote the set of all shortest $k$-paths that contain $v_i$ and end at some vertex in the set $\{v_1, v_2, \dots, v_k\}$. Note that
\[
\sum_{i=1}^{k} |\mathrm{SP}_i^{(k)}| \le \sum_{i=1}^{n} \sum_{j=1}^{k} |v_i \stackrel{k}{\rightsquigarrow} v_j|.
\]
By symmetry, we have $E(|\mathrm{SP}_i^{(k)}|) = E(|\mathrm{SP}_j^{(k)}|)$ for arbitrary $i, j \in \{1, 2, \dots, k\}$, and hence
\[
k \, E(|\mathrm{SP}_k^{(k)}|) = \sum_{i=1}^{k} E(|\mathrm{SP}_i^{(k)}|) \le \sum_{i=1}^{n} \sum_{j=1}^{k} E(|v_i \stackrel{k}{\rightsquigarrow} v_j|) \le kn \, E\left(\max_{u,v \in V} |u \stackrel{k}{\rightsquigarrow} v|\right).
\]
By (ii) of Lemma 4.6, we get that $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$. $\square$

We are now ready to analyse the expected time of the Tree algorithm.

Theorem 5.2. The Tree algorithm has an expected-case running time of $O(n^2 \log^2 n)$ for complete directed graphs on $n$ vertices with arc weights selected independently at random from the uniform distribution on $[0,1]$.

Proof. To estimate the number of comparisons $T_k$ at iteration $k$, we consider two cases. First, for $k < \log^2 n$ we bound $T_k$ from above by $n^2$. Second, we estimate $E(T_k)$ for $k \ge \log^2 n$. For every $c > 0$, we have
\[
E(T_k) = E(T_k \mid \Delta(\mathrm{OUT}_k) < c) \cdot P(\Delta(\mathrm{OUT}_k) < c) + E(T_k \mid \Delta(\mathrm{OUT}_k) \ge c) \cdot P(\Delta(\mathrm{OUT}_k) \ge c).
\]
Using inequality (3.1) we get
\[
E(T_k \mid \Delta(\mathrm{OUT}_k) < c) \le E(|\mathrm{SP}_k^{(k)}| \cdot \Delta(\mathrm{OUT}_k) + O(n) \mid \Delta(\mathrm{OUT}_k) < c) \le c \cdot E(|\mathrm{SP}_k^{(k)}|) + O(n).
\]
As $T_k$ is always at most $n^2$, we have $E(T_k \mid \Delta(\mathrm{OUT}_k) \ge c) \le n^2$. Further, taking into account that $P(\Delta(\mathrm{OUT}_k) < c) \le 1$, we get
\[
E(T_k) \le c \cdot E(|\mathrm{SP}_k^{(k)}|) + O(n) + n^2 \cdot P(\Delta(\mathrm{OUT}_k) \ge c).
\]
Take $c = (e^2 + 12) \log k + e^2 \frac{n-k}{k}$. Then, by Theorem 4.9, we have $P(\Delta(\mathrm{OUT}_k) \ge c) = O(k^{-1})$. Moreover, by Lemma 5.1, we have $E(|\mathrm{SP}_k^{(k)}|) = O(n \log k)$, which gives us
\[
E(T_k) = O((e^2 + 12) n \log^2 k + e^2 (n-k) n \log k / k) + O(n) + O(n^2 / k) = O(n \log^2 n + n^2 \log n / k).
\]
Putting everything together, we bound the expected time of the algorithm from above as
\[
E\left(\sum_{k=1}^{n} T_k\right) = \sum_{k=1}^{\log^2 n - 1} E(T_k) + \sum_{k=\log^2 n}^{n} E(T_k) \le \sum_{k=1}^{\log^2 n - 1} n^2 + \sum_{k=\log^2 n}^{n} O(n \log^2 n + n^2 \log n / k) = O(n^2 \log^2 n),
\]
as claimed. $\square$

We conclude the section with a proof of the main theorem.

Proof of Theorem 1.1.
The Hourglass algorithm does not have a worse bound than the Tree variant, so the result follows by Theorem 5.2. $\square$

6 Empirical evaluation

All algorithms were implemented in C++ and compiled using g++ -march=native -O3. The tests were run on an Intel(R) Core(TM) i7-2600 @ 3.40 GHz with 8 GB RAM running Windows 7 64-bit. To make the comparison between Floyd-Warshall and its modified versions fairer, we improved the Floyd-Warshall algorithm with a simple modification that skips combinations of $i$ and $k$ where $W_{ik} = \infty$, consequently reducing the number of relaxations of the algorithm to $R_{FW} \le n^3$.

The experiments were conducted on the following random digraphs: (i) uniform random digraphs with arc weights uniformly distributed on the interval $[0,1]$, and (ii) unweighted random digraphs. In both cases, the digraphs were constructed by first setting the desired vertex count and density. Then a random Hamiltonian cycle was constructed, ensuring the strong connectivity of the digraph. After the cycle was constructed, the remaining $n(n-2)$ arcs were put into a list and randomly permuted, and then added to the digraph until the desired density was reached. Finally, the algorithms were executed on the instance and their running times were recorded. Tests were conducted ten times and averaged, with each test running on a different randomly generated graph.

6.1 Empirical comparison of the number of relaxations

Our motivation when designing the Hourglass and Tree algorithms was to skip relaxations that do not contribute to the result. To verify the theoretical results on the expected number of relaxations in practice, we conducted two experiments in which we counted the number of relaxations performed by the different algorithms. For the first experiment we generated a subfamily of digraphs from (i), mentioned above, consisting of complete digraphs with vertex sets of varying size.
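The instance construction described above (random Hamiltonian cycle for strong connectivity, then randomly permuted remaining arcs up to the target density) can be sketched as follows. This is an illustrative Python sketch with a hypothetical function name; the actual experiments were implemented in C++.

```python
import random

def random_digraph(n, m, weighted=True, seed=0):
    """Build a strongly connected random digraph with n vertices and m arcs
    (n <= m <= n*(n-1)): first a random Hamiltonian cycle, then randomly
    permuted remaining arcs added until the target arc count m is reached.
    Returns a dict mapping arcs (u, v) to weights."""
    rng = random.Random(seed)
    verts = list(range(n))
    rng.shuffle(verts)
    # Random Hamiltonian cycle ensures strong connectivity.
    arcs = {(verts[i], verts[(i + 1) % n]) for i in range(n)}
    # Remaining n*(n-2) arcs in random order.
    rest = [(u, v) for u in range(n) for v in range(n)
            if u != v and (u, v) not in arcs]
    rng.shuffle(rest)
    for a in rest:
        if len(arcs) >= m:
            break
        arcs.add(a)
    # Family (i): uniform [0, 1) weights; family (ii): unweighted (weight 1).
    return {a: (rng.random() if weighted else 1.0) for a in arcs}

g = random_digraph(64, 500)
```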
In contrast, for the second experiment we generated another subfamily of digraphs from (i), now consisting of sparse digraphs with a fixed vertex set and variable arc density. The results of the experiments are presented in plots relative to the number of relaxations performed by the Floyd-Warshall algorithm; i.e., all numbers of relaxations are divided by $R_{FW}$.

The results of the first experiment, in which $R_{FW} = n^3$ since the digraphs are complete, are presented in Figure 1. To relate the theoretical upper bound of $O(n^2 \lg^2 n)$ for the Tree algorithm to the experimental results, we also added the plot of the function $\frac{60 n^2 \lg^2 n}{n^3}$. We chose the constant 60 so that the plots of the Tree algorithm and the added function start at the same initial point, namely at $2^8$ vertices.

Figure 1: Complete digraphs of various sizes with the number of relaxations of the algorithms divided by $n^3$ (plots for Tree, Hourglass, and the reference curve $\frac{60 n^2 \lg^2 n}{n^3}$).

The results of the second experiment for $n = 1024$ vertices and sizes of the arc set varying between $n^2/10$ and $8n^2/10$ are shown in Figure 2.

Figure 2: Digraphs with $n = 1024$ vertices and various arc densities with the number of relaxations of the algorithms divided by $R_{FW}$ (plots for Tree and Hourglass).

In Figure 1 we see a significant reduction in relaxations, which also implies a decrease in the running time of the Tree and Hourglass algorithms. The plot also indicates that the theoretical upper bound for the Tree algorithm is asymptotically tight. The experiments on sparse digraphs (see Figure 2) likewise show a reduction in relaxations as the digraphs become sparser.
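The baseline count $R_{FW}$ used throughout this subsection refers to the Floyd-Warshall variant with the $W_{ik} = \infty$ skip described at the start of Section 6. A minimal sketch of that baseline with a relaxation counter (illustrative Python; the experiments used C++, and the example matrix is our own):

```python
INF = float("inf")

def floyd_warshall_skip(W):
    """Floyd-Warshall with the modification described above: combinations of
    i and k with no finite i -> k distance are skipped, so at most
    R_FW <= n^3 relaxations are performed. W is an n x n matrix with
    W[i][i] == 0. Returns (distance matrix, number of relaxations)."""
    n = len(W)
    D = [row[:] for row in W]
    relaxations = 0
    for k in range(n):
        for i in range(n):
            if D[i][k] == INF:
                continue  # no path i -> k yet; relaxing via k is useless
            for j in range(n):
                relaxations += 1
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D, relaxations

W = [[0, 3, INF],
     [INF, 0, 1],
     [1, INF, 0]]
D, R = floyd_warshall_skip(W)
```

On this instance one inner loop is skipped (row 1 at k = 0), so R = 24 < 27 = n³.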
6.2 Empirical comparison of running times

As discussed in the introduction, we compared the Tree³ and Hourglass algorithms with the Floyd-Warshall [11, 26] and Dijkstra [10] algorithms, as well as with the algorithm of Brodnik and Grgurovič [4], which we refer to as Propagation. These algorithms were chosen because they proved to perform best out of a larger group of algorithms compared in [4]. It should be pointed out that breadth-first search is extremely efficient at solving APSP on unweighted digraphs. However, we did not include breadth-first search in the comparisons, because we consider unweighted graph instances only as worst-case instances of the general shortest path problem (each arc is part of at least one shortest path in such instances).

The algorithms were tested on the graph families (i) and (ii) described at the beginning of this section, with sizes of the vertex set varying between 512 and 4096, and sizes of the arc set varying between $n^{1.1}$ and $n^2$. As the priority queue in the Dijkstra and Propagation algorithms we used pairing heaps, since they are known to perform especially well for APSP in practice [22], even though the amortized complexity of their decrease-key operation is $O(2^{2\sqrt{\lg \lg n}})$ in comparison to $O(1)$ for Fibonacci heaps [21]. We used the implementation of pairing heaps from the Boost library, version 1.55.

³In the tests, we used the implementation of the algorithm with the improvements from Subsection 3.1.

The results for uniform random digraphs, presented in Figure 3, show that both Propagation and Tree outperform the other algorithms at all vertex counts and arc densities. As the size $n$ of the graphs increases, the running time of Hourglass approaches the running time of Tree, but the constant factors still prove too large for Hourglass to prevail despite its more clever exploration strategy.
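For reference, the Dijkstra baseline in the comparison solves APSP by one single-source computation per vertex. The sketch below is illustrative only: it uses Python's standard binary heap with lazy deletion in place of the C++ pairing heaps with decrease-key actually used in the experiments, and the small example graph is our own.

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest distances from s; adj[u] = list of (v, w), w >= 0."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; heapq has no decrease-key, so we skip it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# APSP by running Dijkstra from every vertex.
adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0)], 2: [(0, 1.0)]}
apsp = {s: dijkstra(adj, s) for s in adj}
```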
Moreover, it is interesting to see that the Floyd-Warshall-based Tree and Hourglass outperform Dijkstra on sparse graphs.

Figure 3: Experimental results on family (i) – uniform digraphs (running time in milliseconds versus arc density, for n = 512, 1024, 2048 and 4096 vertices; algorithms: Dijkstra, Propagation, Floyd-Warshall, Tree, Hourglass).

The results for unweighted random digraphs are shown in Figure 4. Interestingly, Tree and Hourglass remain competitive with Dijkstra, and in some instances even outperform it on smaller graphs. In contrast, the performance of Propagation falls short of Dijkstra, because each arc is part of at least one shortest path in these graphs.

Figure 4: Experimental results on family (ii) – unweighted digraphs (running time in milliseconds versus arc density, for n = 512, 1024, 2048 and 4096 vertices; algorithms: Dijkstra, Propagation, Floyd-Warshall, Tree, Hourglass).

7 Conclusion

We theoretically analyzed the Tree algorithm, which is a relatively simple modification of the Floyd-Warshall algorithm. The analysis gives its expected-case time complexity in the uniform model as $O(n^2 \log^2 n)$, which also explains the algorithm's good practical performance presented in Section 6.
We also presented the Hourglass algorithm as a further improvement of the Tree algorithm, but it remains an open question whether its expected-case time complexity in the uniform model is $o(n^2 \log^2 n)$. Next, since both the Tree and Hourglass algorithms allow negative arc weights, it would be interesting to analyze their expected-case running time for a model that permits negative arcs, such as the vertex potential model [5, 6].

Overall, the Tree algorithm is simple to implement and offers very good performance. The Hourglass algorithm has the potential to be even better but probably requires a more complex implementation. It is also worthwhile to note that the space requirement of the Tree algorithm is not worse than that of any algorithm that reports all shortest paths. The Hourglass algorithm requires an additional matrix of size $n^2$.

ORCID iDs

Andrej Brodnik: https://orcid.org/0000-0001-9773-0664
Rok Požar: https://orcid.org/0000-0002-2037-9535

References

[1] L. Addario-Berry, N. Broutin and G. Lugosi, The longest minimum-weight path in a complete graph, Comb. Probab. Comput. 19 (2010), 1–19, doi:10.1017/s0963548309990204.

[2] R. K. Ahuja, T. L. Magnanti and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice-Hall, Inc., USA, 1993.

[3] P. A. Bloniarz, A shortest-path algorithm with expected time $O(n^2 \log n \log^* n)$, SIAM J. Comput. 12 (1983), 588–600, doi:10.1137/0212039.

[4] A. Brodnik and M. Grgurovič, Solving all-pairs shortest path by single-source computations, Discrete Appl. Math. 231 (2017), 119–130, doi:10.1016/j.dam.2017.03.008.

[5] B. V. Cherkassky, A. V. Goldberg and T. Radzik, Shortest paths algorithms: theory and experimental evaluation, Math. Programming 73 (1996), 129–174, doi:10.1016/0025-5610(95)00021-6.

[6] C. Cooper, A. Frieze, K. Mehlhorn and V.
Priebe, Average-case complexity of shortest-paths problems in the vertex-potential model, Random Struct. Algorithms 16 (2000), 33–46, doi:10.1002/(sici)1098-2418(200001)16:1⟨33::aid-rsa3⟩3.0.co;2-0.

[7] T. H. Cormen, C. Stein, R. L. Rivest and C. E. Leiserson, Introduction to Algorithms, McGraw-Hill Higher Education, 2nd edition, 2001.

[8] C. Demetrescu and G. F. Italiano, A new approach to dynamic all pairs shortest paths, J. ACM 51 (2004), 968–992, doi:10.1145/1039488.1039492.

[9] C. Demetrescu and G. F. Italiano, Experimental analysis of dynamic all pairs shortest path algorithms, ACM Trans. Algorithms 2 (2006), 578–601, doi:10.1145/1198513.1198519.

[10] E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1 (1959), 269–271, doi:10.1007/bf01386390.

[11] R. W. Floyd, Algorithm 97: Shortest path, Commun. ACM 5 (1962), 345, doi:10.1145/367766.368168.

[12] T. Hagerup and C. Rüb, A guided tour of Chernoff bounds, Inf. Process. Lett. 33 (1990), 305–308, doi:10.1016/0020-0190(90)90214-I.

[13] Y. Han and T. Takaoka, An $O(n^3 \log \log n / \log^2 n)$ time algorithm for all pairs shortest paths, in: Proceedings of the 13th Scandinavian Conference on Algorithm Theory (SWAT '12), Springer-Verlag, Berlin, Heidelberg, 2012, pp. 131–141, doi:10.1007/978-3-642-31155-0_12.

[14] S. Janson, One, two and three times $\log n / n$ for paths in a complete graph with random weights, Combin. Probab. Comput. 8 (1999), 347–361, doi:10.1017/s0963548399003892.

[15] D. B. Johnson, Efficient algorithms for shortest paths in sparse networks, J. ACM 24 (1977), 1–13, doi:10.1145/321992.321993.

[16] D. Karger, D. Koller and S. J. Phillips, Finding the hidden path: time bounds for all-pairs shortest paths, SIAM J. Comput. 22 (1993), 1199–1217, doi:10.1137/0222071.

[17] K. Mehlhorn and V. Priebe, On the all-pairs shortest-path algorithm of Moffat and Takaoka, Random Struct.
Algorithms 10 (1997), 205–220, doi:10.1002/(sici)1098-2418(199701/03)10:1/2⟨205::aid-rsa11⟩3.3.co;2-J.

[18] A. Moffat and T. Takaoka, An all pairs shortest path algorithm with expected time $O(n^2 \log n)$, SIAM J. Comput. 16 (1987), 1023–1031, doi:10.1137/0216065.

[19] Y. Peres, D. Sotnikov, B. Sudakov and U. Zwick, All-pairs shortest paths in $O(n^2)$ time with high probability, J. ACM 60 (2013), Article No. 26 (25 pages), doi:10.1145/2508028.2505988.

[20] S. Pettie, A new approach to all-pairs shortest paths on real-weighted graphs, Theor. Comput. Sci. 312 (2004), 47–74, doi:10.1016/s0304-3975(03)00402-x.

[21] S. Pettie, Towards a final analysis of pairing heaps, in: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), IEEE Computer Society, Washington, DC, USA, 2005, pp. 174–183, doi:10.1109/sfcs.2005.75.

[22] S. Pettie, V. Ramachandran and S. Sridhar, Experimental evaluation of a new shortest path algorithm, in: Proceedings of the 4th International Workshop on Algorithm Engineering and Experiments (ALENEX '02), Springer-Verlag, Berlin, Heidelberg, 2002, pp. 126–142, doi:10.1007/3-540-45643-0_10.

[23] M. Raab and A. Steger, "Balls into bins" – a simple and tight analysis, in: Randomization and Approximation Techniques in Computer Science (Barcelona, 1998), volume 1518 of Lecture Notes in Computer Science, Springer, Berlin, 1998, pp. 159–170, doi:10.1007/3-540-49543-6_13.

[24] P. M. Spira, A new algorithm for finding all shortest paths in a graph of positive arcs in average time $O(n^2 \log^2 n)$, SIAM J. Comput. 2 (1973), 28–32, doi:10.1137/0202004.

[25] T. Takaoka and A. Moffat, An $O(n^2 \log n \log \log n)$ expected time algorithm for the all shortest distance problem, in: Mathematical Foundations of Computer Science 1980, volume 88 of Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 1980, pp. 643–655, doi:10.1007/bfb0022539 (proceedings of the 9th Symposium held in Rydzyna, Poland, September 1–5, 1980).

[26] S.
Warshall, A theorem on Boolean matrices, J. ACM 9 (1962), 11–12, doi:10.1145/321105.321107.