Communication-Optimal Parallel Minimum Spanning Tree Algorithms
Ben H.H. Juurlink,
Petr Kolman,
Friedhelm Meyer auf der Heide,
Ingo Rieping
June 2000
Abstract
In this paper matching upper and lower bounds for broadcast on
general purpose parallel computation models that exploit network
locality are proven.
These models try to capture both the general purpose properties of models
like the PRAM or BSP on the one hand, and to exploit network locality
of special purpose models like meshes, hypercubes, etc.,
on the other hand.
They do so by charging a cost $l(|i-j|)$ for a communication
between processors $i$ and $j$, where $l$ is a suitably chosen latency
function.
An upper bound $T(p) = \sum_{i=0}^{\log \log p} 2^i \cdot l(p^{1/2^i})$
on the runtime of a broadcast on a $p$~processor H-PRAM is given,
for an arbitrary latency function $l(k)$.
The main contribution of the paper is a matching lower bound, holding
for all latency functions in the range $l(k)=\Omega(
\log k / \log \log k)$ and $l(k)=O(\log^2 k)$.
This is not a severe
restriction since for latency functions
$l(k) = O(\log k/ \log^{1+\epsilon} \log (k))$
with arbitrary $\epsilon > 0$,
the runtime of the algorithm matches the trivial lower bound
$\Omega(\log p)$
and for
$l(k) = \Theta (\log^{1+\epsilon} k)$ or
$l(k)=\Theta(k^\epsilon)$,
the runtime matches the other trivial lower bound
$\Omega(l(p))$.
Both upper and lower bounds apply for other parallel locality models like
H-PRAM, Y-PRAM, D-BSP and E-BSP, too.