Communication-Optimal Parallel Minimum Spanning Tree Algorithms

Ben H.H. Juurlink, Petr Kolman, Friedhelm Meyer auf der Heide, Ingo Rieping

June 2000


Abstract

In this paper matching upper and lower bounds for broadcast on general purpose parallel computation models that exploit network locality are proven. These models try to capture both the general purpose properties of models like the PRAM or BSP on the one hand, and to exploit network locality of special purpose models like meshes, hypercubes, etc., on the other hand. They do so by charging a cost $l(|i-j|)$ for a communication between processors $i$ and $j$, where $l$ is a suitably chosen latency function.

An upper bound $T(p) = \sum_{i=0}^{\log \log p} 2^i \cdot l(p^{1/2^i})$ on the runtime of a broadcast on a $p$~processor H-PRAM is given, for an arbitrary latency function $l(k)$.

The main contribution of the paper is a matching lower bound, holding for all latency functions in the range $l(k)=\Omega( \log k / \log \log k)$ and $l(k)=O(\log^2 k)$. This is not a severe restriction since for latency functions $l(k) = O(\log k/ \log^{1+\epsilon} \log (k))$ with arbitrary $\epsilon > 0$, the runtime of the algorithm matches the trivial lower bound $\Omega(\log p)$ and for $l(k) = \Theta (\log^{1+\epsilon} k)$ or $l(k)=\Theta(k^\epsilon)$, the runtime matches the other trivial lower bound $\Omega(l(p))$. Both upper and lower bounds apply for other parallel locality models like H-PRAM, Y-PRAM, D-BSP and E-BSP, too.