Shape Optimizing Graph Partitioning
If you are looking for graph sequences that can be used for benchmarking load balancers, see the section "Graph Sequences as Repartitioning Benchmarks" below for more information.
Load Balancing in Parallel Adaptive Numerical Simulations
Computer-based numerical simulations often play an important role in the development of complex technical products. Such simulations require a discretization of the underlying mathematical problem description. The established Finite Element Method dissects the simulation area into simple geometric elements, e.g. tetrahedra, and approximates the solution of the partial differential equations at the corner points. The finer this discretization, the better the approximation quality, but more elements also mean a higher computational effort. Modern simulations involve many millions of elements, so a fast calculation can only be accomplished on a parallel computer.

The efficient usage of a parallel system requires an even distribution of the work load onto the processors. At the same time, the additional data transfer induced between the processors should be kept minimal due to the relatively high communication costs.
Since the approximation at a corner point relies on the data of all adjacent points, neighboring elements should preferably be placed on the same processor. Furthermore, during a simulation the discretization can turn out to be not fine enough in certain areas. To obtain a more exact solution, the number of elements in these areas has to be increased. This adaptation often results in a work load imbalance, so that a rebalancing according to the criteria above becomes necessary. Additionally, as few elements as possible should be migrated to other processors, because this operation is comparatively expensive.
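As a toy illustration of this migration cost (the function name is ours, not from the project's software), the number of migrated elements can be counted by comparing the old and new processor assignments:

```python
# Hypothetical sketch: migration volume of a rebalancing step.
# Each element (vertex) is assigned to a processor before and after
# repartitioning; the migration cost is commonly modeled as the number
# of elements that change their processor.
def migration_volume(old_part, new_part):
    """Count elements assigned to a different processor after rebalancing."""
    return sum(1 for o, n in zip(old_part, new_part) if o != n)

old = [0, 0, 1, 1, 2, 2]
new = [0, 0, 1, 2, 2, 2]  # one element moved from processor 1 to 2
print(migration_volume(old, new))  # -> 1
```

A repartitioner thus has to trade this count off against balance and communication quality.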
The relationships between the elements can be modeled as a graph, where the computations are represented by vertices and the data dependencies by edges. A common method to distribute the computational load onto the processors is to divide the vertices of this graph into equally sized sets (partitions) such that as few edges as possible connect vertices placed in different partitions. This matches the classical graph partitioning problem.
However, the actual amount of communication is represented much more accurately by the number of nodes on the partition boundaries. We have therefore developed a new technique that computes partitions with good shapes, as round as the simulation area permits. This very often yields connected partitions with a small number of boundary nodes. Since our approach, in contrast to many other methods, is well-suited for incrementally improving a given partitioning, it rebalances with small migration costs at the same time. The traditional method (left) and our new approach (right) can be compared in the adjacent figures: the latter computes connected and more compact partitions with shorter and smoother boundaries.
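The two metrics can be contrasted on a small example. The sketch below (illustrative code, not the authors' implementation) computes both the edge-cut and the number of boundary nodes for a given partition of an undirected graph:

```python
# Illustrative sketch: classical edge-cut vs. boundary-node count
# for a partition of an undirected graph given as an adjacency list.
def edge_cut(adj, part):
    """Number of edges whose endpoints lie in different partitions."""
    cut = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if u < v and part[u] != part[v]:  # count each edge once
                cut += 1
    return cut

def boundary_vertices(adj, part):
    """Vertices with at least one neighbor in another partition."""
    return {u for u, nbrs in adj.items()
            if any(part[u] != part[v] for v in nbrs)}

# 4-cycle 0-1-2-3 split into partitions {0,1} and {2,3}
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(adj, part))                # -> 2
print(len(boundary_vertices(adj, part)))  # -> 4
```

Even on this tiny cycle the two metrics differ, which is why optimizing only the edge-cut can misestimate the real communication volume.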
Graph Sequences as Repartitioning Benchmarks
In our repartitioning experiments we have used graph sequences as benchmarks that have been generated to resemble adaptive numerical simulations in 2D. Twelve small instances were used; they are described in more detail, and can be downloaded, here. Moreover, we have used the three large instances bigtric, bigtrace, and bigbubbles, which have 46 or 101 frames and between 100,000 and 1,000,000 nodes per frame. They are similar in structure to their smaller counterparts without the "big" prefix, and the file structure is the same. New larger sequences are now available: biggerslowtric. You can download the data by clicking on the instance names.
Results
The results of the load balancing experiments are very promising. The solutions of our shape-optimizing methods are usually better than those computed by state-of-the-art tools. Details will follow.
When partitioning static graphs, we have improved the best known partitions (with respect to the edge-cut) for six of the eight largest graphs of Chris Walshaw's partition archive, which is widely used for benchmarking. The tables below show our improved edge-cut results (followed by the previously best known results in parentheses) as of 4 Oct 2007. By clicking on an edge-cut value the corresponding partition can be downloaded.
Note that our algorithm does not explicitly optimize the edge-cut or any other single metric. It rather aims at finding partitions with good shapes; this notion includes connectedness, low diameter, and few boundary vertices. It is achieved by two diffusive approaches which identify dense regions of graphs. In contrast, the edge-cut is the main optimization criterion of most state-of-the-art partitioners. Hence, a large amount of benchmark data is available for this metric, even though it is widely acknowledged that it is not necessarily the best criterion. It should also be noted that our program is still significantly slower than state-of-the-art libraries for minimizing the edge-cut in graph partitions (like Metis and Jostle). Current work therefore includes a further acceleration of our algorithm to reduce this runtime gap.
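The intuition behind the diffusive approach can be shown with a much simplified sketch (this is not the actual disturbed-diffusion scheme used in our work): load placed on a seed vertex spreads along edges, and after a few steps the vertices in the same dense region as the seed retain more load than vertices behind a sparse cut.

```python
# Simplified first-order diffusion on an undirected graph given as an
# adjacency list. This only illustrates the intuition that diffusion
# identifies dense regions; the actual algorithms embed a disturbed
# diffusion scheme in a multilevel framework.
def diffuse(adj, seed, steps=5, alpha=0.2):
    load = {u: 0.0 for u in adj}
    load[seed] = 1.0
    for _ in range(steps):
        # each edge exchanges load proportional to the load difference
        load = {u: load[u] + alpha * sum(load[v] - load[u] for v in nbrs)
                for u, nbrs in adj.items()}
    return load

# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3):
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
load = diffuse(adj, seed=0)
region_a = sum(load[u] for u in (0, 1, 2))
region_b = sum(load[u] for u in (3, 4, 5))
# load concentrates in the seed's dense region: region_a > region_b
```

Because the exchange along each edge is symmetric, the total load is conserved; the sparse cut between the two triangles simply slows the flow out of the seed's region, which is what makes dense regions detectable.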
News
- 10 July 2007: All partition files are now up to date. They can be downloaded separately from the tables below or from this package. Total number of improvements: 63.
- 11 July 2007: Chris Walshaw has added new results to his archive today; some of our values for the graph fe_tooth have been superseded by this. These changes will be incorporated into our tables soon. New total number of improvements: 57.
- 12 July 2007: After some changes in our implementation directed towards an even higher quality, we have computed new improved results. They are stored in this package file. The table values are now all updated. New total number of improvements: 75 (more than half of all 144 partitionings for these six graphs).
- 4 Oct 2007: Some records have been lost, more have been gained. The new ones are stored in this package file. New total number of improvements: 85.
- 9 Dec 2007: Our paper "A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions of Very High Quality", containing our recent results described on this page, has been accepted at IPDPS'08 as Best Algorithms Paper!
- 21 Sep 2010: If you would like to have a copy of our parallel DibaP software, please sign the license and send a scanned version to our developer mailing list dibap at lists dot upb dot de. After that, we will send you the current version of the software package.
- 24 Sep 2010: The DibaP-lite library is now available. DibaP-lite implements our diffusion-based graph partitioning algorithms, but does not have all features contained in the original DibaP implementation. DibaP-lite is sequential, but also contains thread parallelism for compute-intensive routines. Please send an email to our developer mailing list dibap at lists dot upb dot de to receive the code.
Improved edge-cut results for balanced partitions (0% imbalance), as of 4 Oct 2007

graph    | 4             | 8             | 16            | 32              | 64
fe_tooth | 7325 (7327)   | 12175 (12354) | 18391 (18804) | 26346 (26619)   |
598a     | 8243 (8277)   | 16884 (17223) | 27404 (28152) | 41538 (42088)   | 59498 (59708)
144      |               | 26762 (27751) | 39568 (39856) | 58599 (58659)   | 81973 (82123)
wave     |               | 31697 (32204) | 44711 (47277) | 65772 (66891)   | 88986 (91784)
m14b     | 13564 (13597) | 26989 (28128) | 44541 (45362) | 68027 (68468)   | 101551 (103946)
auto     | 28324 (29409) | 48901 (53326) | 81500 (83748) | 125477 (131236) | 176435 (182934)
All 0% partition files: part_0.tgz (as of 10 July 2007)
Improved edge-cut results for partitions with up to 1% imbalance, as of 4 Oct 2007

graph    | 4             | 8             | 16            | 32              | 64
fe_tooth |               | 12060 (12329) | 18285 (18435) | 25977 (26385)   |
598a     | 8197 (8222)   |               | 27009 (27344) | 40962 (41124)   | 59098 (59152)
144      |               |               | 39352 (39856) | 58126 (58659)   |
wave     |               | 31883 (31965) | 44711 (46903) | 64860 (65926)   | 88863 (91304)
m14b     | 13403 (13430) | 26989 (28128) | 44330 (44515) | 67770 (68468)   | 101551 (101980)
auto     | 27790 (29158) | 48442 (53158) | 81339 (82901) | 124991 (130818) | 175975 (182890)
All 1% partition files: part_1.tgz (as of 10 July 2007)
Improved edge-cut results for partitions with up to 3% imbalance, as of 4 Oct 2007

graph    | 4             | 8             | 16            | 32              | 64
fe_tooth |               | 11957 (12177) |               |                 |
598a     |               |               |               | 40718 (40792)   |
144      |               |               |               | 57354 (58659)   | 80767 (80894)
wave     | 17407 (17599) | 29776 (30895) | 43791 (46487) | 63675 (64953)   | 87957 (88383)
m14b     |               | 26765 (27711) | 43962 (44174) | 67551 (68468)   | 101019 (101980)
auto     | 26509 (29158) | 48263 (48329) | 78901 (82901) | 124251 (130310) | 174904 (180562)
All 3% partition files: part_3.tgz (as of 10 July 2007)
Improved edge-cut results for partitions with up to 5% imbalance, as of 4 Oct 2007

graph    | 4             | 8             | 16            | 32              | 64
fe_tooth |               | 11957 (12141) |               |                 |
598a     |               |               |               |                 |
144      |               |               |               | 57839 (58142)   | 80257 (80445)
wave     | 17306 (17320) | 29912 (30583) | 44355 (44625) | 63675 (63725)   | 87654 (88383)
m14b     |               | 26719 (27711) | 43747 (44174) | 67551 (68468)   | 100183 (101385)
auto     | 26298 (29158) | 48174 (48221) | 78901 (82901) | 124251 (127433) | 173205 (179464)
All 5% partition files: part_5.tgz (as of 10 July 2007)
Applications:
Supported by:
PaSCo graduate school (DFG GRK 693)
DFG Sonderforschungsbereich 376
DFG SPP 1307 "Algorithm Engineering"
AEOLUS (funded by EU)
Contact:
Dr. Henning Meyerhenke
Tel.: +49 (0) 52 51 60 67 30
henningm (at) upb.de
http://www.upb.de/cs/henningm
Joint work with:
 Stefan Schamberger
 Burkhard Monien
 Thomas Sauerwald
Publications:
- H. Meyerhenke, J. Gehweiler: On Dynamic Graph Partitioning and Graph Clustering using Diffusion. In: Algorithm Engineering, 27.06.–02.07.2010. Dagstuhl Seminar Proceedings 10261, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2010.
- H. Meyerhenke: Beyond Good Shapes: Diffusion-based Graph Partitioning is Relaxed Cut Optimization. Accepted for publication in Proc. 21st International Symposium on Algorithms and Computation (ISAAC'10), 2010.
- J. Gehweiler, H. Meyerhenke: A Distributed Diffusive Heuristic for Clustering a Virtual P2P Supercomputer. To appear in Proc. 7th High-Performance Grid Computing Workshop (HPGC'10), in conjunction with 24th IEEE Internatl. Parallel and Distributed Processing Symposium (IPDPS'10), IEEE, 2010.
- H. Meyerhenke: Dynamic Load Balancing for Parallel Numerical Simulations based on Repartitioning with Disturbed Diffusion. In Proc. 15th Internatl. Conference on Parallel and Distributed Systems (ICPADS'09). IEEE, 2009.
- H. Meyerhenke, B. Monien, S. Schamberger: Graph Partitioning and Disturbed Diffusion. Parallel Computing, 35(10–11):544–569, 2009.

- H. Meyerhenke: Disturbed Diffusive Schemes for Solving Partitioning Problems on Graphs. Dissertation, University of Paderborn, April 2008. [abstract (en)] [abstract (de)] [bibtex] [pdf]
- H. Meyerhenke, B. Monien, T. Sauerwald: A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions. Journal of Parallel and Distributed Computing, 69(9):750–761, 2009. Best Paper Awards and Panel Summary: 22nd International Parallel and Distributed Processing Symposium (IPDPS 2008).
- H. Meyerhenke, B. Monien, T. Sauerwald: A New Diffusion-based Multilevel Algorithm for Computing Graph Partitions of Very High Quality. In Proc. 22nd IEEE Internatl. Parallel and Distributed Processing Symposium (IPDPS'08). Winner of the Best Algorithms Paper Award.
- H. Meyerhenke, T. Sauerwald: Analyzing Disturbed Diffusion on Networks. In Proc. 17th International Symposium on Algorithms and Computation (ISAAC'06), LNCS 4288, pp. 429–438. Springer-Verlag, 2006. [pdf at SpringerLink]
- H. Meyerhenke, S. Schamberger: A Parallel Shape Optimizing Load Balancer. In Proc. 12th Int. Euro-Par Conf. 2006, LNCS 4128, pp. 232–242. Springer-Verlag, 2006. [pdf at SpringerLink]
- H. Meyerhenke, B. Monien, S. Schamberger: Accelerating Shape Optimizing Load Balancing for Parallel FEM Simulations by Algebraic Multigrid. In Proc. 20th IEEE Internatl. Parallel and Distributed Processing Symposium (IPDPS'06), p. 57 (CD), IEEE Computer Society, 2006. [pdf at IEEE Xplore, © IEEE Computer Society 2006]