Universität Paderborn - Home Universität Paderborn
Die Universitä der Informationsgesellschaft

Shape Optimizing Graph Partitioning



If you are looking for graph sequences that can be used for benchmarking load balancers, go to this section for more information.

Load Balancing in Parallel Adaptive Numerical Simulations

Computer based numerical simulations often play an important role during the development of complex technical products. Such simulations require a discretization of the underlaying mathematical problem description. Therefore, the established "Finite Element Method" dissects the simulation area into simple geometric elements, e.g. tetrahedrons, and approximates the solution of the partial differential equations at the corner points. The finer this discretization is performed, the better becomes the approximation quality, but more elements also result in a higher computational effort. Up-to-date simulations involve many millions of elements, hence a fast calculation can only be accomplished on a parallel computer.

The efficient usage of a parallel system requires an even work load distribution onto the processors. The induced additional data transfer between the processors should be kept minimal due to the relatively high communication costs.

Since the approximation computation at a corner point relies on the data of all adjacent points, neighboring elements should be preferably placed on the same processor. Furthermore, during a simulation the discretization can turn out to be not fine enough in certain areas. To obtain a more exact solution, the number of elements in these areas has to be increased. This adaptation often results in a work load mismatch, therefore a rebalancing according to the mentioned criteria becomes necessary. Additionally, as few as possible elements should be migrated to other processors because this operation is comparatively expensive.

The relationships between the elements can be modeled as a graph, where the computations are represented by vertices and the data dependencies by edges. A common method to distribute the computational loads onto the processors consists in dividing the vertices of this graph into equally sized sets (partitions) such that as few edges as possible connect vertices that are placed in different partitions. This matches the classical graph partitioning problem.

Repartitioning with
new shape-optimizing method Repartitioning with
classical method However, the actual amount of communication can be represented much more accurately by the number of nodes at the partition boundaries. That is why we have developed a new technique which computes good, and depending on the simulation area as round as possible partition shapes. By this we obtain very often connected partitions with a small number of boundary nodes. Since our approach, in contrast to many other methods, is well-suited for an incremental improvement of a given partitioning, it rebalances with small migration costs at the same time. The traditional method (left) and our new approach (right) can be compared by means of the adjacent figures. There it is apparent that the latter computes connected and more compact partitions with shorter and smoother boundaries.

Graph Sequences as Repartitioning Benchmarks

In our repartitioning experiments we have used graph sequences as benchmarks that have been generated to resemble adaptive numerical simulations in 2D. Twelve small instances were used that are described in more detail (and can be dowloaded from) here. Moreover, we have used the three large instances bigtric, bigtrace, and bigbubbles, which have 46 or 101 frames and between 100,000 and 1,000,000 nodes per frame each. They are similar in structure as their smaller counterparts without the big prefix and the file structure is the same. New larger sequences are now available: biggerslowtric. You can download the data by clicking on the instance names.

Results

The results of the load balancing experiments are very promising. The solutions of our shape-optimizing methods are usually better than those computed by state-of-the-art tools. Details will follow.

When partitioning static graphs, we have improved the best known partitions (with respect to the edge-cut) for six of the eight largest graphs of Chris Walshaw's partition archive widely used for benchmarking. The tables below show our improved edge-cut results (followed by the previously best known results in parantheses) as of 4 Oct 2007. By clicking on the edge-cut value the corresponding partition can be downloaded.

Note that our algorithm does not explicitly focus on good edge-cuts nor any other metric. It rather aims at finding partitions with good shapes. This notion includes connectedness, low diameter, and few boundary vertices. This is achieved by two diffusive approaches which identify dense regions of graphs. In contrast to that, the edge-cut is the main optimization criterion of most state-of-the-art partitioners. Hence, a large amount of data is available for this metric, while it is widely acknowledged that it is not necessarily the best criterion. It should also be noted that our program is still significantly slower than state-of-the-art libraries for minimizing the edge-cut in graph partitions (like Metis and Jostle). Current work therefore includes a further acceleration of our algorithm to reduce this runtime gap.

News

Improved edge-cut results for balanced partitions as of 4 Oct 2007 (0% imbalance)

graph 4 8 16 32 64
fe_tooth 7325 (7327) 12175 (12354) 18391 (18804) 26346 (26619) ---
598a 8243 (8277) 16884 (17223) 27404 (28152) 41538 (42088) 59498 (59708)
144 --- 26762 (27751) 39568 (39856) 58599 (58659) 81973 (82123)
wave --- 31697 (32204) 44711 (47277) 65772 (66891) 88986 (91784)
m14b 13564 (13597) 26989 (28128) 44541 (45362) 68027 (68468) 101551 (103946)
auto 28324 (29409) 48901 (53326) 81500 (83748) 125477 (131236) 176435 (182934)
All 0% partition files: part_0.tgz (as of 10 July 2007)

Improved edge-cut results for partitions with up to 1% imbalance as of 4 Oct 2007

graph 4 8 16 32 64
fe_tooth --- 12060 (12329) 18285 (18435) 25977 (26385) ---
598a 8197 (8222) --- 27009 (27344) 40962 (41124) 59098 (59152)
144 --- --- 39352 (39856) 58126 (58659) ---
wave --- 31883 (31965) 44711 (46903) 64860 (65926) 88863 (91304)
m14b 13403 (13430) 26989 (28128) 44330 (44515) 67770 (68468) 101551 (101980)
auto 27790 (29158) 48442 (53158) 81339 (82901) 124991 (130818) 175975 (182890)
All 1% partition files: part_1.tgz (as of 10 July 2007)

Improved edge-cut results for partitions with up to 3% imbalance as of 4 Oct 2007

graph 4 8 16 32 64
fe_tooth --- 11957 (12177) --- --- ---
598a --- --- --- 40718 (40792) ---
144 --- --- --- 57354 (58659) 80767 (80894)
wave 17407 (17599) 29776 (30895) 43791 (46487) 63675 (64953) 87957 (88383)
m14b --- 26765 (27711) 43962 (44174) 67551 (68468) 101019 (101980)
auto 26509 (29158) 48263 (48329) 78901 (82901) 124251 (130310) 174904 (180562)
All 3% partition files: part_3.tgz (as of 10 July 2007)

Improved edge-cut results for partitions with up to 5% imbalance as of 4 Oct 2007

graph 4 8 16 32 64
fe_tooth --- 11957 (12141) --- --- ---
598a --- --- --- --- ---
144 --- --- --- 57839 (58142) 80257 (80445)
wave 17306 (17320) 29912 (30583) 44355 (44625) 63675 (63725) 87654 (88383)
m14b --- 26719 (27711) 43747 (44174) 67551 (68468) 100183 (101385)
auto 26298 (29158) 48174 (48221) 78901 (82901) 124251 (127433) 173205 (179464)
All 5% partition files: part_5.tgz (as of 10 July 2007)

Applications:

Supported by:

PaSCo graduate school (DFG GRK 693)
DFG Sonderforschungsbereich 376
DFG SPP 1307 "Algorithm Engineering"
AEOLUS (funded by EU)

Contact:

Dr. Henning Meyerhenke
Tel.: +49 (0) 52 51 60 67 30
henningm (at) upb.de
http://www.upb.de/cs/henningm

Joint work with:

Publications:

Index A – Z | Impressum | Webmaster