Seminar: Sublinear Algorithms (MUA)

Wintersemester 2003/04

Christian Sohler




First meeting and assignment of topics (list of topics below):

Seminar:

Contents:

Sublinear algorithms deal with the question how to evaluate data sets that are too large even for standard linear time algorithms. Such data sets arise in many applications, e.g., in telecommunication industries, data mining, and internet applications.
One example are so called call-detail data sets where information about phone calls is stored. Such data sets contain information about the connection, rate, time of call, etc. Their size can be several GByte a day. Phone companies want to analyze such daza sets to adjust their rates to the demands.
In this seminar we consider different models from the field of sublinear algorithms including property testing, streaming algorithms, and sublinear time approximation algorithms.

Links below go to web pages where you can download the corresponding paper!

Property Testing:

Property Testing is the computational task to decide whether a given object (e.g., a graph, a function, or a point set) has a certain predetermined property (e.g., bipartiteness, monotonicity, or convex position) or is far away from every object having this property. If the input object neither has the property nor is far away from it, the algorithm may answer arbitrarily. In contrast to traditional algorithms a property testing algorithm does not get a complete description of the object under consideration as input. Indead, it has the ability to perform queries about the (local) structure of the object. This way it is possible that a property testing algorithm achieves its goal by looking only at a small (usually randomly selected) part of the whole object.

Monotonicity:

One of the first properties considered in property testing is monotonicity of boolean functions.

Improved testing algorithms for monotonicity
Y. Dodis, O. Goldreich, E. Lehman, S. Raskhodnikova, D. Ron, A. Samorodnitsky
Randomization and Approximation Techniques in Computer Science, Proc. 3rd International Workshop RANDOM'99, 1999.

Graph colorability:

Is it possible to assign k colors to the vertices of a graph such that there is no edge connecting two vertices of the same color? This problem is known to be NP-hard and there are a number of hardness results on its approximability. Surprisingly, it turns out that the corresponding property testing problem can be solved in time independent of the size of the input graph (for dense graph instances).
Graph colorability has first been considered in

Property testing and its connection to learning and approximation
O. Goldreich, S. Goldwasser, D. Ron
Journal of the ACM, 45(4): 653 - 750, 1998.

For the preparation of the talk the following paper is recommended

Testing k-colorability

N. Alon, M. Krivelevich
SIAM J. on Discrete Mathematics, 15(2):211-227, 2002.

Sorted Sequences:

Sorting is one of the fundamental problems in computer science. One can easily verify in linear time whether a given input sequence of numbers is sorted.
In this paper it is shown that one can test in logarithmic time whether a sequence is sorted or has large edit distance to any sorted sequence.

Spot checkers
F. Ergün, S. Kannan, S.R. Kumar, R. Rubinfeld, M. Viswanathan
J. Comput. Syst. Sci., 60: 717-751, 2000.

Clustering:

Clustering large data sets is important in many applications including search engines, data mining, etc. In this paper the authors present property testing algorithms for two clustering problems. The algorithms run in time independent of the input size and can be used to output cluster centers that are good for the point set except for a small fraction of outliers.

Testing of Clustering

N. Alon, S. Dar, M. Parnas, D. Ron
Proceedings of the 41st Annual Symposium on Foundations of Computer Science

Geometric properties:

This paper initiated the study of property testing for geometric properties. Some geometric properties like convex position, intersection of objects, Euclidean minimum spanning trees are shown to be efficiently testable.

Property testing in computational geometry
A. Czumaj, C. Sohler, M. Ziegler
Algorithms - ESA '2000, Proc. of the 8th Annual European Symposium, pp. 155 - 166, 2000.

Geometric properties with range queries:

Is it possible to improve the results on convex position, if the point set can be accessed by more complex queries? The answer to the question is yes, if the point set is in the plane and if it is possible to access this point set using triangle range queries. It is also possible to obtain efficient property testers for other problems of geometric nature that cannot be tested without access using geometric queries.

Property testing with geometric queries

A. Czumaj, C. Sohler
Algorithms - ESA '2001, Proc. of the 9th Annual European Symposium, pp. 266 - 277, 2001.

Streaming Algorithms:

Imagine you want to maintain some statistics about internet traffic routed by a backbone server. You might observe several GByte of data each day that arrives packet by packet and can be viewed as a stream of data. It is unlikely that you have enough storage capacity to store the origin and destination of each packet. It would be easier to view each data item when it is generated and discard it afterwards while storing only a sketch of the observed data. An algorithm that works in this way is called a streaming algorithm. Data streams also arise in other applications such as distributed data bases and physical measurements.

Histograms:

Histograms are widely used to capture data distribution, to represent the data by a small number of step functions. In this paper a streaming algorithm for histogram computation is given.

Data streams and histograms
S. Guha, N. Koudas, K. Shim
Proc. of the 33rd Annual ACM Symposium on Theory of Computing, 2001.

Inversions in data streams:

In this paper an algorithm to approximately count the number of inversions in a data stream of n numbers from {1,...,m}.

Counting inversions in lists
A. Gupta, F. X. Zane
Proc. of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, 2003.

Sublinear time approximation algorithms:

MST weight:

Assume you are given access to a huge graph. What can you find out about the graph by looking only at a small (possibly randomly selected) part of the graph. In this paper the authors show that one can approximate the weight of the minimum spanning tree of the graph without looking at the whole graph.

Approximating the Minimum Spanning Tree Weight in Sublinear Time
,
B. Chazelle, R. Rubinfeld, L. Trevisan
Automata, Languages and Programming, Proc. of the 28th International Colloquium, pp. 190 - 200, 2001.

k-Median:

Given n points in a metric space the k-median problem is to find k centers that minimize the sum of the distances (over all points) to the nearest center. in this paper the author presents a sublinear approximation algorithm for the k-median problem.

Sublinear time algorithm for metric space problems
P. Indyk
Proc. of the 31st Symposium on Theory of Computing, pp. 428-434, 1999

Point location:

Given a (planar) subdivision the point location problem is to find the cell of the subdivision that contains a given query point. In this paper the authors show that one can do a point location in a Delaunay triangulation of n points in sublinear time when the points are distributed uniformly in the unit square.
 
Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations

E. Mücke, I. Saias, B. Zhu
Proc. of the 12th annual Symposium on Computational Geometry, pp. 274-283, 1996

Sublinear algorithms in geometry:

In this paper sublinear algorithms for some geometric problems like intersection detection for polygons and polyhedra is given.

Sublinear geometric algorithms
B. Chazelle, D. Liu, Avner Magen: Sublinear geometric algorithms.
Proc. of the 35th Symposium on Theory of Computing, pp. 531-540, 2003.