Seminar: Sublinear
Algorithms (MUA)
Wintersemester 2003/04
Christian Sohler
First meeting and assignment of topics (list of topics below):
Seminar:
- Wednesday, 9-11 am
- F U.116
Contents:
Sublinear algorithms deal with the question how to evaluate data sets
that are too large even for standard linear time algorithms. Such data
sets arise in many applications, e.g., in telecommunication industries,
data mining, and internet applications.
One example are so called call-detail data sets where information about
phone calls is stored. Such data sets contain information about the
connection, rate, time of call, etc. Their size can be several GByte a
day. Phone companies want to analyze such daza sets to adjust their
rates to the demands.
In this seminar we consider different models from the field of
sublinear algorithms including property testing, streaming algorithms,
and sublinear time approximation algorithms.
Links below go to web pages where you can download the
corresponding paper!
Property Testing:
Property Testing is the computational task to decide whether a given
object (e.g., a graph, a function, or a point set) has a certain
predetermined property (e.g., bipartiteness, monotonicity, or convex
position) or is far away from every object having this property. If the
input object neither has the property nor is far away from it, the
algorithm may answer arbitrarily. In contrast to traditional algorithms
a property testing algorithm does not get a complete description of the
object under consideration as input. Indead, it has the ability to
perform queries about the (local) structure of the object. This way it
is possible that a property testing algorithm achieves its goal by
looking only at a small (usually randomly selected) part of the whole
object.
Monotonicity:
One of the first properties considered in property testing is
monotonicity of boolean functions.
Improved
testing algorithms for monotonicity
Y. Dodis, O. Goldreich, E. Lehman, S. Raskhodnikova, D. Ron, A. Samorodnitsky
Randomization and Approximation Techniques in Computer Science, Proc.
3rd International Workshop RANDOM'99, 1999.
Graph colorability:
Is it possible to assign k colors to the vertices of a graph such that
there is no edge connecting two vertices of the same color? This
problem is known to be NP-hard and there are a number of hardness
results on its approximability. Surprisingly, it turns out that the
corresponding property testing problem can be solved in time
independent of the size of the input graph (for dense graph instances).
Graph colorability has first been considered in
Property
testing and its connection to learning and approximation
O. Goldreich, S. Goldwasser, D. Ron
Journal of the ACM, 45(4): 653 - 750, 1998.
For the preparation of the talk the following paper is recommended
Testing k-colorability
N. Alon, M. Krivelevich
SIAM J. on Discrete Mathematics, 15(2):211-227, 2002.
Sorted Sequences:
Sorting is one of the fundamental problems in computer science. One can
easily verify in linear time whether a given input sequence of numbers
is sorted.
In this paper it is shown that one can test in logarithmic time whether
a sequence is sorted or has large edit distance to any sorted sequence.
Spot checkers
F. Ergün, S. Kannan, S.R. Kumar, R. Rubinfeld, M. Viswanathan
J. Comput. Syst. Sci., 60: 717-751, 2000.
Clustering:
Clustering large data sets is important in many applications including
search engines, data mining, etc. In this paper the authors present
property testing algorithms for two clustering problems. The algorithms
run in time independent of the input size and can be used to output
cluster centers that are good for the point set except for a small
fraction of outliers.
Testing of Clustering
N. Alon, S. Dar, M. Parnas, D. Ron
Proceedings of the 41st Annual Symposium on Foundations of Computer
Science
Geometric properties:
This paper initiated the study of property testing for geometric
properties. Some geometric properties like convex position,
intersection of objects, Euclidean minimum spanning trees are shown to
be efficiently testable.
Property
testing in computational geometry
A. Czumaj, C. Sohler, M. Ziegler
Algorithms - ESA '2000, Proc. of the 8th Annual European Symposium, pp.
155 - 166, 2000.
Geometric properties with range queries:
Is it possible to improve the results on convex position, if the point
set can be accessed by more complex queries? The answer to the question
is yes, if the point set is in the plane and if it is possible to
access this point set using triangle range queries. It is also possible
to obtain efficient property testers for other problems of geometric
nature that cannot be tested without access using geometric queries.
Property testing with geometric queries
A. Czumaj, C. Sohler
Algorithms - ESA '2001, Proc. of the 9th Annual European Symposium, pp.
266 - 277, 2001.
Streaming Algorithms:
Imagine you want to maintain some statistics about internet traffic
routed by a backbone server. You might observe several GByte of data
each day that arrives packet by packet and can be viewed as a stream of
data. It is unlikely that you have enough storage capacity to store the
origin and destination of each packet. It would be easier to view each
data item when it is generated and discard it afterwards while storing
only a sketch of the observed data. An algorithm that works in this way
is called a streaming algorithm. Data streams also arise in other
applications such as distributed data bases and physical measurements.
Histograms:
Histograms are widely used to capture data distribution, to represent
the data by a small number of step functions. In this paper a streaming
algorithm for histogram computation is given.
Data
streams and histograms
S. Guha, N. Koudas, K. Shim
Proc. of the 33rd Annual ACM Symposium on Theory of Computing, 2001.
Inversions in data streams:
In this paper an algorithm to approximately count the number of
inversions in a data stream of n numbers from {1,...,m}.
Counting
inversions in lists
A. Gupta, F. X. Zane
Proc. of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms,
2003.
Sublinear time approximation algorithms:
MST weight:
Assume you are given access to a huge graph. What can you find out
about the graph by looking only at a small (possibly randomly selected)
part of the graph. In this paper the authors show that one can
approximate the weight of the minimum spanning tree of the graph
without looking at the whole graph.
Approximating the Minimum Spanning Tree Weight in Sublinear Time,
B. Chazelle, R. Rubinfeld, L. Trevisan
Automata, Languages and Programming, Proc. of the 28th International
Colloquium, pp. 190 - 200, 2001.
k-Median:
Given n points in a metric space the k-median problem is to find k
centers that minimize the sum of the distances (over all points) to the
nearest center. in this paper the author presents a sublinear
approximation algorithm for the k-median problem.
Sublinear time
algorithm for metric space problems
P. Indyk
Proc. of the 31st Symposium on Theory of Computing, pp. 428-434, 1999
Point location:
Given a (planar) subdivision the point location problem is to find the
cell of the subdivision that contains a given query point. In this
paper the authors show that one can do a point location in a Delaunay
triangulation of n points in sublinear time when the points are
distributed uniformly in the unit square.
Fast randomized point location without preprocessing in two- and
three-dimensional Delaunay triangulations
E. Mücke, I. Saias, B. Zhu
Proc. of the 12th annual Symposium on Computational Geometry, pp.
274-283, 1996
Sublinear algorithms in geometry:
In this paper sublinear algorithms for some geometric problems like
intersection detection for polygons and polyhedra is given.
Sublinear
geometric algorithms
B. Chazelle,
D. Liu,
Avner Magen:
Sublinear geometric algorithms.
Proc. of the 35th Symposium on Theory of Computing, pp. 531-540,
2003.