Data management techniques for data streams has gained much importance
recently. The talk addresses techniques for approximately answering
aggregate SQL queries over continuous data streams with limited
memory. Our method relies on randomizing techniques that compute small
"sketches" of the streams that can be used to provide approximate
query answers with provable guarantees on the approximation error.
In this talk we give an introduction to sketches and we present our
two main contributions. First, we show how existing statistical
information about the data (e.g., histograms) can be used to improve
the quality of the approximations provided by our algorithms. The key
idea is to intelligently partition the domain of the join attributes
in a way that provably improves the approximation guarantees. Second,
in the presence of multiple queries, intelligent sharing of sketches
among concurrent queries can result in dramatic improvements in the
utilization of the available sketching space and in the quality of the
resulting approximation error. We will discuss some of the novel
optimization problems that arise in the context of multi-query sketch
sharing and describe our solutions.
(Joint work with Johannes Gehrke, Minos Garofalakis and Rajeev Rastogi)
Alin Dobra is a faculty candidate.
Thursday, April 10 at 4:00 p.m. in DH 1070
Reception preceding the talk at 3:30 p.m. in DH 3076