Efficient algorithms for proximity problems have been among the most
commonly used tools for data mining and data analysis. Two such
problems are similarity search and clustering. Similarity search is
needed to locate relevant information in a
database. Clustering, a form of data summarization, can facilitate
visualization, understanding, and knowledge extraction.
Proximity problems are typically studied for data represented as
points in Euclidean space. In this talk, I will investigate
instances of proximity problems for data that are more complex than
Euclidean points. First, I will describe a hashing scheme for
efficient indexing of data represented as sets. I will show how this
scheme can be applied to provide a scalable solution to the problem of
retrieving web pages similar to a query page. I will also present a
technique for automatic evaluation of web page representation
strategies.
In the second part of the talk, I will address the problem of
clustering temporal data. I will motivate the problem with an
application in the area of genomic data analysis, and I will discuss
the connection of temporal clustering with the problem of segmenting
sequences and with traditional clustering.
Aristides Gionis is a faculty candidate.
Thursday, April 24th at 4:00 p.m. in DH 1070
Reception at 3:30 p.m. in DH 3092