Rice Computer Science: Colloquia
  
Rice University
Department of Computer Science
presents

Aristides Gionis
Stanford University

Proximity problems for non-Euclidean data

Abstract

Efficient algorithms for proximity problems have been among the most commonly used tools for data mining and data analysis. Two such problems are similarity search and clustering. Similarity search is necessary in order to locate relevant information in a database. Clustering, a form of data summarization, can facilitate visualization, understanding, and knowledge extraction.

Proximity problems are typically studied for data represented as points in Euclidean space. In this talk I will investigate instances of proximity problems for data that are more complex than Euclidean points. First, I will describe a hashing scheme for efficient indexing of data represented as sets. I will show how this scheme can be applied to provide a scalable solution to the problem of retrieving web pages similar to a query page. I will also present a technique for automatic evaluation of web page representation strategies.

In the second part of the talk I will address the problem of clustering temporal data. I will motivate the problem with an application in the area of genomic data analysis, and I will discuss how temporal clustering relates both to the problem of segmenting sequences and to traditional clustering.

Aristides Gionis is a faculty candidate.

Thursday, April 24th at 4:00 p.m. in DH 1070
Reception at 3:30 p.m. in DH 3092
