Unusually frequent or rare words are implicated in various facets of
biological function and structure. As sequence data become available
in massive quantities, tasks such as exhaustively enumerating and
testing word frequencies across a whole genome are increasingly
appealing, yet they pose significant computational burdens even
when limited to words of bounded length. In
addition, the display of the huge tables possibly resulting from
these counts poses significant problems of visualization and inference.
In this talk, Lonardi will present efficient and practical algorithms for the
problem of detecting words that are, by some measure, over- or
under-represented in the context of larger sequences. He will also show
that such anomaly detectors can be used successfully to discover
(exact) patterns in biological sequences.
(Joint work with A. Apostolico, M. E. Bock, and F. Gong)
Thursday, March 22, 2001 @ 4:00 in DH1070
A reception will be held BEFORE the talk - at 3:30 in DH3076
About Stefano Lonardi
Stefano Lonardi is a Ph.D. candidate at the Department of Computer
Sciences of Purdue University, West Lafayette, IN. In 1994 he
received the "Laurea" degree (cum laude) from the
University of Pisa, Italy. In 1996 he joined the graduate program at
Purdue University. In the summer of 1999 he interned at Celera Genomics,
Rockville, MD. This year he received the
Student Research Award from the Purdue Chapter of Upsilon Pi Epsilon.
His main research interests include data compression, algorithms on
strings, computational molecular biology, and statistical analysis of
sequences. He is also a member of the ACM and of the honor societies
Upsilon Pi Epsilon and Phi Kappa Phi.
Stefano Lonardi is a faculty candidate.