Abstract
Effective algorithms for interacting with people-oriented data (text, video, speech, music, etc.) will ultimately be grounded in perceptual and cognitive psychology, just as compression algorithms for these media have a psychophysical basis. This follows from the simple fact that these media will be organized and searched by meaning. To this end, I will present systems that produce codings of video and text that are both efficient (in the information-theoretic sense) and psychologically meaningful. These systems are built around novel maximum-likelihood algorithms that turn psychological datasets into workable interpreters of human-generated signals. In their raw form, the algorithms address general problems such as modeling non-Markovian processes and finding structure in non-metric data. When coupled with psychological meta-data or trained on data, they can find themes in text, extract action scripts from video, and recognize complex gestures. Other applications include gene classification, network configuration, lip-reading, and over-the-shoulder tutors: machines that watch you work and unobtrusively augment your activity. I'll conclude by considering how the meta-data itself may be acquired.
Note: Matthew Brand is a faculty candidate.