COMP 648: Computer Vision Seminar | Fall 2025

Instructor: Vicente Ordóñez-Román (vicenteor at rice.edu)
Class Time: Tuesdays from 4pm to 5:15pm Central Time (Howard Keck Hall Room 105).

Course Description: This seminar will explore and analyze the current literature in computer vision, with a particular focus on computational methods for visual recognition. Our topics include image classification and understanding, object detection, image segmentation, and other high-level perceptual tasks. This semester we will explore recent vision foundation models and multimodal foundation models that involve images and video. This is a 1-credit graduate seminar with student-led weekly presentations.

Prerequisite: COMP 646 (Deep Learning for Vision and Language) or research experience in deep learning, computer vision, or related fields. Ask the instructor if you are unsure.

Schedule

Date Topic  
Aug 26th Welcome & Overview
  • Significant progress has been made in large-scale efforts to build foundation models for images and video. We will do introductions and provide an overview of the topics to be covered this semester.
Sep 2nd Self-Supervised Learning at Scale: DINOv3. August 2025.
  • DINOv3: [technical report] [code]
  • A self-supervised ViT that scales to 7B parameters trained on 1.7B images. It introduces Gram anchoring to prevent the degradation of dense features, achieving state-of-the-art performance on several dense tasks with a frozen backbone.
  • Presentation led by Jefferson Hernandez
Sep 9th Decomposable Flow Matching (DFM). June 2025.
  • DFM: [technical report] [project page]
  • A simple framework that progressively generates visual modalities scale by scale, achieving up to 50% faster convergence than standard Flow Matching. DFM applies Flow Matching independently at each level of a user-defined multi-scale representation (such as a Laplacian pyramid), leading to improved visual quality for image and video generation.
  • Presentation led by Moayed Haji-Ali
Sep 16th Hierarchical Reasoning Model (HRM). June 2025.
  • HRM: [technical report] [code]
  • A novel architecture that addresses limitations of large language models on reasoning by using two specialized modules: one for abstract planning and one for detailed computation. This design allows it to solve complex problems efficiently with very little training data, outperforming much larger models on key reasoning tasks.
  • Presentation led by Zilin Xiao
Sep 23rd ICLR Deadline + Overlap with Richard Tapia's talk in McMurtry.
Sep 30th Think Beyond Images: Reasoning with Images
  • Thyme: [technical report] [code]
  • A framework that adds image operations such as rotation, cropping, and contrast enhancement as intermediate actions in a reasoning path to answer general questions about images.
  • Presentation led by Jaywon Koo
Oct 7th Qwen-Image: Image Generation at Scale
  • Qwen-Image: [technical report] [code]
  • A new image generation model that significantly improves text rendering and image editing. It uses a comprehensive data pipeline and a progressive training strategy to handle complex text, while a multi-task training paradigm enhances its image editing capabilities.
  • Presentation led by Catherine He
Oct 14th MIDTERM RECESS (NO SCHEDULED CLASSES)
Oct 21st ICCV Conference
Oct 28th Video Agents
  • VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding [technical report] [code]
  • A method that scales reasoning for long-form video understanding through retrieval and memory augmentation.
  • Presentation led by Xin Yao
Nov 4th Diffusion Transformers with Representation Autoencoders
  • Representation AutoEncoders [technical report] [code]
  • Latent diffusion models have long relied on a VAE trained from scratch. This work explores replacing the VAE encoder with a pretrained, frozen representation encoder, paired with a trained decoder.
  • Presentation led by H.T. Simon Dang
Nov 11th Agents for ML
  • Language Modeling by Language Models [technical report]
  • How do we leverage LLMs to improve LLMs? A proper answer to this question could lead to the self-improvement of current AI systems.
  • Presentation led by Jaywon Koo
Nov 18th Flexible Hierarchical Multimodal Embeddings at Scale
  • MetaEmbed [technical report]
  • Zilin's own work with Meta Superintelligence Labs on a framework for flexible multimodal embeddings that can capture content at different levels of detail for multimodal retrieval.
  • Presentation led by Zilin Xiao
Nov 25th Thanksgiving -- Fall Recess
Dec 2nd The Platonic Representation Hypothesis
  • Words That Make Language Models Perceive [technical report]
  • Hypothesis: As language models become more capable, their representations start to converge with those of models for other modalities (e.g., images and sound).
  • Presentation led by Joshua-James Claybon

Disclaimer: The topics on this list are tentative and subject to adjustments throughout the semester as interests in the group evolve.

Logistics: This is a seminar with a pass/fail grade. Registered students are required to present a recent paper on a topic of interest to the seminar at least once during the semester. A Satisfactory grade requires presenting a paper at least once and actively participating in discussions throughout the semester.

Honor Code and Academic Integrity: "In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University's expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process."

Title IX Support: Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination, gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Students should be aware when seeking support on campus that most employees, including myself, as the instructor/TA, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit safe.rice.edu or email titleixsupport@rice.edu.

Disability Resource Center: "If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with the Disability Resource Center (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs."