COMP 648: Computer Vision Seminar | Fall 2025

Instructor: Vicente Ordóñez-Román (vicenteor at rice.edu)
Class Time: Tuesdays from 4pm to 5:15pm Central Time (Howard Keck Hall Room 105).

Course Description: This seminar will explore and analyze the current literature in computer vision, with a particular focus on computational methods for visual recognition. Topics include image classification and understanding, object detection, image segmentation, and other high-level perceptual tasks. This semester we will explore recent vision foundation models and multimodal foundation models that involve images and video. This is a 1-credit graduate seminar with student-led weekly presentations.

Prerequisite: COMP 646 (Deep Learning for Vision and Language) or research experience in deep learning, computer vision, or related fields. Ask the instructor if you are unsure.

Schedule

Date Topic  
Aug 26th Welcome & Overview
  • Significant progress has been made in large-scale efforts to build foundation models for images and video. We will do introductions and provide an overview of the topics to be covered this semester.
Sep 2nd Self-Supervised Learning at Scale: DINOv3. August 2025.
  • DINOv3: [technical report] [code]
  • A self-supervised ViT scaled to 7B parameters and trained on 1.7B images. Introduces Gram anchoring to prevent degradation of dense features during long training, achieving state-of-the-art performance on several dense tasks with a frozen backbone (a minimal sketch of the Gram-anchoring idea follows this entry).
  • Presentation led by Jefferson Hernandez
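A minimal sketch of the Gram-anchoring idea, as I read it from the technical report (not the official implementation): keep the student's patch-to-patch similarity (Gram) matrix close to that of a reference network (e.g., an earlier checkpoint acting as a "Gram teacher") so dense features do not degrade during long, large-scale training.

```python
# Hedged sketch of a Gram-anchoring style loss; names and details are assumptions.
import torch
import torch.nn.functional as F

def gram_anchoring_loss(student_patches, teacher_patches):
    """student_patches, teacher_patches: (B, N, D) patch embeddings."""
    s = F.normalize(student_patches, dim=-1)
    t = F.normalize(teacher_patches, dim=-1)
    gram_s = s @ s.transpose(1, 2)   # (B, N, N) patch-to-patch similarities
    gram_t = t @ t.transpose(1, 2)   # same structure from the reference network
    return F.mse_loss(gram_s, gram_t)
```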
Sep 9th Decomposable Flow Matching (DFM). June 2025.
  • DFM: [technical report] [project page]
  • A simple framework that progressively generates visual modalities scale by scale, achieving up to 50% faster convergence compared to standard Flow Matching. DFM applies Flow Matching independently at each level of a user-defined multi-scale representation (such as a Laplacian pyramid), leading to improved visual quality for image and video generation (see the sketch after this entry).
  • Presentation led by Moayed Haji-Ali
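The sketch below illustrates the scale-by-scale idea under stated assumptions: a standard conditional flow-matching objective is applied independently at every level of a Laplacian pyramid. The pyramid construction and the `model(xt, t, level=k)` interface are illustrative placeholders, not the authors' code.

```python
# Hedged sketch of a per-level flow-matching objective (assumes H, W divisible by 2**(num_levels-1)).
import torch
import torch.nn.functional as F

def laplacian_pyramid(x, num_levels=3):
    """Decompose an image batch (B, C, H, W) into Laplacian band-pass levels."""
    levels, current = [], x
    for _ in range(num_levels - 1):
        down = F.avg_pool2d(current, 2)
        up = F.interpolate(down, scale_factor=2, mode="bilinear", align_corners=False)
        levels.append(current - up)   # high-frequency residual at this scale
        current = down
    levels.append(current)            # coarsest low-frequency level
    return levels

def dfm_loss(model, x1, num_levels=3):
    """Sum of flow-matching losses, one per pyramid level."""
    loss = 0.0
    for k, target in enumerate(laplacian_pyramid(x1, num_levels)):
        x0 = torch.randn_like(target)                  # noise endpoint
        t = torch.rand(target.shape[0], 1, 1, 1, device=target.device)
        xt = (1 - t) * x0 + t * target                 # linear interpolation path
        v_target = target - x0                         # constant-velocity target
        v_pred = model(xt, t, level=k)                 # hypothetical level-conditioned model
        loss = loss + F.mse_loss(v_pred, v_target)
    return loss
```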
Sep 16th Hierarchical Reasoning Model (HRM). June 2025.
  • HRM: [technical report] [code]
  • A novel architecture that addresses limitations of large language models by using two specialized modules: one for slow, abstract planning and one for fast, detailed computation. This design allows it to solve complex problems efficiently with very little data, outperforming much larger models on key reasoning tasks (a sketch of the two-timescale loop follows this entry).
  • Presentation led by Zilin Xiao
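Under my own assumptions about the interface (the released code will differ), the two-timescale recurrence described above might look roughly like this: a slow high-level module updates once per cycle, while a fast low-level module iterates several steps within each cycle, conditioned on the current high-level plan.

```python
# Hedged sketch of a two-timescale recurrent core; module choices are placeholders.
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    def __init__(self, dim=256, low_steps=4, cycles=3):
        super().__init__()
        self.low = nn.GRUCell(dim, dim)    # fast module: detailed computation
        self.high = nn.GRUCell(dim, dim)   # slow module: abstract planning
        self.low_steps, self.cycles = low_steps, cycles

    def forward(self, x, z_low, z_high):
        """x, z_low, z_high: (B, dim) input and module states."""
        for _ in range(self.cycles):
            for _ in range(self.low_steps):
                z_low = self.low(x + z_high, z_low)   # low level refines under the plan
            z_high = self.high(z_low, z_high)         # high level updates from the result
        return z_low, z_high
```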
Sep 23rd Segmentation at Scale: SAM 2. August 2024.
  • SAM 2: [technical report] [code]
  • Segment Anything Model 2 (SAM 2) is a foundation model for promptable visual segmentation in both images and videos. The model, trained on the largest video segmentation dataset collected to date, is more accurate and significantly faster than its predecessor, requiring fewer user interactions to achieve better results.
Sep 30th PyVision: Agentic Vision with Dynamic Tooling
  • PyVision: [technical report] [code]
  • Current AI systems for visual reasoning are limited by static, predefined tools. PyVision enables multimodal large language models (MLLMs) to dynamically generate and refine Python tools for specific visual tasks in a multi-turn, interactive process. This approach allows models to "invent" their own tools, significantly improving performance and moving visual reasoning closer to truly agentic behavior (see the loop sketched after this entry).
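A rough sketch of the multi-turn tool loop described above; `query_mllm`, its structured reply format, and the in-process `exec` execution are hypothetical simplifications, not the actual PyVision interface.

```python
# Hedged sketch: the MLLM emits a Python snippet as an ad-hoc tool, the snippet
# is executed, and its output is fed back so the model can refine its tool or answer.
import io
import contextlib

def run_snippet(code, env):
    """Execute model-written code and capture stdout as the observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env)                      # a real system would sandbox this
    return buf.getvalue()

def agentic_visual_reasoning(query_mllm, image, question, max_turns=5):
    history = [{"role": "user", "image": image, "text": question}]
    env = {"image": image}                   # execution state persists across turns
    answer = None
    for _ in range(max_turns):
        reply = query_mllm(history)          # assumed: {"text": ..., "code": ... or None}
        history.append({"role": "assistant", "text": reply["text"]})
        answer = reply["text"]
        if reply.get("code") is None:        # no tool requested: treat text as the answer
            break
        observation = run_snippet(reply["code"], env)
        history.append({"role": "tool", "text": observation})
    return answer
```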
Oct 7th Qwen-Image: Image Generation at Scale
  • Qwen-Image: [technical report] [code]
  • A new image generation model that significantly improves text rendering and image editing. It uses a comprehensive data pipeline and a progressive training strategy to handle complex text, while a multi-task training paradigm enhances its image editing capabilities.
Oct 14th MIDTERM RECESS (NO SCHEDULED CLASSES)
Oct 21st Hunyuan-3D: Generation of 3D Assets at Scale
  • Hunyuan-3D 2.0: [technical report] [code]
  • An advanced 3D synthesis system that generates high-resolution, textured 3D assets using two core models: Hunyuan3D-DiT for shape generation and Hunyuan3D-Paint for high-resolution texture synthesis. This system, along with the user-friendly Hunyuan3D-Studio platform, allows both professionals and amateurs to efficiently create and manipulate 3D models.
Oct 28th Test Time Training (TTT)
  • Learning to (Learn at Test Time): RNNs with Expressive Hidden States [technical report] [code]
  • Test-Time Training (TTT) layers address the limitations of both self-attention and traditional RNNs by making the hidden state itself a machine learning model, which keeps being updated even during testing. This approach, implemented as TTT-Linear and TTT-MLP, demonstrates a superior ability to handle long-context sequences compared to both Transformers and modern RNNs like Mamba, showing promising potential for future research (a minimal sketch follows this entry).
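A minimal sketch of the test-time-training idea in the spirit of TTT-Linear (simplified; the paper's projections, mini-batched inner updates, and corruption scheme are omitted): the hidden state is a weight matrix that takes a gradient step on a self-supervised reconstruction loss for every incoming token, even at inference.

```python
# Hedged sketch of a TTT-Linear style scan; learning rate and corruption are placeholders.
import torch

def ttt_linear_scan(tokens, eta=0.1):
    """tokens: (T, D). Returns per-token outputs from a continually updated W."""
    T, D = tokens.shape
    W = torch.zeros(D, D)                      # hidden state = a linear model
    outputs = []
    for x in tokens:
        # Self-supervised inner loss: reconstruct the token from a corrupted copy.
        x_corrupt = x + 0.1 * torch.randn_like(x)
        err = W @ x_corrupt - x                # reconstruction error
        grad = torch.outer(err, x_corrupt)     # dL/dW for the squared loss
        W = W - eta * grad                     # inner-loop update (the "training")
        outputs.append(W @ x)                  # read out with the updated state
    return torch.stack(outputs)
```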
Nov 4th Foundation Models for Video
Nov 11th Foundation Models in Medical Imaging, Satellite Imagery, and Other Domains
Nov 18th Efficiency in Foundation Models (Quantization, Distillation)
Nov 25th Future Directions & Open Problems
Dec 2nd Final Activity

Disclaimer: The topics on this list are tentative and subject to adjustments throughout the semester as interests in the group evolve.

Logistics: This is a seminar with a pass/fail grade. Registered students are required to participate and to present a recent work on a topic of interest to the seminar at least once during the semester. A Satisfactory grade requires presenting a paper at least once during the semester and actively participating in discussions throughout the semester.

Honor Code and Academic Integrity: "In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University's expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process."

Title IX Support: Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination or gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Students should be aware when seeking support on campus that most employees, including myself as the instructor/TA, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit safe.rice.edu or email titleixsupport@rice.edu.

Disability Resource Center: "If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with the Disability Resource Center (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs."