COMP 648: Computer Vision Seminar | Fall 2022

Instructor: Vicente Ordóñez-Román (vicenteor at
Class Time: Tuesdays from 4pm to 5:15pm Central Time (Keck Hall 101).

Course Description: This seminar will explore and analyze the current literature in computer vision, with a focus on computational methods for visual recognition. Our topics include image classification and understanding, object detection, image segmentation, and other high-level perceptual tasks. In particular, this semester we will explore recent topics such as: Contrastive Learning (e.g. SimCLR, CLIP, BLIP), Vision-language Transformers (e.g. ALBEF, UNITER, VisualBERT), Diffusion Models (e.g. DALL·E 2, Imagen), Learning with Synthetic Data (Hypersim, ThreeDWorld, etc.), Biases in Computer Vision Models, Zero-shot Visual Recognition, Open Vocabulary Visual Recognition, Weakly Supervised Visual Grounding Models, and Computer Vision for Image Generation (e.g. Stable Diffusion), among other topics.

Recommended Prerequisites: COMP 547 (Computer Vision) or COMP 646 (Deep Learning for Vision and Language) or COMP 546/ELEC 546 (Intro to Computer Vision) or COMP 576 (Intro to Deep Learning) or COMP 647 (Deep Learning) or research experience in any of these topics.


Date Topic  
Aug 23rd Welcome: Introduction [pptx] [pdf]
Aug 30th Contrastive Pre-training: SimCLR, CLIP, ALBEF -- Presenter: Ziyan Yang [slides] [pdf]
  • A Simple Framework for Contrastive Learning of Visual Representations. ICML 2020. [link]
  • Learning Transferable Visual Models From Natural Language Supervision. ICML 2021. [link]
  • Align before Fuse: Vision and Language Representation Learning with Momentum Distillation. NeurIPS 2021. [link]
Sep 6th Text-to-Image Synthesis with Conditional Diffusion Models -- Presenter: Aman Shrivastava [pdf]
  • Denoising Diffusion Probabilistic Models. NeurIPS 2020. [link]
  • Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022. [link]
  • Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv 2022. [link]
Sep 13th Masked Self-supervised Pretraining for Visual Recognition -- Presenter: Jefferson Hernandez [pdf]
  • Masked Autoencoders Are Scalable Vision Learners. CVPR 2022. [link]
  • Masked Autoencoders As Spatiotemporal Learners. arXiv 2022. [link]
  • Masked Vision and Language Modeling for Multi-modal Representation Learning. arXiv 2022. [link]
Sep 20th Visio-Linguistic Reasoning: Winoground, VL-Checklist -- Presenter: Paola Cascante-Bonilla [pdf]
  • Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality. CVPR 2022. [link]
  • VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations. arXiv 2022. [link]
Sep 27th Visual Grounding: Learning to Localize Objects -- Presenter: Atanu Dahari [pdf]
  • Grounded Language-Image Pre-Training. CVPR 2022. [link]
  • Simple Open-Vocabulary Object Detection with Vision Transformers. arXiv 2022. [link]
Oct 4th Structured Training and Subnetworks -- Presenter: Vicente [pdf]
  • The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. ICLR 2019. [link]
  • Knowledge Evolution in Neural Networks. CVPR 2021. [link]
  • Robust Fine-tuning of Zero-shot Models. CVPR 2022. [link]
Oct 18th Semi-Supervised Learning -- Presenter: Maojie Tang
  • FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. NeurIPS 2020. [link]
  • Semi-supervised Vision Transformers at Scale. arXiv 2022. [link]
Oct 25th Deep Image Retrieval and Matching
  • SuperGlue: Learning Feature Matching With Graph Neural Networks. CVPR 2020. [link]
  • Instance-level Image Retrieval using Reranking Transformers. ICCV 2021. [link]
  • Efficient Large-Scale Localization by Global Instance Recognition. CVPR 2022. [link]
Nov 1st Universal Computer Vision Models
  • Flamingo: a Visual Language Model for Few-Shot Learning. arXiv 2022. [link]
  • A Generalist Agent. arXiv 2022. [link]
Nov 8th Recent Topics on Diffusion Models for Tasks other than Text-to-Image Generation
Nov 15th Recent Topics on Zero-shot Learning through Large-scale Pretraining
Nov 22nd Recent Topics in Bias and Fairness in Visual Recognition
Nov 29th Recent Topics in Any Visual Recognition Task

Disclaimer: The topics on this list are tentative and subject to adjustments throughout the semester as interests in the group evolve.

Logistics: This is a pass/fail, one-credit seminar. A Satisfactory grade requires presenting a recent work on a topic of interest to the seminar at least once during the semester and actively participating in discussions throughout. As a rule of thumb, to get a Satisfactory grade you are expected to attend at least 10 of the 14 sessions in the semester.

Honor Code and Academic Integrity: "In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook. This handbook outlines the University's expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process."

Title IX Support: Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination, or gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Students should be aware when seeking support on campus that most employees, including myself, as the instructor/TA, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit or email

Disability Resource Center: "If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with the Disability Resource Center (Allen Center, Room 111 / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs."