COMP 646: Deep Learning for Vision and Language | Spring 2023

Instructor: Vicente Ordóñez-Román (vicenteor at). Office Hours: 10am to 11am on Tuesdays at DH 3098.
TA: Jefferson Hernandez (jeh16 at). Office Hours: Mondays 2:30pm to 3:30pm at DH 3036.
TA: Arnold Kazadi (akn7 at). Office Hours: Thursdays 11am to 12pm at DH 3036.
TA: Sangwon Seo (ss202 at). Office Hours: Wednesdays 10am to 11am at DH 3002.
TA: Gaotian Wang (gw23 at). Office Hours: Wednesdays 3pm to 4pm at DH 3036.
Class Time: Tuesdays and Thursdays from 4pm to 5:15pm Central Time (Herzstein Hall 210).
Piazza: link

Course Description: Visual recognition and language understanding are two challenging tasks in AI. In this course, we will study and acquire the skills to build machine learning and deep learning models that can reason about images and text for generating image descriptions, visual question answering, image retrieval, and other tasks involving both text and images. On the technical side, we will leverage models such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformer networks (e.g., BERT, GPT-3, ViTs), among others.

Learning Objectives: (a) Develop intuitions about the connections between language and vision, (b) Understand foundational concepts in representation learning for both images and text, (c) Become familiar with state-of-the-art models for tasks in vision and language, (d) Obtain practical experience in the implementation of these models.

Prerequisites: There are no formal prerequisites for this class. However, a basic command of machine learning, deep learning, or computer vision will be useful when taking this class. Students should have knowledge of linear algebra, differential calculus, and basic statistics and probability. Moreover, students are expected to have attained some level of proficiency in Python programming or to be willing to learn it. Students are encouraged to complete the following activity before the first lecture: [Primer on Image Processing].


Date Topic  
Tue, Jan 10 Introduction to Vision and Language [pptx] [pdf]
Thu, Jan 12 Machine Learning I: Supervised vs Unsupervised Learning, Linear Classifiers [pptx] [pdf]
Assignment on PyTorch + Image Classification [colab]
Due Monday January 30th, 11:59pm (CT).
Tue, Jan 17 Machine Learning II: Stochastic Gradient Descent / Regularization [pptx] [pdf]
Thu, Jan 19 Neural Networks: Multi-layer Perceptrons and Backpropagation [pptx] [pdf]
Tue, Jan 24 Computer Vision I: The Convolutional Operator and Image Filtering [pptx] [pdf]
Thu, Jan 26 Computer Vision II: Convolutional Neural Networks: LeNet, AlexNet [pptx] [pdf]
Assignment on Movie Posters and Plots [colab]
Due Monday February 20th, 11:59pm (CT).
Tue, Jan 31 Computer Vision III: Convolutional Neural Networks: VGG, InceptionNets, ResNets [pptx] [pdf]
Thu, Feb 2 Natural Language Processing I: Introduction: Bag of Words, N-gram Language Models, Word Embeddings [pptx] [pdf]
Tue, Feb 7 Natural Language Processing II: Representations and Tokenization Issues [pptx] [pdf]
Thu, Feb 9 Spring recess (No Scheduled Classes)
Tue, Feb 14 Natural Language Processing III: Recurrent Neural Networks (RNNs, LSTMs, GRUs, Seq-to-Seq, CNNs+RNNs) [pptx] [pdf]
Thu, Feb 16 Transformers I: Transformer Models and Multi-head Self-Attention [pptx] [pdf]
Tue, Feb 21 Transformers II: BERT, GPT-2, ViT, CLIP [pptx] [pdf]
Thu, Feb 23 Quiz
Assignment on Auto-regressive and Generative Models [colab]
Due Monday March 20th, 11:59pm (CT).
Tue, Feb 28 Quiz Discussion: Mid-course Recap
Thu, Mar 2 Computer Vision IV: Convolutional Neural Networks for Object Detection [pptx] [pdf]
Tue, Mar 7 Computer Vision V: Convolutional Neural Networks for Image Segmentation [pptx] [pdf]
Thu, Mar 9 Adversarial Examples, Generative Adversarial Networks (GANs) and Text-conditioned GANs [pptx] [pdf]
Tue, Mar 14 Spring Break (No Scheduled Classes)
Thu, Mar 16 Spring Break (No Scheduled Classes)
Tue, Mar 21 Generative Adversarial Networks (GANs) and Auto-Encoders [pptx] [pdf]
Thu, Mar 23 Discrete and Vector-quantized Auto-Encoders, DALLE-v1 and Style Transfer [pptx] [pdf]
Tue, Mar 28 Text-to-Image Networks: Diffusion Models [pptx] [pdf]
Thu, Mar 30 Running large-scale training jobs in practice (Wandb, Containers, SLURM)
Tue, Apr 4 Back to Language Models: Instruction Tuning and Multimodality [pptx] [pdf]
Thu, Apr 6 In-Class Activity: Feature Inversion (work-from-home activity)
Tue, Apr 11 Referring Expressions, Visual Question Answering and Explainability [pptx] [pdf]
Thu, Apr 13 Explainability, Self-supervision and Video [pptx] [pdf]
Tue, Apr 18 Self-Supervision and other Recent Topics [pptx] [pdf]
Thu, Apr 20 Course Recap [pptx] [pdf]

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Assignments: 30% (3 assignments), Class Project: 60%, Quiz: 10%. Grade cutoffs (no stricter than the following): A [90%, 100%], B [80%, 90%), C [70%, 80%), D [55%, 70%), F [0%, 55%).
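As a worked illustration of how the weights and cutoffs combine, here is a minimal Python sketch. It assumes every component is scored as a percentage and that the three assignments count equally toward the 30% (the syllabus does not say how they are weighted against each other). The function names are hypothetical, and the sketch applies the published cutoffs, which are the strictest ones the instructor will use.

```python
# Hypothetical grade calculator. Weights and cutoffs come from the syllabus;
# equal weighting of the three assignments is an assumption.
WEIGHTS = {"assignments": 0.30, "project": 0.60, "quiz": 0.10}

# Lower bound (inclusive) of each letter band, checked from highest to lowest.
CUTOFFS = [("A", 90.0), ("B", 80.0), ("C", 70.0), ("D", 55.0)]


def final_score(assignment_scores, project, quiz):
    """Weighted course score in percent; each input is a percentage in [0, 100]."""
    # Assumption: the assignment component is the plain average of the assignments.
    assignments = sum(assignment_scores) / len(assignment_scores)
    return (WEIGHTS["assignments"] * assignments
            + WEIGHTS["project"] * project
            + WEIGHTS["quiz"] * quiz)


def letter_grade(score):
    """Map a score to a letter using the half-open bands [lower, next lower)."""
    for letter, lower in CUTOFFS:
        if score >= lower:
            return letter
    return "F"  # anything below 55%
```

For example, assignment scores of 90, 80, and 100 (average 90), a project score of 85, and a quiz score of 70 give 0.30 × 90 + 0.60 × 85 + 0.10 × 70 = 85, which falls in the B band. Because the bands are half-open, a score of exactly 90 is an A while 89.99 is a B.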

COVID-19 Notice: Please pay close attention to university officials and the instructor regarding modifications to course delivery or content due to the pandemic. If a student is personally affected by the pandemic, course staff will also make special considerations on a case-by-case basis as allowed; however, students should first follow any guidance put forward through official university channels of communication. In general, if you have any symptoms, you should stay home. There is no grade for attendance in this class, and we will support other ways of attending and participating to the maximum extent possible.

Late Submission Policy: No late assignments will be accepted in this class unless the student has obtained special accommodations for warranted circumstances or is facing an exceptional personal situation. If you think this may apply to you, please contact the instructor directly as early as possible. If you contact the instructor on the day of the deadline, before the deadline has passed, you must also submit a copy of your notebook with the progress you have made so far as part of your request. In general, unless a medical condition or other serious situation is affecting you, please do not email the instructor or TAs requesting special considerations. If you have an ongoing medical condition, support is available through the Disability Resource Center at Rice; please follow the guidance in that section of this syllabus.

Honor Code and Academic Integrity: "In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook. This handbook outlines the University's expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process." For this class: if assignments are individual, no collaboration is expected, and no two students should submit the same source code. Regardless of circumstances, I will assume that any source code, text, or images submitted alongside reports or projects were authored by the students unless explicitly stated otherwise through appropriate means. Any missing information regarding sources may be regarded as a failure to abide by the academic integrity statement, even if that was not the intent. Please be careful to cite sources and to state clearly what is your original work and what is not in all assignments and projects. In particular, avoid vague statements such as "we built our model based on X"; instead, be explicit, e.g., "we downloaded X and modified the encoder so that it can work with videos instead of images by adding three more layers". Vague statements make it difficult to distinguish what you did from what was done by others. Sometimes great projects consist of simply putting together two existing components that someone else developed; however, this has to be clearly acknowledged as such.

Title IX Support: Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination, gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Students should be aware when seeking support on campus that most employees, including myself as the instructor and the TAs, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit or email

Disability Resource Center: "If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with the Disability Resource Center (Allen Center, Room 111 / / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs."