CS 6501-003: Deep Learning for Visual Recognition

Instructor: Vicente Ordóñez-Román (vicente at virginia.edu). Office Hours: Tuesdays 3 to 5pm (Rice 310)
Teaching Assistant: Ziyan Yang (zy3cx at virginia.edu) -- Office Hours: Thursdays 3 to 5pm (Rice 442)
Teaching Assistant: Paola Cascante-Bonilla (pc9za at virginia.edu) -- Office Hours: Fridays 2 to 4pm (Rice 442)
Class Time: Mondays & Wednesdays between 3:30PM and 4:45PM, at Olsson Hall 005.
Discussion Forum: Piazza

Course Description: How can we use computers to recognize objects, people, actions, animals, places, etc. from images? This seemingly trivial task, which people perform without much effort, has remained one of the core problems in Computer Vision. Recent advances in representation learning using multiple layers of abstraction (deep learning) have proven important for designing artificial systems for visual recognition. In this class we will study, conceive, and implement deep learning models and learning algorithms for computational visual recognition. After this class you will be able to understand, design, implement, and assess the impact of deep learning techniques for a diverse range of visual recognition tasks.

Learning Objectives: (a) Develop intuitions about the connections between human vision and computer vision, (b) Understand foundational concepts for representation learning using neural networks, (c) Become familiar with state-of-the-art models for tasks such as image classification, object detection, image segmentation, scene recognition, etc., and (d) Obtain practical experience in the implementation of visual recognition models using deep learning.

Prerequisites: This course requires no previous background in computer vision or machine learning, but knowledge of either will be helpful. You need to know about matrices, calculating derivatives, and probabilities (Bayes' rule). You will also need to be at least a moderately proficient Python programmer. There will be several lab assignments. These assignments will show you the basics of modern general visual recognition algorithms and models, and will give you the tools for implementing more advanced ones. We will also have a class project where you can work on something beyond your assignments, with more freedom to pursue a focused problem that interests you and better matches your background. Finally, we will be using Python/PyTorch in the lecture notes, so completing a few projects in Python before the class starts is helpful. You should install Python, Jupyter, and PyTorch, and complete the following notebook [pytorch_tensors].

Syllabus

Date     Topic 
Mon, January 13th Introduction to Visual Recognition [pptx] [pdf] + Primer on Image Processing [link]
Assignment on Image Processing and Manipulation [Colab]. Due January 26th 5pm EST.
Wed, January 15th Image Processing and Image Manipulations [pptx] [pdf]
Mon, January 20th MLK Holiday -- no class this day  
Assignment on Image Classification [Colab]. Due February 3rd 11:59pm EST.
Wed, January 22nd Softmax Classifier + Stochastic Gradient Descent [pptx] [pdf]
Mon, January 27th Shallow Image Features and the Bag of Features model [pptx] [pdf]  
Assignment on Deep Learning Basics [Colab]. Due February 10th 11:59pm EST.
Wed, January 29th Neural Networks and the Multi-layer Perceptron Model [pptx] [pdf]  
Mon, February 3rd Convolutional Neural Networks (CNNs) [pptx] [pdf]  
Assignment on Convolutional Neural Networks [Colab]. Due February 24th 11:59pm EST.
Wed, February 5th
Speaker: Dr. Catherine Schuman (Oak Ridge National Laboratory)
Guest Lecture: Neuromorphic Computing
More information: Dr. Catherine Schuman is a Research Scientist at the Oak Ridge National Laboratory (ORNL) in Tennessee, working on neuromorphic computing and spiking neural networks. These are models that function in some ways more similarly to processes in the brain and seem promising in terms of efficiency.
Mon, February 10th Convolutional Neural Network Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet [pptx] [pdf]  
Wed, February 12th Deep Learning-based Object Detection [pptx] [pdf]
Mon, February 17th Deep Learning-based Semantic Image Segmentation [pptx] [pdf]
Wed, February 19th Generative Adversarial Networks (GANs) [pptx] [pdf]
Mon, February 24th Paper Review: CNNs as Features for Transfer Learning
  • CNN Features off-the-shelf: an Astounding Baseline for Recognition. Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson. CVPR 2014 Workshops. [arxiv] (Presented by Ziyan Yang)
  • Do Better ImageNet Models Transfer Better? Simon Kornblith, Jonathon Shlens, Quoc V. Le. CVPR 2019. [arxiv] (Presented by Paola Cascante-Bonilla)
Wed, February 26th Recurrent Neural Networks (RNNs) [pptx] [pdf]
Mon, March 2nd Paper Review: Face Recognition and Pose Estimation
  • Deep Face Recognition. Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. BMVC 2015. [pdf] (Presented by Nazanin and Navreet)
  • Deep High-Resolution Representation Learning for Human Pose Estimation. Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang. CVPR 2019 [arxiv] (Presented by Minjie and Leizhen).
Wed, March 4th Paper Review: Recent Methods for Object Detection and Instance Segmentation.
  • Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick. ICCV 2017. [arxiv] (Presented by Andrew and Soneya).
  • CornerNet: Detecting Objects as Paired Keypoints. Hei Law, Jia Deng. ECCV 2018. [arxiv] (Presented by Fazlay and Matthew).
Mon, March 9th Spring recess -- no class this day  
Wed, March 11th Spring recess -- no class this day  
Mon, March 16th Extended Spring recess due to COVID-19 -- no class this day -- stay safe!  
Wed, March 18th Extended Spring recess due to COVID-19 -- no class this day -- stay safe!
Mon, March 23rd Paper Review: Interpreting and Explaining Deep Neural Networks
  • Network Dissection: Quantifying Interpretability of Deep Visual Representations. David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba. CVPR 2017 [arxiv] (Presented by Zhe and Will).
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. ICCV 2017. [arxiv] (Presented by Ruipeng and Zhiming).
Wed, March 25th Paper Review: Image to Text: Image Captioning
  • Show and Tell: A Neural Image Caption Generator. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. CVPR 2015 [arxiv] (Presented by Jacob and Ahsan).
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang. CVPR 2018. [arxiv] (Presented by Jiaxin and Zheng).
Mon, March 30th Paper Review: Structured Prediction with Partial Labels + Efficient NNs I
  • Learning Structured Inference Neural Networks with Label Relations. Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, Greg Mori. CVPR 2016. [arxiv] (Presented by Anshuman and Kamya).
  • Feedback-prop: Convolutional Neural Network Inference under Partial Evidence. Tianlu Wang, Kota Yamaguchi, Vicente Ordonez. CVPR 2018. [arxiv] (Presented by Zijie and Lulu).
  • Efficient NNs I: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. [arxiv] (Presented by Arjit and Gaurav).
 
Wed, April 1st Paper Review: Conditional Generative Adversarial Networks (GANs)
  • Image-to-Image Translation with Conditional Adversarial Networks. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. CVPR 2017. [arxiv] (Presented by Shivani and Akhila).
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. ICCV 2017. [arxiv] (Presented by Sanchit and Rishab).
 
Mon, April 6th Paper Review: Avoiding Visual Bias in Computer Vision
  • Women also Snowboard: Overcoming Bias in Captioning Models. Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach. ECCV 2018. [arxiv] (Presented by Tina and Junyu).
  • Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations. Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez. ICCV 2019. [arxiv] (Presented by Nidhi and Daniel C.).
 
Wed, April 8th Paper Review: Video Recognition + Efficient NNs II
  • Efficient NNs II: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun. ECCV 2018 [arxiv] (Presented by Fuxiao and Dexuan).
  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Joao Carreira, Andrew Zisserman. CVPR 2017. [arxiv] (Presented by Mustafa and Will).
  • SlowFast Networks for Video Recognition. Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He. ICCV 2019. [link] (Presented by Tx and yf7da).
 
Mon, April 13th Paper Review: Transformer Networks
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. NeurIPS 2019. [arxiv] (Presented by Mofijul and Arash).
  • VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. [arxiv] (Presented by Sanxing and Zhe).
 
Wed, April 15th Paper Review: Self-supervised Learning
  • Self-Supervised Learning of Pretext-Invariant Representations. Ishan Misra, Laurens van der Maaten. [arxiv] (Presented by Martin and Leticia).
  • Momentum Contrast for Unsupervised Visual Representation Learning. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick. [arxiv] (Presented by Rasool and Seyed).
 
Mon, April 20th Paper Review: Colorization and Super-resolution
  • ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang. ECCV 2018 Workshops [arxiv] (Presented by Aniruddha and Akhil).
  • Learning Diverse Image Colorization. Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, Min Jin Chong, David Forsyth. CVPR 2017. [arxiv] (Presented by Phillip and Colin).
 
Wed, April 22nd Paper Review: Neural Architecture Design and Search
  • Exploring Randomly Wired Neural Networks for Image Recognition. Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He. [arxiv] (Presented by ki5hd and zm8bh).
  • DARTS: Differentiable Architecture Search. Hanxiao Liu, Karen Simonyan, Yiming Yang. ICLR 2019. [arxiv] (Presented by Aashikur and Daniel W.).
 
Mon, April 27th Course Re-cap and Good Bye! [slides]  

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Due to the COVID-19 state of emergency, we have exceptionally changed the grading scheme to use the following distribution by default: Assignments: 600pts (4 assignments: 150pts + 150pts + 150pts + 150pts), Class Project: 200pts, Reading Summaries: 100pts, Class Presentation: 100pts. Letter grades will be decided as follows: A+ (950pts), A (850pts), A- (800pts), B+ (770pts), B (750pts), B- (730pts), C+ (700pts), C (670pts), C- (650pts), D+ (630pts), D (600pts), D- (570pts).

Note: The old grading scheme will be applied if it leads to a more favorable grade for the student: Assignments: 400pts (4 assignments: 100pts + 100pts + 100pts + 100pts), Class Project: 400pts, Reading Summaries: 100pts, Class Presentation: 100pts. Letter grades will be decided as follows: A+ (1000pts), A (930pts), A- (900pts), B+ (870pts), B (830pts), B- (800pts), C+ (770pts), C (730pts), C- (700pts), D+ (670pts), D (630pts), D- (600pts).
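
To make the "more favorable grade" rule concrete, here is a minimal Python sketch of how the two schemes could be compared. It is an illustration only, not the official grade computation. It assumes that each listed number is the minimum total needed for that letter, that any total below the D- cutoff maps to "F" (the syllabus does not say), and that a student's performance is given as the fraction of available points earned in each component. All names (`COVID_WEIGHTS`, `final_letter`, etc.) are hypothetical.

```python
# Points available per component under each scheme, as listed in the Grading section.
COVID_WEIGHTS = {"assignments": 600, "project": 200, "summaries": 100, "presentation": 100}
OLD_WEIGHTS = {"assignments": 400, "project": 400, "summaries": 100, "presentation": 100}

# Letter cutoffs, best grade first. Assumption: each value is a minimum total.
COVID_CUTOFFS = [("A+", 950), ("A", 850), ("A-", 800), ("B+", 770), ("B", 750),
                 ("B-", 730), ("C+", 700), ("C", 670), ("C-", 650), ("D+", 630),
                 ("D", 600), ("D-", 570)]
OLD_CUTOFFS = [("A+", 1000), ("A", 930), ("A-", 900), ("B+", 870), ("B", 830),
               ("B-", 800), ("C+", 770), ("C", 730), ("C-", 700), ("D+", 670),
               ("D", 630), ("D-", 600)]

# Ranking of letters from best to worst. "F" is an assumption, not in the syllabus.
LETTER_ORDER = [letter for letter, _ in COVID_CUTOFFS] + ["F"]


def total_points(fractions, weights):
    """Weighted total, where fractions[name] is the share (0..1) earned in a component."""
    return sum(weights[name] * fractions[name] for name in weights)


def letter_for(points, cutoffs):
    """Return the first (best) letter whose minimum the total reaches."""
    for letter, minimum in cutoffs:
        if points >= minimum:
            return letter
    return "F"


def final_letter(fractions):
    """Grade under both schemes and keep whichever letter ranks higher."""
    covid = letter_for(total_points(fractions, COVID_WEIGHTS), COVID_CUTOFFS)
    old = letter_for(total_points(fractions, OLD_WEIGHTS), OLD_CUTOFFS)
    return min(covid, old, key=LETTER_ORDER.index)
```

For example, a student who earns 90% on assignments, 80% on the project, and full credit on summaries and the presentation totals 900pts under the default scheme and 880pts under the old one, so `final_letter` would keep the letter from the default scheme.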

Late Submission Policy: No late assignments will be accepted in this class unless the student has procured special accommodations for this class.

Academic Integrity Statement: "The School of Engineering and Applied Science relies upon and cherishes its community of trust. We firmly endorse, uphold, and embrace the University’s Honor principle that students will not lie, cheat, or steal, nor shall they tolerate those who do. We recognize that even one honor infraction can destroy an exemplary reputation that has taken years to build. Acting in a manner consistent with the principles of honor will benefit every member of the community both while enrolled in the Engineering School and in the future. Students are expected to be familiar with the university honor code, including the section on academic fraud."

Accessibility Statement: "The University of Virginia strives to provide accessibility to all students. If you require an accommodation to fully access this course, please contact the Student Disability Access Center (SDAC) at (434) 243-5180 or sdac@virginia.edu. If you are unsure if you require an accommodation, or to learn more about their services, you may contact the SDAC at the number above or by visiting their website at https://www.studenthealth.virginia.edu/student-disability-access-center/about-sdac."

Other similar courses or courses with useful related material:

Department of Computer Science, University of Virginia, Spring 2020.