Computational Visual Recognition

CS 6501-009: Computational Visual Recognition

Instructor: Vicente Ordóñez R (vicente at virginia.edu)

Instructor's Office Hour: Tuesdays 3pm to 4pm at Rice Hall 310

TA: Tianlu Wang (tw8cb at virginia.edu) -- TA Office Hour: Wednesdays 5pm to 6pm at Rice 430 (desk 12)

TA: Siva Sitaraman (ks6cq at virginia.edu) -- TA Office Hour: Fridays 3pm to 4pm at Rice 204

Time: Tuesday & Thursday between 11:00AM and 12:15PM, at Olsson Hall 005.

Discussion Forum: http://piazza.com/virginia/fall2017/cs6501009/home

How can we use computers to recognize objects, people, actions, animals, places, etc. from images? This seemingly trivial task that people perform without much effort has remained one of the core problems in Computer Vision. In this class we will study, play with, and implement algorithms for computational visual recognition using machine learning and deep learning. The class sessions will consist of lectures by the instructor for the most foundational topics, and several student-led paper review sessions to study more recent developments. After this class you will be able to use computational visual recognition for problems ranging from classifying images, to detecting and outlining every object in an image. In summary, after successful completion of this course you should be able to teach a robot how to distinguish dogs from cats.

More about this class

Topics: Sign up for this class if you are interested in any of the following:

Recognizing people in images, automatic image classification, object detection/localization.
Scene recognition, scene parsing, place detection.
Action recognition from still images, action recognition from video.
Recognizing attributes, aesthetics, other perceptual qualities.

Prerequisites: This course requires no previous background in computer vision or machine learning, but knowledge of either will be helpful. You need to know about matrices, calculating derivatives, and probabilities (Bayes' rule). You will also need to be at least a moderately proficient programmer in Python. There will be several lab assignments. These assignments will show you the basics of modern general visual recognition algorithms and models, and will give you the tools for implementing more advanced ones. There will also be a couple of quizzes directly related to the assignments and material covered during class. Finally, we will have a class project where you will be able to work on something beyond your assignments, with more freedom to pursue a focused problem that interests you and better matches your background. We will be using Python/PyTorch in the lecture notes, so it helps to have completed a few projects in Python before the class starts. You should install Python, Jupyter, and PyTorch, and complete the following notebook before our first day of class [pytorch_tensors].
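As a rough self-check of the math and programming background listed above, here is a minimal sketch in plain Python. It is not part of the official course materials or the [pytorch_tensors] notebook; the function names (bayes_posterior, numerical_derivative, matvec) and the example numbers are made up for illustration. If the three pieces below read comfortably, you likely have the background this class assumes.

```python
# Hypothetical self-check; names and numbers are illustrative assumptions,
# not taken from the course labs.

def bayes_posterior(prior, likelihood, false_positive_rate):
    """Bayes' rule for a binary event A given evidence B.

    P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | not A) P(not A)]
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of the derivative f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    # Probability of A after observing B, with a 1% prior.
    print(bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
    # Should be close to the analytic derivative 3 * x**2 = 12 at x = 2.
    print(numerical_derivative(lambda x: x ** 3, 2.0))
    # Matrix-vector product of a 2x2 matrix with a vector of ones.
    print(matvec([[1, 2], [3, 4]], [1, 1]))
```

Comparing a numerical derivative against the analytic one, as in the second function, is also a common way to sanity-check hand-derived gradients.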

Grading: Labs: 30pts (Lab-1: 5pts, Lab-2: 5pts, Lab-3: 10pts, Lab-4: 10pts), Paper presentation and summaries: 10pts, Quiz: 20pts, Project: 40pts.

Summary of Hands-on Activities:

Lab-1 -- Image Processing Lab: [preview] [download]

Lab-2 -- Softmax Classifier Lab: [preview] [download]

Lab-3 -- Deep Learning Lab: [preview] [download]

Lab-4 -- Recurrent Neural Networks + Feature inversion Lab: [preview] [download]

Syllabus

Date	Topic
Tues, August 22nd	Lecture: Introduction [slides] Welcome Why is Visual Recognition hard? Challenges in Computer Vision Problems and applications Lab-1 (Due Tuesday August 29th 11:59pm) -- Image Processing Lab: [preview] [download]
Thurs, August 24th	Lecture: Image Processing Basics & Image Features [slides] Overview of the field Basic Image Processing Convolutions, and filtering	Reading: Szeliski Book, Chapter 3.
Tues, August 29th	Lecture: Machine Learning for Vision I [slides] Discussion: Supervised vs Unsupervised Learning Supervised learning: k-Nearest neighbors Pedestrian Detection using Histogram of Oriented Gradients Unsupervised learning: Clustering
Thurs, August 31st	Lecture: Machine Learning for Vision II [slides] Supervised learning: Linear models Gradient Descent Stochastic Gradient Descent Regularization Lab-2 (Due Tuesday September 12th 11:59pm) -- Softmax Classifier Lab: [preview] [download]
Tues, September 5th	Lecture: Deep Learning for Vision I [no slides, only chalkboard] More on Softmax Classifier More on Stochastic Gradient Descent
Thurs, September 7th	TA Lecture: Categorization and the Perceptron Model Perceptron Model Required Reading: Pictures and Names: Making the Connection [Jolicoeur, Gluck, Kosslyn, 1984]
Tues, September 12th	No class this day -- Please use this time to work on your Lab.
Thurs, September 14th	Lecture: Deep Learning for Vision II [some slides, mostly chalkboard] Lab Review Perceptron Multi-layer Perceptron Neural Networks	Supplementary Reading: Neural Networks by Steve Renals
Tues, September 19th	Lecture: Deep Learning for Vision III: Intro to Convnets [slides] Neural Networks Imagenet and Big Data Convolutional Neural Networks Lab-3 (Due Thursday September 28th 11:59pm) -- Deep Learning Lab: [preview] [download]
Thurs, September 21st	Lecture: Deep Learning for Vision IV: Classification [slides] Convolutional Network Architectures I LeNet and AlexNet	Extra readings: [AlexNet paper], [VGG-16 slides] [VGG-16 paper]
Tues, September 26th	Lecture: Deep Learning for Vision V: Detection [slides] Convolutional Network Architectures II VGG-net, GoogLeNet, ResNet	Extra readings: [GoogLeNet], [ResNet]
Thurs, September 28th	Lecture: Deep Learning for Vision VI: Segmentation [slides] R-CNN, Fast-RCNN, Faster-RCNN	Extra readings: [R-CNN], [Fast-RCNN], [Faster-RCNN], [FCNs]
Tues, October 3rd	No class this day -- Reading Days / Fall Break.
Thurs, October 5th	Lecture: Deep Learning for Vision VII [see previously posted slides / chalkboard / in-class python demonstration] Convolutional Networks with Variable-sized Inputs Intro to YOLO - Single Shot Object Detection
Submit a 1 or 2 page project proposal in PDF on UVA Collab (Deadline: Thursday October 5th at 5pm).
Tues, October 10th	Lecture: Deep Learning for Vision VIII [slides] YOLO continuation / SSD Fully-Convolutional Networks (FCNs) Convolutional Networks for Segmentation Intro to Recurrent Neural Networks RNNs I Long Short-Term Memory Networks LSTMs	Extra Reading: [Image-Captioning], [Question-Answering]
Thurs, October 12th	Lecture: Deep Learning for Vision IX Unrolled LSTMs Recurrent Neural Networks RNNs II Bi-Directional Long Short-Term Memory Networks LSTMs Sequence-to-sequence models Image Captioning	Extra Reading: [Generative Adversarial Networks] Check this tutorial on how to implement style transfer in PyTorch: [here]
Tues, October 17th	Lecture: Generative Adversarial Networks [slides] Generating Adversarial Examples Generative Adversarial Networks Style-transfer Networks Lab-4 (Due Tuesday November 7th 11:59pm) -- Recurrent Neural Networks Lab + 4pts optional on feature inversion: [preview] [download]	Here is some PyTorch code you might want to try for adversarially learning to generate samples from any image collection: [here]
Thurs, October 19th	Student Paper Review: Style-transfer Models Perceptual Losses for Real-Time Style Transfer and Super-Resolution, ECCV 2016. [arxiv] by Justin Johnson, Alexandre Alahi, Li Fei-Fei Deep Feature Interpolation for Image Content Changes, CVPR 2017. [arxiv] by Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, Kilian Weinberger	Check this Mobile App that does something like what is shown in the second paper: [here]
Tues, October 24th	Student Paper Review: Unsupervised learning of Deep Neural Networks. Unsupervised Visual Representation Learning by Context Prediction, ICCV 2015. [arxiv] by Carl Doersch, Abhinav Gupta, Alexei A. Efros Learning Visual Groups From Co-occurrences in Space and Time, ICLR 2016. [arxiv] by Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson
Thurs, October 26th	Student Paper Review: Recent Advances in Generative Adversarial Networks Unsupervised representation learning with deep convolutional generative adversarial networks, ICLR 2016. [arxiv] by Alec Radford, Luke Metz, Soumith Chintala Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. [arxiv] by Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
Tues, October 31st	Student Paper Review: People Recognition Deep Face Recognition, BMVC 2015. [pdf] by Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman Stacked Hourglass Networks for Human Pose Estimation, ECCV 2016. [arxiv] by Alejandro Newell, Kaiyu Yang, Jia Deng
Submit a 2 or 3 page project progress report in PDF on UVA Collab (Deadline: Tuesday October 31st at 5pm). Use this template.
Thurs, November 2nd	In-Class Activity: Quiz Preparation.
Tues, November 7th	In-Class Activity: Quiz Preparation II Quiz Preparation Review
Thurs, November 9th	Student Paper Review: Motion, Tracking, and Video Two-Stream Convolutional Networks for Action Recognition in Videos. [arxiv], NIPS 2014. by Karen Simonyan, and Andrew Zisserman Re3 : Real-Time Recurrent Regression Networks for Object Tracking. [arxiv] by Daniel Gordon, Ali Farhadi, Dieter Fox
Tues, November 14th	No class this day -- Please use this time to work on your projects, and attend the talk on Friday by Prof. Yuille instead.
Thurs, November 16th	Lecture: Course Recap Course Overview and recap Scholarship and Ethics in AI
Friday November 17th, Attend Talk by Prof. Alan Yuille at 11:00AM in Monroe Hall 130. Speaker: Prof. Alan Yuille (Johns Hopkins University / MIT's Center for Brains, Minds and Machines) Talk: Representing Objects by Binary Visual Concepts Encoding More information: Prof. Yuille, who obtained his PhD in theoretical physics under Prof. Stephen Hawking in Cambridge, will talk about his research on models that could be better than deep neural networks and rely on binary representations. See more details in attached poster here. Invite more people!
Tues, November 21st	Quiz (20 pts)
Thurs, November 23rd	Thanksgiving recess - no classes this day.
Tues, November 28th	Project Presentations
Thurs, November 30th	Project Presentations
Tues, December 5th	Project Presentations
Submit a 4 to 5 page final project report in PDF on UVA Collab + Link to your code (Deadline: Tuesday December 5th). Use this template.

Academic Integrity

"The School of Engineering and Applied Science relies upon and cherishes its community of trust. We firmly endorse, uphold, and embrace the University’s Honor principle that students will not lie, cheat, or steal, nor shall they tolerate those who do. We recognize that even one honor infraction can destroy an exemplary reputation that has taken years to build. Acting in a manner consistent with the principles of honor will benefit every member of the community both while enrolled in the Engineering School and in the future. Students are expected to be familiar with the university honor code, including the section on academic fraud."

Instructor's Note: In this class particularly, lab assignments are individual. You can still discuss them in a group or with your friends, but you should not be straight up copying somebody else's solution or code. Not even a single line of code. You might be tempted to think, well, in how many ways could I write c = c + c * c - 2? You are probably right, but what if that's actually a spectacularly wrong solution, and only two students turn in a solution with this unlikely expression in it? If there are two assignments where I notice something even slightly as suspicious as this, I, the instructor, Vicente, will refer the case to the Honor Code system, where the outcome, if the academic misconduct is proven, will probably be a harsh dismissal from the university. Also, do not try to get solutions from previous versions of this class; I keep those solutions on file and I am good at remembering code I have seen before. The UVA Honor Code system is harsh indeed; there are not as many possible outcomes as in other systems. I strongly advise you not to do anything bad. It is not worth it. Most of the grade in this course will be the course project in any case. Not turning in a lab assignment is much preferable to turning in something that contains academic misconduct. Beyond the possible academic consequences that this might entail, it will be incredibly disappointing to me if I find any traces of this in lab assignments. Be clear about what your original contributions are in the class project, and enjoy doing the work on your lab assignments. So let's just all enjoy the class, and avoid this.

Other similar courses that might be of interest:

Deep Learning for Perception (Dhruv Batra, Virginia Tech)
The Unreasonable Effectiveness of Big Visual Data (Jianxiong Xiao, Princeton University)
Visual Recognition (Yong Jae Lee, UC Davis)
Introduction to Computer Vision (James Hays, Brown University / Georgia Tech)
Convolutional Neural Networks for Visual Recognition (Fei-Fei Li, Andrej Karpathy and Justin Johnson, Stanford University)
Machine Learning (Nando de Freitas, University of Oxford)
Visual Recognition (Adriana Kovashka, University of Pittsburgh)
Recognizing People, Objects and Actions (Tamara L. Berg, UNC Chapel Hill)