Welcome to the Course

Computer vision, the automatic analysis and understanding of images and videos, is central to applications in security, healthcare, entertainment, mobility, and other domains. The recent success of deep learning methods has revolutionized the field, bringing new developments closer to deployment that benefits end users. This course introduces students to traditional computer vision topics before presenting deep learning methods for computer vision. It covers both fundamentals and recent advances, so that students learn the basics and become proficient in applying these methods to real-world applications. The course assumes that the student has already completed a full course in machine learning, preferably with some introduction to deep learning, and builds on these topics with a focus on computer vision.

Fall 2022 link: https://onlinecourses.nptel.ac.in/noc22_cs76/preview

Course Curriculum

Week 1: Introduction and Overview

Course Overview and Motivation; Introduction to Image Formation, Capture and Representation; Linear Filtering, Correlation, Convolution

Week 2: Visual Features and Representations

Edge, Blob, and Corner Detection; Scale Space and Scale Selection; SIFT, SURF; HOG, LBP, etc.

Week 3: Visual Matching

Bag-of-words, VLAD; RANSAC, Hough transform; Pyramid Matching; Optical Flow

Week 4: Deep Learning Review

Review of Deep Learning, Multi-layer Perceptrons, Backpropagation

Week 5: Convolutional Neural Networks (CNNs)

Introduction to CNNs; Evolution of CNN Architectures: AlexNet, ZFNet, VGG, InceptionNets, ResNets, DenseNets

Week 6: Visualization and Understanding CNNs

Visualization of Kernels; Backprop-to-image/Deconvolution Methods; Deep Dream, Hallucination, Neural Style Transfer; CAM, Grad-CAM, Grad-CAM++; Recent Methods (IG, Segment-IG, SmoothGrad)

Week 7: CNNs for Recognition, Verification, Detection, Segmentation

CNNs for Recognition and Verification (Siamese Networks, Triplet Loss, Contrastive Loss, Ranking Loss); CNNs for Detection: Background of Object Detection, R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD, RetinaNet; CNNs for Segmentation: FCN, SegNet, U-Net, Mask R-CNN

Week 8: Recurrent Neural Networks (RNNs)

Review of RNNs; CNN + RNN Models for Video Understanding: Spatio-temporal Models, Action/Activity Recognition

Week 9: Attention Models

Introduction to Attention Models in Vision; Vision and Language: Image Captioning, Visual QA, Visual Dialog; Spatial Transformers; Transformer Networks

Week 10: Deep Generative Models

Review of (Popular) Deep Generative Models: GANs, VAEs; Other Generative Models: PixelRNNs, NADE, Normalizing Flows, etc.

Week 11: Variants and Applications of Generative Models in Vision

Applications: Image Editing, Inpainting, Super-resolution, 3D Object Generation, Security; Variants: CycleGANs, Progressive GANs, StackGANs, Pix2Pix, etc.

Instructor: Vineeth N Balasubramanian

Vineeth N. Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H), India. His research interests include deep learning, machine learning, and computer vision. His research has resulted in over 100 peer-reviewed publications at international venues, including top-tier ones such as ICML, NeurIPS, CVPR, ICCV, KDD, and AAAI. His PhD dissertation at Arizona State University, on the Conformal Predictions framework, was nominated for the Outstanding PhD Dissertation at the Department of Computer Science. For more details, visit his page: https://iith.ac.in/~vineethnb/

Teaching Assistants

  • Rishabh Lalla (Research Assistant, Machine Learning and Vision Lab, IIT Hyderabad)

  • Charchit Sharma (Research Assistant, Machine Learning and Vision Lab, IIT Hyderabad)

  • Divyagna Bavikadi (Research Assistant, Machine Learning and Vision Lab, IIT Hyderabad)