Image and Video Generation

The University of Texas at Austin
Computer Science Department

Computer Science 378
Generative Visual Computing

Spring 2025


General Information:

Time: Mondays and Wednesdays 9:30AM-11:00AM
Place: GDC 1304
Instructor: Qixing Huang
TA: Jiaxin Lu
Office hours: Mondays and Wednesdays 11:00am - 12:00pm at GDC 7802.

Introduction:

This is an introductory course on image and video generative models. It is intended for upper-level undergraduate students.

Visual generative modeling is an active, fast-evolving field, spanning industrial and scientific domains, that studies representations and algorithms for generating visual content in the form of images, videos, animations, and 3D content. It builds on image representations, deep learning, and algorithms for generative modeling. Our goal is to teach the basics from scratch, working up to writing programs that generate visual content, and to bridge the gap between current industry needs and what is taught in related courses such as computer vision, computer graphics, and machine learning. The class is self-contained, but a good knowledge of algorithms, linear algebra, and probability theory is required.

After covering the fundamentals of computer vision and deep learning, we will focus on generative models, including GANs, auto-regressive models, VAEs, diffusion models, and normalizing flows. We emphasize machine learning basics and applications in image, animation, and video generation.

Prereqs:

Basic knowledge of probability and linear algebra; data structures, algorithms; programming experience. Previous experience with image processing will be useful but is not assumed.

Assignments will consist largely of PyTorch or MATLAB programming problems. There will be a warm-up assignment to get familiar with basic PyTorch/MATLAB commands. We will recommend useful functions to check out for each assignment. However, students are expected to practice and pick up PyTorch on their own in order to complete the assignments. The instructor and TAs are happy to help with PyTorch issues during office hours.

If you are unsure if your background is a good match for this course, please come talk to the instructor.

Textbooks:

The main textbook we will use is the following:

  • Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play (2nd Edition), David Foster
  • This book covers the GAN, VAE, and diffusion models we teach in the lectures. It can be downloaded online.

    The following books are also useful.

  • Computer Vision: Algorithms and Applications, by Rick Szeliski.
  • It covers the basics of image filtering, which we teach in the first few lectures. It also covers texture synthesis.

  • Generative Adversarial Networks for Image Generation, Xudong Mao and Qing Li
  • This book has good material on GANs.

  • Generative AI with Python and TensorFlow 2: Create images, text, and music with VAEs, GANs, LSTMs, Transformer models, Joseph Babcock and Raghav Bali
  • This book covers code for training generative models using Python and TensorFlow.

  • Hands-On Image Generation with TensorFlow: A practical guide to generating images and videos using deep learning, Soon Yau Cheong
  • Another book that covers coding in TensorFlow.


    Schedule

    Date           Topics                                                        Reading  Notes
    January 13th   Introduction                                                           A0 out, due Wed Jan 22nd
    January 15th   Computer Vision Basics I (Filtering & Smoothing)
    January 22nd   Computer Vision Basics II (Feature Detection)
    January 27th   Texture Synthesis                                                      A1 out, due Mon Feb 10th
    January 29th   Deep Learning Basics I (ConvNets)
    February 3rd   Deep Learning Basics II (Recurrent, Attention, Transformers)
    February 5th   Generative Adversarial Networks
    February 10th  Generative Adversarial Networks II (Recent advances)                   A2 out, due Mon Feb 24th
    February 12th  Generative Adversarial Networks III (Conditional Generation)
    February 17th  Auto-regressive Models I
    February 19th  Auto-regressive Models II
    February 24th  Variational auto-encoder I (theory)
    February 26th  Variational auto-encoder II (applications)                             A3 out, due Mon March 24th
    March 3rd      Diffusion methods I (theory)
    March 5th      Normalizing Flows                                                      Practice midterm handed out on Canvas before midnight.
    March 10th     Diffusion methods II (Latent diffusion, stable diffusion)
    March 12th     In-class midterm
    March 24th     Diffusion methods III (sampling)
    March 26th     Diffusion methods IV (classifier-guidance/classifier-free)             A4 out, due Wednesday April 23rd
    March 31st     Diffusion methods V (conditional generation)
    April 2nd      Diffusion methods VI (flow matching/rectified flow)
    April 7th      Applications (Image editing)
    April 9th      Applications (training data generation)
    April 14th     Video generation I
    April 16th     Video generation II
    April 21st     Video generation III
    April 23rd     3D Generation
    April 28th     In-class final exam


    Course requirements:

    Assignments: Assignments will be given approximately every two to four weeks (to accommodate midterm and final exam preparation). The programming problems will provide hands-on experience working with techniques covered in or related to the lectures. All code and written responses must be completed individually. Most assignments will take significant time to complete. Please start early, and use Ed and/or see us during office hours for help if needed. Please follow the instructions in each assignment carefully regarding what to submit and how to submit it.

    Extension policy: If you turn in your assignment late, expect points to be deducted. Extensions will be considered on a case-by-case basis, but in most cases they will not be granted. The greater the advance notice of a need for an extension, the greater the likelihood of leniency. For programming assignments, by default, 10 points (out of 100) will be deducted for lateness for each day late. We will use the submission program timestamp to determine time of submission. One day late = from 1 minute to 24 hours past the deadline. Two days late = from 24 hours and 1 minute to 48 hours past the deadline. We will not accept assignments more than 4 days late, or once solutions have been discussed in class, whichever is sooner.
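
    As an illustration of the default late-penalty arithmetic above, here is a minimal Python sketch. It is not the official grading script: the function name late_deduction, the 11:59 PM deadline in the usage note, and the choice to treat any submission after the deadline (even by less than a minute) as one day late are assumptions made for this example. It also does not model the "once solutions have been discussed in class" cutoff.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

POINTS_PER_DAY = 10  # policy: 10 points (out of 100) deducted per day late
MAX_DAYS_LATE = 4    # policy: not accepted more than 4 days late


def late_deduction(deadline: datetime, submitted: datetime) -> Optional[int]:
    """Return the points deducted, or None if the submission is not accepted.

    Hypothetical helper for illustration only.
    """
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 0  # on time: no deduction
    # Any started 24-hour period counts as a full day late.
    days_late = math.ceil(lateness / timedelta(days=1))
    if days_late > MAX_DAYS_LATE:
        return None  # more than 4 days late: not accepted
    return POINTS_PER_DAY * days_late
```

    For example, with an assumed deadline of 11:59 PM, a submission 24 hours and 1 minute past the deadline counts as two days late, so late_deduction returns 20; a submission more than 96 hours late returns None.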

    Exams: There is an in-class midterm and an in-class final exam. Both exams will be offered at the listed time only. Neither exam will be offered at a different time to accommodate personal travel plans, internship start dates, interviews, etc.

    Participation/attendance: Regular attendance is expected. If for whatever reason you are absent, it is your responsibility to find out what you missed that day. Note that attendance does factor into the final grade. (See Section II of the UTCS Code of Conduct regarding attendance expectations.)

    General responsibilities: Beyond the above, your responsibilities in the class are:

  • Come to lecture on time.
  • Check the class webpage for assignment files, notes, announcements etc.
  • Use Ed for class-related discussion and assignment help (no spoilers, please!).
  • Complete the readings prior to lecture. The reading assignments listed on the schedule should be read before the associated class lecture.
  • Please do not use a laptop, cell phone, tablet, etc. during class.
  • Please read and follow the UTCS Code of Conduct.

    Important Dates

    Please note the following important dates and deadlines.

  • A0 due Jan 22
  • A1 due Feb 10
  • A2 due Feb 24
  • Midterm exam Mar 12 (in class)
  • A3 due Mar 24
  • A4 due April 23
  • Final exam April 28

    Assignments are due about every two to four weeks. The assignment deadlines above are tentative and are provided to help your planning. They are subject to minor shifts if the lecture plan needs to be adjusted slightly according to our pace in class.