Large-Scale Multimedia Analysis / Fall 2024


Course Description

Can a robot watch YouTube to learn about the world? What makes us laugh? How to bake a cake? Why is Kim Kardashian famous? Large-scale multimedia is an incomparable window into our world, with thousands of hours of data available on almost every aspect of everyday life. Analyzing such data is a unique opportunity to perform deep multi-modal analysis that goes beyond image or video retrieval, speech-to-text, or other existing tasks. This is a 12-unit class or lab covering the fundamentals of large-scale computer vision, audio and speech processing, multimedia files and streaming, multi-modal signal processing, video retrieval, semantics, and the generation of text (and possibly speech and music).


Dates and Time: 17:00-18:20, Mondays and Wednesdays

Location: Posner Hall 145

Material: Canvas, Piazza

Grading Policy:

  • 3 Homeworks (30%)
  • 1 Final Project (60%)
  • 10 Weekly Quizzes (10%)
  • Attendance penalty for missing lectures without permission (-0.25% per lecture)
  • Bonus for everyone if more than 80% of the class completes the final course survey (+1%)

Previous Offerings


Rita Singh

GHC 6703

Teaching Assistants

Kashu Yamazaki

GHC 6413