Large-Scale Multimedia Analysis / Fall 2024
Updates
- New Assignment released: [Assignment #3 - Sequence Models]
- New Assignment released: [Assignment #2 - Computer Vision]
- New Lab is up: Neural Networks Lab [code]
- New Lecture is up: Neural Networks
- New Assignment released: [Assignment #1 - Information Retrieval]
- New Lab is up: Sound and Audio Lab [code]
- New Lecture is up: Sound and Audio
Course Description
Can a robot watch YouTube to learn about the world? What makes us laugh? How to bake a cake? Why is Kim Kardashian famous? Large-scale multimedia is an incomparable window into our world, with thousands of hours of data available on almost every aspect of everyday life. Analyzing such data is a unique opportunity to perform deep multimodal analysis that goes beyond image or video retrieval, speech-to-text, or other existing tasks. This is a 12-unit class or lab covering the fundamentals of large-scale computer vision, audio and speech processing, multimedia files and streaming, multimodal signal processing, video retrieval, semantics, and text generation (possibly also speech and music generation).
Lectures
Dates and Time: 17:00-18:20, Mondays and Wednesdays
Location: Posner Hall 145
Material: Canvas, Piazza
Grading Policy:
- 3 Homeworks (30%)
- 1 Final Project (60%)
- 10 Weekly Quizzes (10%)
- Attendance penalty for missing lectures without permission (-0.25% per lecture)
- Bonus for everyone if more than 80% of the class completes the final course survey (+1%)
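As a rough illustration of how the weights above might combine, here is a minimal sketch. It assumes every component is scored on a 0-100 scale, that the three homeworks and ten quizzes are averaged with equal weight, and that the attendance penalty and survey bonus are applied as percentage points to the final grade. The function name and its parameters are hypothetical; the actual computation is up to the instructors.

```python
def final_grade(homework_avg, project, quiz_avg,
                unexcused_absences=0, survey_bonus=False):
    """Estimate a final grade (0-100 scale) from the listed weights.

    Assumptions (not confirmed by the course page): inputs are
    percentages, and the penalty and bonus are percentage points
    added to the weighted total.
    """
    weighted = 0.30 * homework_avg + 0.60 * project + 0.10 * quiz_avg
    penalty = 0.25 * unexcused_absences   # -0.25% per missed lecture
    bonus = 1.0 if survey_bonus else 0.0  # +1% if survey turnout exceeds 80%
    return weighted - penalty + bonus


# Hypothetical student: 90 homework average, 85 project, 80 quiz average,
# two lectures missed without permission, class earned the survey bonus.
print(final_grade(90, 85, 80, unexcused_absences=2, survey_bonus=True))
```

Under these assumptions the final project dominates: each point on the project moves the total by 0.6, while each unexcused absence costs 0.25.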
Previous Offerings
Instructors
GHC 6703