Large-Scale Multimedia Analysis / Fall 2024

Updates

New Assignment released: [Assignment #3 - Sequence Models]
New Assignment released: [Assignment #2 - Computer Vision]
New Lab is up: Neural Networks Lab [code]
New Lecture is up: Neural Networks
New Assignment released: [Assignment #1 - Information Retrieval]
New Lab is up: Sound and Audio Lab [code]
New Lecture is up: Sound and Audio

Course Description

Can a robot watch YouTube to learn about the world? What makes us laugh? How to bake a cake? Why is Kim Kardashian famous? Large-scale multimedia is an incomparable window into our world, with thousands of hours of data available on almost every aspect of everyday life. Analyzing such data is a unique opportunity to perform deep multimodal analysis that goes beyond image or video retrieval, speech-to-text, or other existing tasks. This is a 12-unit class or lab covering the fundamentals of large-scale computer vision, audio and speech processing, multimedia files and streaming, multimodal signal processing, video retrieval, semantics, and text generation (possibly also speech and music generation).

Lectures

Days and Time: 17:00-18:20, Mondays and Wednesdays

Location: Posner Hall 145

Material: Canvas, Piazza

Grading Policy:

3 Homeworks (30%)
1 Final Project (60%)
10 Weekly Quizzes (10%)
Attendance penalty for missing lectures without permission (-0.25% per lecture)
Bonus for everyone if more than 80% of the class completes the final course survey (+1%)
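
As a rough illustration of how the components above might combine, here is a minimal sketch. It assumes that all homeworks are weighted equally, that all quizzes are weighted equally, and that the attendance penalty and survey bonus are applied as percentage points on the final grade; none of these details are stated in the policy, and the function name `final_grade` and its parameters are hypothetical.

```python
def final_grade(homework_avg, project, quiz_avg,
                unexcused_absences=0, survey_bonus=False):
    """Estimate a final grade in percent from the listed weights.

    All score arguments are percentages in [0, 100].
    Assumption: penalty and bonus are percentage points added to the
    weighted total; the course may apply them differently.
    """
    # Weighted components: 30% homework, 60% final project, 10% quizzes.
    weighted = 0.30 * homework_avg + 0.60 * project + 0.10 * quiz_avg
    # -0.25 percentage points per lecture missed without permission.
    penalty = 0.25 * unexcused_absences
    # +1 percentage point if the class-wide survey threshold is met.
    bonus = 1.0 if survey_bonus else 0.0
    return weighted - penalty + bonus


if __name__ == "__main__":
    # Example: 90% homework average, 85% project, 100% quiz average,
    # two unexcused absences, survey bonus awarded.
    print(final_grade(90, 85, 100, unexcused_absences=2, survey_bonus=True))
```

The sketch does not clamp the result to 100%, since the policy does not say whether the bonus can push a grade above that.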

Previous Offerings

Fall 2022

Instructors

Rita Singh

GHC 6703

Teaching Assistants

Kashu Yamazaki

GHC 6413

Sunghwan Baek