Seminar Audio Processing and Indexing

(Last updated: 1-12-2011)

Period: September 8th - December 23rd

Time:   Thursday 15.45 – 17.30

Place:  LIACS, Room 403

 

Organizers:

 

Lecturer:

Dr Erwin M. Bakker ( erwin@liacs.nl )

Room 147 and LIACS Media Lab (LML)

 

Teaching assistant:

Drs Xiaomeng Li

 

NB: E-mail your name and student number to erwin@liacs.nl

 

Abstract:

During this seminar the fundamentals of audio processing and indexing will be studied. Applications in the areas of speech recognition, audio synthesis, and content-based audio retrieval will be discussed. State-of-the-art work on content-based audio retrieval will be studied and presented by the participants.

The seminar starts with several lectures and accompanying assignments in the form of workshops, followed by literature selection, study, and presentations by all students. It ends with final project demos and presentations.

Requirements: C, C++

Grading (6 ECTS): Presentations and Project (60% of grade). Class discussions, attendance, and workshops (40% of grade). Attendance at every class is required. If you cannot be there, you must contact Dr. E.M. Bakker before class!

Materials:

 

Lecture slides and further materials will be made available on this site.

 

List of recommended books:

 

Discrete-Time Speech Signal Processing: Principles and Practice by T.F. Quatieri (Publisher: Prentice Hall PTR; ISBN: 013242942X; 2002)

 

Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)

 

Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, and Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)

 

Speech Recognition: Theory and C++ Implementation by Claudio Becchetti and Lucio Prina Ricotti (Hardcover, 407 pages; Publisher: John Wiley & Sons; ISBN: 0471977306; 1st edition, April 1999)

Schedule (tentative):
8-9  Organization and Introduction
15-9  Audio Production and Processing; Vocal Tract Workshop (see Assignment 1)
22-9  ADC and an Algebraic Introduction to FT pg8
29-9  FFT and FFT Workshop (see Assignment 2)
6-10  Filter Workshop (see Assignment 3)
13-10  Audio Features; Project Discussion
20-10  Machine Learning; Project Proposals
27-10  Audio Indexing (Student Presentations)
3-11  Audio Indexing (Student Presentations)
10-11  Audio Indexing (Student Presentations)
17-11  Audio Indexing (Student Presentations)
24-11  Audio Indexing (Student Presentations); Workshop; Progress Reports
1-12  Audio Indexing (Student Presentations)
15-12  Final Project Presentations / Demo
23-12  Final Technical Project Paper and Site

Assignments

  1. Vocal Tract Workshop
  2. FFT Workshop Code and data, VoiceBox
  3. Filter Workshop data
  4. Audio Indexing Workshop (provided by e-mail)

Project Links

Previous Project Links

Please note: some projects have not been finished.

Presentations

During the seminar we will study state-of-the-art audio indexing methods and techniques using recent scientific publications from international journals, workshops, and conferences on content-based audio retrieval.

There will be two kinds of presentations, and each student will give both. The first kind is a technical paper presentation:

  • Each student will propose a list of 3 very recent scientific publications on audio processing and/or indexing.

  • Each student will select one paper from their proposed list, study it in great detail, and present it in a 20-minute talk to a critical audience.

  • To ensure a critical audience, every student in class is expected to have read the papers to be presented in detail and to have prepared several questions for the speaker.

The second kind of presentation is the survey / state-of-the-art presentation:

  • Each student selects a subject of his/her choice in the field of audio processing and indexing.

  • Each student will perform a literature study on this subject, define the state of the art, and select the paper that best represents it.

  • Each student will present an introduction to the subject (i.e., the problem definition), an overview / survey of methods and solutions, and the state of the art based on the selected paper.

Projects

During the seminar each student has to do a project related to audio processing/synthesis/indexing.

The agenda for the projects is as follows:

  1. Project proposal presentation using the website for the project. It has to address:

    • Title of the proposal.

    • Reference(s) of the paper(s).

    • A short description of the problem(s) to be solved.

    • The state of the art with respect to these problems.

    • A (realistic) goal of the proposed project.

  2. Project status reports.

  3. Final project presentation / demo.

  4. Final technical project paper.

Project Web Pages

Every student has to maintain a project web page that keeps track of progress, documents, code, links, and other material related to the project. Here you can find an example project page, but feel free to design your own. Do not forget to mail me the link to your project page.

Journals

You have electronic access to all of the listed journals by using your ULCN-account (for further details see http://www.bibliotheek.leidenuniv.nl/ ):