Seminar Speech Recognition 2008

In this seminar, all aspects of state-of-the-art speech recognition systems will be studied. The participation and contributions of the students are of great importance. There will be several assignments, reports, and presentations by the students.

Requirements: C, C++

Grading (7 ECTS): Design, programming, and documentation for your speech recognition project, plus further presentations (60% of grade). Class discussions, attendance, and problem sets (40% of grade). Attendance at every class is required. If you cannot attend, you must contact Dr. E.M. Bakker before class!

Materials (Mandatory):

Fundamentals of Speech Recognition by Lawrence Rabiner and Biing-Hwang Juang (Hardcover, 507 pages; Publisher: Pearson Education POD; ISBN: 0130151572; 1st edition, April 12, 1993)

List of recommended books:

Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon, and Raj Reddy (Hardcover, 980 pages; Publisher: Prentice Hall PTR; ISBN: 0130226165; 1st edition, April 25, 2001)

Speech Recognition: Theory and C++ Implementation by Claudio Becchetti and Lucio Prina Ricotti (Hardcover, 407 pages; Publisher: John Wiley & Sons; ISBN: 0471977306; 1st edition, April 1999)

Links

VoiceXML

Schedule (Tentative!):

13 - 2: Introduction Speech Recognition I (pdf)
20 - 2: Introduction Speech Recognition II
27 - 2: Presentations Project Proposals
5 - 3: Speech Corpora
12 - 3: Speech Preprocessing
19 - 3: Projects: Presentations
26 - 3: Speech Feature Extraction I
2 - 4: Speech Feature Extraction II
9 - 4: Projects: Solutions
16 - 4: Audio Indexing I
23 - 4: Audio Indexing II
7 - 5: To be announced
14 - 5: Final Project Demos

Assignments

To be announced.

Presentations

Every student has to present two lectures of 20 to 30 minutes each. The first presentation will be on the state of the art with respect to the subject of the selected project. The second presentation will be on a solution for his/her selected project from the research literature. The students have to rehearse their lectures beforehand with the organizers. In general, the rehearsal will be scheduled on Tuesday at 12.30, one day before the actual class.

The presenting student has to make the relevant articles available at least three days before the presentation. All other students have to read the articles and prepare at least two questions for the speaker.

Projects

During the seminar, students have to do a project related to speech recognition. The project has to be based on problems identified in recent speech recognition research. The research publications can be found in the journals listed under Project Literature.

The agenda for the projects is as follows:

Teams of 2 - 3 students have to be formed.
Proposals: Each member of the team has to propose a different subject based on a recent research publication. The (short) presentation of the proposal has to address:
- Title of the proposal
- Reference of the article
- A short description of the problem(s) to be solved.
- The state of the art with respect to these problems.
- A (realistic) goal of the proposed project.
Proposal selection by the organizers.
Projects: State of the Art
Projects: Solutions
Final project demos.

Teams of 2 - 3 students have to propose a project.

Project Web Pages

Every team has to maintain a project web page on which progress, documents, code, links, etc. related to the project are maintained. Here you can find the example project page. Do not forget to mail me the link to your project page.

Project Literature

Several journals can be used to find a suitable problem for the speech recognition projects. We have electronic access to all of them: use (1) your U-LIP account or (2) any computer within the university, as these electronic sites use IP access checking. If you have problems accessing one of these journals, please contact the teaching assistant.

To search efficiently for articles on a certain topic, go to:
http://www.sciencedirect.com, click "Search", and type, for example, "Speech recognition" (in the first field) and "review" (in the second field). Or go to www.ieee.org or www.acm.org and search for "speech recognition" or "speech synthesis". Many useful links can be found here.

Some review articles on problems in speech recognition (SR), and overviews of specific programs and technologies related to SR, that can be found in the list of journals above when searching for "speech recognition" and "review":

Speech recognition by machines and humans
Speech recognition in adverse environments
Speech recognition algorithms for voice control interface
Use of Speech Recognition Software: A Vocal Endurance Test for the New Millennium
Statistical language model adaptation: review and perspectives
Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR
Speech recognition with unknown partial feature corruption – a review of the union model
A survey of hybrid ANN/HMM models for automatic speech recognition
The SPHINX-II speech recognition system: an overview
Speech recognition evaluation: a review of the U.S. CSR and LVCSR programmes

Some cool but more specific ones:

Robust speech recognition based on adaptive classification and decision strategies
Towards situated speech understanding: visual context priming of language models