Automatic Speech Recognition

Syllabus

6.345 Course Information

Description

This course introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing.

Organization

There will be two 90-minute lectures per week. To facilitate the coverage of a large quantity of material, copies of the lecture viewgraphs will be handed out. There will be no final exam for the course. Instead, there will be two in-class quizzes, each counting approximately 15% towards the final grade.

There will be weekly assignments consisting of both problems and mandatory laboratory work, so that students can gain hands-on experience with the material covered. Linux workstations will be made available for laboratory work, and time on these machines can be reserved through a sign-up mechanism on the 6.345 website. Assignments must be turned in by the due date. Solutions will be provided along with the graded assignments. Each of the nine assignments will count approximately 5% towards the final grade.

During the last quarter of the course, assignments will end, and students will work on a term project that will count approximately 25% towards the final grade. Projects will be chosen in consultation with staff members, and typically involve creating and evaluating a speech recognizer along a dimension of interest to the student. Toolkits of key recognizer components will be provided, so that only minimal programming skills are required.
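The approximate weights quoted above (two quizzes at 15% each, nine assignments at 5% each, and a term project at 25%) add up to 100%. The short sketch below checks that arithmetic and shows one way a weighted total could be computed. It is only an illustration: the function name, the 0-100 score scale, and the assumption that the weights are applied exactly as stated are not part of the course materials.

```python
# Approximate weights taken from the syllabus text above.
# All names here are hypothetical and chosen only for illustration.
QUIZ_WEIGHT = 0.15        # each of the 2 in-class quizzes
ASSIGNMENT_WEIGHT = 0.05  # each of the 9 weekly assignments
PROJECT_WEIGHT = 0.25     # term project


def final_grade(quizzes, assignments, project):
    """Combine component scores (assumed to be on a 0-100 scale) into a weighted total."""
    if len(quizzes) != 2 or len(assignments) != 9:
        raise ValueError("expected 2 quiz scores and 9 assignment scores")
    return (QUIZ_WEIGHT * sum(quizzes)
            + ASSIGNMENT_WEIGHT * sum(assignments)
            + PROJECT_WEIGHT * project)


# Sanity check: 2 * 15% + 9 * 5% + 25% should come to 100%.
total_weight = 2 * QUIZ_WEIGHT + 9 * ASSIGNMENT_WEIGHT + PROJECT_WEIGHT
```

Since the syllabus gives the weights only as approximate, the actual grading may differ slightly from this breakdown.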

Schedule

Lectures: two sessions per week, 1.5 hours per session

A detailed outline of the class lectures and assignments is also available.

Staff

Lecturer: Jim Glass

References

Huang, Acero, and Hon. Spoken Language Processing. Upper Saddle River, NJ: Prentice-Hall, 2001. ISBN: 0130226165.

Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998. ISBN: 0262100665.

Rabiner and Juang. Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993. ISBN: 0130151572.

Duda, Hart, and Stork. Pattern Classification. New York, NY: Wiley & Sons, 2000. ISBN: 0471056693.

Stevens. Acoustic Phonetics. Cambridge, MA: MIT Press, 1998. ISBN: 0262692503.
