Lawrence R. Rabiner, Biing-Hwang Juang
Provides a theoretically sound, technically accurate, and complete description of the basic knowledge and ideas that constitute a modern system for speech recognition by machine. Covers production, perception, and acoustic-phonetic characterization of the speech signal; signal processing and analysis methods for speech recognition; pattern comparison techniques; speech recognition system design and implementation; theory and implementation of hidden Markov models; speech recognition based on connected word models; large vocabulary continuous speech recognition; and task-oriented application of automatic speech recognition. For practicing engineers, scientists, linguists, and programmers interested in speech recognition.
Xuedong Huang, Alejandro Acero, Hsiao-Wuen Hon
Preface
Our primary motivation in writing this book is to share our working experience to bridge the gap between the knowledge of industry gurus and newcomers to the spoken language processing community. Many powerful techniques hide in conference proceedings and academic papers for years before becoming widely recognized by the research community or the industry. We spent many years pursuing spoken language technology research at Carnegie Mellon University before we started spoken language R&D at Microsoft. We fully understand that it is by no means a small undertaking to transfer a state-of-the-art spoken language research system into a commercially viable product that can truly help people improve their productivity. Our experience in both industry and academia is reflected in the context of this book, which presents a contemporary and comprehensive description of both theoretic and practical issues in spoken language processing. This book is intended for people of diverse academic and practical backgrounds. Speech scientists, computer scientists, linguists, engineers, physicists, and psychologists all have a unique perspective on spoken language processing. This book will be useful to all of these special interest groups. Spoken language processing is a diverse subject that relies on knowledge of many levels, including acoustics, phonology, phonetics, linguistics, semantics, pragmatics, and discourse. The diverse nature of spoken language processing requires knowledge in computer science, electrical engineering, mathematics, syntax, and psychology. There are a number of excellent books on the subfields of spoken language processing, including speech recognition, text-to-speech conversion, and spoken language understanding, but there is no single book that covers both theoretical and practical aspects of these subfields and spoken language interface design.
We devote many chapters to systematically introducing fundamental theories needed to understand how speech recognition, text-to-speech synthesis, and spoken language understanding work. Even more important is the fact that the book highlights what works well in practice, which is invaluable if you want to build a practical speech recognizer, a practical text-to-speech synthesizer, or a practical spoken language system. Using numerous real examples in developing Microsoft's spoken language systems, we concentrate on showing how the fundamental theories can be applied to solve real problems in spoken language processing.
This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author's goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques.
Michael H. Cohen, James P. Giangola, Jennifer Balogh
This book is a comprehensive and authoritative guide to voice user interface (VUI) design. The VUI is perhaps the most critical factor in the success of any automated speech recognition (ASR) system, determining whether the user experience will be satisfying or frustrating, or even whether the customer will remain one. This book describes a practical methodology for creating an effective VUI design. The methodology is scientifically based on principles in linguistics, psychology, and language technology, and is illustrated here by examples drawn from the authors' work at Nuance Communications, the market leader in ASR development and deployment. The book begins with an overview of VUI design issues and a description of the technology. The authors then introduce the major phases of their methodology. They first show how to specify requirements and make high-level design decisions during the definition phase. They next cover, in great detail, the design phase, with clear explanations and demonstrations of each design principle and its real-world applications. Finally, they examine problems unique to VUI design in system development, testing, and tuning. Key principles are illustrated with a running sample application. A companion Web site (www.VUIDesign.org) provides audio clips for each example. The cover photograph depicts the first ASR system, Radio Rex: a toy dog who sits in his house until the sound of his name calls him out. Produced in 1911, Rex was among the few commercial successes in earlier days of speech recognition. Voice User Interface Design reveals the design principles and practices that produce commercial success in an era when effective ASRs are not toys but competitive necessities.
John E. Sarno
Dr. John E. Sarno's Healing Back Pain is a New York Times bestseller that has helped over 500,000 readers. Continuing the research since his ground-breaking book, the renowned physician now presents his most complete work yet on the vital connection between mental and bodily health. Musculoskeletal pain disorders have reached epidemic proportions in the United States, with most doctors failing to recognize their underlying cause. In this acclaimed volume, Dr. Sarno reveals how many painful conditions (including most neck and back pain, migraine, repetitive stress injuries, whiplash, and tendonitises) are rooted in repressed emotions, and shows how they can be successfully treated without drugs, physical measures, or surgery. His innovative program has already produced gratifying results for thousands of patients. The Mindbody Prescription is your invaluable key to a healthy and pain-free life.
Claudio Becchetti, Lucio Prina Ricotti
Automatic Speech Recognition (ASR) is the enabling technology for hands-free dictation and voice-triggered computer menus. It is becoming increasingly prevalent in environments such as private telephone exchanges and real-time information services. Speech Recognition introduces the principles of ASR systems, including the theory and implementation issues behind multi-speaker continuous speech recognition. Focusing on the algorithms employed in commercial and laboratory systems, the treatment enables the reader to devise practical solutions for ASR system problems. It addresses in detail C++ programming techniques used to develop ASR applications, thus offering skills that will prove useful in any large C++ based software project. Possible extensions of the well-established ASR technology are highlighted, based on "Hidden Markov Models" applied to fields such as modelling and prediction of econometric series. Features include:
* Accompanying website containing all C++ source code of a complete laboratory multi-speaker continuous-speech ASR system (e.g. Initialisation, Training, Recognition, Evaluation, etc.): www.wiley.com/go/becchetti_speech
* Detailed theoretical, mathematical and technical explanations of ASR
* A practical account of the functioning of ASR
A crucial source of information for researchers, developers and project managers involved with ASR systems, Speech Recognition is also structured for use by students of digital signal processing, speech recognition and C++ programming techniques.