Monday, September 5th
Opening Session: Welcome from the General Chair
Keynote speaker: Graeme M. Clark, Bionic Ear Institute, Melbourne, Australia (ISCA Medalist)
Title: The Multiple-Channel Cochlear Implant: Interfacing Electronic Technology to Human Consciousness

Fundamental research on electrical stimulation of the auditory pathways resulted in the multiple-channel cochlear implant, a device that provides understanding of speech to severely-to-profoundly deaf people. The device, a miniaturized receiver-stimulator with multiple electrodes fed with power and speech data through two separate aerials, was first implanted in a patient in 1978 as a prototype, and since 1982 has been commercially produced by Cochlear Limited, Australia. Speech processing is based on the discovery that the sensation at each electrode is vowel-like. Initially, the second formant was coded as a place of stimulation, the sound pressure as a current level, and the voicing frequency as a pulse rate. Further research showed progressively better open-set word and sentence scores for the extraction of the first formant in addition to the second (the F0/F1/F2 processor), the addition of high fixed filter outputs (MULTIPEAK), and finally six to eight maximal filter outputs at low rates (SPEAK) and high rates (ACE). All the frequencies were coded on a place basis. World trials completed for the US FDA, on late-deafened adults in 1985 and on children aged two to 17 years in 1990, proved that a 22-channel cochlear implant was safe and effective in enabling them to understand speech both with and without lip-reading.
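The early coding strategy described above can be sketched in a few lines of code. The following Python fragment is purely illustrative and is not Cochlear Limited's actual algorithm: the 22-electrode count comes from the text, but the second-formant frequency range, the logarithmic place mapping, and the normalization of current level are all assumptions made for the example.

```python
# Illustrative sketch of the early F2-based coding strategy: the
# second-formant frequency selects WHICH electrode fires (place coding),
# the sound pressure sets the stimulation current level, and the voicing
# frequency (F0) sets the pulse rate. Frequency ranges and electrode
# ordering are assumptions, not the commercial device's parameters.
import math

NUM_ELECTRODES = 22             # the implant described above has 22 channels
F2_MIN, F2_MAX = 800.0, 4000.0  # assumed second-formant range in Hz

def f2_to_electrode(f2_hz):
    """Map second-formant frequency to an electrode index (place coding).
    Higher frequencies map to lower-index (more basal) electrodes here;
    the indexing convention is an assumption of this sketch."""
    f2 = min(max(f2_hz, F2_MIN), F2_MAX)
    # Logarithmic spacing, mirroring the cochlea's tonotopic layout.
    frac = math.log(f2 / F2_MIN) / math.log(F2_MAX / F2_MIN)
    return NUM_ELECTRODES - 1 - round(frac * (NUM_ELECTRODES - 1))

def stimulus(f2_hz, pressure, f0_hz):
    """Return (electrode, current_level, pulse_rate) for one speech frame."""
    return (f2_to_electrode(f2_hz),
            min(max(pressure, 0.0), 1.0),  # sound pressure -> current level
            f0_hz)                         # voicing frequency -> pulse rate

print(stimulus(2300.0, 0.6, 120.0))
```

The later F0/F1/F2, MULTIPEAK, SPEAK, and ACE strategies mentioned above extend this idea to more simultaneous filter outputs, all still coded on a place basis.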
Graeme Clark

Graeme M. Clark was born and raised in Australia. He received the Bachelor of Medicine (MB) and Bachelor of Surgery (BS) degrees in 1957, the Master of Surgery (MS) in 1968, and the Doctor of Philosophy (PhD) in 1969, all from the University of Sydney, Australia. In 1970 he became the foundation Professor of Otolaryngology at the University of Melbourne, and he retired in 2004 to become full-time Director of the Bionic Ear Institute, which he established in 1984. After commencing research on electrical stimulation of the auditory pathways in 1967, Graeme Clark systematically initiated and led the fundamental research resulting in the multiple-channel cochlear implant. It is the first major advance in restoring speech perception to tens of thousands of severely-to-profoundly deaf people worldwide and has given spoken language to children born deaf or deafened early in life. Thus, it is the first clinically effective and safe interface between electronic technology and human consciousness. In addition, Clark has played a key role in the development of the Automatic Brainwave Audiometer, the first method for objective, accurate measurement of hearing thresholds at low and high frequencies in infants and young children, and the Tickle Talker, a device enabling deaf children to understand speech through electro-tactile stimulation of the nerves of the fingers. Professor Clark holds honorary doctorates of Medicine (Hon. MD), Law (Hon. LLD), Engineering (Hon. DEng), and Science (Hon. DSc) from Australian and international universities. He has also been made a Fellow of the Australian Academy of Science, a Fellow of the Royal Society of London, and an Honorary Fellow of the Royal Society of Medicine, the Royal College of Surgeons of England, and the Australian Acoustical Society.
In 2004 he received the Australian Prime Minister's Prize for Science, Australia's pre-eminent award in science and technology, and was made a Companion of the Order of Australia, the country's highest civil honour. In 2005 he received the Award of Excellence in Surgery from the Royal Australasian College of Surgeons, the A. Charles Holland Foundation International Prize in Audiology and Otology, and the Royal College of Surgeons of Edinburgh Medal at the College Quincentenary celebrations.
Tuesday, September 6th
Keynote speaker: Fernando Pereira, University of Pennsylvania, USA
Title: Linear Models for Structure Prediction
Over the last few years, several groups have been developing models and algorithms for learning to predict the structure of complex data, sequences in particular, that extend well-known linear classification models and algorithms, such as logistic regression, the perceptron algorithm, and support vector machines. These methods combine the advantages of discriminative learning with those of probabilistic generative models like HMMs and probabilistic context-free grammars. I will introduce linear models for structure prediction and their simplest learning algorithms, and exemplify their benefits with applications to text and speech processing, including information extraction, parsing, and language modeling.
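As a concrete illustration of the kind of model the abstract describes, the following sketch implements a structured perceptron for sequence labeling with Viterbi decoding. It is a minimal textbook-style example, not code from the talk; the feature set (word/tag emission and tag/tag transition indicators) and the toy tagging data are invented for illustration.

```python
# A minimal structured perceptron for sequence labeling (illustrative).
# The model scores a label sequence y for input x as w . Phi(x, y), where
# Phi counts emission (word, tag) and transition (tag, tag) features;
# decoding uses the Viterbi algorithm over those local scores.
from collections import defaultdict

class StructuredPerceptron:
    def __init__(self, labels):
        self.labels = labels
        self.w = defaultdict(float)  # feature -> weight

    def features(self, x, y):
        """Count emission and transition features for a labeled sequence."""
        feats = defaultdict(int)
        prev = "<s>"
        for word, tag in zip(x, y):
            feats[("emit", word, tag)] += 1
            feats[("trans", prev, tag)] += 1
            prev = tag
        return feats

    def decode(self, x):
        """Viterbi search for the highest-scoring label sequence."""
        V = [{t: self.w[("emit", x[0], t)] + self.w[("trans", "<s>", t)]
              for t in self.labels}]
        back = [{t: None for t in self.labels}]
        for i in range(1, len(x)):
            V.append({}); back.append({})
            for t in self.labels:
                prev, score = max(
                    ((p, V[i - 1][p] + self.w[("trans", p, t)])
                     for p in self.labels),
                    key=lambda s: s[1])
                V[i][t] = score + self.w[("emit", x[i], t)]
                back[i][t] = prev
        tag = max(V[-1], key=V[-1].get)
        path = [tag]
        for i in range(len(x) - 1, 0, -1):
            tag = back[i][tag]
            path.append(tag)
        return path[::-1]

    def fit(self, data, epochs=5):
        """Perceptron update: w += Phi(x, gold) - Phi(x, prediction)."""
        for _ in range(epochs):
            for x, y in data:
                z = self.decode(x)
                if z != y:
                    for f, c in self.features(x, y).items():
                        self.w[f] += c
                    for f, c in self.features(x, z).items():
                        self.w[f] -= c

# Toy part-of-speech-style data, invented for illustration.
data = [(["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
        (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"])]
model = StructuredPerceptron(["DET", "NOUN", "VERB"])
model.fit(data)
print(model.decode(["the", "dog", "sleeps"]))  # -> ['DET', 'NOUN', 'VERB']
```

The same linear score over structured features supports the other learning algorithms the abstract mentions: replacing the perceptron update with the gradient of a log-linear objective yields a conditional random field, and a margin-based update yields a structured SVM, while the HMM-like factorization of the features is what keeps decoding tractable.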
Fernando Pereira

Fernando C. N. Pereira is the Andrew and Debra Rachleff Professor and chair of the Department of Computer and Information Science at the University of Pennsylvania. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982. Before joining Penn, he held industrial research and management positions at SRI International; at AT&T Labs, where he led the machine learning and information retrieval research department from September 1995 to April 2000; and at WhizBang Labs, a Web information extraction company. His main research interests are in machine-learnable models of language and other natural sequential data such as biological sequences. His contributions to finite-state models for speech and text processing are in everyday industrial use. He has 80 research publications in computational linguistics, speech recognition, machine learning, bioinformatics, and logic programming, and several issued and pending patents on speech recognition, language processing, and human-computer interfaces. He was elected Fellow of the American Association for Artificial Intelligence in 1991 for his contributions to computational linguistics and logic programming, and he is a past president of the Association for Computational Linguistics.
Wednesday, September 7th
Keynote speaker: Elizabeth Shriberg, SRI and ICSI, USA
Title: Spontaneous Speech: How People Really Talk, and Why Engineers Should Care

Most of the speech we produce and comprehend each day is spontaneous. This “speech in the wild” requires no special training, is remarkably efficient, imposes minimal cognitive load, and carries a wealth of information at multiple levels. Spontaneous speech differs, however, from the types of speech for which spoken language technology is often developed. This talk will illustrate some interesting, important, and even amusing characteristics of spontaneous speech, including disfluencies, dialog phenomena, turn-taking patterns, emotion, and speaker differences. The talk will survey research in these areas, outline current challenges, and, it is hoped, convince the more technology-minded members of the audience that modeling these aspects of our everyday speech has much to offer for spoken language applications.
Elizabeth Shriberg

Elizabeth Shriberg is a Senior Researcher in the speech groups at both SRI International and the International Computer Science Institute. She received a Ph.D. in Cognitive Psychology from U.C. Berkeley (1994) and was an NSF-NATO postdoc at IPO (the Netherlands, 1995). Her main interest is spontaneous speech. Her work aims to combine linguistic knowledge with corpora and techniques from speech and speaker recognition, to advance both scientific understanding and recognition technology. Over the last decade she has led projects on modeling disfluencies, punctuation, dialog, emotion, and speakers, using lexical and prosodic features. She has published over 100 journal and conference papers in speech science, speech technology, and related fields. She serves as an Associate Editor of Language and Speech, on the boards of Speech Communication and other journals, on the ISCA Advisory Council, and on the ICSLP Permanent Council.
Thursday, September 8th
Panel: Ubiquitous Speech Processing
Chair: Roger K. Moore, University of Sheffield, UK
Panelists: Alex Acero, Jordan Cohen, Paul Dalsgaard, Sadaoki Furui
Recent years have seen significant advances in the capabilities of practical speech technology systems. A growing number of ordinary people have used dictation software to create documents on their own PC, spoken to Interactive Voice Response (IVR) systems, or used voice-dialling on their mobile phone. Speech technology applications are certainly becoming more commonplace, but there is arguably some way to go before the technology could be called ubiquitous. For example, speech-based interaction is not commonly used in the home, at work, at school, or on holiday.
In his book The Age of Spiritual Machines (Phoenix Press), Ray Kurzweil predicts that language user interfaces will be ubiquitous by 2009. He also says that the majority of text will be created using continuous speech recognition, there will be listening machines for the deaf and that translating telephones will commonly be used for many language pairs.
A panel of experts drawn from academia and industry will discuss these issues and will address the core question: “Will speech technology become truly ubiquitous and, if so, what applications will there be and when will it happen?”
Thursday, September 8th
Closing Session: FADO Explained and Performed

In this show, Teresa Machado and Daniel Gouveia share with us singing and instrumental examples of Fado, the traditional urban song of Lisbon, Portugal: its different styles and features, ways of singing, poetic themes, and historical evolution.
Informally, and in dialogue with the audience, they sing lyrics that have been translated in advance. The historical evolution of Fado is briefly presented, the differences between Traditional and Modern Fado are explained, the importance of improvisation is pointed out, and the original Fados (Menor, Mouraria, Corrido), among many others, are sung. They stress the importance of the Portuguese Guitar by showing the instrument and demonstrating its technique in an instrumental virtuoso piece.