Boulder Language Technologies

Spoken Dialog and Information Retrieval Systems

Spoken Dialogs for Enabling Spoken Dialogs with Virtual Humans

CSLR is developing a free toolkit to support research and development of programs that incorporate spoken dialog interaction with lifelike computer characters.

SONIC: Large Vocabulary Speech Recognition System

SONIC is a large vocabulary speech recognition system used in a number of research projects at CSLR. The software is being made available for non-commercial use. Executables are provided for Linux (kernel 2.2/2.4), MS Windows, Sun Solaris, and Mac OS X. For more information about speech recognition research at CSLR and the SONIC LVCSR system, click here.

AQUAINT

This project is funded by ARDA and is a collaboration between CSLR and Columbia University. We are developing an integrated system for answering complex questions: questions that require interacting with the user to refine and clarify the context of the question, whose answer may be located in non-homogeneous databases of speech and text, and for which presenting the answer requires combining and summarizing information from multiple sources and over time. We are developing an integrated statistical and semantic approach by integrating multiple models of the information in the text at the word, clause, event, and document level. We will automatically derive links between related events and descriptions and associate questions with multiple answers even when the answer uses very different terms than those in the question. Generating a satisfactory answer to complex questions requires the ability to collect all relevant answers from multiple documents in different media, intelligently weigh their relative importance, and generate a coherent summary of the multiple facts and opinions reported. We are investigating new paradigms for the organization and presentation of information, including a detailed yet efficiently computable semantic model and an event-based model for organizing and linking parts of documents. This will be coupled with innovations in the processing of spoken material, modeling of the context for handling connected questions and answers (such as follow-ups and clarifications) within a session, and summarization technology for combining hundreds or thousands of potentially related text pieces while eliminating redundancies and identifying conflicts and contradictions.

CU Communicator Project

CU Communicator is a conversational dialogue system that interacts with users over the telephone to provide information on air travel, hotel, and rental car reservations. This system serves as a test bed for our research to improve the state of the art in spoken dialogue systems. This project is funded by the Defense Advanced Research Projects Agency (DARPA). For more information on the CU Communicator Project, click here.

CU Move: DARPA In-Vehicle Automatic Speech Recognition & Navigation System

The goal of the CU-Move project is to develop algorithms and technology for robust access to information via spoken dialog systems in mobile, hands-free environments. The novel aspects include the formulation of a new microphone array and multi-channel noise suppression front-end, corpus development for speech and acoustic vehicle conditions, environmental classification for changing in-vehicle noise conditions, and a back-end dialog navigation information retrieval sub-system connected to the WWW. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system is focused on natural conversational interaction between the user and the in-vehicle system. System advances include intelligent microphone arrays, auditory and speaker-based constrained speech enhancement methods, environmental noise characterization, and speech recognizer model adaptation methods for changing acoustic conditions in the car. Our initial prototype system allows users to get driving directions for the Boulder area via a hands-free cell phone while driving. For more information on the CU Move Project, click here.

 
CU Communicator Spoken Dialog System

Project Overview:

The University of Colorado (CU) Communicator is an over-the-phone spoken dialog system for accessing live airline, hotel, and car rental availability and pricing. Users dial in by telephone and use natural language to interact with our automated travel agent. The agent then accesses the internet for flight, hotel, and car rental information.

The Communicator system, sponsored by DARPA, is our initial testbed for research leading to advanced dialog systems enabling robust and graceful human-computer interaction. It is a Hub-compliant system for the DARPA Communicator task, and was demonstrated at the DARPA workshop in June 1999. Robustness and portability of spoken dialog systems are two of the issues we attempt to address in the project. We use robust parsing and dialog control strategies to be as flexible as possible to variation in how users speak. To make the systems easier to develop, we have adopted a largely declarative representation where the bulk of the domain-specific information is provided in external files.

Project Members:

  • Wayne Ward: Dialog Management & Natural Language Parsing
  • Bryan Pellom: Speech Recognition, Synthesis & Natural Language Generation
  • Kadri Hacioglu: Language Modeling & Speech Recognition
  • Sameer Pradhan: System Integration, Dialog Grammar Development

The Communicator Task and Research Approach:

In April 1999, the University of Colorado speech group began development of the CU Communicator system, a Hub-compliant implementation of the DARPA Communicator task. The system combines continuous speech recognition, natural language understanding, and flexible dialog control to let telephone callers access information about airline flights, hotels, and rental cars through natural conversation. By June 1999 the system was fully functional, and it was demonstrated at that month's DARPA Communicator meeting. The system connects live over the web to retrieve up-to-date air travel, hotel, and rental car information. Users call the system on the telephone and make travel plans for a modeled set of cities. We are using the system as a testbed for conversational system technology development. For this system, we developed a new Dialog Manager using an event-driven strategy.

Our approach to robustness in natural language understanding is to use robust parsing strategies and event-driven dialog strategies. These strategies stress semantic content and coherence over syntactic form. We have extended the Phoenix system as the basis for our robust parsing. This system uses a hierarchical frame parse representation. Our "event-driven" dialog manager does not use an explicit script. The developer creates a set of hierarchical forms representing the information that the system and user interact about. Prompts are associated with fields in the forms. The dialog manager decides what to do next from the current system context, not from a script. Our natural language and dialog specification for the task is largely declarative.
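
The form-based control described above can be illustrated with a minimal sketch. It is not the CU Communicator implementation, whose forms live in external declarative files; the class names (`Field`, `Form`), the `handle_event` function, and the travel fields are all hypothetical, chosen only to show how the next prompt can be derived from the current state of the forms rather than from a script.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    """One slot in a form, with the prompt used to ask for it (hypothetical)."""
    name: str
    prompt: str
    value: Optional[str] = None


@dataclass
class Form:
    """A node in the form hierarchy: its own fields plus nested sub-forms."""
    name: str
    fields: list = field(default_factory=list)
    subforms: list = field(default_factory=list)

    def fill(self, slot: str, value: str) -> bool:
        """Store a parsed value in the first matching field, searching depth-first."""
        for f in self.fields:
            if f.name == slot:
                f.value = value
                return True
        return any(sub.fill(slot, value) for sub in self.subforms)

    def next_prompt(self) -> Optional[str]:
        """Choose the next prompt from the current state: the first unfilled field."""
        for f in self.fields:
            if f.value is None:
                return f.prompt
        for sub in self.subforms:
            prompt = sub.next_prompt()
            if prompt is not None:
                return prompt
        return None  # every field is filled; nothing left to ask


def handle_event(form: Form, parsed_slots: dict) -> Optional[str]:
    """React to one user utterance (already parsed into slots) and pick what to say next."""
    for slot, value in parsed_slots.items():
        form.fill(slot, value)
    return form.next_prompt()


# Illustrative form hierarchy; field names and prompts are assumptions.
trip = Form(
    name="trip",
    subforms=[
        Form("flight", fields=[
            Field("origin", "Where are you leaving from?"),
            Field("destination", "Where would you like to go?"),
            Field("date", "What day do you want to travel?"),
        ]),
        Form("hotel", fields=[
            Field("hotel_city", "In which city do you need a hotel?"),
        ]),
    ],
)
```

Because the user may supply fields in any order, a caller who opens with the destination and date is still asked only for the missing origin; no fixed sequence of turns is encoded anywhere.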

The CU Communicator uses a DARPA Hub-compliant architecture, which means that the system is composed of a number of servers that interact with each other through the DARPA Hub. Our system is composed of the Hub and eight servers:

  • Audio Server: Receives signals from the telephone and sends them to the recognizer. Sends synthesized speech to the telephone.
  • Speech Recognizer: Takes signals from the audio server and produces a word lattice.
  • Semantic Parser: Takes the word lattice from the recognizer and produces the best interpretation.
  • Dialogue Manager: Resolves ambiguities in the interpretation, estimates confidence in the extracted information, clarifies with the user if required, integrates with the current dialog context, builds database queries (SQL), sends data to NL generation for presentation to the user, and prompts the user for information.
  • Database / Backend: Receives SQL queries from the Dialogue Manager, interfaces to the SQL database and returns data. Retrieves live data from the web.
  • Language Generator: Receives a speech act and data from the Dialogue Manager and generates text to be spoken by the Text-to-Speech Synthesizer.
  • Speech Synthesizer: Receives word strings from NL generation and synthesizes them to be sent to the audio server.
  • User Profiler: Allows users to enter a preferences profile, which is used by the Dialogue Manager to supply default values for airlines, hotels, cars, etc.
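
As a rough illustration of how the Dialogue Manager, User Profiler, and Database / Backend roles listed above might fit together, the sketch below builds a parameterized SQL query from the slots gathered in a dialog, using profile values as defaults. The table schema, the column names, and the `build_flight_query` function are assumptions made for this example; the source does not describe the actual schema or query format.

```python
import sqlite3


def build_flight_query(slots: dict, profile: dict):
    """Merge profile defaults under the user's slots, then build a parameterized query."""
    merged = {**profile, **slots}  # values from the current dialog override profile defaults
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column in ("origin", "destination", "airline"):
        if column in merged:
            clauses.append(f"{column} = ?")
            params.append(merged[column])
    sql = "SELECT flight_no, airline, price FROM flights"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY price", params


# Tiny in-memory stand-in for the backend database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights "
    "(flight_no TEXT, airline TEXT, origin TEXT, destination TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?)",
    [
        ("UA100", "United", "DEN", "BOS", 320.0),
        ("AA200", "American", "DEN", "BOS", 290.0),
        ("UA300", "United", "DEN", "SFO", 180.0),
    ],
)

# The caller gave an origin and destination; the airline comes from the stored profile.
sql, params = build_flight_query(
    slots={"origin": "DEN", "destination": "BOS"},
    profile={"airline": "United"},
)
rows = conn.execute(sql, params).fetchall()
```

Passing values as query parameters rather than interpolating them into the SQL string keeps recognized speech, which is untrusted input, out of the query text.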

Servers do not interact directly with each other, but pass frames to the hub. The hub acts as a router that sends each frame to the receiving server. A simple script controls hub actions.
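
The routing pattern can be sketched in a few lines. This is not the DARPA Hub API: the `Hub` class, its methods, the routing table standing in for the hub script, and the toy server handlers are all hypothetical, and frames are modeled as plain dictionaries.

```python
from collections import deque


class Hub:
    """Routes frames between registered servers; servers never call each other."""

    def __init__(self, routes: dict):
        self.routes = routes   # stand-in for the hub script: sender name -> receiver name
        self.servers = {}      # server name -> handler function
        self.queue = deque()   # frames waiting to be routed
        self.log = []          # (sender, receiver) pairs, in routing order

    def register(self, name: str, handler):
        self.servers[name] = handler

    def send(self, sender: str, frame: dict):
        """A server hands a frame to the hub; the hub decides who receives it."""
        self.queue.append((sender, frame))

    def run(self):
        while self.queue:
            sender, frame = self.queue.popleft()
            receiver = self.routes.get(sender)
            if receiver is None:
                continue  # no rule for this sender: the frame is dropped
            self.log.append((sender, receiver))
            reply = self.servers[receiver](frame)
            if reply is not None:
                self.send(receiver, reply)  # the reply goes back through the hub


def recognizer(frame: dict) -> dict:
    # Toy recognizer: pretend the audio is already text and split it into words.
    return {"words": frame["audio"].split()}


def parser(frame: dict) -> dict:
    # Toy parser: treat the last word as the destination slot.
    return {"slots": {"destination": frame["words"][-1]}}


received = []


def dialog_manager(frame: dict) -> None:
    received.append(frame["slots"])  # end of this toy pipeline; no reply frame


hub = Hub(routes={"audio": "recognizer", "recognizer": "parser", "parser": "dialog"})
hub.register("recognizer", recognizer)
hub.register("parser", parser)
hub.register("dialog", dialog_manager)

hub.send("audio", {"audio": "fly to Boston"})
hub.run()
```

Keeping the routing table separate from the servers means a server can be replaced or a new one inserted by changing only the table, which is the kind of flexibility a hub architecture is meant to provide.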

Our approach is to conduct research in the context of working systems. We build systems for specific applications and evaluate new algorithms in the live system. Within this general paradigm, the project involves three specific research issues:

  • A flexible integrated architecture - Currently, most speech systems combine knowledge sources in rigid and inconsistent ways. Acoustic scores and language model scores are combined by fixed-weight interpolation. A recognizer forms a set or graph of hypotheses which is passed to the natural language system. This system then uses an ad-hoc combination of the score from the recognizer and the parse result to find the best hypothesis. This hypothesis is then passed to a dialogue manager. Each module is optimized individually and scores are combined in a fixed way, which has been set to optimize performance on a development test set. The combination of sources is fixed and unable to respond to the reliability of the knowledge sources in a specific situation. Pruning decisions are made at earlier stages without the benefits of all of the knowledge sources. We propose to use decision trees to dynamically combine knowledge sources to assign scores to hypotheses. These scores would reflect knowledge from all sources concurrently, rather than be applied serially (which generates sub-optimal pruning).
  • Robust Natural Language Understanding - Our approach to robustness in natural language understanding is to use robust parsing strategies and event driven dialogue strategies. These strategies stress semantic content over syntactic form. We are extending the Phoenix system as the basis for our robust parsing. This system uses a hierarchical frame parse representation. Our "event driven" dialogue manager does not use an explicit script. The developer creates a set of hierarchical forms, representing the information that the system and user interact about. Prompts are associated with fields in the forms. The dialogue manager decides what to do next from the current system context, not from a script. This architecture is similar in spirit to a production system.
  • Authoring and portability - Building a spoken dialogue system is labor intensive and requires a great deal of expertise. A major focus of our research is to develop tools to facilitate rapid development of spoken dialogue applications by developers who are not speech and NL experts. Our initial steps will be: a) Declarative representation - To the extent possible, we are representing natural language and dialogue information in external files (like grammars) rather than having the developer write C code. b) Libraries - We are creating a library of common grammars that would be applicable to many tasks (like times, dates, numbers). Developers can use these grammars rather than writing their own. We are also developing a library of functions which will map these to a canonical form (like date to mm-dd-yy and time to 24 hour, etc). These libraries will be distributed with the natural language system.
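
The canonical-form functions mentioned under authoring and portability can be sketched as follows. This is only an illustration using Python's standard `datetime` module; the function names and the set of accepted input formats are assumptions, and the project's own libraries are described as accompanying its natural language system rather than being written in Python.

```python
from datetime import datetime


def canonical_date(text: str) -> str:
    """Map a few common date phrasings to mm-dd-yy."""
    cleaned = text.strip().replace(",", "")  # "June 5, 2001" -> "June 5 2001"
    for fmt in ("%B %d %Y", "%b %d %Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(cleaned, fmt).strftime("%m-%d-%y")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {text!r}")


def canonical_time(text: str) -> str:
    """Map 12-hour or 24-hour clock times to 24-hour HH:MM."""
    cleaned = text.strip().lower().replace(".", "")  # "5:30 p.m." -> "5:30 pm"
    for fmt in ("%I:%M %p", "%I %p", "%H:%M"):
        try:
            return datetime.strptime(cleaned, fmt).strftime("%H:%M")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized time: {text!r}")
```

With functions like these shared across tasks, a grammar only needs to recognize that a phrase is a date or a time; downstream components such as query building can then rely on a single format.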

Related Publications:

  • Kadri Hacioglu, Wayne Ward, "Figure of Merit for the Analysis of Spoken Dialog Systems", ICSLP'2002, Denver-Colorado, September 2002.
  • Sameer Pradhan, Wayne Ward, "Estimating Semantic Confidence for Spoken Dialogue Systems", ICASSP'2002, Orlando-Florida, May 2002.
  • Kadri Hacioglu, Wayne Ward, "Does Confidence Annotation Meet the Dialog Goal? A Quantitative Analysis", HLT'2002, San Diego-California, March 2002.
  • Kadri Hacioglu, Wayne Ward, "A Concept Graph Based Confidence Measure", to appear in ICASSP'2002, Orlando-Florida, May 2002.
  • Jianping Zhang, Wayne Ward and Bryan Pellom, "Phone Based Voice Activity Detection Using Online Bayesian Adaptation with Conjugate Normal Distributions", to appear in ICASSP'2002, Orlando Florida, May 2002.
  • Kadri Hacioglu and Wayne Ward, "A Word Graph Interface for a Flexible Concept Based Speech Understanding Framework", Eurospeech 2001, Aalborg Denmark, Sept. 2001.
  • Jianping Zhang, Wayne Ward, Bryan Pellom, Xiuyang Yu, Kadri Hacioglu, "Improvements in Audio Processing and Language Modeling in the CU Communicator", Eurospeech 2001, Aalborg Denmark, Sept. 2001.
  • K. Hacioglu, W. Ward, "Combining Language Models: Oracle Approach", Human Language Technology Conference (HLT-2001), San Diego, March 2001.
  • B. Pellom, W. Ward, J. Hansen, K. Hacioglu, J. Zhang, X. Yu, S. Pradhan, "University of Colorado Dialog Systems for Travel and Navigation", Human Language Technology Conference (HLT-2001), San Diego, March 2001.
  • K. Hacioglu, W. Ward, "Dialog-Context Dependent Language Modeling Using N-Grams and Stochastic Context-Free Grammars", to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2001), Salt Lake City, May 2001.
  • R. San-Segundo, B. Pellom, K. Hacioglu, W. Ward, J.M. Pardo, "Confidence Measures for Dialogue Systems", to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2001), Salt Lake City, May 2001.
  • B. Pellom, W. Ward, S. Pradhan, "The CU Communicator: An Architecture for Dialogue Systems", International Conference on Spoken Language Processing (ICSLP), Beijing China, November 2000.
  • W. Ward, B. Pellom, K. Hacioglu, "The CU Communicator: Status and NIST Data Collection; Knowledge Integration", DARPA Workshop, Communicator Principal Investigators Meeting, Philadelphia, PA, September 11-13, 2000.
  • R. San-Segundo, B. Pellom, W. Ward, J. M. Pardo, "Confidence Measures for Dialogue Management in the CU Communicator System", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2000), Istanbul Turkey, June 2000.
  • W. Ward, B. Pellom, "The CU Communicator System", IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone Colorado, December 1999.