Boulder Language Technologies

Inventing Virtual Teachers and Therapists

Project Overview:

The field of research concerned with the development of lifelike computer characters that behave like sensitive and effective teachers, testers, therapists, trainers and tutors is only ten years old, yet the effectiveness of the systems developed so far portends enormous future benefits to individuals and society. There are tens of millions of individuals on this planet who could benefit from individualized assessments and treatments, but it is impossible to provide even a fraction of this population with the help they deserve. The vision is to invent virtual teachers and therapists imbued with the pedagogical and clinical knowledge and interpersonal behaviors of highly effective practitioners. Achieving this vision requires research breakthroughs in spoken dialog, computer vision and character animation technologies, and the integration of these technologies into computer programs informed by research on how to design multimedia programs that optimize clinical outcomes during interaction with virtual humans. My colleagues and I at Boulder Language Technologies and Mentor Interactive Inc. are dedicated to activities that will accelerate the realization of this vision. Success will mean a world in which effective treatments are accessible to all who need them. We have discussed some of the challenges involved in achieving this vision in Cole et al., 2003.

Current Virtual Human R&D Activities at CSLR

Since its founding in October 1998, CSLR has focused on the research and development of programs that incorporate lifelike computer characters that interact with people like sensitive and effective tutors and clinicians. Our work has focused on:

(a) research and development of the language and animation technologies that power virtual humans,
(b) development of the infrastructure needed to support these research and development activities, and
(c) development and assessment of programs that use virtual humans to provide effective clinical treatments to children and adults.

To date, CSLR has developed eight programs that use virtual humans. Each of these programs is currently under development and being tested with human subjects. Each has been developed in close collaboration with "domain experts": reading researchers, teachers and/or clinicians who have developed treatments demonstrated to be effective in the laboratory, classroom or clinic. The programs include:

  1. Foundations to Literacy, to teach children to read and learn from text.
  2. Flying Volando, to teach language, literacy, math, science and social studies to English language learners.
  3. LSVT VT, to automate portions of the LSVT speech and voice treatment for individuals with Parkinson disease.
  4. ORLA, to teach reading, speech and language generation and comprehension to individuals with aphasia.
  5. AphasiaScripts™, to enable individuals with aphasia to design, learn and practice daily conversations.
  6. Sentactics, to enable individuals with aphasia to comprehend and produce speech and language.

The lifelike computer characters in each of these programs use one of the 3D characters in the CU Animate system, developed by Dr. Jiyong Ma and his colleagues at CSLR. In all of our applications, the virtual tutor or therapist produces accurate and natural visual speech, using a novel technique invented at CSLR that concatenates motion capture data collected from human lips. In all of our applications to date, the visual speech is synchronized with a recorded human voice, since the human voice is a remarkable instrument that conveys emotion and enthusiasm and imparts personality to the virtual human.

The synchronization of the human voice to the movements of the lips occurs fully automatically in all of our programs: a voice talent (who may be a clinician) records an utterance, and the utterance and the associated text string are input to the alignment system. The system transforms the text string into a sequence of expected phonetic segments, and the SONIC speech recognition system aligns these phonetic segments to the recorded speech. The waveform is then played at the appropriate point in an application, and the time-aligned phonetic segments are used to inform the CU Animate system when and how to move the lips and regions of the lower face of the 3D model. An algorithm developed by Jie Yan uses a set of rules to move the head and face while the character talks. In some applications, specific animation sequences are used to portray emotions when the virtual human is speaking or responding to the speech of the user.
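The sketch below illustrates only the final step of this pipeline: turning time-aligned phonetic segments into lip-animation events. It is a minimal sketch under stated assumptions; the data class, the phoneme-to-viseme table and the function names are hypothetical, and the actual SONIC and CU Animate interfaces are not described here.

```python
# Hypothetical sketch: converting time-aligned phones (as produced by a forced
# aligner such as SONIC) into viseme events for an animation engine.
from dataclasses import dataclass

@dataclass
class PhoneSegment:
    phone: str      # phonetic label from the forced alignment
    start: float    # start time in seconds within the recorded utterance
    end: float      # end time in seconds

# Illustrative phoneme-to-viseme mapping; a real system would be far richer.
PHONE_TO_VISEME = {
    "sil": "closed", "hh": "open", "eh": "open", "l": "tongue_up", "ow": "rounded",
}

def viseme_schedule(segments):
    """Turn time-aligned phones into (time, viseme) animation events."""
    return [(seg.start, PHONE_TO_VISEME.get(seg.phone, "neutral"))
            for seg in segments]

if __name__ == "__main__":
    # Pretend these segments came from aligning a recording of "hello" to its text.
    aligned = [
        PhoneSegment("sil", 0.00, 0.12),
        PhoneSegment("hh", 0.12, 0.20),
        PhoneSegment("eh", 0.20, 0.33),
        PhoneSegment("l", 0.33, 0.41),
        PhoneSegment("ow", 0.41, 0.60),
    ]
    for t, viseme in viseme_schedule(aligned):
        print(f"{t:5.2f}s -> {viseme}")  # events handed to the animation engine
```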

Activities Designed to Stimulate Research and Development of Virtual Humans

I believe that a great way to advance science is to create and share tools, technologies and systems that enable research, and to test these products through collaboration with other scientists. I have followed this agenda since 1990, when I founded the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute in Beaverton, Oregon. Between 1990 and 1998, CSLU developed and freely distributed over 20 different annotated speech corpora in 22 languages. These corpora are widely used today, and have stimulated research and development of speech recognition, speaker recognition and language identification systems around the world. In addition, my colleagues and I created and distributed the CSLU Toolkit, a software environment for collecting and annotating speech data, and for researching and developing spoken language applications that incorporate natural dialog interaction with virtual humans. The CSLU Toolkit has been installed at over 25,000 sites in over 100 countries and has become a major tool for research and education. For example, Cliff Nass and his students at Stanford have published over a dozen journal articles using the CSLU Toolkit (Cliff Nass articles). The Toolkit has also proven to be a powerful platform for application development; for example, the Vocabulary Wizard developed by Jacques de Villiers at CSLU is a wonderful platform for rapidly designing applications for teaching vocabulary. The Vocabulary Wizard was used to generate hundreds of applications at the Tucker Maxon School in Portland, Oregon to teach new vocabulary, speech perception, speech production and reading skills to students with profound hearing loss. (See below for a brief description of this project.)

CSLR has continued this tradition, and today provides leadership in developing and distributing tools and technologies that will support research and development of a new generation of virtual human systems. These activities take many forms.

  • CSLR provides SONIC, Phoenix and other resources to researchers free of charge via the CSLR Web site. These tools are essential components of spoken dialog systems with virtual humans; a minimal sketch of how such components fit together in a single dialog turn appears after this list.
  • With support from enlightened Program Managers at the NSF (including Mary Harper and Phil Rubin), my colleagues and I organized and held a workshop on virtual humans in San Diego in 2004 to identify key research challenges and to stimulate and sustain collaboration in virtual human research. In addition to the workshop report, the workshop produced several positive outcomes at CSLR. For example, I met Erika Rosenberg at the workshop, and we have since collaborated with her to greatly improve the quality of the emotions produced by Marni and our other 3D models.

[Figure: Marni Emotions]

  • CSLR held a summer workshop, supported by the NSF and the Coleman Institute for Cognitive Disabilities, that brought 15 researchers from five countries to CSLR to develop Interactive Book applications in which the virtual human interacted with users in French, German, Italian, Polish and Spanish.
  • This workshop led to a collaboration with Dr. Katarzyna (Kasia) Dziubalska-Kolaczyk, Head of the School of English at Adam Mickiewicz University (AMU) in Poznan, Poland. Kasia has since established the Center for Speech and Language Processing at AMU, and has received two grants from the Polish government to develop a Polish Literacy Tutor and a Polish version of LSVT Virtual Therapy for individuals with Parkinson disease.
  • In April 2006, CSLR received a two-year grant from the NSF CRI (Computing Research Infrastructure) program to develop a toolkit that supports research and development of virtual human systems. The goal of this grant is to develop a prototype toolkit for designing applications that enable natural spoken dialogs with virtual humans in a wide variety of domains.
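As referenced above, the following minimal sketch shows how recognition and parsing components such as SONIC and Phoenix might fit into a single spoken dialog turn with a virtual human. All function names, the example utterance and the trivial dialog policy are hypothetical; the actual toolkits' interfaces are not described in this document.

```python
# Hypothetical sketch of one dialog turn: recognize speech, parse it into a
# semantic frame, choose a response, and hand the response text to the
# animated character. Stand-ins only; not the SONIC or Phoenix APIs.
def recognize(audio):
    # Stand-in for a SONIC-style recognizer: audio -> word string.
    return "i want to read the next story"

def parse(words):
    # Stand-in for a Phoenix-style semantic parser: words -> slot/value frame.
    frame = {}
    if "read" in words:
        frame["intent"] = "read_story"
    if "next" in words:
        frame["which"] = "next"
    return frame

def respond(frame):
    # A trivial dialog policy choosing what the virtual tutor says next.
    if frame.get("intent") == "read_story":
        return "Great! Let's read the next story together."
    return "Can you tell me that another way?"

if __name__ == "__main__":
    words = recognize(audio=None)   # audio capture omitted in this sketch
    print(respond(parse(words)))    # text that would drive the animated character
```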

How it All Began (A Personal Note)

My research on virtual humans was initiated in 1997 when I was awarded a three-year "Challenge Grant" of $1.8 million from the National Science Foundation (with Dominic Massaro and Alex Waibel as co-Principal Investigators). The goal of this grant was to develop and integrate computer speech recognition, speech synthesis and character animation technologies into the CSLU Toolkit, and to use the toolkit to design applications to teach speech and language skills to students at the Tucker Maxon School in Portland, Oregon. The research produced a number of articles, and more importantly, significant and lasting benefits to the students who used the program (and to the many students who continue to use the program today).

Looking back on this wonderful project, it seems to me that a guardian angel must have guided our efforts. I was at CSLU in 1996, working on the CSLU Toolkit, which by then integrated computer speech recognition (developed at CSLU) and the Festival speech synthesis system (developed by Paul Taylor and Alan Black at the University of Edinburgh). Using the CSLU Toolkit's Rapid Application Developer, or RAD, a graphical user interface for designing spoken dialogs, it was possible to develop a number of sophisticated applications, such as conversing with the system to retrieve weather forecasts from a Web site. Also in 1996, I reconnected with Dom Massaro, who invented the Baldi system with Michael Cohen. Baldi is a 3D talking head with very accurate lip movements. I had not seen Dom in over 25 years, since I was a graduate student at UC Riverside and Dom was a postdoc at UC San Diego. Shortly after I read the NSF Challenge Grant program announcement, I ran into a colleague, Kathy, at a supermarket (one I had never gone to before) who used to work at the Oregon Graduate Institute. Kathy told me she had left OGI and was now working at the Tucker Maxon Oral School (now the Tucker Maxon School), a school that used an oral approach to instruction and language training for students with profound hearing loss. At that moment, the proverbial light bulb went on in my mind. I asked her if she thought Tucker Maxon might like to partner on an NSF grant proposal to develop a talking head that would converse with their students to teach speech and language skills. Dom was as excited as I was; we submitted the proposal, and to our great surprise, it was awarded. The research was conducted between 1997 and 2000.

On March 15, 2001, ABC TV's Prime Time Thursday featured a segment in which Baldi, the 3D talking head invented by Dom Massaro and Michael Cohen at UC Santa Cruz, was shown helping children learn new vocabulary and dramatically improving their speech recognition and production skills. Prime Time introduced the segment with the words "This is what a small miracle looks like." The National Science Foundation also featured the Baldi project on the NSF home page during March and April 2001. The Baldi project brought great benefits and joy to many children, and it continues to benefit children with sensory and cognitive disabilities through the efforts of Animated Speech Corporation, which has licensed the technology and extended its capabilities. Sadly, the premiere of the Prime Time segment was overshadowed by the tragic death of Mike Macon, an exceptional speech researcher who created the voice of Baldi and who died the evening of the broadcast. After moving to CU and establishing the Center for Spoken Language Research, I applied for an NSF ITR grant to develop perceptive animated agents that could be used as virtual tutors and therapists. This grant was awarded, and the technologies developed with support from this and other grants (e.g., the SONIC speech recognition system and the CU Animate system) led to subsequent grants and projects that resulted in the virtual tutoring and therapy programs described above.