The field of research concerned with the development of lifelike computer characters that behave like sensitive and effective teachers, testers, therapists, trainers and tutors is only ten years old, yet the effectiveness of the systems already developed portends enormous future benefits to individuals and society. There are literally tens of millions of individuals on this planet who could benefit from individualized assessments and treatments, but it is impossible to provide even a fraction of this population with the help they deserve. The vision is to invent virtual teachers and therapists that are imbued with the pedagogical and clinical knowledge and interpersonal behaviors of highly effective practitioners. Achieving this vision requires research breakthroughs in spoken dialog, computer vision and character animation technologies, and the integration of these technologies into computer programs that are informed by research on how to design multimedia programs that optimize clinical outcomes during interaction with virtual humans. My colleagues and I at Boulder Language Technologies and Mentor Interactive Inc. are dedicated to activities that will accelerate the realization of this vision. Success will mean a world in which effective treatments are accessible to all who need them. My colleagues and I have discussed some of the challenges involved in achieving this vision in Cole et al., 2003.
Current Virtual Human R&D Activities at CSLR
Since its founding in October 1998, researchers at CSLR have focused
on research and development of programs that incorporate lifelike computer
characters that interact with people like sensitive and effective tutors
and clinicians. Our work has focused on
(a) research and development of the language and animation technologies
that power virtual humans,
(b) the development of infrastructure to support these research and
development activities, and
(c) the development and assessment of programs that are designed to use virtual
humans to provide effective clinical treatments to children and adults.
To date, CSLR has developed eight programs that use virtual humans. Each
of these programs is currently under development and being tested with
human subjects. Each has been developed in close collaboration with "domain
experts": reading researchers, teachers and/or clinicians who have
developed treatments that have been demonstrated to be effective in the laboratory,
classroom or clinic. The programs include:
- Foundations to Literacy, to teach
children to read and learn from text.
- Flying Volando, to teach
language, literacy, math, science and social studies to English language
learners.
- LSVT VT, to automate portions
of the LSVT speech and voice treatment for individuals with Parkinson
disease.
- ORLA, to teach reading, speech
and language generation and comprehension to individuals with aphasia.
- AphasiaScripts™, to enable
individuals with aphasia to design, learn and practice daily conversations.
- Sentactics, to
enable individuals with aphasia to comprehend and produce speech and
language.
The lifelike computer characters in each of these programs use one of
the 3D characters in the CU
Animate system, developed by Dr. Jiyong Ma and his colleagues at CSLR.
In all of our applications, the virtual tutor or therapist produces accurate
and natural visual speech, using a novel technique invented at CSLR that concatenates
motion capture data collected from human lips. In all of our applications
to date, the visual speech is synchronized with a recorded human voice,
since the human voice is a remarkable instrument that conveys emotions,
enthusiasm, etc., and imparts personality to the virtual human. The synchronization
of the human voice to the movements of the lips in all of our programs
occurs fully automatically; a voice talent (who may be a clinician) records
an utterance, and the utterance and the associated text string are input
to the alignment system. The system transforms the text string into a
sequence of expected phonetic segments, and the SONIC
speech recognition system aligns these phonetic segments to the recorded
speech. The waveform is then played at the appropriate point in an application,
and the time-aligned phonetic segments are used to inform the CU Animate
system when and how to move the lips and regions of the lower face of
the 3D model. An algorithm developed by Jie Yan uses a set of rules to
move the head and face while the character talks. In some applications,
specific animation sequences are used to portray emotions when the virtual
human is speaking or responding to the speech of the user.
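The alignment-driven lip-sync pipeline described above can be sketched in miniature. The sketch below is illustrative only: the phoneme labels, viseme names and mapping are hypothetical stand-ins, not the actual SONIC or CU Animate interfaces. It shows the core idea of converting a time-aligned phonetic segmentation into a track of mouth-shape (viseme) keyframe spans for an animation engine.

```python
# Hypothetical sketch of alignment-driven lip sync. The phoneme set,
# viseme names and mapping below are illustrative assumptions, not the
# real SONIC or CU Animate data formats.

# A forced aligner such as SONIC produces (phoneme, start, end) triples
# time-aligned to the recorded waveform (times in seconds), e.g. for "hello":
ALIGNMENT = [
    ("HH", 0.00, 0.08),
    ("EH", 0.08, 0.21),
    ("L", 0.21, 0.30),
    ("OW", 0.30, 0.52),
]

# Many-to-one phoneme -> viseme mapping: several phonemes can share the
# same visible mouth shape. (Illustrative names only.)
PHONEME_TO_VISEME = {
    "HH": "rest",
    "EH": "open_mid",
    "L": "tongue_up",
    "OW": "round",
}

def viseme_track(alignment, phoneme_to_viseme):
    """Convert a phonetic alignment into (viseme, start, end) keyframe
    spans, merging adjacent segments that map to the same viseme so the
    animation engine receives one span per mouth shape."""
    track = []
    for phoneme, start, end in alignment:
        viseme = phoneme_to_viseme.get(phoneme, "rest")
        if track and track[-1][0] == viseme and track[-1][2] == start:
            # Same mouth shape continues: extend the previous span.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track

if __name__ == "__main__":
    # Each span tells the renderer when to hold or blend a mouth pose
    # while the recorded waveform plays.
    for viseme, start, end in viseme_track(ALIGNMENT, PHONEME_TO_VISEME):
        print(f"{start:5.2f}-{end:5.2f}s  {viseme}")
```

In a real system these spans would drive interpolation between 3D mouth poses (with coarticulation smoothing), but the data flow — text to phonemes, phonemes aligned to audio, aligned segments mapped to mouth shapes — is the one described above.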
Activities Designed to Stimulate Research and Development of Virtual Humans
I believe that a great way to advance science is to create and share
tools, technologies and systems that enable research, and to test these
products through collaboration with other scientists. I have followed
this agenda since 1990, when I founded the Center for Spoken Language
Understanding (CSLU) at the Oregon Graduate Institute in Beaverton, Oregon. Between
1990 and 1998, CSLU developed and freely distributed over 20 different
corpora in 22 languages. These corpora are widely used today, and
have stimulated research and development of speech recognition, speaker
recognition and language identification systems around the world. In addition,
my colleagues and I created, and continue to distribute, the CSLU
Toolkit, a software environment for collecting and annotating speech
data, and for researching and developing spoken language applications
that incorporate natural dialog interaction with virtual humans. The CSLU
Toolkit has been installed in over 25,000 sites in over 100 countries
and has become a major tool for research and education. For example, Cliff
Nass and his students at Stanford have published over a dozen journal
articles using the CSLU Toolkit (Cliff
Nass articles). The Toolkit has also proven to be a powerful platform
for application development; for example, the Vocabulary Wizard developed
by Jacques de Villiers at CSLU is a wonderful platform for rapidly designing
applications for teaching vocabulary. The Vocabulary Wizard was used to
generate hundreds of applications at the Tucker Maxon School in Portland,
Oregon, to teach new vocabulary, speech perception, speech production and
reading skills to students with profound hearing loss. (See below for
a brief description of this project.)
CSLR has continued this tradition, and today provides leadership in developing
and distributing tools and technologies that will support research and
development of a new generation of virtual human systems. These activities
take many forms.
- CSLR provides SONIC, Phoenix and other resources to researchers free
of charge via the CSLR Web site. These toolkits are essential components
in spoken dialogs with virtual humans.
- With support from enlightened Program Managers at the NSF (including
Mary Harper and Phil Rubin), my colleagues and I organized and held
a workshop on virtual humans in
San Diego in 2004 to identify key research challenges and to stimulate
and sustain collaboration in virtual human research. In addition to
the workshop report, the workshop produced several positive outcomes
at CSLR. For example, I met Erika Rosenberg at the workshop, and we
have collaborated with her to improve greatly the quality of the emotions
produced by Marni and our other 3D models.
- CSLR held a summer workshop, supported
by the NSF and the Coleman Institute for Cognitive Disabilities, that brought
15 researchers from five countries to CSLR to develop applications in
Interactive Books in which the virtual human interacted with users in
French, German, Italian, Polish and Spanish.
- This workshop led to a collaboration with Dr. Katarzyna (Kasia)
Dziubalska-Kolaczyk, Head of the School of English at Adam Mickiewicz
University (AMU) in Poznan, Poland. Kasia has since established the
Center for Speech and Language
Processing at AMU, and has received two grants from the Polish government
to develop a Polish Literacy Tutor and a Polish version of LSVT Virtual
Therapy for individuals with Parkinson disease.
- In April 2006, CSLR received a two-year grant from the NSF CRI (Computing
Research Infrastructure) program to develop a toolkit to support research
and development of Virtual Human systems. The goal of this grant is
to develop a prototype toolkit for designing applications that enable
natural spoken dialogs with virtual humans in a wide variety of applications.
How it All Began (A Personal Note)
My research on virtual humans was initiated in 1997, when I was awarded a three-year "Challenge Grant" of $1.8 million from the National Science Foundation (with Dominic Massaro and Alex Waibel as co-Principal Investigators). The goal of this grant was to develop and integrate computer speech recognition, speech synthesis and character animation technologies into the CSLU Toolkit, and to use the toolkit to design applications to teach speech and language skills to students at the Tucker Maxon School in Portland, Oregon. The research produced a number of articles and, more importantly, significant and lasting benefits to the students who used the program (and to the many students who continue to use the program today).
Looking back on this wonderful project, it seems to me that a guardian angel must have guided our efforts. I was at CSLU in 1996, working on the CSLU Toolkit, which by then integrated computer speech recognition (developed at CSLU) and the Festival speech synthesis system (developed by Paul Taylor and Alan Black at the University of Edinburgh). Using the CSLU Toolkit's Rapid Application Developer, or RAD, a graphical user interface for designing spoken dialogs, it was possible to develop a number of sophisticated applications, such as conversing with the system to retrieve weather forecasts from a Web site. Also in 1996, I reconnected with Dom Massaro, who invented the Baldi system with Michael Cohen. Baldi is a 3D talking head with very accurate lips. I had not seen Dom in over 25 years, since I was a grad student at UC Riverside and Dom was a postdoc at UC San Diego. Shortly after I read the NSF Challenge Grant program announcement, I ran into a colleague at a supermarket (one I had never gone to before) who used to work at the Oregon Graduate Institute. Kathy told me she had left OGI and was now working at the Tucker Maxon Oral School (now the Tucker Maxon School), a school that used an oral approach to instruction and language training for students with profound hearing loss. At that moment, the proverbial light bulb went on in my mind. I asked her if she thought Tucker Maxon might like to partner on an NSF grant proposal to develop a talking head that would converse with their students to teach speech and language skills. Dom was as excited as I was; we submitted the proposal, and to our great surprise, it was awarded. The research was conducted between 1997 and 2000.
On March 15, 2001, ABC TV's Prime Time Thursday featured a segment in which Baldi, the 3D talking head invented by Dom Massaro and Michael Cohen at UC Santa Cruz, was shown helping children learn new vocabulary and dramatically improving their speech recognition and production skills. Prime Time introduced the segment with the words "This is what a small miracle looks like." The National Science Foundation also featured the Baldi project on the NSF home page during March and April 2001. The Baldi project brought great benefits and joy to many children, and continues to benefit children with sensory and cognitive disabilities through the efforts of Animated Speech Corporation, which has licensed the technology and extended its capabilities. Sadly, the premiere of the Prime Time segment was overshadowed by the tragic death of Mike Macon, an exceptional speech researcher who created the voice of Baldi and who died the evening of the broadcast. After moving to CU and establishing the Center for Spoken Language Research, I applied for an NSF ITR grant to develop perceptive animated agents that could be used as virtual tutors and therapists. This grant was awarded, and the technologies developed with support from this and other grants (e.g., the SONIC speech recognition system, the CU Animate system) led to subsequent grants and projects that resulted in virtual tutoring and therapy programs.