Introduction
Lecture, Summer Semester 2012
Multilinguale Mensch-Maschine Kommunikation
Prof. Dr. Tanja Schultz
Dipl.-Inform. Tim Schlippe
Tuesday, April 17, 2012
Overview
Lecture 1: Overview and Introduction
• General information about the lecture
• Introduction of the chair
• interACT
• Lead-in to the topic
• Application examples
General Information: Lecture
Advanced lecture in the Hauptdiplom program
– No prior knowledge required
Examination:
– Yes, in Kognitive Systeme and Anthropomatik
Schedule:
– Annually in the summer semester, 4+0
– Exams only during the lecture period (register early!)
Dates:
– Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)
– Start 19.04.2012, end 19.07.2012
Lecturers:
– Prof. Dr. Tanja Schultz, Dipl.-Inform. Tim Schlippe
– Further staff members of the chair
General Information: Lecture
All lecture materials are available at
http://csl.anthropomatik.kit.edu > Studium und Lehre
> SS2012 > Multilinguale Mensch-Maschine Kommunikation
– All slides as PDF (no password protection)
– Current changes, announcements, syllabus
– Additional material where applicable (papers)
Basis for exams:
– Lecture content, slides, additional material
Questions, problems, and comments are welcome at any time during the lecture, or in person: CSL, Laborgebäude Kinderklinik, Geb. 50.21, Adenauerring 4
– Tanja Schultz ([email protected]), room 113
– Tim Schlippe ([email protected]), room 117
Office hours with Tanja Schultz by appointment
General Information: CSL
Chair for Cognitive Systems (Lehrstuhl für Kognitive Systeme) since June 1, 2007
– Karlsruhe Institute of Technology, Department of Informatics
– Institute for Anthropomatics (new since 2009)
– Homepage: http://csl.anthropomatik.kit.edu
– Address: Adenauerring 4, 76131 Karlsruhe
Contact:
– Prof. Dr.-Ing. Tanja Schultz
• +49 721 608 46300
– Secretary: Ms. Helga Scherer
• +49 721 608 46312
Research: Human-Centered Technologies
Technologies and methods:
recognizing, understanding, identifying;
statistical modeling, classification, ...
Communication of humans with their environment in the broadest sense: speech, motion, biosignals
Application field human-machine interaction; challenges and tasks: productivity and usability
Application field human-human communication; challenges and tasks: diversity of languages, cultural barriers, effort and costs
Teaching at CSL – Winter
Winter semester
• Biosignale und Benutzerschnittstellen
– 4+0, examinable in Kognitive Systeme and Anthropomatik
– Introduction to the acquisition and interpretation of biosignals
– Application examples
• Analyse und Modellierung menschlicher Bewegungen
– Introduction to the analysis, modeling, and recognition of human motion sequences (together with Dr. Annika Wörner)
– 2+0, examinable in Kognitive Systeme and Anthropomatik
• Design und Evaluation Innovativer Benutzerschnittstellen
– 2+0, examinable in Kognitive Systeme and Anthropomatik
• Multilingual Speech Processing
– 2+0, lab course (Praktikum)
– Development of speech recognition systems using Rapid Language Adaptation Tools
Teaching at CSL – Winter
Winter semester
• Praktikum Biosignale 2: Emotion und Kognition
– 2+0
– Recording and analysis of biosignals (e.g. pulse, skin conductance, respiration) to capture human emotional and cognitive processes
Teaching at CSL – Summer
Summer semester
• Multilinguale Mensch-Maschine Kommunikation
– 4+0, examinable in Kognitive Systeme and Anthropomatik
– Introduction to automatic speech recognition and processing
– Signal processing, statistical modeling, practical approaches and methods, multilinguality
– Applications in human-human communication and human-machine interaction
– Application examples
• Praktikum: Biosignale
– Hands-on development
• Recording of motion data (in cooperation with the Sports Institute)
• Various biosensors (Vicon, accelerometers, EMG)
• Automatic motion recognition
Teaching at CSL – Summer
Summer semester
• Kognitive Modellierung
– 2+0, examinable in Kognitive Systeme and Anthropomatik
– Modeling of human cognition and affect in the context of human-machine interaction
– Models of human behavior, human learning (relation and differences to machine learning methods), knowledge representation, emotion models, and cognitive architectures
• Methoden der Biosignalverarbeitung
– 2+0, examinable in Kognitive Systeme and Anthropomatik
– Algorithmic methods of modern biosignal processing
Working at CSL
• Bachelor theses
• Master theses
• Studienarbeiten
• Diplomarbeiten
• Research assistant (Hiwi) jobs
Development of an Adaptive Dialog System
• CSL has developed a successful EEG-based workload recognition system
– Is the user fully attentive or distracted?
• Integrated into a speech dialog system to adapt its behavior
– Simple example: under high workload, use shorter, simpler utterances
• Your task for BA/MA/SA/DA thesis: Implement a workload
adaptive speech dialog system for more complex tasks (in Java)
– Explore possibilities for intelligent, “cognitive” system strategies to react to
high workload
– Creativity is encouraged and rewarded!
• Learn about…
– application of speech recognition
– design of intelligent speech dialog systems
– usability and user-centered design
• Contact: [email protected]
SA/BA/DA/MA Thesis: Web-derived Pronunciations
Tasks:
• finding and extracting pronunciations on the WWW
• ensuring their quality
• evaluating their influence on speech recognition systems
Required skills:
• basic knowledge of speech recognition
• programming skills, e.g. in Perl or PHP
• enjoying computer science and linguistics
Available immediately; contact: Tim Schlippe ([email protected])
Attendance List
• Please fill it in!
N  Last name, first name   Subject, semester   Matriculation no.   Email
1  SCHULTZ, Tanja          Informatik, 36                          [email protected]
2
Literature
Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001 ($81.90 internet price)
Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997 ($35)
Schultz and Kirchhoff, Multilingual Speech Processing, Elsevier, Academic Press, 2006 (ask the authors for discounts!)
+ various articles (PDF) that we make available on the web (do read them!)
Useful Links, Additional Material
• All slides are posted on the web as PDF:
http://csl.anthropomatik.kit.edu > Studium und Lehre > SS2012 > Multilinguale Mensch-Maschine Kommunikation
• Electronic archive of the proceedings and reports of the most important conferences on speech and language:
– ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
– Interspeech (merger of Eurospeech and ICSLP)
– ASRU (Automatic Speech Recognition and Understanding)
– ACL (Association for Computational Linguistics), NAACL (North American ACL)
– HLT (Human Language Technologies) ...
Useful Links, Additional Material
• Biosignale und Benutzerschnittstellen (Schultz)
– Speech as one biosignal within a more general framework
• Maschinelle Übersetzung (Waibel)
– Connection: speech translation, statistical methods, language modeling
• Mustererkennung (Beyerer)
– Fundamentals of pattern recognition
• Automatische Spracherkennung (Waibel/Stüker)
– Fundamentals of speech recognition (winter semester)
• Praktikum: Multilingual Speech Processing (Schultz)
• Praktikum: Automatische Spracherkennung (Waibel)
• Seminar: Sprach-zu-Sprach-Übersetzung (Waibel)
General Information: Goals of the Course
Goals of the lecture:
• Speech in human-machine communication
– Advantages and disadvantages of speech as an input signal
– Aspects of multilinguality in speech recognition
• Fundamentals of speech recognition
– Basic concepts
– Speech production and perception
– Digital signal processing, feature extraction
– Statistical modeling, classification
– Acoustic modeling, HMMs
– Language modeling
• Further topics in speech processing
– Dialog modeling, synthesis (translation: see Prof. Waibel)
• Application examples from research
Today: Application Examples
• Speech recognition: from speech input signal to text
• Speech synthesis: from text to speech output signal
• Speech translation (across language boundaries): from a speech signal in language L1 to a speech signal in L2
= speech recognition + MT + speech synthesis
• Speech understanding, summarization
= from speech input signal to meaning
• But speech activity is not only about what is spoken:
– Who is speaking? → speaker identification
– Which language is spoken? → language ID
– What is being talked about? → topic ID
– How is it spoken? → emotion ID
– Who is being spoken to? → focus of attention
• Translation (across species boundaries): example dolphins
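The decomposition "speech translation = speech recognition + MT + speech synthesis" is just a composition of three stages. The sketch below illustrates that chaining with hypothetical placeholder functions and toy lookup tables; it is not an API from the lecture.

```python
# Minimal sketch of a speech-to-speech translation pipeline:
# ASR -> machine translation -> TTS. All three stage functions
# are hypothetical stand-ins with toy lookup tables.

def recognize(speech_l1):
    # ASR: speech signal in language L1 -> text in L1
    return {"audio:guten-tag": "guten tag"}[speech_l1]

def translate(text_l1):
    # MT: text in L1 -> text in L2
    return {"guten tag": "good day"}[text_l1]

def synthesize(text_l2):
    # TTS: text in L2 -> speech signal in L2 (toy encoding)
    return "audio:" + text_l2.replace(" ", "-")

def speech_translation(speech_l1):
    # Speech translation = ASR + MT + TTS, composed in order
    return synthesize(translate(recognize(speech_l1)))

print(speech_translation("audio:guten-tag"))  # audio:good-day
```

The point of the sketch is only the composition order: each stage consumes the previous stage's output, which is why errors made by the recognizer propagate into translation and synthesis.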
Introduction
• Each of the lessons covers one topic from
“speech recognition and understanding”
• It covers the most important areas of today’s research
and also discusses some historic issues
• The goal of the course is to introduce you to the science
of automatic speech recognition and understanding
• Today's topic:
– Why are we doing Speech Recognition?
• What are the advantages and disadvantages
– Where is it useful?
• Examples of applications, demos
Why Automatic Speech Recognition?
ADVANTAGES:
• Natural way of communication for human beings
No practicing necessary for users, i.e. speech
does not require any teaching as opposed to
reading/writing
High bandwidth (speaking is faster than typing)
• Additional communication channel (Multimodality)
• Hands and eyes are free for other tasks
→ Works in the car / on the run / in the dark
• Mobility (microphones are smaller than keyboards)
• Some communication channels (e.g. phone) are designed
for speech
• ...
Why Automatic Speech Recognition?
DISADVANTAGES:
• Unusable where silence/confidentiality is required
(meetings, library, spoken access codes)
… we are working on solutions (see later)
• Still unsatisfactory recognition rate when:
Environment is very noisy (party, restaurant, train)
Unknown or unlimited domains
Uncooperative speakers (whisper, mumble, …)
• Problems with accents, dialects, code-switching
• Cultural factors (e.g. collectivism, uncertainty avoidance)
• Speech input is still more expensive than keyboard
Input Speeds (Characters per Minute)
Mode         Standard   Best
Handwriting  200        500
Typewriter   200        1000
Stenography  500        2000
Speech       1000       4000
Where is Speech Recognition and Understanding Useful?
Human-Machine Interaction:
1. Remote control applications
• Operating machines over the phone
2. Hands/eyes busy or not usable
• Speech recognition in cars
• Help for the physically challenged, nurse bots
3. Authentication
• Speaker identification/verification/segmentation
• Language/accent identification
4. Entertainment / convenience
• Speech recognition for entertainment
• Gaming
5. Indexing and transcribing acoustic documents
• Archive, summarize, search and retrieve
Where is Speech Recognition and Understanding Useful?
Human-Human Interaction:
1. Mediate communication across language boundaries
• Speech translation
• Language learning
• Synchronization / sign language
2. Support human interaction
• Meeting and lecture systems
• Non-verbal cue identification
• Multimodal applications
• Speech therapy support
Operating Machines over the Phone
• Remote-controlled home: operate heating / air conditioning, turn lights on/off, check email
• Voice-operated answering machine: call your answering machine from anywhere and review recent calls
• Access databases:
– Pittsburgh bus information with CMU's Let's Go at 412-268-3526
– Check the weather with MIT's Jupiter at 1-888-573-8255
– Train timetable information (Erlangen), directory assistance, airlines, cinema
• Call centers: route or dispatch calls, 911 emergency line
– AT&T: "How may I help you?" The HMIHY system was deployed in 2001 and, according to AT&T, was handling more than 2 million calls per month by the end of 2001.
• Use interactive services worldwide: plan your next trip with an artificial travel agent
Hands-Free / Eyes-Free Tasks
• Hands and/or eyes are busy with tools
– radio repair
– construction site
• Hands and/or eyes are needed to operate machines/cars
– hold the steering wheel
– pull levers, turn knobs, operate switches
– watch the street while driving
– monitor a production line
• Hands are working on other people
– hair stylist cutting hair
– surgeon working on a patient
• Hands and/or eyes are not helpful in the environment
– dark rooms (photography)
– outer space (remote control)
Speech Recognition in Cars
• Use your cellular phone while keeping your hands on the wheel and eyes on the street, e.g. voice dialing
• Operate your audio device while driving
• Dictate messages (e-mails, SMS)
TODAY several companies and services are emerging that do exactly this
• Talk to your personal digital assistant
• Navigation:
– Ask your way through a foreign city
– Find the nearest restaurant
Support in Everyday Life, Help for the Elderly and Physically Challenged
People who are immobile, e.g. lying in bed or in hospital, or who cannot use their hands due to illness or accidents, can
• operate parts of their environment/machines by voice
• ask a robot for help
Nursebot Pearl and Florence: CMU's robotic assistants for the elderly; ISAC feeding a physically challenged individual (Center for Intelligent Systems, Vanderbilt University)
Children with speech disorders make significant improvements by trying to make a speech recognizer understand them
Children with dyslexia and similar problems learn to read faster using automatic speech recognition
Information in Speech
From a single utterance (e.g. Turkish: "Onune baksana be adam!"), different recognizers extract different kinds of information:
• Speech recognition → words: "Onune baksana be adam!"
• Language recognition → language: Turkish
• Speaker recognition → speaker: Umut
• Accent recognition → accent: Istanbul
• Emotion recognition → emotion: angry
• Further: topic ID (chemicals), entity tracking (Istanbul), acoustic scene (bus station), discourse analysis (negotiation)
Tanja Schultz, Speaker Characteristics, in: C. Müller (Ed.), Speaker Classification, Lecture Notes in Computer Science / Artificial Intelligence, Vol. 4343, Springer, Heidelberg, Berlin, New York.
Speaker Recognition
• Identification: Whose voice is it?
• Verification/Detection: Is it Sally's voice?
• Segmentation and Clustering: Where are the speaker changes? Which segments are from the same speaker?
Speaker Identification/Verification/Recognition
Verification: verify someone's claimed identity, i.e. is the person who s/he claims to be?
– Instead of a password: say something rather than typing it
Identification: "who is speaking?" Identifies a speaker from an enrolled population by searching the database
– Personalized behavior: customize the machine's reaction automatically to the current user
Recognition: often used to refer to all problems of verification, identification, and segmentation & clustering
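The difference between identification (search over an enrolled population) and verification (a threshold test against one claimed identity) can be sketched with toy similarity scores. The scores, the scoring scheme, and the threshold below are invented for illustration, not a method from the lecture.

```python
# Toy sketch: speaker identification vs. verification over
# precomputed similarity scores (higher = more similar).
# Scores and the threshold are made-up illustrative values.

enrolled_scores = {"Sally": 0.82, "Tim": 0.35, "Will": 0.41}

def identify(scores):
    # Identification: pick the best-matching enrolled speaker
    return max(scores, key=scores.get)

def verify(scores, claimed, threshold=0.6):
    # Verification: accept the claimed identity only if its
    # similarity score exceeds a decision threshold
    return scores[claimed] >= threshold

print(identify(enrolled_scores))         # Sally
print(verify(enrolled_scores, "Sally"))  # True
print(verify(enrolled_scores, "Tim"))    # False
```

Note the structural difference: identification always returns some enrolled speaker, while verification is a binary accept/reject decision whose threshold trades off false accepts against false rejects.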
Speaker Segmentation and Clustering
Difficult cases: overlapping speech, speech over noise, missed speaker turns
Segmentation: automatically segment incoming speech by speaker
Clustering: cluster segments of the same speaker
Adaptation: use parameters that are optimized for recognizing a specific speaker
Example: Mandarin Broadcast News
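The clustering step ("cluster segments of the same speaker") can be illustrated with a greedy toy algorithm over invented segment embeddings; real systems use e.g. BIC-based or i-vector distances, so everything below is an illustrative assumption.

```python
# Toy sketch of speaker clustering: greedily group speech segments
# whose embedding vectors lie close together. Embeddings and the
# distance threshold are invented for illustration only.

def dist(a, b):
    # squared Euclidean distance between two segment embeddings
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster(segments, threshold=1.0):
    # assign each segment to the first cluster whose representative
    # embedding is within the threshold, else open a new cluster
    clusters = []  # list of lists of segment indices
    reps = []      # representative embedding per cluster
    for i, emb in enumerate(segments):
        for members, rep in zip(clusters, reps):
            if dist(emb, rep) < threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
            reps.append(emb)
    return clusters

# four segments: 0 and 2 from one speaker, 1 and 3 from another
segs = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9)]
print(cluster(segs))  # [[0, 2], [1, 3]]
```

Once segments are clustered, the adaptation step mentioned above can pool all segments of one cluster to re-estimate speaker-specific model parameters.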
Language Identification
Applications:
o Selecting a recognizer (in multilingual speech recognition)
o Call routing (e.g. 911 emergency line)
o Data analysis and selection
o Special case: accent identification
o Optimizing all system parameters for the speaker's accent
o E-language learning
Tanja Schultz, Identifizierung von Sprachen – Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch, Diplomarbeit, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995
FarSID: Far-Field Speaker Recognition
Audio examples (figure):
• original signal
• effect of echo
• effect of distance
• effect of room size (small room, 1 m distance, 0.5 s echo)
Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France, 2006
Global Communication
The dream (?) of communicating across language boundaries: a babelfish for everybody
• Fun, everyday life:
• Chat in your mother tongue worldwide
• Travel without communication problems
• Business:
• Negotiate and be sure that your partner is getting it right
• The computer has no stakes, i.e. neutral, not lopsided translation
• Face-to-face communication
• Over the phone or internet
• Text-to-text vs. speech-to-speech
"The Building of the Tower of Babel", 1563, by Pieter Brueghel, Kunsthistorisches Museum, Vienna: the building of the Tower of Babel and the Confusion of Tongues (languages) in ancient Babylon, mentioned in Genesis.
"Babel" is composed of two words, "bab" meaning "gate" and "el" meaning "god"; hence "the gate of god". A related word in Hebrew, "balal", means "confusion".
GALE
GALE = Global Autonomous Language Exploitation:
Process huge volumes of speech and text data in multiple languages (Arabic, Chinese, English)
• Broadcast news, shows, telephone conversations
Apply automatic technology to spoken and written language:
• Absorb, analyze, and interpret
Deliver pertinent information in easy-to-understand forms to monolingual analysts
Three engines: transcription, translation, distillation
Demonstration GALE – Chinese TV
Mandarin Broadcast News (CCTV), recorded in the US over satellite
Transforming the Mandarin speech into Chinese text using automatic speech recognition (ASR)
Translating the Chinese text into English text using statistical machine translation (SMT)
H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004
PDA Speech Translation in Mobile Scenarios
• Tourism
– Needs in a foreign country
– International events: conferences, business, Olympics
• Humanitarian needs
– Humanitarian organizations, government
– Emergency line 911 (USA, multicultural population)
– Army, Peace Corps
A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L. Mayfield Tomokiyo, J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech Translation in your Hand, HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003
Verbmobil
Talk to people (face-to-face) from/in other countries in your own language.
A step towards Star Trek's "Universal Translator"
Mobility: Personal Digital Assistants
Use your PDA or cellular phone to get help
• Navigation
• Translation
• Information (travel, transportation, medical, ...)
Demo
RLAT: Rapid Language Adaptation Tools
Major problem: tremendous costs and time for development
– Very few languages (some 50 out of 6900) have many resources
– Lack of conventions (e.g. languages without a writing system)
– Gap between technology and language expertise
SPICE: an intelligent system that learns a language from the user
– Speech Processing: Interactive Creation and Evaluation toolkit
– Develop web-based toolkits for speech processing: ASR, MT, TTS
– http://cmuspice.org
– http://csl.ira.uka.de/rlat-dev
Interactive learning:
– Solicit knowledge from the user in the loop
– Rapid adaptation of language-independent models
Efficiency:
– Reduce time and costs by a factor of 10
T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language Adaptation in Speech Processing Systems, Proceedings of Interspeech, Antwerp, Belgium, August 2007
Demo
Meeting Room
The Meeting Browser is a powerful tool that allows us to record a new meeting, review or summarize an existing meeting, or search a set of existing meetings for a particular speaker, topic, or idea.
http://www.is.cs.cmu.edu/meeting_room/
Indexing Acoustic Documents
The world is flooded with information, and more and more of it arrives through audio-visual channels. Finding information in acoustic documents requires an intelligent acoustic search engine.
View4You / Informedia
Automatically records Broadcast News and allows the
user to retrieve video segments of news items for
different topics using spoken language input
Kemp/Waibel 1999
Education, Learning Languages
• LISTEN: automated reading tutor that listens to a child reading a displayed text aloud, and helps where needed
• CHENGO: web-based language learning in a gaming environment for English and Chinese
• The CALL program at CMU on Computer-Assisted Language Learning
Robust and Confidential Speech Recognition
Traditional Speech Recognition:
• Capture the acoustic sound wave by microphone
• Transform signal into electrical energy
Requirements and Challenges:
• Audibility:
Speech needs to be perceivable by microphone
(no low voice or whispering, no silent speech)
• Interference: Speech disturbs others
(no speaking in libraries, theaters, meetings)
• Privacy: Speech signal can be captured by others
(no confidential phone calls in public places)
• Robustness:
Signal is corrupted by noisy environment
(difficult to recognize in restaurants, bars, cars)
Bone Conduction
• When we speak normally, our body acts as a resonance box: skin and bones vibrate when we speak (try this!)
• This vibration can be captured by so-called bone-conducting or skin-conducting microphones (e.g. the stethoscopic microphone; Nakajima; Zheng et al.; Jou et al. / Intecs)
• Whispered speech is defined as:
– the articulated production of respiratory sound
– with little or no vibration of the vocal folds
– produced by the motion of the articulatory apparatus
– transmitted through the soft tissue or bones of the head
Electromyography – Silent Speech
Approach: surface electromyography (EMG)
– Surface = no needles
– Electro = electrical activity
– Myo = muscle
– Graphy = recording
The EMG signal is derived from two surface electrodes as the difference s1 - s2:
– measures the electrical activity of facial muscles by capturing electric potential differences
– MOTION is recorded, not the acoustic signal; silently moving the lips/articulators is good enough
SILENT SPEECH Demo
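The differential measurement s1 - s2 can be sketched numerically: subtracting the two electrode channels cancels interference that is common to both, leaving the muscle activity. The sample values below are invented integers in arbitrary units, purely for illustration.

```python
# Sketch of the bipolar (differential) EMG measurement: the recorded
# EMG signal is the sample-wise difference of two surface electrode
# channels, which cancels interference common to both electrodes.
# All sample values are invented, in arbitrary units.

common_noise = [5, -2, 3, 1]       # interference picked up by both electrodes
muscle_activity = [0, 10, -10, 5]  # activity seen mainly at electrode 1

s1 = [n + m for n, m in zip(common_noise, muscle_activity)]  # electrode 1
s2 = common_noise[:]               # electrode 2: common interference only

emg = [a - b for a, b in zip(s1, s2)]  # differential signal s1 - s2
print(emg)  # [0, 10, -10, 5]: the common noise cancels out
```

This is why EMG-based silent speech works without any acoustic signal: the electrodes pick up muscle activation from articulator motion, and the differential wiring suppresses interference shared by both measurement points.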
Dolphin Speech ("Delphinisch")
Communication across language boundaries and across species boundaries
• Collaboration with the Wild Dolphin Project
• free-living Atlantic spotted dolphins
• identification, behavior, communication
• Communication with dolphins
• dolphins try to make contact
• information from a 20-million-year-old species
• "Dolphone" and "Delphinisch"
• sound production, perception, frequency, medium
• pattern recognition, extraction, clustering, statistical modeling
• audio and video indexing, archiving, retrieval
• audio recording, analysis, synthesis, translation
http://wilddolphinproject.com
Even Beyond Human Speech ...
Towards Communication with Dolphins
CMU: www.cs.cmu.edu/~tanja
Wild Dolphin Project (http://wilddolphinproject.com)
Why do we want to talk to dolphins?
• They might have a lot to say (a 20-million-year-old species)
• It is a challenging scientific problem
– Crossing language boundaries
– Crossing species boundaries
– Different sound production, perception, ...
– Different medium (water), transmission, omni-directional
• Nothing is known about dolphins' language
• It involves spending a lot of time in the Bahamas
Why do dolphins want to talk to us? We don't know ...
... but there is evidence that they try hard
Quaero
• Collaborative research and development program
• Developing multimedia and multilingual indexing and management tools, e.g. automatic analysis, classification, extraction and exploration of information
• Facilitate extraction of information from unlimited quantities of multimedia and multilingual documents, including written texts, speech and music audio files, and images and videos
• Available to everyone via personal computers, television and handheld terminals
Conclusions
Speech:
• Is the most natural way of communication for human beings
• Does not require any teaching or practicing
• Has high bandwidth (speaking is faster than typing)
• Supplements other communication channels (multimodality)
Speech recognition is useful:
• In hands-busy and eyes-busy environments
• For mobile / small devices
• As support in everyday life and help for the physically challenged
Speech recognition and understanding:
• Allows machines to be operated (remotely)
• Supports global communication between humans
• Breaks down language (and maybe sometimes cultural) barriers