Submitted URL: http://ttssamples.syntheticspeech.de/
Effective URL: https://ttssamples.syntheticspeech.de/

--------------------------------------------------------------------------------


GERMAN TEXT-TO-SPEECH

last update: 15th March 2023


CONTENTS:

 1.  Foreword
 2.  Commercial systems
 3.  Universities/research
 4.  Other systems
 5.  Service systems
 6.  Further samples
 7.  Licensed products
 8.  Missing examples
 9.  Unknown examples
 10. TTS classification chart
 11. Credits
 12. Change-log

--------------------------------------------------------------------------------




REMARKS

   
 * This collection of German TTS samples is maintained by Felix Burkhardt
 * There's also a page on Speechsynthesis demos with simulated emotion
 * I appreciate hints about missing systems! ( felixbur@gmx.de )
 * Generally the demonstrations don't show the up-to-date quality of the
   systems!
 * The information provided is to the best of my knowledge, but of course I
   can't guarantee its correctness!
 * Some differences in quality are due to different sample rates; most demos
   are 16 kHz but some are 8 kHz or 22 kHz.
 * Don't rely on the year specifications, I'm very unsure about most of them.
   Many of them simply denote the year when I took the samples (although they're
   meant to stand for the year when the voice / engine was released).
 * The remark "courtesy of company X" means that I got the samples specially for
   this page from the respective company. Possibly they made special adjustments
   with respect to pronunciation and prosody.


TERMINOLOGY

I added a chart to facilitate the understanding of the concepts used for
classification. It's somewhat outdated, as non-uniform unit selection is not
explicitly mentioned.

TTS always consists of two components, which I call (following Dutoit's
introduction):

 * NLP (Natural Language Processing): conversion of orthographical text into
   a phoneme alphabet and a prosody description.
 * DSP (Digital Speech Processing): the speech engine, which synthesizes the
   speech signal from the output of the NLP component.
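The split into the two components can be sketched in a few lines. This is only a toy illustration, not how any of the listed systems works: the lexicon, the phoneme symbols, the fixed duration and pitch values, and the names `Phone`, `nlp` and `dsp` are all assumptions made for the example, and the "DSP" stage merely emits a sine tone per voiced phone.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16000  # Hz; most demos on this page are 16 kHz


@dataclass
class Phone:
    symbol: str       # phoneme symbol (illustrative, SAMPA-like)
    duration_ms: int  # prosody: segment duration
    f0_hz: float      # prosody: pitch target, 0.0 for unvoiced phones


# Hypothetical toy lexicon; a real NLP component uses rules and dictionaries.
LEXICON = {"hallo": ["h", "a", "l", "o:"], "welt": ["v", "E", "l", "t"]}
UNVOICED = {"h", "t"}


def nlp(text):
    """NLP stage: orthographic text -> phoneme sequence with prosody."""
    phones = []
    for word in text.lower().split():
        for symbol in LEXICON[word]:
            f0 = 0.0 if symbol in UNVOICED else 120.0
            phones.append(Phone(symbol, 80, f0))
    return phones


def dsp(phones):
    """DSP stage: phoneme/prosody description -> waveform samples.

    Placeholder signal generation: silence for unvoiced phones,
    a sine tone at the pitch target for voiced ones.
    """
    samples = []
    for p in phones:
        n = SAMPLE_RATE * p.duration_ms // 1000
        for i in range(n):
            if p.f0_hz == 0.0:
                samples.append(0.0)
            else:
                samples.append(math.sin(2 * math.pi * p.f0_hz * i / SAMPLE_RATE))
    return samples


waveform = dsp(nlp("Hallo Welt"))
```

The point of the sketch is the interface: everything the DSP stage needs (symbols, durations, pitch) is handed over by the NLP stage, so the two can be developed and exchanged independently.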

The engines that synthesize the speech (DSP-component) are mainly based on five
technologies:

 * DNN Synthesis: The newest addition to speech synthesis algorithms are
   artificial neural networks, or deep neural nets, meaning that the number of
   layers is higher than in traditional artificial neural network
   architectures, say five. (TTS with neural nets has been done for many
   decades, but to my knowledge not for German.) They replace the HMM approach
   to predict the best acoustic parameters for a given sequence of symbols
   representing text.
 * HMM Synthesis: Synthesis based on Hidden Markov Models, a statistical
   approach to model the transition probabilities of the acoustic parameters
   based on the speech to be generated. The approaches are trained on a
   relatively large data corpus, but have a small footprint for synthesis
   because they don't operate on the wave data directly but on some
   parameterized representation (e.g. LPC). However, this is also the reason
   they tend to produce artifacts. Sample: Simple4All
 * Non-uniform unit-selection: Best fitting chunks of speech from large
   databases get concatenated, minimizing a double cost-function: best fit to
   neighbor unit and best fit to target prosody. Sounds most natural (similar to
   original speaker), but inflexible and large footprint. Sample: RealSpeak
 * Diphone-synthesis: Speech concatenated from diphone-units (two-phone
   combinations), prosody-fitting done by signal-manipulation (depends on
   unit-coding). Relatively small footprint but not very natural. Sample:
   Bell-Labs synthesis
 * Formant-synthesis: Speech synthesized by physical models (formants are
   resonance frequencies in the vocal tract). Very flexible and smallest
   footprint, but very unnatural. Sample: Eloquence
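The "double cost-function" of non-uniform unit-selection can be illustrated with a small dynamic-programming search. The sketch below is a simplified assumption, not the algorithm of any listed product: units carry only a mean pitch, the target cost is the pitch distance to the requested prosody, and the join cost is zero for units that were adjacent in the recording and the pitch mismatch otherwise. The names `Unit`, `target_cost`, `join_cost` and `select_units` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phone: str     # phone label of the recorded chunk
    pitch: float   # mean pitch of the chunk in Hz
    unit_id: int   # position in the recording; consecutive ids were adjacent


def target_cost(unit, target_pitch):
    """Fit to target prosody (here reduced to pitch distance)."""
    return abs(unit.pitch - target_pitch)


def join_cost(left, right):
    """Fit to neighbor unit: free if the units were adjacent in the database."""
    if right.unit_id == left.unit_id + 1:
        return 0.0
    return abs(left.pitch - right.pitch)


def select_units(targets, database):
    """Viterbi-style search over candidate units.

    targets: list of (phone, target_pitch); every phone is assumed to have
    at least one candidate in the database.
    Returns (best unit sequence, total cost).
    """
    candidates = [[u for u in database if u.phone == phone] for phone, _ in targets]
    # best[i][k] = (cheapest cost of a path ending in candidate k, backpointer)
    best = [{} for _ in targets]
    for k, u in enumerate(candidates[0]):
        best[0][k] = (target_cost(u, targets[0][1]), None)
    for i in range(1, len(targets)):
        for k, u in enumerate(candidates[i]):
            cost, prev = min(
                (best[i - 1][j][0] + join_cost(candidates[i - 1][j], u), j)
                for j in best[i - 1]
            )
            best[i][k] = (cost + target_cost(u, targets[i][1]), prev)
    # Backtrack from the cheapest final candidate.
    k = min(best[-1], key=lambda j: best[-1][j][0])
    total = best[-1][k][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    path.reverse()
    return path, total


database = [Unit("a", 100.0, 0), Unit("b", 100.0, 1),
            Unit("a", 120.0, 2), Unit("b", 150.0, 3)]
path, total = select_units([("a", 115.0), ("b", 100.0)], database)
```

In this toy database the search prefers the "a" unit that is slightly further from the pitch target because it joins seamlessly to the best "b" unit, which is the trade-off the two costs are meant to express.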

--------------------------------------------------------------------------------

The test sentences were:

sentence 1:

An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes
besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen.

As I found this sentence a bit too simple, I thought up another test sentence
which contains a collection of known problems for the NLP module (in some demos
this sentence is truncated due to the provider's restriction on the number of
characters):

sentence 2:

Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens
nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das
Gramm.
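Two of the problem classes in sentence 2, abbreviations and decimal numbers, can be sketched as a tiny text-normalization step. This is a minimal illustration under stated assumptions: the abbreviation table, the function names and the digit-by-digit reading of the fraction are choices made for the example, and dates ("15.3.00"), initials ("A."), acronyms (NATO vs. CIA) and currency readings are deliberately left untouched.

```python
import re

# Hypothetical abbreviation table covering only the items in sentence 2.
ABBREVIATIONS = {"z.B.": "zum Beispiel", "tgl.": "täglich", "Dr.": "Doktor"}

SMALL = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
         "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}


def number_word(n):
    """German word for an integer in the range 0..99."""
    if n < 20:
        return SMALL[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    unit_word = "ein" if unit == 1 else SMALL[unit]
    return unit_word + "und" + TENS[tens]


def expand_decimal(match):
    """Read '13,84' as integer part, 'Komma', then the fraction digit by digit."""
    integer, fraction = match.group(1), match.group(2)
    digits = " ".join(SMALL[int(d)] for d in fraction)
    return f"{number_word(int(integer))} Komma {digits}"


def normalize(text):
    """Expand known abbreviations and decimal numbers with a comma."""
    for abbreviation, full in ABBREVIATIONS.items():
        text = text.replace(abbreviation, full)
    return re.sub(r"\b(\d{1,2}),(\d+)\b", expand_decimal, text)


print(normalize("tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm."))
```

Even this small sketch shows why the sentence is hard: every token class needs its own rule, and a price such as "1,04 DM" would in practice need a currency-aware reading rather than the generic decimal expansion used here.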

Now, six years after thinking up those sentences, more pressing problems for
German speech synthesis used in services like email reading arise from the
pronunciation of English terms; e.g. the following sentence would not be
pronounced correctly by most systems without tuning:

sentence 3:

Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden.


COMMERCIAL

Columns: company/link, description/engine name, technology, languages, voice
name, year (approx.), s1, s2, s3.

Acapela

Acapela was formed in December 2003 from a combination of three European
companies specializing in vocal technologies: Babel Technologies (Belgium),
Infovox (Sweden) and Elan Speech (France).
Engine: Acapela HQ TTS. Technology: non-uniform unit-selection. Languages: DE,
FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR
 * Claudia, 2015
 * Claudia (Smile), 2017
 * Lea (Child), 2013
 * Jonas (child), 2013
 * Andreas, 2011
 * Julia, 2009
 * Klaus, 2006
 * Sarah, 2003
Engine: Custom Voice. Technology: non-uniform unit-selection, ANN. Languages: DE
 * Felix DNN (page author), 2017. Artificial neural net model adapted with 15
   minutes data
 * Felix (page author), 2017. Non-uniform unit-selection from 2 hours data
Engine: Greeting Bunny. Technology: non-uniform unit-selection. Languages: DE,
US, FR, IT, ES, NL, SE, NO, DK, BE
 * Bunny, 2008

Aculab

Engine: Aculab. Technology: diphone. Diphone-concatenation with LPC coded
units. LPC (linear predictive coding), originally a compression algorithm, is
useful for synthesis because it is based on a source/filter model of speech.
Languages: DE, UK, US, FR, BR, IT, ES
 * Julia, 1998 -

Aristech

Formerly Speechconcept.
Engine: Cerevoice. Technology: non-uniform unit-selection. Developments from
Aristech, CereProc and University of Edinburgh. Languages: DE, EN, FR, IT, ES,
US, NL, JP
 * Sophie, adult, 2011. Corporate Voice, courtesy of Aristech
 * Leopold, Austrian adult, 2013. Courtesy of Aristech
 * Alex, adult, 2016. Courtesy of Aristech
 * Gudrun, adult, 2013. Courtesy of Aristech
 * Nick, youth, 2011. Courtesy of Aristech
 * Saskia, youth, 2011. Courtesy of Aristech

Atip

Engine: Proser. Technology: diphone. NLP-component and voices from Atip, Mbrola
Engine (diphone-concatenation) from Babeltech. Languages: DE, US
 * Carla, 2000
 * Erkan, 2004. Turkish accent
 * Fifi, 2004. French accent
 * Steffen, 2000
 * Eva, 2000

AT&T

Engine: Natural Voices. Technology: non-uniform unit-selection. Languages: DE,
IT, US, UK, FR, MX*
 * Klara, 2001
 * Reiner, 2002

Babeltech

Engine: Brightspeech. Technology: non-uniform unit-selection. Same as Acapela
HQ TTS
 * Ingrid, 2002 -
Engine: Babil. Technology: diphone. Diphone-concatenation based on commercial
Mbrola-engine. MBROLA (Multi Band Resynthesis Overlap and Add), similar to PSOLA
but the database is treated beforehand to adapt pitch, amplitude and spectral
features. Languages: DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS,
TR, CZ, SA
 * Eva, 2000
 * Greta, 2000
 * Steffen, 1997
 * Helga, 1998 - - -. Same as Infovox 330
 * Gerhard, 1998 - - -. Same as Infovox 330

Bell Labs

Technology: diphone. LPC-diphone concatenation. Languages: DE, FR, ES, US, UK,
IT, RU, RO, CN
 * -

Centigram

Acquired by Lernout & Hauspie, later Nuance.
Engine: TruVoice. Technology: formant. Languages: DE, US, MX*, FR, IT
 * 1996 -

Cepstral

Engine: Cepstral TTS. Technology: non-uniform unit-selection. Associated with
Alan Black, one of the pioneers of non-uniform unit-selection and lead scientist
of Festival, an open source text-to-speech framework developed at Univ. of
Edinburgh and the CMU. Languages: DE, UK, US, ES, FR, EG, TH, AF
 * Kathrin, 2003
 * Matthias, 2003

Deutsche Telekom

Engine: Berkom TTS. Technology: formant. Research system by the former R&D
department of German Telekom. Hybrid approach combining formant synthesis for
voiced phonemes and concatenating with waveform coded units for unvoiced parts.
Languages: DE
 * Felix, 1998
Engine: SAMT (Sprach-Ausgabesystem in Multiplex-Technik). Technology:
hardware-based formant synthesis of the former Forschungsinstitut der Deutschen
Bundespost. Languages: DE
 * Other sample: 1987 - - -

Digital Equipment Corporation

Engine: DecTalk. Technology: formant. First commercial text-to-speech
synthesizer. Rule based formant-synthesis (the legendary formant synthesizer,
based on Klatt's MITalk). Languages: DE, US, UK, ES, MX*, FR
 * 1982 -

Elan

Engine: SaySo. Technology: non-uniform unit-selection. Languages: DE, US, FR,
IT, ES
 * Lea, 2003
Engine: Tempo. Technology: diphone. Pitch Synchronous Overlap and Add (PSOLA):
famous algorithm to change pitch and time of speech that made diphone-synthesis
a great success for many years. Languages: DE, US, UK, FR, ES, IT, BR, PT, RU,
PL
 * Thomas, 1998
 * Dagmar, 1996

Eloquent Technologies

Acquired by Scansoft.
Engine: ETI Eloquence. Technology: rule-based formant-synthesis (Klatt-style).
Later sold by Speechworks, also licensed to IBM (ViaVoice Outloud). Languages:
DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR
 * 1998 -

GData

Engine: Logox. Technology: microsegment synthesis (concatenating subphonetic
units), not developed any more. Originally based on research from Univ. of
Saarbrücken. Languages: DE, US, UK
 * Default voice, 2000 -
 * Bill, 1998
 * Bill (Swabian accent), 2002
 * Bill (Hessian accent), 2002
 * Bill (Saxon accent), 2002
 * Bill (French accent), 2002

Google

Engine: wavenet. Technology: wavenet: artificial neural nets end-to-end.
Languages: AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU,
IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE,
TH, TL, TR, UK, VI, ZH
 * Wavenet A (female), 2018
 * Wavenet B (male), 2018
 * Wavenet C (female), 2018
 * Wavenet D (male), 2018
 * Wavenet E, 2022
 * Wavenet F, 2022
Engine: Google Basic. Technology: so-called basic (non-uniform unit
selection?). Languages: AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR,
GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR,
SV, TA, TE, TH, TL, TR, UK, VI, ZH
 * Standard A (female), 2018
 * Standard B (male), 2018
 * Basic C, 2022
 * Basic D, 2022
 * Basic E, 2022
 * Basic F, 2022
Engine: Google Translate. Technology: non-uniform unit-selection
 * Female, 2013. Samples were accessed via the translation service.

IBM

Engine: Watson. Technology: unknown. Languages: CS, DE, EN, ES, FR, IT, JA, KS,
NL, PT, SV, ZH
 * Birgit, 2022
 * Dieter, 2022
 * Erika, 2022
Engine: CTTS. Technology: non-uniform unit-selection. Languages: DE, US, UK, JP,
KR, IT, ES, FR
 * Male, 2002. Courtesy of IBM. Database speaker is Gilles Karolyi. Sentence 3
   sample is 8 kHz.
 * Female. Other sample: 2004 - - -

Infovox

Engine: 330/Infovox Desktop. Technology: diphone-concatenation. Probably same
as Babeltech Babil. Infovox 310 is the Apple version. Languages: DE, UK, DK, NL,
FI, FR, IS, IT, NO, ES, SE
 * Helga, 1996. 8 kHz version. Other sample
 * Gerhard, 1996. 8 kHz version. Other sample
Engine: 210/230. Technology: formant-synthesis. Successor of KTH's OVE,
originally telia promotor. Languages: DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE
 * 1994 -
Engine: Desktop PRO. Technology: non-uniform unit-selection. Same as Acapela HQ
TTS
 * - - -

Innoetics

Technology: non-uniform unit-selection. Development system from unsupervised
audiobook extraction. Languages: DE, US, UK, GR, BG
 * Christian, 2015. Courtesy of Innoetics
 * Claudia, 2015. Courtesy of Innoetics
 * Jessi, 2015. Courtesy of Innoetics
 * Kalrsson, 2015. Courtesy of Innoetics

Ivona

Owned by Amazon.
Engine: Ivona TTS. Technology: non-uniform unit-selection. Licensed by
Lumenvox. Languages: DE, US, UK, ES, RO, PL, MX
 * Hans, 2011
 * Marlene, 2011

Lernout & Hauspie

Acquired by Scansoft in 2001 after bankruptcy.
Engine: TTS3000. Technology: diphone. Languages: DE, US, UK, NL, FR, RU, ES, MX,
BR, CN, KR
 * Stefan, 1996 -
 * Anna. Other sample: 1996 - - -

Loquendo

Acquired by Nuance in 2011.
Engine: Loquendo TTS. Technology: non-uniform unit-selection. Formerly called
Actor. Languages: DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE
 * Katrin, 2003. Courtesy of Loquendo.
 * Stefan, 2003. Courtesy of Loquendo.
 * Ulrike, 2001

Meridian

Engine: Orpheus. Technology: formant. Formerly from Dolphin Oceanic Ltd.
Specialized on fast speech as used by blind customers. Languages: DE, UK, US,
FR, BR, PT, IT, ES, Welsh, CN, MD, CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO,
MX, SE
 * Orpheus, 2009

Microsoft

Engine: Microsoft Azure TTS services. Technology: deep neural nets (DNN).
Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL,
PL, BR, PT, RU, SE, HK, TW, CN
 * Amala, 2022
 * Bernd, 2022
 * Christoph, 2022
 * Conrad, 2022
 * Elke, 2022
 * Gisela, 2022
 * Kasper, 2022
 * Killian, 2022
 * Klarissa, 2022
 * Klaus, 2022
 * Louisa, 2022
 * Maja, 2022
 * Ralf, 2022
 * Tanja, 2022
 * Katja (Neural), 2020
Engine: Microsoft Mobile Voices. Technology: non-uniform unit-selection.
Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL,
PL, BR, PT, RU, SE, HK, TW, CN
 * Katja, 2014
 * Stefan, 2014
Engine: Microsoft Speech Platform - Runtime Languages (Version 11). Technology:
non-uniform unit-selection. Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI,
CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN
 * Hedda, 2012

Neospeech

A Hoya company, as is ReadSpeaker.
Technology: non-uniform unit-selection. Languages: DE, US, UK, MX, TW, TH, KR,
IT, CN, CH, JP, CT, BR, PT, FR
 * Lena, 2018
 * Tim, 2018

Nuance

Formerly Scansoft (originating from Kurzweil and Xerox), acquired European
pioneers Lernout & Hauspie in 2001, took the name of a smaller company named
Nuance which they acquired in 2005.
Engine: Vocalizer. Technology: DNN, artificial neural nets. Languages: US
 * Nuance Website Sample. Other sample: 2018 - - -
Engine: Vocalizer. Technology: non-uniform unit-selection. Formerly called
RealSpeak (Vocalizer was the name of the original Nuance product), originally
from Lernout & Hauspie, converged with RVoice (formerly Rhetorical). First
commercial German unit-selection TTS. Languages: DE, NL, PT, CA, CN, ES, DK, PT,
FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR,
IN, HU, TH, TR, ZA, RO
 * Victor, 2016
 * Anna, 2010. 11 kHz, courtesy of Nuance
 * Yannick, 2006. 11 kHz, courtesy of Nuance
 * Yannick 2, 2009. Yannick embedded version recorded from a cell phone
 * Monika and Beate (?), 2005. Same as RVoice F026
 * Steffi, 2004. 8 kHz
 * Steffi 2, 2005. Newer version with enhanced voice quality and better
   pronunciation.
 * Vera, 1999. 8 kHz

Nuance (until 2005)

Acquired by Scansoft in 2005.
Engine: Vocalizer 4.05. Technology: non-uniform unit-selection. Languages: DE,
US, UK, AU, CA(FR), MX*, BR
 * Anna Weber, 2004 -
Engine: Vocalizer 1.0. Technology: non-uniform unit-selection. Licensed Fonix
engine. Languages: DE, US, UK, NL, FR, IT, NO, ES, SE
 * 2001 -

ReadSpeaker

A Hoya company, as is NeoSpeech. Formerly called rSpeak.
Technology: non-uniform unit-selection using deep neural artificial networks.
Languages: DE, GB, US, AU, ES, FR, NL, SE
 * Max, 2018. Courtesy of ReadSpeaker

Rhetorical Systems

Was headquartered in Edinburgh, Scotland. Acquired by Scansoft / Nuance in 2004.
Engine: RVoice. Technology: non-uniform unit-selection. Languages: DE, UK, US,
GR, ES
 * F026, 2004
 * M027, 2004
 * F018
Speechworks

Acquired by Scansoft / Nuance in 2003.
Engine: Speechify. Technology: non-uniform unit-selection. Languages: DE, US,
UK, AU, JP, MX*, FR, BR, CA(FR)
 * Tessa, 2002

Svox

Originally a spin-off from ETH Zurich. Acquired by Nuance in 2011.
Engine: Svox Corporate. Technology: non-uniform unit-selection. Languages: DE,
FR, IT, US, ES
 * Petra, 2005
 * Markus, 2005
 * Marlene. Other sample: 2003 - - -
Technology: diphone. Languages: DE, FR, IT, US, ES
 * Nicole, 2000 -

thorstenvoice

Engine: VITS. Technology: deep learning model: VITS (Conditional Variational
Autoencoder with Adversarial Learning for End-to-End Text-to-Speech). Languages:
DE
 * Thorsten, 2023
Engine: Tacotron 2 - DDC. Technology: deep learning model: Double Decoder
Consistency model architecture. Languages: DE
 * Thorsten, 2023

tom weber software

Engine: Fahrgastansagen TTS. Technology: non-uniform unit-selection. Languages:
DE
 * Andreas, 2015. Samples courtesy of tom weber software
 * Marianne, 2015. Samples courtesy of tom weber software

VoiceINTERConnect

Technology: diphone. Commercial version of the DRESS synthesizer (University of
Dresden).
 * female voice, 2000
 * male voice, 2000

Votrax

Technology: formant. Early hardware formant synthesizer. Samples taken from an
Audiodata Braille reader. Languages: DE
 * 1974

Voxygen

Spin-off from French Orange Labs.
Technology: hybrid non-uniform unit-selection / HMM synthesis. Languages: DE,
FR, EN, ES, IT, AR
 * Sylvia, 2014. Courtesy of Voxygen
 * Matthias, 2014. Courtesy of Voxygen


UNIVERSITIES / RESEARCH

Institution System Remark Year (approx.) / remark s1 s2 s3 IKP Bonn

BOSS
non-uniform unit-selection 2001

Hadifix
mixed inventory concatenation
HADIFIX = HAlbsilben, DIphone und suFIXe
DE 1995

- University of Budapest

Multivox 5 (Profivox)
diphone synthesis 2004
male speaker 1
- 2004
male speaker 2
- Multivox 3
formant synthesis
DE, HU, FI, NL, ES, PT, SA, Esperanto 1994


Other sample: - - - DFKI

Mary
non-uniform unit-selection
Mary = modular architecture for speech synthesis, open source. Also a great tool
for teaching speech synthesis, because the output and input of the different
processing modules can be viewed as text.
DE, EN, Tibetan 2011
Pavoque corpus
2007
Bits 1
for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit
Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6,
Bonn, Germany. 2007
Bits 2
2007
Bits 3
2007
Bits 4
Mary/Mbrola
diphone
DE, EN 2000

Technical university of Dresden

DRESS
diphone synthesis 1996

Voice 1
concatenative formant-synthesizer 1993


Other sample: - - - TUSY
hardware formant-synthesizer 1987


Other sample: - - - ROSY
hardware formant-synthesizer
Robotron Synthesizer 1977


Other sample: - - - Syni 2
punchcard controlled formant-synthesizer
Robotron Synthesizer 1975


Other sample: - - - Syni 1
punchcard controlled formant-synthesizer
Robotron Synthesizer 1972


Other sample: - - - Michael Pucher with Austrian academy of sciences

hts-engine-world
HMM-based vocoder synthesis, for details see the article M. Pucher, D. Schabus,
J. Yamagishi, F. Neubarth, V. Strom: Modeling and interpolation of Austrian
German and Viennese dialect in HMM-based speech synthesis. Speech Communication,
Volume 52, Issue 2, February 2010, Pages 164-179.
Specialized on Austrian dialects/sociolects. based on open-source software:
https://github.com/mipuc/hts-engine-world 2020
LEO
Austrian German male - 2020
HPO
Viennese dialect male - 2020
JOE
Viennese youth female - 2020
KEP
Austrian German male, adaptive voice - 2020
MPU
Austrian German male, adaptive voice - Jonathan Duddington

eSpeak
formant-synthesis
based on the 1995 unix "speak"-program. Open-source 2006

ETH Zürich

Svox
diphone-concatenation
Predecessor of the commercial version later acquired by Nuance. 1998

Gerhard Mercator University of Duisburg


formant-synthesis 1996

- KTH Stockholm

Infovox
formant synthesis
Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt 1992

- - Ove III
Hardware formant synthesis
Orator Verbis Electris (OVE). Developed by Gunnar Fant 1967


Other sample: - - - University of Mons

Mbrola
diphone-synthesis
Mbrola: Multi-band Resynthesis Overlap and Add. The NLP (text phonemisation)
component is Txt2Pho, the Hadifix NLP in combination with Mbrola synthesis.
Available for free for noncommercial use. MBROLA-TTS is available for about 34
different languages. 1998
de8
Markus Binsteiner's work on a Bavarian dialect
Other sample: - - - 2000
de7
(by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three
voice qualities (for emotional speech simulation). 2000
de6
(by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice
qualities (for emotional speech simulation). 2000
de5
by Fred Englert (ATIP), female, 22 kHz 2000
de4
By IMS Stuttgart, male, 16 kHz, includes English and French diphones 2000
de3
by ATIP, female, first 22.05 kHz voice 1997
de2
By ATIP, male, 16 kHz 1996
de1
By ATIP, female, 16 kHz ÖFAI (Austrian Research Institute for Artificial
Intelligence)

VieCtoS
demisyllable-LPC-concatenation
Vienna Concept-to-Speech system. If the prosody sounds poor it's due to my
limited knowledge of Tobi-Labels. 1998

- - OGI, Oregon Graduate Institute,


LPC-diphone concatenation
Developed at the OGI, Center for Spoken Language Understanding during a workshop
in 1998. TTS-Framework is Festival 1998

- Ruhr Universität Bochum

SyRUB, Version 4.1.1 1995

Simple4All

Tundra corpus
non-uniform unit-selection
EU FP7 Project "Simple4All" Tundra corpus, system features unsupervised
learning. 2013

Espnet

Hokuspokus model
ANN: Tacotron2
Thanks to kan-bayashi
en, jp, de 2022
Hokuspokus
Hochschule Hof, Institut für Informationssysteme

VITS
deep learning Vits (VITS: Conditional Variational Autoencoder with Adversarial
Learning for End-to-End Text-to-Speech) model
de 2023
Friedrich
2023
Eva
2023
Bernd
Tacotron2
deep learning Tacotron 2 model
de 2023
Hokuspokus


With the following systems it wasn't possible to synthesize my own sentences:

name/link description year (approx.) mpeg3 AEG Telefunken

SVS (SPRAUS Voll Synthese)

unknown 1975

Karlchen

unknown concatenation ("Parcor-Synthetisator")
Deutsche Bahn Auskunftssystem 1978

ATR



non-uniform unit selection 1997
male
1997
female
Bose

unknown

unknown
recorded from a Bose SoundLink Mini II Bluetooth speaker, February 2018 2018

Univ. of Dresden, Peter Birkholz

Vocal Tract Lab

Articulatory synthesis

Hand-tweaked articulatory movements transformed into a mathematical model to
generate sound waves - ELIS Lab

Eurovocs

diphone-synthesis
Technology from Lernout & Hauspie 1998

1996

First Byte



product-name: Monologue, ProVoice. waveform-concatenation synthesis (?) 1998

HHI: Heinrich Hertz Institut



technology unknown 1978

Keller & Trauth.

SpeakEaZy

waveform-concatenation synthesis 1998

SlowSoft

SlangTTS

Non-uniform unit-selection synthesis 2020

Wolfgang_von_Kempelen's Speaking Machine



Hardware manual sound generator ("papa", "mama") 1769

University of Köln

Institut für Phonetik

articulatory-synthesis (actually not a TTS-system) 1996

Karl Küpfmüller / Bernhard Cramer



Hardware phoneme concatenation 1955

University of Lausanne (LAIP)



TTS-system from the university of Lausanne (LAIP), uses MBROLA -engine. Includes
a model to reduce/elaborate articulation according to speech-rate. 1998

Mila (Machine learning laboratory at the University of Montreal)

Char2Wav

Deep neural artificial networks from University of Montreal: An end-to-end model
for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a
reader and a neural vocoder. The reader is an encoder-decoder model with
attention. The encoder is a bidirectional recurrent neural network that accepts
text or phonemes as inputs, while the decoder is a recurrent neural network
(RNN) with attention that produces vocoder acoustic features. For the German
samples, the Pavoque database was used for training. 2017

Philips/IPO Eindhoven

Spengi

diphone-synthesis 1997

Unknown Russian TTS



unknown / formant? 1970

H.W. Strube, University of Göttingen



Articulatory synthesis. 1977

Texas Instruments Language Translator



LPC coded word-concatenation 1980
Male Voice
University of West Bohemia in Pilsen

ARTIC (ARtificial Talker In Czech)

concatenative synthesizer
Commercial version available by speechtech by the name of ERIS. 2002






-

--------------------------------------------------------------------------------


SERVICE PRODUCTS

The following table lists some products to enhance text-to-speech quality.

Columns: company, product, description, date, sample.

 * ReadSpeaker (now commercializes its own engine under the name rSpeak; both
   are Hoya companies): SagEs / SayIt. Server-based website reader. Based on
   Acapela products. The sample reads a newspaper article (Tagesspiegel). Note
   the pronunciation of the word "playstation". 7/11/07
 * ETeX: -. Dictionaries. 1/7/05
 * Interlinx (acquired by Speech Concept): emphasis / SpeechOptimizer. Tuning
   tool for pronunciation and prosody modeling. 1/7/05

--------------------------------------------------------------------------------


FURTHER EXAMPLES

Speech synthesis examples that did not fit elsewhere.

Columns: description, example.

 * Ultrafast speech synthesis as used by blind users, with 14 syllables per
   second, based on formant synthesis. Example: Eloquence
 * realspeak British English, 31/5/05, "Flight LH312 from Frankfurt to Berlin."
 * TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE. Voice
   Steffi of Nuance.
 * Apple iPhone 2011, recorded with a PC microphone from an Apple iPhone 4.1.
   The TTS is a faster compact version of the voice Yannick from Nuance.

--------------------------------------------------------------------------------


LICENSED SYSTEMS

The following engines are based on systems with a different name:

   
 * SpeaKing Synthesis uses SVOX
 * LinguaTec VoiceReader uses Nuance Vocalizer voices
 * Lumenvox uses Ivona
 * POSSY from the ETH Zürich is a multilingual extension of SVOX
 * Infovox Desktop Version 2.0 PRO same as Babeltech's Brightspeech
 * VoicePro from WinDi see Babeltech/Mbrola.
 * IBM's Viavoice Outloud see Eloquent.
 * Digalo see Acapela Elan Tempo
 * Voice RSS uses Microsoft Hedda


MISSING EXAMPLES

For the following systems I have not yet obtained samples:

 * ALLVOC, predecessor of Elan from France Telecom based on PSOLA
 * Papageno, from Siemens
 * Tubsy from Technical University Berlin
 * SyRUB from the University of Bochum.
   

--------------------------------------------------------------------------------


UNKNOWN EXAMPLES

For the following systems I have no information about the supplier:

--------------------------------------------------------------------------------




CATEGORIZATION OF TEXT-TO-SPEECH SYSTEMS

Systems usually model either the system or the signal, are primarily rule-based
or data-based, and can be distinguished by the type of their basic units and
the way these units are coded.
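The four classification axes named above can be written down as a small data structure. This is only a sketch of the scheme: the type names are made up for the example, the two entries use the technology descriptions given for Eloquence and Aculab in the commercial table, and assigning formant synthesis to "system modeling", concatenation to "signal modeling", and phoneme-sized units to Eloquence are my reading, stated here as assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Modeling(Enum):
    SYSTEM = "system modeling"   # models the speech production system
    SIGNAL = "signal modeling"   # works on recorded speech signals


class Knowledge(Enum):
    RULE_BASED = "rule-based"
    DATA_BASED = "data-based"


@dataclass(frozen=True)
class TtsClass:
    modeling: Modeling
    knowledge: Knowledge
    unit: str     # type of the basic unit
    coding: str   # how the units are coded


# Assumed classification of two engines from the commercial table.
ELOQUENCE = TtsClass(Modeling.SYSTEM, Knowledge.RULE_BASED,
                     unit="phoneme", coding="formant parameters")
ACULAB = TtsClass(Modeling.SIGNAL, Knowledge.DATA_BASED,
                  unit="diphone", coding="LPC")


def describe(c):
    """One-line summary along the four axes."""
    return f"{c.modeling.value}, {c.knowledge.value}, {c.unit} units, {c.coding} coded"


print(describe(ACULAB))
```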




--------------------------------------------------------------------------------


CREDITS:

The following persons delivered information and/or samples:

 * Ulf Beckmann
 * Patrick Chabane
 * Bernhard Frötschl
 * Robert Kachel
 * Adrian Kurz
 * Michael Lang
 * Selinay Pachale
 * Eric Röder
 * Ali Savas
 * Stefan Seide
 * Bernhard Zeller

--------------------------------------------------------------------------------


CHANGELOG

   
 * 2023/03/15: added iisys samples
 * 2023/03/14: added Thorsten-voice samples
 * 2022/08/18: added Espnet2 samples
 * 2022/03/23: added Microsoft, IBM and Google samples
 * 2020/12/14: added Microsoft Katja DNN samples
 * 2020/7/20: added Pucher samples
 * 2020/5/20: added SlowSoft
 * 2018/10/9: added Microsoft Mobile Voices
 * 2018/9/5: added Google's wavenet samples
 * 2018/4/24: added Google's basic samples
 * 2018/4/24: added AEG Karlchen sample
 * 2018/4/24: added AEG SPRAUS sample
 * 2018/2/16: added Nuance DNN sample
 * 2018/2/15: complete re-write in XML/XSLT, added Bose sample
 * 2018/1/3: added new ReadSpeaker samples and NeoSpeech, both Hoya companies.
   Removed German version of this page. Too much work
 * 2017/12/13: added Author's custom voice
 * 2017/05/12: added Char2Wav
 * 2017/03/1: added acapela Claudia Smile sample
 * 2017/02/15: added rSpeak Max samples
 * 2016/10/28: added Nuance vocalizer Victor example phrase samples
 * 2016/10/13: added Nuance vocalizer Victor sample
 * 2016/05/09: linked second Firstbyte Provoice sample
 * 2016/04/20: updated Aristech Alex samples
 * 2016/1/4: added Votrax full samples
 * 2015/12/4: removed VoiceRSS because they used Microsoft TTS
 * 2015/12/4: added Vocalizer (Realspeak) Steffi newer version
 * 2015/9/17: added Innoetics
 * 2015/5/18: added Acapela voice Claudia
 * 2015/5/9: added OnScreenVoices samples
 * 2014/12/18: added H.W. Strube sample
 * 2014/12/17: added Karlchen
 * 2014/12/17: added Votrax, Von Kempelen, Küpfmüller, HHI, AEG Telefunken, OVE
   III and unknown Russian TTS
 * 2014/11/27: added Voxygen
 * 2014/9/25: removed Lumenvox
 * 2014/1/29: added Acapela child voices Lea and Jonas
 * 2014/1/29: removed broken links
 * 2013/10/29: added Google
 * 2013/09/18: added Simple4all
 * 2013/09/5: added Lumenvox
 * 2013/08/15: changed SpeechConcept to Aristech and updated samples
 * 2013/01/28: added VoiceRSS samples
 * 2013/01/28: introduced Amazon
 * 2012/11/28: added Microsoft samples
 * 2012/01/05: shifted identified samples to other section
 * 2012/01/05: removed Apple because Nuance TTS is used
 * 2012/01/05: added Acapela voice Andreas samples
 * 2012/01/03: added SyRUB samples
 * 2012/01/03: added SpeechConcept Leopold voice samples
 * 2011/09/08: added unknown section.
 * 2011/09/05: added SpeechConcept corporate voice samples. Noted acquisition of
   Svox and Loquendo by Nuance. Removed product comparison table (too much work
   to keep up-to-date)
 * 2011/02/28: added Acapela HQTTS Andreas sample.
 * 2011/02/22: added Mary Pavoque samples.
 * 2011/02/21: added Ivona samples.
 * 2011/02/16: added iPhone samples.
 * 2010/10/8: added Vocalizer Anna samples.
 * 2010/01/06: added Vocalizer Yannick embedded samples
 * 2009/09/17: added BrightSpeech Julia samples
 * 2009/09/17: added SpeechConcept samples
 * 2009/09/17: added orpheus from meridian
 * 2009/01/15: added product comparison table
 * 2008/09/23: added berkom speech sample
 * 2008/09/15: added ultra fast speech sample
 * 2008/04/14: added Acapela greeting bunny sample
 * 2007/11/07: added ReadSpeaker sample
 * 2007/10/15: added e-speak and vocal tract lab
 * 2007/09/28: added Mary BITS samples
 * 2007/07/14: added Texas Instruments Language Translator
 * 2006/11/29: added new voice (Klaus) for BrightSpeech
 * 2006/8/21: added updated Loquendo samples
 * 2006/6/19: added Vocalizer Yannick samples.
 * 2005/11/3: added vocalizer sample.
 * 2005/10/11: added second LAIP-TTS sample.
 * 2005/08/29: added SAMT sample.
 * 2005/07/01: added ETeX and Interlinx service descriptions.
 * 2005/06/23: added SVOX unit-selection samples, courtesy of SVOX.
 * 2005/03/31: added realspeak fun sample
 * 2005/02/22: added Logox accent samples
 * 2005/01/04: added MBROLA de-X samples
 * 2004/12/21: added Scansoft Steffi samples
 * October 29th 2004: added Chatr female sample
 * October 25th 2004: added Binsteiner sample
 * September 23rd 2004: added univ. stuttgart unit-selection sample
 * August 11th 2004: added more s3 samples
 * August 10th 2004: added Boss s2 sample
 * August 9th 2004: added rvoice F026 samples
 * July 7th 2004: added IBM ctts female sample
 * July 4th 2004: added artic synthesizer
 * June 24th 2004: added sentence 3 for selected commercial engines
 * June 22nd 2004: added multivox 5 and old dresden formant-synthesizers
 * June 11th 2004: added voiceINTERConnect
 * June 10th 2004: added languages for commercial engines
 * June 10th 2004: added erkan and fifi voices for atip's proser

--------------------------------------------------------------------------------

Speechsynthesis-demos with simulated emotion