ttssamples.syntheticspeech.de
Submitted URL: http://ttssamples.syntheticspeech.de/
Effective URL: https://ttssamples.syntheticspeech.de/
Submission: On June 27 via api from US — Scanned from DE
--------------------------------------------------------------------------------
GERMAN TEXT-TO-SPEECH
last update: 15th March 2023
CONTENTS:
1. Foreword
2. Commercial systems
3. Universities/research
4. Other systems
5. Service systems
6. Further samples
7. Licensed products
8. Missing examples
9. Unknown examples
10. TTS classification chart
11. Credits
12. Change-log
--------------------------------------------------------------------------------
REMARKS
* This collection of German TTS samples is maintained by Felix Burkhardt.
* There's also a page on speech synthesis demos with simulated emotion.
* I appreciate hints about missing systems! (felixbur@gmx.de)
* Generally, the demonstrations don't show the up-to-date quality of the systems!
* The information provided is correct to the best of my knowledge, but of course I can't guarantee its correctness!
* Some differences in quality are due to different sample rates; most demos are 16 kHz, but some are 8 kHz or 22 kHz.
* Don't rely on the year specifications; I'm very unsure about most of them. Many of them simply denote the year when I took the samples (although they are meant to stand for the year when the voice/engine was released).
* The remark "courtesy of company X" means that I got the samples specially for this page from the respective company. Possibly they made special adjustments with respect to pronunciation and prosody.
TERMINOLOGY
I added a chart to facilitate understanding of the concepts used for classification. It's somewhat outdated, as non-uniform unit selection is not explicitly mentioned.
TTS always consists of two components, which I name following Dutoit's introduction:
* NLP (Natural Language Processing): conversion of orthographic text into a phoneme alphabet and a prosody description.
* DSP (Digital Signal Processing): the speech engine, i.e. synthesis of the speech signal from the output of the NLP component.
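To make the NLP/DSP split concrete, here is a minimal Python sketch in which a toy NLP stage produces phoneme targets with a pitch contour, and a toy DSP stage picks recorded units by minimizing the double cost of non-uniform unit selection (fit to the target prosody plus fit to the neighboring unit), as described in the technology list that follows. Everything in it is hypothetical: the two-word lexicon, the SAMPA-like symbols, the pitch-only target and join costs, and the eight-unit "database" are invented for illustration and do not come from any system on this page. Real engines use much richer features (duration, spectral distance at the joins, phonetic context) and finally concatenate the selected waveforms; the dynamic-programming (Viterbi) search shown here is one common way to minimize the summed costs.

```python
from dataclasses import dataclass

# --- NLP component (hypothetical toy) ---------------------------------------
# Maps orthographic words to SAMPA-like phoneme symbols. A real NLP module
# combines a large lexicon with letter-to-sound rules and text normalization
# (abbreviations, numbers, dates).
LEXICON = {
    "hallo": ["h", "a", "l", "o:"],
    "welt": ["v", "E", "l", "t"],
}


@dataclass(frozen=True)
class Target:
    phoneme: str
    f0: float  # target pitch in Hz (the "prosody description")


def nlp(text: str, start_f0: float = 120.0, step: float = 5.0) -> list[Target]:
    """Convert text into phoneme targets with a simple falling pitch contour."""
    targets = []
    f0 = start_f0
    for word in text.lower().split():
        for phoneme in LEXICON[word]:
            targets.append(Target(phoneme, f0))
            f0 -= step
    return targets


# --- DSP component: non-uniform unit selection (hypothetical toy) -----------
@dataclass(frozen=True)
class Unit:
    uid: int
    phoneme: str
    f0_start: float  # pitch at the left edge of the recorded unit (Hz)
    f0_end: float    # pitch at the right edge of the recorded unit (Hz)


def target_cost(target: Target, unit: Unit) -> float:
    """How badly the unit's mean pitch misses the target prosody."""
    return abs((unit.f0_start + unit.f0_end) / 2 - target.f0)


def join_cost(left: Unit, right: Unit) -> float:
    """How badly two units fit together at the concatenation point."""
    return abs(left.f0_end - right.f0_start)


def select_units(targets, database, w_target=1.0, w_join=1.0):
    """Viterbi search for the unit sequence with minimal total cost."""
    candidates = [[u for u in database if u.phoneme == t.phoneme] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("database lacks a unit for at least one phoneme")
    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(w_target * target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + w_target * target_cost(targets[i], unit), back))
        best.append(row)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total


# Hypothetical eight-unit database: two recordings of /l/ at different pitches.
DATABASE = [
    Unit(0, "h", 121.0, 119.0), Unit(1, "a", 116.0, 114.0),
    Unit(2, "l", 111.0, 109.0), Unit(3, "l", 91.0, 89.0),
    Unit(4, "o:", 106.0, 104.0), Unit(5, "v", 101.0, 99.0),
    Unit(6, "E", 96.0, 94.0), Unit(7, "t", 86.0, 84.0),
]


def synthesize(text: str):
    """NLP, then DSP; a real engine would now concatenate the units' audio."""
    return select_units(nlp(text), DATABASE)


if __name__ == "__main__":
    units, cost = synthesize("Hallo Welt")
    print([u.uid for u in units], cost)
```

Running the script prints `[0, 1, 2, 4, 5, 6, 3, 7] 21.0`: at each /l/ position the search picks the recording whose pitch matches the falling contour, because both the target cost and the join costs favor it.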
The engines that synthesize the speech (DSP component) are based mainly on five technologies:
* DNN synthesis: The most recent addition to speech synthesis algorithms are artificial neural networks, or rather deep neural nets, meaning that the number of layers is higher than in traditional artificial neural network architectures (say, five). They replace the HMM approach in predicting the best acoustic parameters for a given sequence of symbols representing the text. (TTS with neural nets has been done for many decades, but to my knowledge not for German.)
* HMM synthesis: Synthesis based on Hidden Markov Models, a statistical approach that models the transition probabilities of the acoustic parameters based on the speech to be generated. The models are trained on a relatively large data corpus but have a small footprint at synthesis time, because they don't operate on the wave data directly but on a parameterized representation (e.g. LPC). However, this is also the reason they tend to produce artifacts. Sample: Simple4All
* Non-uniform unit selection: The best-fitting chunks of speech from large databases are concatenated, minimizing a double cost function: best fit to the neighboring unit and best fit to the target prosody. Sounds most natural (similar to the original speaker), but inflexible and with a large footprint. Sample: RealSpeak
* Diphone synthesis: Speech is concatenated from diphone units (two-phone combinations); prosody fitting is done by signal manipulation (depending on the unit coding). Relatively small footprint, but not very natural. Sample: Bell Labs synthesis
* Formant synthesis: Speech is synthesized by physical models (formants are the resonance frequencies of the vocal tract). Very flexible and the smallest footprint, but very unnatural. Sample: Eloquence
--------------------------------------------------------------------------------
The test sentences were:
sentence 1: An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. Dabei war eigentlich immer sehr schönes Wetter gewesen.
As I found this sentence a bit too simple, I thought up another test sentence which contains a collection of known problems for the NLP module (in some demos this sentence is truncated due to the provider's restriction on the number of characters):
sentence 2: Dr. A. Smithe von der NATO (und nicht vom CIA) versorgt z.B. - meines Wissens nach - die Heroin seit dem 15.3.00 tgl. mit 13,84 Gramm Heroin zu 1,04 DM das Gramm.
Now, six years after thinking up those sentences, more pressing problems for German speech synthesis used in services like email reading arise from the pronunciation of English terms; e.g. the following sentence would not be pronounced correctly by most systems without tuning:
sentence 3: Die Manpowerdiskussion wird gecancelt, du kannst das File vom Server downloaden.
COMMERCIAL
Columns: company/link | description/engine name | technology | languages | voice name | year (approx.) | s1 | s2 | s3
Acapela: Acapela was formed in December 2003 from a combination of three European companies specializing in vocal technologies: Babel Technologies (Belgium), Infovox (Sweden) and Elan Speech (France).
Acapela HQ TTS (non-uniform unit selection). Languages: DE, FR, NL, ES, SE, US, SA, CY, DN, FI, CA, GR, IE, NO, PL, PT, BR, RU, TR. Voices: Claudia 2015; Claudia (Smile) 2017; Lea (child) 2013; Jonas (child) 2013; Andreas 2011; Julia 2009; Klaus 2006; Sarah 2003
Custom Voice (non-uniform unit selection, ANN). Languages: DE. Voices: Felix DNN (page author): artificial neural net model adapted with 15 minutes of data 2017; Felix (page author): non-uniform unit selection from 2 hours of data 2017
Greeting Bunny (non-uniform unit selection). Languages: DE, US, FR, IT, ES, NL, SE, NO, DK, BE. Voices: Bunny 2008
Aculab: Aculab diphone (diphone). Diphone concatenation with LPC-coded units. LPC (linear predictive coding), originally a compression algorithm, is useful for synthesis because it is based on a source/filter model of speech. Languages: DE, UK, US, FR, BR, IT, ES. Voices: Julia 1998 -
Aristech (formerly Speechconcept): Cerevoice (non-uniform unit selection). Developments from Aristech, CereProc and the University of Edinburgh. Languages: DE, EN, FR, IT, ES, US, NL, JP. Voices: Sophie, adult corporate voice, courtesy of Aristech 2011; Leopold, Austrian adult, courtesy of Aristech 2013; Alex, adult, courtesy of Aristech 2016; Gudrun, adult, courtesy of Aristech 2013; Nick, youth, courtesy of Aristech 2011; Saskia, youth, courtesy of Aristech 2011
Atip: Proser (diphone). NLP component and voices from Atip, MBROLA engine (diphone concatenation) from Babeltech. Languages: DE, US. Voices: Carla 2000; Erkan, Turkish accent 2004; Fifi, French accent 2004; Steffen 2000; Eva 2000
AT&T: Natural Voices (non-uniform unit selection). Languages: DE, IT, US, UK, FR, MX*. Voices: Klara 2001; Reiner 2002
Babeltech: Brightspeech (non-uniform unit selection), same as Acapela HQ TTS. Voices: Ingrid 2002 -
Babil (diphone). Diphone concatenation based on the commercial MBROLA engine. MBROLA (Multi Band Resynthesis Overlap and Add) is similar to PSOLA, but the database is preprocessed to adapt pitch, amplitude and spectral features. Languages: DE, US, UK, ES, FR, NL, BE, BR, PT, IT, SE, NO, DK, FI, IS, TR, CZ, SA. Voices: Eva 2000; Greta 2000; Steffen 1997; Helga, same as Infovox 330, 1998 - - -; Gerhard, same as Infovox 330, 1998 - - -
Bell Labs: diphone. LPC diphone concatenation. Languages: DE, FR, ES, US, UK, IT, RU, RO, CN -
Centigram (acquired by Lernout & Hauspie, later Nuance): TruVoice (formant). Languages: DE, US, MX*, FR, IT. 1996 -
Cepstral: Cepstral TTS (non-uniform unit selection). Associated with Alan Black, one of the pioneers of non-uniform unit selection and lead scientist of Festival, an open-source text-to-speech framework developed at the University of Edinburgh and CMU. Languages: DE, UK, US, ES, FR, EG, TH, AF. Voices: Kathrin 2003; Matthias 2003
Deutsche Telekom: Berkom TTS (formant). Research system by the former R&D department of Deutsche Telekom. Hybrid approach combining formant synthesis for voiced phonemes with concatenation of waveform-coded units for unvoiced parts. Languages: DE. Voices: Felix 1998
SAMT (Sprach-Ausgabesystem in Multiplex-Technik, i.e. speech output system in multiplex technology): hardware-based formant synthesis of the former Forschungsinstitut der Deutschen Bundespost (research institute of the German Federal Post Office). Languages: DE. Other sample: 1987 - - -
Digital Equipment Corporation: DECtalk (formant). First commercial text-to-speech synthesizer. Rule-based formant synthesis; the legendary formant synthesizer, based on Klatt's MITalk. Languages: DE, US, UK, ES, MX*, FR. 1982 -
Elan: SaySo (non-uniform unit selection). Languages: DE, US, FR, IT, ES. Voices: Lea 2003
Tempo (diphone). Pitch Synchronous Overlap and Add (PSOLA): famous algorithm for changing the pitch and duration of speech that made diphone synthesis a great success for many years. Languages: DE, US, UK, FR, ES, IT, BR, PT, RU, PL. Voices: Thomas 1998; Dagmar 1996
Eloquent Technologies (acquired by Scansoft): ETI Eloquence (formant). Rule-based formant synthesis (Klatt-style). Later sold by Speechworks, also licensed to IBM (ViaVoice Outloud). Languages: DE, UK, US, ES, MX, FR, CA(FR), IT, FI, BR, CN, JP, KR. 1998 -
GData: Logox (microsegment synthesis). Microsegment synthesis (concatenation of subphonetic units), no longer developed. Originally based on research from the University of Saarbrücken. Languages: DE, US, UK. Voices: default voice 2000 -; Bill 1998; Bill (Swabian accent) 2002; Bill (Hessian accent) 2002; Bill (Saxon accent) 2002; Bill (French accent) 2002
Google: WaveNet (artificial neural nets, end-to-end). Languages: AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH. Voices: Wavenet A (female) 2018; Wavenet B (male) 2018; Wavenet C (female) 2018; Wavenet D (male) 2018; Wavenet E 2022; Wavenet F 2022
Google Basic (so-called basic; non-uniform unit selection?). Languages: AF, AR, BG, BN, CA, CS, DA, DE, EL, EN, ES, FI, FIL, FR, GU, HI, HU, IN, IS, IT, JA, KN, KS, LV, ML, MS, NL, NO, PL, PT, RO, RU, SL, SR, SV, TA, TE, TH, TL, TR, UK, VI, ZH. Voices: Standard A (female) 2018; Standard B (male) 2018; Basic C 2022; Basic D 2022; Basic E 2022; Basic F 2022
Google Translate (non-uniform unit selection). Voices: female, samples accessed via the translation service, 2013
IBM: Watson (technology unknown). Languages: CS, DE, EN, ES, FR, IT, JA, KS, NL, PT, SV, ZH. Voices: Birgit 2022; Dieter 2022; Erika 2022
CTTS (non-uniform unit selection). Languages: DE, US, UK, JP, KR, IT, ES, FR. Voices: male, courtesy of IBM (the database speaker is Gilles Karolyi; the sentence 3 sample is 8 kHz) 2002; female, other sample, 2004 - - -
Infovox 330/Infovox Desktop (diphone concatenation). Probably the same as Babeltech Babil; Infovox 310 is the Apple version. Languages: DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE. Voices: Helga, 8 kHz version, other sample, 1996; Gerhard, 8 kHz version, other sample, 1996
Infovox 210/230 (formant synthesis). Successor of KTH's OVE, originally Telia Promotor. Languages: DE, UK, DK, NL, FI, FR, IS, IT, NO, ES, SE. 1994 -
Infovox Desktop PRO (non-uniform unit selection), same as Acapela HQ TTS - - -
Innoetics: non-uniform unit selection. Development system built from unsupervised audiobook extraction. Languages: DE, US, UK, GR, BG. Voices: Christian, courtesy of Innoetics 2015; Claudia, courtesy of Innoetics 2015; Jessi, courtesy of Innoetics 2015; Kalrsson, courtesy of Innoetics 2015
Ivona (owned by Amazon): Ivona TTS (non-uniform unit selection). Licensed by Lumenvox. Languages: DE, US, UK, ES, RO, PL, MX. Voices: Hans 2011; Marlene 2011
Lernout & Hauspie (acquired by Scansoft in 2001 after bankruptcy): TTS3000 (diphone). Languages: DE, US, UK, NL, FR, RU, ES, MX, BR, CN, KR. Voices: Stefan 1996 -; Anna, other sample, 1996 - - -
Loquendo (acquired by Nuance in 2011): Loquendo TTS (non-uniform unit selection), formerly called Actor. Languages: DE, IT, ES, FR, BR, PT, CN, UK, US, MX, GR, CL, AR, SE. Voices: Katrin, courtesy of Loquendo, 2003; Stefan, courtesy of Loquendo, 2003; Ulrike 2001
Meridian: Orpheus (formant). Formerly from Dolphin Oceanic Ltd. Specialized in fast speech as used by blind customers. Languages: DE, UK, US, FR, BR, PT, IT, ES, Welsh, CN, MD, CR, DN, NL, FI, GR, HU, LT, MY, NO, PL, RO, MX, SE. Voices: Orpheus 2009
Microsoft: Microsoft Azure TTS services (deep neural nets, DNN). Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN. Voices: Amala 2022; Bernd 2022; Christoph 2022; Conrad 2022; Elke 2022; Gisela 2022; Kasper 2022; Killian 2022; Klarissa 2022; Klaus 2022; Louisa 2022; Maja 2022; Ralf 2022; Tanja 2022; Katja (Neural) 2020
Microsoft Mobile Voices (non-uniform unit selection). Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN. Voices: Katja 2014; Stefan 2014
Microsoft Speech Platform - Runtime Languages (Version 11) (non-uniform unit selection). Languages: ES, DK, DE, AU, CA, GB, IN, US, MX, FI, CA, FR, IT, JP, KR, NO, NL, PL, BR, PT, RU, SE, HK, TW, CN. Voices: Hedda 2012
Neospeech (a Hoya company, as is ReadSpeaker): non-uniform unit selection. Languages: DE, US, UK, MX, TW, TH, KR, IT, CN, CH, JP, CT, BR, PT, FR. Voices: Lena 2018; Tim 2018
Nuance: formerly Scansoft (originating from Kurzweil and Xerox); acquired the European pioneers Lernout & Hauspie in 2001 and took the name of a smaller company named Nuance, which it acquired in 2005.
Vocalizer DNN (artificial neural nets). Languages: US. Nuance website sample, other sample, 2018 - - -
Vocalizer (non-uniform unit selection). Formerly called RealSpeak (Vocalizer was the name of the original Nuance product), originally from Lernout & Hauspie, converged with RVoice (formerly Rhetorical). First commercial German unit-selection TTS. Languages: DE, NL, PT, CA, CN, ES, DK, PT, FR, IT, JP, KR, MX, NO, PL, RU, SE, US, UK, AU, SA, ID, Basque, BE, CZ, FI, GR, IN, HU, TH, TR, ZA, RO. Voices: Victor 2016; Anna, 11 kHz, courtesy of Nuance 2010; Yannick, 11 kHz, courtesy of Nuance 2006; Yannick 2, embedded version of Yannick recorded from a cell phone 2009; Monika and Beate (?), same as RVoice F026, 2005; Steffi, 8 kHz, 2004; Steffi 2, newer version with enhanced voice quality and better pronunciation, 2005; Vera, 8 kHz, 1999
Nuance (until 2005), acquired by Scansoft in 2005: Vocalizer 4.05 (non-uniform unit selection). Languages: DE, US, UK, AU, CA(FR), MX*, BR. Voices: Anna Weber 2004 -
Vocalizer 1.0 (non-uniform unit selection), licensed Fonix engine. Languages: DE, US, UK, NL, FR, IT, NO, ES, SE. 2001 -
ReadSpeaker (a Hoya company, as is NeoSpeech), formerly called rSpeak: non-uniform unit selection using deep artificial neural networks. Languages: DE, GB, US, AU, ES, FR, NL, SE. Voices: Max, courtesy of ReadSpeaker, 2018
Rhetorical Systems (was headquartered in Edinburgh, Scotland; acquired by Scansoft/Nuance in 2004): RVoice (non-uniform unit selection). Languages: DE, UK, US, GR, ES. Voices: F026 2004; M027 2004; F018
Speechworks (acquired by Scansoft/Nuance in 2003): Speechify (non-uniform unit selection). Languages: DE, US, UK, AU, JP, MX*, FR, BR, CA(FR). Voices: Tessa 2002
Svox (originally a spin-off from ETH Zurich, acquired by Nuance in 2011): Svox Corporate (non-uniform unit selection). Languages: DE, FR, IT, US, ES. Voices: Petra 2005; Markus 2005; Marlene, other sample, 2003 - - -
(diphone). Languages: DE, FR, IT, US, ES. Voices: Nicole 2000 -
thorstenvoice: VITS (deep learning model: VITS, Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech). Languages: DE. Voices: Thorsten 2023
Tacotron 2 - DDC (deep learning model: Double Decoder Consistency model architecture). Languages: DE. Voices: Thorsten 2023
tom weber software: Fahrgastansagen TTS (passenger announcements; non-uniform unit selection). Languages: DE. Voices: Andreas, samples courtesy of tom weber software, 2015; Marianne, samples courtesy of tom weber software, 2015
VoiceINTERConnect: diphone. Commercial version of the DRESS synthesizer (University of Dresden). Voices: female voice 2000; male voice 2000
Votrax: formant. Early hardware formant synthesizer. Samples taken from an Audiodata Braille reader. Languages: DE. 1974
Voxygen: spin-off from the French Orange Labs. Hybrid non-uniform unit selection / HMM synthesis. Languages: DE, FR, EN, ES, IT, AR. Voices: Sylvia, courtesy of Voxygen, 2014; Matthias, courtesy of Voxygen, 2014
UNIVERSITIES / RESEARCH
Columns: Institution | System | Remark | Year (approx.) / remark | s1 | s2 | s3
IKP Bonn: BOSS, non-uniform unit selection. 2001
Hadifix, mixed-inventory concatenation. HADIFIX = HAlbsilben, DIphone und suFIXe (demisyllables, diphones and suffixes). DE. 1995 -
University of Budapest: Multivox 5 (Profivox), diphone synthesis. 2004 male speaker 1 -; 2004 male speaker 2 -
Multivox 3, formant synthesis. DE, HU, FI, NL, ES, PT, SA, Esperanto. 1994 Other sample: - - -
DFKI: Mary, non-uniform unit selection. Mary = modular architecture for speech synthesis, open source. Also a great tool for teaching speech synthesis, because the input and output of the different processing modules can be viewed as text. DE, EN, Tibetan. 2011 Pavoque corpus; 2007 BITS 1 (for details see Schröder, M. & Hunecke, A. (2007). Creating German Unit Selection Voices for the MARY TTS Platform from the BITS Corpora. Proc. SSW6, Bonn, Germany.); 2007 BITS 2; 2007 BITS 3; 2007 BITS 4
Mary/Mbrola, diphone. DE, EN. 2000
Technical University of Dresden: DRESS, diphone synthesis. 1996 Voice 1
Concatenative formant synthesizer. 1993 Other sample: - - -
TUSY, hardware formant synthesizer. 1987 Other sample: - - -
ROSY, hardware formant synthesizer (Robotron synthesizer). 1977 Other sample: - - -
Syni 2, punch-card-controlled formant synthesizer (Robotron synthesizer). 1975 Other sample: - - -
Syni 1, punch-card-controlled formant synthesizer (Robotron synthesizer). 1972 Other sample: - - -
Michael Pucher with the Austrian Academy of Sciences: hts-engine-world, HMM-based vocoder synthesis; for details see the article M. Pucher, D. Schabus, J. Yamagishi, F. Neubarth, V. Strom: Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis. Speech Communication, Volume 52, Issue 2, February 2010, Pages 164-179. Specialized in Austrian dialects/sociolects. Based on open-source software (https://github.com/mipuc/hts-engine-world). 2020 LEO Austrian German male -; 2020 HPO Viennese dialect male -; 2020 JOE Viennese youth female -; 2020 KEP Austrian German male, adaptive voice -; 2020 MPU Austrian German male, adaptive voice -
Jonathan Duddington: eSpeak, formant synthesis based on the 1995 Unix "speak" program. Open source. 2006
ETH Zürich: Svox, diphone concatenation. Predecessor of the commercial version later acquired by Nuance. 1998
Gerhard Mercator University of Duisburg: formant synthesis. 1996 -
KTH Stockholm: Infovox, formant synthesis. Developed by Rolf Carlson, Björn Granström and Sheri Hunnicutt. 1992 - -
OVE III, hardware formant synthesis. Orator Verbis Electris (OVE), developed by Gunnar Fant. 1967 Other sample: - - -
University of Mons: Mbrola, diphone synthesis. Mbrola: Multi-Band Resynthesis Overlap and Add. The NLP (text phonemization) component is Txt2Pho, the Hadifix NLP, in combination with Mbrola synthesis. Available for free for non-commercial use. MBROLA-TTS is available for about 34 different languages. 1998 de8, Markus Binsteiner's work on a Bavarian dialect, Other sample: - - -; 2000 de7 (by Marc Schröder, DFKI/Uni Saarland, female, 22 kHz), all diphones in three voice qualities (for emotional speech simulation); 2000 de6 (by Marc Schröder, DFKI/Uni Saarland, male, 22 kHz), all diphones in three voice qualities (for emotional speech simulation); 2000 de5 by Fred Englert (ATIP), female, 22 kHz; 2000 de4 by IMS Stuttgart, male, 16 kHz, includes English and French diphones; 2000 de3 by ATIP, female, first 22 kHz voice; 1997 de2 by ATIP, male, 16 kHz; 1996 de1 by ATIP, female, 16 kHz
ÖFAI (Austrian Research Institute for Artificial Intelligence): VieCtoS, demisyllable LPC concatenation. Vienna Concept-to-Speech system. If the prosody sounds poor, it's due to my limited knowledge of ToBI labels. 1998 - -
OGI (Oregon Graduate Institute): LPC diphone concatenation. Developed at the OGI Center for Spoken Language Understanding during a workshop in 1998. The TTS framework is Festival. 1998 -
Ruhr-Universität Bochum: SyRUB, Version 4.1.1. 1995
Simple4All: Tundra corpus, non-uniform unit selection. EU FP7 project "Simple4All" Tundra corpus; the system features unsupervised learning. 2013
ESPnet: Hokuspokus model, ANN: Tacotron2. Thanks to kan-bayashi. en, jp, de. 2022 Hokuspokus
Hochschule Hof, Institut für Informationssysteme (Hof University of Applied Sciences, Institute for Information Systems): VITS, deep learning VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) model. de. 2023 Friedrich; 2023 Eva; 2023 Bernd
Tacotron2, deep learning Tacotron 2 model. de. 2023 Hokuspokus
With the following systems it wasn't possible to synthesize my own sentences:
Columns: name/link | description | year (approx.) | mpeg3
AEG Telefunken: SVS (SPRAUS Voll Synthese), technology unknown. 1975
Karlchen: unknown concatenation ("Parcor-Synthetisator", i.e. PARCOR synthesizer), Deutsche Bahn Auskunftssystem (Deutsche Bahn information system). 1978
ATR: non-uniform unit selection. 1997 male; 1997 female
Bose: unknown / unknown. Recorded from a Bose SoundLink Mini II Bluetooth speaker in February 2018. 2018
Univ. of Dresden, Peter Birkholz: Vocal Tract Lab, articulatory synthesis. Hand-tweaked articulatory movements transformed into a mathematical model to generate sound waves. -
ELIS Lab: Eurovocs, diphone synthesis. Technology from Lernout & Hauspie. 1998 1996
First Byte: product names Monologue, ProVoice. Waveform-concatenation synthesis (?). 1998
HHI (Heinrich Hertz Institut): technology unknown. 1978
Keller & Trauth: SpeakEaZy, waveform-concatenation synthesis. 1998
SlowSoft: SlangTTS, non-uniform unit-selection synthesis. 2020
Wolfgang von Kempelen's Speaking Machine: hardware manual sound generator ("papa", "mama"). 1769
University of Köln, Institut für Phonetik (Institute of Phonetics): articulatory synthesis (actually not a TTS system). 1996
Karl Küpfmüller / Bernhard Cramer: hardware phoneme concatenation. 1955
University of Lausanne (LAIP): TTS system from the University of Lausanne (LAIP), uses the MBROLA engine. Includes a model to reduce/elaborate articulation according to speech rate. 1998
Mila (Machine learning laboratory at the University of Montreal): Char2Wav, deep artificial neural networks from the University of Montreal. An end-to-end model for speech synthesis learned with Deeplearning4J. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. For the German samples, the Pavoque database was used for training. 2017
Philips/IPO Eindhoven: Spengi, diphone synthesis. 1997
Unknown: Russian TTS, unknown / formant? 1970
H.W. Strube, University of Göttingen: articulatory synthesis. 1977
Texas Instruments: Language Translator, LPC-coded word concatenation. 1980 male voice
University of West Bohemia in Pilsen: ARTIC (ARtificial Talker In Czech), concatenative synthesizer. A commercial version is available from SpeechTech under the name ERIS. 2002 -
--------------------------------------------------------------------------------
SERVICE PRODUCTS
The following table lists some products to enhance text-to-speech quality.
Columns: company | product | description | date | sample
ReadSpeaker (now commercializes its own engine under the name rSpeak; a Hoya company): SagEs / SayIt, server-based website reader. Based on Acapela products. The sample reads a newspaper article (Tagesspiegel). Note the pronunciation of the word "playstation". 7/11/07
ETeX: dictionaries. 1/7/05
Interlinx (acquired by Speech Concept): emphasis / SpeechOptimizer, tuning tool for pronunciation and prosody modeling. 1/7/05
--------------------------------------------------------------------------------
FURTHER EXAMPLES
Speech synthesis examples that did not fit elsewhere.
Columns: Description | Example
Ultrafast speech synthesis as used by blind people, with 14 syllables per second, based on formant synthesis: Eloquence
RealSpeak British English, 31/5/05: "Flight LH312 from Frankfurt to Berlin."
TTS of the Fiat "Blue & Me" navigation head unit with Microsoft CE: voice Steffi of Nuance.
Apple iPhone 2011, recorded with a PC microphone from Apple iPhone 4.1; the TTS is a faster, compact version of the voice Yannick from Nuance.
--------------------------------------------------------------------------------
LICENSED SYSTEMS
The following engines are based on systems with a different name:
* SpeaKing Synthesis uses SVOX
* LinguaTec VoiceReader uses Nuance Vocalizer voices
* Lumenvox uses Ivona
* POSSY from the ETH Zürich is a multilingual extension of SVOX
* Infovox Desktop Version 2.0 PRO is the same as Babeltech's Brightspeech
* VoicePro from WinDi: see Babeltech/Mbrola
* IBM's Viavoice Outloud: see Eloquent
* Digalo: see Acapela Elan Tempo
* Voice RSS uses Microsoft Hedda
MISSING EXAMPLES
For the following systems I haven't got samples yet:
* ALLVOC, predecessor of Elan, from France Telecom, based on PSOLA
* Papageno, from Siemens
* Tubsy, from the Technical University of Berlin
* SyRUB, from the University of Bochum
--------------------------------------------------------------------------------
UNKNOWN EXAMPLES
For the following systems I have no information about the supplier:
--------------------------------------------------------------------------------
CATEGORIZATION OF TEXT-TO-SPEECH SYSTEMS
Systems usually follow either a system-modeling or a signal-modeling approach, are primarily rule-based or data-based, and can be distinguished by the type of their basic units and the way these are coded.
--------------------------------------------------------------------------------
CREDITS:
The following people delivered information and/or samples:
* Ulf Beckmann
* Patrick Chabane
* Bernhard Frötschl
* Robert Kachel
* Adrian Kurz
* Michael Lang
* Selinay Pachale
* Eric Röder
* Ali Savas
* Stefan Seide
* Bernhard Zeller
--------------------------------------------------------------------------------
CHANGELOG
* 2023/03/15: added iisys samples
* 2023/03/14: added Thorsten-voice samples
* 2022/08/18: added Espnet2 samples
* 2022/03/23: added Microsoft, IBM and Google samples
* 2020/12/14: added Microsoft Katja DNN samples
* 2020/7/20: added Pucher samples
* 2020/5/20: added SlowSoft
* 2018/10/9: added Microsoft Mobile Voices
* 2018/9/5: added Google's WaveNet samples
* 2018/4/24: added Google's basic samples
* 2018/4/24: added AEG Karlchen sample
* 2018/4/24: added AEG SPRAUS sample
* 2018/2/16: added Nuance DNN sample
* 2018/2/15: complete re-write in XML/XSLT, added Bose sample
* 2018/1/3: added new ReadSpeaker samples and NeoSpeech, both Hoya companies. Removed the German version of this page (too much work).
* 2017/12/13: added author's custom voice
* 2017/05/12: added Char2Wav
* 2017/03/1: added Acapela Claudia Smile sample
* 2017/02/15: added rSpeak Max samples
* 2016/10/28: added Nuance Vocalizer Victor example phrase samples
* 2016/10/13: added Nuance Vocalizer Victor sample
* 2016/05/09: linked second First Byte ProVoice sample
* 2016/04/20: updated Aristech Alex samples
* 2016/1/4: added Votrax full samples
* 2015/12/4: removed VoiceRSS because they used Microsoft TTS
* 2015/12/4: added Vocalizer (RealSpeak) Steffi newer version
* 2015/9/17: added Innoetics
* 2015/5/18: added Acapela voice Claudia
* 2015/5/9: added OnScreenVoices samples
* 2014/12/18: added H.W. Strube sample
* 2014/12/17: added Karlchen
* 2014/12/17: added Votrax, Von Kempelen, Küpfmüller, HHI, AEG Telefunken, OVE III and unknown Russian TTS
* 2014/11/27: added Voxygen
* 2014/9/25: removed Lumenvox
* 2014/1/29: added Acapela child voices Lea and Jonas
* 2014/1/29: removed broken links
* 2013/10/29: added Google
* 2013/09/18: added Simple4All
* 2013/09/5: added Lumenvox
* 2013/08/15: changed SpeechConcept to Aristech and updated samples
* 2013/01/28: added VoiceRSS samples
* 2013/01/28: introduced Amazon
* 2012/11/28: added Microsoft samples
* 2012/01/05: shifted identified samples to other section
* 2012/01/05: removed Apple because Nuance TTS is used
* 2012/01/05: added Acapela voice Andreas samples
* 2012/01/03: added SyRUB samples
* 2012/01/03: added SpeechConcept Leopold voice samples
* 2011/09/08: added unknown section.
* 2011/09/05: added SpeechConcept corporate voice samples. Noted acquisition of Svox and Loquendo by Nuance. Removed product comparison table (too much work to keep up-to-date).
* 2011/02/28: added Acapela HQTTS Andreas sample.
* 2011/02/22: added Mary Pavoque samples.
* 2011/02/21: added Ivona samples.
* 2011/02/16: added iPhone samples.
* 2010/10/8: added Vocalizer Anna samples.
* 2010/01/06: added Vocalizer Yannick embedded samples
* 2009/09/17: added BrightSpeech Julia samples
* 2009/09/17: added SpeechConcept samples
* 2009/09/17: added Orpheus from Meridian
* 2009/01/15: added product comparison table
* 2008/09/23: added Berkom speech sample
* 2008/09/15: added ultra fast speech sample
* 2008/04/14: added Acapela greeting bunny sample
* 2007/11/07: added ReadSpeaker sample
* 2007/10/15: added eSpeak and Vocal Tract Lab
* 2007/09/28: added Mary BITS samples
* 2007/07/14: added Texas Instruments Language Translator
* 2006/11/29: added new voice (Klaus) for BrightSpeech
* 2006/8/21: added updated Loquendo samples
* 2006/6/19: added Vocalizer Yannick samples.
* 2005/11/3: added Vocalizer sample.
* 2005/10/11: added second LAIP-TTS sample.
* 2005/08/29: added SAMT sample.
* 2005/07/01: added ETeX and Interlinx service descriptions.
* 2005/06/23: added SVOX unit-selection samples, courtesy of SVOX.
* 2005/03/31: added RealSpeak fun sample
* 2005/02/22: added Logox accent samples
* 2005/01/04: added MBROLA de-X samples
* 2004/12/21: added Scansoft Steffi samples
* October 29th 2004: added Chatr female sample
* October 25th 2004: added Binsteiner sample
* September 23rd 2004: added Univ. Stuttgart unit-selection sample
* August 11th 2004: added more s3 samples
* August 10th 2004: added BOSS s2 sample
* August 9th 2004: added RVoice F026 samples
* July 7th 2004: added IBM CTTS female sample
* July 4th 2004: added ARTIC synthesizer
* June 24th 2004: added sentence 3 for selected commercial engines
* June 22nd 2004: added Multivox 5 and old Dresden formant synthesizers
* June 11th 2004: added VoiceINTERConnect
* June 10th 2004: added languages for commercial engines
* June 10th 2004: added Erkan and Fifi voices for Atip's Proser
--------------------------------------------------------------------------------
Speech synthesis demos with simulated emotion