Articulatory synthesis


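Articulatory speech synthesis: synthesized speech and vocal tract model

The German sentence "Lea und Doreen mögen Bananen" ("Lea and Doreen like bananas"), reproduced with a consonant-plus-vowel articulatory coarticulation model. The fundamental frequency and segment durations are taken from a natural utterance.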

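Articulatory synthesis, also called articulatory speech synthesis, refers to computational techniques for synthesizing speech from models of the human vocal tract and of the articulation processes that occur in it. The shape of the vocal tract can usually be controlled in a number of ways, most of which involve changing the positions of articulators such as the tongue, jaw, and lips. Speech is produced by digitally simulating the flow of air through this representation of the vocal tract.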
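Even a very crude tube model predicts vowel-like resonances. A lossless uniform tube that is closed at one end (the glottis) and open at the other (the lips) resonates at odd multiples of c/(4L). The sketch below makes several assumptions. It takes c ≈ 350 m/s and uses a 17.5 cm tube, a common textbook approximation of an adult male vocal tract producing a neutral vowel. It also ignores losses, radiation, and the non-uniform tract shape that real articulatory synthesizers model.

```python
# Assumed speed of sound in warm, humid air (m/s); an illustrative value.
SPEED_OF_SOUND = 350.0


def uniform_tube_formants(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances (Hz) of a lossless uniform tube closed at one end and open
    at the other: F_n = (2n - 1) * c / (4 * L), for n = 1..count."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # 17.5 cm: a textbook stand-in for an adult male vocal tract (assumption).
    print(uniform_tube_formants(0.175))
```

For this tube length the formula gives resonances of about 500, 1500, and 2500 Hz. These values are close to the formants of a neutral, schwa-like vowel.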

Mechanical talking heads

See also: Speech synthesis#History

There is a long history of attempts to build mechanical "talking heads". Gerbert of Aurillac (d. 1003), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294) are all said to have built speaking heads (Wheatstone 1837). However, historically confirmed speech synthesis begins with Christian Kratzenstein (1723–1795) and Wolfgang von Kempelen (1734–1804). Kempelen published an account of his work in 1791 (see also Dudley & Tarnoczy (1950)).

Electrical vocal tracts

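The first electrical analog vocal tracts were static, like those of Dunn (1950), Stevens, Kasowski & Fant (1953), and Fant (1960). Rosen (1958) built a dynamic vocal tract (DAVO), which Dennis (1963) later attempted to control by computer. Dennis et al. (1964), Hiki et al. (1968), and Baxter & Strong (1969) have also described analog vocal-tract hardware.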

The first computer simulation was carried out by Kelly & Lochbaum (1962). Later digital computer simulations include those of Nakata & Mitsuoka (1965), Matsui (1968), and Mermelstein (1971). Honda, Inoue & Ogawa (1968) performed the simulation on an analog computer.
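Simulations in the Kelly & Lochbaum tradition are usually described as a chain of short cylindrical sections. Travelling pressure waves are partly reflected wherever the cross-sectional area changes. The following is a minimal textbook-style sketch of that scattering idea, not a reconstruction of the original 1962 program. It assumes lossless sections, one sample of delay per section in each direction, and constant reflection factors at the glottis and lips. The parameters `r_glottis` and `r_lips` and their default values are illustrative assumptions.

```python
def reflection_coefficients(areas):
    """Pressure reflection coefficient at each junction between adjacent
    sections: k_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def kelly_lochbaum(areas, excitation, r_glottis=0.99, r_lips=-0.9):
    """Propagate an excitation signal through a chain of tube sections.

    right[i] is the right-going wave arriving at the right end of section i;
    left[i] is the left-going wave arriving at its left end. Each section
    delays a wave by one sample. Returns the pressure radiated at the lips.
    """
    n = len(areas)
    k = reflection_coefficients(areas)
    right = [0.0] * n
    left = [0.0] * n
    output = []
    for x in excitation:
        # Radiated pressure: the part of the wave transmitted past the lips.
        output.append((1.0 + r_lips) * right[-1])
        new_right = [0.0] * n
        new_left = [0.0] * n
        # Glottis end: source input plus the reflected left-going wave.
        new_right[0] = x + r_glottis * left[0]
        # Scattering at every interior junction.
        for i in range(n - 1):
            new_right[i + 1] = (1.0 + k[i]) * right[i] - k[i] * left[i + 1]
            new_left[i] = k[i] * right[i] + (1.0 - k[i]) * left[i + 1]
        # Lip end: most of the wave is reflected back with inverted sign.
        new_left[-1] = r_lips * right[-1]
        right, left = new_right, new_left
    return output
```

With a sampling rate of 44.1 kHz and c ≈ 350 m/s, each section is about 7.9 mm long, so 22 sections approximate a 17.5 cm tract. For a uniform tube, an impulse then returns as echoes that alternate in sign every 2 × 22 = 44 samples. The period is therefore 88 samples, which gives a first resonance near 44100 / 88 ≈ 501 Hz. Changing the entries of `areas` changes the reflections and therefore the resonances, which is how such a model produces different vowels.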

The Haskins and Maeda models

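The first software articulatory synthesizer regularly used in laboratory experiments was developed at Haskins Laboratories in the mid-1970s by Philip Rubin, Tom Baer, and Paul Mermelstein. Known as ASY (Articulatory Synthesis), it was a computational model of speech production. It was based on vocal tract models developed at Bell Laboratories in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues. Another well-known and frequently used model is that of Shinji Maeda, which controls the shape of the tongue with a factor-based approach.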
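In a factor-based model of this kind, a vocal tract contour is written as a mean shape plus a weighted sum of a few basis shapes (factors). Each weight then acts as an articulatory control parameter. The sketch below illustrates only that linear structure. The contour values, the two factors, and their names (`tongue_body`, `jaw`) are hypothetical toy data, not Maeda's published factors or parameter set.

```python
from typing import Sequence


def factor_contour(mean: Sequence[float],
                   factors: Sequence[Sequence[float]],
                   params: Sequence[float]) -> list[float]:
    """Return mean + sum_j params[j] * factors[j], element by element."""
    if len(factors) != len(params):
        raise ValueError("need exactly one parameter per factor")
    contour = list(mean)
    for weight, factor in zip(params, factors):
        if len(factor) != len(mean):
            raise ValueError("each factor must match the mean contour length")
        for i, value in enumerate(factor):
            contour[i] += weight * value
    return contour


if __name__ == "__main__":
    # Hypothetical toy data: a contour sampled at 4 grid lines, 2 factors.
    mean = [1.0, 1.5, 2.0, 1.2]
    tongue_body = [0.2, 0.1, -0.1, -0.2]
    jaw = [0.1, 0.1, 0.1, 0.1]
    print(factor_contour(mean, [tongue_body, jaw], [1.0, -0.5]))
```

Because the mapping is linear, a handful of parameters can describe many tract shapes. The factors could, for example, be derived from measured articulatory data with a statistical decomposition.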

Modern models

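Recent advances in speech production imaging, articulatory control modeling, and biomechanical modeling of the tongue are changing the way articulatory synthesis is performed. One example is the Haskins CASY model (Configurable Articulatory Synthesis), designed by Philip Rubin, Mark Tiede, and Louis Goldstein. It matches midsagittal vocal tract profiles to actual magnetic resonance imaging (MRI) data and uses the MRI data to build a three-dimensional model of the vocal tract. A full 3D articulatory synthesis model has been described by Olov Engwall. A geometrically based 3D articulatory speech synthesizer has been developed by Peter Birkholz (see VocalTractLab). The ArtiSynth project, led by Sidney Fels at the University of British Columbia, provides a 3D biomechanical modeling toolkit for the human vocal tract and upper airway. Biomechanical modeling of articulators such as the tongue has been pioneered by many scientists, including Reiner Wilhelms-Tricarico, Yohan Payan and Jean-Michel Gerard, and Jianwu Dang and Kiyoshi Honda.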

Commercial models

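One of the few commercial articulatory speech synthesis systems is a NeXT-based system. It was developed and marketed by Trillium Sound Research, a spin-off company of the University of Calgary in Canada, where much of the original research was conducted. NeXT was founded by Steve Jobs in 1985 and merged with Apple Computer in 1997. After the various incarnations of NeXT disappeared, the Trillium software was released under the GNU General Public License, and work continues as Gnuspeech. First marketed in 1994, the system provides fully articulatory text-to-speech conversion. It uses a waveguide, or transmission-line analog, of the human oral and nasal tracts (the Tube Resonance Model, TRM), controlled by René Carré's Distinctive Region Model (DRM).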

References

Baxter, Brent; Strong, William J. (1969). “WINDBAG—a vocal-tract analog speech synthesizer”. Journal of the Acoustical Society of America 45: 309(A). doi:10.1121/1.1971456.
Birkholz, P.; Jackel, D.; Kröger, B. J. (2007). “Simulation of losses due to turbulence in the time-varying vocal system”. IEEE Transactions on Audio, Speech, and Language Processing 15: 1218–1225.
Birkholz, P.; Jackel, D.; Kröger, B. J. (2006). “Construction and control of a three-dimensional vocal tract model”. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France: 873–876.
Coker, C. H. (1968). “Speech synthesis with a parametric articulatory model”. Proc. Speech Symp., Kyoto, Japan, paper A-4.
Coker, C. H. (1976). “A model for articulatory dynamics and control”. Proceedings of the IEEE 64 (4): 452–460. doi:10.1109/PROC.1976.10154.
Coker, C. H.; Fujimura, O. (1966). “Model for the specification of the vocal tract area function”. Journal of the Acoustical Society of America 40: 1271. doi:10.1121/1.2143456.
Dennis, Jack B. (1963). “Computer control of an analog vocal tract”. Journal of the Acoustical Society of America 35: 1115(A).
Dudley, Homer; Tarnoczy, Thomas H. (1950). “The speaking machine of Wolfgang von Kempelen”. Journal of the Acoustical Society of America 22 (2): 151–66. doi:10.1121/1.1906583.
Dunn, Hugh K. (1950). “Calculation of vowel resonances, and an electrical vocal tract”. Journal of the Acoustical Society of America 22 (6): 740–53. doi:10.1121/1.1906681.
Engwall, O. (2003). “Combining MRI, EMA & EPG measurements in a three-dimensional tongue model”. Speech Communication 41: 303–329. doi:10.1016/S0167-6393(02)00132-2.
Fant, C. Gunnar M. (1960). Acoustic Theory of Speech Production. The Hague: Mouton.
Fant, Gunnar (1970). Acoustic Theory of Speech Production: With Calculations Based on X-ray Studies of Russian Articulations. Mouton/Walter de Gruyter. ISBN 9789027916006.
Gariel, M. (1879). “Machine parlante de M. Faber”. J. Physique Théorique et Appliquée 8: 274–5. doi:10.1051/jphystap:018790080027401.
Gerard, J. M.; Wilhelms-Tricarico, R.; Perrier, P.; Payan, Y. (2003). “A 3D dynamical biomechanical tongue model to study speech motor control”. Recent Research Developments in Biomechanics 1: 49–64.
Henke, W. L. (1966). “Dynamic Articulatory Model of Speech Production Using Computer Simulation”. Unpublished doctoral dissertation, MIT, Cambridge, MA.
Honda, T.; Inoue, S.; Ogawa, Y. (1968). “A hybrid control system of a human vocal tract simulator”. In Kohasi, Y. (ed.), Reports of the 6th International Congress on Acoustics. Tokyo: International Council of Scientific Unions: 175–8.
Kelly, John L.; Lochbaum, Carol (1962). “Speech synthesis”. Proceedings of the Speech Communications Seminar, paper F7. Stockholm: Speech Transmission Laboratory, Royal Institute of Technology.
Kempelen, Wolfgang R. von (1791). Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine. Wien: J. B. Degen.
Maeda, S. (1988). “Improved articulatory models”. Journal of the Acoustical Society of America 84 (Suppl. 1): S146. doi:10.1121/1.2025845.
Maeda, S. (1990). “Compensatory articulation during speech: evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model”. In Hardcastle, W. J.; Marchal, A. (eds.), Speech Production and Speech Modelling. Dordrecht: Kluwer Academic: 131–149.
Matsui, E. (1968). “Computer-simulated vocal organs”. In Kohasi, Y. (ed.), Reports of the 6th International Congress on Acoustics. Tokyo: International Council of Scientific Unions: 151–4.
Mermelstein, Paul (1969). “Computer simulation of articulatory activity in speech production”. In Walker, D. E. (ed.), Proceedings of the International Joint Conference on Artificial Intelligence, Washington, D.C., 1969. New York: Gordon & Breach.
Mermelstein, P. (1973). “Articulatory model for the study of speech production”. Journal of the Acoustical Society of America 53 (4): 1070–1082. doi:10.1121/1.1913427. PMID 4697807.
Mrayati, M.; Carré, R.; Guérin, B. (1988). “Distinctive regions and modes: a new theory of speech production”. Speech Communication 7 (3): 257–286. doi:10.1016/0167-6393(88)90073-8.
Mrayati, M.; Carré, R.; Guérin, B. (1990). “Distinctive regions and modes: articulatory-acoustic-phonetic aspects: A reply to Boë and Perrier's comments”. Speech Communication 9 (3): 231–238. doi:10.1016/0167-6393(90)90059-I.
Nakata, K.; Mitsuoka, T. (1965). “Phonemic transformation and control aspects of synthesis of connected speech”. J. Radio Res. Labs. 12: 171–86.
Paget, R. (1930). Human Speech. New York: Harcourt.
Rahim, M.; Goodyear, C.; Kleijn, W.; Schroeter, J.; Sondhi, M. (1993). “On the use of neural networks in articulatory speech synthesis”. Journal of the Acoustical Society of America 93 (2): 1109–1121. doi:10.1121/1.405559.
Rosen, George (1958). “Dynamic analog speech synthesizer”. Journal of the Acoustical Society of America 30 (3): 201–9. doi:10.1121/1.1909541.
Rubin, P. E.; Baer, T.; Mermelstein, P. (1981). “An articulatory synthesizer for perceptual research”. Journal of the Acoustical Society of America 70 (2): 321–328. doi:10.1121/1.386780.
Rubin, P.; Saltzman, E.; Goldstein, L.; McGowan, R.; Tiede, M.; Browman, C. (1996). “CASY and extensions to the task-dynamic model”. Proceedings of the 1st ESCA Tutorial and Research Workshop on Speech Production Modeling - 4th Speech Production Seminar: 125–128.
Stevens, Kenneth N.; Kasowski, S.; Fant, C. Gunnar M. (1953). “An electrical analog of the vocal tract”. Journal of the Acoustical Society of America 25 (4): 734–42. doi:10.1121/1.1907169.

External links

From MRI and Acoustic Data to Articulatory Synthesis
Praat: doing phonetics by computer

“Smithsonian Speech Synthesis History Project (SSSHP) 1986-2002”. Archived from the original on 3 October 2013. Retrieved 28 May 2014.

Introduction to Articulatory Speech Synthesis
Simulated singing with the singing robot Pavarobotti or a description from the BBC on how the robot synthesized the singing.

