Foreign-Literature Translation: The Contribution of Parsing to Prosodic Phrasing in an Experimental Text-to-Speech System

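Foreign Literature Translation (Chinese Version)

THE CONTRIBUTION OF PARSING TO PROSODIC PHRASING IN AN EXPERIMENTAL TEXT-TO-SPEECH SYSTEM

INTRODUCTION

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test two hypotheses: (i) that information available in the syntax tree, in particular grammatical functions such as subject-predicate and head-complement, is by itself useful in determining prosodic phrasing for synthetic speech, and (ii) that a syntactic parser that specifies grammatical functions can be used to determine prosodic phrasing for synthetic speech.

Although certain connections between syntax and prosody are well known (e.g. the influence of part of speech on stress in words like progress, or the setting off of parenthetical expressions), very little practical knowledge is available on which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g. Cooper and Paccia-Cooper 1980; Umeda 1982; Gee and Grosjean 1983) but, with the exception of Selkirk (1984), they have tended to ignore the representation of grammatical functions in the syntax tree. Moreover, previous work has not been explicit enough to provide the basis for a full system implementation. On the basis of our study of prosodic phrasing in recorded human speech, we decided to emphasize three aspects of structure that relate to phrasing: syntactic constituency, grammatical function, and constituent length.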
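The results of these studies, which we discuss in detail, have been implemented as a collection of prosody rules in an experimental text-to-speech system.

Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The left-corner search strategy of this parser and, in particular, its determinism give Fidditch the speed that makes online text-to-speech production feasible. In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative clauses, adverbials, and other non-argument constituents have no specified position in the tree and no specified semantic role. Second, the rules of the prosody system build a prosody tree by referring to the syntactic structure and to structure built at earlier stages. The result is a hierarchical representation that supports the view, also proposed in Selkirk (1984), that grammatical function information is related to prosodic phrasing, but indirectly, through different levels of processing.

Informal tests of the system show that it is capable of producing a significant improvement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's problems, which we describe, have not revealed any serious counterexample to our basic approach. In many cases, it appears that problems with the current version can be resolved by taking our approach a step further and including the lexical information required by the parser as another factor in the determination of prosodic phrasing.

TEXT-TO-SPEECH

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription; this information may also be supplemented by a dictionary that provides information about the part of speech, stress pattern and phonetic makeup of particular words. The speech synthesizer then converts this phonetic transcription into a series of speech parameters, which are subsequently processed to produce digitized speech. While these systems tend to perform quite well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. Current text-to-speech systems have no access to the syntactic and semantic properties of a sentence that influence phrase-level prosody. Hence rules for sentence prosody, when they are provided at all, typically depend on superficial aspects of text (e.g. punctuation) and on heuristics that vary widely in sophistication. Although such techniques often add a more natural quality to the resulting synthetic speech, they can fail in important ways, for example by omitting a prosodic event between a lengthy subject and its predicate, or by failing to mark salient words and the prosodic boundaries between them clearly.

Several authors (e.g. Allen 1976; Elovitz et al. 1976; Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary, unaddressed factor leading to difficulties in the comprehension of fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) are applicable.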
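We currently have reasonable tools for the automatic syntactic analysis of text, but nothing equally well developed for semantic or pragmatic analysis. An obvious goal, therefore, is to explore the extent to which phrase-level prosody can be explained by the syntax tree, and to develop a detailed description of this relation. A further goal is to turn the insights gained from this relation into a system that works with a speech synthesizer. This allows us to test our description more fully, and may also advance text-to-speech technology.

SYNTACTIC STRUCTURE AND PROSODIC PHRASING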

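Beyond the word level, however, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982) and Gee and Grosjean (1983) and the prosodic theory of Selkirk (1984) are among the more notable studies, and they represent the two main approaches to syntax/prosody relations. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic phrasing is not mediated by any filtering process: they propose that specific prosodic phrasing can be derived directly from syntactic structure by associating particular syntactic nodes (or constituent boundaries) with phonetic values, namely pausing, segmental lengthening, or the blocking of cross-word phonological rules. In contrast, Gee and Grosjean (1983) and Selkirk (1984) hold that the relation of syntax to prosody is indirect: prosodic phrasing is derived by rules that refer to left-to-right order, length (or branching pattern) and, in Selkirk's case, grammatical function, as well as constituent membership, in order to infer a hierarchical prosodic structure. Although the respective positions are quite clear, none of these studies is conclusive. All lack a syntactic framework that is detailed and formal enough to permit extensive testing, and most consider only a small number of sentences and sentence types.

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classed according to their syntactic context and semantic function.

Text-to-speech Synthesis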

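The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take character text as input and produce digitized speech output. By annotating the input text to this system, many aspects of its operation can be overridden or modified: e.g. the location of major and minor phrase boundaries, the stress given to words, the transcription of words and the boundaries between them, the timing of segments, and details of the pitch contour. As we will show, with our prosody system we are able to produce strings in which four boundary levels are identified and perceptually distinguished, using the current text-to-speech system annotations.

Prosodic Phrasing

The prosody rules use information about constituent structure, grammatical role, and length to map a surface structure tree onto a prosody tree. The prosody tree identifies the location of phrase boundaries (signified by the nodes) and the relative strength of each boundary (signified by a number in the node). It is this information that is used to annotate the input text with escape sequences that provide the text-to-speech system with instructions about prosodic phrasing.

In formulating our rules for building the prosodic structure, we began with the idea of simply implementing the model of Gee and Grosjean (1983). This model, initially proposed to predict a form of psychological data describing subjective sentence structure known as performance structure, determines prosodic boundaries from a syntactic tree, but assumes rather than explicitly presents a syntactic component.

We were initially attracted to the Gee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on the determination of the strength of a given boundary with respect to the other boundaries in the sentence. We found that in the data we had collected, this weighting played an important role. In fact, we incorporated directly into our system one method of doing this weighting, namely Gee and Grosjean's rule to determine the strengths of the prosodic phrase boundaries around a verb phrase, using relative length (measured as the number of terminal nodes).

As we extended the Gee and Grosjean model to create an algorithm adequate for use in a general purpose system, our algorithm diverged from its starting point, reflecting our attempts to correct weaknesses and lacunae that we encountered in the Gee and Grosjean model. That we encountered these problems is not surprising given the difference between our goals and those of Gee and Grosjean.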
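
The idea of relative boundary weighting can be made concrete with a small sketch. This is not the authors' rule set, nor Gee and Grosjean's actual rule: it assumes a hypothetical `Constituent` record, a hypothetical strength scale from 1 (weakest) to 4 (strongest), and one simple length-based heuristic in which the verb groups with its shorter neighbour, so that the stronger boundary falls next to the longer one. The `|n|` markers stand in for the real escape sequences, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    """A phrase and its words; len(words) stands in for the terminal-node count."""
    label: str
    words: list


def boundaries_around_verb(subject, verb, complement):
    """Return (strength before the verb, strength after the verb).

    Illustrative heuristic only: the verb groups with its shorter
    neighbour, so the stronger boundary separates it from the longer one.
    """
    strong, weak = 3, 1  # hypothetical values on a 1..4 scale
    if len(subject.words) >= len(complement.words):
        return strong, weak
    return weak, strong


def annotate(subject, verb, complement):
    """Render the sentence with |n| markers at the two boundaries."""
    before, after = boundaries_around_verb(subject, verb, complement)
    return " ".join(
        subject.words + [f"|{before}|"] + verb.words
        + [f"|{after}|"] + complement.words
    )
```

With a four-word subject and a two-word complement, `annotate` places the stronger marker between subject and verb; swapping the lengths moves it to the other side of the verb.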

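The most important difference between the Gee and Grosjean model and our current algorithm involves the factors determining boundary weight. Gee and Grosjean assume that this weighting depends only on the number of syntactic nodes, their left-to-right order and, in the case of the verb phrase, constituent length. In contrast, our data, in agreement with the theoretical analysis of Selkirk (1984), indicate that boundary strength depends on the grammatical function that a constituent plays in a given sentence. In particular, we observe differences in the strength of the boundaries between these functions, as discussed below. Our adjunction rules are derived for the most part from Selkirk's account. We have also made use of the idea, which Gee and Grosjean (1983) take largely from the work of Selkirk, that certain syntactic heads mark off phonological phrase boundaries and provide the basic prosodic constituents for higher level analysis. Our prosody rules run in four independent stages. Each stage builds on the previous stage, so that the rules can refer to both syntactic and prosodic structure as they build successively higher levels of prosodic structure.

CONCLUSIONS

We have described an on-line experimental system that uses prosody rules to infer prosodic phrasing from constituent structure, grammatical functions, and length considerations. The system contains three modules: a deterministic parser, a set of prosodic phrasing rules, and an algorithm to convert the output of the prosodic phrasing rules into signals for the Bell Labs text-to-speech system.

A Unit Selection-based Speech Synthesis Approach for Chinese Mandarin Text-to-Speech

1 Introduction

A text-to-speech system converts free text into speech. It is a process that reads text aloud for people. Text-to-speech systems have a wide range of applications.

A typical text-to-speech system consists of three main parts: text analysis, prosody generation and speech synthesis. The text analysis part interprets the text and determines the pronunciation of each sentence. The prosody generation part generates parameters that control the variability of the speech. The speech synthesis part generates the speech utterance according to the pronunciation and prosody requirements.

In the past two decades, many approaches have been used to synthesize speech. The main approaches fall into two categories: rule-based formant synthesis and concatenation synthesis. Formant synthesis generates speech using a set of rules. The rules are usually derived from a long process of experiments. This approach needs little computer memory, but the speech quality is limited by the approach itself. Concatenation synthesis, by contrast, uses pre-recorded speech units as templates. During synthesis, the units are modified using signal processing techniques and then concatenated to form an utterance. This approach usually needs more memory, but the speech quality is correspondingly better. However, as technology develops, people are no longer satisfied with the machine-like speech that such signal processing methods produce.

Conventional concatenation synthesis works by keeping a small unit inventory in the system. During synthesis, a unit is selected and then modified using signal processing techniques according to prosody features. This method can generate speech of relatively high quality; however, the synthetic speech is more or less distorted by the signal processing. A simple way of generating good speech is to store a large quantity of human speech segments in a database and, at synthesis time, concatenate all the needed speech segments without any modification. Of course, the longer the stored segments selected for concatenation, the more natural the generated speech. As each speech unit may have many variants in different contexts or prosodic situations, this approach needs a large memory to store a large number of speech segments. The approach was not practical some years ago because of limits on computing power and memory. With the development of hardware, using a large speech corpus as a source of units for direct concatenation has become possible. Unit selection-based speech synthesis (or corpus-based synthesis) has been applied to English and other languages for some years. Some attempts (Liu and Wang, 1998; Chu et al., 2001; Wang et al., 2000; Li et al., 2001) have been made at Chinese TTS using the unit selection approach. Wu et al. (2001) also proposed a scheme that selects the phonetically and linguistically best units and then applies prosodic modification. However, all the proposed approaches have limitations in applying proper prosody. Without proper prosody consideration, the quality of the generated speech may sometimes be poor. This paper is concerned with how to apply prosody in unit selection-based synthesis.

2 Unit Selection Model

A unit selection model has a well-organized unit database. The database contains speech units from a large corpus, which is carefully designed to have a large coverage of all phonetic and prosodic variants of each unit. In the database, each speech unit has a number of possible variants, which are suitable for different phonetic and prosodic environments. The large speech corpus is analyzed offline and all the calculated features are stored in a unit database. In the database, each unit instance is described by a feature vector. Each feature may take discrete or continuous values. The features cover both the unit itself and the unit's environment. The features of the unit itself are used to select the correct unit, one that meets the segmental requirement, while the environment features are used to select the best context-dependent unit, which can reduce the discontinuity between selected units. Corpus-based synthesis is in effect a concatenative pattern matching process. In synthesis, the task is to select the units that best match the target units in pronunciation and prosody. At the same time, the discontinuity between selected units should be as small as possible. To meet these requirements, two costs are defined for synthesis. One is the unit cost, which describes how close a selected unit is to the desired unit. The other is the connection cost, which describes the degree of continuity between selected units. The total cost is the weighted sum of the two costs.

3 Unit Selection Process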
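
One way to read the two costs defined in Section 2, together with the lattice search this section describes, is as a shortest-path problem. The sketch below is an assumption-laden illustration rather than the paper's implementation: the feature names `pitch` and `duration`, both cost definitions, and the weights `w_unit` and `w_conn` are hypothetical, and the dynamic-programming (Viterbi-style) search is a standard technique that the text does not name.

```python
def unit_cost(target, cand):
    """Hypothetical unit cost: prosodic distance between target and candidate."""
    return (abs(target["pitch"] - cand["pitch"])
            + abs(target["duration"] - cand["duration"]))


def connection_cost(left, right):
    """Hypothetical connection cost: size of the pitch jump at the join."""
    return abs(left["pitch"] - right["pitch"])


def select_units(targets, candidates, w_unit=1.0, w_conn=1.0):
    """Return (candidate indices, total cost) of the cheapest lattice path.

    candidates[i] is the non-empty list of candidate units for targets[i].
    The total cost is the weighted sum of unit and connection costs.
    """
    # best[i][j] = (cost of the cheapest path ending at candidate j, backpointer)
    best = [[(w_unit * unit_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for cand in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + w_conn * connection_cost(p, cand), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + w_unit * unit_cost(targets[i], cand), prev_j))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(j)
        j = best[i][j][1]
    return path[::-1], total
```

Raising `w_conn` relative to `w_unit` makes the search prefer smooth joins over close matches to the target prosody, which is the trade-off the weighted sum is meant to expose.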

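The speech synthesis process accepts information from the prosody generation part and searches the speech unit database to find a proper unit for every target speech unit. The unit selection process is illustrated in Figure 1. In the figure, the target sentence is “今天很熱” (it is very hot today), which consists of 4 syllables. Each syllable has a set of candidate units. The thick line and thick-edged boxes indicate the selected unit sequence. In the unit selection process, to get the best speech, we have to consider (1) whether a candidate unit is appropriate when compared with the target unit, and (2) the smoothness of the connection between the selected units. The selection process is therefore a search for the best path among all possible paths in the connection lattice. The search is guided by a cost function, which describes the appropriateness of a unit and the smoothness between two units.

4 Corpus

As mentioned earlier, a large speech corpus is used in unit selection-based synthesis. The speech corpus consists of a large collection of utterances. The units for synthesis are extracted from the corpus. Ideally, the corpus covers context-dependent units and prosodic variants as fully as possible. However, it is usually impossible to build a very large speech corpus with complete coverage of unit variants. As constructing a large, high-quality corpus is very expensive, a balance is usually struck between coverage and size.

In this research, we built a corpus of around 38,000 syllables. The script of this speech corpus was selected from a large text corpus (around 300M Chinese characters). The corpus is designed to cover the frequently used context-independent and context-dependent syllables as fully as possible. We use the PKU People's Daily text corpus as a reference for real-world text to evaluate the script of the corpus. We calculated that the built corpus covers 99.8% of syllable occurrences in the PKU corpus.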
When unit contexts are grouped by initial and final classes (we define 11 initial classes and 10 final classes), the corpus covers 76.8% of the class-based units that occur in the PKU text corpus. With this coverage, we consider the corpus suitable for unit selection-based synthesis.

Foreign Literature (English Original)

THE CONTRIBUTION OF PARSING TO PROSODIC PHRASING IN AN EXPERIMENTAL TEXT-TO-SPEECH SYSTEM

INTRODUCTION

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test the hypotheses (i) that information available in the syntax tree, in particular grammatical functions such as subject ...

Although certain connections between syntax and prosody are well known (e.g. the influence of part of speech on stress in words like progress, or the setting off of parenthetical expressions), very little practical knowledge is available on which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g. Cooper and Paccia-Cooper 1980; Umeda 1982; Gee and Grosjean 1983) but, with the exception of ...

Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The left-corner search strategy of this parser and, in particular, its determinism give Fidditch the speed that makes online text-to-speech production feasible. In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative ...

Informal tests of the system show that it is capable of producing a significant improvement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's problems, which we describe, have not revealed any serious counterexample to our basic approach. In many cases, it appears that problems with the current version can be resolved by taking our approach a step further, and including lexical information required by the parser as another factor in the determination ...

TEXT-TO-SPEECH

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription; this information may also be supplemented by a dictionary that provides information about the part of speech, stress pattern and phonetic makeup of particular words. The speech synthesizer then converts this phonetic transcription into a series of speech parameters which are subsequently processed to produce digitized speech.

While these systems tend to perform quite well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. Current text-to-speech systems have no access to the syntactic and semantic properties of a sentence that influence phrase-level prosody. Hence rules for sentence prosody, when they are provided at all, typically depend on superficial aspects of text (e.g. punctuation) and on heuristics that vary widely in sophistication. Although such techniques often ...

Several authors (e.g. Allen 1976; Elovitz et al. 1976; Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary, unaddressed factor leading to difficulties in the comprehension of fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) are applicable. We currently have ...

SYNTACTIC STRUCTURE AND PROSODIC PHRASING

Beyond the word level, however, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982) and Gee and Grosjean (1983) and the prosodic theory of Selkirk (1984) are among the more notable studies and represent the two main approaches to syntax/prosody relations. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic ...

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classed according to their syntactic context and semantic function.

Text-to-speech Synthesis

The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take character text as input and produce digitized speech output. By annotating the input text to this system, many aspects of its operation can be overridden or modified: e.g. the location of major and minor phrase boundaries, the stress given to words, the transcription of words and the boundaries between them, the timing of segments, and details of the pitch contour. As we will show, with our prosody system we are able to produce strings in which four boundary levels are identified and perceptually distinguished, using the current text-to-speech system annotations.

Prosodic Phrasing

The prosody rules use information about constituent structure, grammatical role, and length to map a surface structure tree onto a prosody tree. The prosody tree identifies the location of phrase boundaries (signified by the nodes) and the relative strength of each boundary (signified by a number in the node).
59、building the prosodic structure, we began with the idea of simply implementing the model of Gee and Grosjean (1983). This model, initially proposed to predict a form of psychological data describing subjective sentence s

60、tructure known as performance structure, determines prosodic boundaries from a syntactic tree, but assumes rather than explicitly presents a syntactic component.</p><p>  We were initially attracted to the G

61、ee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on the determination of the strength of a given boundary with respect to the other boundaries in the sentence. We found that in the data

62、 we had collected, this weighting played an important role. In fact, we incorporated directly into our system one method of doing this weighting, namely Gee and Grosjean's rule to determine the strengths of the proso

63、dic phrase boundaries around</p><p>  The most important difference between the Gee create an algorithm adequate for use in a general purpose system, our algorithm diverged from its starting point, reflectin

64、g our attempts to correct weaknesses and lacunae that we encountered in the Gee and Grosjean model. That we encountered these problems is not surprising given the difference between our goals and those of Gee and Grosjea

65、n. and Grosjean model and our current algorithm involves the factors determining boundary weight. Gee and Grosj</p><p>  Our adjunction rules are derived for the most part from Selkirk's account. We have

66、 also made use of the idea, which Gee and Grosjean (1983) take largely from the work of Selkirk, that certain syntactic heads mark off phonological phrase boundaries, and provide the basic prosodic constituents for highe

67、r level analysis. </p><p>  Our prosody rules run in four independent stages. Each stage builds on the previous stage, so that the rules can refer to both syntactic and prosodic structure as they build succe

68、ssively higher levels of prosodic structure.</p><p>  CONCLUSIONS </p><p>  We have described an on-line experimental system that uses prosody rules to infer prosodic phrasing from constituent s

69、tructure, grammatical functions, and length considerations. The system contains three modules: a deterministic parser, a set of prosodic phrasing rules, and an algorithm to convert the output of the prosodic phrasing rul

70、es into signals for the Bell Labs text-to-speech system.</p><p>  A Unit Selection-based Speech Synthesis Approach for Chinese Mandarin Text-to-Speech</p><p>  1 Introduction </p><p&g

71、t;  Text-to-Speech system is a system that converts free text into speech. This is a process that reads out the text for people. There is a wide range of applications for text-to-speech system. </p><p>  A t

72、ypical text-to-speech system consists of three main parts, which are text analysis, prosody generation and speech synthesis. The text analysis part understands the text and determines the sound of each sentence. The pros

73、ody generation part generates some parameters that control the variability of the speech. The speech synthesis part generates the speech utterance based on the pronunciation and prosody requirement.</p><p> 

74、 In the past decades, many approaches have been used to synthesize speech. The main approaches can be classified into two main categories, i.e. rule-based formant synthesis and concatenation synthesis. Formant synthesis

75、generates speech using a set of rules. The rules are usually derived from a long process of experiments. This approach needs small computer memory. But the speech quality is limited by the approach itself. Concatenation

76、synthesis, however, uses some pre-recorded speech units as te</p><p>  Normal concatenation synthesis works by keeping a small unit inventory in system. During synthesis, a unit is selected and then modified

77、 using signal processing techniques according to prosody features. Synthesis by this way can generate speech with relatively high quality. However, the synthetic speech is more or less distorted due to the signal process

78、ing process.</p><p>  A simple idea of generating good speech is to store large quantities of speech segments of human speech in a database and, when generating, concatenate all the needed speech segments to

79、gether without any modification. Of course the longer the stored segments selected for the concatenation, the more natural the generated speech. As each speech unit may have many variants in different contexts or prosodi

80、c </p><p>  situations, this approach needs a large memory to store a large number of speech segments. The approach was not practical some years ago because of the limitation of computer power and memory. Wi

81、th the development of hardware, the use of large speech corpus as synthetic units for direct concatenation is possible. </p><p>  Unit selection-based speech synthesis (or corpus-based synthesis) has been ap

82、plied in English and other languages for some years. Some attempts (Liu, and Wang, 1998; Chu et al. 2001; Wang et al., 2000, Li et al, 2001) have been made for Chinese TTS using unit selection approach. Wu et al. (2001)

83、also proposed a scheme to select phonetically, linguistically best units and then apply prosodic modifications. </p><p>  However, all the proposed approaches have limitations in the application of proper pr

84、osody. Without proper prosody consideration, the quality of the generated speech may be poor sometimes. This paper concerns about how to apply prosody in a unit selection based synthesis.</p><p>  2 Unit Sel

85、ection Model </p><p>  A unit selection model has a well-organized unit database. The database contains the speech units from a large corpus, which is carefully designed to have a large coverage of all phone

86、tic and prosodic variants of each unit. In the database, each speech unit has a number of possible variants, which are suitable to appear in different phonetic and prosodic environments. The large speech corpus is analyz

87、ed offline and all the calculated features are stored in a unit database. In the database, each </p><p>  3 Unit Selection Process </p><p>  The speech synthesis process accepts information from

88、 prosody generation part, retrieves the speech unit database to find a proper unit for every target speech unit. The unit selection process can be illustrated as Figure 1. In the figure, the target sentence is “今天很熱 (it

89、is very hot today)”, which consists of 4 syllables. Each syllable has a set of candidate units. The thick line and thick edge box indicate the selected unit sequence. In unit selection process, to get the best speech, we

90、 have t</p><p><b>  4 Corpus </b></p><p>  As we have mentioned earlier, a large speech corpus is used in unit selection based synthesis. The speech corpus consists of a large collec

91、tion of utterances. The unit for the synthesis will be extracted from the corpus. It is ideal to cover context dependent units and prosody variants as much as possible. However, it is usually impossible to build a very l

92、arge speech corpus that has a complete coverage of unit variants. As the cost of constructing a large corpus with high quality is very expens</p><p>  In this research, we built a corpus of around 38000 syll

93、ables. The script of this speech corpus is selected from a large text corpus (around 300M Chinese characters). The corpus is designed to cover the frequently used context independent syllable and context dependent syllab

94、le as much as possible. We use PKU People’s Daily text corpus as a reference for real word text to evaluate the script of the corpus. We calculated that the built corpus covers 99.8% of syllable occurrences in the PKU co
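
The coverage figure can be read as an occurrence-weighted measure: every syllable token in the reference text counts once, and it is covered if the same syllable appears anywhere in the recorded corpus. The paper does not give its formula, so the function below is one plausible reading, and the pinyin syllables in the usage note are made-up sample data.

```python
from collections import Counter


def occurrence_coverage(reference_syllables, corpus_syllables):
    """Fraction of reference occurrences whose syllable is in the corpus.

    Frequent syllables weigh more than rare ones, because each occurrence
    in the reference text is counted separately.
    """
    inventory = set(corpus_syllables)
    counts = Counter(reference_syllables)
    covered = sum(n for syllable, n in counts.items() if syllable in inventory)
    return covered / sum(counts.values())
```

For a reference list of eight syllable tokens in which only one token (`"xue"`) is missing from the corpus inventory, the function returns 7/8, that is, 0.875.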
