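[Bilingual translation] Speech recognition literature translation -- A Review of Automatic Speech Recognition Error Detection and Correction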

2,950 English words, 16,000 English characters; 4,900 Chinese characters.
Source: Errattahi R, Hannani A E, Ouahmane H. Automatic Speech Recognition Errors Detection and Correction: A Review[J]. Procedia Computer Science, 2018, 128:32-37.

Automatic Speech Recognition Errors Detection and Correction: A Review

Rahhal Errattahi, Asmaa El Hannani, Hassan Ouahmane

Abstract

Even though Automatic Speech Recognition (ASR) has matured to the point of commercial applications, high error rates in some speech recognition domains remain one of the main impediments to the wide adoption of speech technology, especially for continuous large vocabulary speech recognition applications. The persistent presence of ASR errors has intensified the need to find alternative techniques to automatically detect and correct such errors. Correcting transcription errors is crucial not only to improve speech recognition accuracy, but also to avoid propagating the errors to subsequent language processing modules such as machine translation. In this paper, the basic principles of ASR evaluation are first summarized, and then the current state of ASR error detection and correction research is reviewed. We focus on emerging techniques using the word error rate metric.

Keywords: Automatic Speech Recognition; ASR Error Detection; ASR Error Correction; ASR evaluation

1. Introduction

Automatic Speech Recognition (ASR) systems aim to convert a speech signal into a sequence of words, either for text-based communication or for device control. The purpose of evaluating ASR systems is to simulate human judgement of their performance in order to measure their usefulness and assess the remaining difficulties, especially when comparing systems. The standard metric of ASR evaluation is the Word Error Rate, defined as the proportion of word errors to words processed.

ASR has matured to the point of commercial applications by providing transcription with an acceptable level of performance, which allows integration into many applications. In general, ASR systems are effective when the conditions are well controlled. Nevertheless, they depend heavily on the task being performed and the results are far from ideal, especially for Large Vocabulary Continuous Speech Recognition (LVCSR) applications. The latter is still one of the most challenging tasks in the field, due to a number of factors, including poor articulation, variable speaking rate and a high degree of acoustic variability caused by noise, side-speech, accents, sloppy pronunciation, hesitation, repetition, interruptions and channel mismatch and/or distortions. To deal with these problems, the research community has proposed a plethora of algorithms and technologies over the last decade for every step of LVCSR: pre-processing, feature extraction, acoustic modeling, language modeling, decoding and result post-processing. Nevertheless, LVCSR systems are not yet robust, with error rates of up to 50% under certain conditions [21],[8].

The persistent presence of ASR errors motivates the search for alternative techniques to assist users in correcting transcription errors or to fully automate the correction process.

[...] evaluation procedure. In other words, the reference and recognised words are matched in order to decide which words have been deleted or inserted, and which reference-recognised string pairs have been aligned to each other, which may result in a hit or a substitution.

This is normally done by using the Viterbi Edit Distance [17] to efficiently select the alignment of the reference and recognised word sequences for which the weighted error score is minimized. The Edit Distance usually assigns identical weights (1 for the Levenshtein distance) to all three operations: insertion, substitution and deletion. Yet uniform weights can leave the choice of the best alignment path ambiguous when several different paths have the same score.

To avoid this problem, Morris et al. [12] suggest using different weights, such that substitution is favoured over insertion and deletion. In general, it is recommended to set WI = WD and WS < WI + WD, where WI, WS and WD are respectively the weights of insertion, substitution and deletion.

2.3. ASR Evaluation Metrics

According to McCowan et al. [11], an ideal ASR evaluation metric should be: (i) Direct: it measures the ASR component independently of the ASR application; (ii) Objective: the measure should be calculated in an automated manner; (iii) Interpretable: the absolute value of the measure must give an idea of the performance; and (iv) Modular: the evaluation measure should be general enough to allow thorough application-dependent analysis.

Word Error Rate (WER) is the most popular metric for ASR evaluation. It measures the percentage of incorrect words (Substitutions (S), Insertions (I), Deletions (D)) relative to the total number of words processed. It is defined as

$$\mathrm{WER} = \frac{S + D + I}{N_1} = \frac{S + D + I}{H + S + D} \qquad (1)$$

where I = total number of insertions, D = total number of deletions, S = total number of substitutions, H = total number of hits, and $N_1$ = total number of input words.

Despite being the most commonly used metric, WER has many shortcomings [10]. First of all, WER is not a true percentage because it has no upper bound, so it does not tell you how good a system is, only that one is better than another.
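The alignment step and the WER definition above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `align_counts` and `wer` are hypothetical, and the default weights (substitution 4, insertion 3, deletion 3) are an assumption chosen only so that insertion and deletion cost the same and a substitution costs less than an insertion plus a deletion. The sketch aligns the two word sequences with a weighted edit distance, counts hits, substitutions, deletions and insertions along one minimum-cost path, and then applies Equation (1).

```python
def align_counts(reference, hypothesis, w_sub=4, w_ins=3, w_del=3):
    """Count (hits, substitutions, deletions, insertions) along one
    minimum-cost alignment of two word lists.

    The weights are illustrative assumptions, not values from the paper.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j]: minimal weighted cost of aligning reference[:i] with hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * w_del          # only deletions are possible
    for j in range(1, m + 1):
        cost[0][j] = j * w_ins          # only insertions are possible
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if reference[i - 1] == hypothesis[j - 1] else w_sub
            cost[i][j] = min(
                cost[i - 1][j - 1] + diag,   # hit or substitution
                cost[i - 1][j] + w_del,      # deletion
                cost[i][j - 1] + w_ins,      # insertion
            )

    # Backtrace from the end, preferring the diagonal move on ties.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = 0 if reference[i - 1] == hypothesis[j - 1] else w_sub
            if cost[i][j] == cost[i - 1][j - 1] + diag:
                if diag == 0:
                    hits += 1
                else:
                    subs += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + w_del:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def wer(reference, hypothesis):
    """Word Error Rate as in Equation (1): (S + D + I) / (H + S + D)."""
    hits, subs, dels, ins = align_counts(reference, hypothesis)
    n_input = hits + subs + dels        # N1, the number of reference words
    if n_input == 0:
        raise ValueError("reference must contain at least one word")
    return (subs + dels + ins) / n_input


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()
    print(align_counts(ref, hyp))                 # (5, 0, 1, 0)
    print(wer(["yes"], "oh yes it is".split()))   # 3.0
```

The second call shows why WER has no upper bound: a one-word reference recognised with three extra inserted words gives a WER of 3.0, that is, 300%.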

Moreover, WER is not D/I symmetric: it gives far more weight to insertions than to deletions, so in noisy conditions WER can exceed 100%.

WER is still effective for speech recognition where errors can be corrected by typing, such as dictation. However, for almost any other type of speech recognition system, where the goal is more than transcription, it is necessary to look for an alternative, or additional, evaluation framework.

Many researchers have proposed alternative measures to address the evident limitations of WER. In [12], Morris et al. introduced two information-theoretic measures of word information communicated. The first one, named Relative Information Lost (RIL), is based on Mutual Information (I, or MI) [7], which measures the statistical dependence between the input words X and output words Y, and is calculated using the Shannon Entropy [...]
