Intelligent and Distributed Computing Laboratory
  Research on Robustness and Adaptation Methods of Speech Recognition
Name Feng Hongcai (丰洪才)
Defense date 2005.05.08
Submission date 2005.06.24
Degree level Doctoral
Chinese title 语音识别的鲁棒性和自适应方法研究
English title Research on Robustness and Adaptation Methods of Speech Recognition
Advisor 1 Lu Zhengding (卢正鼎)
Advisor 2
Chinese keywords 隐马尔可夫模型; 模型自适应; 最大后验估计; 说话人归一化; 说话人聚类
English keywords Hidden Markov Model; Model Adaptation; Maximum a Posteriori; Speaker Normalization; Speaker Clustering
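Chinese abstract (English translation) Speech recognition systems are deployed in complex environments, where interference of various kinds often causes recognition performance to degrade sharply. Improving the robustness and adaptability of speech recognition systems to such interference has therefore become the key problem in making the technology practical. To address environmental mismatch and environmental noise in speech recognition systems, a fast, integrated, incremental adaptation method is given. It builds on two speaker adaptation methods, MAP (Maximum a Posteriori estimation) and MLLR (Maximum Likelihood Linear Regression), and adopts a new strategy for using adaptation data incrementally. Experimental results show that the method achieves good recognition performance even when little adaptation data is available and, to some extent, overcomes the effect of speaker and environmental differences on the system; under incremental adaptation it reduces the word error rate by 23.03% without noise and by 29.69% with noise. A speaker adaptation method is given that uses speaker clustering to provide a better initial acoustic model for adaptation. With model adaptation at its core, the method uses speaker clustering to reduce the spread of the feature distributions in the training set and the overlap between modeling units, and uses the adaptation data to select the most suitable initial acoustic model for the adaptation process, which can greatly improve the performance of SI (Speaker Independent) speech recognition systems. Traditional VTLN (Vocal Tract Length Normalization) describes speaker differences with a single vocal tract factor, so its frequency warping function cannot align different formants simultaneously. To address this, a finer-grained frequency warping function is proposed for spectral normalization, namely a spectral normalization method based on a piecewise-linear warping function. With a suitable partition of the frequency axis, the method performs spectral alignment well. Traditional VTLN can be viewed as the special case of this method with two segments. Because the warping function is model-independent, the method is shown to be a speaker adaptation method suited to unsupervised operation, with high robustness. Based on an analysis of statistical-model-based speaker normalization training, CMN (Cepstral Mean Normalization), and speaker adaptive training, a state-dependent direct mean shift normalization training method is proposed and combined with model adaptation, yielding a method that combines MAP-based direct mean shift normalization training with MAP/WNR (Weighted Neighbor Regression) model adaptation. Experimental results show that direct mean shift normalization training is a relatively good robustness method in supervised mode.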
English abstract Speech recognition systems operate in complex environments, and interference of many kinds often degrades recognition performance sharply. Improving the robustness and adaptability of speech recognition systems is therefore key to making the technology practical. To address environmental mismatch and environmental noise in deployed speech recognition systems, this dissertation presents a fast, integrated, incremental adaptation approach. Building on two speaker adaptation methods, Maximum a Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR), it adopts a new strategy that uses adaptation data incrementally. The experimental results show that the method works well when only a small amount of data is available and can effectively handle speaker and environment variations. It reduces the Word Error Rate (WER) by 23.03% in a quiet environment and by 29.69% in a noisy environment. The dissertation also puts forward a speaker adaptation method that uses speaker clustering to provide a better initial acoustic model for adaptation. With model adaptation at its core, the method uses speaker clustering to reduce the spread of the training-set feature distributions and the overlap between units, and uses the adaptation data to select the most suitable initial acoustic model for the adaptation process, which can substantially improve the performance of Speaker Independent (SI) speech recognition systems.
Traditional vocal tract length normalization (VTLN) describes speaker variation with a single vocal tract factor, so its frequency warping function cannot align different formants simultaneously. To address this problem, the dissertation proposes a finer-grained frequency warping function for frequency normalization, namely a speaker normalization method based on piecewise-linear frequency warping. With an appropriate partition of the frequency axis, spectral differences can be removed well, and traditional VTLN can be regarded as the special case with only one partition point. Because the warping function is model-independent, this method is shown to be a fast adaptation technique that is especially suitable for unsupervised adaptation. Based on an analysis of statistical-model-based speaker normalization training, Cepstral Mean Normalization (CMN), and speaker adaptive training, the dissertation proposes a state-dependent direct mean shift normalization training technique based on MAP estimation and, within a unified robustness framework, combines it with MAP/WNR (Weighted Neighbor Regression) model adaptation. The experimental results show that the method is a relatively good robustness method in supervised mode.
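The piecewise-linear frequency warping mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the breakpoint values, and the use of NumPy interpolation are all assumptions made for illustration. The sketch only shows how a warping function defined by several breakpoints generalizes the single-factor VTLN warp, which corresponds to one interior partition point.

```python
import numpy as np


def piecewise_linear_warp(freqs, src_breaks, dst_breaks):
    """Map frequencies through a piecewise-linear warping function.

    src_breaks and dst_breaks (hypothetical parameter names) list the
    breakpoints on the original and warped frequency axes. Both must be
    strictly increasing so that the warp is monotonic.
    """
    src = np.asarray(src_breaks, dtype=float)
    dst = np.asarray(dst_breaks, dtype=float)
    if src.ndim != 1 or src.shape != dst.shape or src.size < 2:
        raise ValueError("need two 1-D breakpoint lists of equal length >= 2")
    if np.any(np.diff(src) <= 0) or np.any(np.diff(dst) <= 0):
        raise ValueError("breakpoints must be strictly increasing")
    # Linear interpolation between consecutive breakpoints.
    return np.interp(np.asarray(freqs, dtype=float), src, dst)


def vtln_breakpoints(alpha, f_max, f_cut):
    """Breakpoints for a single-factor VTLN-style warp (illustrative).

    Frequencies below f_cut are scaled by alpha; the last segment is
    chosen so that f_max maps to itself. This is the special case with
    one interior partition point (two segments).
    """
    return [0.0, f_cut, f_max], [0.0, alpha * f_cut, f_max]


# Single-factor warp: one interior partition point.
src, dst = vtln_breakpoints(alpha=1.1, f_max=8000.0, f_cut=6000.0)
single_factor = piecewise_linear_warp([3000.0, 7000.0, 8000.0], src, dst)

# Finer warp: two interior partition points, so two frequency regions
# can be shifted in different directions (values are made up).
multi_segment = piecewise_linear_warp(
    [500.0, 2000.0],
    [0.0, 1000.0, 3000.0, 8000.0],
    [0.0, 900.0, 3300.0, 8000.0],
)
```

In this sketch the endpoints 0 and f_max are kept fixed so that the warped axis covers the same band as the original. How the breakpoints are chosen for a given speaker is not described in the abstract and is left out here.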