Abstract
Speaker variability is an important source of speech variations which makes continuous speech recognition a difficult task. Adapting automatic speech recognition (ASR) models to the speaker variations is a well-known strategy to cope with the challenge. Almost all such techniques focus on developing adaptation solutions within the acoustic models of the ASR systems. Although variations of the acoustic features constitute an important portion of the inter-speaker variations, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of variations which are influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker's articulatory system and also his/her speaking style which is driven by many speaker background characteristics such as accent, gender, age, socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated later by having appropriate pronunciation variants for the lexicon entries which consider likely phone misclassifications besides pronunciation. In this paper, we introduce speaker adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structures, rate of speech, unigram probabilities and stress to generate pronunciation variants of words. Employing the set of speaker adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
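Publication date
October 20, 2009 (date first posted online on the China Journal Net (中国期刊网) platform; this does not represent the paper's publication date)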