Abstract
Many algorithms have been implemented for the problem of document categorization. The majority of the work in this area has been done for English text, while very few approaches have been introduced for Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging. This is because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to the keyword extraction and reduction problems that uses the Document Frequency (DF) threshold method. The results indicate that kNN handles Arabic text better than the other existing systems. The proposed system reached a micro-recall score of 0.95 on 850 Arabic texts in 6 different categories.
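The two techniques named in the abstract, DF-threshold keyword reduction and kNN classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold `min_df`, the value of `k`, the raw term-count weighting, the cosine similarity measure, and the toy tokens (English placeholders standing in for preprocessed Arabic terms) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
import math


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): each document counts a term once
    return df


def select_terms(docs, min_df):
    """DF thresholding: keep terms that occur in at least min_df documents."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(tokens, vocabulary):
    """Represent a document as raw term counts over the reduced vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(train_vectors, train_labels, query, k):
    """Majority vote among the k training documents most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(pair[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-tokenized training documents (illustrative only).
train_docs = [
    ["match", "team", "goal", "league"],
    ["team", "goal", "player"],
    ["match", "player", "team"],
    ["market", "bank", "price"],
    ["bank", "price", "trade"],
    ["market", "trade", "bank"],
]
train_labels = ["sports", "sports", "sports", "economy", "economy", "economy"]

# Assumed threshold: drop terms that appear in fewer than 2 documents.
vocabulary = select_terms(train_docs, min_df=2)
train_vectors = [vectorize(doc, vocabulary) for doc in train_docs]

query = vectorize(["team", "goal", "match", "stadium"], vocabulary)
predicted = knn_classify(train_vectors, train_labels, query, k=3)
print(predicted)
```

In this toy run the term `league` appears in only one document, so the assumed threshold removes it from the vocabulary, and the unseen term `stadium` in the query is ignored. A real system would apply Arabic-specific preprocessing before this stage and would tune both the threshold and `k` on held-out data.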
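Publication date
February 12, 2008 (date first posted online on the 中国期刊网 (China Journal Net) platform; this is not the paper's publication date)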