Abstract: A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the two steps of the K-means procedure: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65-0.73 to 0.82-0.86.
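The two K-means steps named in the abstract above (document similarity and cluster center) can be illustrated with a minimal sketch. It assumes plain term-weight vectors and cosine similarity; the SLVM weighting by document structure and neighboring documents is not specified in the abstract and is not reproduced here. The function names `cosine_similarity`, `cluster_center`, and `assign` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def cluster_center(vectors):
    """Component-wise mean of the document vectors in one cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign(docs, centers):
    """Index of the most similar center for each document vector."""
    return [
        max(range(len(centers)), key=lambda k: cosine_similarity(d, centers[k]))
        for d in docs
    ]
```

A K-means iteration would alternate `assign` and `cluster_center` until the assignments stop changing.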
Abstract: Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
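The pattern-growth idea behind PrefixSpan can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes every sequence element is a single item (not an itemset), and it stores each projected database as (sequence index, offset) pairs. The function name `prefixspan` and its parameters are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences whose elements are single items."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffixes
        # that remain after the first occurrence of the current prefix.
        counts = {}
        for seq_idx, start in projected:
            seen = set()
            for item in sequences[seq_idx][start:]:
                if item not in seen:  # count each item once per sequence
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project every suffix on the first occurrence of the new item.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_idx, pos + 1))
                        break
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results
```

Because each recursive call scans only the projected suffixes, no candidate subsequences are generated and tested against the full database, which is the contrast with GSP-style methods drawn in the abstract.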
Abstract: Geological Prospecting and Mining in Tibet. DONDUI NAMGYI. September 1, 1995 marked the 30th anniv...
Abstract: Huainan Coal Mining Bureau, an extra-large coal enterprise and a key state coal production base, is situated in the north-central part of Anhui Province. The area, well known as "the coal capital of East China", abounds in coal resources, and the proven coal reserve is estimated at up to 70 billion tons, with a complete range of varieties and superior quality. By 2010, the annual production capacity will reach 30 million tons. The area offers an excellent investment environment and convenient communication and transportation
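Abstract: Text linguistics and translation studies; the discussion then turns to the text-linguistic approach to translation studies, as well as the scope, research focus, and research methods of text-based translation studies, that is, the text-linguistic approach to translation studies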
Abstract: This paper examines the approach middle school teachers take to teaching texts and argues against the traditional practice of exploiting texts just to teach grammar and vocabulary. A more balanced approach is presented, involving all four language skills, concentrating on output as well as input, and training students to extract relevant information from texts, making them more efficient readers.
Abstract: The paper describes a texture-based fast text location scheme that operates directly in the Discrete Wavelet Transform (DWT) domain. Using the distinguishing texture characteristics encoded in the wavelet transform domain, text is quickly detected in complex background images stored in compressed formats such as JPEG2000 without full decompression. Compared with some traditional character location methods, the proposed scheme has the advantages of low computational cost, robustness to the size and font of characters, and high accuracy. Preliminary experimental results show that the proposed scheme is efficient and effective.
Abstract: With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems as well. In conclusion, the article also lists some problems and challenges for further research.
Abstract: This paper presents a new way to extract concepts that can be used to improve text classification performance (precision and recall). The computational measure is divided into two layers. The bottom layer, called the document layer, is concerned with extracting the concepts of a particular document, and the upper layer, called the category layer, is concerned with finding the description and subject concepts of a particular category. The relevant implementation algorithm, which dramatically decreases the search space, is discussed in detail. An experiment based on real-world data collected from Info-Bank shows that the approach is superior to traditional ones.
Abstract: The one thing that most interferes with an English as a Second Language learner's reading is unknown vocabulary. This refers to any word that blocks the reader's understanding in the process of reading. When reading, one will inevitably meet unknown vocabulary no matter how large one's mental lexicon is. Often the density of unfamiliar words makes the reader feel greatly frustrated and give up in the end, having already lost the train of thought while looking up words in the dictionary. It is clear that too great a density of unknown lexical items slows down reading speed, which leads to poor comprehension of the text. Of course, one's knowledge of English vocabulary is bound to be limited; but is there a way one can efficiently cope with the unknown words one comes across in reading without having to stop and look them up in the dictionary?
Abstract: Land resources face the crisis of misuse, especially in the intersection area between town and country, and land control has to be enforced. This paper presents the development of a data mining method for land control. A vector-match method is proposed for data cleaning, the prerequisite of data mining; it handles both character and numeric data by vectorizing character strings and matching numbers. A minimal decision algorithm from rough set theory is used to discover the knowledge hidden in the data warehouse. To monitor land use dynamically and accurately, it is suggested that a real-time land control system be set up based on GPS, digital photogrammetry, and online data mining. Finally, the method is applied to the intersection area between town and country of Wuhan city, and a set of knowledge about land control is discovered.
Abstract: This paper presents a fault-detection method for complex electronic systems based on phase space reconstruction and data mining approaches. The phase space of the chaotic time series is reconstructed with an algorithm combining multiple autocorrelation and the Γ-test, by which the quasi-optimal embedding dimension and time delay can be obtained. The data mining algorithm, which calculates the radius of gyration of unit-mass points around the centre of mass in the phase space, can distinguish the fault parameter from the chaotic time series output by the tested system. The experimental results show that this fault detection method can correctly detect the fault phenomena of electronic systems.
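The two quantities mentioned in the abstract above can be sketched in a few lines: a time-delay embedding of a scalar series, and the radius of gyration of the resulting unit-mass points about their centre of mass. The embedding dimension and time delay are taken as given inputs here; the multiple autocorrelation and Γ-test selection procedure is not described in the abstract and is not reproduced. The function names `delay_embed` and `radius_of_gyration` are hypothetical.

```python
import math


def delay_embed(series, dim, delay):
    """Time-delay embedding: point i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [
        tuple(series[i + j * delay] for j in range(dim))
        for i in range(n_points)
    ]


def radius_of_gyration(points):
    """Root-mean-square distance of unit-mass points from their centre of mass."""
    n = len(points)
    centre = [sum(coords) / n for coords in zip(*points)]
    mean_sq = sum(
        sum((x - c) ** 2 for x, c in zip(p, centre)) for p in points
    ) / n
    return math.sqrt(mean_sq)
```

One plausible use, stated as an assumption, is to compare the radius computed from a healthy reference signal with the radius computed from the signal under test.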
Abstract: Outlier mining is an important aspect of data mining, and outlier mining based on the Cook distance is the most commonly used approach. However, when the data exhibit multicollinearity, the traditional Cook method is no longer effective. Considering the advantages of principal component estimation, we use it in place of least squares estimation and then give a Cook distance measure based on principal component estimation, which can be used in outlier mining. We have also studied related theory and application problems.
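For reference, the traditional least-squares Cook distance that the abstract above takes as its starting point can be sketched for simple linear regression with one predictor. This is the classical measure, D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2 with p = 2 parameters, and not the principal-component variant the paper proposes, whose details are not given in the abstract. The function name `cooks_distance` is hypothetical.

```python
def cooks_distance(x, y):
    """Classical Cook distance for each point of the fit y = a + b*x (least squares)."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage of this point
        distances.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return distances
```

Points with a large distance have both a large residual and high leverage, so removing them would change the fitted line the most.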
Abstract: Recent years have witnessed a resurgence of interest in the epic poetry of Valerius Flaccus and his contemporaries in the Silver Age, leading to the appreciation of these later epics not only for their intrinsic value but also for their contribution to our understanding of epic in the preceding Golden Age, the principal example of which is, of course, Vergil's Aeneid. For its rehabilitation to popularity, Valerius Flaccus's Argonautica is indebted to the perceptive analysis of the practitioners of literary criticism. With respect to ancient texts, however, literary criticism
Abstract: This paper introduces ARMiner, a data mining tool based on association rules. Beginning with the system architecture, its characteristics and functions are discussed in detail, including data transfer, concept hierarchy generalization, mining rules with negative items, and the re-development of the system. An example of the tool's application is also shown. Finally, some issues for future research are presented.
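As background for the association rules mentioned in the abstract above, the two standard rule measures, support and confidence, can be sketched as follows. This is a generic illustration and does not reflect ARMiner's actual interface or its handling of negative items, neither of which is described in the abstract. The function names `support` and `confidence` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base
```

A rule is usually reported only when both measures exceed user-chosen thresholds.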
Abstract: Backdoors and information leaks in Web servers can be detected by applying Web mining techniques to abnormal Web log and Web application log data, so that the security of Web servers can be enhanced and the damage from illegal access avoided. First, a system is proposed for discovering patterns of information leakage in CGI scripts from Web log data. Second, those patterns are provided to system administrators so that they can modify their code and enhance their Web site security. Two aspects are described. One is to combine the Web application log with the Web log to extract more information, so that Web data mining can be used to mine the Web log for information that the firewall and the Information Detection System cannot find. The other is to propose an operation module for the Web site to enhance Web site security. In the cluster server session, a density-based clustering technique is used to reduce resource cost and obtain better efficiency.