简介:Sequentialpatternminingisanimportantdataminingproblemwithbroadapplications.However,itisalsoachallengingproblemsincetheminingmayhavetogenerateorexamineacombinatoriallyexplosivenumberofintermediatesubsequences.Recentstudieshavedevelopedtwomajorclassesofsequentialpatternminingmethods:(1)acandidategeneration-and-testapproach,representedby(i)GSP,ahorizontalformat-basedsequentialpatternminingmethod,and(ii)SPADE,averticalformat-basedmethod;and(2)apattern-growthmethod,representedbyPrefixSpananditsfurtherextensions,suchasgSpanforminingstructuredpatterns.Inthisstudy,weperformasystematicintroductionandpresentationofthepattern-growthmethodologyandstudyitsprinciplesandextensions.Wefirstintroducetwointerestingpattern-growthalgorithms,FreeSpanandPrefixSpan,forefficientsequentialpatternmining.ThenweintroducegSpanforminingstructuredpatternsusingthesamemethodology.Theirrelativeperformanceinlargedatabasesispresentedandanalyzed.Severalextensionsofthesemethodsarealsodiscussedinthepaper,includingminingmulti-level,multi-dimensionalpatternsandminingconstraint-basedpatterns.