Abstract
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm uses a random projection algorithm to obtain a candidate fragment set, applies an exhaustive search to each pair of fragments in that set to find potential linkages, and then assembles the linked fragments. The complexity of our projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested our algorithm on both simulated data and real biological data, and the results show that it is efficient. Using this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
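The projection stage described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`project_candidates`, `candidate_pairs`) and the parameters `l` (window length), `k` (number of projected positions), and `seed` are assumptions made for illustration. The sketch covers only a generic random-projection bucketing step that yields candidate fragment pairs. The exhaustive pairwise linkage search and the assembly step are not shown.

```python
import random
from collections import defaultdict


def project_candidates(seq, l=12, k=8, seed=0):
    """Bucket every length-l window of seq by its characters at k random positions.

    l, k, and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Pick k of the l window positions at random; this is the "projection".
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        key = "".join(window[p] for p in positions)
        buckets[key].append(start)
    # Only buckets hit by two or more windows can indicate a repeat.
    return {key: starts for key, starts in buckets.items() if len(starts) >= 2}


def candidate_pairs(buckets):
    """List every pair of window start positions that share a bucket."""
    pairs = set()
    for starts in buckets.values():
        for i in range(len(starts)):
            for j in range(i + 1, len(starts)):
                pairs.add((starts[i], starts[j]))
    return pairs
```

Windows that differ only at positions outside the projection still fall into the same bucket, which is why this kind of bucketing tolerates a small number of mismatches between repeat copies. In practice the projection would typically be repeated with several random position sets, and the resulting pairs passed to a verification step.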
Publication date
March 13, 2004 (date first posted online on the China Journal Net platform; this is not the paper's publication date)