Abstract
In this paper, a new method, named L-tree match, is presented for extracting data from complex data sources. Firstly, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Secondly, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Thirdly, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, this method is applied to extract accurate data from biological documents amounting to 100 GB for the first online integrated biological data warehouse of China.
Publication date
June 16, 2005 (date the paper first went online on the China Journal Net (中国期刊网) platform; this is not the paper's publication date)