L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises

Abstract: In this paper, a new method, named L-tree match, is presented for extracting data from complex data sources. Firstly, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Secondly, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Thirdly, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, this method is applied to extract accurate data from biological documents amounting to 100 GB for the first online integrated biological data warehouse of China.
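Affiliation: not available
Publication date: June 16, 2005 (the date the record first went online on the 中国期刊网 platform; this is not necessarily the paper's publication date)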
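
The abstract does not spell out the L-tree match algorithm, so the following is only an illustrative sketch of the general idea it names: matching a tree-shaped template against a line stream whose components may be optional, unordered, nested, or interleaved with noise. The `Node` class, the `match_at` and `extract` functions, the regular expressions, and the sample record are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical template node: a leaf has a regex, a composite has children."""
    name: str
    pattern: Optional[str] = None          # leaf only: regex with one capture group
    children: list["Node"] = field(default_factory=list)
    optional: bool = False


def match_at(node, lines, i):
    """Try to match `node` starting exactly at line i.

    Returns (value, next_index) on success, or None on failure.
    """
    if node.pattern is not None:            # leaf: the line itself must match
        m = re.fullmatch(node.pattern, lines[i])
        return (m.group(1), i + 1) if m else None

    pending = list(node.children)           # children may appear in any order
    result, j, end, started = {}, i, i, False
    while pending and j < len(lines):
        for child in pending:
            r = match_at(child, lines, j)
            if r is not None:
                result[child.name], j = r
                end, started = j, True
                pending.remove(child)
                break
        else:
            if not started:                 # a composite must begin at line i
                return None
            j += 1                          # unmatched line inside a record: noise
    # Every child still pending must be optional, otherwise the match fails.
    if not started or any(not c.optional for c in pending):
        return None
    return result, end                      # trailing noise is not consumed


def extract(template, lines):
    """Skip leading noise and return the first record that fits the template."""
    for i in range(len(lines)):
        r = match_at(template, lines, i)
        if r is not None:
            return r[0]
    return None


# Assumed sample template and input, for illustration only.
template = Node("entry", children=[
    Node("id", r"ID\s+(\S+)"),
    Node("organism", r"OS\s+(.+)"),
    Node("reference", children=[
        Node("author", r"RA\s+(.+)"),
        Node("title", r"RT\s+(.+)", optional=True),
    ]),
    Node("comment", r"CC\s+(.+)", optional=True),
])

lines = [
    "### page header",      # noise before the record
    "OS   Homo sapiens",    # components out of template order
    "garbage line",         # noise inside the record
    "ID   P12345",
    "RA   Smith J.",        # nested component; its optional title is absent
    "// trailing noise",
]

record = extract(template, lines)
print(record)
```

In this sketch the template is walked from the root downward, while each leaf is confirmed against a concrete input line before its parent is assembled. One known limitation of such a greedy scan is that an optional component may be picked up from far later in the stream, so a real system would need record boundaries or a noise limit.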