8 results
  • Abstract: Document Stamps of the Tibet Gaxag Government. LI RONGHUA and YE YUSHUN. The design of official document stamps most often reveals strong local flavor...

  • Tags:
  • Abstract: Many algorithms have been implemented for the problem of document categorization. Most of the work in this area has been done for English text, while very few approaches have been introduced for Arabic text. The nature of Arabic text differs from that of English text, and preprocessing Arabic text is more challenging. This is because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to the keyword extraction and reduction problems using the Document Frequency (DF) threshold method. The results indicate that the ability of kNN to deal with Arabic text outperforms the other existing systems. The proposed system reached a micro-recall score of 0.95 with 850 Arabic texts in 6 different categories.

  • Tags: Arabic documents; classification; kNN; vector model
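
The abstract above names two ingredients: term reduction by a Document Frequency (DF) threshold and classification by kNN. The following is a minimal sketch of how those two pieces can fit together, not the authors' system. The function names, the cosine similarity measure, the majority vote, and the toy English tokens are all assumptions made for illustration, and Arabic-specific preprocessing is omitted.

```python
from collections import Counter
import math

def select_terms_by_df(docs, min_df):
    """Keep terms that occur in at least min_df documents (DF threshold)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, n in df.items() if n >= min_df}

def to_vector(tokens, vocab):
    """Term-frequency vector restricted to the selected vocabulary."""
    return Counter(t for t in tokens if t in vocab)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, train_vecs, train_labels, k):
    """Majority vote among the k training documents most similar to query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus (placeholder tokens, already tokenized).
train_docs = [["match", "goal", "team"], ["team", "league", "goal"],
              ["bank", "market", "stock"], ["stock", "price", "market"]]
train_labels = ["sports", "sports", "economy", "economy"]

vocab = select_terms_by_df(train_docs, min_df=2)
train_vecs = [to_vector(d, vocab) for d in train_docs]
predicted = knn_classify(to_vector(["goal", "team", "price"], vocab),
                         train_vecs, train_labels, k=3)
print(sorted(vocab), predicted)
```

Raising `min_df` shrinks the vocabulary, which is the reduction effect the DF threshold is used for; the value 2 here is arbitrary.
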
  • Abstract: A semi-structured document has more structured information than an ordinary document, and the relations among semi-structured documents can be fully utilized. In order to take advantage of the structure and link information in a semi-structured document for better mining, a structured link vector model (SLVM) is presented in this paper, where a vector represents a document, and the vector's elements are determined by terms, document structure and neighboring documents. Text mining based on SLVM is described in terms of the K-means procedure for brevity and clarity: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, and its F value increases from 0.65-0.73 to 0.82-0.86.

  • Tags: HTML language; XML language; semi-structured document model; text mining; structure information
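
The two K-means steps named in the abstract above, calculating document similarity and calculating the cluster center, can be sketched as follows. This is an illustration over plain term-weight vectors only; it does not reproduce SLVM, whose vector elements also depend on document structure and neighboring documents. The cosine measure, the mean-vector center, the function names, and the toy data are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Step 1: similarity between two dense document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_center(vectors):
    """Step 2: the center of a cluster as the component-wise mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, initial_centers, iterations=10):
    """Alternate the two steps for a fixed number of iterations."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    assignment = []
    for _ in range(iterations):
        # Assign every document to its most similar center.
        assignment = [
            max(range(len(centers)),
                key=lambda c: cosine_similarity(v, centers[c]))
            for v in vectors
        ]
        # Recompute each center from its current members.
        for c in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = cluster_center(members)
    return assignment, centers

# Hypothetical term-weight vectors for four documents over three terms.
docs = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
assignment, centers = kmeans(docs, initial_centers=[docs[0], docs[2]])
print(assignment, centers)
```

Under a model such as SLVM only the construction of the vectors and the similarity function would change; the assign-then-recompute loop stays the same.
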
  • Abstract: Office automation (OA) has evolved with the development of computer science, improving staff efficiency. Unstructured information processing is an important aspect of OA; therefore, in this paper, we propose an efficient method for distinguishing scanned from rasterized document images, which can be used in this process. To ensure the efficiency and precision of our method, two steps are included: rapid processing and classification using noise features. In the first step, color, skew, and isolated noise features are used to identify the source of the images. In the second step, noise features are extracted from the input image and a support vector machine (SVM) classifier is used for classification. Our experiments show that our method distinguishes scanned from rasterized document images with high precision and speed.

  • Tags: raster scanning; image files; office automation; document images; classification use; noise features
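
The abstract above lists an isolated noise feature among its first-step cues but does not define it. One possible reading, offered purely as an assumption, is the share of foreground pixels that have no foreground neighbor in their 8-neighborhood. The sketch below computes that ratio on a binarized image given as nested lists; the function name and the sample image are hypothetical, and the paper's actual features and SVM stage are not reproduced.

```python
def isolated_noise_ratio(image):
    """Fraction of foreground pixels (value 1) with no 8-connected
    foreground neighbor. `image` is a list of equal-length rows of 0/1."""
    height = len(image)
    width = len(image[0]) if height else 0
    foreground = 0
    isolated = 0
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            foreground += 1
            # Scan the 3x3 window around (y, x), clipped at the borders.
            has_neighbor = any(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbor:
                isolated += 1
    return isolated / foreground if foreground else 0.0

# Hypothetical binarized patch: one 2x2 blob plus two stray pixels.
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
ratio = isolated_noise_ratio(page)
print(ratio)
```

A value like this could serve as one component of a feature vector passed to a classifier; how it separates scanned from rasterized images would have to be established experimentally.
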
  • Abstract: According to the specifications for Welding Procedure Qualification in ASME Section IX and the Chinese code JB4708-2000, a software package for managing welding documents has been rebuilt. The new software package can be used in a Local Area Network (LAN) with 4 different levels of authority for different users. Welding documents, including DWPS (Design for Welding Procedure Specifications), PQRs (Procedure Qualification Records) and WPS (Welding Procedure Specifications), can therefore be shared within a company. The system provides users with various functions such as browsing, copying, editing, searching and printing records, and also helps users decide whether a new PQR test is necessary according to the codes above. Furthermore, superusers can browse the history of record modifications and retrieve the records when needed.

  • Tags: C/S (client/server) architecture; software package; welding; document management system; database; DWPS