Abstract: Document Stamps of the Tibet Gaxag Government. LI RONGHUA and YE YUSHUN. The design of official document stamps most often reveals strong local flavor...
Abstract: Many algorithms have been implemented for the problem of document categorization. The majority of the work in this area was done for English text, while very few approaches have been introduced for Arabic text. The nature of Arabic text is different from that of English text, and the preprocessing of Arabic text is more challenging. This is because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to solve keyword extraction and reduction problems by using the Document Frequency (DF) threshold method. The results indicate that the kNN-based system deals with Arabic text better than the other existing systems. The proposed system reached a micro-recall score of 0.95 with 850 Arabic texts in 6 different categories.
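The two ideas named in this abstract, DF-threshold term selection and kNN classification, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the abstract does not give the threshold value, the similarity measure, the value of k, or the Arabic preprocessing, so the sketch uses cosine similarity over term counts, hypothetical function names, and a toy set of already-tokenized English documents.

```python
from collections import Counter
import math

def select_terms_by_df(docs, min_df):
    """Keep terms that occur in at least min_df documents (DF threshold)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term for term, count in df.items() if count >= min_df}

def to_vector(tokens, vocabulary):
    """Term-count vector restricted to the selected vocabulary."""
    return Counter(t for t in tokens if t in vocabulary)

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query_tokens, train_docs, train_labels, vocabulary, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = to_vector(query_tokens, vocabulary)
    scored = sorted(
        ((cosine(query, to_vector(doc, vocabulary)), label)
         for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy, already-tokenized data (illustrative only, not from the paper)
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["market", "stock", "price"],
    ["price", "bank", "market"],
]
train_labels = ["sport", "sport", "economy", "economy"]

vocabulary = select_terms_by_df(train_docs, min_df=2)
predicted = knn_classify(["goal", "team", "win"], train_docs, train_labels,
                         vocabulary, k=3)
```

With `min_df=2`, terms that appear in only one training document are dropped before any similarity is computed, which is the keyword-reduction role the abstract assigns to the DF threshold.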
Abstract: A semi-structured document has more structured information compared to an ordinary document, and the relation among semi-structured documents can be fully utilized. In order to take advantage of the structure and link information in a semi-structured document for better mining, a structured link vector model (SLVM) is presented in this paper, where a vector represents a document, and the vector's elements are determined by terms, document structure and neighboring documents. Text mining based on SLVM is described in terms of the K-means procedure for brevity and clarity: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, and its F value increases from 0.65-0.73 to 0.82-0.86.
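The two K-means steps this abstract mentions, computing document similarity and computing the cluster center, can be sketched generically. This is not the SLVM itself: the abstract does not say how terms, structure and neighboring documents are combined into a vector, so the input vectors below are placeholders, and the use of cosine similarity and a component-wise mean is an assumption.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_center(vectors):
    """Component-wise mean of the vectors in one cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, initial_centers, iterations=10):
    """Alternate between assignment and center update for a fixed number of rounds."""
    centers = [list(c) for c in initial_centers]
    assignment = []
    for _ in range(iterations):
        # Step 1: assign each document to its most similar center
        assignment = [
            max(range(len(centers)),
                key=lambda j: cosine_similarity(v, centers[j]))
            for v in vectors
        ]
        # Step 2: recompute each center from its current members
        for j in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = cluster_center(members)
    return assignment, centers

# Placeholder document vectors (illustrative only)
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
assignment, centers = kmeans(vectors, initial_centers=[vectors[0], vectors[2]])
```

Under a model such as the one described, only the construction of the vectors and possibly the similarity function would change; the alternation between the two steps stays the same.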
Abstract: Office automation (OA) has evolved with the development of computer science, improving staff efficiency. Unstructured information processing is an important aspect of OA; therefore, in this paper, we propose an efficient method for distinguishing scanned and rasterized document images, which can be used in this process. To ensure the efficiency and precision of our method, two steps are included: rapid processing and classification using noise features. In the first step, color, skew, and isolated noise features are used to identify the source of the images. In the second step, noise features are extracted from the input image and a support vector machine (SVM) classifier is used for classification. Our experiments show that our method has high precision and speed for distinguishing scanned and rasterized document images.
Abstract: According to the specifications for Welding Procedure Qualification of ASME Section IX and the Chinese code JB4708-2000, a software package for managing welding documents has been rebuilt. Consequently, the new software package can be used in a Local Area Network (LAN) with 4 different levels of authority for different users. Therefore, the welding documents, including DWPS (Design for Welding Procedure Specifications), PQRs (Procedure Qualification Records) and WPS (Welding Procedure Specifications), can be shared within a company. At the same time, the system provides users with various functions such as browsing, copying, editing, searching and printing records, and also helps users decide whether a new PQR test is necessary according to the codes above. Furthermore, superusers can also browse the history of record modification and retrieve the records when needed.