  • CL Research Home Page
    The information on this page is displayed in frames. If your browser cannot view frames, click here to go to the CL Research Site Map.

    Original URL path: http://www.clres.com/ (2016-02-11)


  • CL Research Index
    CL Research

    Original URL path: http://www.clres.com/indexa.html (2016-02-11)

  • CL Research Home
    ...a FrameNet frame element dictionary, used to create a frame element taxonomy that identifies hypernymic links between frame elements and the number of frames in which these frame elements appear (see Electronic Dictionaries), with an online version that allows exploration of this taxonomy; a dictionary of all English prepositions (courtesy of Oxford University Press), further developed and analyzed in The Preposition Project, with an online version, broken down into preposition classes with digraphs showing derivational relationships, and with preposition corpora containing 80,000 sentences, including tokenized, lemmatized, and dependency-parsed versions in CoNLL-X format (see Electronic Dictionaries). A corpus pattern analysis of preposition behavior, with a pattern dictionary of English prepositions, has been initiated, with an online version following principles developed by Patrick Hanks and with current data available for download in MySQL files; and the Oxford Dictionary of English (1st and 2nd editions). The electronic versions of the Oxford and Macquarie lexical resources are not publicly available but may be licensed through CL Research for research purposes. An additional DIMAP dictionary has been created for use in CL Research's implementation of the Minnesota Contextual Content Analysis (MCCA), a content analysis tool used for statistical characterization and analysis of texts ranging from tweets and Likert scales to newspaper articles and blogs, including multiple-person texts such as transcripts of focus groups or plays such as Hamlet. The results of our research examining the role of computational lexicons are incorporated in the Knowledge Management System (KMS), a unified platform for parsing and analyzing text from most formats (including Word, PDF, XML, and web pages), answering free-form natural language questions, summarizing one or more documents (generally or topic-based), extracting information, exploring document contents, and dynamically creating ontological representations of document contents. KMS is accompanied by several supporting programs.

    Original URL path: http://www.clres.com/home.html (2016-02-11)
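The preposition corpora described in this entry are said to include dependency-parsed versions in CoNLL-X format. Below is a minimal Python sketch of a reader for such files, assuming the standard CoNLL-X layout of ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line after each sentence. The names `Token`, `read_conllx`, and `preposition_heads`, and the use of the Penn Treebank tag `IN` for prepositions, are illustrative assumptions; the tagset actually used in the CL Research files may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Token:
    # First eight of the ten standard CoNLL-X columns (assumed layout).
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # 0 means the token attaches to the artificial root
    deprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence; a blank line ends a sentence."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # PHEAD and PDEPREL (columns 9 and 10) are often "_" and are ignored here.
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], int(cols[6]), cols[7]))
    if sentence:  # the last sentence may lack a trailing blank line
        yield sentence


def preposition_heads(sentence: List[Token], tag: str = "IN") -> List[Tuple[str, str]]:
    """Pair each token whose POSTAG equals `tag` with the form of its head."""
    by_id = {t.id: t for t in sentence}
    return [(t.form, by_id[t.head].form if t.head else "ROOT")
            for t in sentence if t.postag == tag]
```

For a parsed sentence "She sat on it" in which "on" attaches to "sat", `preposition_heads` pairs the preposition with its governing verb, which is the kind of attachment information a preposition study would extract from the parsed corpora.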

  • CL Research Software
    ...WordNet, a licensed dictionary, or a user-developed dictionary for easy lookup and selection of one or more appropriate senses.
    Search of Definitions: regular expression search on all fields, with search results shown on screen or printed to a file in a format to your specifications; extraction of subdictionaries, using the search mechanism to create a file of selected entries that can be uploaded into a new DIMAP dictionary.
    Definition Parsing Using the Proximity Parser: parse individual definitions or all definitions, in step or batch mode; start at any entry, with the position remembered between sessions; automatically identify and/or add semantic relations discovered during parsing, including synonyms, with user-customizable regular expression patterns for recognition; diagnostic definition parsing aids print to files such things as parse output, identified semantic relations, bad parses, definitions with no identified semantic relations, comparison to the WordNet hierarchy, and unknown words.
    Definition Analysis: compare and map definitions across dictionaries (useful for mapping among a main dictionary and independently developed derived dictionaries), for all or individual entries, with or without a stop list, by word overlap using best fit and componential analysis, by a score based on matches between hypernyms and other semantic relations, and by using WordNet synsets to allow fuzzy matches; visual display of the edit distance difference between definitions. Analysis of the dictionary digraph based on hypernym links identifies primitive senses among the definitions, for whole or partial dictionaries (such as thesaurus groupings): it summarizes hypernym links among entries, identifies non-primitive (derived) entries and senses, identifies the primitive defining vocabulary, and identifies definitional cycles (particularly useful when thesaurus entries are linked to definitions).
    Conversion (Import and Export): uploading dictionary data from other sources requires a specific format; downloading dictionary data for use elsewhere follows your own format, with a template editor to facilitate format specification, including the addition of your own strings such as SGML, HTML, or XML codes.
    Lexical Acquisition: create dictionaries by analyzing your own texts (lists or continuous text, including Latin-1 languages): all words, with an automatic tokenizer; capitalized phrases with join words (approximating named entity acquisition); the longest contiguous, uninterrupted phrase without a stop word or punctuation, with dictionary lookup (approximating compound noun acquisition); no lookup, or batch creation using an integrated licensed dictionary or WordNet (requires a licensed dictionary).
    Integrated Dictionary Lookup: WordNet, with all information converted into DIMAP format, thus allowing a word-based rather than a synset-based use of WordNet; a licensed dictionary; or WordNet used as an auxiliary dictionary (requires DIMAP WordNet).
    Download (Owner and Licensee Updates): the latest executable and help files are dated 10/9/01. Please click here to reach the password-protected area and to see the latest changes.
    License Agreements: CL Research is seeking beta testers and others interested in assisting the furtherance of lexicological and text analysis research objectives for its Windows-based DIMAP dictionary and content analysis software. To obtain DIMAP, you must first complete a license. DIMAP is available without cost for beta testing and research purposes to academic organisations and, by arrangement, for beta testing, evaluation, and research purposes to commercial organisations.
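The digraph analysis mentioned under Definition Analysis can be pictured as a graph in which each headword points to its hypernym (the genus word of its definition). Here is a minimal sketch, assuming a toy, hypothetical hypernym map (`EXAMPLE_HYPERNYMS`) rather than DIMAP's actual data structures: definitional cycles are taken to be the strongly connected components (found with Tarjan's algorithm) that contain more than one word or a self-loop; words in a cycle or with no hypernym are treated as candidate primitives, and every other word counts as derived. This is one plausible reading of the analysis, not CL Research's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each headword maps to the genus word(s) of its definition.
EXAMPLE_HYPERNYMS: Dict[str, List[str]] = {
    "puppy": ["dog"],
    "dog": ["animal"],
    "horse": ["animal"],
    "stallion": ["horse"],
    "animal": ["organism"],
    "organism": ["being"],
    "being": ["organism"],  # a circular pair of definitions
}


def strongly_connected_components(graph: Dict[str, List[str]]) -> List[Set[str]]:
    """Tarjan's algorithm (recursive version, adequate for small graphs)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def analyze(hypernyms: Dict[str, List[str]]) -> Tuple[List[Set[str]], Set[str], Set[str]]:
    """Return (definitional cycles, candidate primitives, derived words)."""
    nodes = set(hypernyms) | {w for targets in hypernyms.values() for w in targets}
    cycles = [c for c in strongly_connected_components(hypernyms)
              if len(c) > 1 or any(v in hypernyms.get(v, []) for v in c)]
    in_cycle: Set[str] = set().union(*cycles)
    roots = {v for v in nodes if not hypernyms.get(v)}  # words with no hypernym
    primitives = in_cycle | roots
    return cycles, primitives, nodes - primitives
```

With the sample map, "organism" and "being" define each other and form the only cycle, so they are the candidate primitives and the other five words are derived. The recursive search suits small examples; a dictionary-sized graph would call for an iterative version to stay within Python's recursion limit.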

    Original URL path: http://www.clres.com/software.html (2016-02-11)
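The two phrase heuristics listed under Lexical Acquisition (capitalized phrases with join words, and the longest run of words containing no stop word or punctuation) can be sketched as follows. This is a minimal Python illustration, assuming a simple regular-expression tokenizer and small, hypothetical `JOIN_WORDS` and `STOP_WORDS` lists; DIMAP's own tokenizer and word lists are not described on this page, and the dictionary lookup step is omitted.

```python
import re
from typing import List

# Hypothetical word lists; DIMAP's own lists are not given on the page.
JOIN_WORDS = {"of", "the", "and", "for", "de"}
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to",
              "for", "is", "was", "by", "with"}

# A word (optionally with an apostrophe part) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def capitalized_phrases(tokens: List[str]) -> List[str]:
    """Runs of capitalized words, allowing lowercase join words between them."""
    phrases: List[str] = []
    current: List[str] = []
    pending: List[str] = []  # join words kept only if another capitalized word follows
    for tok in tokens:
        if tok[:1].isupper():
            current.extend(pending)
            pending = []
            current.append(tok)
        elif current and tok.lower() in JOIN_WORDS:
            pending.append(tok)
        else:
            if current:
                phrases.append(" ".join(current))
            current, pending = [], []
    if current:
        phrases.append(" ".join(current))
    return phrases


def stopword_free_phrases(tokens: List[str], min_len: int = 2) -> List[str]:
    """Maximal runs of at least `min_len` words with no stop word or punctuation."""
    phrases: List[str] = []
    current: List[str] = []
    for tok in tokens + ["."]:  # trailing sentinel flushes the final run
        if tok.isalnum() and tok.lower() not in STOP_WORDS:
            current.append(tok.lower())
        else:
            if len(current) >= min_len:
                phrases.append(" ".join(current))
            current = []
    return phrases
```

For the sentence "The Oxford Dictionary of English was licensed by CL Research for preposition sense analysis.", the first heuristic yields "The Oxford Dictionary of English" and "CL Research", while the second yields "oxford dictionary", "cl research", and "preposition sense analysis". The stop-word heuristic splits the title at "of", which is why join words make the capitalized-phrase heuristic the better approximation for names.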

  • Demos
    ...paste and then parse any sentence to see the parse tree results upon which further analysis is performed. The demo files include an HTML file describing the non-terminal and terminal part-of-speech nodes in the parse tree. Download and unzip the file and run the executable.
    XML Analyzer: The XML Analyzer was designed principally to investigate texts processed into XML format by the DIMAP text processor for question answering in TREC 2002. The full set of XML renderings for documents answering the TREC 2002 questions is available from NIST (further details upon request). A sample XML file is provided, along with instructions for examining particular aspects of the text.
    Alphabetic WordNet 3.0: An alphabetic version of WordNet was created to assist the construction of customized dictionaries using the WordNet distribution. See the full description of how the alphabetic version of WordNet 3.0 was created, then follow the link here to register your download.
    UMLS Specialist Lexicon: The UMLS Specialist Lexicon 2009 has been fully converted into alphabetical format as a DIMAP dictionary. The Specialist Lexicon of the Unified Medical Language System is designed for the specialized lexical needs of the medical community. This lexicon contains over 385,000 terms and was developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System. Alphabetic DIMAP dictionaries have been created for 371,000 main entries, as well as for a variants dictionary of 225,000 entries. These dictionaries provide comprehensive coverage of general English in addition to extensive coverage of biomedical terms. The data elements in the lexicon describe syntactic characteristics of each entry, including inflection codes, case, gender, syntactic category, complements for verbs and nouns, modification types for adverbs, and more. This lexicon was developed as a free, publicly available resource with only...

    Original URL path: http://www.clres.com/demos.html (2016-02-11)
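An alphabetic (word-based) version of WordNet, as described in this entry, essentially inverts WordNet's synset-based organization so that each word becomes an entry whose senses are the synsets containing it. Below is a minimal Python sketch of that inversion, assuming hypothetical in-memory `Synset` records (identifier, part of speech, member lemmas, gloss) instead of parsing the actual WordNet 3.0 database files; numbering senses in input order is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Synset(NamedTuple):
    synset_id: str      # hypothetical identifier
    pos: str            # part of speech, e.g. "n" or "v"
    lemmas: List[str]   # member words of the synset
    gloss: str


def alphabetize(synsets: List[Synset]) -> Dict[str, List[dict]]:
    """Invert synset-based records into word-based entries sorted by headword."""
    entries: Dict[str, List[dict]] = defaultdict(list)
    for syn in synsets:
        for lemma in syn.lemmas:
            senses = entries[lemma.lower()]
            senses.append({
                "sense": len(senses) + 1,   # numbered in input order (assumption)
                "pos": syn.pos,
                "synonyms": [other for other in syn.lemmas if other != lemma],
                "gloss": syn.gloss,
                "synset": syn.synset_id,
            })
    return dict(sorted(entries.items()))
```

A noun synset containing both "bank" and "depository financial institution" thus produces one sense under each headword, and each of those senses lists the other member as a synonym.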

  • Electronic Dictionaries
    ...5 onward. The size of the compressed files is approximately 20 MB; uncompressed, the size of the DIMAP dictionaries is 55 MB.
    UMLS Specialist Lexicon: The UMLS Specialist Lexicon 2012 has been fully converted into alphabetical format as a DIMAP dictionary. The Specialist Lexicon of the Unified Medical Language System is designed for the specialized lexical needs of the medical community. This lexicon contains over 462,000 terms and was developed to provide the lexical information needed for the SPECIALIST Natural Language Processing System. Alphabetic DIMAP dictionaries have been created for 444,000 main entries, as well as for a variants dictionary of 284,000 entries. These dictionaries provide comprehensive coverage of general English in addition to extensive coverage of biomedical terms. The data elements in the lexicon describe syntactic characteristics of each entry, including inflection codes, case, gender, syntactic category, complements for verbs and nouns, modification types for adverbs, and more. This lexicon was developed as a free, publicly available resource with only moderate restrictions (e.g., you can't claim it as your own). The DIMAP distribution includes an extensive help file that describes how each element of Specialist has been handled, along with the Perl scripts used to create the files uploaded into DIMAP.
    Alphabetic FrameNet Dictionary: The FrameNet 1.5 data have been converted into an alphabetic dictionary. This dictionary contains 11,053 entries: 8,568 entries for lexical items (many having multiple senses with different parts of speech) and 2,485 entries that encode the frames and frame relations. Details of these items can be found through the main FrameNet site. The help file accompanying the FrameNet Dictionary provides a more detailed description of the dictionary and how it was constructed.
    FrameNet Frame Element Dictionary: The FrameNet 1.3 frame-to-frame relations and frame element definitions have been analyzed to create a dictionary of frame elements (see details). This dictionary contains 1,004 entries with hypernymic links between frame elements that permit the creation of a frame element taxonomy. An online version also permits examination of this taxonomy. The distribution also includes files used in the creation of a MySQL database of the taxonomy.
    The Preposition Project Data: Data from The Preposition Project Online include a DIMAP dictionary of all English prepositions (November 2008), courtesy of Oxford University Press, containing much of the data and with disambiguated hypernymic relationships as used in the digraph analysis of preposition classes.
    The Preposition Project Corpora: This package contains three preposition corpora: (1) the training and test sets used in the SemEval-2007 task on preposition disambiguation, drawn from FrameNet (FN; 24,481 sentences); (2) a set of 7,650 sentences from the Oxford English Corpus (OEC), serving as examples for senses in the Oxford Dictionary of English (ODE); and (3) a set of 48,000 sentences from the written portion of the British National Corpus, drawn with the methodology used in the Corpus Pattern Analysis (CPA) project.

    Original URL path: http://www.clres.com/elec_dictionaries.html (2016-02-11)
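The Frame Element Dictionary's hypernymic links are what allow a frame element taxonomy to be built. Here is a minimal Python sketch, assuming each frame element maps to a list of its hypernym frame elements; the element names and links in `EXAMPLE_FE_HYPERNYMS` are hypothetical toy data, not entries from the 1,004-entry dictionary. The functions render the taxonomy as an indented tree and collect all ancestors of a given element.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy links: frame element -> its hypernym frame elements.
EXAMPLE_FE_HYPERNYMS: Dict[str, List[str]] = {
    "Speaker": ["Agent"],
    "Cognizer": ["Agent"],
    "Agent": ["Participant"],
    "Addressee": ["Recipient"],
    "Recipient": ["Participant"],
}


def build_children(hypernyms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert hyponym -> hypernym links into hypernym -> hyponym lists."""
    children: Dict[str, List[str]] = defaultdict(list)
    for fe, parents in hypernyms.items():
        for parent in parents:
            children[parent].append(fe)
    return children


def ancestors(fe: str, hypernyms: Dict[str, List[str]]) -> Set[str]:
    """All hypernyms reachable from `fe`; the `seen` set also guards against cycles."""
    seen: Set[str] = set()
    todo = list(hypernyms.get(fe, []))
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(hypernyms.get(parent, []))
    return seen


def render_tree(hypernyms: Dict[str, List[str]]) -> List[str]:
    """Indented lines, one per frame element, starting from elements with no hypernym."""
    children = build_children(hypernyms)
    nodes = set(hypernyms) | set(children)
    roots = sorted(n for n in nodes if not hypernyms.get(n))
    lines: List[str] = []

    def walk(node: str, depth: int, path: Set[str]) -> None:
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            if child not in path:  # skip links that would loop back
                walk(child, depth + 1, path | {child})

    for root in roots:
        walk(root, 0, {root})
    return lines
```

With the sample data, `render_tree` places "Participant" at the root with "Agent" and "Recipient" beneath it, and the ancestors of "Addressee" are "Recipient" and "Participant".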

  • DIMAP Implementation of MCCA
    ...tokens in each of the text groups in the input text, as well as a concordance of their uses (see list screen shot; see concordance screen shot).
    Words in Category: tokens in a specified text group that have been used at least a specified number of times; that is, a list of the name of each category, the tokens in that category meeting the cutoff restriction, and, for each token, its use percentage relative to the total number of tokens in the text group and its frequency (see screen shot).
    Word List: tokens in a specified text group that have been used at least a specified number of times, in decreasing frequency order; that is, a list consisting of each token's rank in the list, the token itself, its use percentage relative to the total number of tokens in the text group, its frequency, and its category number and name (see screen shot).
    E-Score High Categories: emphasis scores (E-scores) for those categories for which either an E-score for one of the text groups has an absolute value greater than 5.0 or the difference between the E-scores for two of the text groups is greater than 5.0. The 116 MCCA categories are grouped into 23 supercategories. The results include the percent of words in all the texts in the supercategories and the important categories, and summary statistics on the categories, including the mean and standard deviation of the E-scores for categories meeting the cutoff criterion (see screen shot).
    Selected Plots: plots of emphasis scores (E-scores) for those categories for which either an E-score for one of the text groups has an absolute value greater than 5.0 or the difference between the E-scores for any two of the text groups is greater than 5.0. These plots consist of arrows from a zero point to an approximate plus or minus point corresponding to the E-score for the category, allowing the E-scores from different texts to be compared (see screen shot).
    Difference Analysis: the difference in E-scores between any one of the text groups and all the others. This includes some summary word-accounting statistics for each text group: the total number of tokens and unique tokens, along with their percentages of all words in the text, broken down into tokens that were classified and tokens that were not classified (leftover), with percentages of the total number of tokens and unique tokens; the E-score mean and standard deviation over these categories; the low E-score, the high E-score, and the E-score range; and finally the E-score differences between the specified text and the others for the selected categories (those with scores or differences of 5.0 or more are printed) (see screen shot).
    Diagnostic Plots: 43 emphasis score (E-score) combinations that may be usable for analyzing a text group, plotted for easy comparison (see screen shot).
    Distance Matrix...
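The Word List output described above (tokens meeting a frequency cutoff, in decreasing frequency order, with rank, percentage of all tokens in the text group, frequency, and category) can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical word-to-category mapping in place of the 116 MCCA categories and a simple lowercase tokenizer; breaking frequency ties alphabetically and labeling unmapped words "leftover" are also assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, str, float, int, str]  # rank, token, percent, frequency, category


def word_list(text: str, category_of: Dict[str, str], cutoff: int = 2) -> List[Row]:
    """Tokens used at least `cutoff` times, most frequent first."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    kept = sorted(((w, n) for w, n in counts.items() if n >= cutoff),
                  key=lambda item: (-item[1], item[0]))  # ties broken alphabetically
    return [(rank, w, round(100.0 * n / total, 2), n, category_of.get(w, "leftover"))
            for rank, (w, n) in enumerate(kept, start=1)]
```

For the text "the cat saw the dog and the dog ran" with a cutoff of 2, only "the" (3 of 9 tokens, 33.33%) and "dog" (2 of 9 tokens, 22.22%) are listed.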

    Original URL path: http://www.clres.com/mcca.html (2016-02-11)
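The E-Score High Categories and Selected Plots outputs share one selection rule: keep a category if any text group's E-score has an absolute value greater than 5.0, or if the E-scores of any two text groups differ by more than 5.0. Below is a minimal Python sketch of that filter and of the per-group mean and standard deviation over the selected categories, assuming the E-scores have already been computed (their derivation is not described on this page); the category names and values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# escores[category][text_group] -> E-score (hypothetical, precomputed values).
EScores = Dict[str, Dict[str, float]]


def high_categories(escores: EScores, threshold: float = 5.0) -> List[str]:
    """Categories where some |E| > threshold or some pairwise difference > threshold."""
    selected = []
    for category, by_group in escores.items():
        values = list(by_group.values())
        extreme = any(abs(v) > threshold for v in values)
        spread = any(abs(a - b) > threshold for a, b in combinations(values, 2))
        if extreme or spread:
            selected.append(category)
    return selected


def group_summary(escores: EScores, categories: List[str], group: str) -> Tuple[float, float]:
    """Mean and population standard deviation of one group's E-scores."""
    values = [escores[c][group] for c in categories]
    return mean(values), pstdev(values)
```

With scores of 6.2 and 1.0 for one category and 3.0 and -2.5 for another, both are selected (the first by absolute value, the second by a difference of 5.5), while a category scoring 2.0 and 4.0 is not.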

  • Dictionary Analysis Services
    ...Ken Litkowski.
    Thesaurus Development: A thesaurus contains synonyms and broader-than and narrower-than terms. With DIMAP, you or CL Research can parse a set of dictionary definitions to identify how different entries relate to one another. The amount of effort depends, of course, on the size of the dictionary. As a guide, processing Webster's 2nd International Dictionary, containing 120,000 headwords and 270,000 definitions, took approximately 40 hours, much of which was background processing. Familiarization may require additional time. To create the thesaurus yourself, you will need to put your dictionary entries into the format used to upload them into DIMAP; the file format is described in the help file provided with the experimental DMP3A. If you are unable to create the entries directly, CL Research will provide the C source code for a program that can be run against a marked-up ASCII file. Alternatively, CL Research will modify the program to meet your format for 200. Once the data are in the proper format for uploading into DIMAP dictionaries, the experimental DMP3A can be used with a couple of menu selections to create the dictionaries. Parsing the definitions and creating the thesaurus require only a few more menu and dialog selections. After the DIMAP dictionaries are created, they will be suitable for more extended thesaural and semantic relations as DMP3A is developed further. If you require further assistance, CL Research can customize DMP3A to meet your needs; please inquire.
    Ontology Development: An ontology is an organization of concepts in relation to one another, most specifically a categorization of entities and actions. A full ontology may deal with all knowledge, but it is possible to construct an ontology for a single field of study. The main organizing principle of an ontology is the ISA backbone: a horse is an...

    Original URL path: http://www.clres.com/services.html (2016-02-11)
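Thesaurus development from parsed definitions rests on the observation that the genus word of a definition is a broader term for its headword. Here is a deliberately crude Python sketch of that idea, assuming definitions of the form "a [modifiers] genus [that/with/of ...]", a hypothetical `BOUNDARIES` word list, and the rule that a one-word definition marks a synonym. It stands in for illustration only and is far simpler than parsing with DIMAP's Proximity Parser.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ARTICLES = {"a", "an", "the"}
# Hypothetical words that usually end the genus phrase of a definition.
BOUNDARIES = {"that", "which", "who", "of", "with", "for", "in", "used", "having", "or", "and"}

Relations = Dict[str, List[str]]


def words_of(definition: str) -> List[str]:
    return re.findall(r"[a-z]+", definition.lower())


def genus(definition: str) -> Optional[str]:
    """Last word before the first boundary word, after skipping a leading article."""
    words = words_of(definition)
    if words and words[0] in ARTICLES:
        words = words[1:]
    head: List[str] = []
    for w in words:
        if w in BOUNDARIES:
            break
        head.append(w)
    return head[-1] if head else None


def build_thesaurus(definitions: Dict[str, str]) -> Tuple[Relations, Relations, Relations]:
    """Return (synonyms, broader, narrower) derived from genus words."""
    synonyms: Relations = defaultdict(list)
    broader: Relations = defaultdict(list)
    narrower: Relations = defaultdict(list)
    for word, definition in definitions.items():
        g = genus(definition)
        if g is None:
            continue
        if len(words_of(definition)) == 1:  # one-word definition: treat as a synonym
            synonyms[word].append(g)
            synonyms[g].append(word)
        else:
            broader[word].append(g)
            narrower[g].append(word)
    return synonyms, broader, narrower
```

Given "puppy: a young dog", "dog: a domesticated animal that barks", "horse: a large animal with hooves", and "hound: dog", the sketch records dog as broader than puppy, animal as broader than dog and horse, and hound and dog as synonyms.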


