CLRES.COM (archived pages)

  • DIMAP Implementation of MCCA
    Research Implementation of Minnesota Contextual Content Analysis. C Score Plots (Raw): plots of the raw context scores (C scores) for the text groups for each context type (Traditional, Practical, Emotional ...)

    Original URL path: http://www.clres.com/mcca.php?show=mcca-csrp (2016-02-11)


  • DIMAP Implementation of MCCA
    ... Content Analysis. C Score Distance Matrix: a Euclidean distance between the context scores of each pair of text groups in the input file. Texts that are more similar to one ...
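The distance computation described in this excerpt is straightforward to reproduce. Below is a minimal sketch, not the DIMAP/MCCA code; the group names and score values are invented, and only the pairwise Euclidean distance idea is taken from the excerpt:

```python
# Pairwise Euclidean distances between context-score vectors,
# one vector per text group (one value per context type).
from itertools import combinations
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical C scores per group (e.g. Traditional, Practical, Emotional, ...).
c_scores = {
    "group_a": [0.42, 0.31, 0.18, 0.09],
    "group_b": [0.40, 0.28, 0.22, 0.10],
    "group_c": [0.15, 0.35, 0.30, 0.20],
}

# Build the symmetric distance matrix as a nested dict.
distances = {g: {} for g in c_scores}
for g1, g2 in combinations(c_scores, 2):
    d = euclidean(c_scores[g1], c_scores[g2])
    distances[g1][g2] = distances[g2][g1] = d

for g1 in distances:
    print(g1, {g2: round(d, 3) for g2, d in distances[g1].items()})
```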

    Original URL path: http://www.clres.com/mcca.php?show=mcca-csdist (2016-02-11)

  • MRDs in WSD for Senseval-2
    ... in NODE, there were 13 entries where bar was the first word in an MWU and 50 entries where it was the head noun; for begin, there was only one entry. For the all-words texts, a list was made of all the task words to be disambiguated (including some phrases) and a subdictionary constructed from this list. For both tasks, the creation of these subdictionaries was fully automatic; no hand manipulation was involved. The NODE dictionaries were then mapped into the WordNet dictionaries (see Litkowski, 1999) using overlap among words and semantic relations. The 73 dictionaries for the lexical sample words gave rise to 1372 WordNet entries and 1722 NODE entries. Only 491 entries were common, i.e., no mappings were available for the remaining 1231 NODE entries; 881 entries in WordNet were therefore inaccessible through NODE. For the entries in common, there was an average of 5.6 senses, of which only 64 percent were mappable into WordNet. The a priori probability of successful mapping into the appropriate WordNet sense is 0.064, the baseline for assessing WSD via another dictionary mapped into the WordNet sense-tagged keys.

2 Disambiguation Techniques. The lexical sample and all-words texts were modified slightly. Satellite tags were removed and entity references were converted to an ASCII character. In the all-words texts, contraction and quotation-mark discontinuities were undone. These changes made the texts more like normal text-processing conditions. The texts were next reduced to sentences. For the lexical sample, a sentence was assumed to consist of a single line. For the all-words texts, a sentence splitter identified the sentences, which were next submitted to the parser. The DIMAP parser produced a parse tree for each sentence, with constituent phrases when the sentence was not parsable with the grammar, allowing the WSD phase to continue. The first step in the WSD used the part of speech of the tagged word to select the appropriate sense inventory. Nouns, verbs, and adjectives were looked up in the phrase dictionary; if the tagged word was part of an MWU, the word was changed to the MWU and the MWU's sense inventory was used instead. The dictionary entry for the word was then accessed. Before evaluating the senses, the topic area of the context provided by the sentence was established (only for NODE): subject labels for all senses of all content words in the context were tallied. Each sense of the target was then evaluated. Senses in a different part of speech were dropped from consideration. The different pieces of information in the sense were assessed: collocation patterns, contextual clue words, contextual overlap with definitions and examples, and topical area matches. Points were given to each sense, and the sense with the highest score was selected; in case of a tie, the first sense in the dictionary was selected. Collocation pattern testing, requiring an exact match with surrounding text, was given the largest number of points (10), sufficient in general ...
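The scoring scheme in this excerpt lends itself to a short illustration. The sketch below is not the DIMAP WSD code: only the 10-point weight for collocation matches comes from the text; the other weights, the field names (collocations, clue_words, definition_words, subject_labels), and the data shapes are assumptions.

```python
# Score each candidate sense on several kinds of evidence and pick the
# highest-scoring sense, falling back to dictionary order on ties.
def score_sense(sense, context):
    """Heuristic score for one sense given the sentence context.

    `sense` is a hypothetical dict of sense information; `context` holds
    the sentence text, its content words, and a topic-label tally.
    """
    score = 0
    # Exact collocation match with the surrounding text carries most weight (10).
    if any(pat in context["text"] for pat in sense.get("collocations", [])):
        score += 10
    # Contextual clue words found among the sentence's content words (assumed weight).
    score += 2 * len(set(sense.get("clue_words", [])) & context["content_words"])
    # Overlap between the sense's definition/examples and the sentence (assumed weight).
    score += len(set(sense.get("definition_words", [])) & context["content_words"])
    # Topic-area match against subject labels tallied over the context.
    score += sum(context["topic_tally"].get(lbl, 0)
                 for lbl in sense.get("subject_labels", []))
    return score

def choose_sense(senses, context):
    """Pick the best sense; ties resolve to the earliest dictionary sense."""
    best, best_score = None, -1
    for sense in senses:                 # senses are in dictionary order
        s = score_sense(sense, context)
        if s > best_score:               # strict '>' keeps the first sense on ties
            best, best_score = sense, s
    return best
```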

    Original URL path: http://www.clres.com/senseval2.html (2016-02-11)

  • Syntactic Clues and Lexical Resources in Question-Answering
    ... dictionaries, during which most of the raw data were put into specific fields of a DIMAP dictionary (e.g., headword, part of speech, definitions, example usages) along with many features characterizing syntactic properties and other information, particularly a link to Macquarie's thesaurus and identification of a derivational link for undefined words to their root form. After conversion and upload, the entire dictionary of 270,000 definitions was parsed. The purpose of the parsing was to populate the raw dictionary data by adding semantic relations (links with other words). The most important result was the identification of the hypernyms of each sense. Other relations include synonyms discernible in the definitions, typical subjects and objects for verbs, and various semantic components such as manner, purpose, location, class membership, and class inclusion. This dictionary, accessed during the question-answering process, is thus similar in structure to MindNet (Richardson, 1997).

The Macquarie thesaurus was provided in the form of a list of the words belonging to 812 categories, which are broken down into paragraphs (3 or 4 for each part of speech) and subparagraphs. A subparagraph contains about 10 words that are generally synonymous. We were also provided (Green, 2000) with a set of perl scripts for inverting the thesaurus data into alphabetical order, where each word or phrase was listed along with the number of entries for each part of speech and an entry for each distinct sense identifying the category, paragraph, and subparagraph to which the word or phrase belongs. The resultant thesaurus is thus in the precise format of the combined WordNet index and data files (Fellbaum, 1998), facilitating thesaurus lookup.

3.5 Question Answering Routines. For TREC-9, a database of documents was created for each question, as provided by the NIST generic search engine. A single database was created for the questions themselves. The question answering consisted of matching the database records for an individual question against the database of documents for that question. The question-answering phase consists of four main steps: (1) coarse filtering of the records in the database to select potential sentences; (2) detailed analysis of the question to set the stage for detailed analysis of the sentences according to the type of question, establishing an initial score of 1000 for each sentence; (3) extracting possible short answers from the sentences, with some adjustments to the score based on matches between the question and sentence database records and the short answers that have been extracted; and (4) making a final evaluation of the match between the question's key elements and the short answers to arrive at a final score for the sentence. The sentences and short answers were then ordered by decreasing score for creation of the answer files submitted to NIST.

3.5.1 Coarse Filtering of Sentences. The first step in the question-answering phase was the development of an initial set of sentences. The discourse entities in the question records were used to filter the records in the document database. Since a discourse entity in a record could be a multiword unit (MWU), the initial filtering used all the individual words in the MWU. The question and sentence discourse entities were generally reduced to their root form, so that issues of tense and number were eliminated. In addition, all words were reduced to lowercase, so that issues of case did not come into play during this filtering step. Finally, it was not necessary for the discourse entity in the sentence database to have a whole word matching a string from the question database. Thus, in this step, all records were selected from the document database having a discourse entity that contained a substring that was a word in the question discourse entities.

MWUs were analyzed in some detail to determine their type and to separate them into meaningful named entities. We examined the capitalization pattern of a phrase and whether particular subphrases were present in the Macquarie dictionary. We identified phrases such as Charles Lindbergh as a person (and hence possibly referred to as Lindbergh), President McKinley as a person with a title (since president is an uncapitalized word in the Macquarie dictionary), and Triangle Shirtwaist fire as a proper noun followed by a common noun (hence looking for either Triangle Shirtwaist or fire as discourse entities). The join between the question and document databases produced an initial set of unique document number/sentence number pairs that were passed to the next step.

3.5.2 Identification of Key Question Elements. As indicated above, one record associated with each question contained an unbound variable as a discourse entity. The type of variable was identified when the question was parsed, and this variable was used to determine which type of processing was to be performed. The question-answering system categorized questions into six types, usually with typical question elements: (1) time questions (when); (2) location questions (where); (3) who questions (who or whose); (4) what questions (what or which, used alone or as question determiners); (5) size questions (how followed by an adjective); and (6) number questions (how many). Other question types not included above, principally why questions or non-questions beginning with verbs ('name the'), were assigned to the what category, so that question elements would be present for each question.

Some adjustments to the questions were made. There was a phase of consolidating triples, so that contiguous named entities were made into a single triple. Then it was recognized that questions like 'what was the year', 'what was the date', and 'what was the number' were not what questions, but rather time or number questions. Questions containing the phrase 'who was the author' were converted into 'who wrote'; in those with 'what is the name of', the triple for name was removed, so that the words in the of phrase would be identified as the principal noun. Other phraseological variations of questions are likely and could be made at this stage.

Once the question type had been determined and the initial set of sentences selected, further processing took place based on the question type. Key elements of the question were determined for each question type, with some specific processing based on the particular question type. In general, we determined the key noun, the key verb, and any adjective modifier of the key noun for each question type. For who questions, we looked for a year restriction. For where questions, we looked up the key noun in the Macquarie dictionary and identified all proper nouns in all its definitions, hence available for comparison with short answers or other proper nouns in a sentence. For what questions, we looked for a year restriction, noted whether the answer could be the object of the key verb, and formed a base set of thesaurus categories for the key noun. For size questions, we identified the size word (e.g., far in 'how far'). For number questions, we looked for a year restriction.
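A small sketch of the six-way categorization described above may help. It is not the CL Research code; the specific string tests and the blanket fallback to the what category are simplifications of what the excerpt describes.

```python
# Assign one of the six question types, defaulting to 'what'.
def question_type(question):
    q = question.lower().strip()
    if q.startswith("when"):
        return "time"
    if q.startswith("where"):
        return "location"
    if q.startswith(("who ", "whose ", "who's")):
        return "who"
    if q.startswith("how many"):
        return "number"
    if q.startswith("how ") and q.split()[1] not in {"do", "does", "did", "is", "was"}:
        return "size"            # 'how' + adjective (how far, how tall, ...)
    if q.startswith(("what", "which")):
        # Re-typed per the adjustments described above.
        if "the year" in q or "the date" in q:
            return "time"
        if "the number" in q:
            return "number"
        return "what"
    return "what"                # why questions, 'name the ...', etc.

print(question_type("How far is it from Denver to Aspen?"))   # size
print(question_type("Who was the first American in space?"))  # who
```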
3.5.3 Extraction of Short Answers. After the detailed question analysis, processing for each question then examined each selected sentence, attempting to find a viable short answer and giving scores for various characteristics of the sentence. For time, location, size, and number questions, it was possible that a given sentence contained no information of the relevant type; in such cases, the sentence could be completely eliminated. In general, however, a data structure for a possible answer was initialized to hold a 50-byte answer, and the sentence was assigned an initial score of 1000. An initial adjustment to the score was given for each sentence by comparing the question discourse entities (including subphrases of MWUs) with the sentence discourse entities, giving points for their presence and additional points when the discourse entities stood in the same semantic relation and had the same governing word as in the question.

(1) Time Questions. The first criterion applied to a sentence was whether it contained a record that has a TIME semantic relation. The parser has specific mechanisms for recognizing prepositional phrases of time or other temporal expressions (e.g., 'last Thursday'). During the analysis of the parser output, the database records created for these expressions were given a TIME semantic relation. We also examined triples containing in or on as the governing word, looking for phrases like 'on the 21st' (which may not have been characterized as a TIME phrase) or numbers that could conceivably be years. After screening the database for such records, the discourse entity of such a record was then examined further. If the discourse entity contained an integer, or any of its words were marked in the parser's dictionary as representing a time period, measurement time, month, or weekday, the discourse entity was selected as a potential answer.

(2) Where Questions. Each sentence was examined for the presence of in, at, on, of, or from as a semantic relation, or the presence of a capitalized word, not present in the question, modifying the key noun. The discourse entity for that record was selected as a potential answer. Discourse entities from of triples were slightly disfavored and given a slight decrease in score. If the answer also occurred in a triple as a governing word with a HAS relation, the discourse entity from that triple was inserted into the answer as a genitive determiner of the answer.

(3) Who Questions. The first step in examining each sentence looked for the presence of appositives, relative clauses, and parentheticals. If a sentence contained any of these, an array was initialized to record its modificand and span. The short answer was initialized to the key noun. Next, all triples of the sentence were examined. First, the discourse entity (possibly an MWU) was examined to determine the overlap between it and the question discourse entities. The number of hits was then added to all appositives that include the word position of the discourse entity within their span. A sentence could have nested appositives, so the number of hits can be recorded in multiple appositives. The next set of steps involved looking for triples whose governing word matched the key verb, particularly the copular be and the verb write. For copular verbs, if the key noun appeared as the subject, the answer was the object, and vice versa. For other verbs, we looked for objects matching the key noun, then taking the subject of the verb as the answer. (A test was included here for examining whether the key noun is in the definition, a hypernym, or thesaurus category of the discourse entity, but this was not tested and was removed when the system was frozen.) Another major test of each discourse entity that contained a substring matching the key noun was whether it was modified by an appositive. If this was the case, the appositive was taken as a possible short answer; the discourse entities of the appositive were then concatenated into a short answer. Numerical and time discourse entities were also examined, when there was a date restriction specified in the question, to ascertain if they could be years and, if so, whether they matched the year restriction. In the absence of a clear sentence year specification, the document date was used.

(4) What Questions. The first step in examining the sentences was identical to that of the who questions, namely, looking for appositives in the sentence and determining whether a discourse entity had overlaps with question discourse entities. If the key noun was a part of a discourse entity, we would note the presence of the key noun; if this occurrence was in a discourse entity identified as an adjective modifier, the modificand was taken as a short answer, and if this short answer was itself a substring of another sentence discourse entity, the fuller phrase was taken as the answer. Similarly, when the key noun was a proper part of a discourse entity and began the phrase (i.e., a noun-noun compound), the remaining part was taken as the short answer. As with who questions, if the key noun was identified as the modificand of an appositive, the appositive was taken as the possible answer. Similarly to who questions, we also looked for the copular be with the key noun as either the subject or object, taking the other as a possible answer. When the key verb was have and the key noun was equal to the object, the subject of have was taken as the short answer. In cases like these, we would also insert any adjective modifiers of the noun discourse entities at the beginning of the short answer. If the key noun was not equal to the discourse entity of the triple being examined, we tested the key noun against the DIMAP-enhanced Macquarie dictionary, looking for its presence (1) in the definition of the discourse entity, (2) as a hypernym of the discourse entity, or (3) in the same Macquarie thesaurus category. For example, in examining Belgium in response to the question 'what country', where country is not in the definition and is not a hypernym (since Belgium is defined as a kingdom), we would find that country and kingdom are in the same thesaurus category. Finally, as with who questions, we examined TIME and number discourse entities for the possible satisfaction of year restrictions.

(5) Size Questions. For these questions, each triple of a selected sentence was examined for the presence of a NUM semantic relation or a discourse entity containing a digit. If a sentence contained no such triples, it was discarded from further processing. Each numerical discourse entity was taken as a possible short answer in the absence of further information. However, since a bare number was not a valid answer, we looked particularly for the presence of a measurement term associated with the number. This could be either a modificand of the number or part of the discourse entity itself, joined by a hyphen. If the discourse entity was a tightly joined number and measurement word or abbreviation (e.g., '6ft'), the measurement portion was separated out for lookup. The parsing dictionary characterizes measurement words as having a measures (unit), MEASIZE, or abbr part of speech, so the modificand of the number was tested against these. If not so present in the parsing dictionary, the Macquarie definition was examined for the presence of the word unit. When a measurement word was identified, it was concatenated with the number to provide the short answer.
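The size-question handling just described (separating a measurement word from a tightly joined number such as '6ft', or taking it from the modificand) can be shown with a short sketch. The MEASURE_WORDS set and the function name are hypothetical stand-ins for the parsing dictionary's measure/MEASIZE/abbr classes and the Macquarie fallback.

```python
# Turn a numeric discourse entity into a 'number + unit' size answer.
import re

MEASURE_WORDS = {"ft", "feet", "miles", "km", "meters", "pounds", "kg"}

def size_answer(entity, modificand=None):
    """Return 'number unit' if a measurement term can be found, else None."""
    m = re.match(r"(\d[\d,.]*)\s*-?\s*([A-Za-z]*)", entity)
    if not m:
        return None                        # no digit: not a size answer
    number, attached = m.groups()
    if attached.lower() in MEASURE_WORDS:  # tightly joined form, e.g. '6ft'
        return f"{number} {attached}"
    if modificand and modificand.lower() in MEASURE_WORDS:
        return f"{number} {modificand}"    # unit appears as the modificand
    return None                            # bare number: not a valid answer

print(size_answer("6ft"))                  # 6 ft
print(size_answer("4,400", "miles"))       # 4,400 miles
```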
(6) Number Questions. The same criterion as used in size questions was applied to a sentence, to see whether it contained a record that has a NUM semantic relation. If a selected sentence had no such triples, it was effectively discarded from further analysis. In sentences with NUM triples, the number itself (the discourse entity) was selected as the potential answer. Scores were differentially applied to these sentences, so that those triples where the number modified a discourse entity equal to the key noun were given the highest number of points. TIME and NUM triples potentially satisfying year specifications were also examined to see whether a year restriction was met. In the absence of a clear sentence year specification, the document date was used.

3.5.4 Evaluation of Sentence and Short Answer Quality. After all triples of a sentence were examined, the quality of the sentences and short answers was further assessed. In general, for each question type, we assessed the sentence for the presence of the key noun, the key verb, and any adjective qualifiers of the key noun. The scores were increased significantly if these key items were present and decreased significantly if not. In the absence of a clear sentence year specification for who, what, and number questions containing a year restriction, the document date was used.

For certain question types, there were additional checks and possible changes to the short answers. For location questions, where we accumulated a set of proper nouns found in the definition of the key noun, the score for a sentence was incremented for the presence of those words in the sentence. Proper nouns were also favored: if two answers were found, a proper noun would replace a common noun, and proper nouns also present as proper nouns in the Macquarie dictionary were given additional points. Similarly, if a sentence contained several prepositional phrases, answers from in phrases replaced those from of or from phrases. For questions in which the key verb was not be, we tested the discourse entities of the sentence against the DIMAP-enhanced Macquarie dictionary to see whether they were derived from the key verb (e.g., assassination derived from assassinate). For who and what questions, when a sentence contained appositives and satisfactory short answers were not constructed, we examined the number of hits for all appositives. In general, we would construct a short answer from the modificand of the appositive with the greatest number of hits; however, if one appositive was nested inside another and had the same number of hits, we would take the inside appositive. For these questions, we also gave preference to short answers that were capitalized; this distinguished short answers that were mixed in case. For these two question types, we also performed an anaphora resolution if the short answer was a pronoun. In these cases, we worked backward from the current sentence until we found a possible proper-noun referent. As we proceeded backwards, we also worked from the last triple of each sentence. If we found a plausible referent, we used that discourse entity as the short answer and the sentence in which it occurred as the long answer, giving it the same score as the sentence in which we found the pronoun. For size questions, we deprecated sentences in which we were unable to find a measurement word. We also looked for cases in which the discourse entities in several contiguous triples had not been properly combined, such as a number containing ...
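The pronoun step above (working backward through prior sentences, from the last triple of each, until a proper-noun referent is found) is sketched below under simplifying assumptions; the data layout and the crude capitalization test are illustrative only, not the CL Research implementation.

```python
# Backward search for a proper-noun referent when the short answer is a pronoun.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def resolve_pronoun(answer, sentences, current_index):
    """sentences[i] is a list of discourse-entity strings (one per triple),
    in sentence order; return (referent, sentence_index) or None."""
    if answer.lower() not in PRONOUNS:
        return None
    for i in range(current_index - 1, -1, -1):   # previous sentences first
        for entity in reversed(sentences[i]):     # last triple first
            if entity[:1].isupper():              # plausible proper noun
                return entity, i
    return None

doc = [["Charles Lindbergh", "the Atlantic"], ["the flight"], ["He"]]
print(resolve_pronoun("He", doc, 2))   # ('Charles Lindbergh', 0)
```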

    Original URL path: http://www.clres.com/trec9.html (2016-02-11)

  • Question-Answering Using Semantic Relation Triples
    ... in an annotation, such as number and case, being added to the growing parse tree. Nodes, and possibly further annotations such as potential attachment points for prepositional phrases, are added to the parse tree when reaching some end states. The parser is accompanied by an extensible dictionary containing the parts of speech and, frequently, other information associated with each lexical entry. The dictionary information allows for the recognition of phrases as single entities and uses 36 different verb government patterns to create dynamic parsing goals and to recognize particles and idioms associated with the verbs (the context-sensitive portion of the parser). The parser output consists of bracketed parse trees, with leaf nodes describing the part of speech and lexical entry for each sentence word. Annotations, such as number and tense information, may be included at any node. The parser does not always produce a correct parse, but is very robust, since the parse tree is constructed bottom-up from the leaf nodes, making it possible to examine the local context of a word even when the parse is incorrect. In TREC-8, the parse output was unusable for only 526 of the 63,118 sentences (0.8 percent). Usable output was available despite the fact that there was at least one word unknown to the parsing dictionary in 5,027 sentences (8.0 percent).

3.3 Document and Question Database Development. The key step in the CL Research question-answering prototype was the analysis of the parse trees to extract semantic relation triples and populate the databases used to answer the question. A semantic relation triple consists of a discourse entity, a semantic relation which characterizes the entity's role in the sentence, and a governing word to which the entity stands in the semantic relation.

In general terms, the CL Research system is intended to be part of a larger discourse analysis processing system. The most significant part of this system is a lexical cohesion module, intended to explore the observation that, even within short texts of 2 or 3 sentences, the words induce a reduced ontology, i.e., a circumscribed portion of a semantic network such as WordNet or MindNet. The objective is to tie together the elements of a discourse (in this case, a document) using lexical chains and coreference to create a hierarchical characterization of a document. The implementation in TREC-8 does not attain this objective, but does provide insights for further development of a lexical cohesion module.

The first step of this discourse processing is the identification of suitable discourse entities. For TREC-8, this involved analyzing the parse tree nodes to extract numbers, adjective sequences, possessives, leading noun sequences, ordinals, time phrases, predicative adjective phrases, conjuncts, and noun constituents as discourse entities. To a large extent, these entities include as subsets named entities and time expressions as single entities, although not specifically identified as such in the databases. The semantic relations in which entities participate are intended to capture the semantic roles of the entities, as generally understood in linguistics. This includes such roles as agent, theme, location, manner, modifier, purpose, and time. For TREC-8, we did not fully characterize the entities in these terms, but generally used surrogate placeholders. These included SUBJ, OBJ, TIME, NUM, ADJMOD, and the prepositions heading prepositional phrases. The governing word was generally the word in the sentence that the discourse entity stood in relation to. For SUBJ, OBJ, and TIME, this was generally the main verb of the sentence. For prepositions, the governing word was generally the noun or verb that the prepositional phrase modified; because of the context-sensitive dynamic parsing goals that were added when a verb was recognized, it was possible to identify what was modified. For the adjectives and numbers, the governing word was generally the noun that was modified. The semantic relation and the governing word were not identified for all discourse entities, but a record for each entity was still added to the database for the sentence.
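The triple records described here can be pictured with a small sketch. The class and field names below are assumptions (the excerpt does not give the record layout), and the document identifier and sentence are invented.

```python
# One semantic-relation-triple record: a discourse entity, the relation
# marking its role, and the word it is governed by, keyed to its sentence.
from dataclasses import dataclass

@dataclass
class Triple:
    document: str    # document number
    sentence: int    # sentence number within the document
    entity: str      # discourse entity (possibly a multiword unit)
    relation: str    # surrogate role: SUBJ, OBJ, TIME, NUM, ADJMOD, or a preposition
    governor: str    # governing word (e.g. the main verb for SUBJ/OBJ/TIME)

# e.g. for a sentence like "Lindbergh flew across the Atlantic in 1927":
triples = [
    Triple("DOC-0001", 4, "Lindbergh", "SUBJ", "flew"),
    Triple("DOC-0001", 4, "the Atlantic", "across", "flew"),
    Triple("DOC-0001", 4, "1927", "TIME", "flew"),
]
```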
Overall, 467,889 semantic relation triples were created in parsing the 63,118 sentences, an average of 7.4 triples per sentence. The same functionality was used to create database records for the 200 questions. The same parse tree analysis was performed to create a set of records for each question. The only difference is that one semantic relation triple for the question contained an unbound variable as a discourse entity. The question database contained 891 triples for 196 questions, an average of 4.5 triples per question.

3.4 Question Answering Routines. For TREC-8, a database of documents was created for each question, as provided by the NIST generic search engine. A single database was created for the questions themselves. The question answering consisted of matching the database records for an individual question against the database of documents for that question. The question-answering phase consists of three main steps: (1) coarse filtering of the records in the database to select potential sentences; (2) more refined filtering of the sentences according to the type of question; and (3) scoring the remaining sentences based on matches between the question and sentence database records. The sentences were then ordered by decreasing score for creation of the answer file submitted to NIST.

3.4.1 Coarse Filtering of Sentences. The first step in the question-answering phase was the development of an initial set of sentences. The discourse entities in the question records were used to filter the records in the document database. Since a discourse entity in a record could be a multiword unit (MWU), the initial filtering used all the individual words in the MWU. The question and sentence discourse entities were generally reduced to their root form, so that issues of tense and number were eliminated. In addition, all words were reduced to lowercase, so that issues of case did not come into play during this filtering step. Finally, it was not necessary for the discourse entity in the sentence database to have a whole word matching a string from the question database. Thus, in this step, all records were selected from the document database having a discourse entity that contained a substring that was a word in the question discourse entities. The join between the question and document databases produced an initial set of unique document number/sentence number pairs that were passed to the next step.
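A sketch of this coarse filter follows, mirroring the Triple record sketch above with a local namedtuple so it stands alone. Lowercasing and substring containment are as described in the excerpt; root-form reduction is omitted, and all names are assumptions rather than the CL Research code.

```python
# Keep every (document, sentence) whose discourse entity contains one of
# the question's words as a substring.
from collections import namedtuple

Triple = namedtuple("Triple", "document sentence entity relation governor")

def coarse_filter(question_triples, document_triples):
    q_words = set()
    for q in question_triples:
        for w in q.entity.lower().split():        # MWUs contribute each word
            q_words.add(w)
    selected = set()
    for t in document_triples:
        entity = t.entity.lower()
        if any(w in entity for w in q_words):     # substring, not whole-word
            selected.add((t.document, t.sentence))
    return selected                               # unique (doc, sentence) pairs

q = [Triple("Q001", 0, "Charles Lindbergh", "SUBJ", "fly")]
d = [Triple("D1", 3, "Lindbergh's plane", "SUBJ", "landed"),
     Triple("D1", 7, "the weather", "SUBJ", "was")]
print(coarse_filter(q, d))    # {('D1', 3)}
```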
3.4.2 Refinement of Viable Sentences. The second step of the question-answering process applied more detailed screening of the sentences. This screening involved the application of criteria based on the type of question. As indicated above, one record associated with each question contained an unbound variable as a discourse entity. The type of variable was identified when the question was parsed, and this variable was used to determine which type of processing was to be performed during the sentence refinement step. The prototype system recognized six question types, usually with typical question elements: (1) time questions (when); (2) location questions (where); (3) who questions (who or whose); (4) what questions (what or which, used alone or as question determiners); (5) size questions (how followed by an adjective); and (6) number questions (how many). Question phraseology not envisioned during the prototype development, principally questions beginning with why or non-questions beginning with 'name the', was assigned to the what category, so that question elements would be present for each question. Some adjustments to the question type were made just prior to the refined filtering. Specifically, it was recognized that questions like 'what was the year', 'what was the date', and 'what was the number' were not what questions, but rather time or number questions. Other phraseological variations of questions are likely and could be made at this stage. In general, the functionality for the screening step involved elimination of sentences from further processing based on the criteria described below, initialization of the data structure for holding a 50-byte answer, and assignment of an initial score of 1000 for each sentence; the number of viable sentences was also limited.

Time Questions: The first criterion applied to a sentence was whether it contained a record that has a TIME semantic relation. The parser has specific mechanisms for recognizing prepositional phrases of time or other temporal expressions (e.g., 'last Thursday'). During the analysis of the parser output, the database records created for these expressions were given a TIME semantic relation. After screening the database for such records, the discourse entity of such a record was then examined further. If the discourse entity contained an integer, or any of its words were marked in the parser's dictionary as representing a time period, measurement time, month, or weekday, the discourse entity was selected as a potential answer.

Where Questions: Each sentence was examined for the presence of in as a semantic relation. The discourse entity for that record was selected as a potential answer.

Who Questions: There was no elimination of sentences for these questions; all sentences were continued to the next step. A potential answer was developed by searching for a record that had the same governing word as that of the unbound variable. For example, 'who created' would show create as the governing word; a match would be sought for a sentence record with create as the governing word. The head noun of the discourse entity would be the potential answer.

What Questions: There was no elimination of sentences for these questions; all sentences were continued to the next step. A potential answer was developed by searching for a record that had the same governing word as that of the unbound variable. The discourse entity would be the potential answer.

Size Questions: The first criterion applied to a sentence was whether it contained a record that has a NUM semantic relation. The parser has specific mechanisms for recognizing numbers. During the analysis of the parser output, the database records created for these expressions were given a NUM semantic relation. If these expressions were followed by a noun, the noun would be captured as the governing word. After screening the database for NUM records, the governing word of such a record was then examined further. If any of the words of the discourse entity were marked in the parser's dictionary as representing a measure, a unit, or a measurement size, the discourse entity, a space, and the governing word were constructed as a potential answer.
Number Questions: The same criterion as used in size questions was applied to a sentence, to see whether it contained a record that has a NUM semantic relation. In these cases, the number itself (the discourse entity) was selected as the potential answer.

3.4.3 Sentence Scoring. Each sentence that passed the screening process of the previous step was assigned a base score of 1000 and was then evaluated for further correspondences to the question database records. Each record of the question database was examined in relation to each record for the sentence in the document database. Points were added or deducted based on correspondences. If the discourse entity in the sentence record is a proper or complete substring of the discourse entity in the question record, 5 points are added when the semantic relation or governing word match completely; five points are deducted if the match is not complete. If the question discourse entity is an MWU, each word of the MWU is examined against the discourse entity in the sentence record. If a word in the MWU is a substring of the sentence discourse entity, 5 points are added to the score. If the last word of the MWU is a substring of the sentence discourse entity (generally corresponding to the head noun of the MWU), 20 points are added. When we have a substring match, we also test the semantic relation and the governing word of the two records, adding 5 points for each match. In general, then, points are added because of matches in the semantic relation and governing word fields, but only when there is at least a partial match between the discourse entities of the two records. Thus, the focus of the matching is on the structural similarity between the question records and the sentence records, i.e., on whether the discourse entities participate in the same type of semantic relations with the same governing word. Many of the sentences passed to this step will have minimal changes to their scores, while those that match on structural similarity will tend to separate out relative to other sentences in the documents. After scores have been computed for all sentences submitted to this step, the sentences are sorted on decreasing score. Finally, the output is constructed in the desired format for both 50-byte and 250-byte answers, with the original sentences retrieved from the documents.
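The point values in this scoring description (5 for word matches, 20 for the MWU head noun, 5 each for a matching semantic relation and governing word, on top of the base score of 1000) can be illustrated directly. The record layout below (dicts with entity/relation/governor keys) is an assumption, and the deduction rule is omitted for brevity.

```python
# Structural scoring: add points when question and sentence triples share
# discourse-entity substrings, semantic relations, and governing words.
def score_sentence(question_triples, sentence_triples, base=1000):
    score = base
    for q in question_triples:
        q_words = q["entity"].lower().split()     # MWUs examined word by word
        for s in sentence_triples:
            s_entity = s["entity"].lower()
            matched = False
            for i, w in enumerate(q_words):
                if w in s_entity:
                    matched = True
                    # The head noun (last word of the MWU) is worth more.
                    score += 20 if i == len(q_words) - 1 else 5
            if matched:
                # Structural similarity: same relation / same governing word.
                if q["relation"] == s["relation"]:
                    score += 5
                if q["governor"] == s["governor"]:
                    score += 5
    return score

q = [{"entity": "Charles Lindbergh", "relation": "SUBJ", "governor": "fly"}]
s = [{"entity": "Lindbergh", "relation": "SUBJ", "governor": "fly"}]
print(score_sentence(q, s))   # 1000 + 20 + 5 + 5 = 1030
```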
4 TREC-8 Q&A Results. The official score for the CL Research 250-byte (sentence) submission was 0.281. This means that, over all questions, the CL Research prototype provided a sentence with the correct answer as its 4th choice. This compares to an average score of 0.332 among the 25 submissions for the TREC-8 Q&A track, i.e., a correct answer in the 3rd position. In examining all the answers submitted by the various teams, it appears that the CL Research prototype was the only one that submitted full sentences, as opposed to a 250-byte window around an answer. The CL Research prototype submitted sentences containing correct answers for 83 of the 198 questions. Compared to the median scores for the 198 questions, the CL Research prototype was better than the median for 40 questions, equal for 109 questions, and less for 49 questions. Since CL Research did not provide a correct sentence for 115 questions, this means that for 66 questions the median score among the 25 participating systems was unable to provide a correct answer. Finally, the CL Research prototype equaled the best score for 46 questions and the worst score for 115 questions (i.e., the questions where CL Research did not provide a correct answer). The CL Research prototype performed better than the average score of 0.332 for 56 questions. On these questions, the average score was 0.447, that is, a correct answer was given as the 2nd-ranked answer over all participating systems. Thus, the questions for which the CL Research prototype provided correct answers were in general easier than the remaining questions. However, among these questions, 39 were easier than the average and 17 were more difficult than the average of 0.332. In other words, the CL Research prototype did not just answer the easier questions, but was able to answer some of the more difficult questions as well.

5 Analysis. The results achieved by the CL Research prototype seem to indicate that the general approach of matching relational structures between the questions and the documents is viable. The prototype selected 937 sentences, and at least 83 correct sentences, out of over 63,000 sentences in the source documents, so the approach clearly performed much better than random. Since the prototype was an initial implementation, focused primarily on just providing a set of answers, without any evaluation of alternative approaches and without inclusion of several generally available research findings (e.g., named entity recognition, time phrase computations, and coreference resolution), the approach seems to have much promise. Even with the claimed level of performance, however, it seems that the official results significantly understate the viability of the general approach in the prototype. This statement is based primarily on the fact that only the top 10 documents were used in an attempt to answer the questions, when frequently an answer did not appear in any of these documents. There are several other simple changes, such as resolution of relative time phrases to a specific date (where the appropriate phrase was identified in the prototype as one of the submitted answers), which would result in a higher score. Overall, based on post hoc analysis of the cases where the CL Research prototype did not provide the correct answer, it is estimated that a more accurate overall score is approximately 0.482. This estimate is based on post hoc analysis of 25 percent of the questions where no correct answer was provided. The reasons justifying this estimate are detailed below.

Cutting off sentences: For three questions, the limitation to 250-byte strings cut off the portion that would have been recognized by NIST evaluators as correct. In each case, the appropriate sentence was ranked first, adding 0.015 to the overall score.

Inclusion of document containing answer: Post hoc analysis has revealed that, for one third of the questions, the answer was not in the top 10 documents included in the database for the question. When a document containing the answer was added to the database, correct answers were identified in two thirds of the cases, with an average inverse rank of 0.320, adding 0.153 to the overall score.

Relative time resolution: One fifth of the questions answered incorrectly required resolution of relative time phrases ('last Thursday', 'today', 'two years ago'). The functionality for this time resolution is essentially already present in the CL Research prototype, and the document date necessary for this computation is contained in the document databases. The average inverse rank for the sentences provided in the prototype results is 0.292, adding 0.033 to the overall score.

There are some considerations in addition to the above that also would portray the CL Research prototype more favorably, but for which no immediate estimate of improvement in the overall score is claimed. For 6 percent of the incorrect answers, a sentence containing the correct answer was generated and was tied with an answer that was submitted; however, because the sentence was not generated in a timely order, it was not submitted. The correct ...
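The scores discussed in this excerpt (the official 0.281 and the "average inverse rank" figures) are averages of inverse answer ranks over questions. A minimal sketch of that computation, with an invented five-question example:

```python
# Each question contributes 1/rank of its first correct answer (0 if none);
# the score is the mean over all questions.
def mean_inverse_rank(ranks):
    """ranks: rank of the first correct answer per question, or None for a miss."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# Hypothetical run: correct at ranks 1, 4, and 2; two questions missed.
print(round(mean_inverse_rank([1, 4, None, 2, None]), 3))   # 0.35
```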

    Original URL path: http://www.clres.com/trec8.html (2016-02-11)

  • senseval
    ... for an adjective. For nouns identified by the parser as being noun modifiers, we did not require that there be a sense marked as mod; however, if there was such a sense and the tagged word was so used, we dumped the existing set of viable senses and started anew (unless we already had multiple senses marked as noun modifiers). For verbs, the presence of an object phrase (a noun phrase, an infinitive clause, or a that clause) was key. It was first necessary to find the verb type among the features for the sense. If found, we could then determine whether the presence or absence of an object was consistent with the type. If consistent, we added the sense to the set of viable senses. At this time, the verb processing is very rudimentary. If a verb was marked as ergative or absolute, we viewed this as a match. If there was an object and the sense was marked transitive, ditransitive, or reflexive, we viewed this as a match. If there was no object and the sense was marked intransitive, we viewed this as a match. If we had already identified a verb type, but had not yet made a match, and we came upon the feature passive, we recorded a match if the verb was a past participle form and the HECTOR data indicated rarely, often, usually, or only for this sense. Finally, if we had flagged the present participle form, found a verb type, but not yet a match, and dfrm data from the HECTOR sense matched the tagged token, we recorded a match. If, after processing all features for the sense, we had recorded a verb match, we added the sense to the set of viable senses.

The kind field in the HECTOR data emerged as perhaps the most significant recognition device; several clues in the HECTOR data were rewritten into kind features as it became apparent that this field could serve very efficiently. In the HECTOR data, the kind field is designed primarily to list compounds and combinations in which a sense appears. For the most part, this applies to nouns and provides kinds that may frequently be separate headwords (e.g., brass band, cocktail onion, and football shirt). As initially implemented, the presence of a kind feature simply resulted in examining the word immediately prior to the tagged word. It was then observed that in noun phrases the kind could be separated from the tagged word by intervening words; as a result, we allowed the recognition of the kind if the value of this feature appeared in the same noun phrase. It was next observed that the functionality to accomplish this could be generalized to handle many of the HECTOR clue equations (i.e., where the target word in a clue was represented by an equals sign). Satisfaction of a kind feature is deemed absolute in sense selection. That is, if the value of the kind attribute is matched, the current set of viable senses for the tagged word is discarded, and we only allow further senses to be added to the set if they similarly match the value of a kind attribute. (There are only a few words where an identical kind equation appears under more than one sense.) As currently implemented, a kind specification has the following format: (1) no equals sign is assumed, in which case the value is interpreted as specifying what must appear prior to the target word; (2) if there is an equals sign, then what appears before the equals sign must appear prior to the target word, and what appears after the equals sign must appear after the target word; (3) any word appearing in the equation gives rise to a search for the word in its root form; (4) if a word is quoted, it must appear exactly as given; (5) a word type may be specified in brackets (e.g., [prpos] indicates that a possessive pronoun must appear in that position, so that a phrase like 'on one's knees' would be recognized for any possessive pronoun before the word knees); (6) an asterisk in the equation allows 0 or at most three intervening words; and (7) the presence of a non-quoted verb in the equation requires an object (e.g., to handle cases like 'give a promise' or 'cover one's bets').
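The kind-specification format just listed can be illustrated with a small sketch. This is not the DIMAP implementation: it covers only the no-equals and "X = Y" cases within a fixed word window, matches unquoted tokens on a crude root form, and omits the bracketed word-type tests, the '*' wildcard, and the verb/object requirement.

```python
# Simple 'kind' matching: tokens left of '=' must appear shortly before the
# target word, tokens right of '=' shortly after it.
def root(w):
    # toy stand-in for the root-form lookup
    for suf in ("ies", "ing", "ed", "es", "s"):
        if w.endswith(suf) and len(w) > len(suf) + 2:
            return w[: -len(suf)]
    return w

def token_match(pattern_tok, word):
    if pattern_tok.startswith('"') and pattern_tok.endswith('"'):
        return word == pattern_tok.strip('"')        # quoted: exact form
    return root(word.lower()) == root(pattern_tok.lower())

def kind_matches(kind, words, target_index, window=3):
    before = words[max(0, target_index - window):target_index]
    after = words[target_index + 1:target_index + 1 + window]
    left, _, right = kind.partition("=")
    def all_found(tokens, region):
        return all(any(token_match(t, w) for w in region) for t in tokens)
    return all_found(left.split(), before) and all_found(right.split(), after)

words = "he played in a brass band at school".split()
print(kind_matches("brass", words, words.index("band")))   # True
```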
We have not yet implemented the more elaborate equations available from the HECTOR clues, such as those specifying grammatical codes and lexical sets. There are some bugs in the kind processing, and we have not yet examined cases in the training or evaluation data where phrases are present but not accurately recognized, but it is clear that this is a very powerful and efficient mechanism for recognizing senses that have such specifications.

After all features of a sense have been examined, we perform some default processing with respect to the sense. If the sense has not yet been added to the set of viable senses, we add it unless we have set a flag that indicates it should not be added. This attempts to ensure that the set of viable senses is not empty and that senses are not excluded unless we have had some good reason for discarding them. There are still some bugs in this process, resulting in empty answers for one to three percent of the evaluation texts.

2.3.5 Selection of an answer. After adding to the set of viable senses and giving points for certain characteristics, the set of viable senses is sorted and the one with the highest score is returned. The sorting preserves the sense order present in the DIMAP entry for the tagged word, so that the most frequent sense is selected from among those that have not otherwise been filtered out. In most cases, however, the ordering does not reflect the HECTOR order, but rather the frequency pattern present in the dry-run data. Further investigation of the effect of this reordering is necessary. This is the point in the processing where it was intended to incorporate semantic or other criteria for sense selection. However, time constraints precluded any developments in this area. Indeed, it may be said that the fruitfulness of the prior steps lowered the priority for attending to this area. Finally, if the set of viable senses was empty at this point, an empty answer was returned, resulting in the SENSEVAL scoring of the answer as not attempted. In the final run, empty answers were returned for 4.1 percent of the texts, although some of these arose from earlier parts of the program. We need to separate out which of these are due to an empty set of viable answers. As of the date of this paper, we have now reduced the empty answers to 2.3 percent.
3 Developmental Process. The development of the CL Research system has been a fascinating and a satisfying experience, primarily because of its incremental nature, which provided immediate feedback on changes, with demonstrable improvements in success with the SENSEVAL training data. The effort can be divided into two segments: (1) an unfocused phase, consisting primarily of familiarization with the Proximity parser and the HECTOR dry-run data and development of ancillary programs to facilitate gauging of progress; and (2) a focused phase, working with the HECTOR training data and implementing functionality to perform the parsing and return answers that could be scored. All efforts were focused on familiarization with the Proximity parser until mid-May, particularly with a new version of the parser received near the end of April. We began working with the HECTOR dry-run data in mid-May, primarily focusing on familiarization with HECTOR dictionary entries for conversion to DIMAP entries, but with continued examination of the details of the Proximity parser code and data structures, for the purpose of identifying how to make use of its parsing output, along with development of the infrastructure programs, primarily the integration within DIMAP. These efforts continued until mid-July, when the training data and the final set of HECTOR definitions were obtained. This marked the beginning of the second phase and the development and implementation of the analysis strategy described in the previous section.

The second, more focused phase was marked by concentration on the exigencies of the moment: what worked, and where was the next most likely source of the greatest improvement in the results. To this end, we might focus on any of the three major components of the system: the parser, the DIMAP dictionary, or the analysis strategy. The development revolved around the 29 training corpora that were provided. The objectives were (1) to create DIMAP dictionaries from each HECTOR dictionary, (2) to ensure that the dictionary entries used by the parser were properly configured, and (3) to test algorithms against the training corpora. To gauge progress, we maintained a set of statistics on the performance of the system against the individual corpora. These statistics included not only the percent of correct assignments (as measured by the assignments included in the training data), but also a characterization of the failures (the empty assignments described in the previous section). This enabled us to see where the system seemed to be experiencing the greatest failures.

We began with the smaller training corpora, covering words with small numbers of definitions in the three major parts of speech (onion, invade, and wooden). We focused first on simply getting the system to run against the training corpora, since at first there were many difficulties in interacting properly with the parsing output. We gradually improved this interaction up until the final evaluation run. Once over the first hurdle, this became less and less of a problem, but there were still difficulties in the final run, some of which have since been removed. Once beyond the parsing difficulties, we were able to focus on implementing the functionality constituting the analysis strategy. We worked first on ensuring a connection between the parse output and the basic parts of speech. For nouns and adjectives, this was generally straightforward, except in cases when they were capitalized and part of proper noun phrases. For verbs, this posed more of a difficulty; for example, invading might be identified by the parser as either an adjective or a verb. An important next step for verbs was the recognition of whether there was an object. We moved on to words that had somewhat more complex sets of definitions, but ones that were generally unambiguous once appropriate recognitive devices had been set in place. The word shirt was particularly useful, since it contained a number of senses that called simply for the recognition of shirt used in set phrases or as a noun modifier. We were able to implement some basic functionality to recognize phrases, making use of all three components of the system. The parser has the ability to recognize phrases as a single unit; our implementation involved merely ensuring that the parser dictionary included such phrases. Similarly, DIMAP has mechanisms for indicating that a word is part of a phrasal entry; our implementation ensured that conversion of HECTOR dictionary data to DIMAP format included the creation of such phrasal entries. Finally, in the analysis functionality, we made sure that the Proximity results for phrases were properly used in conjunction with the DIMAP entries, and then added part of the functionality to make use of a kind feature to recognize phrases such as sweat shirt. But we still had not developed a mechanism for handling phrases like 'in one's shirtsleeves'.

At this point (the last day of July), we had most of the mechanisms in place for making complete passes through the training data. We were still experiencing some substantial difficulties in getting complete runs, but had managed to process all the training data, failing on assignments for 12 percent of the 13,000 texts, but with an overall initial recall of 48 percent. It was at this point that we obtained the evaluation data and could begin to determine what difficulties we would have in processing it.

An important part of the development process was maintaining the statistics for each system run against the training and evaluation corpora. These statistics were useful not only for identifying what to do next, but could also measure the effect of any changes. One of the first surprises from analysis of the changes came from discovering that a change to improve the results for one word could lead to a degradation in the results for another word. Examination of the changes, particularly when a degradation occurred, proved to be an excellent debugging tool, enabling us to focus in on problematic parts of the code. On some occasions, a degradation in performance came from more rigorous and proper use of the HECTOR dictionary data. For example, better routines for finding the object of a verb (making sure that the parser identified the NP as the object, rather than just using a following NP) led to a disheartening reduction in performance, but correctly so. Maintaining these statistics was also extremely useful for testing hypotheses or different methods of handling some aspect of the disambiguation. Thus, for example, it was found that the HECTOR ordering of definitions did not correspond to the frequencies present in the training data; changes were made for some of the HECTOR dictionaries to accord with these frequencies, in some cases resulting in significant improvement in results.

We continued to work with the training data until the last minutes on the due date for the final evaluation results. We had essentially only 10 days for development of the analysis strategy described in the previous section, from the time when we first achieved an overall successful pass through the training data. During this time, we moved the results to failing on assignments for 3.9 percent of the training texts and an overall precision of 60.5 percent. (These results are based on fine-grained scoring and are actually low, since we did not take into account ambiguous assignments by the coders.) Curiously, for some of the words (we think now that these were words having hyphenated forms), we had significant reductions in performance right at the end. We were able to spend very little time in examining the results from the training data, and none at all on the final evaluation corpora, other than to ensure that we were producing output; we found out later that we had missed 2.3 percent of the texts and produced empty results for another two percent because of some problem with hyphenated words in the DIMAP dictionary used in the final run. We also spent almost no time in looking at the texts in either the training or the evaluation corpora. The little time that was spent on the training texts was primarily to examine the parse tree output, in order to develop the next general changes in the analysis strategy and to find out general sources of problems. We did not have the opportunity to systematically examine incorrect assignments.
4 Results (to be completed). In Tables 1 and 2, we present the overall results and the results by task (i.e., major part of speech), first for recall (percent correct of total) and then for precision (percent correct of total attempted), as scored by the SENSEVAL coordinators, for the coarse, mixed, and fine-grained assessments. The precision results are based on the SENSEVAL scoring routine's determination of whether a given text had been attempted. In Tables 3 and 4, we present analogous results by task (i.e., individual word files). In presenting these tables, we provide some initial discussion, primarily for the purpose of highlighting significant aspects in the jumble of numbers. We go into more detail in the next section.
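For reference, the two measures used in Tables 1 through 4 differ only in the denominator; a minimal illustration with invented counts:

```python
def recall(correct, total):          # percent correct of all texts
    return 100.0 * correct / total

def precision(correct, attempted):   # percent correct of attempted texts
    return 100.0 * correct / attempted

# Hypothetical word: 200 texts, 180 attempted, 95 scored correct.
print(recall(95, 200))       # 47.5
print(precision(95, 180))    # 52.77...
```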
96 77 Bet n 274 43 1 43 1 31 8 94 53 Bet v 117 51 3 48 7 47 9 94 02 Bitter p 373 44 2 44 2 44 0 88 47 Bother v 209 40 2 40 2 36 4 94 74 Brilliant a 229 49 3 49 3 39 7 94 32 Bury v 201 29 9 29 6 22 4 93 03 Calculate v 218 53 7 53 7 44 0 97 71 Consume v 186 44 1 40 1 35 5 93 01 Deaf a 122 73 0 73 0 56 6 88 52 Derive v 217 47 5 47 5 47 5 90 32 Disability n 160 90 6 90 6 83 7 96 25 Excess n 186 74 7 49 7 41 4 90 86 Floating a 47 55 3 55 3 55 3 95 74 Float n 75 40 0 34 7 34 7 100 00 Float v 229 35 8 32 3 28 8 94 76 Generous a 227 37 0 37 0 37 0 98 24 Giant a 97 1 0 1 0 0 0 96 91 Giant n 118 72 0 56 8 49 2 100 00 Hurdle p 323 36 5 36 5 9 3 95 98 Invade v 207 49 3 48 6 30 4 93 72 Knee n 251 67 7 61 2 57 8 88 05 Modest a 270 65 9 64 9 64 4 97 04 Onion n 214 82 2 82 2 82 2 97 20 Promise n 113 72 6 63 3 60 2 95 58 Promise v 224 68 3 67 2 56 7 97 32 Rabbit n 221 80 1 79 4 78 7 85 52 Sack n 82 57 3 57 3 57 3 75 61 Sack v 178 82 6 82 6 82 6 91 01 Sanction p 431 70 3 70 3 70 3 90 72 Scrap n 156 54 5 48 4 38 5 94 23 Scrap v 186 79 6 79 6 73 1 98 39 Seize v 259 26 3 25 1 25 1 96 91 Shake p 356 53 4 52 5 49 7 84 55 Shirt n 184 59 2 54 1 48 4 59 78 Slight a 218 77 5 54 6 54 6 91 74 Steering n 176 5 1 5 1 5 1 97 16 Wooden a 196 94 4 94 4 94 4 100 00 In the results above we particularly note the low percentage attempted for shirt shake bitter knee sack n deaf and rabbit Also the particularly low fine grained scores for giant a hurdle and steering Of these we attribute the poor showing for deaf rabbit hurdle and steering in part to the fact that there were no training data for these words we simply devoted little time to consideration of the HECTOR dictionary entries for these words For giant a we note that the Proximity parser dictionary did not have an adjective sense for giant so that the parser force fit all parses into other interpretations primarily a noun modifier sense For the remaining words of special note we found that the DIMAP dictionary entries for hyphenated forms of these words was not operating correctly This particularly affected recognition of T shirt in the shirt corpus accounting for the low percentage attempted Table 4 Precision for individual words percent correct of attempted texts Task Number of Texts Grain Attempted Coarse Mixed Fine Accident n 250 94 8 88 4 85 6 93 63 Amaze v 63 100 0 100 0 100 0 90 00 Band p 283 88 3 88 3 88 3 93 71 Behaviour n 270 95 9 95 9 87 0 96 77 Bet n 259 45 6 45 6 33 6 94 53 Bet v 110 54 5 51 8 50 9 94 02 Bitter p 330 50 0 50 0 49 7 88 47 Bother v 198 42 4 42 4 38 4 94 74 Brilliant a 216 52 3 52 3 42 1 94 32 Bury v 187 32 1 31 8 24 1 93 03 Calculate v 213 54 9 54 9 45 1 97 71 Consume v 173 47 4 43 1 38 2 93 01 Deaf a 108 82 4 82 4 63 9 88 52 Derive v 196 52 6 52 6 52 6 90 32 Disability n 154 94 2 94 2 87 0 96 25 Excess n 169 82 2 54 7 45 6 90 86 Floating a 45 57 8 57 8 57 8 95 74 Float n 75 40 0 34 7 34 7 100 00 Float v 217 37 8 34 1 30 4 94 76 Generous a 223 37 7 37 7 37 7 98 24 Giant a 94 1 1 1 1 0 0 96 91 Giant n 118 72 0 56 8 49 2 100 00 Hurdle p 310 38 1 38 1 9 7 95 98 Invade v 194 52 6 51 8 32 5 93 72 Knee n 221 76 9 69 5 66 1 88 05 Modest a 262 67 9 66 9 66 4 97 04 Onion n 208 84 6 84 6 84 6 97 20 Promise n 108 75 9 66 2 63 0 95 58 Promise v 218 70 2 69 0 58 3 97 32 Rabbit n 189 93 7 92 9 92 1 85 52 Sack n 62 75 8 75 8 75 8 75 61 Sack v 162 90 7 90 7 90 7 91 01 Sanction p 391 77 5 77 5 77 5 90 72 Scrap n 147 57 8 51 4 40 8 94 23 Scrap v 183 80 9 80 9 74 3 98 39 Seize v 251 27 1 25 9 25 9 96 91 Shake p 301 63 1 62 1 58 8 84 55 Shirt n 110 99 1 90 5 80 
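The relation among the figures in Tables 1 through 4 is simple arithmetic: recall is the percent correct of all texts, precision the percent correct of attempted texts, and an empty answer counts as not attempted. A minimal sketch of that bookkeeping (illustrative Python with hypothetical field names; this is not the SENSEVAL scoring routine itself):

    def score(answers, total_texts):
        """answers: dict mapping text id -> (selected_sense or None, correct: bool).
        A None sense is an empty answer, treated here as 'not attempted',
        mirroring the SENSEVAL scoring behaviour described above."""
        attempted = [a for a in answers.values() if a[0] is not None]
        correct = sum(1 for sense, ok in attempted if ok)
        recall = 100.0 * correct / total_texts                          # percent correct of all texts
        precision = 100.0 * correct / len(attempted) if attempted else 0.0  # of attempted texts
        pct_attempted = 100.0 * len(attempted) / total_texts
        return recall, precision, pct_attempted

    # Consistency check against the overall row of Tables 1 and 2:
    # recall = precision * (pct_attempted / 100), e.g. 55.9 * 0.9274 = 51.8 (fine grain).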
In general, the precision results reflect the recall results, with an expected increase whenever there was less than 100 percent attempted. However, the change was much more dramatic for words where there was a relatively lower percent attempted, particularly for shirt, deaf, knee, rabbit, sack (n), and shake. We also note the significant grain effect for several words: giant (n), hurdle, slight, shirt, scrap (n), and promise (n).

5. Examination of Results and Possible Improvements

In this section we first describe the CL Research system's performance against other systems and relative to the best baseline results. We then examine some of the reasons for our failures and the prospects for improving our scores. As discussed in section 3, describing the development process, our examination of the submitted results provides the basis for making improvements. In this regard it is important to distinguish between immediate changes that can be made to improve performance and those that require further research. The immediate changes are essentially bug fixes that would have been made without recourse to the answers; making such changes is important to a more accurate assessment of the CL Research system. In this section we discuss (1) bug fixes and immediate changes, (2) changes that appear viable from a more complete exploitation of the available resources, and (3) research efforts that would almost assuredly lead to improved performance.

5.1 Overall Assessment

The precision scores as provided for the CL Research system in Table 2 were used as the basis for comparing the SENSEVAL systems. Although the SENSEVAL documentation suggests comparisons only with systems of the same type, we first present such an overall comparison. With respect to all systems, the CL Research system performed better than average at the fine-grained level and below average at the mixed- and coarse-grained levels. Our system performed at levels below the best baseline for all systems at all grains. Table 5 shows our system's performance on the individual SENSEVAL tasks in comparison with all systems.

Table 5. Performance of CL Research system compared to all other systems on individual tasks (n = 41)

  Relative Score    Coarse   Mixed   Fine
  Best                  3       2      2
  Above average        19      19     20
  Below average        19      20     17
  Worst                 0       0      2

The CL Research system is an all-words system; that is, it does not set parameters based on a required set of training data, and it does not involve hand-crafting of definitions. In addition, the system is theoretically designed to scale up without any training data or hand-crafted definitions. In the practicalities of the system's development, neither of these conditions was satisfied in an absolute sense, since we attempted to order definitions according to the frequencies observed in the training data (although this is thought to have actually degraded our performance), and we did some hand-crafting of definitions, primarily because we had not perfected our program for automatic conversion of HECTOR dictionary data. Thus, we believe that our system is most appropriately compared to other all-words systems. The CL Research system attained the best score at the fine-grained level among all-words systems and was above average at the mixed and coarse-grained levels. At the mixed-grained level, our system was at 61.2 percent compared with the best system at 61.6 percent, with our recall at 56.8 percent compared with 3.1 percent for the best system. At the coarse-grained level, our system was at 63.9 percent compared with the best system at 65.0 percent, with our recall at 59.3 percent compared with 19.8 percent for the best system. Table 6 shows our performance on the 41 individual SENSEVAL tasks.

Table 6. Performance of CL Research system compared to all-words systems on individual tasks (n = 41)

  Relative Score    Coarse   Mixed   Fine
  Best                 16      18     19
  Above average        16      14     12
  Below average         7       7      6
  Worst                 2       2      4

We believe that the relative comparisons may not provide the best indicator of our system's performance; we suspect that a more absolute assessment may be provided by comparing our performance against the best baselines. Overall, with respect to all systems, our system performed below the precision levels of the best baseline. With respect to all-words systems, however, our system performed better than the best baseline. In Tables 7 and 8 we present our comparison against the best baseline for the 41 individual tasks. Note that, even though in Table 8 we performed below the best baseline on more individual tasks than above it, our total scores were still above the baseline.

Table 7. Performance of CL Research system compared to best baseline for all systems on individual tasks (n = 41)

  Relative Score    Coarse   Mixed   Fine
  Above baseline       11      10     10
  Below baseline       30      31     31

Table 8. Performance of CL Research system compared to best baseline for all-words systems on individual tasks (n = 41)

  Relative Score    Coarse   Mixed   Fine
  Above baseline       19      19     16
  Below baseline       22      22     25

We have taken very little time to examine the reasons for the successes of the CL Research system. We have not parceled out the contributions of individual steps, and we have hardly looked at the texts themselves or the parse trees, focusing primarily on failures. Our observations on the reasons for success are therefore somewhat scanty at this time, with more insights provided in the analyses of the failures. We focus here primarily on fine-grained recall, since, as will become clear, we expect improvements to eliminate almost all SENSEVAL instances of 'not attempted', and because the CL Research system does not yet exploit the HECTOR hierarchy.

As would be expected, the CL Research system succeeds best when a HECTOR entry has few senses of a given part of speech and the first sense occurs much more frequently than the other senses. This is the case for accident, amaze, band, behaviour, disability, onion, rabbit, sack (v), scrap (v), and wooden. The system does well discriminating the part of speech, without requiring special handling of words which have multiple parts of speech (the 'p' or indeterminate files, and bet, float, giant, promise, sack, and scrap), although still with difficulties when there are several senses of the same part of speech. The system also seems to do well in picking up phrasal entries, even those that involve inflected forms and interpolated elements not considered part of the phrase.

The system's design seems able to handle syntactic discriminators of senses reasonably well when they are available. However, many of these, particularly those available in the HECTOR clues, have not yet been incorporated into the analysis; this is reflected in the overall lower performance for verbs, which would be somewhat lower still were it not for relatively high recall percentages for a few verbs. The system has several intervention points where semantic analysis can be performed. This enabled some of the explorations described in the analysis strategy, but nothing was implemented in time for the final evaluation run. Connections were made to WordNet, but they were not used in analyzing the parse trees for the selectional restrictions available in the HECTOR clues and definitions.
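One way the WordNet connection mentioned above might eventually be applied to the selectional restrictions in the HECTOR clues is a hypernym-chain test. The sketch below uses the NLTK WordNet interface purely as an illustration; it is not part of the system described here, and the restriction labels are assumed examples:

    from nltk.corpus import wordnet as wn

    def satisfies_restriction(word, restriction):
        """Rough test of whether 'word' can satisfy a nominal selectional restriction
        such as 'liquid' or 'person', by walking WordNet hypernym chains.
        Illustrative only; not the CL Research implementation."""
        targets = set(wn.synsets(restriction, pos=wn.NOUN))
        for synset in wn.synsets(word, pos=wn.NOUN):
            ancestors = set(synset.closure(lambda s: s.hypernyms()))
            if targets & (ancestors | {synset}):
                return True
        return False

    # e.g., satisfies_restriction('water', 'liquid') would be expected to return True,
    # supporting a clue that requires a liquid as the object.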
We also explored the possibility of building and using a set of semantic relations based on parsing the HECTOR definitions, in the manner of MindNet (described generally in Richardson et al. 1998 and specifically in Richardson 1997) or as in Barriere 1997, but we were unable to implement the necessary steps in time.

By virtue of the development process, and not as a conscious design, the CL Research system seems to demonstrate the viability of using local information for making sense selections; this was definitely an emergent property. The system also suggests that a sieve approach works, allowing only viable senses to pass while considering all possible parts of speech. We had hoped to achieve a more organized hierarchization of the senses, but were unable to implement it in sufficient time.

5.2 Oops and Darn

In this section we provide details of bugs that have been fixed and of other immediately obvious changes that have been made since the final evaluation results were submitted. These details are provided to make it clear that the changes are not merely speculative. Clearly, this examination process can continue for some time, so we report those changes completed as of the date of this paper. All discussion in this section describes only results in the fine-grained analysis and focuses on the recall percentage, since all indications are that we will be able to eliminate almost all of the 613 'not attempted' assessments, in which case there will be no difference between the recall and precision results.

The first problem observed was that the CL Research system missed 195 texts. Most of these cases (191) were due to a bug in the preprocessing phase. It was observed that many of the texts contained an extraneous closing quotation mark after the sentence; we attempted to remove it, even though it gave rise to no difficulties in parsing. The routine for doing this did a forward search for a quotation mark and then a backward search, erasing all text from the quotation mark to the end of the text when the forward and backward searches arrived at the same character in the sentence. This assumed that a text would have balanced quotation marks and did not envision an opening quotation mark without a closing one. As a result, when there was an opening mark somewhere early in the sentence without a balancing closing mark, we erased all material from the opening mark to the end of the string. For the 191 cases, this meant that there was no tagged word and thus not even an empty answer for the text. Eliminating this erasure increased our overall attempted rate by 2.4 percent, to 95.1 percent; we obtained correct answers for 110 of these now-analyzed texts (57.6 percent), with a resultant overall increase in recall of 1.2 percent.

There were also 9 cases where the first text of a SENSEVAL file had not been analyzed. We found this was due in part to the fact that these corpus files (such as bury (v)) had a first line that identified the part of speech for the file. When we removed these lines, we were able to process 8 of those cases successfully; one still remains mysterious.

In examining results for excess, we wondered why instances involving the prepositional phrase 'in excess of' and the adverbial phrase 'to excess' were being assigned empty answers, despite the fact that both the parser and the relevant DIMAP dictionary performed appropriately. We found that the relevant sense was not being added to the set of possible answers despite its not being excluded by any of the tests. We modified the code to ensure that senses not excluded were added to the set. For excess the
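The quotation-mark erasure described above is easy to reproduce in a few lines. The following sketch (illustrative Python, not the DIMAP preprocessing code itself) shows the faulty logic and one way of confining removal to a genuinely trailing mark; the evaluation rerun simply eliminated the erasure:

    def strip_quote_buggy(text):
        """Forward search and backward search for a quotation mark; when both land on
        the same character, erase from that mark to the end of the text -- the
        behaviour described above. An unbalanced opening quote wipes out the rest
        of the sentence, including the tagged word."""
        fwd = text.find('"')
        bwd = text.rfind('"')
        if fwd != -1 and fwd == bwd:
            return text[:fwd]
        return text

    def strip_quote_fixed(text):
        """Only remove a quotation mark that actually trails the sentence."""
        return text[:-1] if text.endswith('"') else text

    # strip_quote_buggy('he said "leave now and never return')  -> 'he said '
    # strip_quote_fixed('he said "leave now and never return')  -> unchanged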

    Original URL path: http://www.clres.com/senseval.html (2016-02-11)
    Open archived version from archive

  • REQUIREMENTS OF TEXT PROCESSING LEXICONS
    moon range vi 6 change within limits reform vi change for the better resinify vi 1 change into a resin rote vi change by rotation run vi 11b change to a liquid state run into vt la change into solate vi change to a sol specialize vi 3 change adaptively transfer vi 2 change from one vehicle or transportation line to another transform vi CHANGE transship vi change from one ship or conveyance to another turn vi 3b 1 change from ebb to flow or flow to ebb turn vi 4c l change from submission or friendliness to resistance or opposition usu used with against turn vi 6b 1 CHANGE used with into or to turn vi 6b 2 change to turn off vi 2b change to a specified state waver vi lb change between objects conditions uses or otherwise weaken vi 2 change from a complex to a simple sound as from a diphthong to a long vowel change from a strong to a weak sound change from an open to a close vowel whiffle vi lb 2 change from one course or opinion to another as if blown by the wind 4 1 Syntactic Rules and Usage Notes The first requirement for a sense selection network is that it should contain all the meanings of each set of homographs For example the SSN for change should contain all its noun and verb definitions Since a dictionary may contain several homographs in the same part of speech e g bore has three distinct verb entries all the definitions of each homograph would have to be combined into one SSN The first task of a parser then is to identify the correct part of speech for each word encountered in an utterance in so doing of course the parser may have to deal with inflected forms of a word To some extent syntactic parsing may permit further discrimination in the SSN In fact it may eventually be possible to group many definitions of a word according to the patterns of the syntactic context in which they can occur This notion was previously explored with some success see Earl 1973 for details and other references under the rubric of word government The extent to which this notion can be used for sense discrimination can be determined only after each SSN is elaborated i e only after determining how much sense discrimination must rely on semantic considerations Clearly if syntactic parsing can do the job a computer system will be much more efficient Certainly in the case of transitive and intransitive verbs or verbs which use particles syntactic parsing will be very useful in traversing the SSN Many verbs have both transitive and intransitive definitions for such verbs answering the question through syntactic parsing whether the verb has an object can provide one branching node in its SSN There is also a large number of transitive verbs in the dictionary with definitions which specify the object that must be present for them to be the applicable sense For example bail has one sense viz to clear water from a boat by dipping and throwing over the side which requires the object to be the word water In many cases the object is specified generically e g two senses of abandon specify the object as oneself indicating that the object must be a reflexive pronoun another sense of bail specifies the object as personal property indicating that the object must satisfy this selectional restriction The nature and treatment of selectional restrictions is discussed in section 4 4 In these cases questions about the object during syntactic parsing can provide additional branching nodes in the SSN Another significant class of definitions that may be recognized syntactically arises from verbs which take an adjective 
complement The applicable definitions of such verbs always end with the phrase to be e g one sense of feel is defined by perceive oneself to be Thus for verbs with this type of definition or for verbs defined by a verb which takes an adjective complement a syntactic question regarding the presence of an adjective complement can provide a branching node In W3 many verb definitions have accompanying usage notes which provide information about the use of the verb being defined usually in the form of a comment on idiom syntax semantic relationship status or various other matters Of interest here are those usage notes which identify a particular idiom in which the particular sense of the verb is used an accompanying particle such as up or out or an accompanying prepositional phrase For example 520 of the 788 senses of the verb take all of which would be included in a complete SSN for this verb involve some peculiarity of usage identified in the dictionary Four senses of change labeled 2a 2b and 4b in Table 1 have usage notes three definitions in which change is used as shown in Table 2 fade turn vi 4c 1 and turn vi 6b 1 also have usage notes The comments made in these usage notes can be used to formulate branching questions for an SSN although not as directly as perhaps would be desired These usage conditions do not specify that the presence of the idiom particle or preposition indicates the applicability of the definition but only that the absence of the condition indicates the nonapplicability of the definition 4 2 Preposition Definitions Before continuing with the description of how to build a sense selection network it is necessary to digress into a discussion of prepositional definitions since they will play a crucial role in attempting to develop semantic representations of definitions A preposition is defined as a linguistic form that combines with a noun pronoun or noun equivalent to form a phrase that typically has an adverbial adjectival or substantival relation to some other word Prepositions are few in number I have identified 126 in W3 half of which are phrases but rich in significance for text processing where they are typically used to identify conceptual cases However from my examination of preposition definitions I do not believe their significance has been fully exploited Bennett 1975 asserted that spatial and temporal prepositions a high percentage of all prepositions lead to 23 primitive conceptual cases even though in W3 the number of their definitions is at least two orders of magnitude higher The difference seems to lie in the apparent polysemy which as Bennett says arises from the inclusion in prepositional definitions of redundant features already determined by the environment In other words many preposition definitions contain information about the context surrounding the preposition I believe such redundancy can be exp oited in developing a semantic parser which will have a much greater facility for the type of conceptual case resolution that Small is concerned with Like verbs prepositions appear to form a closed system in which they are defined in terms of other prepositions However unlike verbs their primitives appear to be more easily identified Of approximately 1400 definitions in W3 70 percent are defined in terms of other prepositions 20 percent are defined only by usage notes and 10 percent are defined by verb forms The usage note definitions which have the appearance of primitives uniformly begin with the phrase used as a function word to indicate It is what 
follows the word indicate that can be used in developing the parser As mentioned above in its definition a preposition forms a relation between its object and some other word The nature of this relation is what follows the word indicate in the usage notes What I have found is that such relations follow certain patterns which can be articulated in formal recognition rules 1 for inclusion in a semantic parser 2 for developing a semantic representation of verb definitions and 3 for determining how to drive the parser Usage note definitions of prepositions may specify a condition i e selectional restriction that the object of the preposition must satisfy e g that it is an age a time a state or a group a condition that must be satisfied by the context surrounding the prepositional phrase e g the presence of an action verb a specific type of action something that is enveloped or covered the assignment of a characterization to the prepositional object e g that it is a location an instrument a purpose or goal or a result i e the traditional cases or that the object is a thing observed as a spectator or a quantity of movement and the assignment of a characterization to some element of the context outside the prepositional phrase e g the fact of being present of having parts or elements or of being insertable into something else Every such definition does not contain all these specifications at this time I have not attempted an analysis of these definitions into their components However examples of how I have used these notions are described in subsequent sections At this point I will only make some general observations about what these definitions imply with respect to parsing and semantic representation In the first place it appears that many words can be typecast with particular prepositional definitions e g some verbs can be characterized as governing the patterns embodied in certain prepositional definitions Such patterns are descernible from the definitions of such verbs Furthermore it should be clear from the four types of specifications mentioned above how these definitions can be integrated into a general parser both for performing the parse and for building a semantic representation of what is being parsed By extension these same considerations mean that it is possible to use specific prepositional definitions in parsing verb definitions and in creating the semantic representations of those definitions either explicitly or in terms of frames with slots accompanied by selectional restrictions that must be satisfied when a word is used in a particular utterance These issues are dealt with in more detail below along with specific examples 4 3 Predicates Slots and Selectional Restrictions The syntactic considerations described in section 4 2 clearly will not suffice for the construction of complete SSNs Their further elaboration requires the development of semantic questions for use at the branching nodes To develop such questions it is necessary to know the semantic representations of the senses as they will appear at the terminal nodes of the SSN This is a circular statement since it is necessary to know how to discriminate among the senses before distinct representations can be rendered Therefore the full elaboration of an SSN involves a process of iterative refinement of the discriminatory and representational components Moreover accurate representations must eventually be given in terms of primitive units any intermediate representations must therefore be considered in this light In the 
following discussion, only the development of representations for verbs will be considered, although, as will be seen, the representation of other parts of speech is inextricably involved. The representation of a verb definition essentially involves the assignment or identification of (1) an appropriate predicate, (2) the appropriate arguments or slots, and (3) selectional restrictions, if any, for each slot. The predicate and arguments would be arrayed as an n-tuple, with the selectional restrictions placed in the appropriate slots. The representation of particular definitions may involve a logical combination of more than one such n-tuple.

In Rieger's system, a predicate is considered a label for the accompanying argument configuration, but it has no intrinsic meaning. This is a convenient starting point for assigning a predicate, but in the case of analytic definitions, which consist of a genus and differentiae, the predicate should be the ultimate generic term. For example, the definitions shown in Table 2 can be assigned the predicate CHANGE. This is more than just a label; rather, it can be used to indicate that the basic argument configuration and selectional restrictions for the particular definition come from the definitions for change. This is discussed in more detail below. However, for the verb change itself, the predicate BECOME DIFFERENT will be used.

The argument configuration must be developed from an analysis of the definitions and usually requires an examination of the definitions of the constituent words. To illustrate this process, definitions 1 and 2 of change will be used. For both definitions, the first argument or slot will be used to indicate the subject of the verb; since the subject may be in the PAT or AGT case, to be determined by the context, the corresponding slot for SUBJ will indicate that PAT v AGT is to be assigned to SUBJ. The words 'become different' in both definitions imply the presence of four slots: FROM STATE, TO STATE, TIME1, and TIME2. However, since 'different' is modified in two ways, some additional complexity is introduced. In definition 1 there is the notion that only an accidental attribute of the PAT v AGT becomes different, while in definition 2 there is the notion that some essential attribute becomes different, with the result that the PAT v AGT no longer exists. The net effect of this distinction is that for definition 1 there must be a FROM STATE, a TO STATE, and a RESPECT in which the change occurs, while for definition 2 there must be a FROM STATE, which in this case is the SUBJ of change, and a TO STATE, which is the RESULT of the change. Possible semantic representations of these two definitions are shown in Figure 1.

Figure 1. Basic Frames for Definitions 1 and 2 of change

Definition 1 ("become different in one or more respects without becoming something else"):

    BECOME DIFFERENT  (FROM STATE NE TO STATE)
        SUBJ:  PAT v AGT
            (D PAT v AGT)
                ESSENTIAL ATTRIBUTES
                ACCIDENTAL ATTRIBUTES
                    RESPECT
        TIME1: FROM STATE
        TIME2: TO STATE

Definition 2 ("become something materially different from before"):

    BECOME DIFFERENT  (FROM STATE NE TO STATE)
        SUBJ:  PAT v AGT
        TIME1: FROM STATE
            (D PAT v AGT)
        RESULT
        TIME2: TO STATE
            (D RESULT)

These representations could appear at terminal nodes of an SSN, and one could be the contribution made as a result of parsing the verb change, unless further analysis were to lead to one of the subsenses of these definitions. The final aspect of representing a verb definition requires the incorporation of selectional restrictions into the representations. As with the predicate and the argument configuration, selectional restrictions on what can fill particular arguments are derived from the definitional matter. As mentioned before, whatever representational formalism is used must have a capability for identifying the selectional restrictions that must be satisfied by the context.

For the subsenses of definition 1 of change, the selectional restrictions may be so detailed as to fill in some slots of the basic frame for definition 1, or to lead to the necessity for additional slots. As a result, using the terminology of Norman et al. 1975, the concept satisfying an argument of the basic frame may be completely determined in representing a subsense. For the most part, the subsenses of definition 1 follow the basic frame shown in Figure 1 by providing information about the respect in which the subject of the verb becomes different. To determine that this is the case, it was first necessary to examine the definitions of the main verbs of each subsense. In each instance, the examination showed that the notion of becoming different is part of the meaning of the verb. Having arrived at this finding, it was then determined that most of the remaining information in the subsenses pertains to the respect in which the change occurs. These respects are shown in Table 3 for each subsense and would be used to replace the word RESPECT in the basic frame for definition 1. It should be noted that for subsenses 1b(2), 1c, 1f, part of 1g, part of 1h, and 1i, it was necessary to search for the respect in the definitions of the subsense's constituents. It should be added that it was this analysis of the subsenses that led to the placement of the RESPECT slot under the slot for ACCIDENTAL ATTRIBUTES, which in turn modifies the subject of the verb. Each respect in which the change could occur was required, via the phrase 'without becoming something else', not to change the essential nature of the subject.

Table 3. Selectional Restrictions on RESPECT Slot of the Frame for Definition 1 of change

  Subsense   Selectional Restrictions
  1a         characteristic, property, or tendency
  1b(1)      form, appearance, position, state, or stage
  1b(2)      facial complexion
  1c         size, quantity, number, degree, value, intensity, power, authority, reputation, wealth, amount, strength, etc.
  1d         customs, methods, or attitudes (specif. religious attitudes)
  1e         phase of the moon
  1f         capacity of being sour (e.g., disposition, taste, smell, acidity); capacity of being tainted (e.g., subject to putrefaction, corruption, moral contamination)
  1g         means of conveyance, vehicle, or transportation line being used
  1h         register of the voice; voice's tone, pitch, or intensity
  1i         method, tempo, or approach

The subsenses may also provide further selectional restrictions about the direction of the change. These restrictions, as shown in Table 4, would be added to the FROM STATE, to the TO STATE, or as a relation between the two states. Other information may add new arguments, as in definition 1e, or give values to other slots, as in definitions 1e and 1h. Identification of the predicate, the argument pattern, and the selectional restrictions for all definitions in itself requires a sophisticated semantic parser. Identification of the predicate can be accomplished in part by a taxonomic analysis of the type proposed by Lehmann 1976; for example, all the definitions in Table 2 (the intransitive uses of change) could be assigned the predicate CHANGE. However, this is not valid, since ultimately the definitions in Table 2 should be assigned the predicate BECOME DIFFERENT, or whatever primitive turns out to be appropriate.

Table 4. Other Selectional Restrictions of the Frame for Definition 1 of change

  Subsense   Selectional Restriction
  1a         becomes deprived of lose
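The basic frame of Figure 1 and the RESPECT restrictions of Table 3 lend themselves to a simple slot-and-filler data structure. The sketch below is only an illustration of how such frames might be stored; the field names and the subset of restrictions shown are assumptions, not part of the analysis above:

    # Hypothetical rendering of the BECOME DIFFERENT frame for definition 1 of
    # 'change', with the RESPECT slot filled per subsense from Table 3.
    CHANGE_DEF1 = {
        "predicate": "BECOME DIFFERENT",
        "constraint": "FROM STATE != TO STATE",
        "slots": {
            "SUBJ": "PAT v AGT",
            "TIME1": "FROM STATE",
            "TIME2": "TO STATE",
            "RESPECT": None,      # filled per subsense from Table 3
        },
    }

    RESPECT_RESTRICTIONS = {
        "1a": "characteristic, property, or tendency",
        "1b(1)": "form, appearance, position, state, or stage",
        "1e": "phase of the moon",
        "1g": "means of conveyance, vehicle, or transportation line being used",
        # ... remaining subsenses as listed in Table 3
    }

    def frame_for_subsense(subsense):
        """Instantiate the definition-1 frame with the RESPECT restriction of a subsense."""
        frame = {**CHANGE_DEF1, "slots": dict(CHANGE_DEF1["slots"])}
        frame["slots"]["RESPECT"] = RESPECT_RESTRICTIONS.get(subsense)
        return frame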

    Original URL path: http://www.clres.com/online-papers/acl80.html (2016-02-11)
    Open archived version from archive

  • Models of the Semantic Structure of Dictionaries
    to understand the meaning of the constituent word s definitions we must understand the meaning of the constituent words definitions Continued repetition of the process is nothing more than the outward branching process described by Quillian however as mentioned before we must make this branching more disciplined in order to deal with vicious circles and avoid unwanted circularities If we are to have a fully consistent dictionary its model must show how each definition is related to all others Thus for each definition X the model should enable us to identify 1 those definitions of the constituent words of X that apply and those that do not apply and 2 the production rules that generated X from these definitions For example in the definition of the noun broadcast the act of spreading abroad it is necessary that the model indicate 1 which of the definitions of the act of spread and abroad apply and 2 the production rules by which the and act and all other collocations occur together Note 4 If this can be done for each definition in the dictionary and if any inconsistencies are reconciled then as will be shown it should be possible to find the primitive concepts in the dictionary and to transform each definition into a canonical form 5 BASIC MODEL The theory of labeled directed graphs digraphs is used as the formalism for the models Note 5 Digraph theory deals with the abstract notions of points and directed lines its applicability to the problem before us therefore depends on how these notions are interpreted In this respect it is important to distinguish the manner in which this theory is used here from the manner in which it previously has been used in semantics and linguistics The two most common uses are 1 where trees display phrase and syntactic structures cf Katz Fodor 1963 or 2 where directed graphs portray the sequential generation of words in a sentence or phrase cf Simmons Slocum 1972 In these cases and others cf Quillian 1968 and Bennett 1975 graphs are used primarily as a vehicle for display and no results from graph theory are explicitly employed to draw further inferences However as used here graphs constitute an essential basis for the analysis and hence will play an integral role in a number of assertions that are made In the simplest model a point can be interpreted as representing all the definitions appearing under a single main entry the main entry word can be construed as the label for that point The part of speech labels status or usage labels and usage notes are considered integral to the definitions and may be viewed as part of a set of characteristics of the individual definitions A directed line from x to y will be used to represent the asymmetric relations x is used to define y thus if the main entry x appears exactly or in an inflected form in a definition of y the xRy This does not preclude a distinct line for yRx or xRx Therefore we can establish a point for every main entry in a dictionary and draw the appropriate directed lines to form a digraph consisting of the entire dictionary This digraph may be disconnected but probably not An example which is a subgraph of the dictionary digraph is shown in Figure 1 Except for broadcast only the labels of each point are shown but each represents all the definitions appearing at its respective main entry The directed line from act to broadcast corresponds to the fact that act is used to define broadcast since its token appears in the act of spreading abroad In this model the token spreading is not represented by a 
point since it is not a main entry Since the definition shown is not the only one for broadcast this point has additional incoming lines which are not shown The resultant digraph for even a small dictionary is extremely large perhaps consisting of well over 100 000 points and 1 000 000 lines Clearly such a digraph provides little fine structure but even so it does have some utility The manner in which it can be used is described in Section 9 6 EXPANSION OF THE MODEL POINTS AS DEFINITIONS Letting each point in the basic model represent all the definitions of a main entry provides very little delineation of subtle gradations of semantic content As a first step toward understanding this content it seems worthwhile to let each point represent only one definition However the basic model will not trivially accommodate such a specification primarily because of the interpretation given to the directed line and thus it must first be modified In the basic model the existence of a line between two points x and y asserts that xRy i e x is used to define y Since the points represent all the definitions under the main entries the existence of a line arises from the simple fact that x appears in at least one of y s definitions If the point y represents only one definition say y j there is no difficulty in saying that xRy j However if we wish every point to represent only one definition then we must find the definition of x say x i for which x i Ry j is true Referring to the subgraph in Figure 1 this amounts to determining for example which definition of abroad is used to define the token abroad in the act of spreading abroad that is finding the i such that abroad i R the act of spreading abroad or abroad i R broadcast j It should be intuitively clear that interpretation of points as single definitions is desirable However there are no a priori criteria by which the appropriate value of i can be determined and hence there is no immediate transformation of the basic model into a model where each point represents one definition Since this objective is worth pursuing it is therefore necessary to develop criteria or rules according to which the desired transformation can be made In the application of rules that may be developed it will be convenient to make use of a model intermediate between the basic one and the one with points as definition For this purpose we can combine the two models of employing a trivial relations x i Rx which says that the ith definition of x is used to define x this holds for all definitions of x The line reflecting xRy j would remain in the model so that the digraph would show both x i Rx and xRy j and x would be a carrier as illustrated in Figure 2 In this case the unsubscripted abroad represents all the definitions of abroad only some of which are shown If and when suitable criteria establish for example that abroad 1 but not abroad 2 abroad 3 fits the context of the token abroad in the definition of broadcast it would then be possible to draw a line directly from abroad 1 to broadcast without the intermediation of the unsubscripted point abroad thus eliminating paths from abroad 2 abroad 3 to broadcast This model thus includes the points of the basic model and adds points to represent each individual definition in the dictionary The lines between these points ensure that no relation in the basic model is lost As described in the example it is necessary to develop rules according to which the points representing more than one definition can be eliminated or bypassed so that the 
only relations xRy that remain are such that x and y are points which represent one definition It may happen during the application of rules that some lines to carriers will be eliminated with more than one still remaining In such a case it will still be useful to modify the digraph as much as possible For example if xRy in the basic model where x has m definitions and y has n and xRy j in the expanded model then x 1 x m Ry j It may be that some criterion indicates that say x 1 x 2 Ry j but not x 3 x m Ry j When this occurs we can create two points x a and x b such that x 1 x 2 Rx a Ry j and x 3 x m Rx b but with no line from x b to y j as illustrated in Figure 3 The utility of this type of grouping will be demonstrated in Section 9 In any event since many criteria will eventually be required in the elimination of points representing two or more definitions this ability to group definitions is a necessary mechanism for modeling intermediate descriptions of the dictionary It should be noted here that all such points will not be eliminated those that remain will indicate an essential ambiguity in the dictionary this is further discussed in Section 8 7 SEMANTIC STRUCTURAL AND SYNTACTIC PARSING OF DEFINITIONS The basic and expanded models exampled in Figures 1 2 and 3 do not portray any of the meaning of the dictionary but rather indicate where particular relationships exist In fact these two models portray only the relation is used to define as if there is no other relation between definitions This approach does not capture some very important elements that go to make up a definition Instead of being analyzed directly into its ultimate constituents as in Figures 1 and 2 the definition the act of spreading abroad should first be broken down into subphrases and then into its ultimate constituents as in Figure 4 A desirable property of the new points is that they have the syntactical structure of definitions Thus the act and spreading abroad have the form of noun definitions spread abroad has the form of a verb definition and of spreading abroad not shown but feasible under a different parsing has the form of an adjective definition This would eliminate such combinations as act of or of the The points representing phrase constituents of a definition thus have the form of definitions but lack a label The absence or presence of a label seems to make no difference in understanding the definition represented In fact it seems valid to represent identically worded definitions or phrase constituents regardless of the number of main entries under which they appear by a single point with multiple labels Thus if each of the main entries disperse scatter and distribute has a definition verbalized as spread abroad these three words can be labels of the point spread abroad in Figure 4 Such a construction has no effect on the analysis of the definition the act of spreading abroad or spread abroad as shown in Figure 4 and similarly the analysis there would have no effect on any analysis involving disperse scatter or distribute Since there is a large number of instances where duplicate wording appears in a dictionary the approach given here would effect a substantial reduction in the size of the digraph This is not to say that the words disperse scatter and distribute have the same meaning but rather that in some instances these words can express the same concept The definition X the act of spreading abroad is essentially an entity unto itself The definitions of its component words have similar independence 
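The device of a single point carrying several labels can be kept as simple as an index from verbalizations to the main entries that share them. A minimal sketch, using only the entries mentioned above (illustrative Python, not a claim about the actual dictionary data):

    # One point per distinct wording; its labels are the main entries defined by it.
    points = {}   # verbalization -> set of labels

    def add_definition(entry, verbalization):
        points.setdefault(verbalization, set()).add(entry)

    for entry in ("disperse", "scatter", "distribute"):
        add_definition(entry, "spread abroad")
    add_definition("broadcast", "the act of spreading abroad")

    # points["spread abroad"] -> {'disperse', 'scatter', 'distribute'}: one point,
    # three labels, so the digraph needs only a single node for this wording.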
However like atoms in molecules we need to identify those forces which hold the components together and which endow the whole with whatever characteristics it has The definitions of the component words may require several words for their expression but they are symbolized by one word in the definition X even so the symbol and the definition both represent the same entity which has certain characteristics enabling it to be acted upon by certain forces These characteristics are the semantic structural and syntactic properties of definitions and the forces are the production rules by which the entities i e the component definitions or their symbols are brought together A definition may be viewed as the realization of such rules operating on the characteristics of other definitions The Herculean task before us is to build a parsing system or recognition grammar which will articulate the characteristics to be attached to each definition and which will capture the production rules necessary to portray the relationship between definitions The remainder of this section will present my ideas on how to approach this task The process which I have used for finding primitives entails showing that one definition is derived from another thereby excluding the former as a candidate for being primitive Such a demonstration of a derivation relationship requires a parser Each pattern which I observe between definitions helps to exclude further definitions and simultaneously becomes part of the parser As a result identification of the characteristics to be attached to each definition does not have to be accomplished all at once as will become clear below our purposes can be served as the components of the parser are delineated Thus success does not require full articulation of the parser before any parsing is initiated The following represents general observations about the form of the parser as it has emerged thus far The first set of characteristics would result from the syntactic parsing of each definition The purpose of this step would be simply to establish the syntactic pattern of each definition The output of this step would be similar to that generated by Winograd 1972 in his parser The dictionary for the parser would be the very dictionary we are analyzing although only the main entry its inflectional forms and its part of speech label would be used in this step Ambiguous parsings and failures would be kicked out the failures in particular would provide an excellent source for refining the parser used by Winograd Clearly this step is not trivial and it might even be argued that it is beyond the state of the art However by using a corpus as large as a dictionary and by kicking out failures and ambiguities I believe that this step will significantly advance the state of the art The second set of characteristics would be determined from a semantic parsing of the definitions that is an attempt to identify the cases and semantic components present within each definition For this study I have found the following distinction to be useful A case is a semantic entity which is not intrinsic to the meaning of a word e g that someone is an agent of an action whereas a component is an intrinsic part of the meaning e g a human being is animate It is necessary to articulate recognition rules for determining that a particular case or semantic component is present The little that has been done to develop such rules has been based primarily on syntactic structure or a priori assertions that a given case or component is 
present Despite the recognized deficiencies of dictionaries I believe that it is possible to bring much greater rigor to such rules with evidence gleaned directly from the definitions For example cut has a definition penetrate with an instrument this definition would be parsed as having the instrument case Note also that this definition makes the instrument case intrinsic to cut However in most cases it will be necessary to examine the definitions of the constituent words For example the verb knife has the definition cut with a knife although it is quite obvious in this instance that a knife is an instrument rigor demands that we go to its definitions where we find a simple instrument A great deal of analysis may ultimately be required to discern the intrinsic characteristics to be attached to a definition but I believe that many of these can come from the dictionary itself rather than from intuition Although the number of cases and components discussed in the literature is not very large the number of ways in which they may be expressed at least in English is significantly larger In addition there is still a large amount of ambiguity i e not every form specifically indicates the presence of a particular case For example a definition act with haste does not indicate that haste in an instrument rather with haste expresses a manner of acting Unraveling all these nuances requires a great deal of effort However it appears that a particularly good source of help in this endeavor might be found in the definitions of prepositions which are used primarily to indicate sense relations Bennett 1975 found it possible to express the meaning of spatial and temporal prepositions a high percentage of all prepositions with only 23 components However in Webster s the number of their definitions is at least two orders of magnitudes higher The difference seems to lie in the apparent polysemy which as Bennett says arises from the inclusion in prepositional definitions of redundant features already determined by the environment In other words many prepositional definitions contain information about the context surrounding the preposition particularly what sort of entities are related by the prepositions My examination of verb definitions containing prepositions has led to the observation of many noticeable word patterns i e collocations which appear to be useful in the recognition of cases For example one definition of of states that its object indicates something from which a person or thing is delivered In examining verb definitions there appears to be a distinct set of verbs with which this sense is used in the following frame transitive verb object of something The verbs that fit the slot are exemplified by free clear relieve and rid Thus if this pattern appears the object of the preposition can be assigned the meaning something from which a person or thing is delivered Through the use of prepositional definitions in this way I have therefore been able to articulate some semantic recognition rules by which the sense or case of a noun phrase the object of a preposition can be identified My use of this technique has barely begun so that it is presently unclear whether this approach will suffice to disclose all the case information that we wish to identify with a semantic parser but if not it will certainly make significant strides toward this objective Parsing a definition according to the preceding notions is still not sufficient to identify the semantic components which should be attached to a main entry 
since much of the semantic content is only present by virtue of the definition s constituent words Thus a complete rendering of a definition s semantic content must be derived from the semantic characteristics of its constituents in a recursive fashion all the way down to the primitives Although identification of these primitives is the primary goal of the approach being presented here and hence intrinsically incomplete until the analysis is completed the set of semantic characteristics for a particular definition can be developed as we proceed toward our goal To do this it will be necessary to articulate rules which indicate how semantic characteristics may be transmitted from one definition to another An example of such a rule is If the noun X possesses the semantic component animate and if X is the core noun i e genus in definition y i of the noun Y then Y will also have the component animate Another example is If a verb X has a definition x i which has been parsed as having an instrument case and X is the core verb of a definition y j of Y and y j also has been parsed as having the instrument case then the instrument in y j is a type of the instrument in x i It will also be necessary to articulate other derivational such as the application of a causative derivation to a state verb and transformational such as the application of a gerundial transformation to any verb rules This process of delineating how semantic characteristics are transmitted will at the same time give more meaning to the lines of the dictionary digraph than simply is used to define The third and final set of characteristics that must be attached to a definition is a specification of the context that must be present if that definition is intended The context restrictions may require that the definiendum must be used in a particular syntactical way for example as a transitive or intransitive verb Usage restrictions may specify the presence of particular words such as particles or objects For example there is a distinct set of definitions for the idiom take out which thus requires the presence of the particle out in addition to the verb One definition of the transitive verb chuck requires the object baseball Other definitions may require a specific subject Finally there are semantic restrictions that may be discernible only from the definition itself For example two definitions of the verb cheer are to give new hope to and lift from discouragement dejection or sadness to a more happy state if the second definition is intended it seems necessary that the context indicate the prior state of discouragement dejection or sadness since we cannot presume such a state for someone might have been in a happy or non sad state and simply received some new hope In the absence of the necessary context we would default to the first definition Thus far in my research I have not devoted any effort toward developing procedures for prescribing the context based on the definition I expect that initiation of this step will benefit from further results of the first two steps Although the parsing system outlined in this section may appear to be exceedingly complex such an eventuality is not unexpected The characteristics to be attached to each definition are not significantly different from those proposed by Fillmore 1971 It is also important to note that some of the goals of analyzing the contents of a dictionary are to reduce the amount of redundancy to remove vicious circles and to represent the meaning of a word in a more efficient way 
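The component-transmission rule quoted above (animate passing from a core noun to the entries it defines) amounts to computing a fixed point over the definition digraph. A small sketch under assumed toy data; the entries, core nouns, and seed components are illustrative, not drawn from Webster's:

    core_noun_of = {            # definition -> its core (genus) noun
        "woman_1": "person",
        "pilot_1": "person",
        "sow_2": "pig",
        "pig_1": "animal",
    }
    seed_components = {"person": {"animate"}, "animal": {"animate"}}

    def propagate(core_noun_of, seed_components):
        """Push each core noun's components onto the entries whose definitions it heads,
        repeating until nothing changes (a fixed point over the digraph)."""
        comps = {k: set(v) for k, v in seed_components.items()}
        changed = True
        while changed:
            changed = False
            for defn, genus in core_noun_of.items():
                entry = defn.split("_")[0]          # definition y_i belongs to entry Y
                inherited = comps.get(genus, set())
                have = comps.setdefault(entry, set())
                if not inherited <= have:
                    have |= inherited
                    changed = True
        return comps

    # propagate(core_noun_of, seed_components)['pilot'] would then contain 'animate',
    # and 'sow' acquires it transitively through 'pig'.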
Hopefully this type of analysis would eventually lead to a substantial reduction in the size of a dictionary the prospects for this are considered further in the next section 8 THE ULTIMATE MODEL POINTS AS CONCEPTS At this juncture it is necessary to ask whether the points of the digraph models sufficiently correspond to meaning as we wish it to be represented In the two models described thus far the analysis of a definition was deemed complete when the appropriate definitions of the constituent words had been identified This situation is not entirely satisfactory since if a constituent word has more than one definition that applies the definition being analyzed is subject to more than one interpretation and hence may be called ambiguous with respect to the constituent For example if the two definitions of abroad over a wide area and at large fit the definition of broadcast to yield either the act of spreading over a wide area or the act of spreading at large it is not legitimate to exclude one This situation is only a reflection of the fact that natural language is almost always somewhat ambiguous However in accepting this fact it is necessary that we incorporate it into our models Parts of the parsing system described in the last section will help to discriminate and select those definitions of a constituent word which fit a given context As the parser is refined the candidates for a particular context will be narrowed as described in Section 6 but many instances will remain where more than one definition fits the context We might say that any point representing more than one definition thus constitutes an ambiguity Viewed differently we might also that the context is not sufficient to distinguish among all the definitions of a word In other words we can blame the ambiguity on the context We must expect that ambiguity will be present in the dictionary and deal with it on that basis For purposes of illustration let us say that abroad shown in Figure 4 is one such point To remove such points from the digraph we must make two points for the definition of broadcast one representing the act of spreading abroad 1 and the one representing the act of spreading abroad 2 These two points use the same words for expressing a definition and will be distinguishable only by the fact that their underlying definitions are different Because of this situation it is no longer valid to say that a point of the model represents a definition rather we will say that a point represents a concept It is also possible that the concepts represented by two or more points can be shown to be equivalent The concept the act of spreading abroad has been shown to be equivalent to the act of spreading over a wide area If the latter phraseology appears under some main entry say distribution then both it and the definition of broadcast would eventually be analyzed in the same way We will say that both expressions may represent the same concept and hence are equivalent at least to this extent Since the other definitions of these words would be different they are not totally equivalent This concept will thus be represented by one point labeled by either broadcast or distribution and equivalently verbalized as the act of spreading abroad or the act of spreading over a wide area This interpretation is a reflection of the fact that in ordinary speech a single concept may be verbalized in more than one way The observations in this section lead to the following description of the ultimate model The semantic content of a dictionary 
The observations in this section lead to the following description of the ultimate model: the semantic content of a dictionary may be represented by means of a digraph in which (1) a point represents a distinct concept, which may be verbalized in more than one way and may have more than one label, and to which is appended a set of syntactic, semantic, and usage features; and (2) a line represents an instance of some one of a set of operators which act on the verbalizations or labels of a point, according to the features of that point, to yield the parametric values of another point. It should go without saying that the complete portrayal of a dictionary according to this model requires a considerable amount of further work; nonetheless, I believe that the model provides the appropriate framework for describing a dictionary.

9. PROCEDURES FOR FINDING THE PRIMITIVES

In Section 3, I stated that the model of a dictionary should permit the transformation of each definition into its primitive components. Based on the preceding descriptions, it is suggested that the full articulation of the ultimate model will satisfy this objective, for the following reasons. An elementary theorem in the theory of digraphs asserts that every digraph has a point basis, that is, a set of points from which every point in the digraph may be reached. Since points represent concepts in the ultimate model, it seems reasonable to assert that the point basis of its digraph represents the set of primitive concepts out of which all others in the dictionary may be formed. Based on the characteristics of the points in that model, it is possible, and perhaps even necessary, that each primitive concept would be verbalized in several ways and symbolized in several ways, as will be shown below. Since the digraph has a finite number of points and lines, the sets of primitive concepts and operators are also finite.

It only remains to find the primitive concepts; this will be done by applying rules, based on the models and the parsing system, to identify words and definitions which cannot be primitives. Essentially, the assertion that a word or definition is non-primitive requires a showing that it is derived from a more primitive concept and that a primitive cannot be derived from it. These non-primitives can be set aside, and their full syntactic and semantic characterization can be accomplished after the primitives have been identified. Although no primitives have yet been identified, since the described procedures have not been fully applied, their form and nature will be delineated.

To demonstrate the validity of my approach, I have been applying the rules developed thus far to the set of verbs in Webster's Third New International Dictionary (20,000 verbs and their 111,000 definitions). This set was chosen because of their importance (cf. Chafe 1970) and the bare feasibility of coping with them manually, although it may be another 3-4 years before I am finished at my current rate of progress. I have attempted to formulate my procedures with some rigor, keeping in mind the ultimate necessity of computerization. I have developed some detailed specifications for some of my procedures, envisioning the use of computer tapes developed by Olney, but I have not completed these since I do not presently have access to a computer. Despite the focus on verbs, it will become clear that words from other parts of speech are inextricably involved in the analysis. Also, the rules that are presented can, for the most part, be applied to other parts of speech. Notwithstanding the fact that the meaning of many verbs is derived in part from nouns and adjectives, I believe that each verb definition also contains a primitive verb constituent.
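Before turning to the verb analysis, it may help to see how the point-basis theorem cited above could be exploited mechanically. The sketch below, assuming the networkx library and a made-up toy digraph (not W3 data), takes one representative from each strong component that has no incoming lines in the condensation; by the theorem, those representatives can reach every other point.

```python
# A small, hand-made example: an edge X -> Y means that X is used (as a
# core unit) to define Y, so a point basis is a smallest set of points from
# which every entry can be reached.
import networkx as nx

G = nx.DiGraph([
    ("move", "go"), ("go", "move"),          # a strong component
    ("move", "travel"), ("travel", "tour"),
    ("cause", "make"), ("make", "build"),
    ("move", "build"),
])

C = nx.condensation(G)                       # one node per strong component
basis = [
    next(iter(C.nodes[n]["members"]))        # any representative of the component
    for n in C.nodes
    if C.in_degree(n) == 0                   # source components only
]
print(sorted(basis))                         # e.g., ['cause', 'go'] or ['cause', 'move']
```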
Each verb definition consists of a core verb (obligatory) and some differentiae (optional). The definitions of other parts of speech have a similar structure, i.e., a core unit from the same part of speech and some differentiae. The subgraph of the total dictionary digraph formed by core verbs accords fully with the models described in Sections 4, 5, and 7; therefore, any rules developed on the basis of those models will apply equally to the verb subgraph. We need only keep in mind that the differentiae come from other parts of speech and become embodied in the core verb; this is how the verb cut comes to have the instrument case intrinsically.

To begin the analysis, we will let E represent the set of those verb definitions which have been identified as non-primitive; initially, this set is empty.

Rule 1: If a verb main entry is not used as the core unit of any verb definition in the dictionary, then all of its definitions may be placed in E. This rule applies to points of the basic model which have outdegree 0, i.e., no outgoing lines; since no points can be reached from such a verb, it cannot be primitive. In Figure 5, the point labeled by pram represents the definition "to air (as a child) in or as if in a baby carriage"; since pram is the core unit for no definition in the dictionary, all its definitions may be excluded as non-primitive. In W3, this rule applies to approximately 13,800 verbs out of 20,000; the number of definitions in the verbs excluded is not known.

Rule 2: If a verb main entry is used only as the core unit of definitions already placed in E, then all its definitions may also be placed in E. This rule applies to points of the basic model with positive outdegree. The uses of such verbs as core units follow definitional paths that dead-end; hence they cannot be primitive. Figure 6 shows a portion of the dictionary digraph where the verb cake defines only barkle, which in turn is not used to define any verb; thus, the definitions of cake may be included in E after the definitions of barkle have been entered. In W3, this rule applies to approximately 1,400 of the 6,200 verbs that remained after application of Rule 1.

Rule 3: If the verbs forming a strong component are not used as core units in any definitions except those in the strong component, or in definitions of verbs already placed in E by Rules 1, 2, or 3, then the definitions of all verbs in the strong component may be placed in E. This rule applies to points of the basic model which constitute a strong component, i.e., a maximal set of points such that for every two points u and v there are paths from u to v and from v to u. This rule does not apply when the strong component consists of all the points not yet placed in E. A strong component consisting of the verbs aerate, aerify, air, and ventilate is shown in Figure 7; except for oxygenate, the other verbs defining the set constituting the strong component are not shown. Since it is possible to start at any of the four and follow a path to any other of the four, there is no real generic hierarchy among them. It is possible to emerge from the strong component and follow paths to pram, eventilate, and perflate, to which, however, Rule 1 applies. If we follow a definitional path that leads into this strong component, we can never get out again, or if we do, we will only dead-end. Hence, the definitions of all the verbs in the strong component are not primitive and may be placed in E. In W3, this rule applies to approximately 150 of the 4,800 verbs remaining after the application of Rule 2.
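As a rough sketch of how Rules 1-3 could be mechanized (the application reported here was manual), the following fragment prunes a toy core-verb digraph with networkx; an edge X -> Y means that X is the core unit of some definition of Y, and the verbs echo the figures discussed above. Re-running Rules 1 and 2 after Rule 3 anticipates the "tandem" application described next.

```python
# Toy illustration only; the graph is invented and is not W3 data.
import networkx as nx

def apply_rules_1_and_2(G, E):
    """Rules 1 and 2: exclude verbs all of whose outgoing lines lead only to
    verbs already excluded (outdegree 0 is the base case of Rule 1)."""
    changed = True
    while changed:
        changed = False
        for v in G.nodes:
            if v not in E and all(w in E for w in G.successors(v)):
                E.add(v)
                changed = True

G = nx.DiGraph([
    ("cause", "make"), ("make", "cause"),              # the final strong component
    ("make", "oxygenate"), ("oxygenate", "aerate"),
    ("aerate", "aerify"), ("aerify", "air"),
    ("air", "ventilate"), ("ventilate", "aerate"),     # aerate/aerify/air/ventilate cycle
    ("air", "pram"),                                   # pram: outdegree 0 (Rule 1)
    ("make", "cake"), ("cake", "barkle"),              # barkle dead-ends, then cake (Rule 2)
])

E = set()
apply_rules_1_and_2(G, E)                              # excludes pram, barkle, cake

# Rule 3: exclude a strong component whose outgoing lines all stay inside it
# or lead to excluded verbs, unless it is the last component of what remains.
for scc in nx.strongly_connected_components(G):
    live = scc - E
    if live and live != set(G.nodes) - E and \
       all(w in scc or w in E for v in live for w in G.successors(v)):
        E |= live

apply_rules_1_and_2(G, E)                              # the tandem pass catches oxygenate
print(sorted(set(G.nodes) - E))                        # ['cause', 'make'] remain
```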
Actually, Rules 2 and 3 may be applied in tandem, based on those placed in E. Thus, after Rule 3 places the definitions of aerate, aerify, air, and ventilate in E, it so happens that Rule 2 then applies to the definitions of oxygenate.

After Rules 1, 2, and 3 are applied to the digraph of the basic model, the remaining points constitute a strong component of approximately 4,500 points. This differs from those to which Rule 3 applies in that there would be no points left if we placed all its points in E. This final strong component is the basis set of the basic model; that is, any point of the basic model (i.e., any main entry in the dictionary) may be reached from any point in the final strong component, but not conversely. At this juncture we can proceed no further with the basic model alone; it is necessary to expand the points of the final strong component into two or more points, each representing a subset of the definitions represented by the original point, as previously shown in Figure 3. In part, this can be accomplished by identifying individual definitions which are not used.

Rule 4: If any definition can be shown not to be used as the sense of any core unit, or to be used only in definitions already placed in E, it may be placed in E. This rule is essentially a restatement of Rule 1 for individual definitions and includes the following two subrules (among others not presented).

Rule 4a: If all the remaining uses of a verb are transitive (intransitive), then its intransitive (transitive) definitions are not used and may be placed in E. The expansion of a point into transitive and intransitive uses is a good example of how the points of the basic model are transformed into points of the expanded model.

Rule 4b: If a definition is marked by a status label (e.g., archaic or obsolete), a subject label, or a subject guide phrase, it may be placed in E. Lexicographers creating W3 were instructed not to use such marked definitions in defining any other word.

Other rules have been developed in an attempt to identify the specific sense of the core verb, or those senses of a verb which have not been used in defining other verbs, but they are not presented here. However, there are too many instances where the differentiae of a definition do not provide sufficient context to exclude all but
