  • CL Research Experiments in TREC-10 Question Answering.wpd
    With greater complexity, and with a document database where a simple join does not produce an answer, the logic required to examine a path of relations becomes more difficult. As indicated above, the text analysis module develops four lists at the same time as the semantic relation triples: (1) events (the discourse segments), (2) entities (the discourse entities), (3) verbs, and (4) semantic relations (the prepositions). Each document consists of one or more tagged segments, which may include nested segments. Each discourse entity, verb, and preposition in each segment is then tagged. A segment may also contain untagged text, such as adverbs and punctuation. Each item on each list has an identification number used in many of the functions of the text analysis module. As indicated above, the discourse analysis assigns attributes to each segment and subsegment, discourse entity, verb, and preposition. For segments, the attributes include the sentence number (if the segment is the full sentence), a list of subsegments (if any), the parent segment (if a subsegment), the text of the segment, the discourse markers in the sentence, and a type (e.g., a definition sentence or appositive). For discourse entities, the attributes include its segment, position in the sentence, syntactic role (subject, object, prepositional object), syntactic characteristics (number, gender, and person), type (anaphor, definite, or indefinite), semantic type (such as person, location, or organization), coreferent (if it appeared earlier in the document), whether the noun phrase includes a number or an ordinal, antecedent (for definite noun phrases and anaphors), and a tag indicating the type of question it may answer (such as who, when, where, how many, and how much). For verbs, the attributes include its segment, position in the sentence, the subcategorization type (from a set of 30 types), its arguments, its base form (when inflected), and its grammatical role (when used as an adjective). For prepositions, the attributes include its segment, the type of semantic relation it instantiates
(based on disambiguation of the preposition), and its arguments (both the prepositional object and the attachment point of the prepositional phrase). After all sentences in a document have been processed, the four lists are used to create an XML-tagged version of the document. The XML tagging is performed for each segment within the XML element "segment", with the attributes listed in the tag opening. The tag content is initialized to the segment text, and we proceed to mark up this text according to the text contained within each subsegment, discourse entity ("discent"), verb ("verb"), and preposition ("semrel") in the segment. As these XML elements are generated, their attributes are added to the tag opening. The resultant XML-tagged text for individual documents was combined into one overall file of documents, each with a tag for the document number. For TREC, the output consisted of groups of ten documents from the NIST-provided top documents for each question. Since we only processed the top 20 documents, we had 500 XML files for the top ten documents and 500 for documents ranked 11th through 20th. These are the files used for answering the TREC questions.

4. Question Answering Using Document Databases

For TREC-11, the question answering against the document databases was little changed from previous years. We refer to our earlier detailed descriptions (Litkowski, 2002a; Litkowski, 2001) and provide only a brief overview here. For TREC-11, a database of documents was created for each question, as provided by the NIST generic search engine. A single database was created for each question in the main task. The question answering consisted of matching the database records for an individual question against the database of documents for that question. The question-answering phase consists of four main steps: (1) detailed analysis of the question to set the stage for detailed analysis of the sentences according to the type of question, (2) coarse filtering of the records in the database to select potential sentences, (3)
extracting possible short answers from the sentences, with some adjustments to the score based on matches between the question and sentence database records and the short answers that have been extracted, and (4) making a final evaluation of the match between the question's key elements and the short answers to arrive at a final score for the sentence. The sentences and short answers were then ordered by decreasing score. The short answer for each question (an exact answer), its score, and its sentence (the justification) were printed to a file. This file was then sorted by score to create a confidence-ordered answer set submitted to NIST.

5. Question Answering Using XML-Tagged Documents

As described earlier, question answering against XML files essentially involves describing a path (XPath) from the top of the tree(s) to a discourse entity (in our case, to a "discent" node), which is returned as the answer. To do this, a question is converted into an XPath expression used to select nodes in the files. For example, for question 1593 (What percent of Egypt's population lives in Cairo?), an XPath expression is //segment[contains(.,"Cairo")]//discent[contains(.,"percent") and @tag="howmany"]. The first double slash says to find any node in all documents being searched that is marked as a "segment" element and contains the word "Cairo". The second double slash says to find all "discent" elements that are descendants of such segments, contain the word "percent", and have an attribute "tag" with value equal to "howmany". This XPath expression will return zero or more nodes from however many documents are processed. In general, question answering consists of the following steps: (1) analyze the question and convert it into an XPath expression, (2) load the XML file(s) and select the nodes satisfying the XPath expression, and (3) if necessary, score and/or evaluate the nodes returned and present them to the user. The second step is the easiest, consisting of a loop over the files being processed, with a single statement to load the file and
another single statement to select the nodes. The first step, determining the XPath expression, is more difficult. As can be seen for q1593, not all the question elements are present in the query. This may be characterized as a backoff strategy, beginning with all the terms in the query and removing some that are not necessary or are too restrictive. For q1593, including all the terms will result in zero nodes. This is frequently the case, with questions often providing much more information than is likely to appear in one sentence. The third step, evaluating the nodes selected, is generally not as complicated: a well-formulated XPath expression generally returns only a couple of answers, although there are some question types that require more extensive processing. We describe our observations about the first and the third steps in more detail below. As indicated earlier, we were not able to implement our question answering against the XML-tagged documents for our official submission. Using these documents has required an entirely new conceptual approach, involving the resolution of many intertwined issues. This new approach has been evolving since our submission; many refinements are necessary, and many possibilities for making these changes have been emerging. To begin with, the whole tagging process (described in general terms above) requires dealing with virtually the full panoply of natural language processing, including tokenization, sentence splitting, parsing, word sense disambiguation, anaphora resolution, and discourse analysis. While we have developed a system that comprehends all these components, many of the components have not yet been implemented to the state of the art. For example, our anaphora resolution module is currently estimated at 55 percent correct, whereas the state of the art has been attaining levels over 80 percent. Also, our typing and characterization of prepositional semantic relations is currently operating at about 20 percent; see Litkowski (2002b) for our
lexicographic approach to this problem. As a result, we have to rely on the preposition itself as the bearer of information about the semantic relation. Further, our discourse structure analysis is an initial implementation, presently handling only appositives and relative clauses. A second major issue to be faced is the selection of tags and their attribute names and values. This issue involves identifying what information will be useful and then developing techniques for extracting the information, using whatever other resources may be available, such as dictionaries and thesauruses. An important question, given our semantic predilections, is what semantic classes to use for characterizing discourse entities. Another important question is how to group information: what sentence parts should be grouped together, and which modifiers should be separated or put into attributes of a discourse entity. Dealing with these issues, identifying problems with the functioning of our XML output generation, and examining representational alternatives is very complex and requires the development of mechanisms for analyzing them. This has led to two steps in our development cycle: (1) the development of an analysis interface for assessing problems, and (2) the use of the TREC questions as guidance for inadequacies in our representations. As will be suggested below, the use of XSLT has not only demonstrated a capability for dealing with these issues but also provided a strong indication that an XML representation of text will be extremely useful for a wide range of applications, including question answering.

5.1 Step 1: An XML Analysis Interface

The generation of 1000 XML files, each containing 10 TREC documents, provides a large amount of data (the XML files are approximately five times the size of the TREC documents). The XML files can be viewed, with retention of the nested structure, in Microsoft's Internet Explorer, but this does not allow any systematic examination of the data. Conventionally, those working with XML
files develop XML stylesheets (XSLT) for portraying the data, perhaps embedded in interactive browser web pages. However, this requires a prior design, something not yet developed for the files generated here. Moreover, XSLT is somewhat involved and not convenient for the analysis required here. Instead, we developed a GUI interface that enables lower-level access to the XML data and provides an easier development vehicle for the kinds of exploration needed here. Lessons learned from this interface can guide future development of applications using XML-tagged documents. Our development environment, known as XMLPartner, provides powerful tools for low-level access to the XML data. A well-structured XML file has the form of a completely hierarchical tree, wherein nodes contain the data and the attributes. In our system, an XML file of any size (extremely large files using a buffered stream) is loaded with one statement. Similarly, a search for nodes providing the answer to some query (the XPath expression, conforming to the XML Path Language) is accomplished with one statement. This enables us to focus on development of queries and examination of search results, perhaps with further search statements. We have developed surrounding GUI components to facilitate examination of different aspects of the XML data (referred to below as XML Analyzer).

5.1.1 Global Examination of Data

In the first place, we used XML Analyzer to examine (and sometimes extract) interesting phenomena in the text. XML Analyzer can be used as a concordancer: a suitable XPath expression can extract all sentences in our TREC XML files (80 MB) that begin with "After" in four minutes. Similarly, we can find all discourse entities that contain a capitalized word, to examine whether we have assigned them an appropriate named entity type. In general, we use this basic capability to examine words, the entities and sentences in which they occur, and their attributes. We display results of a search with the entity (if requested), the document title,
the document number (for TREC documents), and the sentence containing the entity. When we are searching only for sentences, no entity is given. A user can select a sentence and ask to see all the entities in that sentence. A user can select an entity and request all other entities that corefer to it or have it as an antecedent.

5.1.2 Detailed Investigation of Discourse Entities

The XML Analyzer can be used to examine details about particular discourse entities. For example, question 1502 asks "When was President Kennedy killed?" In the NIST top 10 documents for this question, a search on "Kennedy" in discourse entities identified 152 occurrences (e.g., "the Kennedy clan"). Narrowing the search to those also containing "Edward" gave 7 instances; expanding this to include entities where "Edward" was contained in the antecedent attribute identified an additional 14 instances. An examination of the attributes of these 21 instances showed 14 as the subject, one as the object, one as a possessive pronoun, three as a genitive determiner, and two as a prepositional object. Use of the XML Analyzer in this way suggests that a user can examine the different relations in which an entity participates. For those as subject, we can examine the verbs to determine what kind of actions the subject performs (for action verbs) or what properties the subject has (for stative verbs). For those as possessive pronoun or genitive determiner, the user can examine what kinds of possessive relationships the entity can have (e.g., as brother, his back, or his commitment). For those as prepositional object, we can examine the relations the entity has with other entities. More generally, this suggests the possibility of an interactive web page allowing a user to explore the different relations in which a discourse entity participates, perhaps moving to other discourse entities with which it shares a relation.

5.2 Step 2: Answering Questions with XPath Expressions

As our first step in developing techniques for answering questions, we examined
whether the answers, as contained in the patterns, occurred as discourse entities in our XML output. For virtually all cases, the answers were present in distinct entities; in those where they were not, we identified several bugs that we were able to correct in our XML output processing. This process generalizes well with our interface: create an XPath expression, determine whether it leads to appropriate discourse entities, and, if not, make changes in some part of our system, either correcting bugs or altering our XML representation. This process has involved learning the intricacies of XPath expressions, which have proved capable of returning the exact answer to almost all TREC questions. We developed XPath expressions for a contiguous 20 percent sample of the TREC questions, providing a basis for drawing conclusions. In general, the XPath expressions are highly confirmatory of techniques developed over the years in the QA track. The XPath expressions show that simple string patterns are quite effective and that syntactic and semantic information can be quite useful. Our development of these expressions shows that characterizing the patterns in the underlying text via XML elements and attributes is worthwhile for QA and potentially other applications. We demonstrate this by showing the XPath expressions for several question types. In each of these cases, the development of an XPath expression proceeds by (1) further characterization of the question type, (2) development of a query component that selects segments, and (3) refinement of the query in specifying characteristics of the discourse entity.

5.2.1 WhatIs and WhatNP Questions

"What" questions have the highest frequency (constituting more than 40 percent of the questions) and have the most subtypes. Four principal varieties are: (1) What is/was {the NP | NP called | the ORD NP | NP1's NP2 | NP}, where NP is a noun phrase and ORD is an ordinal (e.g., first); (2) What NPA {is NP2 [PP] | did NP2 V [PREP]}, where NPA is NP1 or NP1's NP3, PP is a prepositional phrase, PREP is a
preposition, and the internal [ ] indicates an optional element; (3) What is NP's real/original/nick name; and (4) What do NP V / does NP stand for, where V is a verb. For the most general variety, What is/was the NP, a canonical answer would be "X is/was the NP". Examples are "What is the oldest college bowl game?" (1529), "What is the most populated country in the world?" (1544), and "What is the text of an opera called?" (1583). A suitable XPath expression can ask //segment[contains(.,"is/was the NP")], i.e., a simple string match, or perhaps suitable subsets of the NP. To get at the specific discourse entity, the XPath expression would continue with //discent[contains(.,"NP") and @synrole="obj"]/preceding-sibling::verb[.="was"]/preceding-sibling::discent[@synrole="subj"], which says: find a discourse entity containing NP with syntactic role object, that is preceded by a verb equal to "was", and that is preceded by a discourse entity with syntactic role subject. This discourse entity is the answer to the question. Another possibility for the general What is/was the NP (as well as the third variety above, What is NP's real/original/nick name, and the second alternative of the fourth variety, What does NP stand for) is a search for a relative clause, appositive, or parenthetical. As mentioned earlier, our text analysis and XML tagging modules generally identify these as subsegments. Our segment search for these can be formulated as //segment[contains(.,"NP") and (child::segment or contains(.,", or") or contains(.,"("))], which looks for a segment that contains the NP and contains either a nested segment or a simple string (a comma and "or", or an opening parenthesis). In these cases, the desired discourse entity would be obtained by first looking for discent[contains(.,"NP")] and then taking either preceding-sibling::discent or following-sibling::discent. In the case asking what something stands for, the NP is usually an abbreviation or acronym. In this case, it is possible to build a more elaborate XPath expression that tests whether the letters of the answer node(s) correspond to the NP. With our low-level access to the answer
nodes, however, it may be more efficient to perform this test in a post-processing phase that

    Original URL path: http://www.clres.com/trec11.html (2016-02-11)
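The three steps of section 5 can be illustrated with a small Python sketch, together with the backoff strategy described for q1593. This is not the authors' XMLPartner-based implementation. The sketch assumes an invented XML fragment that borrows only the element and attribute names from the text (segment, discent, verb, semrel, tag, synrole); its sentences are made up and are not a real TREC answer. The standard xml.etree.ElementTree module supports only a limited XPath subset without contains(), so the containment tests of the q1593 expression are written in Python. The helper names select and backoff_select, and the drop-the-last-term backoff order, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented miniature of the XML output described above. Only the element and
# attribute names come from the text; the sentences are made up.
SAMPLE = """<docs>
<doc docno="D1">
<segment><discent synrole="subj" tag="howmany">About 25 percent</discent> of the population <verb>lives</verb> <semrel>in</semrel> <discent synrole="pobj" tag="where">Cairo</discent>.</segment>
<segment><discent synrole="subj">Cairo</discent> <verb>is</verb> <discent synrole="obj">a large city</discent>.</segment>
</doc>
</docs>"""


def text_of(el):
    """String value of a node (all descendant text), as XPath contains(., ...) sees it."""
    return "".join(el.itertext())


def select(root, seg_terms, ent_terms, tag=None):
    """Approximates //segment[contains(.,t1) and ...]//discent[contains(.,e1) and ... and @tag=TAG]."""
    hits, seen = [], set()
    for seg in root.iter("segment"):
        seg_text = text_of(seg)
        if not all(t in seg_text for t in seg_terms):
            continue
        for ent in seg.iter("discent"):
            if id(ent) in seen:  # nested segments can reach the same node twice
                continue
            ent_text = text_of(ent)
            if all(t in ent_text for t in ent_terms) and (tag is None or ent.get("tag") == tag):
                seen.add(id(ent))
                hits.append(ent)
    return hits


def backoff_select(root, seg_terms, ent_terms, tag=None):
    """Start with every segment term and drop the last one until some node matches.
    Dropping from the end is an assumption made for this sketch."""
    terms = list(seg_terms)
    while True:
        hits = select(root, terms, ent_terms, tag)
        if hits or not terms:
            return terms, hits
        terms.pop()


root = ET.fromstring(SAMPLE)
terms, hits = backoff_select(root, ["Cairo", "population", "Egypt"], ["percent"], tag="howmany")
print(terms, [text_of(h) for h in hits])
# → ['Cairo', 'population'] ['About 25 percent']
```

With all three terms the query returns zero nodes, mirroring the q1593 observation. Dropping "Egypt" yields the howmany entity. In practice, choosing which term to drop matters far more than this simple loop suggests. With a full XPath 1.0 engine such as lxml, the q1593 expression could instead be passed to the engine unchanged.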
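The excerpt suggests running the acronym test for "What does NP stand for" questions in post-processing: check whether the letters of a candidate answer node correspond to the NP. The Python sketch below shows one possible heuristic, not the authors' code. It compares the acronym's letters with the initials of the candidate's words and skips a small, hypothetical list of function words. The names FUNCTION_WORDS, initials, and matches_acronym are illustrative only.

```python
import re

# Hypothetical stoplist: function words that acronyms usually skip.
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def initials(phrase):
    """Upper-case first letters of the content words in a phrase."""
    words = re.findall(r"[A-Za-z]+", phrase)
    return "".join(w[0].upper() for w in words if w.lower() not in FUNCTION_WORDS)


def matches_acronym(acronym, candidate):
    """True if the candidate's content-word initials spell the acronym
    (periods and other non-letters in the acronym are ignored)."""
    letters = "".join(ch for ch in acronym.upper() if ch.isalpha())
    return bool(letters) and initials(candidate) == letters


candidates = ["the space agency", "National Aeronautics and Space Administration"]
print([c for c in candidates if matches_acronym("NASA", c)])
# → ['National Aeronautics and Space Administration']
```

A strict equality test like this misses acronyms that keep a function word, such as the O in DoD for "Department of Defense". A looser in-order subsequence match would be one alternative, at the cost of more false positives.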

  • Principles and Procedures of Category Development
    readable dictionary (MRD), a searchable reproduction of a paper dictionary, used to identify parts of speech (such as nouns, verbs, and adverbs), inflectional forms (such as the past tense or gerundial forms of verbs), and derivational forms (such as that management is derived from manage); (2) an 1800-page description of grammatical and semantic properties of the English language, used to identify additional features and characteristics of words (Quirk et al., 1985); and (3) WordNet, a freely available, rigorously developed database of approximately 120,000 words and phrases, with these words and phrases grouped into synonym sets (synsets) and organized into a hierarchical and relational semantic network (Miller et al., 1990). WordNet can be used to identify common semantic components for words, since its principal relation is the hierarchical ISA relation (a horse is a mammal), establishing that horse has the semantic component mammal (note 3).

5.2 Initial Stage Based on Part-of-Speech Analysis

The first stage of category analysis involves looking at the part of speech of the words in the categories. This stage corresponds to the earliest developments in computational text processing in the 1950s, when the focus was on the part of speech of words. Eleven categories in MCCA (such as Have, Prepositions, You, I, Me, He, A, An, The) consist of only a few words from closed classes (note 4). The category The contains one word, and Prepositions contains 18 words. About 20 categories (e.g., Implication, If, Colors, Object, Being) consist of a relatively small number of words (34, 22, 65, 11, and 12, respectively), taken primarily from syntactically or semantically closed-class words, such as subordinating conjunctions and relativizers, or from words that are found at the top levels of WordNet and represent abstract concepts like colors. To determine that these categories consist primarily of closed-class words, the words in the category were passed through DIMAP to extract just this set from the integrated MRD. Inspection of the part-of-speech field
confirmed the intuitions about the category assignment. When the parts of speech of words in a category belong to open classes, analysis becomes a little more difficult. When the words are all in one class (that is, all nouns, verbs, adjectives, or adverbs), a unifying principle is sought from the hierarchical relationships among the words. One possible principle is that the words fall into a small number of categories in a thesaurus, such as that of Roget. Another possibility is that the words are related by broader-than, narrower-than, or synonymic relations, as assigned in keyword indexing thesauruses. Yet another possible principle is one used for dictionary definitions: examining definitions of the words to identify an umbrella genus word with more specific terms underneath. Using WordNet, this step involves identifying the hierarchical groupings of the words in the category. The remaining 80 or so categories in MCCA consist primarily of just such open-class words (nouns, verbs, adjectives, and adverbs), sprinkled with closed-class words (auxiliaries, subordinating conjunctions). Several categories consist of words from a single part of speech, as is the case with Functional roles, Detached roles, and Human roles, which all include only nouns. To examine such unified sets of words, it is valid to examine their definitions for common genus terms. DIMAP implements the more convenient method of using WordNet to examine hierarchical relations, as in Table 1, which shows a sample dictionary entry (note 5), where the field Isa links shows that animal is of type creature.

Table 1. Lexical entries: Example of semantic features

    Word: animal    Type: r    Code: 00026    No. Defs: 1
      Sense 1    Cat: nil    Isa links: creature d 0
      Features: EDIBLE boolean
    Word: creature    Type: r    Code: 00025    No. Defs: 1
      Sense 1    Cat: nil    Isa links: ind obj d 0
      Features: AGE scalar; SEX gender

To show how this field is used, consider the MCCA category Detached roles, which has a total of 66 words, including academic, artist, biologist, creator, critic,
historian, instructor, observer, philosopher, physicist, professor, researcher, reviewer, scientist, and sociologist. These words fall under the WordNet synsets headed by person (although not including this word), in particular synsets headed by creator, expert, authority, professional, and intellectual. Other synsets under expert and authority do not fall into this category and would thus be included in other MCCA categories. Thus, it is possible to characterize Detached roles as words used to describe persons performing intellectual or thinking activities. This concept is well captured by the category's heuristic name and is distinguished from Human roles (such as uncle or bride) and Functional roles (such as janitor or firefighter). Identification of these synsets facilitates extension of the MCCA dictionary for this category to include further hyponyms of these synsets, that is, types of creators, professionals, or intellectuals.

5.3 Semantic Features and Semantic Components

The heuristic name given to the category of Detached roles, along with the defining WordNet synsets, suggests the next stage in the process of category development, as well as the next step in linguistic consideration of the lexicon. Table 1 also shows the field Features, indicating properties of the lexical items such as Age and Sex. Katz & Fodor (1963) proposed the use of semantic features to characterize entries in a lexicon. In the sample category, there is a feature Human with the value "+" and a feature Role with the value "Detached". Several more features might be proposed to encode words in this category; hundreds, if not thousands, of other features can be used to characterize the full set of words in English. For example, Whissell (1996) developed a Dictionary of Affect encoding dimensions of emotion (activation and pleasantness). Laffal (1995) likewise based his dictionary (43,000 words and 168 concepts) on semantic features, coding words in the same category based on the core meanings of words, that is, having the same semantic component. Nida (1975: 174)
characterized a semantic domain as consisting of words sharing semantic components. However, he also suggests (Nida, 1975: 193) that domains represent an arbitrary grouping of the underlying semantic features. Thus we see that the 1960s development of the notion of semantic features has become a very prominent basis for the development of category systems. The subtrees rooted at particular nodes in the WordNet hierarchies provide a readily available basis for category development that reflects implicit assignment of common semantic features and components. Litkowski (1997) proposes making these semantic features and components more explicit, specifically for the purpose of facilitating category development.

5.4 Syntax and Semantic Roles

The 1960s saw the rapid development of formalisms for representing the syntactic structures of phrases, clauses, and sentences, but there was relatively little research toward integrating semantics (that is, meanings) into the representations. Fillmore (1968) began a process of characterizing the semantic roles of noun phrases in a sentence, particularly as related to the main verb. Thus, in addition to identifying the subject and object of a verb and the object of a preposition, it was possible to characterize the role of these syntactic items by referring to them as, for example, agent, patient, theme, instrument, and location. There are about 30 to 50 semantic roles, although there is still no full agreement on what the complete set should be. Table 2 shows a lexical entry for the word eat and illustrates the way in which syntactic and semantic role information is encoded. Important to this example is the requirement that the word eat have associated syntactic items of subject and object. The subject, identifying an agent who performs the act of eating, and the object, a theme describing the thing being consumed, are both encoded as features of the lexical item.

Table 2. Lexical entries: Example of syntax and semantic roles

    Word: eat    Type: r    Code: e00000    No. Defs: 1
      Sense 1    Cat: vrb    Defin:
        ingest solid food through mouth and swallow it
      Isa links: ingest d 0
      Features: root var0 subj root var1 cat n obj root var2 optional cat n AGENT var1 THEME var2

Syntactic and semantic role information is normally used for parsing text, but it can be important for category development as well. This can be seen in the analysis for the MCCA category Sanction, which contains 120 words, including the following: applaud, applause, approve, congratulate, congratulation, convict, conviction, disapproval, disapprove, honor, judge, judgment, judgmental, merit, mistreat, reject, rejection, ridicule, sanction, scorn, scornful, shame, shamefully. While this set includes words from several parts of speech (discussed in more detail below), it is rooted primarily in the Levin (1993) verb sets Characterize (class 29.2), Declare (29.4), Admire (31.2), and Judgment (33). This means that the set has particular syntactic and semantic patterning, in addition to the synonymic and hierarchical relations that can be discovered using the techniques described in the previous section. Levin has identified a considerable set of syntactic properties associated with the classes she has developed (and thus a useful resource in itself for category development), but has not yet formally characterized the semantic properties. Instead, the definition of this class might, following Davis (1996), inherit a sort notion-rel, which has a perceiver and a perceived argument (thus capturing syntactic patterning), with perhaps a selectional restriction on the perceiver that the type of action is an evaluative one (thus providing semantic patterning). In other words, the underlying conceptualization of the MCCA category indicates that there is an action involved (as indicated by the verb), that this action involves some idea or notion on the part of the actor (the perceiver), and that this notion (the perceived) is inherently an evaluation. WordNet synsets explicitly contain some syntactic information and implicitly some semantic role information. However, it does not
have the depth required for the analysis described above. Other resources, such as Levin (1993), as well as some databases being constructed for online access, contain more of this detail. What this means for purposes of characterizing and extending the words in the category Sanction is that not only can the WordNet hierarchy be used, but it is also possible to include words that correspond to conversion of verb concepts into noun counterparts; for example, the action judge corresponds to the result of a judging action, that is, a judgment.

5.5 Selectional Restrictions, Semantic Relations, and Knowledge Bases

The evolution of artificial intelligence and semantics in the 1970s and the 1980s (Amsler, 1980; Evens et al., 1980; Winograd, 1972; Schank, 1975; Markowitz et al., 1986) has provided a substantial understanding of the information that can be included in lexical entries and used in category development. This discussion illustrates three pieces of information (selectional restrictions, semantic relations, and knowledge base information) that may be included in lexical entries and that can assist in the process of category development. These are discussed for the sake of completeness, but are not described in the present analysis of MCCA categories because of space considerations. As alluded to in the last section, a restriction was placed on the type of notion involved in the use of a word in the Sanction category, namely that it had to be evaluative in nature. Table 3 shows a lexical entry for the preposition in, with two senses. Basically, this entry says that in is used to begin prepositional phrases (the pp-adjunct) with noun phrase objects. The first sense says that the phrase may be attached to another noun phrase, which may be an object or an event, and that the object of the prepositional phrase is a location in some physical object. The second sense says that the prepositional phrase is attached to a verb, which describes an event, and that the object of the
preposition describes a location, which may additionally be characterized as a destination. These specifications are called selectional restrictions and serve to limit the range of words that may appear in the identified syntactic positions.

Table 3. Lexical entries: Example of selectional restrictions

    Word: in    Type: r    Code: i00000    No. Defs: 2
      Sense 1    Cat: prp    Defin: located within the confines of
      Features: root var1 pp adjunct root var0 obj root var2 cat n var1 OR object event location var2 physobj
      Sense 2    Cat: prp    Defin: into the destination of
      Features: root var1 pp adjunct root var0 obj root var2 cat n var1 event destination var2 location relaxable to physobj

Table 4 shows a lexical entry describing an event, of which there may be many types. Additionally, the entry states that any word describing an event is inherently related in several possible ways to other lexical entries. These are known as semantic relations. They are encoded here as features with values preceded by plus signs, which are taken to mean that the following word is actually a selectional restriction on what other lexical entries may appear in the particular relation. The relations shown in Table 4 are quite general and would apply to many lexical entries. However, the number of possible relations is unbounded (similar to the open-class words), and hence a relation may be of arbitrary depth and specificity. For example, a chemical event relation hydrogenate could be defined, specifying that its location is a test tube.

Table 4. Lexical entries: Example of semantic relations

    Word: event    Type: r    Code: 00012    No. Defs: 1
      Sense 1    Cat: nil    Isa links: all d 0
      Features: SUBEVENTS event; SUBEVENT OF event; TIME 0; MEASURING UNIT second; LOCATION place; CAUSED BY event; CAUSES event; PRECONDITION event; EFFECT event

Table 5 presents a lexical entry for the word or concept teach. Teaching is a communicative event that involves a teacher as the agent and knowledge as the thing that is passed on. The lexical entry specifies that a teaching event may
consist of three subevents where a teacher performs a describing action where there may be a request subevent when a student asks for information and where there may be an answering process The corresponding lexical entries for the answering and describing subevents show that they inherit information from the teaching event The three lexical entries considered as a unit are construed as part of a script see Schank Abelson 1977 Table 5 Lexical entries Example of knowledge base data Word teach Type r Code 00014 Isa links communicative event d 0 Features AGENT intentional agent default teacher THEME knowledge BENEFICIARY intentional agent default student PRECONDITION default AND teach know 1 NOT teach know 2 EFFECT default teach know 2 SUBEVENTS AND teach describe teach request info teach answer Word teach answer Type r Code 00019 Isa links answer d 0 Features AGENT teach agent THEME teach request info theme BENEFICIARY teach beneficiary Word teach describe Type r Code 00017 Isa links describe d 0 Features AGENT teach agent THEME teach theme BENEFICIARY teach beneficiary Lexical entries containing information on selectional restrictions semantic relations and knowledge base data can be used in category development primarily by enabling an analysis of how the embodied concepts fit together that is which ones are in more subsidiary positions The lexical entries described in Tables 3 4 and 5 illustrate the general linguistic finding that the representation of meaning is focused principally on the verbs and that these verbs may themselves be arranged in hierarchies 5 6 Lexical Rules Derivations and Sense Relations The final type of information in lexical entries that we will consider is based on the phenomena by which new lexical entries are derived from existing ones The most basic of these derivational relations is the one in which inflected forms are generated These are generally quite simple and include the formation of plural forms of nouns the formation of tensed 
past past participle gerund forms of verbs and the formation of comparative and superlative forms of adjectives The discussion above of the MCCA Detached roles and Sanction categories did not mention the possibility of including these inflected forms but in fact these forms are included Several more elaborate forms of relations are also possible For the purpose of illustrating these additional derivational rules we introduce another MCCA category known as Normative This is a complex category consisting of 76 words and like the Sanction category also has words from all parts of speech This category includes the following along with various inflectional forms absolute absolutely consequent consequence consequently correct correctly dogmatism habit habitual habitually ideologically ideology necessarily necessary norm obviously prominence prominent prominently regular regularity regularly unequivocally unusual unusually The use of the heuristic Normative to label this category clearly reflects the presence in these words of a semantic component oriented around characterizing something in terms of expectations or standards Of particular interest here are the derivational relations that form adjectives from nouns nouns from adjectives and adverbs from adjectives There were similar kinds of relations in the Sanction category where most of the concepts seemed to be based on underlying verb forms In that category a number of words were clearly noun adjective and adverb derivations from the underlying verbs These derivational relations can be encoded in lexical entries in the same way as the semantic relations shown in Table 4 The feature name in such relations would describe the relation such as nominalization with a value identifying the derived form which would also be a lexical entry having the inverse relation nominalization of with a value showing the base form of the word Some of these relations are shown in WordNet but a more complete source is a dictionary which 
shows an ordering of derived forms The MRD included with DIMAP shows these forms The adverb derivations in the Normative category have an additional interesting aspect to them The heuristic Reasoning has also been used to label this category When we examine the syntactic and semantic nature of these adverbs we find that they are considered to content disjuncts Quirk et al 1985 8 127 33 that is words indicating that the speaker is making a comment on the content of what the speaker is saying in this case compared to some
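The sketch below makes the Table 3 selectional restrictions concrete. It checks which sense of "in" a given attachment is compatible with, and it follows the prose reading of the two senses rather than the exact DIMAP feature syntax:
  - Sense 1 attaches to a noun denoting an object or event and takes a physical-object complement.
  - Sense 2 attaches to an event verb and takes a location complement, relaxable to a physical object.
The type labels, the `PrepSense` class, and `matching_senses` are assumptions made for illustration. A real system would consult a type hierarchy rather than flat labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrepSense:
    """One preposition sense with its selectional restrictions (hypothetical encoding)."""
    number: int
    definition: str
    head_category: str        # part of speech the prepositional phrase attaches to
    head_types: frozenset     # semantic types allowed for the attachment point
    object_types: frozenset   # semantic types allowed for the prepositional object
    relaxed_types: frozenset = frozenset()  # accepted only when restrictions are relaxed


# The two senses of "in" from Table 3, as described in the accompanying prose.
IN_SENSES = (
    PrepSense(1, "located within the confines of", "noun",
              frozenset({"object", "event"}), frozenset({"physobj"})),
    PrepSense(2, "into the destination of", "verb",
              frozenset({"event"}), frozenset({"location"}),
              relaxed_types=frozenset({"physobj"})),
)


def matching_senses(senses, head_category, head_type, object_type, relax=True):
    """Return (sense number, relaxed?) for every sense whose restrictions are satisfied."""
    matches = []
    for sense in senses:
        if sense.head_category != head_category or head_type not in sense.head_types:
            continue
        if object_type in sense.object_types:
            matches.append((sense.number, False))
        elif relax and object_type in sense.relaxed_types:
            matches.append((sense.number, True))
    return matches


if __name__ == "__main__":
    # "the letter in the drawer": noun head denoting an object, physical-object complement
    print(matching_senses(IN_SENSES, "noun", "object", "physobj"))   # [(1, False)]
    # "went in the house": event verb, location complement
    print(matching_senses(IN_SENSES, "verb", "event", "location"))   # [(2, False)]
    # "jumped in the box": satisfiable only by relaxing sense 2
    print(matching_senses(IN_SENSES, "verb", "event", "physobj"))    # [(2, True)]
```

Returning a "relaxed" flag, rather than silently accepting the relaxed type, keeps strict matches distinguishable from fallback ones. A parser could then prefer strict matches when both are available.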

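Tables 4 and 5 treat relations as features whose values point at other lexical entries. The text notes that derivational relations such as nominalization can be encoded the same way, with an inverse feature on the derived entry. The following is a hypothetical Python encoding, not DIMAP's own format. A value like "teach agent" is read as a reference to the AGENT feature of the entry "teach"; this is how the subevents inherit from the teaching event. The names `resolve` and `add_derivation` are illustrative. Only part of Table 5 is transcribed, and its defaults (teacher, student) are kept as comments.

```python
# Hypothetical encoding of part of Table 5: each entry maps feature names to values.
LEXICON = {
    "teach": {
        "isa": "communicative event",
        "features": {
            "AGENT": "intentional agent",        # Table 5 default: teacher
            "THEME": "knowledge",
            "BENEFICIARY": "intentional agent",  # Table 5 default: student
            "SUBEVENTS": ["teach describe", "teach request info", "teach answer"],
        },
    },
    "teach describe": {
        "isa": "describe",
        "features": {"AGENT": "teach agent", "THEME": "teach theme",
                     "BENEFICIARY": "teach beneficiary"},
    },
    "teach answer": {
        "isa": "answer",
        # "teach request info" has no entry in Table 5, so this THEME stays unresolved.
        "features": {"AGENT": "teach agent", "THEME": "teach request info theme",
                     "BENEFICIARY": "teach beneficiary"},
    },
}


def resolve(lexicon, entry, feature, _seen=None):
    """Return a feature value, following 'entry feature' references to inherit it."""
    seen = set() if _seen is None else _seen
    if (entry, feature) in seen:
        raise ValueError(f"cyclic reference at {entry!r} {feature}")
    seen.add((entry, feature))
    value = lexicon[entry]["features"].get(feature)
    if isinstance(value, str):
        # Split "teach request info theme" into entry "teach request info" + feature "THEME".
        target, _, target_feature = value.rpartition(" ")
        target_feature = target_feature.upper()
        if target in lexicon and target_feature in lexicon[target]["features"]:
            return resolve(lexicon, target, target_feature, seen)
    return value


def add_derivation(lexicon, base, derived, relation="NOMINALIZATION"):
    """Record a derivational relation on the base entry and its inverse on the derived one."""
    for word in (base, derived):
        lexicon.setdefault(word, {"isa": None, "features": {}})
    lexicon[base]["features"][relation] = derived
    lexicon[derived]["features"][relation + " OF"] = base


if __name__ == "__main__":
    print(resolve(LEXICON, "teach describe", "AGENT"))  # intentional agent
    print(resolve(LEXICON, "teach describe", "THEME"))  # knowledge
    print(resolve(LEXICON, "teach answer", "THEME"))    # teach request info theme
    add_derivation(LEXICON, "judge", "judgment")
    print(LEXICON["judgment"]["features"])  # {'NOMINALIZATION OF': 'judge'}
```

Storing the inverse on the derived entry means either form can be looked up directly. This mirrors the nominalization / nominalization of pairing described in the text.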
    Original URL path: http://www.clres.com/catprocs.html (2016-02-11)

  • Review of Linguistic Semantics
    to identify precisely what should be included in these representations. In his examination of the five approaches to meaning, Frawley extracts from each something that is salient to semantic representation. Considering meaning as reference to facts and objects in the world, Frawley makes the important point that reference takes place within a mentally projected world (18). This is what enables us, for example, to refer to Venus as both the Morning Star and the Evening Star. However, some phenomena, such as presupposition, argue against a completely referential representation of meaning. "The present king of France is bald" presupposes a truth in some possible universe of discourse, yet it has an empty referential meaning because at present France has no king.

    Then, considering meaning as logical form, he recognizes that formal semantics can help us discover the content of semantic representations. It enables us to be precise about how grammatically sensitive semantic properties are components of truth in a model (35). At the same time, he argues, we must recognize that natural language is non-formal in some respects.

    He next considers meaning as deriving from context and use, that is, as pragmatically determined, along with the position that context and use are always relevant to an interpretation. Here Frawley concludes that linguistic expressions themselves bring semantic conditions with them into any context (44), because words must be viewed as having some stability. Moreover, an analysis of Gricean maxims, which specify what people assume when talking to one another, makes it possible to articulate a scale of context. The scale runs from conversational implicature, to conventional implicature, to presupposition, to entailment. Somewhere along this progression is a place that separates pragmatic from linguistic semantic information. The implicatures (what is implicit) clearly depend on context, while entailment requires knowledge of what a word means. If someone says "14 points," the hearer must use context to determine whether the topic is test scores, basketball, or type size. If someone says "in," time or location is entailed. Frawley frequently invokes this scale in chapters 3 to 10 when examining specific pieces of grammatically encoded information.

    Frawley next examines the view (45), stemming from the Sapir-Whorf Hypothesis, that linguistic meaning is entirely determined by the cultural context in which the language occurs. According to Frawley, this view fails because the same meanings are there and the same conceptual structures exist across cultures (48). Variations across cultures arise from the significance attached to these meanings and concepts. Linguistic semantics is concerned with invariant meaning: the constancies in spite of the variation of contexts, and what is immune to cultural variation (48). Grammatically relevant semantic representations are invariant because they are constituted by relatively stable, decontextualized semantic properties (50).

    Finally, Frawley settles on meaning as represented in conceptual structure, particularly as presented in Jackendoff's notion of the Cognitive Constraint, that no fact is excluded from expression (51). Thus grammatical meaning is a subset of the intensions (connotations) that make up semantic representations. These in turn are a subset of conceptual structure, which is identifiable from how languages are actually put together and how speakers mentally project context, culture, and the world of reference (55). For Frawley, semantic representation involves reference to universal, gradient, inherent properties of a mentally projected world (18). This interpretation of meaning underlies the explication of the concepts presented and analyzed in detail in chapters 3 to 10. It does not, however, reach the formality of representation pursued by Jackendoff or by computational linguists; that exercise is left to us.

    The primary data in chapters 4 to 9 center on verbs and the concepts they encode, with chapter 4 presenting the backbone typology of verbs. Events are the primary conceptual content of verbs. An event is defined as a relatively temporal relation in conceptual space (144), and events are divided into four categories: acts, states, causes, and motion. In his description of verbs, Frawley synthesizes and unifies the treatments of, among many others, Davidson, Givón (1984), and Langacker. Chapters 5 to 9 treat the principal conceptual attributes of verbs: roles, deixis, aspect, tense and time, and modality and negation.

    Example of an analysis of linguistic data. In section 7.22 (302-6), the author analyzes the aspectual, nontemporal distinctions between telic and atelic events as a feature of the internal contours of events (302), that is, their beginning, duration, completion, and repetition. Telic verbs have built-in goals and are processes that exhaust themselves in their consequences (302). In other words, they consist of a process and its required result. Thus in "Bill drove to New York," "drive" contains within its meaning both the process and the result of driving. The distinction between telic and atelic verbs has particular reflexes, which can be determined with three tests:
      - non-interruption: one might interrupt Bill in his driving to New York, and he may still reach his destination, but for the atelic verb "reach," an interruption of Bill reaching New York makes the entire event fail;
      - ambiguity with "almost": if Bill almost drove to New York, it is ambiguous whether he almost started or almost arrived, while atelic verbs show no such ambiguity;
      - use of durative "for": Bill might drive to New York, but "Bill reached New York for two hours" has no meaning, because "reach" is instantaneous, not durative.
    Frawley then describes how perfect tense (e.g., "Donna has driven") and passive voice (e.g., "The door was closed") induce telic interpretations by turning an event into a process that exhausts itself. He also notes that several languages allow an atelic event to be converted into a telic one by inserting a morpheme of result (306).

    Example of a unified theoretical treatment. In section 9.3 (406-19), Frawley first surveys epistemic notions from the literature, that is, the manner in which speakers express judgment about factual statements and the likelihood of a state of affairs. He presents Palmer's account of judgments and evidentials and

    Original URL path: http://www.clres.com/online-papers/dsna94.html (2016-02-11)

  • The Clog
    particularly challenging and interesting. The limit of 140 characters would seem to make tasks easier, since sentences are relatively short compared with, for example, the long sentences in newspaper articles. However, this limit has brought some rather fundamental changes in the way we communicate. These changes are mainly in the lexicon, with novel creations (e.g., "l8" for "late"). In addition, tweets are full of non-standard uses of punctuation marks, particularly in emoticons, which further complicates analysis.

    A recent paper by Kyle Dent and Sharoda Paul, "Through the Twitter Glass: Detecting Questions in Micro-text," took on the natural language processing (NLP) challenges described briefly in a Scientific American article. They developed NLP techniques to deal specifically with issues in tokenization, the lexicon, and parsing. Their system classified 2304 tweets into real questions and "not questions," which only superficially resembled questions.

    Tweets share a property with Likert scales: both are short. The content analysis program MCCA (Minnesota Contextual Content Analysis) has been applied to Likert items in an attempt to improve the coherence of an entire scale. I modified MCCA slightly so that it would perform a classification task and applied it to the Twitter data used by Dent & Paul. It achieved results almost as good as theirs without having to deal with all the NLP issues. This would suggest that MCCA can provide an initial classification tool as a first step in the analysis of Twitter data. The MCCA analysis also showed that the tweets in this data set are extremely emotional, anti-practical, and anti-analytic.
    Tags: Content Analysis, MCCA, natural language processing, tweets, Twitterverse

    Enhanced Word Sketches
    Posted March 10, 2011, 2:19 pm by Ken. No Comments.
    Recently I made a request on the ACL SIGLEX mailing list for tools that might help in analyzing preposition lexical samples. In this request, I indicated a need for software that would specifically provide enhanced word sketch analysis. I received only a couple of replies, one of which asked what I meant by this term. My answer was somewhat vague, but the interchange sparked some thoughts that are worth exploring further. In particular, it raised questions about the amount of information in preposition dictionary entries and about what might help in expanding these entries. I'd like to expand on this, particularly on the relation between current approaches to word sense disambiguation, which are mainly statistical, and what ends up in the dictionary. I think there is still something of a disconnect between the computational community and the lexicographers.
    Tags: Prepositions, word sense disambiguation, word sketches

    Semantic Primitives
    Posted January 15, 2011, 5:30 pm by Ken. No Comments.
    In a recent posting to CORPORA on the topic of semantic primitives, John Sowa says: The so-called primitives are the result of analysis by adults who have learned how to write dissertations about language. I believe there are no primitives that are truly primitive in the sense that they cannot be analyzed

    Original URL path: http://www.clres.com/blog/ (2016-02-11)

  • CL Research Frame Element Taxonomy Tree
    CL Research Frame Element Tree, Top: Act, Cause, Degree, Entity, Path, Place, Purpose, Reason, Role, State, Topic, Type

    Original URL path: http://www.clres.com/db/feindexa.php?op=e&fe=Top (2016-02-11)

  • Taxonomy Change Suggestions
    Taxonomy Change Suggestions. Operation: Move, Split, Merge, Add, Delete

    Original URL path: http://www.clres.com/db/fechange.php (2016-02-11)

  • 45 All trial All trial
    Preposition  Senses  Instances  Training   Test
    by           22      759        510        249
    down         5       485        332        153
    during       2       120        81         39
    for          15      1430       951        479
    from         16      1787       1206       581
    in           15      2086       1397       689
    inside       5       105        67         38
    into         10      901        604        297
    like         7       391        266        125
    near         5       63         All trial  All trial
    of           20      4484       3004       1480
    off          7       237        161        76
    on           25      1316       872        444
    onto         3       175        117        58
    out          6       755        Not used   Not used
    outside      4       59         Not used   Not used
    over         17      298        200        98
    past         5       204        Not used   Not used
    per          3       2          Not used   Not used
    prior        1       3          Not used   Not used
    regarding    1       24         Not used   Not used
    round        8       263        181        82
    since        1       13         Not used   Not used
    than         2       46         Not used   Not used
    through      16      649        441        208
    throughout   2       20         Not used   Not used
    to           17      1755       1183       572
    together     1       7          Not used   Not used
    towards      6       316        214        102
    under        16      253        Not used   Not used
    underneath   3       16         Not used   Not used
    until        1       30         Not used   Not used
    unto         3       2          Not used   Not used
    up           4       575        Not used   Not used
    upon         20      109        Not used   Not used
    via          3       48         Not used   Not used
    with         18      1770       1191       579
    within       6       79         Not used   Not used
    without      4       37         Not used   Not used
    Totals               27092      16557      8111
    In this table, each preposition links to a zipped file, prep.zip (where prep identifies the preposition), containing data generated during The Preposition Project. Specifically, the file contains the Excel spreadsheet holding the data in the web pages (Sense Analysis prep.xls), the tab-separated text file used to create the Excel spreadsheet (pp prep.txt), and the Excel spreadsheet
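The summary table can be checked for internal consistency. For every preposition whose data were split, the last two numeric columns add up to the instance count, which is why they are read here as a training/test split. The page's own column headings are not part of this excerpt. The minimal Python sketch below transcribes the visible rows, with None standing for "All trial" and "Not used". The column names and helper functions are assumptions for illustration.

```python
# Visible rows of the summary table: (preposition, senses, instances, training, test).
# "training"/"test" are inferred column meanings; None marks "All trial" or "Not used".
ROWS = [
    ("by", 22, 759, 510, 249), ("down", 5, 485, 332, 153),
    ("during", 2, 120, 81, 39), ("for", 15, 1430, 951, 479),
    ("from", 16, 1787, 1206, 581), ("in", 15, 2086, 1397, 689),
    ("inside", 5, 105, 67, 38), ("into", 10, 901, 604, 297),
    ("like", 7, 391, 266, 125), ("near", 5, 63, None, None),  # All trial
    ("of", 20, 4484, 3004, 1480), ("off", 7, 237, 161, 76),
    ("on", 25, 1316, 872, 444), ("onto", 3, 175, 117, 58),
    ("out", 6, 755, None, None), ("outside", 4, 59, None, None),
    ("over", 17, 298, 200, 98), ("past", 5, 204, None, None),
    ("per", 3, 2, None, None), ("prior", 1, 3, None, None),
    ("regarding", 1, 24, None, None), ("round", 8, 263, 181, 82),
    ("since", 1, 13, None, None), ("than", 2, 46, None, None),
    ("through", 16, 649, 441, 208), ("throughout", 2, 20, None, None),
    ("to", 17, 1755, 1183, 572), ("together", 1, 7, None, None),
    ("towards", 6, 316, 214, 102), ("under", 16, 253, None, None),
    ("underneath", 3, 16, None, None), ("until", 1, 30, None, None),
    ("unto", 3, 2, None, None), ("up", 4, 575, None, None),
    ("upon", 20, 109, None, None), ("via", 3, 48, None, None),
    ("with", 18, 1770, 1191, 579), ("within", 6, 79, None, None),
    ("without", 4, 37, None, None),
]
TOTALS = (27092, 16557, 8111)  # instances, training, test as printed in the Totals row


def inconsistent_rows(rows):
    """Prepositions whose training + test count differs from the instance count."""
    return [prep for prep, _, n, train, test in rows
            if train is not None and train + test != n]


def training_share(rows):
    """Fraction of split instances that fall in the training column."""
    split = [(train, test) for _, _, _, train, test in rows if train is not None]
    train_total = sum(train for train, _ in split)
    test_total = sum(test for _, test in split)
    return train_total / (train_total + test_total)


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))         # []
    print(round(training_share(ROWS), 2))  # 0.67
    total, train, test = TOTALS
    # Instances outside the split (presumably the All trial / Not used rows): 2424
    print(total - (train + test))
```

Over the rows visible here, about two-thirds of the split instances fall in the first column. This is close to the ratio in the Totals row (16557 of 24668). The totals also cover rows that precede this excerpt, so they cannot be recomputed from the visible rows alone.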

    Original URL path: http://www.clres.com/Preposition%20Analysis%20Summary%20Table.htm (2016-02-11)

  • Connector Means Evidence Handle Instrument Locus Criteria Cause Ground Treatment Manner Road very wide variety includes all by which clauses follows nouns adjectives including part participles See notes
    6 2a TermInterp 2a as Communicate categorization Category a term often in quotes finite verb set passive of mean translate this is really a subset of agent or means with limited scope
    7 2b NameUsed 2a as Name bearing Name a name or other personal identifier the word name or its synonyms after the object of call mention or address possibly included in this sense a rose by any other name and similar by another name constructions which are a species of means phrases
    8 2c Transport 2a 9.49 5.45 on in Arriving Departing Travel Path Shape Mode of transportation Path Vehicle relatively closed set nouns denoting transport FrameNet Mode of transportation verbs denoting travel
    9 2d Parent 1 of a name or kinship term esp. wife or husband typically son daughter child
    10 2e Parent 1 of pedigree animal name animal name esp. animal young
    11 2f Accompaniment 2a while during notes describe the two classes of these see notes See notes
    12 3 MarginSize 2a change of scalar change event time Change position on a scale Commerce Relative time Expansion Difference Interval Attribute Degree Speed Rate Size Change figures numbers terms denoting quantity mainly predicate after verb or its complement verbs typically are those that can denote a difference between two things
    13 3a UnitSize 2a in a unit of measure time volume quantity mainly predicate after verb or its complement verbs typically describe a quantifiable activity
    14 3b UnitName 2a a tangible object or a unit of time same as comp e.g. piece by piece See notes
    15 3c UnitName 2a Categorization Separation Differentiation Criteria Criterion Quality a category of classification typically a high freq Hypernym sth that admits of categorization by the complement FrameNet Item verb forms specifying classification are often present segregate separate differ classify etc.
    16 3d Multiplier 1 2a figures numbers terms denoting quantity a figure or quantity of the same scale as the complement
    17 4 TargetTime 2a 9.39 7.70 8.59 at on Quirk 7.70 Departing Surpassing Time term or formula denoting a time or the time a temporal event often first in sentence by the time I get to Phoenix
    18 5 LocBeside 2a 9.20 beside near alongside next to close to Being attached Body movement Change posture Placing Posture Self motion Social Event Goal Location Place Area tangible object located in space the reference object of the complement verbs motion or of location in space are usually present See also Frame Locative Relation
    19 5a PlacePast 2a 8.41 past tangible object located in space the reference object of the complement See notes for sense 19 this is a quasi adverbial use
    20 6 ActivePeriod 2a 9.34 5.46 during Cause temperature change Time mainly day and night FrameNet Time typically predicate little discernable

    Original URL path: http://www.clres.com/bytab.htm (2016-02-11)

  •