Even the most advanced linguistic theories cannot claim to cover all computational problems, at least at present. Indeed, all of them evidently have the following limitations:
· Only the problems of morphology and syntax are under intensive elaboration in these theories, whereas semantics is investigated to a significantly lesser degree. The goal of atomization and consistent decomposition of semantic components remains unattained. Even the more limited problem of rational decomposition of word meaning, i.e., of representing the meaning of a given lexeme through the meanings of other, simpler lexemes, has not yet been solved on a large scale for any language. This is the great problem of lexical semantics. The field develops well, but from the standpoint of computational linguistics this development is too slow and lacks an immediate connection with computations.
· Modern semantics cannot yet formalize the problems adjacent to pragmatics to a degree sufficient for applications. Indeed, there is no complete theory of the links between the meaning of a text and the goals of this text in a practical situation, or between the speaker’s intentions and the listener’s perception, though there are numerous valuable observations on these aspects. For example, computational linguistics cannot determine that the Spanish utterance ¿Dónde está la sal? is a genuine query for information about the salt, whereas ¿Podría usted pasarme la sal? is a request for a specific action, given in a polite form. As another example, no automaton can so far “comprehend” that the sentence Niños son niños is not a trivial tautology, but the idea that children have specific features of their own and thus should be treated accordingly. To cope with such intricacies, linguists have to model the human world with respect to its customs, habits, etiquette, relations between generations, etc. This is extralinguistic knowledge of an encyclopedic type. So far, computational linguistics and artificial intelligence do not know how to effectively distinguish, store separately, and then combine knowledge of the purely linguistic and the evidently encyclopedic types. What is more, a “dictionary” specifying all the encyclopedic information needed for the comprehension of texts that are rather simple for a human to understand would be so huge that it is unlikely to be compiled with sufficient completeness in the near future.
· The results of recent investigations mainly cover separate sentences, not whole discourses. The complicated semantics of discourse, including the information on referential links between different entities, the target matter under discussion, the author’s current estimations, and the author’s planning of the discourse, still awaits deeper research.
· It is well known in theoretical linguistics that the set of wordforms comprising a sentence is chosen according to the main matter of this sentence, whereas the word order depends both on this wordform set (e.g., a preposition should precede the related noun) and on the communicative structure of the text. This notion reflects what the author considers already known or presupposed at this stage of the communication process (the topic) and what information he or she chooses to communicate right now (the comment). In generative grammars, the variations of word order depending on the communicative structure of texts went unnoticed for a long time. The MTT, and general linguistics as a whole, now give a rather elaborate informal treatment of these problems. For example, this treatment explains the obvious difference in meaning between the Spanish sentences Juan llegó ‘Juan came’ and Llegó Juan ‘It is Juan who came,’ where the same words appear in different order. As a more complicated example using the same wordforms in different order, the sentence En México se habla el español ‘In Mexico, Spanish is spoken’ turns out to be unconditionally true, while the meaning of El español se habla en México ‘Spanish is spoken in Mexico’ is quite different and dubious, since Spanish is spoken not only in Mexico. In spite of all these theoretical advances, a global formalization of communicative structures has not yet been attained, so these advances cannot yet be used for either text synthesis or analysis.
· The problem of how people learn natural language in childhood remains unsolved. The idea of linguistic universals, once introduced in general linguistics, has now transformed into Chomsky’s idea of the Universal Grammar. All languages are considered species of this grammar, with a finite set of generalized features supposedly adjustable to a specific “option” (Spanish, English, etc.). Newborn children are supposed to have the Universal Grammar in their brains, and their adaptation to a specific language is accomplished in childhood. However, the goal of discovering the structure and laws of the Universal Grammar remains unattained. Thus, computational linguistics cannot yet propose universal algorithms equally applicable to various languages.
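The pragmatic gap described above can be made concrete with a deliberately naive sketch. The following toy classifier (the function name and the patterns are hypothetical, invented purely for illustration) labels the two Spanish utterances by surface cues alone; it shows the kind of shallow pattern matching current programs can do, not genuine comprehension of the speaker’s intentions:

```python
# Toy, purely illustrative sketch: classify an utterance as an information
# query or a polite request using surface cues only. The cues below are
# hypothetical; genuine pragmatic interpretation would require the
# extralinguistic, encyclopedic knowledge discussed in the text.

def classify_speech_act(utterance: str) -> str:
    text = utterance.lower()
    # The polite conditional "podría" often signals an indirect request.
    if "podría" in text:
        return "request for action"
    # A bare "¿dónde ...?" question is taken as a literal query.
    if text.startswith("¿dónde"):
        return "query for information"
    return "unknown"

print(classify_speech_act("¿Dónde está la sal?"))            # query for information
print(classify_speech_act("¿Podría usted pasarme la sal?"))  # request for action
```

Of course, such keyword rules break down immediately (e.g., ¿Podría decirme dónde está la sal? mixes both cues), which is precisely the limitation pointed out above.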
Even the proponents of the contemporary linguistic theories do not believe that all the facts of language can be interpreted through their favorite ideas in a way that would solve the current problems of computational linguistics. Meanwhile, science advances, perhaps more slowly than we would wish.
The readers of this book will be able to learn from it a multiplicity of already known linguistic facts and laws. At the same time, they can realize that numerous interesting and very difficult problems that computational linguistics faces nowadays remain unsolved. They are still awaiting a Chomsky of their own.
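As a small illustration of how the communicative structure discussed above (Juan llegó vs. Llegó Juan) could eventually be exploited in text synthesis, here is a minimal, hypothetical sketch; the function name and the topic/comment encoding are invented for this example and do not represent any existing formalization:

```python
# Minimal, hypothetical sketch: choose Spanish word order from the
# communicative structure. The comment (new information) tends to come
# last: 'Juan llegó' answers "What did Juan do?", while 'Llegó Juan'
# answers "Who came?".

def realize(subject: str, verb: str, comment: str) -> str:
    # Put the topic first and the comment last.
    words = [verb, subject] if comment == "subject" else [subject, verb]
    words[0] = words[0][0].upper() + words[0][1:]  # sentence-initial capital
    return " ".join(words) + "."

print(realize("Juan", "llegó", comment="verb"))     # Juan llegó.
print(realize("Juan", "llegó", comment="subject"))  # Llegó Juan.
```

A real generator would, of course, have to derive the topic/comment split from the discourse context rather than receive it as a parameter, which is exactly the unattained formalization the text describes.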
CONCLUSIONS
A linguistic model is a system of data (features, types, structures, levels, etc.) and rules, which, taken together, can exhibit a “behavior” similar to that of the human brain in understanding and producing speech and texts. A functional linguistic model takes into account the observed language behavior of human beings rather than the physiological activity of the brain. This behavior is reflected in the texts or speech they produce in response to the texts or speech they perceive.
So far, direct modeling of brain structures has failed, and several functional models have instead been proposed for the purposes of computational linguistics. The modern functional models have many features in common: they are intended to be quite formal, have a dynamic and non-generative character, provide independence of linguistic algorithms from linguistic data, and consider dictionaries one of the main, inalienable parts of the model.
Theoretical approaches provide a solid basis for both holistic and reduced models of language oriented to applications. The degree of the reduction in such a model heavily depends on the specific application.
EXERCISES
THIS SECTION CONTAINS review questions recommended to the readers for verifying their understanding of the contents of the book, as well as problems recommended for exams.
REVIEW QUESTIONS
THE FOLLOWING QUESTIONS can be used to check whether the reader has understood and remembered the main contents of the book. The questions are also recommended for an exam on this course of Computational Linguistics. The questions marked with the sign ° are the most important ones.
1. Why is automatic processing of natural language important for humankind?
2. Why are theoretical aspects of linguistics necessary for computational linguistics?
3. How are the methods of computational linguistics related to those of artificial intelligence?
4. How is computational linguistics coordinated with computer science?
5. What is general linguistics?
6. What aspects of natural language do phonology, morphology, syntax, semantics, and pragmatics study?
7. What is historical linguistics? Contrastive linguistics? Sociolinguistics?
8. What is dialectology?
9. What is lexicography? Why is it important for NL processing?
10. What are the narrower and the broader comprehension of mathematical linguistics?
11. What is computational linguistics? How is it related to applied linguistics?
° 12. What is the structuralist approach in general linguistics?
13. What are constituents? What is a constituency tree?
14. What mathematical means were proposed by Noam Chomsky? What purposes can they serve?
15. What example of a context-free grammar for generating simple sentences do you know?
16. What are transformational grammars?
17. What are valencies in linguistics? What is the difference between syntactic and semantic valencies?
18. What are subcategorization frames, and how do they describe the valencies of verbs?
19. What are constraints in computational linguistics?
20. What is the notion of head in Head-driven Phrase Structure Grammar?
21. What is the idea of unification in computational linguistics?
22. Why should language be viewed as a transformer?
23. Why should this transformer be considered to contain several stages of transformation?
24. What is meant by Text in the Meaning ⇔ Text Theory?
25. What is meant by Meaning in the Meaning ⇔ Text Theory?
26. What are the main levels of language representation? Which levels are called surface ones, and which deep ones? Why?
27. What are dependency trees in computational linguistics?
28. What are the two methods of information representation on the semantic level? What are semantic labels? Are they words?
29. What are government patterns in the Meaning ⇔ Text Theory? How do they describe syntactic and semantic valencies?
30. What are the main applications and classes of applications of computational linguistics?
31. What linguistic knowledge is used in hyphenation programs? Spell checkers? Grammar checkers? Style checkers?
32. What linguistic knowledge is used in information retrieval systems? In what way does this knowledge influence the main characteristics of information retrieval systems?
33. How can we automatically determine the theme of a document?
34. How is linguistic knowledge used in automatic translation? What are the main stages of automatic translation? Are all of these stages always necessary?
35. What is automatic text generation?
36. What are the specifics of natural language interfaces?
37. What is extraction of factual data from texts?
38. What is language understanding? What linguistic knowledge should it employ? What are the main difficulties in the creation of systems for language understanding?
39. What is EuroWordNet?
40. Do optical character recognition and speech recognition require linguistic knowledge?
41. What is modeling in general?
42. What kinds of linguistic modeling do you know? What are research linguistic models used for?
43. What are functional models in linguistics? What are their common features?
44. Are the Meaning ⇔ Text Theory and Head-driven Phrase Structure Grammar functional models?
45. What are the specific features of the Meaning ⇔ Text model?
46. What are holistic and reduced models? Is the most detailed and broad model always the better one?
47. What aspects of language are not covered by modern linguistic models?
° 48. Word, wordform, and lexeme: what is the difference between them? When can we use each of them?
49. What is synonymy? What kinds of synonyms exist? Can synonymy be avoided in natural language?
50. What is homonymy? What kinds of homonyms exist? Can homonymy be avoided in natural language?
51. What are metaphoric and metonymic methods of creating new words in natural language?
52. What are the specifics of computer-based dictionaries?
53. What is analogy in linguistics? How can we use it for NL processing?
54. What is the empirical approach in linguistics? To what kinds of problems can it be applied?
° 55. What is a sign? What is a linguistic sign?
56. What is the syntactics of a linguistic sign in the Meaning ⇔ Text Theory?
57. What is the structure of the linguistic sign in Head-driven Phrase Structure Grammar?
58. Are there any commonalities in linguistic description between the generative, Meaning ⇔ Text, and constraint-based approaches?