ENGLISH ALGORITHMIC GRAMMAR

Hristo Georgiev

TABLE OF CONTENTS

Preface

Part 1

1. Algorithmic recognition of the verb
1.1 Introduction
1.2 Basic assumptions and some facts
1.3 Algorithm for automatic recognition of verbal and nominal word groups. Algorithm No. 1
1.4 Conclusion

2. Division of the sentence into phrases
2.1 Introduction
2.2 Presentation
2.3 Algorithm for division of the sentence into phrases. Algorithm No. 2
2.4 Discussion

3. Algorithmic recognition of parts of speech
3.1 Introduction
3.2 Presentation
3.3 Algorithm for recognition of parts of speech. Algorithm No. 3
3.4 Discussion
3.5 Examples of the performance of the algorithm
3.6 Lists
3.7 Algorithmic procedure to determine the use of adjectives, nouns, participles, numerals and adverbs as attributes to the noun

4. Algorithmic recognition of the tenses
4.1 Presentation
4.2 Presentation of the algorithm. Algorithm No. 6
4.3 Discussion

Part 2

5. Syntactical structure of the sentence
5.1 Introduction
5.2 Syntactic structures
5.3 The segment
5.4 Presentation of the segment

6. Composition of the segments
6.1 Introduction
6.2 Examples of manual extraction of segments from a text
6.3 Types of segments

7. Parsing algorithm
7.1 Identification of the segment
7.2 Parsing of the segment

8. Links of predicates and incomplete segments
8.1 Links of P1
8.2 Links of Pi
8.3 Links of j-segment
8.4 Links of G-segment
8.5 Links of v-segment
8.6 Links of and-segment
8.7 Links of the infinitive
8.8 Other links and discussion of the links

9. Reference
9.1 Reference within the segment
9.2 Pronominal reference

10. Recognition of the independent and dependent clauses
10.1 Algorithm No. 20
10.2 Role and meaning of conjunctions
10.3 The sentence
10.4 Creation of interrogative sentences

11. Further applications

Appendix I: List of prepositions and conjunctions and their most characteristic meanings
Appendix II: Internet downloads
General index of abbreviations
References
Index

Preface

The main purpose of this book is to bridge the gap between traditional and computational grammar, showing how traditional grammar can be turned into computational grammar without losing its readers. There have been no previous attempts in this direction, since all computational linguists have used Artificial Languages for their algorithmic notation. By doing so, they exclude readers who are unfamiliar with formal languages, computers and how they operate, but who are eager to learn. Some of those readers are students and teachers of English. A computational grammar can be read and understood by both humans and computers only if it is written in a language that both can comprehend. For humans, this is Natural Language (in our case, English); for computers, it is the rigid and unequivocal algorithmic language. When the algorithmic language is expressed in Natural Language, say English, it becomes legible for humans and, at the same time, it can easily be turned into a computer program written in any programming language. So, in this book, we provide a formal description of English grammar (syntax) for the computer, in two parts. Part 1 introduces procedures for the automatic recognition (disambiguation) of the Parts of Speech in a text. Part 2 deals with the sentence and the interrelationship of its constituent elements, including Parsing and Pronominal reference.

The algorithmic approach to grammar is a step-by-step approach, in line with the digital thinking of the computer. Such an approach leaves nothing unresolved, since the computer cannot take a further step without first having solved the task presented at the previous step. The algorithmic approach leaves no room for errors: errors that are not corrected in time accumulate and frustrate the operation of the whole system.
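To illustrate how an algorithm written in plain English can be turned directly into a program, here is a minimal Python sketch. The rule it implements is a hypothetical toy invented for this illustration, not one of the algorithms in this book: each numbered English instruction appears as a comment beside the code that carries it out, and the program decides each word before moving on to the next.

```python
# A hypothetical toy rule written as plain-English instructions, for
# illustration only (it is NOT one of the algorithms in this book):
#   Step 1. If the word is "the", "a" or "an", mark it as an article.
#   Step 2. Otherwise, if the previous word was marked as an article,
#           mark this word as a noun.
#   Step 3. Otherwise, mark the word as unresolved.
#   Step 4. Go to the next word and repeat from Step 1.

ARTICLES = {"the", "a", "an"}


def tag_words(words):
    """Apply the toy rule to each word in turn; return (word, tag) pairs."""
    tagged = []
    previous_tag = None
    for word in words:  # Step 4: one word at a time, in order
        if word.lower() in ARTICLES:  # Step 1
            tag = "article"
        elif previous_tag == "article":  # Step 2
            tag = "noun"
        else:  # Step 3
            tag = "unresolved"
        tagged.append((word, tag))
        previous_tag = tag  # the decision for this word feeds the next step
    return tagged


print(tag_words(["The", "dog", "barks"]))
# → [('The', 'article'), ('dog', 'noun'), ('barks', 'unresolved')]
```

The toy rule is deliberately incomplete: for "the big dog" it would wrongly mark "big" as a noun, which shows how a gap left at one step produces an error that the following steps inherit.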
The functioning of the algorithm and, hence, the performance of the computer program depend entirely on the formal method used to describe the language. If this method is inadequate, if it cannot describe every word and every sentence, then it is useless to the computer. Unlike other methods, the algorithmic approach can be verified: we can check each step of the algorithm manually and satisfy ourselves whether the decision taken by the computer at that step is correct or incorrect. English grammar, as seen through the digital eyes of the computer, looks like an endless chain of operations (instructions) and decisions aimed at resolving a particular grammatical or semantic task.

The present grammar is designed for text analysis, not for text synthesis, although, after some additions and exclusions, it can be used for the latter purpose if one is willing to generate syntactically correct but meaningless sentences on a computer. In the classroom, students may use it to generate meaningful sentences by adding words to the list of syntactic structures.

English Algorithmic Grammar has a very wide scope of application. It can be used to study, teach and practise English grammar (syntax) at all levels. It could serve to introduce linguists at undergraduate, postgraduate or faculty level to computers and to the computer's way of thinking and decision making, and to introduce computer scientists and hobbyists to linguistics. Many Natural Language Processing teams around the world may find its algorithms preferable for implementation. English Algorithmic Grammar is both a textbook and a reference book. It is accompanied by a Dictionary of Segments, available for free download on the Internet (see Internet Downloads at the end of the book), containing some 27,000 syntactically correct structures permitted by English grammar. The structures are pre-parsed and can be used for reference by native and non-native speakers of English alike.
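The claim above that each step of the algorithm can be checked manually can be made concrete with a short Python sketch. It assumes, purely for illustration, that a decision is reached by asking a fixed sequence of yes/no questions about a word and stopping at the first "yes"; the questions are hypothetical and are not taken from this book. The program records every question and answer, so a reader can verify each decision by hand.

```python
from typing import Callable, List, Optional, Tuple

# Each step is a yes/no question about the word at position i, plus the
# decision taken when the answer is "yes". These two steps are hypothetical
# examples, not rules taken from this book.
Step = Tuple[str, Callable[[List[str], int], bool], str]

STEPS: List[Step] = [
    ("Is the previous word 'will'?",
     lambda words, i: i > 0 and words[i - 1].lower() == "will",
     "verb"),
    ("Is the previous word 'the'?",
     lambda words, i: i > 0 and words[i - 1].lower() == "the",
     "not a verb"),
]


def decide(words: List[str], i: int) -> Tuple[Optional[str], List[str]]:
    """Ask the questions in order and stop at the first "yes".

    Returns the decision (None if no step applies) together with a
    trace of every question asked, so that a human can check each step.
    """
    trace: List[str] = []
    for number, (question, test, decision) in enumerate(STEPS, start=1):
        answer = test(words, i)
        trace.append(f"Step {number}: {question} {'Yes' if answer else 'No'}")
        if answer:
            trace.append(f"Decision for '{words[i]}': {decision}")
            return decision, trace
    trace.append(f"No decision for '{words[i]}': further steps are needed")
    return None, trace


decision, trace = decide(["She", "will", "run"], 2)
print("\n".join(trace))
# Step 1: Is the previous word 'will'? Yes
# Decision for 'run': verb
```

Recording the trace, rather than only the final decision, is what makes the chain of operations and decisions verifiable: if a decision is wrong, the trace shows at which step it went wrong.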
The reader needs no special knowledge of the related fields (mathematics and computational linguistics) to understand this book.