Exploring Natural Language:
Working with the British Component of the International Corpus of
English
“This book is a must for anyone who wants to explore the immense possibilities of the ICE-GB.” - The Year’s Work in English Studies, 2004, 83.1
Gerald Nelson, Sean Wallis and Bas Aarts, 2002, Amsterdam: John Benjamins. 355 pages hbk/pbk.
ISBN 90 272 4889 3 (Europe) / 1 58811 271 3 (US)
Number G29 in the series Varieties of English Around the World
(series editor: Edgar Schneider).
The book can be purchased from John Benjamins. The table of contents is listed below.
ICE-GB is a one-million-word corpus of contemporary British English.
It is fully parsed, and contains over 83,000 syntactic trees. Together
with the dedicated retrieval software, ICECUP, ICE-GB is an unprecedented
resource for the study of English syntax.
Exploring Natural Language is a comprehensive guide to both corpus
and software. It contains a full reference guide for ICE-GB. The
chapters on ICECUP provide complete instructions on the use of the
many features of the software, including concordancing, lexical
and grammatical searches, sociolinguistic queries, random sampling,
and searching for syntactic structures using ICECUP's Fuzzy Tree
Fragment models. Special attention is given to the principles of
experimental design in a parsed corpus.
Six case studies provide step-by-step illustrations of how the corpus
and software can be used to explore real linguistic issues, from
simple lexical studies to more complex syntactic topics, such as
noun phrase structure, verb transitivity, and voice.
Keywords: Corpus Linguistics; International Corpus of English
(ICE); ICE-GB; ICECUP; Grammar; Parsing; Fuzzy Tree Fragments (FTFs);
Research Methods; Corpus Exploration; Experimental Design
See also:
Review by Lea Cyrus on the Linguist List
CONTENTS
SERIES EDITOR’S INTRODUCTION
FOREWORD
PREFACE
PART 1: Introducing the corpus
1. INTRODUCING ICE-GB
1.1 AIMS AND BACKGROUND
1.2 CORPUS DESIGN
1.3 EXTRA-CORPUS MATERIAL
1.4 COPYRIGHT
1.5 TRANSCRIPTION AND MARKUP
1.6 PART-OF-SPEECH TAGGING
1.7 SYNTACTIC PARSING
1.8 CROSS-SECTIONAL CHECKING
1.9 DIGITIZATION
1.10 EXAMINING ICE-GB TEXTS
2. THE ICE-GB GRAMMAR
2.1 INTRODUCTION
2.2 ICE WORD CLASSES
- Adjective (ADJ)
- Adverb (ADV)
- Article (ART)
- Auxiliary verb (AUX)
- Cleft it (CLEFTIT)
- Conjunction (CONJUNC)
- Connective (CONNEC)
- Existential there (EXTHERE)
- Formulaic expression (FRM)
- Genitive marker (GENM)
- Interjection (INTERJEC)
- Noun (N)
- Nominal Adjective (NADJ)
- Numeral (NUM)
- Preposition (PREP)
- Proform (PROFM)
- Pronoun (PRON)
- Particle (PRTCL)
- Reaction signal (REACT)
- Verb (V)
- Miscellaneous tags
2.3 FUNCTIONS AND CATEGORIES
- Adverbial (A) [Function]
- Adjective Phrase (AJP) [Category]
- Adjective Phrase Head (AJHD) [Function]
- Adjective Phrase Postmodifier (AJPO) [Function]
- Adjective Phrase Premodifier (AJPR) [Function]
- Adverb Phrase Head (AVHD) [Function]
- Adverb Phrase (AVP) [Category]
- Adverb Phrase Postmodifier (AVPO) [Function]
- Adverb Phrase Premodifier (AVPR) [Function]
- Auxiliary Verb (AVB) [Function]
- Central Determiner (DTCE) [Function]
- Clause (CL) [Category]
- Cleft Operator (CLOP) [Function]
- Conjoin (CJ) [Function]
- Coordinator (COOR) [Function]
- Detached Function (DEFUNC) [Function]
- Determiner (DT) [Function]
- Determiner Phrase (DTP) [Category]
- Determiner Postmodifier (DTPO) [Function]
- Determiner Premodifier (DTPR) [Function]
- Direct Object (OD) [Function]
- Discourse Marker (DISMK) [Function]
- Disparate (DISP) [Category]
- Element (ELE) [Function]
- Empty (EMPTY) [Category]
- Existential Operator (EXOP) [Function]
- Floating Noun Phrase Postmodifier (FNPPO) [Function]
- Focus (FOC) [Function]
- Focus Complement (CF) [Function]
- Genitive function (GENF) [Function]
- Imperative Operator (IMPOP) [Function]
- Indeterminate (INDET) [Function]
- Indirect Object (OI) [Function]
- Interrogative Operator (INTOP) [Function]
- Inverted Operator (INVOP) [Function]
- Main Verb (MVB) [Function]
- Nonclause (NONCL) [Category]
- Notional Direct Object (NOOD) [Function]
- Notional Subject (NOSU) [Function]
- Noun Phrase (NP) [Category]
- Noun Phrase Head (NPHD) [Function]
- Noun Phrase Postmodifier (NPPO) [Function]
- Noun Phrase Premodifier (NPPR) [Function]
- Object Complement (CO) [Function]
- Operator (OP) [Function]
- Parataxis (PARA) [Function]
- Parsing Unit (PU) [Function]
- Postdeterminer (DTPS) [Function]
- Predeterminer (DTPE) [Function]
- Predicate Element (PREDEL) [Category]
- Predicate Group (PREDGP) [Function]
- Prepositional (P) [Function]
- Prepositional Complement (PC) [Function]
- Prepositional Modifier (PMOD) [Function]
- Prepositional Phrase (PP) [Category]
- Provisional Direct Object (PROD) [Function]
- Provisional Subject (PRSU) [Function]
- Stranded Preposition (PS) [Function]
- Subject (SU) [Function]
- Subject Complement (CS) [Function]
- Subordinator Phrase Head (SBHD) [Function]
- Subordinator Phrase Modifier (SBMO) [Function]
- Subordinator (SUB) [Function]
- Subordinator Phrase (SUBP) [Category]
- Tag Question (TAGQ) [Function]
- Particle To (TO) [Function]
- Transitive Complement (CT) [Function]
- Verbal (VB) [Function]
- Verb Phrase (VP) [Category]
2.4 FEATURE LABELS
2.5 SPECIAL TOPICS IN THE ICE-GB GRAMMAR
- Inversion
- Interrogative
- Imperative
- Coordination
- Direct Speech
PART 2: Exploring the corpus
3. INTRODUCING THE ICE CORPUS UTILITY PROGRAM (ICECUP)
3.1 FIRST IMPRESSIONS
3.2 THE CORPUS MAP
3.3 BROWSING THE RESULTS OF QUERIES
3.4 VIEWING TREES IN THE CORPUS
3.5 VARIABLE QUERIES
3.6 ‘SINGLE GRAMMATICAL NODE’ QUERIES
3.7 MARKUP QUERIES
3.8 RANDOM SAMPLING
3.9 TEXT FRAGMENT QUERIES
3.10 FUZZY TREE FRAGMENT SEARCHES
3.11 OPEN FILE
3.12 SAVE TO DISK
3.13 SEARCH OPTIONS
4. BROWSING THE CORPUS
4.1 THE IDEA OF CORPUS EXPLORATION
4.2 NAVIGATING THE CORPUS MAP
4.3 BROWSING SINGLE TEXTS
4.4 THE TEXT BROWSER WINDOW
4.5 VIEWING WORD CLASS TAGS
4.6 CONCORDANCING A QUERY
4.7 DISPLAYING TREES IN THE TEXT
4.8 GRAMMATICAL CONCORDANCING IN ICECUP 3.1
4.9 DISPLAYING TREES IN A SEPARATE WINDOW
4.10 CONCORDANCING, MATCHING AND VIEWING TREES
4.11 LISTENING TO SPEAKERS IN THE CORPUS
4.12 SELECTING TEXT UNITS IN ICECUP 3.1
5. FUZZY TREE FRAGMENTS AND TEXT QUERIES
5.1 THE TEXT FRAGMENT QUERY WINDOW
5.2 SEARCHING FOR WORDS, TAGS AND TREE NODES
5.3 MISSING WORDS AND SPECIAL CHARACTERS
5.4 EXTENDING THE QUERY INTO THE TREE
5.5 INTRODUCING FUZZY TREE FRAGMENTS
5.6 AN OVERVIEW OF COMMANDS TO CONSTRUCT FTFS
5.7 CREATING A SIMPLE FTF
5.8 ADDING A FEATURE AND RELATING A WORD TO THE TREE
5.9 MOVING NODES AND BRANCHES
5.10 APPLYING A MULTIPLE SELECTION AND SETTING THE FOCUS OF AN FTF
5.11 TEXT-ORIENTED FTFS REVISITED
5.12 THE GEOMETRY OF FTFS
5.13 HOW FTFS MATCH AGAINST THE CORPUS
5.14 THE FTF CREATION WIZARD: A TOOL FOR MAKING FTFS FROM TREES
6. COMBINING QUERIES
6.1 A SIMPLE EXAMPLE
6.2 VIEWING THE QUERY EXPRESSION
6.3 MODIFYING THE LOGIC OF QUERY COMBINATIONS
6.4 USING DRAG AND DROP TO MANIPULATE QUERY EXPRESSIONS
6.5 REMOVING PARTS OF THE QUERY
6.6 LOGIC AND FUZZY TREE FRAGMENTS
6.7 EDITING QUERY ELEMENTS
6.8 MODIFYING THE FOCUS OF AN FTF DURING BROWSING
6.9 BACKGROUND FTF SEARCHES AND THE QUERY EDITOR
6.10 SIMPLIFYING THE QUERY
7. ADVANCED FACILITIES IN ICECUP 3.1
7.1 INTRODUCING ICECUP 3.1
7.2 THE LEXICON
7.3 THE GRAMMATICON
7.4 STATISTICAL TABLES
7.5 LEXICAL WILD CARDS
7.6 EXTENSIONS TO FUZZY TREE FRAGMENT NODES
- Performing exact matching in FTFs
- Specifying missing features and pseudo-features
- Specifying sets of functions, categories and features
- Specifying a logical formula
PART 3: Performing research with the corpus
8. CASE STUDIES USING ICE-GB
8.1 CASE STUDY 1: PRETTY MUCH AN ADVERB
8.2 CASE STUDY 2: EXPLORING THE LEXEME BOOK WITH THE LEXICON
8.3 CASE STUDY 3: TRANSITIVITY AND CLAUSE TYPE
8.4 CASE STUDY 4: WHAT SIZE FEET HAVE YOU GOT? WH-DETERMINERS IN NOUN PHRASES
8.5 CASE STUDY 5: ACTIVE AND PASSIVE CLAUSES
8.6 CASE STUDY 6: THE POSITIONS OF IF-CLAUSES
9. PRINCIPLES OF EXPERIMENTAL DESIGN WITH A PARSED CORPUS
9.1 WHAT IS A SCIENTIFIC EXPERIMENT?
9.2 WHAT IS AN EXPERIMENTAL HYPOTHESIS?
9.3 THE BASIC APPROACH: CONSTRUCTING A CONTINGENCY TABLE
9.4 WHAT MIGHT SIGNIFICANT RESULTS MEAN?
9.5 HOW CAN WE MEASURE THE ‘SIZE’ OF A RESULT?
- Relative size
- Relative swing
- Chi-square contribution
- Cramer’s phi
9.6 COMMON ISSUES IN EXPERIMENTAL DESIGN
- Have we specified the null hypothesis incorrectly?
- Are all the relevant values listed together?
- Are we really dealing with the same linguistic choice?
- Have we counted the same thing twice?
9.7 INVESTIGATING GRAMMATICAL INTERACTIONS
9.8 THREE STUDIES OF INTERACTION IN THE GRAMMAR
- Two features within a single constituent
- Two features in a structure
- A feature and an optional constituent
- Footnote: dealing with overlapping cases
PART 4: The future of the corpus
10. FUTURE PROSPECTS
10.1 EXTENDING THE ANNOTATION IN THE CORPUS
10.2 EXTENDING THE EXPRESSIVITY OF FUZZY TREE FRAGMENTS
10.3 INCORPORATING EXPERIMENTS IN SOFTWARE
10.4 KNOWLEDGE DISCOVERY IN CORPORA
10.5 AIDING THE ANNOTATION OF CORPORA
10.6 TEACHING GRAMMAR WITH CORPORA
REFERENCES
APPENDIX 1. ICE TEXT CATEGORIES AND CODES
A1.1 SPOKEN CATEGORIES
A1.2 WRITTEN CATEGORIES
APPENDIX 2. SOURCES OF ICE-GB TEXTS
A2.1 S1A-001 TO S1A-090: DIRECT CONVERSATIONS
A2.2 S1A-091 TO S1A-100: TELEPHONE CALLS
A2.3 S1B-001 TO S1B-020: CLASSROOM LESSONS
A2.4 S1B-021 TO S1B-040: BROADCAST DISCUSSIONS
A2.5 S1B-041 TO S1B-050: BROADCAST INTERVIEWS
A2.6 S1B-051 TO S1B-060: PARLIAMENTARY DEBATES
A2.7 S1B-061 TO S1B-070: LEGAL CROSS-EXAMINATIONS
A2.8 S1B-071 TO S1B-080: BUSINESS TRANSACTIONS
A2.9 S2A-001 TO S2A-020: SPONTANEOUS COMMENTARIES
A2.10 S2A-021 TO S2A-050: UNSCRIPTED SPEECHES
A2.11 S2A-051 TO S2A-060: DEMONSTRATIONS
A2.12 S2A-061 TO S2A-070: LEGAL PRESENTATIONS
A2.13 S2B-001 TO S2B-020: NEWS BROADCASTS
A2.14 S2B-021 TO S2B-040: BROADCAST TALKS (SCRIPTED)
A2.15 S2B-041 TO S2B-050: NON-BROADCAST SPEECHES (SCRIPTED)
A2.16 W1A-001 TO W1A-010: UNTIMED STUDENT ESSAYS
A2.17 W1A-011 TO W1A-020: STUDENT EXAMINATION SCRIPTS
A2.18 W1B-001 TO W1B-015: SOCIAL LETTERS
A2.19 W1B-016 TO W1B-030: BUSINESS LETTERS
A2.20 W2A-001 TO W2A-040: ACADEMIC WRITING
A2.21 W2B-001 TO W2B-040: POPULAR WRITING
A2.22 W2C-001 TO W2C-020: NEWSPAPER REPORTS
A2.23 W2D-001 TO W2D-010: ADMINISTRATIVE/REGULATORY WRITING
A2.24 W2D-011 TO W2D-020: SKILLS AND HOBBIES
A2.25 W2E-001 TO W2E-010: PRESS EDITORIALS
A2.26 W2F-001 TO W2F-020: FICTION
APPENDIX 3. BIBLIOGRAPHICAL AND BIOGRAPHICAL VARIABLES
APPENDIX 4. STRUCTURAL MARKUP SYMBOLS
APPENDIX 5. A QUICK REFERENCE GUIDE TO THE ICE GRAMMAR
APPENDIX 6. SPECIAL CHARACTERS USED IN ICE-GB
INDEX
This page last modified 14 May, 2020 by Survey Web Administrator.