Corpus-based preparation | Center for Augmented Interpretation

Develop effective corpus-based preparation for interpreting assignments
Reflect on the difference between manually and automatically built corpora, in terms of time and quality of results
Learn how to analyze machine-built corpora with a concordancer: term extraction and analysis of concordances
Create more effective bilingual glossaries through corpus analysis

Option 1: the students are divided into 2 groups. Group A creates a corpus manually, group B with a dedicated tool.
Option 2: the students work in pairs. Student A creates a corpus manually, student B with a dedicated tool.
Provide the students with a topic, e.g. "solar energy"
The students hold a short brainstorming session and come up with key words. Alternatively, you can let the students interpret a speech on the topic first and derive the key words from it. For solar energy, for example, you can use these key words: solar energy, photovoltaics, solar cell, single-axis suntracking, renewables, concentrating, photovoltaics system, dye-sensitized cells, renewable energy, solar panel, clean energy.
Students use the key words to do a web search (A) or to build the corpus with BootCat/CorpusMode (B). You can start with one language and build a comparable corpus after the discussion.
The groups/student pairs compare the time needed to build the corpus and discuss the perceived advantages and disadvantages of the two methods, first within the groups/pairs and then in a general session in preparation for exercise 2.
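
To make the automatic route less of a black box during the discussion, it can help to show how key words may be turned into search queries. The sketch below illustrates the general idea of combining seed terms into small tuples and using each tuple as one query. It is not the actual implementation of BootCat or CorpusMode; the function name build_queries, the tuple size of 3 and the limit of 10 queries are assumptions chosen for illustration.

```python
import itertools
import random

# Key words taken from the solar energy example above.
seeds = [
    "solar energy", "photovoltaics", "solar cell", "renewables",
    "renewable energy", "solar panel", "clean energy",
]


def build_queries(seeds, tuple_size=3, max_queries=10, rng_seed=0):
    """Combine seed terms into search queries of `tuple_size` terms each.

    Hypothetical helper: the parameters are illustrative assumptions,
    not settings taken from any specific tool.
    """
    combos = list(itertools.combinations(seeds, tuple_size))
    rng = random.Random(rng_seed)  # fixed seed so every student gets the same queries
    rng.shuffle(combos)
    # Quote each term so a search engine treats multi-word terms as phrases.
    return [" ".join(f'"{term}"' for term in combo) for combo in combos[:max_queries]]


queries = build_queries(seeds)
for query in queries:
    print(query)
```

Students in the manual group can paste a few of these queries into a search engine, which gives both groups a comparable starting point when they later compare time and quality.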

Provide examples of how to perform a search within BootCat or CorpusMode
Verify that the students know how to perform advanced searches on the web
Provide a maximum number of pages to use to build the corpus
The exercise can be repeated by switching the roles for the other language or for a different topic

Learn how to use a concordancer
Compare the use of a concordancer for terminology extraction with manual terminology extraction
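
For students who have never seen one, the core function of a concordancer can be shown in a few lines: it lists every occurrence of a search term together with its left and right context (keyword in context, KWIC). The following is a minimal sketch, not a replacement for a real tool; the function name kwic, the context width and the sample sentences are invented for illustration.

```python
import re


def kwic(text, term, width=30):
    """Return one keyword-in-context line per occurrence of `term`."""
    lines = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Made-up sample text standing in for a corpus file.
sample = (
    "A solar cell converts sunlight into electricity. "
    "Each solar panel contains many cells, and a dye-sensitized solar cell "
    "uses a different material."
)

hits = kwic(sample, "solar cell")
for line in hits:
    print(line)
```

Reading the aligned lines makes typical collocates (here, verbs such as "converts" and modifiers such as "dye-sensitized") visible at a glance, which is the kind of analysis the manual group has to do by reading whole pages.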

Methodology:

Follow the same division in groups/pairs as in Exercise 1
Ask the students to prepare a terminology list with a maximum of 150 entries
Students should look for 50 unigrams (e.g. "renewables"), 50 bigrams (e.g. "carbon dioxide") and 50 trigrams (e.g. "woody biomass fuels")
Group discussion: which method took less time? Is the quality of the corpora comparable? What are the advantages and disadvantages of using a concordancer? And of extracting the terminology manually? Are there key terms that were not found using the concordancer? Which method is more effective for analyzing concordances? Why?
Option 1: Create a bilingual glossary starting from the terminology list
Option 2: Use InterpretBank to automatically extract the terminology, either from the corpora or from preparation documents provided to the students
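
The unigram, bigram and trigram lists requested in the methodology above can be approximated by counting word sequences of length 1, 2 and 3. The sketch below is a simplified assumption about how such candidate lists can be produced: it does not filter function words and it lets sequences cross sentence boundaries, so the output still needs the manual review the exercise is about. The function name ngram_counts and the sample text are invented for illustration.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count all sequences of `n` consecutive words in `text`."""
    # Lowercase words, keeping hyphenated forms such as "dye-sensitized" together.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    # zip over n shifted copies of the token list yields every n-word window.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text standing in for a corpus file.
sample = "Renewable energy includes solar energy. Solar energy is clean energy."

# Up to 50 candidates per size, matching the limits set in the exercise.
top_terms = {n: ngram_counts(sample, n).most_common(50) for n in (1, 2, 3)}

for n, terms in top_terms.items():
    print(n, terms[:3])
```

Sequences such as "energy is clean" appear in the trigram list even though they are not terms, which gives the group discussion a concrete example of why automatically extracted candidates have to be checked.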

Tips:

Provide practical examples
You do not need to introduce all functions of a concordancer at once; they can be introduced step by step
The exercise can be repeated by switching the roles for the other language or for a different topic
Show the students how corpus analysis can be used to confirm or reject the solutions found with other sources (e.g. online databases)
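
One simple way to demonstrate the last tip is to count how often competing candidate terms occur in the corpus. The sketch below is only an illustration under stated assumptions: the function name count_candidates, the candidate terms and the sample text are invented, and raw frequency in a small corpus is an indication, not proof, that a term is in use.

```python
import re


def count_candidates(corpus_text, candidates):
    """Count case-insensitive whole-phrase occurrences of each candidate term."""
    counts = {}
    for term in candidates:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, corpus_text, flags=re.IGNORECASE))
    return counts


# Made-up sample text standing in for a corpus file.
corpus_text = (
    "The solar panel was installed on the roof. A second solar panel "
    "followed. Some brochures call it a sun panel."
)

counts = count_candidates(corpus_text, ["solar panel", "sun panel", "solar board"])
for term, count in counts.items():
    print(term, count)
```

A candidate with zero hits, such as "solar board" here, is worth double-checking before it goes into the glossary, while a rare variant can be looked up in the concordance lines to see who uses it and in what kind of text.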