Corpus-based preparation

General objectives
  • Develop an effective corpus-based approach to preparing for interpreting assignments
  • Reflect on the differences between manually and automatically built corpora in terms of time required and quality of results
  • Learn how to analyze automatically built corpora with a concordancer: term extraction and analysis of concordances
  • Create more effective bilingual glossaries through corpus analysis

Exercise 1: Corpus creation

Objectives:

  • Learn how to use tools for corpus creation
  • Compare manually and automatically built corpora
  • Create a multilingual glossary

Tools:

BootCaT (or CorpusMode)

Methodology:

  • Option 1: the students are divided into two groups. Group A builds a corpus manually; Group B builds one with a dedicated tool.
  • Option 2: the students work in pairs. Student A builds a corpus manually; Student B builds one with a dedicated tool.
  • Provide the students with a topic, e.g. "solar energy"
  • The students hold a short brainstorming session and come up with keywords. Alternatively, let the students interpret a speech on the topic first and derive the keywords from it. For solar energy, for example, you can use these keywords: solar energy, photovoltaics, solar cell, single-axis sun tracking, renewables, concentrating, photovoltaic system, dye-sensitized cells, renewable energy, solar panel, clean energy.
  • Students use the keywords to run a web search (A) or to build the corpus with BootCaT/CorpusMode (B). You can start with one language and build a comparable corpus in the other language after the discussion.
  • The groups/pairs compare the time needed to build each corpus and discuss the perceived advantages and disadvantages of the two methods, first within their groups/pairs and then in a plenary session, in preparation for Exercise 2.
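For group B, it may help to show what a tool like BootCaT does with the keywords: it combines the seed terms into random tuples, sends each tuple to a search engine as a query and downloads the top-ranked pages. The Python sketch below illustrates only the tuple step, under stated assumptions: the tuple length of 3 and the 10 tuples are illustrative values (BootCaT lets users set both), and `make_tuples` and `to_query` are hypothetical helpers, not part of BootCaT. Querying the search engine and cleaning the downloaded pages are left out.

```python
import itertools
import random

# Seed terms from the solar-energy example above.
SEEDS = [
    "solar energy", "photovoltaics", "solar cell", "single-axis sun tracking",
    "renewables", "concentrating", "photovoltaic system", "dye-sensitized cells",
    "renewable energy", "solar panel", "clean energy",
]

def make_tuples(seeds, tuple_length=3, n_tuples=10, random_seed=None):
    """Randomly pick distinct combinations of seed terms to use as web queries.

    tuple_length and n_tuples are illustrative defaults (assumptions)."""
    all_combos = list(itertools.combinations(seeds, tuple_length))
    rng = random.Random(random_seed)
    # Never ask for more tuples than there are distinct combinations.
    return rng.sample(all_combos, min(n_tuples, len(all_combos)))

def to_query(combo):
    """Quote multi-word seeds so a search engine treats them as phrases."""
    return " ".join(f'"{term}"' if " " in term else term for term in combo)

if __name__ == "__main__":
    for combo in make_tuples(SEEDS, random_seed=1):
        print(to_query(combo))
```

Fixing the random seed makes the selection reproducible, so both groups can later discuss the same set of queries.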

Tips:

  • Provide examples of how to perform a search in BootCaT or CorpusMode
  • Verify that the students know how to perform advanced web searches
  • Set a maximum number of pages to be used to build the corpus
  • The exercise can be repeated with the roles switched, for the other language or for a different topic

Exercise 2: Corpus analysis and glossary creation

Objectives:

  • Learn how to use a concordancer
  • Compare the use of a concordancer for terminology extraction with manual terminology extraction

Tools:

AntConc (InterpretBank for Option 2 below)

Methodology:

  • Follow the same division in groups/pairs as in Exercise 1
  • Ask the students to prepare a terminology list with a maximum of 150 entries
  • Students should look for 50 unigrams (e.g. "renewables"), 50 bigrams (e.g. "carbon dioxide") and 50 trigrams (e.g. "woody biomass fuels")
  • Group discussion: Which method took less time? Is the quality of the corpora comparable? What are the advantages and disadvantages of using a concordancer, and of extracting the terminology manually? Were some key terms missed by the concordancer? Which method is more effective for analyzing concordances, and why?
  • Option 1: Create a bilingual glossary starting from the terminology list
  • Option 2: Use InterpretBank to automatically extract the terminology, either from the corpora or from preparation documents provided to the students
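Extracting the 50 unigrams, bigrams and trigrams rests on a simple idea: count how often each sequence of n words occurs in the corpus and rank the candidates by frequency, which is what the n-gram function of a concordancer such as AntConc lets students do. A minimal Python sketch, assuming a plain-text corpus and a small hypothetical stop-word list (real lists are much longer and language-specific); it ignores sentence boundaries and does not reproduce the actual extraction method of AntConc or InterpretBank:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are longer and language-specific).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "or", "the", "to", "with"}

def tokenize(text):
    """Lower-case the text and split it into words, keeping hyphenated compounds whole."""
    return re.findall(r"\w+(?:-\w+)*", text.lower())

def ngram_counts(tokens, n):
    """Count every n-word sequence that neither starts nor ends with a stop word."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue
        counts[" ".join(gram)] += 1
    return counts

def candidate_terms(text, n, top=50):
    """Return the `top` most frequent n-grams as candidate terms."""
    return [gram for gram, _ in ngram_counts(tokenize(text), n).most_common(top)]

SAMPLE = (
    "Solar energy is converted by the solar cell. A solar cell in a photovoltaic "
    "system converts solar energy. Woody biomass fuels and solar energy are renewables."
)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, candidate_terms(SAMPLE, n, top=5))
```

Frequency alone also promotes general-language sequences, so the ranked lists still need to be checked by the students before terms enter the glossary.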

Tips:

  • Provide practical examples (e.g. generating a word list, an n-gram list and the concordance lines for a sample term)
  • You do not need to introduce all the functions of a concordancer at once; introduce them step by step
  • The exercise can be repeated with the roles switched, for the other language or for a different topic
  • Show the students how corpus analysis can be used to confirm or reject solutions found in other sources (e.g. online terminology databases)
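Confirming or rejecting a solution usually comes down to reading the term in context. The Python sketch below prints keyword-in-context (KWIC) lines, the format in which a concordancer displays concordances; the `kwic` function, the 30-character context width and the sample text are hypothetical, chosen only for illustration.

```python
import re

def kwic(text, term, width=30):
    """Return one keyword-in-context line per occurrence of `term` (case-insensitive)."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

SAMPLE = (
    "Thin-film solar cells are cheaper to produce. "
    "Dye-sensitized solar cells use an organic dye. "
    "Some authors prefer the term photovoltaic cells."
)

if __name__ == "__main__":
    for term in ("solar cells", "photovoltaic cells"):
        print(f"{term}: {len(kwic(SAMPLE, term))} hits")
        for line in kwic(SAMPLE, term):
            print(line)
```

Comparing the number and the contexts of the hits for competing candidates (e.g. "solar cells" vs "photovoltaic cells") gives students concrete corpus evidence for accepting or rejecting a glossary entry.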