General objectives
- Develop effective corpus-based preparation for interpreting assignments
- Reflect on the difference between manually and automatically built corpora, in terms of time and quality of results
- Learn how to analyze machine-built corpora with a concordancer: term extraction and analysis of concordances
- Create more effective bilingual glossaries through corpus analysis
Exercise 1: Corpus creation
Objectives:
- Learn how to use tools for corpus creation
- Compare manually and automatically built corpora
- Create a multilingual glossary
Tools:
Methodology:
- Option 1: the students are divided into 2 groups. Group A creates a corpus manually, group B with a dedicated tool.
- Option 2: the students work in pairs. Student A creates a corpus manually, student B with a dedicated tool.
- Provide the students with a topic, e.g. "solar energy"
- The students hold a short brainstorming session to come up with key words. Alternatively, let the students interpret a speech on the topic first and derive the key words from it. For solar energy, for example, you can use these key words: solar energy, photovoltaics, solar cell, single-axis suntracking, renewables, concentrating photovoltaics system, dye-sensitized cells, renewable energy, solar panel, clean energy.
- Students use the key words to do a web search (A) or to build the corpus with BootCat/CorpusMode (B). You can start with one language and build a comparable corpus after the discussion.
- The groups/student pairs compare the time needed to build the corpus and discuss the perceived advantages and disadvantages of the two methods, first within the groups/pairs and then in a plenary session, in preparation for Exercise 2.
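As a rough illustration of the tool-based route (B), corpus-building tools of this kind generally combine the key words into small groups and send each group to a search engine as one query. The sketch below only generates such query strings. It does not call BootCat or CorpusMode, and the function name `build_queries`, the tuple size and the number of queries are assumptions chosen for the example.

```python
import itertools
import random


def build_queries(key_words, tuple_size=3, max_queries=10, seed=0):
    """Combine key words into search queries (hypothetical helper).

    Every query joins `tuple_size` distinct key words; multi-word key
    words are quoted so that a search engine treats them as phrases.
    """
    # Drop duplicates while preserving the original order.
    unique = list(dict.fromkeys(key_words))
    combos = list(itertools.combinations(unique, tuple_size))
    # A fixed seed keeps the selection reproducible between runs.
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(max_queries, len(combos)))
    return [
        " ".join(f'"{w}"' if " " in w else w for w in combo)
        for combo in chosen
    ]


# A subset of the key words suggested for the topic "solar energy".
key_words = [
    "solar energy", "photovoltaics", "solar cell", "renewables",
    "renewable energy", "solar panel", "clean energy",
]

for query in build_queries(key_words):
    print(query)
```

Students in the manual group can paste the same queries into a search engine, which keeps the two methods comparable.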
Tips:
- Provide examples of how to perform a search within BootCat or CorpusMode
- Verify that the students know how to perform advanced searches on the web
- Set a maximum number of pages to be used for building the corpus
- The exercise can be repeated by switching the roles for the other language or for a different topic
Exercise 2: Corpus analysis and glossary creation
Objectives:
- Learn how to use a concordancer
- Compare the use of a concordancer for terminology extraction with manual terminology extraction
Tools:
Methodology:
- Follow the same division in groups/pairs as in Exercise 1
- Ask the students to prepare a terminology list with a maximum of 150 entries
- Students should look for 50 unigrams (e.g. "renewables"), 50 bigrams (e.g. "carbon dioxide") and 50 trigrams (e.g. "woody biomass fuels")
- Group discussion: Which method took less time? Is the quality of the corpora comparable? What are the advantages and disadvantages of using a concordancer, and of extracting the terminology manually? Were any key terms missed when using the concordancer? Which method is more effective for analyzing concordances, and why?
- Option 1: Create a bilingual glossary starting from the terminology list
- Option 2: Use InterpretBank to automatically extract the terminology, either from the corpora or from preparation documents provided to the students
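To make the unigram/bigram/trigram split concrete, the sketch below counts word sequences of length 1 to 3 in a short text. It is a simplified stand-in for the term extraction of a concordancer, not a description of how any particular tool works: the tokenizer, the tiny stop-word list, the sample text and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); real tools use longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "from"}


def tokenize(text):
    """Lower-case the text and split it into (possibly hyphenated) words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngram_candidates(text, n, top=50):
    """Return the `top` most frequent n-grams as (n-gram, count) pairs.

    N-grams that start or end with a stop word are skipped.
    """
    tokens = tokenize(text)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
            continue
        counts[" ".join(gram)] += 1
    return counts.most_common(top)


sample = (
    "Woody biomass fuels release carbon dioxide. "
    "Renewables such as solar energy reduce carbon dioxide emissions, "
    "and woody biomass fuels are counted among renewables."
)

for n in (1, 2, 3):
    print(n, ngram_candidates(sample, n, top=3))
```

The sketch ignores sentence boundaries and ranks by raw frequency only, so its output still needs the same manual filtering the students apply to a concordancer's word list.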
Tips:
- Provide practical examples
- You do not need to introduce all the functions of a concordancer at once; they can be introduced step by step
- The exercise can be repeated by switching the roles for the other language or for a different topic
- Show the students how corpus analysis can be used to confirm or reject the solutions found with other sources (e.g. online databases)
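Checking a candidate term against the corpus comes down to reading it in context. As a minimal sketch of what a concordancer displays, the code below builds keyword-in-context (KWIC) lines for a search term. The function name `kwic`, the context width and the two sample sentences are assumptions for illustration; real concordancers add sorting, filtering and frequency information.

```python
import re


def kwic(text, term, width=30):
    """Return keyword-in-context lines for every occurrence of `term`."""
    # Collapse whitespace so that line breaks do not distort the context.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the term lines up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


corpus = (
    "A solar cell converts sunlight into electricity. "
    "Dye-sensitized cells are a low-cost type of solar cell."
)

for line in kwic(corpus, "solar cell"):
    print(line)
```

A term that returns no lines, or only lines from a single source, is a signal to double-check the solution found elsewhere.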