Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora

Kontonatsios, Georgios, Korkontzelos, Ioannis, Tsujii, Jun'ichi and Ananiadou, Sophia (2014) Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 25–29, 2014, Doha, Qatar, pp. 1701-1712.

Available under License Creative Commons Attribution Non-commercial Share Alike.


Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem, yet one with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed by corresponding sub-lexical units across languages and (b) a term and its translation tend to appear in similar lexical contexts. Based on the first observation, we develop a new character n-gram compositional method, implemented as a logistic regression classifier, for learning a string similarity measure of term translations. Based on the second observation, we use an existing context-based approach. For evaluation, we investigate the performance of the compositional and context-based methods on: (a) similar and unrelated languages, (b) corpora of differing degrees of comparability and (c) the translation of frequent and rare terms. Finally, we combine the two translation clues, namely string and contextual similarity, in a linear model and show substantial improvements over either signal used alone.
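
The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper learns string similarity with a logistic regression classifier over character n-grams, whereas this sketch substitutes a plain cosine over character n-gram counts as a stand-in. The function names, the seed dictionary used to project context vectors, the example terms and the interpolation weight `lam` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n_min=2, n_max=3):
    """Count character n-grams of a term, padded with '#' boundary markers."""
    padded = f"#{term.lower()}#"
    return Counter(
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    )


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def string_similarity(src_term, tgt_term):
    # Stand-in for the learned string similarity (assumption: unsupervised
    # n-gram overlap instead of a trained logistic regression classifier).
    return cosine(char_ngrams(src_term), char_ngrams(tgt_term))


def context_similarity(src_ctx, tgt_ctx, seed_dict):
    # Project source context words into the target language through a
    # hypothetical seed dictionary, then compare with the target context.
    projected = Counter()
    for word, weight in src_ctx.items():
        if word in seed_dict:
            projected[seed_dict[word]] += weight
    return cosine(projected, Counter(tgt_ctx))


def combined_score(string_sim, context_sim, lam=0.5):
    # Linear model over the two signals; lam is an assumed weight.
    return lam * string_sim + (1.0 - lam) * context_sim


def rank_candidates(src_term, src_ctx, candidates, seed_dict, lam=0.5):
    """Rank target candidates (term -> context vector) by combined score."""
    scored = [
        (tgt, combined_score(string_similarity(src_term, tgt),
                             context_similarity(src_ctx, ctx, seed_dict), lam))
        for tgt, ctx in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    seed = {"liver": "foie", "virus": "virus"}
    cands = {"hépatite": {"foie": 2, "virus": 1}, "fracture": {"os": 4}}
    print(rank_candidates("hepatitis", {"liver": 3, "virus": 2}, cands, seed))
```

In this sketch a candidate can rank highly through either signal: string similarity helps with rare terms that have sparse context vectors, while context similarity helps when the two languages share few sub-lexical units.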

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing and Information Systems
Date Deposited: 08 Feb 2016 12:48
