Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

Kontonatsios, Georgios, Korkontzelos, Yannis, Tsujii, Jun'ichi and Ananiadou, Sophia (2014) Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, 26–30 April 2014, Gothenburg, Sweden, pp. 111–116.

PDF: E14-4022.pdf (158kB)
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

We describe a machine learning approach, a Random Forest (RF) classifier, that is used to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and we report an improvement in translation accuracy. As an application, we use the automatically extracted dictionary in combination with a trained Statistical Machine Translation (SMT) system to more accurately translate unknown terms. The dictionary extraction method described in this paper is freely available.

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing and Information Systems
Date Deposited: 08 Feb 2016 12:49
URI: http://repository.edgehill.ac.uk/id/eprint/6991
