Document type: Conference paper
First author: Marta Vazquez Abuin
Authors: Marta Vazquez Abuin (1); Marcos Garcia (1)
Author affiliations: (1) Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela
Keywords: WordNet; Lexical semantics; Distributional semantics; Galician
Conference name: EPIA Conference on Artificial Intelligence
Organizer:
Pages: 280-291
Abstract: This paper explores several strategies to expand Galnet (the Galician WordNet) with both word entries and example sentences from the English WordNet. To obtain translation equivalents for a given word in a synset, we rely on lemmatized and POS-tagged bilingual word embeddings, used as probabilistic dictionaries. For the examples, we use state-of-the-art English-Galician neural machine translation models. Based on these resources, we have designed and evaluated straightforward heuristics to expand Galnet. The proposed approach allows us to obtain more than 13k high-quality example sentences in Galician and more than 4.5k new entries for Galnet. Critically, we have performed a set of careful qualitative analyses to verify the suitability of each step, assessing the adequacy of the obtained word forms and the quality of the automatic translations. The results of these analyses shed light on the performance of each stage of the process, which is also valuable for adapting our method to other languages.
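Note: the abstract says that lemmatized, POS-tagged bilingual word embeddings are used as probabilistic dictionaries, but it gives no implementation details. The following is only a minimal sketch of one way such a lookup could work, not the authors' method. It assumes hypothetical `lemma_POS` keys, toy three-dimensional vectors in a shared space, and a softmax over cosine similarities to obtain probability-like scores; the function names and the `temperature` parameter are likewise illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translation_candidates(source_key, src_vecs, tgt_vecs, top_k=3, temperature=0.1):
    """Rank target entries with the same POS tag as the source entry.

    Keys are assumed to look like "lemma_POS" (hypothetical convention).
    Scores are a softmax over cosine similarities, so they sum to 1
    across all same-POS target entries.
    """
    pos = source_key.rsplit("_", 1)[1]
    query = src_vecs[source_key]
    # Keep only target entries whose POS tag matches the source entry.
    scored = [
        (key, cosine(query, vec))
        for key, vec in tgt_vecs.items()
        if key.rsplit("_", 1)[1] == pos
    ]
    if not scored:
        return []
    weights = [(key, math.exp(sim / temperature)) for key, sim in scored]
    total = sum(w for _, w in weights)
    ranked = sorted(((key, w / total) for key, w in weights),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy embeddings (invented values, for illustration only).
EN = {"dog_NOUN": [0.9, 0.1, 0.0], "run_VERB": [0.0, 0.2, 0.9]}
GL = {
    "can_NOUN": [0.88, 0.12, 0.02],
    "gato_NOUN": [0.5, 0.5, 0.1],
    "correr_VERB": [0.05, 0.15, 0.92],
}

candidates = translation_candidates("dog_NOUN", EN, GL)
for key, prob in candidates:
    print(f"{key}\t{prob:.3f}")
```

Restricting candidates to the source entry's POS tag is what the tagged keys make possible: a verb such as `correr_VERB` is never proposed for a noun synset, however close its vector is.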
Classification number: tp18-53