Desynonymization through the prism of corpus and vector analysis of the contextual preferences of the lexemes kompjuter and računar
kompjuter, računar, synonyms, desynonymization, distributional semantics, context, corpus, word2vec
Abstract
In this paper, we examine the meanings of the Serbian words kompjuter and računar (‘computer’) in terms of their semantic proximity. Using web corpus data, together with the word2vec method for measuring the cosine similarity between their vector representations (based on their contextual preferences), we conclude that, contrary to popular belief, these words should not be considered absolute synonyms. Specifically, we propose that the loanword kompjuter has a narrower sense (‘PC’, ‘desktop computer’), whereas its loan-translation counterpart računar carries a broader meaning (‘any larger computational device’). This conclusion rests on the observation that the two lexemes do not share the same distributional patterns: while kompjuter is typically used as a syntactically free expression, računar is often preceded by attributes that specify the meaning of the entire nominal phrase (e.g. laptop računar, tablet računar, iPad računar). Consequently, the two words are not always contextually interchangeable. Additionally, we propose that computational resources should be used when addressing practical and theoretical linguistic problems.
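As a rough illustration of the similarity measure mentioned above, the following minimal sketch computes the cosine similarity between two word vectors. The function name `cosine_similarity` and the four-dimensional toy vectors are assumptions made purely for illustration; they are not the vectors, dimensionality, or tooling used in the paper, where the vectors would come from a word2vec model trained on the web corpus.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT taken from the paper; real word2vec
# vectors typically have a hundred or more dimensions.
toy_vectors = {
    "kompjuter": [0.9, 0.1, 0.3, 0.0],
    "računar": [0.7, 0.4, 0.2, 0.5],
}

similarity = cosine_similarity(toy_vectors["kompjuter"], toy_vectors["računar"])
print(f"cosine similarity: {similarity:.3f}")
```

A value close to 1 would indicate that two words occur in very similar contexts, while lower values point to diverging contextual preferences, which is the kind of evidence the abstract appeals to.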
