Transformer model using dependency tree for paraphrase identification

Authors

V. Vrublevskyi

DOI:

https://doi.org/10.17721/1812-5409.2024/1.28

Keywords:

natural language processing, paraphrase identification, machine learning, dependency tree, Transformer architecture

Abstract

Models that represent the semantics of natural language words, sentences, and texts are central to computational linguistics and artificial intelligence. High-quality vector representations of words have revolutionized approaches to natural language processing and analysis, since words are the foundation of language. The study of vector representations of sentences is equally important, because such representations aim to capture the semantics and meaning of whole sentences; improving them helps models understand text at a deeper level and solve a wide range of tasks. This article addresses the problem of paraphrase identification using models based on the Transformer architecture. These models have demonstrated high efficiency across many tasks, and it is shown that their accuracy can be further improved by enriching the model with additional information. Syntactic information, such as part-of-speech tags or linguistic structures, can improve the model's understanding of context and sentence structure; enriching the model in this way provides broader context and improves adaptability and performance across different natural language processing tasks, making the model more versatile. As a result, a Transformer-based model that uses a dependency tree is proposed. Its effectiveness was compared with other models of the same architecture on the paraphrase identification task, and improvements in precision and recall over the baseline model (DeBERTa) were demonstrated. In future work, it is worth studying the use of this model for other applied tasks (such as plagiarism detection and identification of an author's style) and evaluating other graph structures for sentence representation (for example, AMR graphs).
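To make the enrichment idea concrete, the following is a minimal, hypothetical Python sketch (not the implementation from the article) of one way to combine a pre-trained DeBERTa encoder with dependency-tree features: each sentence is parsed with spaCy, its dependency relations are pooled into a bag-of-relations vector, and that vector is concatenated with the encoder's pooled representation before a small classification head decides whether the pair is a paraphrase. The model name microsoft/deberta-base, the bag-of-relations pooling, and the linear classifier are illustrative assumptions.

# Hypothetical sketch (not the article's implementation): enriching a
# pre-trained DeBERTa sentence-pair encoder with dependency-tree features.
import spacy
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

nlp = spacy.load("en_core_web_sm")                       # dependency parser
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
encoder = AutoModel.from_pretrained("microsoft/deberta-base")

# Fixed inventory of dependency relation labels known to the parser.
DEP_LABELS = sorted(nlp.get_pipe("parser").labels)
DEP2ID = {label: i for i, label in enumerate(DEP_LABELS)}

def dependency_bag(sentence: str) -> torch.Tensor:
    """Count the dependency relations occurring in one sentence."""
    vec = torch.zeros(len(DEP_LABELS))
    for token in nlp(sentence):
        idx = DEP2ID.get(token.dep_)
        if idx is not None:
            vec[idx] += 1.0
    return vec

class SyntaxEnrichedParaphraseModel(nn.Module):
    """DeBERTa pooled output concatenated with dependency features."""
    def __init__(self, encoder, dep_dim):
        super().__init__()
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + 2 * dep_dim, 2)  # paraphrase / not

    def forward(self, sent1, sent2):
        inputs = tokenizer(sent1, sent2, return_tensors="pt", truncation=True)
        pooled = self.encoder(**inputs).last_hidden_state[:, 0]  # first-token pooling
        deps = torch.cat([dependency_bag(sent1), dependency_bag(sent2)]).unsqueeze(0)
        return self.classifier(torch.cat([pooled, deps], dim=-1))

model = SyntaxEnrichedParaphraseModel(encoder, dep_dim=len(DEP_LABELS))
logits = model("The company bought the startup.",
               "The startup was acquired by the company.")
print(logits.softmax(dim=-1))  # paraphrase probabilities (head is untrained here)

In this sketch the syntactic signal enters only at the classification layer (late fusion); approaches such as Syntax-BERT (Bai et al., 2021) instead inject the tree into the attention mechanism itself, which is closer in spirit to modifying the Transformer architecture.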

Pages of the article in the issue: 154 - 159

Language of the article: Ukrainian

References

Anisimov, A. V., Marchenko, O. O., & Vozniuk, T. G. (2014). Determining Semantic Valences of Ontology Concepts by Means of Nonnegative Factorization of Tensors of Large Text Corpora. Cybernetics and Systems Analysis, 50, 327–337. https://doi.org/10.1007/s10559-014-9621-9

Bai, J., Wang, Y., Chen, Y., Yang, Y., Bai, J., Yu, J., & Tong, Y. (2021). Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 3011–3020). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.262

Bisong, E. (2019). Building Machine Learning and Deep Learning Models on Google Cloud Platform (1st ed.). Apress Berkeley, CA. https://doi.org/10.1007/978-1-4842-4470-8

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/arXiv.1810.04805

Dolan, B., Quirk, C., & Brockett, C. (2004). Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics (pp. 350–356). COLING.

Fellbaum, C. (1998). WordNet: An electronic lexical database. MIT Press. https://doi.org/10.7551/mitpress/7287.001.0001

google-bert/bert-base-cased · Hugging Face. (n.d.). https://huggingface.co/google-bert/bert-base-cased

He, P., Liu, X., Gao, J., & Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654

Jurafsky, D., & Martin, J. H. (2023). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed. draft). Retrieved January 1, 2024, from https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. https://doi.org/10.48550/arXiv.1909.11942

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284. https://doi.org/10.1080/01638539809545028

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692

Marchenko, O. O. (2016). A Method for Automatic Construction of Ontological Knowledge Bases. I. Development of a Semantic-Syntactic Model of Natural Language. Cybernetics and Systems Analysis, 52, 20–29. https://doi.org/10.1007/s10559-016-9795-4

de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255–308. https://doi.org/10.1162/coli_a_00402

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. https://doi.org/10.48550/arXiv.1301.3781

spaCy · Industrial-strength Natural Language Processing in Python. (n.d.). https://spacy.io/

Published

2024-09-12

How to Cite

Vrublevskyi, V. (2024). Transformer model using dependency tree for paraphrase identification. Bulletin of Taras Shevchenko National University of Kyiv. Physical and Mathematical Sciences, 78(1), 154–159. https://doi.org/10.17721/1812-5409.2024/1.28

Section

Computer Science and Informatics