Transformer model using dependency tree for paraphrase identification
DOI: https://doi.org/10.17721/1812-5409.2024/1.28
Keywords: natural language processing, paraphrase identification, machine learning, dependency tree, Transformer architecture
Abstract
Models that represent the semantics of natural-language words, sentences, and texts are central to computational linguistics and artificial intelligence. High-quality vector representations of words have revolutionized approaches to natural language processing and analysis, since words are the foundation of language. Vector representations of sentences are equally important, as they aim to capture the semantics and meaning of whole sentences; improving them helps models understand text at a deeper level and solve a variety of tasks. This article addresses the problem of paraphrase identification using models based on the Transformer architecture, which have demonstrated high effectiveness across many tasks. It was found that their accuracy can be improved by enriching the model with additional information: syntactic information such as part-of-speech tags or linguistic structures can improve the model's understanding of context and sentence structure. Enriching the model in this way provides broader context and improves adaptability and performance in different natural language processing tasks, making the model more versatile across applications. As a result, a Transformer-based model that uses a dependency tree was proposed, and its effectiveness was compared with other models of the same architecture on the paraphrase identification task. Improvements in precision and recall over the baseline model (DeBERTa) were demonstrated. In future work, it is advisable to study the use of this model in other applied tasks (such as plagiarism detection and identification of an author's style) and to evaluate other graph structures for sentence representation (for example, AMR graphs).
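As an illustration only, and not the architecture proposed in the article, the following minimal Python sketch shows the general idea: dependency relations produced by a parser such as spaCy can be linearized and fed, together with the sentence pair, into a DeBERTa sequence-pair classifier. The en_core_web_sm parser, the microsoft/deberta-base checkpoint, and the "[SYNTAX]" marker below are assumptions made for demonstration; the classification head would still need fine-tuning on a paraphrase corpus such as MRPC (Dolan et al., 2004).

# Hedged sketch: pairing a DeBERTa classifier with linearized dependency trees.
# Requires: pip install spacy torch transformers
#           python -m spacy download en_core_web_sm
import spacy
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

nlp = spacy.load("en_core_web_sm")                       # dependency parser
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-base", num_labels=2               # head must be fine-tuned, e.g. on MRPC
)

def dependency_string(sentence: str) -> str:
    """Linearize the dependency tree as 'head -relation-> child' triples."""
    doc = nlp(sentence)
    return " ; ".join(f"{tok.head.text} -{tok.dep_}-> {tok.text}" for tok in doc)

def paraphrase_probabilities(sent_a: str, sent_b: str) -> torch.Tensor:
    # Append the linearized tree to each sentence; "[SYNTAX]" is an illustrative
    # marker, not a special token of the DeBERTa vocabulary.
    enriched_a = f"{sent_a} [SYNTAX] {dependency_string(sent_a)}"
    enriched_b = f"{sent_b} [SYNTAX] {dependency_string(sent_b)}"
    inputs = tokenizer(enriched_a, enriched_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)                 # [P(not paraphrase), P(paraphrase)]

print(paraphrase_probabilities("The company acquired the startup.",
                               "The startup was bought by the company."))

Concatenating the linearized tree as plain text is only the simplest possible fusion; approaches such as syntax-constrained attention (cf. Syntax-BERT, Bai et al., 2021) instead integrate the tree into the attention mechanism itself.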
Pages of the article in the issue: 154–159
Language of the article: Ukrainian
References
Anisimov, A. V., Marchenko, O. O., & Vozniuk, T. G. (2014). Determining Semantic Valences of Ontology Concepts by Means of Nonnegative Factorization of Tensors of Large Text Corpora. Cybernetics and Systems Analysis, 50, 327–337. https://doi.org/10.1007/s10559-014-9621-9
Bai, J., Wang, Y., Chen, Y., Yang, Y., Bai, J., Yu, J., & Tong, Y. (2021). Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 3011–3020). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.262
Bisong, E. (2019). Building Machine Learning and Deep Learning Models on Google Cloud Platform (1st ed.). Apress Berkeley, CA. https://doi.org/10.1007/978-1-4842-4470-8
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/arXiv.1810.04805
Dolan, B., Quirk, C., & Brockett, C. (2004). Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics (pp. 350–356). COLING.
Fellbaum, C. (1998). WordNet: An electronic lexical database. MIT Press. https://doi.org/10.7551/mitpress/7287.001.0001
google-bert/bert-base-cased · Hugging Face. (n.d.). https://huggingface.co/google-bert/bert-base-cased
He, P., Liu, X., Gao, J., & Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. https://doi.org/10.48550/arXiv.2006.03654
Jurafsky, D., & Martin, J. H. (2023). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed. draft). Retrieved January 1, 2024, from https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. https://doi.org/10.48550/arXiv.1909.11942
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284. https://doi.org/10.1080/01638539809545028
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/arXiv.1907.11692
Marchenko, O. O. (2016). A Method for Automatic Construction of Ontological Knowledge Bases. I. Development of a Semantic-Syntactic Model of Natural Language. Cybernetics and Systems Analysis, 52, 20–29. https://doi.org/10.1007/s10559-016-9795-4
de Marneffe, M.-C., Manning, C. D., Nivre, J., & Zeman, D. (2021). Universal Dependencies. Computational Linguistics, 47(2), 255–308. https://doi.org/10.1162/coli_a_00402
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. https://doi.org/10.48550/arXiv.1301.3781
spaCy · Industrial-strength Natural Language Processing in Python. (n.d.). https://spacy.io/
License
Copyright (c) 2024 Vitalii Vrublevskyi
This work is licensed under a Creative Commons Attribution 4.0 International License.