
JACIII Vol.27 No.3 pp. 511-521
doi: 10.20965/jaciii.2023.p0511
(2023)

Research Paper:

Effectiveness of Pre-Trained Language Models for the Japanese Winograd Schema Challenge

Keigo Takahashi, Teruaki Oka, and Mamoru Komachi

Graduate School of System Design, Tokyo Metropolitan University (TMU)
6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan

Received: September 13, 2022
Accepted: February 13, 2023
Published: May 20, 2023
Keywords: natural language processing, Winograd schema challenge, reference resolution, Japanese

Abstract

This paper compares Japanese and multilingual language models (LMs) on a Japanese pronoun reference resolution task to determine which properties of LMs contribute to Japanese pronoun resolution. Specifically, we tackle the Japanese Winograd schema challenge (WSC) task, a well-known pronoun reference resolution task. The Japanese WSC task requires inter-sentential analysis, which is more challenging than intra-sentential analysis. A previous study evaluated pre-trained multilingual LMs on a multilingual WSC task that included Japanese, but it compared results by training language rather than by pre-trained LM. Furthermore, it did not investigate how factors such as model size, pre-training settings, or multilingualism affect performance. In our study, we compare the performance of several pre-trained LMs, including multilingual ones, on inter-sentential analysis in the Japanese WSC task. Our results confirm that XLM, an LM pre-trained on multiple languages, performs the best among all considered LMs, which we attribute to the amount of data used in the pre-training phase.

Transition before (left) and after (right) fine-tuning XLM


Cite this article as:
K. Takahashi, T. Oka, and M. Komachi, “Effectiveness of Pre-Trained Language Models for the Japanese Winograd Schema Challenge,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.3, pp. 511-521, 2023.

Last updated on Sep. 09, 2024