JACIII Vol.27 No.3 pp. 481-489
doi: 10.20965/jaciii.2023.p0481

Research Paper:

Toward Question-Answering with Multi-Hop Reasoning and Calculation over Knowledge Using a Neural Network Model with External Memories

Yuri Murayama ORCID Icon and Ichiro Kobayashi ORCID Icon

Ochanomizu University
2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan

October 25, 2022
February 7, 2023
May 20, 2023
question-answering, knowledge base, multi-hop reasoning, calculation, differentiable neural computer

The differentiable neural computer (DNC) is a neural network model with an addressable external memory that can solve algorithmic and question-answering tasks. Improved versions of the DNC have been proposed, including the robust and scalable DNC (rsDNC) and DNC-deallocation-masking-sharpness (DNC-DMS). However, integrating structured knowledge and calculations into these DNC models remains a challenging research question. In this study, we incorporate an architecture for knowledge and calculations into the DNC, rsDNC, and DNC-DMS to improve their abilities to generate correct answers for questions with multi-hop reasoning and provide calculations over structured knowledge. Our improved rsDNC model achieves the best performance for the mean top-1 accuracy, and our improved DNC-DMS model scores the highest for the top-10 accuracy in the GEO dataset. In addition, our improved rsDNC model outperforms other models in regards to the mean top-1 accuracy and mean top-10 accuracy in the augmented GEO dataset.

Overview of our proposed model

Overview of our proposed model

Cite this article as:
Y. Murayama and I. Kobayashi, “Toward Question-Answering with Multi-Hop Reasoning and Calculation over Knowledge Using a Neural Network Model with External Memories,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.3, pp. 481-489, 2023.
Data files:
  1. [1] H. T. Siegelmann and E. D. Sontag, “On the Computational Power of Neural Nets,” J. Comput. Syst. Sci., Vol.50, No.1, pp. 132-150, 1995.
  2. [2] Y. Bengio, P. Simard, and P. Frasconi, “Learning Long-Term Dependencies with Gradient Descent is Difficult,” Trans. Neur. Netw., Vol.5, No.2, pp. 157-166, 1994.
  3. [3] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, Vol.9, No.8, pp. 1735-1780, 1997.
  4. [4] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” arXiv:1409.3215, 2014.
  5. [5] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv:1409.0473, 2014.
  6. [6] T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-Based Neural Machine Translation,” Proc. of the 2015 Conf. on Empirical Methods in Natural Language Processing, pp. 1412-1421, 2015.
  7. [7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is All You Need,” Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 5998-6008, 2017.
  8. [8] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. Le, and R. Salakhutdinov, “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context,” Proc. of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978-2988, 2019.
  9. [9] J. W. Rae, A. Potapenko, S. M. Jayakumar, and T. P. Lillicrap, “Compressive Transformers for Long-Range Sequence Modelling,” arXiv:1911.05507, 2019.
  10. [10] P. H. Martins, Z. Marinho, and A. Martins, “∞-former: Infinite Memory Transformer,” Proc. of the 60th Annual Meeting of the Association for Computational Linguistics, Vol.1 (Long Papers), pp. 5468-5485, 2022.
  11. [11] J. v. Neumann, “First Draft of a Report on the EDVAC,” Technical Report, 1945.
  12. [12] A. Graves, G. Wayne, and I. Danihelka. “Neural Turing Machines,” arxiv:1410.5401, 2014.
  13. [13] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwińska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A. P. Badia, K. M. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, and D. Hassabis, “Hybrid computing using a neural network with dynamic external memory,” Nature, Vol.538, pp. 471-476, 2016.
  14. [14] J. Franke, J. Niehues, and A. Waibel, “Robust and Scalable Differentiable Neural Computer for Question Answering,” Proc. of the Workshop on Machine Reading for Question Answering, pp. 47-59, 2018.
  15. [15] R. Csordás and J. Schmidhuber, “Improving Differentiable Neural Computers Through Memory Masking, De-Allocation, and Link Distribution Sharpness Control,” arXiv:1904.10278, 2019.
  16. [16] J. M. Zelle and R. J. Mooney, “Learning to Parse Database Queries Using Inductive Logic Programming,” Proc. of the 13th National Conf. on Artificial Intelligence (AAAI’96), Vol.2, pp. 1050-1055, 1996.
  17. [17] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, Vol.4, pp. 26-31, 2012.
  18. [18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding,” Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol.1 (Long and Short Papers), pp. 4171-4186, 2019.
  19. [19] M. Geva, A. Gupta, and J. Berant, “Injecting Numerical Reasoning Skills into Language Models,” Proc. of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 946-958, 2020.
  20. [20] M. Stone, “Cross-validatory choice and assessment of statistical predictions,” J. Roy. Stat. Soc. Series B (Methodological), Vol.36, No.2, pp. 111-147, 1974.
  21. [21] P. Pasupat and P. Liang, “Compositional Semantic Parsing on Semi-Structured Tables,” Proc. of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Int. Joint Conf. on Natural Language Processing, Vol.1 (Long Papers), pp. 1470-1480, 2015.
  22. [22] J. W. Rae, J. J. Hunt, T. Harley, I. Danihelka, A. W. Senior, G. Wayne, A. Graves, and T. P. Lillicrap, “Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes,” arXiv:1610.09027, 2016.
  23. [23] I. Ben-Ari and A. J. Bekker, “Differentiable Memory Allocation Mechanism for Neural Computing.” [Accessed April 24, 2023]
  24. [24] M. S. Rasekh and F. Safi-Esfahani, “EDNC: Evolving Differentiable Neural Computers,” Neurocomputing, Vol.412, pp. 514-542, 2020.
  25. [25] S. Seo, D. Lee, and J.-H. Kim, “Shallow Convolution-Augmented Transformer with Differentiable Neural Computer for Low-Complexity Classification of Variable-Length Acoustic Scene,” Proc. Interspeech 2021, pp. 576-580, 2021.
  26. [26] Y. Tao and Z. Zhang, “HiMA: A Fast and Scalable History-Based Memory Access Engine for Differentiable Neural Computer,” 54th Annual IEEE/ACM Int. Symp. on Microarchitecture (MICRO ’21), pp. 845-856, 2021.
  27. [27] D. Lee, H. Park, S. Seo, H. Son, G. Kim, and J.-H. Kim, “Robustness of Differentiable Neural Computer Using Limited Retention Vector-Based Memory Deallocation in Language Model,” KSII Trans. on Internet and Information Systems, Vol.15, No.3, pp. 837-852, 2021.
  28. [28] A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus, and R. Socher, “Ask Me Anything: Dynamic Memory Networks for Natural Language Processing,” Proc. of the 33rd Int. Conf. on Machine Learning (PMLR), Vol.48, pp. 1378-1387, 2016.
  29. [29] C. Xiong, S. Merity, and R. Socher, “Dynamic Memory Networks for Visual and Textual Question Answering,” Proc. of the 33rd Int. Conf. on Machine Learning (PMLR), Vol.48, pp. 2397-2406, 2016.
  30. [30] S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus, “End-to-End Memory Networks,” Advances in Neural Information Processing Systems 28 (NIPS 2015), pp. 2440-2448, 2015.
  31. [31] J. Moon, H. Yang, and S. Cho, “Finding ReMO (Related Memory Object): A Simple Neural Architecture for Text Based Reasoning,” arXiv:1801.08459, 2018.
  32. [32] C. Akita, M. Mase, and Y. Kitamura, “Natural Language Questions and Answers for RDF Information Resources,” J. Adv. Comput. Intell. Intell. Inform., Vol.14, No.4, pp. 384-389, 2010.
  33. [33] A. Miller, A. Fisch, J. Dodge, A.-H. Karimi, A. Bordes, and J. Weston, “Key-Value Memory Networks for Directly Reading Documents,” Proc. of the 2016 Conf. on Empirical Methods in Natural Language Processing, pp. 1400-1409, 2016.
  34. [34] A. Saha, V. Pahuja, M. Khapra, K. Sankaranarayanan, and S. Chandar, “Complex Sequential Question Answering: Towards Learning to Converse over Linked Question Answer Pairs with a Knowledge Graph,” Proc. of the AAAI Conf. on Artificial Intelligence, Vol.32, 2018.
  35. [35] I. V. Serban, A. Sordoni, Y. Bengio, A. C. Courville, and J. Pineau, “Hierarchical Neural Network Generative Models for Movie Dialogues,” arXiv:1507.04808v1, 2015.

*This site is desgined based on HTML5 and CSS3 for modern browsers, e.g. Chrome, Firefox, Safari, Edge, Opera.

Last updated on Jul. 19, 2024