Opposition-Based Reinforcement Learning

Hamid R. Tizhoosh

doi:10.20965/jaciii.2006.p0578

JACIII Vol.10 No.4 pp. 578-585 (2006)

Pattern Analysis and Machine Intelligence Laboratory, Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1

Received: September 29, 2005
Accepted: November 22, 2005
Published: July 20, 2006

Keywords: reinforcement learning, Q-learning, opposite action, opposite state

Abstract

Reinforcement learning is a machine intelligence scheme for learning in highly dynamic, probabilistic environments. By interacting with the environment, reinforcement agents learn optimal control policies, especially in the absence of a priori knowledge and/or a sufficiently large amount of training data. Despite its advantages, reinforcement learning suffers from a major drawback: high computational cost, because convergence to an optimal solution usually requires that all states be visited frequently to ensure that the policy is reliable. This is not always possible, given the complex, high-dimensional state spaces of many applications. This paper introduces opposition-based reinforcement learning, inspired by opposition-based learning, to speed up convergence. Considering opposite actions simultaneously enables individual states to be updated more than once, shortening exploration and expediting convergence. Three versions of the Q-learning algorithm are given as examples. Experimental results for grid world problems of different sizes demonstrate the superior performance of the proposed approach.
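The idea of updating a state more than once per step can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the update rule, so the function name `opposition_q_update`, the four-action grid world, the action pairing in `OPPOSITE`, and in particular the choice to give the opposite action the negated reward while bootstrapping from the same next state are all assumptions made here for illustration.

```python
from collections import defaultdict

# Hypothetical four-action grid world; the pairing of opposites is an assumption.
ACTIONS = ("up", "down", "left", "right")
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def make_q():
    """Q-table keyed by (state, action); unseen entries default to 0.0."""
    return defaultdict(float)


def opposition_q_update(q, state, action, reward, next_state,
                        alpha=0.5, gamma=0.9):
    """Apply one Q-learning update plus an assumed opposite-action update.

    Returns the opposite action that received the second update.
    """
    # Greedy value of the next state, computed once before any update.
    best_next = max(q[(next_state, a)] for a in ACTIONS)

    # Standard Q-learning update for the action actually taken.
    q[(state, action)] += alpha * (reward + gamma * best_next
                                   - q[(state, action)])

    # Assumption: the opposite action is credited with the negated reward
    # and the same bootstrap term, so the state is updated twice per step.
    opp = OPPOSITE[action]
    q[(state, opp)] += alpha * (-reward + gamma * best_next
                                - q[(state, opp)])
    return opp


if __name__ == "__main__":
    q = make_q()
    opp = opposition_q_update(q, (0, 0), "right", 1.0, (0, 1))
    print(opp, q[((0, 0), "right")], q[((0, 0), opp)])
```

With an empty table, a single rewarded step to the right changes two entries for the same state: the taken action moves toward the reward and its opposite moves away from it. Whether a negated reward is appropriate depends on the task; the sketch only shows how one interaction can drive two table updates.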

Cite this article as:

H. Tizhoosh, “Opposition-Based Reinforcement Learning,” J. Adv. Comput. Intell. Intell. Inform., Vol.10 No.4, pp. 578-585, 2006.


This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
