About Profit Sharing Considering Infatuate Actions
In reinforcement learning systems based on trial and error, the agent, that is, the system that perceives its environment and selects actions to maximize its chances of success, is rewarded when it reaches the goal of the learning task. Profit Sharing is a reinforcement learning method that learns by accumulating such rewards. To keep accumulating rewards, the agent persists in repeating the actions it has already learned and avoids selecting others, which makes it less adaptable to changes in the environment. This paper therefore proposes introducing the concept of infatuation to overcome the agent's reluctance to adapt to new environments. For a living agent, when a single reinforcement learning process is repeated, the stimulus perceived at each repetition gradually loses its intensity through familiarization. However, if the agent encounters a set of rules different from those of the repeated process and then reverts to the earlier process, the stimulus it receives after the reversion recovers its intensity. The aim is to apply the concept of infatuation to Profit Sharing and to confirm its effects through experiments.
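The mechanism described above can be illustrated with a minimal Python sketch. It is not the paper's formulation: the abstract gives no equations, so the geometric credit function, the per-rule `familiarity` counter, the `habituation` factor, and the rule that resets familiarity for any rule absent from the latest rewarded episode are all assumptions made for illustration, as is the class name `InfatuatedProfitSharing`.

```python
from collections import defaultdict


class InfatuatedProfitSharing:
    """Toy Profit Sharing learner with a hypothetical habituation factor.

    All parameters and the update scheme are illustrative assumptions,
    not the method defined in the paper.
    """

    def __init__(self, reward=1.0, discount=0.5, habituation=0.8):
        self.reward = reward            # reward given at the goal
        self.discount = discount        # geometric decay of credit along the episode
        self.habituation = habituation  # assumed per-repetition loss of stimulus intensity
        self.weights = defaultdict(float)    # weight of each (state, action) rule
        self.familiarity = defaultdict(int)  # consecutive rewarded uses of each rule

    def intensity(self, rule):
        """Stimulus intensity: 1.0 for a fresh rule, lower for a familiar one."""
        return self.habituation ** self.familiarity[rule]

    def reinforce(self, episode):
        """Distribute the reward over an episode.

        `episode` is the ordered list of (state, action) rules that were
        fired, ending with the rule that reached the reward.
        """
        credit = self.reward
        # Walk backward from the reward; earlier rules receive less credit.
        for rule in reversed(episode):
            self.weights[rule] += credit * self.intensity(rule)
            credit *= self.discount

        used = set(episode)
        # Assumption: rules not used in this episode recover full intensity.
        for rule in list(self.familiarity):
            if rule not in used:
                self.familiarity[rule] = 0
        # Rules that were used become more familiar.
        for rule in used:
            self.familiarity[rule] += 1
```

In this sketch, repeatedly reinforcing the same episode adds progressively smaller increments to its rules, while reinforcing a different episode once restores the first episode's rules to full intensity.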