Adaptive Nutrient Water Supply Control of Plant Factory System by Reinforcement Learning
Takumi Wakahara* and Sadayoshi Mikami**
*Graduate School, Future University Hakodate, 116-2 Kamedanakano-cho Hakodate, Hokkaido 041-8655, Japan
**Department of Complex and Intelligent Systems, Future University Hakodate, 116-2 Kamedanakano-cho Hakodate, Hokkaido 041-8655, Japan
An adaptive nutrient control method for a plant factory is proposed. The method is based on reinforcement learning (RL) modified for tasks in which the same state never recurs within a single episode and the reward is given only after a very long delay. In applications such as plant growth control, a single episode takes a very long time, so rapid convergence to a promising control solution is essential; at the same time, extensive exploration is needed because a precise model is usually unavailable. Reinforcement learning is well suited to problems that have no reference model, but its need for exploration conflicts with the need for rapid convergence, so a new way of balancing the two is required. In this research, an average reward distribution method is proposed. It is similar to the profit sharing method but is more effective at finding promising solutions early, while guaranteeing convergence to a rational solution in the long run. An experiment conducted in a simple plant factory system shows that standard reinforcement learning, at least, is insufficient for this type of problem. Computer simulations show that the proposed method acquires a promising control policy at an early stage more effectively than standard reinforcement learning and the profit sharing method.
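The abstract does not give the update rule, so the following is only an illustrative sketch of the contrast it draws, not the authors' algorithm. It assumes that profit sharing credits the visited state-action pairs with a geometrically decaying share of the final reward, and that "average reward distribution" could be read as giving every visited pair a running average of the episode rewards it took part in. The function names, the `decay` parameter, and the toy episodes are all hypothetical.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Profit-sharing-style credit: the last pair gets the full reward,
    and each earlier pair gets a geometrically smaller share."""
    credit = reward
    for pair in reversed(episode):
        weights[pair] += credit
        credit *= decay


def average_reward_update(means, counts, episode, reward):
    """Hypothetical reading of 'average reward distribution': every pair
    visited in the episode keeps a running mean of the episode rewards."""
    for pair in set(episode):
        counts[pair] += 1
        means[pair] += (reward - means[pair]) / counts[pair]


# Toy episodes: the state is a day index, so no state repeats in an episode.
episode_a = [(0, "low"), (1, "high"), (2, "low")]   # final reward 8.0
episode_b = [(0, "low"), (1, "low"), (2, "low")]    # final reward 4.0

weights = defaultdict(float)
profit_sharing_update(weights, episode_a, 8.0)

means, counts = defaultdict(float), defaultdict(int)
average_reward_update(means, counts, episode_a, 8.0)
average_reward_update(means, counts, episode_b, 4.0)

print(dict(weights))
print(dict(means))
```

Under these assumptions, early decisions in a long episode receive almost no credit from the decaying rule, whereas the averaged rule gives them the same evidence as late decisions after a single episode. That is one plausible way an averaging scheme could favour promising policies early when episodes are expensive.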