Takumi Wakahara and Sadayoshi Mikami
An adaptive nutrient control method for a plant factory is proposed. The method is based on reinforcement learning (RL), modified for tasks in which the same state never recurs within a single episode and the reward is given only after a very long delay. In applications such as plant growth control, one episode spans a very long time, so rapid convergence to a promising control solution is essential. At the same time, extensive exploration is needed because a precise model is usually unavailable. Reinforcement learning is well suited to problems that lack a reference model, but the need for exploration conflicts with the need for rapid convergence, so a new way of balancing the two is required. In this research, an average reward distribution method is proposed. It is similar to the profit sharing method but is more effective at finding promising solutions early, while still guaranteeing convergence to a rational solution in the long run. An experiment conducted in a simple plant factory system shows that standard reinforcement learning, at least, is insufficient for this type of problem. Computer simulations show that the proposed method acquires a promising control policy at an earlier stage than both standard reinforcement learning and the profit sharing method.
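The abstract compares the proposed method with profit sharing, an episode-based credit-assignment scheme that distributes a single delayed reward backward over the rules fired during the episode. The following is a minimal sketch of a tabular profit-sharing learner, offered only to illustrate that baseline. It is not the authors' average reward distribution method, whose details are not given here. The class name `ProfitSharing`, the geometrically decreasing reinforcement function `reward * decay**k`, the `decay` value, and the weight-proportional action selection are all illustrative assumptions.

```python
import random
from collections import defaultdict


class ProfitSharing:
    """Hypothetical tabular profit-sharing learner (baseline sketch only).

    Assumptions, not taken from the paper:
      * rules are (state, action) pairs with a scalar weight,
      * actions are chosen with probability proportional to rule weight,
      * the reinforcement function is geometric: the rule fired k steps
        before the end of the episode receives reward * decay**k.
    """

    def __init__(self, actions, decay=0.3, initial_weight=1.0, seed=None):
        self.actions = list(actions)
        self.decay = decay
        # Unseen (state, action) rules start at initial_weight.
        self.weights = defaultdict(lambda: initial_weight)
        self.rng = random.Random(seed)
        self.episode = []  # rules fired so far in the current episode

    def select(self, state):
        # Roulette-wheel selection: probability proportional to rule weight.
        ws = [self.weights[(state, a)] for a in self.actions]
        action = self.rng.choices(self.actions, weights=ws, k=1)[0]
        self.episode.append((state, action))
        return action

    def end_episode(self, reward):
        # Distribute the single delayed reward backward over the episode:
        # the last rule gets the full reward, earlier rules get less.
        for k, rule in enumerate(reversed(self.episode)):
            self.weights[rule] += reward * self.decay ** k
        self.episode.clear()
```

A possible usage, assuming the state is a growth-day index (so, as in the task described above, no state recurs within an episode) and the actions are three hypothetical nutrient levels:

```python
ps = ProfitSharing(["low", "mid", "high"], decay=0.3, seed=0)
for day in range(3):
    ps.select(day)
ps.end_episode(10.0)  # e.g. a harvest-quality score known only at the end
```

Because each rule fires at most once per episode here, every rule on the trajectory is reinforced exactly once per episode, and credit for early decisions shrinks geometrically with their distance from the reward.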
Keywords: reinforcement learning, intelligent control, plant growth, plant factory