
JACIII Vol.21 No.4 pp. 639-649
doi: 10.20965/jaciii.2017.p0639


Learning Quadcopter Maneuvers with Concurrent Methods of Policy Optimization

Pei-Hua Huang and Osamu Hasegawa

Tokyo Institute of Technology
J3-13, 4259 Nagatsuta, Midori-ku, Yokohama 226-8502, Japan

Received: October 29, 2016
Accepted: May 15, 2017
Published: July 20, 2017
Keywords: deep reinforcement learning, asynchronous update, policy optimization, aerial robotics, autonomous control

This study presents an aerial robotics application of deep reinforcement learning that applies an asynchronous learning framework and trust region policy optimization to a simulated quad-rotor helicopter (quadcopter) environment. In particular, we optimized a control policy asynchronously through interaction with concurrent instances of the environment. The control system was benchmarked and extended with examples to tackle two continuous state-action tasks for the quadcopter: hovering control and balancing an inverted pole. Performing these maneuvers required continuous actions that finely control small changes in the quadcopter's acceleration so as to maximize the scalar reward of each task. The simulation results demonstrated improved learning speed and reliability on these tasks.
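To make the idea of optimizing one policy asynchronously from concurrent environment instances more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' implementation. The environment `ToyHoverEnv` is a hypothetical one-dimensional altitude-hold task, not the paper's quadcopter simulator; the policy is a linear-Gaussian controller; and plain REINFORCE gradient ascent stands in for trust region policy optimization. All names and hyperparameters are illustrative. What the sketch does show is the asynchronous pattern: several workers each own an environment instance, copy the shared parameters, roll out an episode, and push a gradient that may already be stale when it is applied.

```python
import random
import threading


class ToyHoverEnv:
    """Hypothetical 1-D altitude-hold task (an assumption, NOT the paper's simulator).

    State is (altitude error z, vertical velocity v); the action is a continuous
    acceleration command; the reward penalizes any deviation from hover.
    """

    def __init__(self, seed, dt=0.1):
        self.rng = random.Random(seed)
        self.dt = dt
        self.z = 0.0
        self.v = 0.0

    def reset(self):
        self.z = self.rng.uniform(-1.0, 1.0)
        self.v = 0.0
        return (self.z, self.v)

    def step(self, action):
        a = max(-2.0, min(2.0, action))  # clip the acceleration command
        self.v += a * self.dt
        self.z += self.v * self.dt
        reward = -(self.z ** 2 + 0.1 * self.v ** 2)  # always <= 0
        return (self.z, self.v), reward


class SharedPolicy:
    """Linear-Gaussian policy a ~ N(w . s, sigma^2) shared by all workers."""

    def __init__(self, sigma=0.3):
        self.w = [0.0, 0.0]
        self.sigma = sigma
        self.lock = threading.Lock()
        self.updates = 0
        self.returns = []

    def snapshot(self):
        with self.lock:
            return list(self.w)

    def apply_gradient(self, grad, lr, episode_return):
        # Asynchronous update: a worker pushes its gradient as soon as it is ready,
        # even if other workers have changed w since it took its snapshot.
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] += lr * g  # gradient ascent on expected return
            self.updates += 1
            self.returns.append(episode_return)


def reinforce_gradient(env, w, sigma, rng, horizon=50):
    """One rollout; returns (REINFORCE gradient estimate, total reward).

    REINFORCE is used here in place of TRPO purely for brevity (assumption).
    """
    s = env.reset()
    score_terms, rewards = [], []
    for _ in range(horizon):
        mean = w[0] * s[0] + w[1] * s[1]
        a = rng.gauss(mean, sigma)
        # Gradient of log N(a; w.s, sigma^2) with respect to w.
        score_terms.append([(a - mean) * si / sigma ** 2 for si in s])
        s, r = env.step(a)
        rewards.append(r)
    grad, reward_to_go = [0.0, 0.0], 0.0
    for t in reversed(range(horizon)):
        reward_to_go += rewards[t]  # sum of rewards from step t to the end
        for i in range(2):
            grad[i] += score_terms[t][i] * reward_to_go
    return [g / horizon for g in grad], sum(rewards)


def worker(policy, worker_id, episodes, lr):
    env = ToyHoverEnv(seed=worker_id)  # each worker owns its own environment instance
    rng = random.Random(1000 + worker_id)
    for _ in range(episodes):
        w = policy.snapshot()
        grad, total = reinforce_gradient(env, w, policy.sigma, rng)
        policy.apply_gradient(grad, lr, total)


def train(num_workers=4, episodes=200, lr=1e-4):
    policy = SharedPolicy()
    threads = [
        threading.Thread(target=worker, args=(policy, i, episodes, lr))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return policy


if __name__ == "__main__":
    trained = train()
    print("weights:", trained.w, "updates:", trained.updates)
```

Python threads interleave under the global interpreter lock rather than run truly in parallel, so this illustrates the update pattern rather than a speedup. Matching the method named in the abstract would require replacing the REINFORCE step with a TRPO update (a natural-gradient direction under a KL-divergence constraint); the sketch makes no claim about learning speed or reliability.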

Cite this article as:
P. Huang and O. Hasegawa, “Learning Quadcopter Maneuvers with Concurrent Methods of Policy Optimization,” J. Adv. Comput. Intell. Intell. Inform., Vol.21 No.4, pp. 639-649, 2017.
