single-jc.php

JACIII Vol.30 No.3 pp. 674-688
(2026)

Research Paper:

Reinforced Optimization: An Objective-Based Optimization Algorithm for Reinforcement Learning

Ali Ridho Barakbah ORCID Icon, Irene Erlyn Wina Rachmawan, Ira Prasetyaningrum ORCID Icon, Yuliana Setiowati ORCID Icon, and Weny Mistarika Rahmawati ORCID Icon

Politeknik Elektronika Negeri Surabaya
Jl. Raya ITS, Sukolilo, Surabaya 60111, Indonesia

Corresponding author

Received:
July 16, 2025
Accepted:
November 30, 2025
Published:
May 20, 2026
Keywords:
reinforcement learning, optimization problem, objective-based optimization
Abstract

Reinforcement learning (RL) is a machine learning paradigm that enables systems to learn from experience within uncertain environments. However, RL is typically suited to goal-oriented problems rather than objective-driven tasks. To address this limitation, this paper proposes a new optimization algorithm, called reinforced optimization (RO). This algorithm leverages an RL framework and reformulates it from a goal-based setting into an objective-based formulation, enabling its application to optimization problems. The algorithm preserves essential RL elements such as rewards and penalties, as well as exploration and exploitation mechanisms. It utilizes sequences of rewards to represent problem states and updates these states through directional steps. The effectiveness of each action with respect to the objective function is evaluated: successful actions lead to increased rewards, while unsuccessful actions result in decreased rewards. This iterative process continues for a predefined number of iterations, aiming to achieve convergence towards a global optimum in optimization tasks. To evaluate the RO algorithm, experiments were conducted on three objective-based problems: centroid optimization, traveling salesman problem, and vehicle routing problem. The algorithm’s performance was compared to other optimization methods: genetic algorithm, ant colony optimization, simulated annealing, and particle swarm optimization. The results indicate that the RO algorithm exhibits behavior analogous to gradient descent during convergence and demonstrates a favorable trade-off between accuracy and execution time, making it a viable alternative for both continuous and combinatorial optimization problems.

Reinforced optimization

Reinforced optimization

Cite this article as:
A. Barakbah, I. Rachmawan, I. Prasetyaningrum, Y. Setiowati, and W. Rahmawati, “Reinforced Optimization: An Objective-Based Optimization Algorithm for Reinforcement Learning,” J. Adv. Comput. Intell. Intell. Inform., Vol.30 No.3, pp. 674-688, 2026.
Data files:
References
  1. [1] B. Bakker and J. Schmidhuber, “Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization,” Proc. of the 8th Conf. on Intelligent Autonomous Systems, pp. 438-445, 2004.
  2. [2] K. Matsuda et al., “Hierarchical reward model of deep reinforcement learning for enhancing cooperative behavior in automated driving,” J. Adv. Comput. Intell. Intell. Inform., Vol.28, No.2, pp. 431-443, 2024. https://doi.org/10.20965/jaciii.2024.p0431
  3. [3] M. Tan, “Multi-agent reinforcement learning: Independent versus cooperative agents,” Proc. of the 10th Int. Conf. on Machine Learning, pp. 330-337, 1993.
  4. [4] K. Miyazaki and K. Takadama, “Special issue on cutting edge of reinforcement learning and its hybrid methods,” J. Adv. Comput. Intell. Intell. Inform., Vol.28, No.2, p. 379, 2024. https://doi.org/10.20965/jaciii.2024.p0379
  5. [5] R. S. Sutton and A. G. Barto, “Reinforcement learning: An introduction,” The MIT Press, 1998.
  6. [6] C. I. Mary and S. V. K. Raja, “Refinement of clusters from k-means with ant colony optimization,” J. of Theoretical and Applied Information Technology, Vol.10, No.1, pp. 28-32, 2009.
  7. [7] C. J. Watkins and P. Dayan, “Reinforcement learning,” Encyclopedia of Cognitive Science, Nature Publishing Group, 2003.
  8. [8] A. Gosavi, “A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis,” Machine Learning, Vol.55, No.1, pp. 5-29, 2004. https://doi.org/10.1023/B:MACH.0000019802.64038.6c
  9. [9] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. of Artificial Intelligence Research, Vol.4, No.1, pp. 237-285, 1996. https://doi.org/10.1613/jair.301
  10. [10] R. A. C. Bianchi, R. Ros, and R. L. de Mantaras, “Improving reinforcement learning by using case based heuristics,” Proc. of the 8th Int. Conf. on Case-Based Reasoning, pp. 75-89, 2009. https://doi.org/10.1007/978-3-642-02998-1_7
  11. [11] C. Szepesvári, “Algorithms for reinforcement learning,” Springer, 2010. https://doi.org/10.1007/978-3-031-01551-9
  12. [12] D. Kitakoshi, H. Shioya, and M. Kurihara, “Analysis of a method improving reinforcement learning agents’ policies,” J. Adv. Comput. Intell. Intell. Inform., Vol.7, No.3, pp. 276-282, 2003. https://doi.org/10.20965/jaciii.2003.p0276
  13. [13] J. Strahl, T. Honkela, and P. Wagner, “A Gaussian process reinforcement learning algorithm with adaptability and minimal tuning requirements,” Proc. of the 24th Int. Conf. on Artificial Neural Networks, pp. 371-378, 2014. https://doi.org/10.1007/978-3-319-11179-7_47
  14. [14] M. D. Vose, “The simple genetic algorithm: Foundations and theory,” The MIT Press, 1999. https://doi.org/10.7551/mitpress/6229.001.0001
  15. [15] S. Ding et al., “Improved genetic algorithm for train platform rescheduling under train arrival delays,” J. Adv. Comput. Intell. Intell. Inform., Vol.27, No.5, pp. 959-966, 2023. https://doi.org/10.20965/jaciii.2023.p0959
  16. [16] K. Ono and Y. Hanada, “Self-organized subpopulation based on multiple features in genetic programming on GPU,” J. Adv. Comput. Intell. Intell. Inform., Vol.25, No.2, pp. 177-186, 2021. https://doi.org/10.20965/jaciii.2021.p0177
  17. [17] K.-S. Hung, S.-F. Su, and Z.-J. Lee, “Improving ant colony optimization algorithms for solving traveling salesman problems,” J. Adv. Comput. Intell. Intell. Inform., Vol.11, No.4, pp. 433-442, 2007. https://doi.org/10.20965/jaciii.2007.p0433
  18. [18] M. Dorigo and T. Stützle, “Ant colony optimization,” Bradford Books, 2004.
  19. [19] C. Ratanavilisagul, “Modified ant colony optimization with route elimination and pheromone reset for multiple pickup and multiple delivery vehicle routing problem with time window,” J. Adv. Comput. Intell. Intell. Inform., Vol.26, No.6, pp. 959-964, 2022. https://doi.org/10.20965/jaciii.2022.p0959
  20. [20] J. Zheng and X. Gan, “A hybrid algorithm based on multi-agent and simulated annealing,” Proc. of the 2011 Int. Conf. on Future Wireless Networks and Information Systems, Vol.1, pp. 343-351, 2012. https://doi.org/10.1007/978-3-642-27323-0_44
  21. [21] J. Liu, X. Gao, and X. Chen, “Feasibility analysis of optimization models for natural gas distribution networks using machine learning,” J. Adv. Comput. Intell. Intell. Inform., Vol.29, No.3, pp. 614-622, 2025. https://doi.org/10.20965/jaciii.2025.p0614
  22. [22] H. Takano, J. Murata, Y. Maki, and M. Yasuda, “Improving the search ability of tabu search in the distribution network reconfiguration problem,” J. Adv. Comput. Intell. Intell. Inform., Vol.17, No.5, pp. 681-689, 2013. https://doi.org/10.20965/jaciii.2013.p0681
  23. [23] I. C. Trelea, “The particle swarm optimization algorithm: Convergence analysis and parameter selection,” Information Processing Letters, Vol.85, No.6, pp. 317-325, 2003. https://doi.org/10.1016/S0020-0190(02)00447-7
  24. [24] W. S. Chai, M. I. F. bin Romli, S. B. Yaakob, L. H. Fang, and M. Z. Aihsan, “Regenerative braking optimization using particle swarm algorithm for electric vehicle,” J. Adv. Comput. Intell. Intell. Inform., Vol.26, No.6, pp. 1022-1030, 2022. https://doi.org/10.20965/jaciii.2022.p1022
  25. [25] T. Matsui et al., “Particle swarm optimization for jump height maximization of a serial link robot,” J. Adv. Comput. Intell. Intell. Inform., Vol.11, No.8, pp. 956-963, 2007. https://doi.org/10.20965/jaciii.2007.p0956
  26. [26] M. Alzakan et al., “Enhancing k-means clustering results with gradient boosting: A post-processing approach,” Int. J. of Advanced Computer Science and Applications, Vol.15, No.2, pp. 913-921, 2024. https://doi.org/10.14569/IJACSA.2024.0150292
  27. [27] Y. M. Cheung, “ mathrm{k}^{*} k*-Means: A new generalized k k-means clustering algorithm,” Pattern Recognition Letters, Vol.24, No.15, pp. 2883-2893, 2003. https://doi.org/10.1016/S0167-8655(03)00146-6
  28. [28] S. S. Khan and A. Ahmad, “Cluster center initialization algorithm for K K-means clustering,” Pattern Recognition Letters, Vol.25, No.11, pp. 1293-1302, 2004. https://doi.org/10.1016/j.patrec.2004.04.007
  29. [29] A. R. Barakbah and Y. Kiyoki, “A pillar algorithm for K-means optimization by distance maximization for initial centroid designation,” 2009 IEEE Symp. on Computational Intelligence and Data Mining, pp. 61-68, 2009. https://doi.org/10.1109/CIDM.2009.4938630
  30. [30] R. Matai, S. Singh, and M. L. Mittal, “Traveling salesman problem: An overview of applications, formulations, and solution approaches,” D. Davendra (Ed.), “Traveling Salesman Problem, Theory and Applications,” pp. 1-24, IntechOpen, 2010. https://doi.org/10.5772/12909
  31. [31] D. S. Johnson, “Local optimization and the traveling salesman problem,” Proc. of the 17th Int. Colloquium on Automata, Languages and Programming, pp. 446-461, 1990. https://doi.org/10.1007/BFb0032050
  32. [32] C. Chang, J. Cao, N. Weng, and G. Lv, “Ant colony optimization parameters control based on evolutionary strength,” IEEE 31st Int. Conf. on Tools with Artificial Intelligence, pp. 448-455, 2019. https://doi.org/10.1109/ICTAI.2019.00069
  33. [33] M. Clauß, M. Bernt, and M. Middendorf, “A common interval guided ACO algorithm for permutation problems,” 2013 IEEE Symp. on Swarm Intelligence, pp. 64-71, 2013. https://doi.org/10.1109/SIS.2013.6615160
  34. [34] G. Laporte, “The vehicle routing problem: An overview of exact and approximate algorithms,” European J. of Operational Research, Vol.59, No.3, pp. 345-358, 1992. https://doi.org/10.1016/0377-2217(92)90192-C

*This site is desgined based on HTML5 and CSS3 for modern browsers, e.g. Chrome, Firefox, Safari, Edge, Opera.

Last updated on May. 20, 2026