Transient and asymptotic dynamics of reinforcement learning in games
Affiliation: 1. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China; 2. Wenzhou Vocational & Technical College, Wenzhou 325035, China; 3. School of Economics and Management, Chang'an University, Xi'an 710064, China; 4. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract: Reinforcement learners tend to repeat actions that led to satisfactory outcomes in the past, and to avoid choices that resulted in unsatisfactory experiences. This behavior is one of the most widespread adaptation mechanisms in nature. In this paper we fully characterize the dynamics of one of the best-known stochastic models of reinforcement learning [Bush, R., Mosteller, F., 1955. Stochastic Models for Learning. Wiley & Sons, New York] for 2-player 2-strategy games. We also provide some extensions for more general games and for a wider class of learning algorithms. Specifically, it is shown that the transient dynamics of Bush and Mosteller's model can be substantially different from its asymptotic behavior. It is also demonstrated that in general—and in sharp contrast to other reinforcement learning models in the literature—the asymptotic dynamics of Bush and Mosteller's model cannot be approximated using the continuous-time limit version of its expected motion.
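To make the model concrete, the following is a minimal sketch of the Bush–Mosteller update rule for a 2-player 2-strategy game. The payoff matrix (a Prisoner's Dilemma), the aspiration level, the normalization constant, and all function and parameter names here are illustrative assumptions, not taken from the paper itself; the paper's analysis covers the general model.

```python
import random

# Illustrative payoff matrix for a 2-player 2-strategy game (a Prisoner's
# Dilemma, chosen only as an example; rows = own action, cols = opponent's).
PAYOFFS = [[3.0, 0.0],   # action 0 = cooperate
           [4.0, 1.0]]   # action 1 = defect
ASPIRATION = 2.0         # assumed aspiration level: payoffs above it reinforce
MAX_DIFF = 2.0           # max |payoff - aspiration|, used to normalise stimuli

def stimulus(payoff):
    """Map a payoff to a stimulus in [-1, 1] relative to the aspiration level."""
    return (payoff - ASPIRATION) / MAX_DIFF

def bm_update(p, action, s, l=0.5):
    """Bush-Mosteller update of p = Pr(play action 0), with learning rate l.
    A positive stimulus moves probability mass toward the chosen action;
    a negative stimulus moves it away."""
    q = p if action == 0 else 1.0 - p          # probability of the chosen action
    q = q + l * s * (1.0 - q) if s >= 0 else q + l * s * q
    return q if action == 0 else 1.0 - q

def play(p1, p2, rounds, l=0.5, seed=0):
    """Simulate two Bush-Mosteller learners; return final action-0 probabilities."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a1 = 0 if rng.random() < p1 else 1
        a2 = 0 if rng.random() < p2 else 1
        p1 = bm_update(p1, a1, stimulus(PAYOFFS[a1][a2]), l)
        p2 = bm_update(p2, a2, stimulus(PAYOFFS[a2][a1]), l)
    return p1, p2
```

Running `play` for a few rounds versus many rounds is one way to see the paper's point empirically: the trajectory early on (the transient) need not resemble where the probabilities eventually settle.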
This article is indexed in ScienceDirect and other databases.