Similar Literature
20 similar documents found.
1.
Reinforcement learners tend to repeat actions that led to satisfactory outcomes in the past, and avoid choices that resulted in unsatisfactory experiences. This behavior is one of the most widespread adaptation mechanisms in nature. In this paper we fully characterize the dynamics of one of the best-known stochastic models of reinforcement learning [Bush, R., Mosteller, F., 1955. Stochastic Models of Learning. Wiley & Sons, New York] for 2-player 2-strategy games. We also provide some extensions for more general games and for a wider class of learning algorithms. Specifically, it is shown that the transient dynamics of Bush and Mosteller's model can be substantially different from its asymptotic behavior. It is also demonstrated that, in general and in sharp contrast to other reinforcement learning models in the literature, the asymptotic dynamics of Bush and Mosteller's model cannot be approximated using the continuous-time limit version of its expected motion.
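Bush and Mosteller's rule is often written in terms of a stimulus s: the payoff minus a fixed aspiration level, scaled into [−1, 1]. A satisfied player (s ≥ 0) raises the probability p of the action just taken to p + l·s·(1 − p); a dissatisfied one lowers it to p + l·s·p, where l is the learning rate. The minimal sketch below assumes that formulation; the Prisoner's Dilemma payoffs, aspiration level, learning rate, and function names are hypothetical choices for illustration, not taken from the paper.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs (same for both players):
# PAYOFF[my_action][other_action], where 0 = cooperate and 1 = defect.
PAYOFF = [[3, 0],
          [4, 1]]

def stimulus(payoff, aspiration, max_dev):
    """Map payoff minus aspiration into [-1, 1]."""
    return (payoff - aspiration) / max_dev

def bm_update(p_taken, s, rate):
    """Bush-Mosteller update of the probability of the action just taken."""
    if s >= 0:
        return p_taken + rate * s * (1 - p_taken)  # satisfied: reinforce
    return p_taken + rate * s * p_taken            # dissatisfied: inhibit

def simulate_bm(p_coop=(0.5, 0.5), aspiration=2.0, rate=0.5, rounds=1000, seed=0):
    """Return each player's final probability of cooperating."""
    rng = random.Random(seed)
    # The largest possible |payoff - aspiration| normalizes the stimulus.
    max_dev = max(abs(v - aspiration) for row in PAYOFF for v in row)
    p = list(p_coop)
    for _ in range(rounds):
        acts = [0 if rng.random() < p[i] else 1 for i in range(2)]
        for i in range(2):
            s = stimulus(PAYOFF[acts[i]][acts[1 - i]], aspiration, max_dev)
            # Update the probability of the action actually taken,
            # then convert back to the probability of cooperating.
            p_taken = p[i] if acts[i] == 0 else 1 - p[i]
            p_taken = bm_update(p_taken, s, rate)
            p[i] = p_taken if acts[i] == 0 else 1 - p_taken
    return p
```

Running `simulate_bm()` with different seeds or learning rates is a quick way to see how much individual runs can vary before they settle.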

2.
The paper explores the implications of melioration learning (an empirically significant variant of reinforcement learning) for game theory. We show that in games with invariable pay-offs melioration learning converges to Nash equilibria in a way similar to the replicator dynamics. Since melioration learning is known to deviate from optimizing behavior when an action’s rewards decrease with increasing relative frequency of that action, we also investigate an example of a game with frequency-dependent pay-offs. Interactive melioration learning is then still appropriately described by the replicator dynamics, but it indeed deviates from rational choice behavior in such a game.

3.
Self-tuning experience weighted attraction learning in games   Total citations: 2, self-citations: 0, citations by others: 2
Self-tuning experience weighted attraction (EWA) is a one-parameter theory of learning in games. It addresses the criticism that an earlier model (EWA) has too many parameters by fixing some parameters at plausible values and replacing others with functions of experience so that they no longer need to be estimated. Consequently, it is econometrically simpler than the popular weighted fictitious play and reinforcement learning models. The functions of experience which replace free parameters “self-tune” over time, adjusting in a way that selects a sensible learning rule to capture subjects’ choice dynamics. For instance, the self-tuning EWA model can shift from weighted fictitious play to averaging reinforcement learning as subjects equilibrate and learn to ignore inferior foregone payoffs. The theory was tested on seven different games and compared to the earlier parametric EWA model and a one-parameter stochastic equilibrium theory (QRE). Self-tuning EWA does as well as EWA in predicting behavior in new games, even though it has fewer parameters, and fits reliably better than the QRE equilibrium benchmark.
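For reference, the parametric EWA update that the self-tuning version simplifies is usually written as N(t) = ρ·N(t−1) + 1 with ρ = φ(1−κ), and A_j(t) = [φ·N(t−1)·A_j(t−1) + (δ + (1−δ)·I(j played))·π_j(t)] / N(t), where π_j(t) is the payoff strategy j earned or would have earned against the others' realized play; choices then follow a logit rule with sensitivity λ. The sketch below implements only that fixed-parameter form, with illustrative default values and hypothetical function names; self-tuning EWA replaces φ and δ with functions of experience, which this sketch does not attempt.

```python
import math

def ewa_update(attractions, n_prev, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
    """One parametric EWA step for a single player.

    attractions: current attractions A_j(t-1), one per strategy
    n_prev:      experience weight N(t-1)
    chosen:      index of the strategy actually played this period
    payoffs:     payoffs[j] = payoff strategy j earned or would have earned
                 against the opponents' realized play (foregone payoffs)
    """
    rho = phi * (1 - kappa)
    n_new = rho * n_prev + 1
    new_attr = []
    for j, (a_old, pay) in enumerate(zip(attractions, payoffs)):
        # delta + (1 - delta) * I(j played): realized payoff gets weight 1,
        # foregone payoffs get weight delta.
        weight = 1.0 if j == chosen else delta
        new_attr.append((phi * n_prev * a_old + weight * pay) / n_new)
    return new_attr, n_new

def logit_probs(attractions, lam=1.0):
    """Logit choice probabilities from attractions."""
    m = max(attractions)  # subtract the max for numerical stability
    exps = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]
```

In the usual EWA parameterization, δ = 1 with κ = 0 corresponds roughly to weighted fictitious play, while δ = 0 with κ = 0 gives averaging reinforcement learning, the two rules the abstract describes self-tuning EWA moving between.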

4.
Demands in the Ultimatum Game in its traditional form with one proposer and one responder are compared with demands in an Ultimatum Game with responder competition. In this modified form one proposer faces three responders who can accept or reject the split of the pie. Initial demands in both ultimatum games are quite similar; however, over the course of the experiment, demands in the ultimatum game with responder competition are significantly higher than in the traditional case with repeated random matching. Individual round-to-round changes of choices that are consistent with directional learning are the driving forces behind the differences between the two learning curves and cannot be tracked by an adjustment process in response to accumulated reinforcements. The importance of combining reinforcement and directional learning is addressed. Moreover, learning transfer between the two ultimatum games is analyzed.

5.
A learning-based model of repeated games with incomplete information   Total citations: 3, self-citations: 0, citations by others: 3
This paper tests a learning-based model of strategic teaching in repeated games with incomplete information. The repeated game has a long-run player whose type is unknown to a group of short-run players. The proposed model assumes a fraction of ‘short-run’ players follow a one-parameter learning model (self-tuning EWA). In addition, some ‘long-run’ players are myopic while others are sophisticated and rationally anticipate how short-run players adjust their actions over time and “teach” the short-run players to maximize their long-run payoffs. All players optimize noisily. The proposed model nests an agent-based quantal-response equilibrium (AQRE) and the standard equilibrium models as special cases. Using data from 28 experimental sessions of trust and entry repeated games, including 8 previously unpublished sessions, the model fits substantially better than chance and much better than standard equilibrium models. Estimates show that most of the long-run players are sophisticated, and short-run players become more sophisticated with experience.

6.
Consider a large population of finitely lived agents organized into n different hierarchical levels. Every period, all those placed at each level are randomly matched to play a given symmetric game. Based on the resulting outcome, a ρ-fraction of agents who (within their own level) attain the highest payoffs are promoted upwards. On the other hand, newcomers replacing those who die every period enter at the lowest level and irreversibly choose the strategy to be played for the rest of their life. This choice is made, with some noise, by imitating one of the strategies adopted at the highest level. In this setup, the unique long-run behavior of the system is fully characterized for the whole class of 2×2 coordination games and two alternative variations of the model. The results crucially depend on the key “institutional” parameters ρ and n. In particular, it is shown that inefficient behavior prevails in the long run (even when risk-dominated) if promotion is only mildly selective (high ρ) and the social system is quite hierarchical (large n). In a stylized manner, these parameter conditions may be viewed as reflecting a sort of institutional deficiency that impairs economic performance. Journal of Economic Literature Classification Numbers: C70, C72, C73, D72.

7.
This paper is designed to combine the game-theoretic investigation of the static or equilibrium properties of large strategic market games with the investigation of some very simple dynamics, which are nevertheless sufficient to show differences between two related games: one with only trade, and one in which both borrowing from an outside bank and trade take place. The role of banking reserves emerges as relevant and sensitive to the transient-state dynamics. Several 100,000-player games are simulated, and the behavior is compared with the analytical prediction for the games with a continuum of agents. The dynamics considered here are so simple that they do not show adaptive learning. A natural extension calls for updating via a learning program such as a genetic algorithm.

8.
We present a theoretical model of noisy introspection designed to explain behavior in games played only once. The model determines layers of beliefs about others' beliefs about …, etc., but allows for surprises by relaxing the equilibrium requirement that belief distributions coincide with decision distributions. Noise is injected into iterated conjectures about others' decisions and beliefs, which causes the predictions to differ from those of deterministic models of iterated thinking, e.g., rationalizability. The paper contains a convergence proof that implies existence and uniqueness of the outcome of the iterated thought process. In addition, estimated introspection and noise parameters for data from 37 one-shot matrix games are reported. The accuracy of the model is compared with that of several alternatives.

9.
We conduct an experiment on a minority-of-three game in which each player is a team composed of three subjects. Each team can freely discuss its strategies in the game and decisions must be made via majority rule. Team discussions are recorded and their content analyzed to detect evidence of strategy co-evolution among teams playing together. Our main results show no evidence supporting the mixed strategy Nash equilibrium solution, and provide evidence more consistent with reinforcement learning models than with belief-based models. Exhibiting level-2 rationality (i.e., reasoning about others’ beliefs) is positively and significantly correlated with higher than average earnings in the game. In addition, teams that are more successful tend to become more self-centered over time, paying more attention to their own past successes than to the behavior of other teams. Finally, we find evidence of mutual adaptation over time, as teams that are more strategic induce competing teams to be more self-centered instead. Our results contribute to the understanding of coordination dynamics resting on heterogeneity and co-evolution of decision rules. In addition, they provide support at the decision process level to the validity of modeling behavior using reinforcement learning models.

10.
This paper studies the cumulative proportional reinforcement (CPR) rule, according to which an agent plays, at each period, an action with a probability proportional to the cumulative utility that the agent has obtained with that action. The asymptotic properties of this learning process are examined for a decision-maker under risk, where it converges almost surely toward the expected utility maximizing action(s). The process is further considered in a two-player game; it converges with positive probability toward any strict pure Nash equilibrium and converges with zero probability toward some mixed equilibria (which are characterized). The CPR rule is compared in its principles with other reinforcement rules and with replicator dynamics. Journal of Economic Literature Classification Number: C72.
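The CPR rule fits in a few lines: keep a running total of the utility each action has earned, and choose actions with probability proportional to those totals. The minimal sketch below applies it to a hypothetical decision under risk with two lotteries (a safe one paying 2 and a risky one paying 5 or 1 with equal odds, so expected utility 3). It assumes strictly positive utilities and initial totals, which a proportional rule needs for its probabilities to stay well defined; the function names and numbers are illustrative only.

```python
import random

def cpr_choose(cumulative, rng):
    """Pick an action with probability proportional to its cumulative utility."""
    return rng.choices(range(len(cumulative)), weights=cumulative, k=1)[0]

def run_cpr(lotteries, periods=5000, initial=1.0, seed=1):
    """lotteries: callables rng -> positive utility (hypothetical).

    Returns the choice probabilities after the last period.
    """
    rng = random.Random(seed)
    cumulative = [initial] * len(lotteries)
    for _ in range(periods):
        a = cpr_choose(cumulative, rng)
        cumulative[a] += lotteries[a](rng)  # reinforce only the action played
    total = sum(cumulative)
    return [c / total for c in cumulative]

def safe_lottery(rng):
    return 2.0  # hypothetical sure payoff

def risky_lottery(rng):
    return 5.0 if rng.random() < 0.5 else 1.0  # hypothetical; expected utility 3
```

Convergence toward the expected-utility-maximizing action is an almost-sure limit, so a finite run like `run_cpr([safe_lottery, risky_lottery])` can still put noticeable weight on the inferior lottery.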

11.
Most learning models assume players are adaptive (i.e., they respond only to their own previous experience and ignore others' payoff information) and behavior is not sensitive to the way in which players are matched. Empirical evidence suggests otherwise. In this paper, we extend our adaptive experience-weighted attraction (EWA) learning model to capture sophisticated learning and strategic teaching in repeated games. The generalized model assumes there is a mixture of adaptive learners and sophisticated players. An adaptive learner adjusts his behavior the EWA way. A sophisticated player rationally best-responds to her forecasts of all other behaviors. A sophisticated player can be either myopic or farsighted. A farsighted player develops multiple-period rather than single-period forecasts of others' behaviors and chooses to “teach” the other players by choosing a strategy scenario that gives her the highest discounted net present value. We estimate the model using data from p-beauty contests and repeated trust games with incomplete information. The generalized model is better than the adaptive EWA model in describing and predicting behavior. Including teaching also allows an empirical learning-based approach to reputation formation which predicts better than a quantal-response extension of the standard type-based approach. Journal of Economic Literature Classification Numbers: C72, C91.

12.
As one of the best-known examples of the paradox of backward induction, centipede games have prompted a host of studies with various approaches and explanations (McKelvey and Palfrey, 1992, Fey et al., 1996, Nagel and Tang, 1998, Rapoport et al., 2003, Palacios-Huerta and Volij, 2009). Focusing on initial plays observed in experiments, this paper attempts to offer another explanation based on a thorough study of level-k models as applied to these games. Borrowing ideas from the cognitive hierarchy model (Camerer et al., 2004), the authors constructed a group of models based on levels of rationality, and also tested for various assumptions on the play of the most naïve player type in these models. It was found that level-k models generally perform better than the agent quantal response equilibrium (AQRE) model and its variant with altruistic player types for increasing-pie centipede games, while the AQRE model with altruistic player types performs better in constant-pie games.
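As a generic illustration of the level-k idea (not the paper's centipede-game specification, which is an extensive-form game with its own assumptions about naïve players), the sketch below assumes level-0 players randomize uniformly and each level-k player best-responds to level k−1 of the other role. The 2×2 payoff matrices and function names are hypothetical.

```python
def best_response(payoffs, opponent_mix):
    """Index of the pure strategy with the highest expected payoff.

    payoffs[i][j] = my payoff when I play i and the opponent plays j.
    Ties go to the lowest index (max returns the first maximum).
    """
    values = [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoffs]
    return max(range(len(values)), key=values.__getitem__)

def pure(index, n):
    """Mixed-strategy vector that puts all weight on one pure strategy."""
    return [1.0 if k == index else 0.0 for k in range(n)]

def level_k(row_payoffs, col_payoffs, max_level):
    """Strategies of levels 0..max_level for both roles, as mixed vectors.

    col_payoffs[j][i] = column player's payoff when column plays j, row plays i.
    """
    n_row, n_col = len(row_payoffs), len(col_payoffs)
    row = [[1.0 / n_row] * n_row]  # level 0: uniform randomization (assumption)
    col = [[1.0 / n_col] * n_col]
    for k in range(1, max_level + 1):
        row.append(pure(best_response(row_payoffs, col[k - 1]), n_row))
        col.append(pure(best_response(col_payoffs, row[k - 1]), n_col))
    return row, col

# Hypothetical 2x2 game: rows T, B; columns L, R.
ROW = [[4, 0], [1, 2]]
COL = [[1, 0], [0, 3]]
```

With these payoffs, `level_k(ROW, COL, 3)` has the row role alternate T, B, T across levels 1 to 3 and the column role alternate R, L, R, so different levels predict different play even in a tiny game.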

13.
On economic applications of evolutionary game theory   Total citations: 32, self-citations: 0, citations by others: 32
Evolutionary games have considerable unrealized potential for modeling substantive economic issues. They promise richer predictions than orthodox game models but often require more extensive specifications. This paper exposits the specification of evolutionary game models and classifies the possible asymptotic behavior for one- and two-dimensional models.

14.
Consumption dynamics under information processing constraints   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies how “rational inattention” (RI), a type of information processing constraint proposed by Sims [Sims, C.A., 2003. Implications of rational inattention, Journal of Monetary Economics 50 (3), 665–690], affects the joint dynamics of consumption and income in a permanent income model with general income processes. Specifically, I propose an analytical approach to solve the multivariate permanent income model with RI and examine its implications for optimal consumption, saving, and welfare. It is shown that RI can affect the relative volatility of consumption and provide an endogenous propagation mechanism that disentangles the short-run and long-run responses of consumption to exogenous income shocks. I also explore how aggregation reduces the impact of the RI-induced endogenous noise on consumption and thus increases the smoothness of aggregate consumption. Finally, I compare RI with four alternative hypotheses (habit formation, signal extraction, robustness, and inattentiveness) by examining their implications for the joint behavior of consumption and income.

15.
Learning to Learn, Pattern Recognition, and Nash Equilibrium   Total citations: 1, self-citations: 0, citations by others: 1
The paper studies a large class of bounded-rationality, probabilistic learning models on strategic-form games. The main assumption is that players “recognize” cyclic patterns in the observed history of play. The main result is convergence with probability one to a fixed pattern of pure strategy Nash equilibria in a large class of “simple games” in which the pure equilibria are nicely spread along the lattice of the game. We also prove that a necessary condition for convergence of behavior to a mixed strategy Nash equilibrium is that the players consider arbitrarily long histories when forming their predictions. Journal of Economic Literature Classification Numbers: C72, D83.

16.
Focusing on responder behavior, we report panel data findings from both low and high stakes ultimatum bargaining games. Whereas Slonim and Roth (1998) find that offers are rejected fairly equally across rounds in both low and high stakes games, we find that learning does take place, but only when there is sufficient money on the table. The disparate results can be reconciled when one considers the added power that our experimental design provides: detecting subtle temporal differences in responder behavior requires a data generation process that induces a significant number of proportionally low offers.

17.
It is widely accepted that error-learning behaviour in general improves the stability of dynamical economic evolutions. We show here, in the context of Temporary General Equilibrium Theory, that this intuition is not true, in the sense that “most often,” for a large class of models, learning does not assuredly lead to more stable dynamics. The presentation of the problem then allows for a discussion of the type of hypothesis (very high levels of information or very careful behaviours) which can invalidate such a conclusion.

18.
How do people learn? We assess, in a model-free manner, subjects’ belief dynamics in a two-armed bandit learning experiment. A novel feature of our approach is to supplement the choice and reward data with subjects’ eye movements during the experiment to pin down estimates of subjects’ beliefs. Estimates show that subjects are more reluctant to “update down” following unsuccessful choices, than “update up” following successful choices. The profits from following the estimated learning and decision rules are smaller (by about 25% of average earnings by subjects in this experiment) than what would be obtained from a fully-rational Bayesian learning model, but comparable to the profits from alternative non-Bayesian learning models, including reinforcement learning and a simple “win-stay” choice heuristic.
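Two of the benchmarks named above can be sketched on a simple two-armed Bernoulli bandit. The sketch assumes each arm's success probability is independent with a uniform Beta(1, 1) prior and that the Bayesian learner greedily picks the arm with the higher posterior mean; that is a myopic simplification (a fully rational learner would also value exploration) and may differ from the experiment's actual reward structure. The win-stay/lose-shift rule repeats a successful arm and switches after a failure. Arm probabilities and function names are hypothetical.

```python
import random

def beta_posterior_means(successes, failures, a0=1.0, b0=1.0):
    """Posterior mean of each arm's success probability under a Beta(a0, b0) prior."""
    return [(a0 + s) / (a0 + b0 + s + f) for s, f in zip(successes, failures)]

def greedy_bayes_choice(successes, failures):
    """Arm with the highest posterior mean (ties go to arm 0)."""
    means = beta_posterior_means(successes, failures)
    return max(range(len(means)), key=means.__getitem__)

def win_stay_lose_shift(last_choice, last_success):
    """Repeat a successful arm; switch arms (two-armed case) after a failure."""
    return last_choice if last_success else 1 - last_choice

def play(rule, p_arms=(0.3, 0.7), trials=200, seed=2):
    """Total reward of rule 'bayes' or 'wsls' on a hypothetical Bernoulli bandit."""
    rng = random.Random(seed)
    succ, fail = [0, 0], [0, 0]
    choice, success, total = 0, True, 0
    for t in range(trials):
        if rule == "bayes":
            choice = greedy_bayes_choice(succ, fail)
        elif t > 0:
            choice = win_stay_lose_shift(choice, success)
        success = rng.random() < p_arms[choice]
        total += int(success)
        if success:
            succ[choice] += 1
        else:
            fail[choice] += 1
    return total
```

The asymmetric updating the abstract reports (reluctance to "update down") is not modeled here; one way to explore it would be to down-weight the failure counts, which would be an additional assumption.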

19.
Previous experimental studies have documented quick convergence to equilibrium play in market entry games with a large number of agents. The present study examines the effect of the available information in a 12-player game in an attempt to account for these findings. In line with the prediction of a simple reinforcement learning model (Roth and Erev, 1995, Games Econ. Behav. 8, 164–212), quick convergence to equilibrium is observed even given minimal information (unknown payoff rule). However, in violation of the basic model, information concerning other players' payoff increases the number of entrants. The information effect can be described by a variant of the basic reinforcement learning model assuming that the additional information changes the player's reference point. Journal of Economic Literature Classification Number: C7, C92.
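The basic Roth–Erev model keeps a propensity for each action, adds the payoff received (measured relative to a reference point) to the propensity of the action chosen, and chooses actions with probabilities proportional to propensities. The sketch below applies that basic form to a 12-player market entry game. The capacity, payoff function, initial propensities, and the use of the minimum possible payoff as the reference point are hypothetical assumptions for illustration, not the experiment's parameters.

```python
import random

N_PLAYERS = 12
CAPACITY = 6      # hypothetical market capacity
STAY_OUT = 1.0    # hypothetical payoff for staying out

def entry_payoff(entrants):
    """Hypothetical entrant payoff: beats STAY_OUT only when fewer than CAPACITY enter."""
    return 1.0 + (CAPACITY - entrants)

# Reference point: worst possible payoff (everyone enters), so reinforcements are >= 0.
MIN_PAYOFF = entry_payoff(N_PLAYERS)

def simulate_entry(rounds=500, initial=5.0, seed=3):
    """Return the number of entrants in each round."""
    rng = random.Random(seed)
    # props[i] = [propensity to stay out, propensity to enter]
    props = [[initial, initial] for _ in range(N_PLAYERS)]
    history = []
    for _ in range(rounds):
        enter = [rng.random() < p[1] / (p[0] + p[1]) for p in props]
        m = sum(enter)
        history.append(m)
        for i, entered in enumerate(enter):
            payoff = entry_payoff(m) if entered else STAY_OUT
            # Reinforce only the chosen action, relative to the reference point.
            props[i][int(entered)] += payoff - MIN_PAYOFF
    return history
```

Shifting the reference point (for example, to reflect information about other players' payoffs, as the abstract describes) would only change `MIN_PAYOFF` in this sketch; how the paper's variant specifies that shift is not reproduced here.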

20.
A deterministic learning model applied to a game with multiple equilibria produces distinct basins of attraction for those equilibria. In symmetric two-by-two games, basins of attraction are invariant to a wide range of learning rules including best response dynamics, replicator dynamics, and fictitious play. In this paper, we construct a class of three-by-three symmetric games for which the overlap in the basins of attraction under best response learning and replicator dynamics is arbitrarily small. We then derive necessary and sufficient conditions on payoffs for these two learning rules to create basins of attraction with vanishing overlap. The necessary condition requires that with probability one the initial best response is not an equilibrium to the game. The existence of parasitic or misleading actions allows subtle differences in the learning rules to accumulate.
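To make the comparison concrete, the sketch below iterates discretized (Euler-step) versions of the two rules in a symmetric game from the same starting population mix: replicator dynamics grow each strategy's share in proportion to its payoff advantage over the population average, while best-response dynamics move the mix a small step toward the current best reply. The step size, iteration count, function names, and the 2×2 coordination game are hypothetical, and the paper's three-by-three construction is not reproduced.

```python
def payoffs_vs(mix, game):
    """Expected payoff of each pure strategy against the population mix."""
    return [sum(g * m for g, m in zip(row, mix)) for row in game]

def replicator_step(mix, game, dt):
    """Euler step of x_i' = x_i * (f_i - average payoff); preserves sum(mix)."""
    f = payoffs_vs(mix, game)
    avg = sum(x * fi for x, fi in zip(mix, f))
    return [x + dt * x * (fi - avg) for x, fi in zip(mix, f)]

def best_response_step(mix, game, dt):
    """Euler step of x' = BR(x) - x, moving toward the current best reply."""
    f = payoffs_vs(mix, game)
    best = max(range(len(f)), key=f.__getitem__)
    return [(1 - dt) * x + dt * (1.0 if i == best else 0.0) for i, x in enumerate(mix)]

def limit(mix, game, step, dt=0.01, iters=5000):
    """Index of the pure strategy the population (nearly) converges to."""
    for _ in range(iters):
        mix = step(mix, game, dt)
    return max(range(len(mix)), key=mix.__getitem__)

# Hypothetical symmetric coordination game; the mixed equilibrium puts 1/3 on strategy 0.
COORD = [[2, 0],
         [0, 1]]
```

In this two-by-two game both rules send a start of `[0.5, 0.5]` to strategy 0 and a start of `[0.2, 0.8]` to strategy 1, consistent with the invariance the abstract notes; a suitably constructed three-by-three game is where the two calls could disagree from the same start.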


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417