Metabolic cost as an organizing principle for cooperative learning
This paper investigates how a population of neuron-like agents can use metabolic cost to communicate the importance of their actions. Although decision-making by individual agents has been extensively studied, questions regarding how agents should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby agents encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.
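As a rough illustration of the central claim (not the paper's model), the sketch below trains a hypothetical Bernoulli "spiking" agent with REINFORCE in two contexts and charges a fixed metabolic cost each time it fires. The context rewards, the cost value and the function name `train` are assumptions chosen only for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(expected_reward, cost=0.5, lr=0.1, steps=5000, seed=0):
    """REINFORCE for a Bernoulli 'spike' in each context, charged `cost` per spike.

    expected_reward[s] is the (hypothetical) mean reward for firing in context s;
    staying silent always earns 0 and costs nothing.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(expected_reward)  # one firing logit per context
    for _ in range(steps):
        s = rng.randrange(len(expected_reward))
        p = sigmoid(theta[s])
        fired = 1 if rng.random() < p else 0
        reward = (expected_reward[s] + rng.gauss(0.0, 0.1)) if fired else 0.0
        net = reward - cost * fired  # reward minus metabolic cost
        # d/dtheta of log P(fired) for a Bernoulli(sigmoid(theta)) is (fired - p)
        theta[s] += lr * net * (fired - p)
    return [sigmoid(t) for t in theta]  # learned firing probability per context


# Usage: context 0 pays little for firing, context 1 pays a lot.
constrained = train([0.1, 0.9], cost=0.5)
unconstrained = train([0.1, 0.9], cost=0.0)
```

With `cost=0.5`, the agent learns to fire mostly in the context whose expected reward exceeds the cost, so whether it fires signals which context is rewarding. With `cost=0.0` it learns to fire in both contexts, and its output says little about which one pays off.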
Importance Sampling for Model-Based Reinforcement Learning (Modele Dayalı Pekiştirme ile Öğrenme için Önem Örneklemesi)
by Orhan Sönmez
Co-authored with A. Taylan Cemgil
Published in SIU 2012
2012
An Integrated Model of Associative and Reinforcement Learning
Any successful attempt at explaining and replicating the complexity and generality of human and animal learning will require the integration of a variety of learning mechanisms. Here we introduce a computational model which integrates associative learning and reinforcement learning. We contrast the integrated model with associative learning and reinforcement learning models in two simulation studies. The first simulation demonstrates performance advantages for the integrated model in an environment with a dynamic and complex reward structure. The second simulation contrasts the performances of the three models in a classic latent learning experiment (Blodgett, 1929), demonstrating advantages for the integrated model in predicting and explaining the behavioral data.
Decentralised reinforcement learning for energy-efficient scheduling in wireless sensor networks
Mihaylov, M., Le Borgne, Y-A., Tuyls, K. and Nowé, A. (2012) ‘Decentralised reinforcement learning for energy-efficient scheduling in wireless sensor networks’, International Journal of Communication Networks and Distributed Systems, Vol. 9, Nos. 3/4, pp.207–224.
We present a self-organising reinforcement learning (RL) approach for scheduling the wake-up cycles of nodes in a wireless sensor network. The approach is fully decentralised, and allows sensor nodes to schedule their active periods based only on their interactions with neighbouring nodes. Compared to standard scheduling mechanisms such as SMAC, the benefits of the proposed approach are twofold. First, the nodes do not need to synchronise explicitly, since synchronisation is achieved by the successful exchange of data messages in the data collection process. Second, the learning process allows nodes competing for the radio channel to desynchronise in such a way that radio interference, and therefore packet collisions, are significantly reduced. This results in shorter communication schedules, which not only reduce energy consumption by shortening the wake-up cycles of sensor nodes but also decrease data retrieval latency. We implement this RL approach in the OMNeT++ sensor network simulator, and illustrate how sensor nodes arranged in line, mesh and grid topologies autonomously uncover schedules that favour the successful delivery of messages along a routing tree while avoiding interference.
Learning Behaviours for Robot Soccer
by James Brusey
PhD Thesis, Awarded Australasian Distinguished Doctoral Dissertation 2004 (www.core.edu.au)
A central problem in autonomous robotics is how to design programs that determine what the robot should do next. Behaviour-based control is a popular paradigm, but current approaches to behaviour design typically involve hand-coded behaviours. The aim of this work is to explore the use of reinforcement learning to develop autonomous robot behaviours automatically, and specifically to look at the performance of the resulting behaviours.
This thesis examines the question of whether behaviours for a real behaviour-based, autonomous robot can be learnt under simulation using the Monte Carlo Exploring Starts, ε-soft on-policy Monte Carlo or linear, gradient-descent Sarsa algorithms. A further question is whether the increased performance of learnt behaviours carries through to increased performance on the real robot. In addition, this work looks at whether continuing to learn on the real robot causes further improvement in the performance of the behaviour.
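For readers unfamiliar with the second algorithm named above, here is a minimal sketch of first-visit, ε-soft on-policy Monte Carlo control in its textbook (Sutton and Barto) form. The five-state corridor, its reward of −1 per move and all parameter values are hypothetical stand-ins for a robot task, not the simulator used in the thesis.

```python
import random
from collections import defaultdict

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left / move right


def step(state, action):
    """Deterministic corridor dynamics: reward -1 per move until the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, -1.0, nxt == N_STATES - 1


def epsilon_soft_mc(episodes=2000, epsilon=0.1, gamma=1.0, max_len=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)     # action-value estimates Q(s, a)
    counts = defaultdict(int)  # number of first-visit returns averaged so far

    def policy(state):
        # epsilon-soft (here epsilon-greedy): every action keeps prob >= epsilon/|A|
        if rng.random() < epsilon:
            return rng.choice(ACTIONS)
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

    for _ in range(episodes):
        # 1. generate an episode with the current epsilon-soft policy
        state, episode = 0, []
        for _ in range(max_len):
            action = policy(state)
            nxt, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = nxt
            if done:
                break
        # 2. compute the return G_t following every time step
        returns, g = [0.0] * len(episode), 0.0
        for t in reversed(range(len(episode))):
            g = gamma * g + episode[t][2]
            returns[t] = g
        # 3. average the return following the first visit to each (s, a)
        seen = set()
        for t, (s, a, _) in enumerate(episode):
            if (s, a) in seen:
                continue
            seen.add((s, a))
            counts[(s, a)] += 1
            q[(s, a)] += (returns[t] - q[(s, a)]) / counts[(s, a)]
    return q


def greedy(q, state):
    """Action with the highest estimated value in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = epsilon_soft_mc()
learned = [greedy(q, s) for s in range(N_STATES - 1)]  # should all be +1 (move right)
```

Because the policy is improved in place while staying ε-soft, every action keeps being tried, which is what lets this method work without the exploring-starts assumption of the first algorithm.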
A novel method is developed, termed Policy Initialisation, that makes use of the domain knowledge in an existing, hand-coded behaviour by converting the behaviour into either a reinforcement learning policy or an action-value function. This is then used to bootstrap the learning process.
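The abstract does not spell out how Policy Initialisation converts a behaviour, so the following is only one plausible reading, sketched under stated assumptions: the hand-coded behaviour is a function from a discretised state to an action, and it is turned either into an ε-soft policy that favours its choice or into a tabular action-value function whose greedy action reproduces it. The action set, the state discretisation and the names `hand_coded_dribble`, `policy_from_behaviour` and `q_from_behaviour` are all hypothetical.

```python
ACTIONS = ("forward", "turn_left", "turn_right")  # hypothetical action set


def hand_coded_dribble(state):
    """Hypothetical hand-coded behaviour: state is the ball's bearing in degrees."""
    if state < -10:
        return "turn_left"
    if state > 10:
        return "turn_right"
    return "forward"


def policy_from_behaviour(states, actions, behaviour, epsilon=0.1):
    """Epsilon-soft policy that puts most probability on the hand-coded action."""
    n = len(actions)
    return {
        s: {a: (1 - epsilon + epsilon / n) if a == behaviour(s) else epsilon / n
            for a in actions}
        for s in states
    }


def q_from_behaviour(states, actions, behaviour, base=0.0, bonus=1.0):
    """Tabular Q whose greedy action in every state is the hand-coded one."""
    return {
        (s, a): base + (bonus if a == behaviour(s) else 0.0)
        for s in states
        for a in actions
    }


states = range(-30, 31, 10)  # coarse, hypothetical discretisation of the bearing
pi0 = policy_from_behaviour(states, ACTIONS, hand_coded_dribble)
q0 = q_from_behaviour(states, ACTIONS, hand_coded_dribble)
```

Either table could then seed a learner such as the Monte Carlo or Sarsa methods named in this abstract. In this sketch `bonus` sets how strongly the initial values favour the hand-coded action; a large bonus may take many updates to wash out if the hand-coded choice turns out to be poor.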
The Markov Decision Process model is central to reinforcement learning algorithms. This work examines whether it is possible to use an internal world model in the real robot to suit the requirements of the Markov Decision Process model.
The methodology used to answer these questions is to take three realistic, non-trivial robotic tasks, and attempt to learn behaviours for each. The learnt behaviours are then compared with hand-coded behaviours that have either been published or used in international competition. The tasks are based on real task requirements for robots used in a RoboCup Formula 2000 robot soccer team. The first is a generic movement behaviour that moves the robot to a target point. The second requires the robot to dribble the ball in an arc so that the robot maintains possession and so that the final position is lined up with the goal. The third addresses the problem of kicking the ball away from the wall.
The results show that for these three different types of behavioural problem, reinforcement learning on a simulator produced significantly better performance than hand-coded equivalents, not only under simulation but also on the real robot. In contrast to this, continuing the learning process on the real robot did not significantly improve performance.
The Policy Initialisation technique is found to accelerate learning for tabular Monte Carlo methods, but makes minimal improvement and is, in fact, costly to use in conjunction with linear, gradient-descent Sarsa. This approach, unlike some other techniques for accelerating learning, does not appear to bias the solution.
Finally, the evidence from this thesis is that internal world models that maintain the requirements of Markov Decision Processes can be constructed, and this appears to be a sound approach to avoiding problems connected with partial observability that have previously occurred in the use of reinforcement learning in robotic environments.
Evaluation of Machine Learning Methods on a Swinging Humanoid
Not published
We show that, given a specific task, a variety of machine learning algorithms can be applied. The approaches are evaluated in terms of performance in a simulated environment and applicability to a real-world task. We argue that no approach performs optimally in all aspects considered.
The Cult of the Cross in the Order of the Temple
published in As Ordens Militares. Freires, Guerreiros, Cavaleiros. Actas do VI Encontro sobre Ordens Militares, Vol. 1, GEsOS / Município de Palmela (Palmela, 2012), 207–219
Decentralized Learning in Wireless Sensor Networks
"Decentralized Learning in Wireless Sensor Networks," Lecture Notes in Computer Science (Springer Berlin/Heidelberg), vol. 5924, pp. 60-73, 2010.
In this work we present a reinforcement learning algorithm that aims to increase the autonomous lifetime of a Wireless Sensor Network (WSN) and decrease its latency in a decentralized manner. WSNs are collections of sensor nodes that gather environmental data, where the main challenges are the limited power supply of nodes and the need for decentralized control. To overcome these challenges, we make each sensor node adopt an algorithm to optimize the efficiency of a small group of surrounding nodes, so that in the end the performance of the whole system is improved. We compare our approach to conventional ad-hoc networks of different sizes and show that nodes in WSNs are able to develop an energy saving behaviour on their own and significantly reduce network latency, when using our reinforcement learning algorithm.
Self-Organizing Synchronicity and Desynchronicity using Reinforcement Learning
M. Mihaylov, Y.-A. Le Borgne, K. Tuyls, and A. Nowé, "Self-Organizing Synchronicity and Desynchronicity using Reinforcement Learning," in Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011), Rome, Italy, 2011, pp. 94-103.
We present a self-organizing reinforcement learning (RL) approach for coordinating the wake-up cycles of nodes in a wireless sensor network in a decentralized manner. To the best of our knowledge, we are the first to demonstrate how global synchronicity and desynchronicity can emerge through local interactions alone, without the need for a central mediator or any form of explicit coordination. We apply this RL approach to wireless sensor nodes arranged in different topologies and study how agents, starting with a random policy, are able to self-adapt their behavior based only on their interaction with neighboring nodes. Each agent independently learns with which nodes it should synchronize to improve message throughput and, at the same time, with which nodes to desynchronize in order to reduce communication interference. The obtained results show how simple and computationally bounded sensor nodes are able to coordinate their wake-up cycles in a distributed way in order to improve the global system performance through (de)synchronicity.
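A toy sketch of the idea, under heavy simplifying assumptions: two sender/receiver pairs share the radio channel, each of the four nodes is an independent stateless learner that picks one of four wake-up slots per frame, and a node is rewarded only when its partner is awake in the same slot and no node of the other pair is. The reward rule, the node names and `learn_slots` are hypothetical and are not the authors' formulation.

```python
import random


def learn_slots(n_slots=4, rounds=2000, epsilon=0.1, alpha=0.1, seed=0):
    """Independent epsilon-greedy learners choose a wake-up slot each frame."""
    rng = random.Random(seed)
    nodes = ["a1", "a2", "b1", "b2"]
    partner = {"a1": "a2", "a2": "a1", "b1": "b2", "b2": "b1"}
    interferers = {"a1": ("b1", "b2"), "a2": ("b1", "b2"),
                   "b1": ("a1", "a2"), "b2": ("a1", "a2")}
    q = {n: [0.0] * n_slots for n in nodes}  # per-node value of each slot

    def choose(n):
        if rng.random() < epsilon:
            return rng.randrange(n_slots)  # explore a random slot
        best = max(q[n])
        return rng.choice([k for k in range(n_slots) if q[n][k] == best])

    for _ in range(rounds):
        slot = {n: choose(n) for n in nodes}
        for n in nodes:
            synced = slot[partner[n]] == slot[n]  # partner awake with me
            clash = any(slot[m] == slot[n] for m in interferers[n])
            r = 1.0 if synced and not clash else 0.0
            # each node sees only its own reward; there is no central mediator
            q[n][slot[n]] += alpha * (r - q[n][slot[n]])
    # final greedy slot per node
    return {n: max(range(n_slots), key=lambda k: q[n][k]) for n in nodes}


slots = learn_slots(seed=0)
```

In typical runs each pair settles on a common slot (synchronicity) that differs from the other pair's slot (desynchronicity), although independent learners like these are not guaranteed to reach that outcome on every seed.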
Sparse reward processes
Working paper
arXiv:1201.255
EPFL-WORKING-174030
We introduce a class of learning problems where the agent is presented with a series of tasks. Intuitively, if there is a relation among those tasks, then the information gained during execution of one task has value for the execution of another task. Consequently, the agent is intrinsically motivated to explore its environment beyond the degree necessary to solve the current task it has at hand. We develop a decision-theoretic setting that generalises standard reinforcement learning tasks and captures this intuition. More precisely, we consider a multi-stage stochastic game between a learning agent and an opponent. We posit that the setting is a good model for the problem of life-long learning in uncertain environments, where, although resources must be spent learning about currently important tasks, effort must also be allocated to learning about aspects of the world which are not relevant at the moment. This is because unpredictable future events may lead to a change of priorities for the decision maker. Thus, in some sense, the model "explains" the necessity of curiosity. Apart from introducing the general formalism, the paper provides algorithms. These are evaluated experimentally in some exemplary domains. In addition, performance bounds are proven for some cases of this problem.
