Reinforcement learning is one of the three major types of machine learning, next to supervised and unsupervised learning. It allows machines (or agents) to learn actively from the environment through its actions. The training process leverages a continuous reward system to encourage good actions and discourage bad actions. Before the agent can start learning, it is vital to set a collection of rules to limit an agents range of actions. This step is quite important in the preparation phase as your agent will start exploring all his different actions to reach its predefined goal, if the rules are wrongly defined, surprising results can occur.
To demonstrate reinforcement learning, an agent (the moon lander) trained itself... Read more
To demonstrate reinforcement learning, an agent (the moon lander) trained itself to land safely between the yellow flags. As reinforcement learning implies, the moon lander did not receive instructions before, it was not told that landing safely in a certain area would be beneficial. To learn from its actions, the moon lander had to explore its environment and look for actions with a positive reward (crashing = - -, time = -, not crashing = +, landing between flags = +++). After some training, the agent had enough experience to determine what action would have a positive impact on its rewards.
As our agent was situated in a continuous action space - meaning... Read more
As our agent was situated in a continuous action space - meaning every action (going up, left, right) had an immediate impact on the state of its environment, we chose to leverage a ‘Deep Deterministic Policy Gradient’. DDPG is an actor-critic algorithm that is often used in a continuous, high dimensional action space.
Reinforcement learning is the perfect type of machine learning when you need... Read more
Reinforcement learning is the perfect type of machine learning when you need to achieve a certain goal, without exactly knowing how to reach that goal. It is great at discovering new strategies to resolve a given problem. Therefore, it is often associated with playing games – think of AlphaGo that set the Go community on fire after beating the World Champion by discovering a whole new tactic to try to win. Other current use cases are to train robots perform delicate tasks (which could not be hard coded due to its complexity), text mining or trade execution. The strength of reinforcement learning is that it has no bias when working towards a given goal, which encourages the discovery of new, creative solutions.