Reinforcement Learning

What is Reinforcement Learning?

Case 1: Imagine student A, who studies hard for the exam and gets good grades as a result.

Case 2: Now imagine student B, who spends their time on YouTube watching PewDiePie try to save Sven and, of course, fails the exam because they did not study at all.

Throughout our lives, we have such interactions, and they are often attached to a positive or a negative outcome, right?

There are many such situations where our behaviour is directly influenced by incentives or punishment. Much of what we do is driven by an incentive to gain something in return for our efforts.

Inspired by this system of incentives, we keep exploring and try to discover which actions lead to better rewards. In this entire process, there’s no one watching us, supervising us or mentoring us; we act on our gut and repeat the action to see whether the results are any better than last time. Taking that feedback into consideration, we improve our steps, ideas, etc., and move closer to our aim.

That’s how humans work, right?

This is what we call Reinforcement Learning (RL). In machine learning terminology, it is usually treated as its own paradigm alongside supervised and unsupervised learning: there are no labelled answers to learn from, only a reward signal.

We can pick thousands of examples of RL from the real world. In a football (soccer) game, for instance, the environment is partially observable, since players in Team A are not fully aware of the Team B players’ strategies and positions, or of the speed of the ball. They act according to their intuition and constantly improve. That’s reinforcement learning, right there!

Key Points in Reinforcement Learning

  • Input: The input is an initial state from which the model starts.
  • Output: There are many possible outputs, as there is a variety of solutions to a particular problem.
  • Training: Training is driven by the input. The model takes an action, ends up in a new state, and is rewarded or punished for it (by the environment, or by a user judging its output).
  • The model keeps learning continuously.
  • The best solution is the one that yields the maximum reward.
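
The key points above can be sketched as a small interaction loop. This is only a minimal illustration under stated assumptions: the `Corridor` environment, its reward values (+1 at the last cell, -0.1 for every other step) and the `run_episode` helper are all hypothetical and are not taken from any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, reward +1 for reaching cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0  # the initial state the agent starts from
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small penalty for every step that is not the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let a policy act until the goal is reached; return the total reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent picks an action
        state, reward, done = env.step(action)   # the environment answers with feedback
        total += reward
        if done:
            break
    return total


rng = random.Random(0)
random_return = run_episode(Corridor(), lambda s: rng.choice([-1, 1]))
always_right_return = run_episode(Corridor(), lambda s: 1)
print(random_return, always_right_return)
```

A policy that always moves right collects more reward than one that wanders, which is the sense in which the best solution is the one with the maximum reward.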

What are they used for?

  • Models are used for planning: a model of the environment helps predict future states and possible rewards, which depend on the path taken. Please note that, with a model, the path taken (or the choices that will be made) can be decided beforehand without experiencing the environment.
  • In model-based RL, the agent does not possess an a priori model of the environment but estimates one while it is learning. During learning, it tries to construct this model from as few interactions with the environment as possible. After inducing a reasonable model of the environment, the agent applies dynamic-programming-like algorithms to compute the policy. In contrast, in model-free RL the agent does not build a model at all; it learns by trial and error. It learns directly from experience: it performs an action, collects the reward (positive or negative), then updates its value functions.
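
To make the model-free case concrete, here is a minimal sketch of tabular Q-learning, one common model-free algorithm (the text above does not name a specific one). The corridor dynamics in `step`, the reward values and all hyperparameters are assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Hypothetical corridor dynamics: +1 at the goal cell, -0.1 otherwise."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.1), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random action
            else:
                a = max((0, 1), key=lambda i: q[state][i])  # exploit current estimates
            nxt, reward, done = step(state, ACTIONS[a])
            # The agent only sees the outcome of its own action; it never
            # stores or queries a model of the environment.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [ACTIONS[max((0, 1), key=lambda i: q_table[s][i])] for s in range(GOAL)]
print(greedy_policy)
```

The learned table is the value function mentioned above; acting greedily with respect to it gives the policy.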

Let’s look at an example: two players, me (Smriti) and you (You), are playing a game of chess.

Smriti makes the first move, then You make a move, then Smriti moves again, and after a while Smriti discovers that a few of her moves were not good enough. She loses the game, but she learns from her mistakes. However, if Smriti had observed a few of her opponent’s moves, built a model from them, simulated her own moves in her head and played accordingly, the outcome might have been different!
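
Simulating moves in your head before playing them is the planning idea from model-based RL. Chess is far too large for a short example, so the sketch below plans on a tiny corridor instead, using value iteration (a dynamic programming method). The `model` function stands in for a model the agent has already learned; it, the reward values and the discount factor are assumptions for illustration.

```python
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # move left or move right


def model(state, action):
    """Assumed (already learned) model: predicts next state, reward and termination."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.1), done


def lookahead(state, action, values, gamma):
    """Value of taking `action` in `state`, computed purely by simulation."""
    nxt, reward, done = model(state, action)
    return reward if done else reward + gamma * values[nxt]


def value_iteration(gamma=0.9, tol=1e-9):
    values = [0.0] * N_STATES  # the goal is terminal, so its value stays 0
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(lookahead(s, a, values, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration()
plan = [max(ACTIONS, key=lambda a: lookahead(s, a, values, 0.9)) for s in range(GOAL)]
print(values, plan)
```

No move is ever actually played here: the whole plan is worked out from the model, just as Smriti could have worked out her moves before touching a piece.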

Let’s talk about the rest of the technicalities in the next blog!
