AI with Python – Reinforcement Learning

David Bane

2 years ago

Table of Contents
  • Basics of Reinforcement Learning
  • Building Blocks: Environment and Agent
      1. Agent
      2. Agent Terminology
      3. Environment
      4. Properties of Environment
  • Constructing an Environment with Python
  • Constructing a learning agent with Python

This blog post introduces the core concepts of reinforcement learning in AI and shows how to work with them in Python.

Basics of Reinforcement Learning

Reinforcement learning strengthens (reinforces) a network based on critic information. A network trained this way receives feedback from its environment, but the feedback is evaluative rather than instructive: it indicates how good an output was, not what the correct output should have been, as a label does in supervised learning. Based on this feedback, the network adjusts its weights so that it obtains better critic information in the future.
The process resembles supervised learning, except that the learner has far less information to work with. The following figure gives the block diagram of reinforcement learning −
Figure. Reinforcement Learning | insideAIML
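
To make the idea of evaluative feedback concrete, here is a minimal sketch that is not taken from the original post. It assumes a hypothetical two-action problem in which the environment only returns a reward for the chosen action and never says which action was correct. The function name train_bandit, the reward probabilities, and the step size are illustrative assumptions.

```python
import random

def train_bandit(reward_probs, steps=2000, epsilon=0.05, lr=0.05, seed=0):
    """Learn one value estimate per action from reward alone."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)   # the agent's adjustable "weights"
    for _ in range(steps):
        # Explore occasionally, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(values))
        else:
            action = max(range(len(values)), key=lambda a: values[a])
        # Evaluative feedback: the environment scores the chosen action but
        # never reveals which action would have been the correct one.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Move the estimate for that action a small step toward the reward.
        values[action] += lr * (reward - values[action])
    return values

# Hypothetical environment: action 0 pays off 10% of the time, action 1 90%.
values = train_bandit([0.1, 0.9])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimates:", values, "preferred action:", best_action)
```

The estimate for the better-paying action ends up higher, even though the agent was never given a label telling it which action was right.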

Building Blocks: Environment and Agent

The environment and the agent are the main building blocks of reinforcement learning in AI. This section discusses them in detail −

1. Agent

An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.
  • A human agent has sensory organs such as eyes, ears, nose, tongue, and skin as sensors, and other organs such as hands, legs, and mouth as effectors.
  • A robotic agent has cameras and infrared range finders as sensors, and various motors and actuators as effectors.
  • A software agent has encoded bit strings as its percepts and actions.

2. Agent Terminology

The following terms are frequently used in reinforcement learning in AI −
  • Performance Measure of Agent − The criterion that determines how successful an agent is.
  • Behavior of Agent − The action that the agent performs after any given sequence of percepts.
  • Percept − The agent’s perceptual input at a given instant.
  • Percept Sequence − The history of everything the agent has perceived to date.
  • Agent Function − A map from the percept sequence to an action.
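
The terms above can be sketched in a few lines of Python. This is an illustrative example only: it assumes the classic two-square vacuum world (squares "A" and "B"), and the names reflex_vacuum_agent and run_episode are hypothetical, not from the original post.

```python
from typing import List, Tuple

Percept = Tuple[str, str]   # (location, status), e.g. ("A", "Dirty")

def reflex_vacuum_agent(percept_sequence: List[Percept]) -> str:
    """Agent function: maps the percept sequence so far to an action."""
    location, status = percept_sequence[-1]   # this agent uses only the latest percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def run_episode(agent_function, dirt, location="A", steps=4):
    """Run the agent and return (percept sequence, behavior, performance)."""
    dirt = dict(dirt)            # copy so the caller's dict is left untouched
    percept_sequence = []        # history of everything perceived so far
    actions = []                 # the agent's behavior
    score = 0                    # performance measure: squares cleaned
    for _ in range(steps):
        percept = (location, "Dirty" if dirt[location] else "Clean")
        percept_sequence.append(percept)
        action = agent_function(percept_sequence)
        actions.append(action)
        if action == "Suck":
            score += 1 if dirt[location] else 0
            dirt[location] = False
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
    return percept_sequence, actions, score

percepts, actions, score = run_episode(reflex_vacuum_agent, {"A": True, "B": True})
print(percepts)
print(actions, score)
```

Here the percept is a single (location, status) pair, the percept sequence is the growing list of such pairs, and the performance measure is simply the number of squares cleaned.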

3. Environment

Some programs operate in an entirely artificial environment confined to keyboard input, databases, computer file systems, and character output on a screen.
In contrast, some software agents, such as software robots (softbots), exist in rich, unlimited softbot domains. The simulator has a very detailed and complex environment, and the software agent needs to choose from a long array of actions in real time.
For example, a softbot designed to scan a customer’s online preferences and display interesting items to that customer works in a real as well as an artificial environment.

4. Properties of Environment

The environment has several properties, as discussed below −
  • Discrete/Continuous − If the environment has a limited number of distinct, clearly defined states, it is discrete; otherwise it is continuous. For example, chess is a discrete environment and driving is a continuous environment.
  • Observable/Partially Observable − If it is possible to determine the complete state of the environment at each time point from the percepts, it is observable; otherwise it is only partially observable.
  • Static/Dynamic − If the environment does not change while an agent is acting, then it is static; otherwise it is dynamic.
  • Single-agent/Multiple agents − The environment may contain other agents, which may be of the same kind as the agent or of a different kind.
  • Accessible/Inaccessible − If the agent’s sensory apparatus can have access to the complete state of the environment, then the environment is accessible to that agent; otherwise it is inaccessible.
  • Deterministic/Non-deterministic − If the next state of the environment is completely determined by the current state and the actions of the agent, then the environment is deterministic; otherwise it is non-deterministic.
  • Episodic/Non-episodic − In an episodic environment, each episode consists of the agent perceiving and then acting. The quality of its action depends only on the episode itself; subsequent episodes do not depend on the actions taken in previous episodes. Episodic environments are much simpler because the agent does not need to think ahead.
Figure. Properties of Environment | insideAIML
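
As a small illustration of the deterministic/non-deterministic distinction, consider a position on a number line that an action moves by +1 or -1. The sketch below is an assumption made for illustration (the function names and the slip probability are hypothetical): in the deterministic version the next state follows from the state and action alone, while in the non-deterministic version the action is sometimes reversed at random.

```python
import random

def deterministic_step(state, action):
    """Next state is fully determined by the current state and the action."""
    return state + action

def noisy_step(state, action, rng, slip_prob=0.2):
    """With probability slip_prob the action is reversed before it is applied."""
    if rng.random() < slip_prob:
        action = -action
    return state + action

rng = random.Random(0)
# Apply the same action from the same state many times and collect outcomes.
deterministic_outcomes = {deterministic_step(0, 1) for _ in range(1000)}
noisy_outcomes = {noisy_step(0, 1, rng) for _ in range(1000)}
print(deterministic_outcomes)   # a single possible next state
print(noisy_outcomes)           # more than one possible next state
```

Repeating the same action from the same state yields one outcome in the first case and several in the second, which is why an agent in a non-deterministic environment cannot plan on a single predicted next state.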

Constructing an Environment with Python

To build a reinforcement learning agent, we will use the OpenAI Gym package, which can be installed with pip (pip install gym).
OpenAI Gym provides various environments that can be used for different purposes, among them CartPole-v0, Hopper-v1, and MsPacman-v0. They require different engines. Detailed documentation is available on the OpenAI Gym website.
The following code shows an example of Python code for the CartPole-v0 environment −
Figure. Constructing an Environment with Python
You can construct other environments in a similar way.
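
To show what an environment provides to an agent, here is a minimal hand-written sketch that mirrors the classic Gym-style interface: reset() returns an observation, and step() returns observation, reward, done, and info. CorridorEnv is a hypothetical toy class written for this illustration. It is not part of Gym and does not need Gym to run.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to the goal."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.actions = (0, 1)      # 0 = move left, 1 = move right
        self.position = 0
        self.steps_taken = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps_taken = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps_taken += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps_taken >= self.max_steps
        return self.position, reward, done, {}

env = CorridorEnv()
observation = env.reset()
rng = random.Random(0)
done = False
while not done:
    action = rng.choice(env.actions)     # random agent, no learning yet
    observation, reward, done, info = env.step(action)
print("Episode ended at position", observation, "after", env.steps_taken, "steps")
```

The observation here is just the agent's position. A real Gym environment such as CartPole returns a richer observation, but the interaction loop has the same shape.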

Constructing a learning agent with Python

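To build a reinforcement learning agent, we again use the OpenAI Gym package, as shown −

import gym

# Assumes the classic Gym API (gym < 0.26): reset() returns the observation
# and step() returns (observation, reward, done, info).
env = gym.make('CartPole-v0')
for _ in range(20):                          # run 20 episodes
    observation = env.reset()                # start a new episode
    for i in range(100):                     # at most 100 timesteps each
        action = env.action_space.sample()   # pick a random action
        observation, reward, done, info = env.step(action)
        if done:                             # pole tipped too far or cart left the track
            print("Episode finished after {} timesteps".format(i+1))
            break                            # stop stepping a finished episode
env.close()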
Figure. Constructing a learning agent with Python | insideAIML
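Note that the agent above only samples random actions, so it does not yet learn to balance the pole: each episode usually ends well before 100 timesteps, when the pole tips too far or the cart leaves the track.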
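
The loop above chooses actions at random, so nothing is learned from the rewards. As one possible illustration of what learning from reward looks like, here is a minimal tabular Q-learning sketch. It is not the original author's method, and it does not solve CartPole, whose continuous observations would need discretization or function approximation. It assumes a hypothetical five-cell corridor in which reaching the rightmost cell gives a reward of 1. All names and hyperparameters are illustrative.

```python
import random

N_STATES, GOAL = 5, 4        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)           # move left, move right

def step(state, action):
    """Hypothetical corridor dynamics: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = q_learning()
# Greedy action in each non-terminal cell after training.
greedy_policy = [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q_table[:GOAL]]
print(greedy_policy)
```

After training, the greedy policy moves right in every cell, which is the behavior the reward signal favors. The same update rule could be attached to a Gym environment once its observations are mapped to a finite set of states.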