RL-1.0Y: Fundamentals of Reinforcement Learning

Course Details

This is an introductory course on Reinforcement Learning. It covers the basics of reinforcement learning, with assignments based on real-world problems. The programming language used in the course is Python/C++.


Prerequisites

  • Basic Python / C++ Programming

  • Concepts of Probability, Linear Algebra and Statistics

  • Machine Learning: this prerequisite can be covered by taking the AI-1.0Z (Machine Learning) course at Deep Eigen, which is freely available on the platform

Course Highlights

  • Core concepts of Reinforcement Learning

  • Mathematical modelling of real-world problems and RL-based solutions

  • Deep Reinforcement Learning Algorithms

Payment Modes:

We have two options:

  • Pay online using the payment gateway

  • Pay via bank transfer. For payments made by bank transfer, no payment-gateway fee is deducted from a refund.

  • Supervised, Unsupervised and Reinforcement Learning
  • Reinforcement Learning Framework
  • Examples

  • Multi Armed Bandits
  • Returns, Optimal Value, Optimal Actions
  • Action-Value Methods
  • Greedy Method, Epsilon-Greedy Method, Comparison
  • Upper Confidence Bound for Action Selection
  • Gradient Bandits
  • Examples

  • Introduction to Markov Decision Processes
  • Value Function and Bellman Equations
  • Episodic and Continuing Tasks
  • Policy and Value Functions
  • Optimal Policy and Value Function

  • Policy Evaluation, Improvement, Iteration
  • Value Iteration
  • Asynchronous Dynamic Programming
  • Generalized Policy Iteration
  • Prediction and Control in Reinforcement Learning

  • Introduction: Monte Carlo Learning
  • Monte Carlo Estimation of Action Values
  • Monte Carlo Control: GPI
  • First-Visit and Every-Visit Methods
  • Monte Carlo Control without Exploring Starts
  • Off-Policy Prediction via Importance Sampling
  • Per-Decision and Discounting-Aware Importance Sampling

  • Introduction: Temporal Difference Learning
  • Temporal Difference Prediction
  • TD(0) Optimality and Advantages
  • N-Step Temporal Difference Prediction
  • SARSA: On-Policy TD Control
  • Expected SARSA and N-Step SARSA
  • Q-Learning: Off-Policy Control
  • Maximization Bias and Double Learning
  • N-Step Off-Policy Learning
  • Off-Policy Learning Without Importance Sampling: N-Step Tree Backup

  • Introduction: Models and Planning
  • Dyna Algorithm
  • Dyna-Q Method, Disadvantage
  • Prioritized Sweeping
  • Expected vs Sample Updates
  • Rollout Algorithms and MCTS: Brief Introduction

  • Feature Construction for Function Approximation
  • Value Function Approximation
  • Linear Methods
  • Non-Linear Methods: Deep Neural Networks
  • On-Policy Methods with Function Approximation
  • Off-Policy Methods with Approximation

  • λ-return
  • Online λ-returns and N-Step Truncated (λ)-returns
  • TD(λ), SARSA(λ)
  • Variable λ and γ (Discounting Factor) Returns
  • Off-Policy Traces with Control Variates and Stability
  • Deadly Triad: Function Approximation, Bootstrapping and Off-Policy Training

  • Policy Approximation and Advantages
  • Policy Gradient Theorem
  • REINFORCE Algorithm: Monte Carlo Method
  • REINFORCE Algorithm: Baseline Method
  • Actor Critic Method
  • Policy Gradient for Continuing Problems
  • Policy Parameterization for Continuous Actions

  • Value-Based Methods: DQN, DDQN, Dueling DDQN

  • Tentative topic, at the discretion of the instructor: Actor Critic Methods


8 months from the date of registration or from the course start date, whichever is later.

Beyond this period, the registrant will have to pay 10% of the course fee for a 2-month extension, to cover charges related to servers, maintenance, and assignment evaluation.

Refunds are possible only within 7 days of registration. If any part of the course has already gone online, no refund can be issued for that part. We recommend checking the free lectures to get an idea of the depth and type of lectures in the course before registering.

If you request a refund, we will subtract the payment-gateway fee from your payment. An additional 1% of the amount will be deducted for convenience, processing, and server charges, and the GST amount will also be subtracted. The fee corresponding to any part of the course that is already online will also be deducted. The remaining amount will be refunded to you.

The example below shows the refund process for a full refund, i.e., when no part of the course has gone online yet.

Payment gateway fee when you make a payment (course fee + tax):

  • National 2.36%: 2% + 18% GST on 2%
  • International 3.54%: 3% + 18% GST on 3%

Amount received by us, denoted by M, after you make a payment of amount X, where y is the payment-gateway fee share in percent (2.36 or 3.54):

  • X = x + x*18/100
  • M = X - X*y/100
  • Here "x" is the course fee without GST, and "X" is the course fee with GST (if we are collecting GST).
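The fee and payment formulas above can be sketched in Python roughly as follows. This is only an illustration: the function names `gateway_fee_share` and `amount_received` and the sample fee of 1000 are hypothetical, and setting X equal to x when GST is not collected is an assumption based on the note above.

```python
GST_PERCENT = 18.0  # GST rate quoted in the text


def gateway_fee_share(base_percent):
    """Gateway fee share y in percent: base fee plus 18% GST on that fee."""
    return base_percent + base_percent * GST_PERCENT / 100


def amount_received(x, y, gst_collected=True):
    """Return (X, M) for a course fee x (without GST) and fee share y in percent."""
    # Assumption: when GST is not collected, X is simply x.
    X = x + x * GST_PERCENT / 100 if gst_collected else x
    M = X - X * y / 100
    return X, M


y_national = gateway_fee_share(2.0)          # 2% + 18% GST on 2%
X, M = amount_received(1000.0, y_national)   # hypothetical course fee of 1000
print(f"y = {y_national:.2f}%, X = {X:.2f}, M = {M:.2f}")
```

Amounts are kept as plain floats for brevity; real billing code would more likely use exact decimal arithmetic.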

Refund Process

  • Amount received by us, denoted by M: M = X - X*y/100
  • Refund initiated by us, denoted by N: N = M - M*1.18/100
  • Refund amount uploaded to the payment gateway, to be refunded to you after GST subtraction: N - x*0.18
  • Amount refunded to you by the payment gateway: N - N*y/100

Refund Process, if payment was made via direct bank transfer:

  • Amount received by us: X = x + x*18/100 (if GST is collected)
  • Refund initiated by us: N = X - X*1.18/100 - x*18/100
  • Amount refunded to you: N
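For a bank-transfer payment, the refund formula above can be sketched in Python as below. The function name `bank_transfer_refund` and the sample fee of 1000 are hypothetical, and the sketch assumes GST was collected on the payment.

```python
def bank_transfer_refund(x):
    """Return the refund N for a course fee x (without GST), paid by bank transfer."""
    X = x + x * 18 / 100                      # amount received, including 18% GST
    N = X - X * 1.18 / 100 - x * 18 / 100     # deduct 1.18% charges and the GST amount
    return N


# Hypothetical course fee of 1000.
refund = bank_transfer_refund(1000.0)
print(f"Refund: {refund:.2f}")
```

No payment-gateway fee appears in this calculation because the payment never passed through the gateway.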


  • Instructor’s Name

    Sanjeev Sharma
  • Course Type

  • Fee: India

    ₹ 90000
  • Fee: Foreign

    ₹ 99999
  • Current Status:

    Starts upon Registration
  • Expected Course Engagement

    10-15 Hrs/Week