Lecture 2: Markov Decision Processes, Dynamic Programming

Video

   

Description

Disclaimer: This is the most mathematical lecture in the StarAi series. Whilst we endeavoured to make the StarAi content as accessible as possible, this particular lecture covers the base fundamentals & therefore contains the most formulas. If formulas are not for you, please proceed to week 3. If, however, you would like to dive deeper into the mathematical formulation of the RL framework, stick around for lecture 2! We also highly recommend David Silver’s excellent course on YouTube.

In this lecture you will learn the fundamentals of Reinforcement Learning. We start by discussing the Markov environment and its properties, gradually building an intuition for the Markov Decision Process and its elements, such as the state-value function, the action-value function and policies. We then move on to the Bellman equations and the intuition behind them. Finally, we explore one way of solving the Bellman equations, the Dynamic Programming approach, and finish with an exercise in which you will implement algorithms for computing state-value and action-value functions and find an optimal policy that solves the Gridworld problem.
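For reference, the Bellman expectation and optimality equations at the heart of this lecture can be written as follows. We use the notation of Sutton & Barto, which may differ slightly from the slides:

    v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_\pi(s') \bigr]

    q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \Bigl[ r + \gamma \sum_{a'} \pi(a' \mid s') \, q_\pi(s', a') \Bigr]

    v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \, v_*(s') \bigr]

Here \pi(a \mid s) is the policy, p(s', r \mid s, a) the transition dynamics and \gamma the discount factor. Dynamic Programming turns these fixed-point equations into iterative update rules.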

   

Lecture Slides

StarAi Lecture 2 Markov Decision Processes slides

   

Exercise

Follow the links below to access the exercises for lecture 2; a short illustrative sketch of the algorithms they cover follows the list:

Lecture 2 Exercise 1: Policy Evaluation Exercise

Lecture 2 Exercise 2: Policy Iteration Exercise

Lecture 2 Exercise 3: Value Iteration Exercise
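As a taste of what the exercises involve, here is a minimal, self-contained sketch of iterative policy evaluation and value iteration on a 4x4 Gridworld. The grid layout, rewards and convergence threshold are assumptions modelled on Sutton & Barto's Example 4.1; the actual exercises may use a different setup.

    import numpy as np

    # A 4x4 Gridworld in the style of Sutton & Barto, Example 4.1 (an assumption,
    # not necessarily the exercise's exact environment). States 0 and 15 are
    # terminal, every move costs a reward of -1, and moves that would leave the
    # grid keep the agent in place.
    N = 4
    TERMINAL = {0, N * N - 1}
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    GAMMA = 1.0

    def step(state, action):
        """Deterministic transition: returns (next_state, reward)."""
        if state in TERMINAL:
            return state, 0.0
        row, col = divmod(state, N)
        nr, nc = row + action[0], col + action[1]
        if 0 <= nr < N and 0 <= nc < N:
            return nr * N + nc, -1.0
        return state, -1.0  # bumping into a wall keeps you in place

    def policy_evaluation(theta=1e-6):
        """Iterative policy evaluation for the equiprobable random policy."""
        V = np.zeros(N * N)
        while True:
            delta = 0.0
            for s in range(N * N):
                if s in TERMINAL:
                    continue
                # Bellman expectation backup with pi(a|s) = 1/4 for every action.
                v_new = sum(0.25 * (r + GAMMA * V[s2])
                            for s2, r in (step(s, a) for a in ACTIONS))
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                return V

    def value_iteration(theta=1e-6):
        """Value iteration: apply the Bellman optimality backup to convergence."""
        V = np.zeros(N * N)
        while True:
            delta = 0.0
            for s in range(N * N):
                if s in TERMINAL:
                    continue
                # Bellman optimality backup: take the best action's value.
                v_new = max(r + GAMMA * V[s2]
                            for s2, r in (step(s, a) for a in ACTIONS))
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                return V

    if __name__ == "__main__":
        print(policy_evaluation().reshape(N, N).round(1))
        print(value_iteration().reshape(N, N).round(1))

Note that this sketch updates V in place (a Gauss-Seidel style sweep), which typically converges in fewer sweeps than keeping a separate copy of the old values; both variants are valid Dynamic Programming implementations.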

   

Exercise Solutions

Follow the links below to access the exercise solutions for lecture 2:

Exercise Solutions: Policy Evaluation

Exercise Solutions: Value Iteration

   

Additional Learning Material

  1. Sutton & Barto’s Reinforcement Learning: An Introduction - All of Chapters 3 & 4.