IE 8571 - Advanced Reinforcement Learning and Dynamic Programming (4 Cr.)

Industrial and Systems Engineering (11138) TIOT - College of Science and Engineering

Course description

This course covers methods for solving sequential decision-making problems. We will introduce the modeling framework of Markov Decision Processes (MDPs) and the classic solution approach of dynamic programming, including the traditional algorithms of value iteration and policy iteration. We will then move on to model-free methods for finding optimal policies for MDPs, such as Monte Carlo and Temporal Difference methods. We will discuss the extension of these methods to problems with large state spaces, where it is necessary to introduce parametric approximations such as deep neural networks. Examples will be drawn from navigation, medicine, game play, and other domains. We will discuss convergence proofs for several of the algorithms in the so-called 'tabular setting', e.g., policy iteration, value iteration, Q-learning, and Sarsa.

Prerequisites: Knowledge of probability, optimization, and linear algebra at the undergraduate level. Knowledge of Markov chains at the level of IE 8532 or equivalent. Ability to read and write mathematical proofs.

Minimum credits

4

Maximum credits

4

Is this course repeatable?

No

Grading basis

OPT - Student Option

Lecture

Requirements

000017

Credit will not be granted if credit has been received for:

03077

Fulfills the writing intensive requirement?

No

Typically offered term(s)

Fall Odd Year