A Markov decision process (MDP) is a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. It is the fundamental model at the heart of most reinforcement learning. An MDP can be viewed as a directed graph whose nodes are system states; at each node, one or more actions are available, and actions are the means by which an actor transitions from state to state (moves through the graph). Hence, an MDP is an extension of a Markov chain that adds the concept of decision-making.
The behaviour of an MDP may be partially random (stochastic): when a given action is performed in a given state, the system transitions to each of the connected states with a certain probability.
An MDP is defined by the following components:
- A set of states (S) representing all possible situations.
- A set of actions (A) available to the decision-maker.
- A transition function (T) that defines the probability of moving from one state to another given a specific action.
- A reward function (R) that assigns a numerical value to each state-action pair, representing the immediate reward received after transitioning from one state to another.
- A discount factor (γ) that represents the importance of future rewards compared to immediate rewards.
For example, consider a robot navigating a grid. The states represent the robot's position on the grid, the actions represent the possible movements (up, down, left, right), the transition function defines the probability of moving to a new position given a movement action, and the reward function assigns a value to each movement based on the robot's goal (e.g., reaching a target location).
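The grid-robot example can be sketched as a sampling function. The grid size, slip probability, and reward values below are illustrative assumptions chosen to show a stochastic transition function.

```python
import random

WIDTH, HEIGHT = 4, 4          # assumed 4x4 grid
GOAL = (3, 3)                 # assumed target location
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
SLIP = 0.1                    # with probability 0.1 the robot stays put

def step(state, action):
    """Sample the next state and reward for one movement action."""
    x, y = state
    if random.random() < SLIP:
        next_state = state    # stochastic outcome: the move failed
    else:
        dx, dy = MOVES[action]
        # Clamp to the grid so the robot cannot move off the board.
        next_state = (min(max(x + dx, 0), WIDTH - 1),
                      min(max(y + dy, 0), HEIGHT - 1))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward
```

For instance, calling `step((3, 2), "down")` reaches the goal and returns reward 1.0 with probability 0.9, and stays at (3, 2) with reward 0.0 otherwise.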
MDPs are used to find optimal policies, which are strategies that define the best action to take in each state to maximize the expected cumulative reward over time. Solving an MDP involves finding the policy that maximizes the expected reward, often using dynamic programming techniques such as value iteration or policy iteration.
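Value iteration can be sketched in a few lines. The tiny two-state MDP below is a made-up example; the loop repeatedly applies the Bellman optimality backup until the values stop changing, then extracts the greedy policy.

```python
# Hypothetical two-state MDP: T maps (state, action) to (next_state, prob)
# pairs, R gives immediate rewards, gamma discounts future rewards.
T = {("s0", "a"): [("s0", 0.5), ("s1", 0.5)], ("s0", "b"): [("s1", 1.0)],
     ("s1", "a"): [("s0", 1.0)], ("s1", "b"): [("s1", 1.0)]}
R = {("s0", "a"): 0.0, ("s0", "b"): 1.0, ("s1", "a"): 2.0, ("s1", "b"): 0.0}
states, actions, gamma = ["s0", "s1"], ["a", "b"], 0.9

def q(s, a, V):
    """Expected return of taking action a in state s, then following V."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)])

V = {s: 0.0 for s in states}
while True:
    # Bellman backup: V(s) = max_a [ R(s,a) + gamma * sum_s' T(s'|s,a) V(s') ]
    newV = {s: max(q(s, a, V) for a in actions) for s in states}
    if max(abs(newV[s] - V[s]) for s in states) < 1e-8:
        V = newV
        break
    V = newV

# Greedy policy extraction: pick the action with the highest backed-up value.
policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
```

Here the optimal policy alternates between the two states (b in s0, a in s1), collecting the rewards 1 and 2 in a cycle; the converged values satisfy V(s1) = 2 + 0.9 V(s0) and V(s0) = 1 + 0.9 V(s1).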
Markov decision processes are an essential concept in reinforcement learning and stochastic processes, providing a framework for modeling and solving decision-making problems under uncertainty.
Related terms: Reinforcement Learning, Stochastic Processes