A Markov decision process (MDP) is the fundamental model at the heart of most reinforcement learning. It can be represented as a directed graph whose nodes are system states; at each node, one or more actions are possible. Actions are the means by which an actor within the system moves from state to state, that is, along the edges of the graph.
The behaviour of an MDP can be partially random (stochastic): when a given action is performed at a given node, the system moves to each of the connected nodes with a certain probability.
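As a rough illustration of the description above, the following minimal sketch encodes a tiny MDP as a nested dictionary and samples one stochastic transition. The state names, action names, probabilities, and the `transitions` and `step` identifiers are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical two-state MDP (illustrative values only).
# transitions[state][action] is a list of (next_state, probability) pairs,
# i.e. the outgoing edges of the graph for that state and action.
transitions = {
    "idle": {
        "start": [("running", 0.9), ("idle", 0.1)],
        "wait": [("idle", 1.0)],
    },
    "running": {
        "continue": [("running", 0.8), ("idle", 0.2)],
        "stop": [("idle", 1.0)],
    },
}


def step(state, action, rng=random):
    """Sample the next state reached by performing `action` in `state`."""
    outcomes = transitions[state][action]
    next_states = [s for s, _ in outcomes]
    probabilities = [p for _, p in outcomes]
    # random.choices draws one element according to the given weights.
    return rng.choices(next_states, weights=probabilities, k=1)[0]


if __name__ == "__main__":
    state = "idle"
    for action in ["start", "continue", "stop"]:
        if action not in transitions[state]:
            break  # the chosen action is not available in the current state
        state = step(state, action)
        print(action, "->", state)
```

An action whose outcome list holds a single pair with probability 1.0 behaves deterministically, which is why the randomness is described as optional.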
- alias
- MDP