Machine Learning Catalogue

A Markov decision process is the model that underlies most reinforcement learning algorithms. It is a directed graph whose nodes are system states; at each node, one or more actions are possible. Actions are the means by which an actor within the system moves from state to state, that is, through the graph.
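
As a minimal sketch of this graph view, a small deterministic MDP can be written as a nested mapping from states to actions to successor states. The state names, action names, and the helpers `available_actions`, `step`, and `run` are hypothetical and chosen only for illustration.

```python
# Hypothetical three-state MDP: each state maps its available actions
# to the state that action leads to (deterministic case).
TRANSITIONS = {
    "start": {"left": "pit", "right": "goal"},
    "pit": {"climb": "start"},
    "goal": {"stay": "goal"},
}


def available_actions(state):
    """Return the actions that can be taken in the given state, sorted by name."""
    return sorted(TRANSITIONS[state])


def step(state, action):
    """Follow one edge of the graph: apply an action and return the next state."""
    return TRANSITIONS[state][action]


def run(state, actions):
    """Apply a sequence of actions and return the list of states visited."""
    path = [state]
    for action in actions:
        state = step(state, action)
        path.append(state)
    return path
```

Calling `run("start", ["left", "climb", "right"])` walks the graph from `start` to `pit`, back to `start`, and then to `goal`.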

The behaviour of an MDP can be partially random (stochastic): when a given action is performed at a given node, the system moves to each of the connected nodes with a certain probability.
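
The stochastic case can be sketched by attaching a probability to each successor node. The example below is an illustration under assumed names: the states `dry` and `wet`, the actions `walk` and `wait`, the probabilities, and the helper functions are all hypothetical.

```python
import random

# Hypothetical stochastic MDP: P[state][action] maps each successor
# state to the probability of reaching it.
P = {
    "dry": {
        "walk": {"dry": 0.8, "wet": 0.2},
        "wait": {"dry": 1.0},
    },
    "wet": {
        "walk": {"dry": 0.5, "wet": 0.5},
    },
}


def check_distributions(p, tol=1e-9):
    """Verify that every (state, action) pair has probabilities summing to 1."""
    return all(
        abs(sum(successors.values()) - 1.0) <= tol
        for actions in p.values()
        for successors in actions.values()
    )


def sample_next_state(state, action, rng=random):
    """Draw the next state according to the transition probabilities."""
    successors = P[state][action]
    states = list(successors)
    weights = list(successors.values())
    return rng.choices(states, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampled transitions reproducible; an action with a single successor of probability 1.0, such as `wait` here, recovers the deterministic case.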

alias: MDP

used by: ALG_Actor-critic ALG_Deep Q-network ALG_Monte-Carlo tree search ALG_Neural actor-critic ALG_Policy gradient estimation ALG_Q-learning ALG_SARSA ALG_Temporal difference learning