In policy gradient estimation, an actor learns its behaviour using a parameterized policy function that maps features describing the current state to actions. It is useful when the inputs describing the environment state are continuous rather than categorical. The policy is learned without modelling the value function, i.e. without explicitly estimating the expected return from being in each state or from taking each action. In practice, the policy is almost always modelled using a neural network. The method uses gradient ascent to gradually learn the policy weights that maximize the expected return (the cumulative, possibly discounted, reward collected over an episode if the policy is followed). Gradient ascent follows the same principle as the gradient descent used to minimize error in many other areas of machine learning.
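As a rough illustration of the idea, the sketch below implements the REINFORCE estimator, one specific policy-gradient method, under simplifying assumptions: a hypothetical toy two-action environment defined inline, a softmax policy with linear scores rather than a neural network, and arbitrary hyperparameter values. None of these details are part of the catalogued method; they are only meant to show gradient ascent on policy weights driven by the return of each episode.

```python
# Minimal REINFORCE sketch (assumptions: toy environment, softmax-linear
# policy, illustrative hyperparameters). Not a definitive implementation.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_ACTIONS = 2, 2
theta = np.zeros((N_FEATURES, N_ACTIONS))    # policy weights to be learned


def policy(state, theta):
    """Softmax action probabilities for a continuous state vector."""
    logits = state @ theta
    logits -= logits.max()                   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def run_episode(theta, length=10):
    """Roll out one episode in a hypothetical toy environment: the
    rewarding action is 1 when the first state feature is positive,
    otherwise 0."""
    states, actions, rewards = [], [], []
    for _ in range(length):
        state = rng.normal(size=N_FEATURES)  # continuous state features
        action = rng.choice(N_ACTIONS, p=policy(state, theta))
        reward = 1.0 if action == int(state[0] > 0) else 0.0
        states.append(state)
        actions.append(action)
        rewards.append(reward)
    return states, actions, rewards


def reinforce_update(theta, states, actions, rewards, lr=0.05, gamma=0.99):
    """One gradient-ascent step: raise the log-probability of each action
    taken, weighted by the discounted return that followed it."""
    # return-to-go for each step, accumulated backwards
    returns, G = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    grad = np.zeros_like(theta)
    for state, action, G in zip(states, actions, returns):
        probs = policy(state, theta)
        dlog = -np.outer(state, probs)       # grad of log softmax, all actions
        dlog[:, action] += state             # extra term for the taken action
        grad += G * dlog
    return theta + lr * grad                 # ascent, not descent


for episode in range(500):
    s, a, r = run_episode(theta)
    theta = reinforce_update(theta, s, a, r)

print("mean reward per step after training:",
      np.mean(run_episode(theta, length=1000)[2]))
```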
- alias
- subtype
- has functional building block
- FBB_Behavioural modelling
- has input data type
- IDT_Vector of quantitative variables
- has internal model
- INM_Markov decision process
- INM_Neural network
- has output data type
- ODT_Classification
- has learning style
- LST_Reinforcement
- has parametricity
- PRM_Nonparametric with hyperparameter(s)
- has relevance
- REL_Relevant
- uses
- sometimes supports
- mathematically similar to