The Monte-Carlo tree search is a simple method for finding the optimal path through a Markov decision process. The algorithm learns by following random paths through the process and storing the information about whether a path was successful or not on all nodes on that path. With time, each node will amass information about what proportion of the paths that went through it were successful, and the resulting weights can be used as a policy for future journeys. Because of the greedy-search nature of the standard Monte-Carlo algorithm, it is only a feasible option where the problem space is relatively small. However, if it is possible to try out all possible paths through a Markov decision process, the algorithm is guaranteed to find the best solution.
UCT (Upper Confidence Bound 1 applied to trees) is a more feasible variant of the Monte-Carlo tree search that brings exploitation and exploration into the equation so that the policy learned up to now can be taken account of during training in a way that speeds up learning without completely disregarding paths that have not yet been explored.
- alias
- subtype
- UCT Upper Confidence Bound 1 applied to trees
- has functional building block
- FBB_Behavioural modelling
- has input data type
- IDT_Vector of categorical variables IDT_Binary vector
- has internal model
- INM_Markov decision process
- has output data type
- ODT_Classification
- has learning style
- LST_Reinforcement
- has parametricity
- PRM_Parametric
- has relevance
- REL_Relevant
- uses
- sometimes supports
- mathematically similar to