Monte-Carlo tree search


The Monte-Carlo tree search is a simple method for finding the optimal path through a Markov decision process. The algorithm learns by following random paths through the process and storing the information about whether a path was successful or not on all nodes on that path. With time, each node will amass information about what proportion of the paths that went through it were successful, and the resulting weights can be used as a policy for future journeys. Because of the greedy-search nature of the standard Monte-Carlo algorithm, it is only a feasible option where the problem space is relatively small. However, if it is possible to try out all possible paths through a Markov decision process, the algorithm is guaranteed to find the best solution.

UCT (Upper Confidence Bound 1 applied to trees) is a more feasible variant of the Monte-Carlo tree search that brings exploitation and exploration into the equation so that the policy learned up to now can be taken account of during training in a way that speeds up learning without completely disregarding paths that have not yet been explored.

UCT Upper Confidence Bound 1 applied to trees
has functional building block
FBB_Behavioural modelling
has input data type
IDT_Vector of categorical variables IDT_Binary vector
has internal model
INM_Markov decision process
has output data type
has learning style
has parametricity
has relevance
sometimes supports
mathematically similar to