Bayesian network


A Bayesian network or belief network is a graph whose nodes represent random variables, each taking two or more values with certain probabilities, and whose links model dependency relationships between pairs of variables. A Bayesian network can be used for classification (the input nodes are set to match a specific data item and the classification probabilities are read off the output nodes) and, more generally, to investigate the interrelationships between the variables within a system. A probability table capturing all possible combinations of all variables would be a special-case Bayesian network in which every pair of nodes is joined by a link. Because such a table grows exponentially with the number of variables, the practical aim is generally to minimize the number of links by linking only those nodes between which a genuine dependency relationship exists.
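The saving from leaving out links can be made concrete by counting free parameters. The sketch below is only an illustration: the function names, the ten binary variables and the chain topology are assumptions chosen for the example, not part of any particular library.

```python
from math import prod


def joint_table_parameters(cards):
    """Free parameters of one table over every combination of values."""
    return prod(cards.values()) - 1


def network_parameters(cards, parents):
    """Free parameters summed over the per-node conditional tables.

    Each node needs (number of values - 1) probabilities for every
    combination of its parents' values.
    """
    return sum(
        (cards[node] - 1) * prod(cards[p] for p in parents.get(node, ()))
        for node in cards
    )


# Hypothetical system: ten binary variables X0 .. X9.
cards = {f"X{i}": 2 for i in range(10)}

# Sparse topology: a chain X0 -> X1 -> ... -> X9.
chain = {f"X{i}": [f"X{i - 1}"] for i in range(1, 10)}

# Dense topology: every pair of nodes is linked (Xi has parents X0 .. Xi-1).
fully_linked = {f"X{i}": [f"X{j}" for j in range(i)] for i in range(10)}

print(joint_table_parameters(cards))            # 1023
print(network_parameters(cards, fully_linked))  # 1023
print(network_parameters(cards, chain))         # 19
```

Under these assumptions the fully linked network needs exactly as many parameters as the full joint table, while the chain needs only 19.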

A standard Bayesian network is a directed acyclic graph, but the direction of a link need not reflect causality: it is chosen based on how the nodes are to be used (input / output) and on which topology allows the most information to be modelled with the fewest links. A causal network is a subtype in which link directions express causality.

A Bayesian network can be created entirely by a subject-matter expert and used to classify input data without any machine learning. This approach is used, for example, in medicine, where Bayesian networks capture the likelihood of a patient with certain symptoms having a certain illness. However, Bayesian networks also appear in a machine-learning context at several levels:
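As a minimal sketch of expert-built classification, the example below hand-codes a tiny network in which one illness node links to two symptom nodes, then reads off the probability of illness for a given patient. The structure, the variable names and every probability are invented for illustration and have no medical basis.

```python
# Hypothetical expert-supplied numbers (illustrative only).
p_ill = {True: 0.01, False: 0.99}    # P(illness)
p_fever = {True: 0.90, False: 0.05}  # P(fever present | illness)
p_cough = {True: 0.80, False: 0.10}  # P(cough present | illness)


def posterior_ill(fever, cough):
    """P(illness | fever, cough) by enumerating both illness states."""

    def joint(ill):
        pf = p_fever[ill] if fever else 1 - p_fever[ill]
        pc = p_cough[ill] if cough else 1 - p_cough[ill]
        return p_ill[ill] * pf * pc

    ill, healthy = joint(True), joint(False)
    # Normalise so the two classification probabilities sum to one.
    return ill / (ill + healthy)


print(posterior_ill(fever=True, cough=True))
print(posterior_ill(fever=False, cough=False))
```

With these made-up numbers, observing both symptoms raises the probability of illness from the 1% prior to roughly 59%, while observing neither pushes it well below the prior.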

  • The structure of the network is given and training data is used to derive the probability rules;
  • The network contains intermediate nodes for which data is missing. Such unobserved nodes are known as hidden or latent variables. A well-known example is the hidden Markov model, which is commonly used to model time-sequence data and consists of a chain of hidden state variables, each linked to an observed variable. Various algorithms including the expectation-maximization algorithm can be used to estimate the probabilities that involve the hidden variables;
  • Various techniques can also be used to discover the network structure itself (i.e. which variables are mutually dependent) from training data.
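The first of these levels, a given structure with probabilities derived from training data, can be sketched as simple counting when every variable is observed. The function name, the two-node rain/wet structure and the records below are assumptions made for the example. The sketch applies no smoothing and covers neither hidden variables nor structure discovery.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequency in the data."""
    counts = defaultdict(Counter)
    for row in records:
        # Group the records by the combination of parent values.
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1
    return {
        key: {value: n / sum(c.values()) for value, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical training data for the given structure rain -> wet.
records = [
    {"rain": True, "wet": True},
    {"rain": True, "wet": True},
    {"rain": True, "wet": False},
    {"rain": False, "wet": False},
    {"rain": False, "wet": False},
    {"rain": False, "wet": True},
    {"rain": False, "wet": False},
]

cpt_rain = estimate_cpt(records, "rain", [])      # node without parents
cpt_wet = estimate_cpt(records, "wet", ["rain"])  # node with one parent

print(cpt_rain)
print(cpt_wet)
```

Each table is keyed by the tuple of parent values, so a node without parents has the single key `()`.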

The defining difference between a Bayesian network and a Markov random field is that a Bayesian network is directed while a Markov random field is undirected. As explained excellently in this video, there are some graphs that can be converted from one type to the other without losing information (apart from the directedness), but each type of model is also able to capture information that the other type cannot. Contrary to what the name might suggest, a hidden Markov model is directed and is thus a subtype of Bayesian network rather than a subtype of Markov random field.
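One standard way to turn a directed network into an undirected graph is moralization: link every pair of nodes that share a child, then drop the link directions. The sketch below uses that construction, which is not necessarily the one the video presents. The function name and the two three-node graphs are assumptions chosen for illustration.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected links of the moral graph of a directed network.

    `parents` maps each node to the list of its parent nodes.
    """
    edges = set()
    for child, pars in parents.items():
        # Keep every original link, without its direction.
        for p in pars:
            edges.add(frozenset((p, child)))
        # Link every pair of parents that share this child.
        for a, b in combinations(pars, 2):
            edges.add(frozenset((a, b)))
    return edges


# Chain A -> B -> C: no node has two parents, so no link is added.
chain = {"A": [], "B": ["A"], "C": ["B"]}

# Collider A -> C <- B: the two parents of C become linked.
collider = {"A": [], "B": [], "C": ["A", "B"]}

print(sorted(sorted(e) for e in moralize(chain)))
print(sorted(sorted(e) for e in moralize(collider)))
```

The chain converts cleanly, keeping just its two links. The collider gains a link between A and B, so the undirected graph no longer records that A and B are independent when C is unobserved. This is an instance of information that the directed model captures and the undirected one cannot.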

Bayesian belief network, Belief network
Causal network, Hidden Markov model, HMM
has functional building block: FBB_Classification, FBB_Feature discovery
has input data type: IDT_Vector of categorical variables, IDT_Vector of quantitative variables
has internal model: INM_Probability, INM_Rule
has output data type: ODT_Classification, ODT_Vector of categorical variables
has learning style: LST_Supervised, LST_Unsupervised
has parametricity:
has relevance:
sometimes supports:
mathematically similar to: ALG_Markov random field