Markov random field

Algorithm

A Markov random field is an undirected graph in which each node captures the (discrete or Gaussian) probability distribution of a variable and each edge represents a dependency between two variables, weighted to reflect the relative strength of that dependency. Because the graph is undirected, a Markov random field can contain subgraphs of three or more interconnected nodes whose joint probabilities interact in a way reminiscent of the Hopfield network. Furthermore, Markov random fields can be configured to take into account the probabilities of individual variables having certain values (unary factors): for example, the impact of a very improbable value for a given variable on the rest of the network can be reduced on the basis that it is more likely to be a sampling error than a genuine reading.
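To make this concrete, here is a minimal sketch (plain Python, no libraries) of a discrete Markov random field over three binary variables connected in a triangle. The variable names A, B and C and all factor values are illustrative assumptions: the unary factors play exactly the role described above, while the pairwise factors weight the dependencies on the edges.

```python
from itertools import product

# Unary factors: each variable's preference for the values 0 and 1.
unary = {
    "A": [0.9, 0.1],   # A=1 is very improbable, so this factor reduces
    "B": [0.5, 0.5],   # the impact of an A=1 reading on the network
    "C": [0.5, 0.5],
}

# Pairwise factors on the edges: agree[x][y] rewards matching neighbours.
agree = [[2.0, 0.5],
         [0.5, 2.0]]
edges = [("A", "B"), ("B", "C"), ("A", "C")]  # three interconnected nodes

def unnormalised(assignment):
    """Product of all unary and pairwise factor values for one assignment."""
    p = 1.0
    for var, val in assignment.items():
        p *= unary[var][val]
    for u, v in edges:
        p *= agree[assignment[u]][assignment[v]]
    return p

# The joint distribution is the normalised product of all the factors.
names = list(unary)
Z = sum(unnormalised(dict(zip(names, vals)))
        for vals in product([0, 1], repeat=len(names)))
joint = {vals: unnormalised(dict(zip(names, vals))) / Z
         for vals in product([0, 1], repeat=len(names))}
print(joint[(0, 0, 0)])   # probability that A = B = C = 0
```

Normalising by brute force over every assignment, as here, is only feasible for toy networks; larger networks need the approximation techniques discussed next.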

Markov random fields are a very general class of model that specifies little about how a given field should be built, trained or used. When all the variables in a large Markov random field except one are known, calculating the single missing probability is easy. In fact, the values of any group of nodes can in principle be inferred from the remaining node values, but exact inference of this kind is NP-hard in general and, in practice, has to be approximated, for example by sampling.
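As an illustration of one such approximation technique, the following sketch continues the toy network above (reusing its unary, agree and edges definitions) and estimates the conditional probability P(A=1 | C=1) by Gibbs sampling; the burn-in length and number of draws are illustrative assumptions.

```python
import random

def conditional(var, state):
    """Unnormalised weights for var = 0 and var = 1, all else held fixed."""
    weights = []
    for val in (0, 1):
        w = unary[var][val]
        for u, v in edges:
            if u == var:
                w *= agree[val][state[v]]
            elif v == var:
                w *= agree[state[u]][val]
        weights.append(w)
    return weights

random.seed(0)
state = {"A": 0, "B": 0, "C": 1}   # C is clamped to its observed value
burn_in, draws, hits = 1_000, 20_000, 0
for step in range(burn_in + draws):
    for var in ("A", "B"):          # resample only the unobserved nodes
        w0, w1 = conditional(var, state)
        state[var] = 1 if random.random() < w1 / (w0 + w1) else 0
    if step >= burn_in:             # discard the burn-in samples
        hits += state["A"]
print(hits / draws)                 # estimate of P(A=1 | C=1)
```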

The defining difference between a Bayesian network and a Markov random field is that a Bayesian network is directed while a Markov random field is undirected. As explained excellently in this video, there are some graphs that can be converted from one type to the other without losing information (apart from the directedness), but each type of model is also able to capture information that the other type cannot. Contrary to what the name might suggest, a hidden Markov model is directed and thus a sub-type of Bayesian network rather than a sub-type of Markov random field.
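One standard conversion in the directed-to-undirected direction is moralisation: connect ("marry") all co-parents of each node, then drop the edge directions. The sketch below applies it to an assumed three-node graph and shows the information loss mentioned above: the resulting undirected graph can no longer express the independence encoded by the directed v-structure A -> C <- B.

```python
# Directed graph, expressed as each node's parent list: A -> C <- B.
parents = {"A": [], "B": [], "C": ["A", "B"]}

moral_edges = set()
for child, ps in parents.items():
    for p in ps:                                # keep every directed edge,
        moral_edges.add(frozenset((p, child)))  # but now undirected
    for i, p in enumerate(ps):                  # "marry" co-parents so each
        for q in ps[i + 1:]:                    # family becomes a clique
            moral_edges.add(frozenset((p, q)))

print(sorted(tuple(sorted(e)) for e in moral_edges))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]: the added A-B edge means the
# undirected graph cannot express that A and B were marginally
# independent in the directed one
```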

A conditional random field is a Markov random field that has been set up specifically to allow the values of one or more variables to be predicted from the values of the others. Conditional random fields are closely related to other algorithms:

  • Conditional random fields with discrete probability distributions are essentially isomorphic with (simple or multinomial) logistic regression, in that the aim is to calculate the probabilities of variables having specific values based on the known values of other variables. The main difference is that conditional random fields, like hidden Markov models, are normally used to model sequential time series: the network structure of a conditional random field is better suited to such data than a simple logistic regression equation (see the sketch after this list).
     
  • Inasmuch as the relationships within a conditional random field are expressed as weights that are trained using backpropagation, the conditional random field also has a lot in common with neural network architectures like the perceptron. The differences are:
     
    • the standard type of conditional random field can only solve linear problems (although the same is true of the original, single-layer form of the perceptron; and there are also multilayer conditional random fields that can be used for more complex problem spaces);
       
    • a conditional random field is normally parametric, i.e. the data scientist specifies the precise nature of the relationships being modelled, while a perceptron is normally non-parametric, i.e. the data scientist supplies the raw data and expects the perceptron to learn the nature of the relationships as well as their relative weights.
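The sketch referred to above models a toy two-label sequence-tagging task as a linear-chain conditional random field. The labels, observations and weight values are all illustrative assumptions; in a real application the emission and transition weights would be learned from data, for example by gradient ascent on the conditional log-likelihood.

```python
import math
from itertools import product

LABELS = ("rain", "dry")

# Emission weights (unary factors): label given the observation.
emit = {("rain", "umbrella"): 2.0, ("rain", "no_umbrella"): -1.0,
        ("dry", "umbrella"): -1.0, ("dry", "no_umbrella"): 2.0}
# Transition weights (pairwise factors): consecutive labels.
trans = {("rain", "rain"): 1.0, ("rain", "dry"): -0.5,
         ("dry", "rain"): -0.5, ("dry", "dry"): 1.0}

def score(labels, obs):
    """Unnormalised log-score of one label sequence for the observations."""
    s = sum(emit[(l, o)] for l, o in zip(labels, obs))
    s += sum(trans[pair] for pair in zip(labels, labels[1:]))
    return s

def conditional_prob(labels, obs):
    """P(labels | obs): normalise over every possible label sequence."""
    log_z = math.log(sum(math.exp(score(seq, obs))
                         for seq in product(LABELS, repeat=len(obs))))
    return math.exp(score(labels, obs) - log_z)

obs = ["umbrella", "umbrella", "no_umbrella"]
print(conditional_prob(("rain", "rain", "dry"), obs))
```

Normalising by enumerating every label sequence is only feasible for toy sequence lengths; practical implementations compute the same sum in linear time with the forward algorithm.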
alias: MRF, Markov network
subtype: Discrete Markov random field, Gaussian Markov random field, Conditional random field, Multilayer conditional random field
has functional building block: FBB_Classification, FBB_Feature discovery
has input data type: IDT_Vector of categorical variables, IDT_Vector of quantitative variables
has internal model: INM_Probability, INM_Rule
has output data type: ODT_Classification, ODT_Vector of categorical variables
has learning style: LST_Unsupervised, LST_Supervised
has parametricity: PRM_Parametric
has relevance: REL_Relevant
uses:
sometimes supports:
mathematically similar to: ALG_Logistic regression, ALG_Bayesian network