Naive Bayesian Classifier


A naive Bayesian classifier is used to calculate the probability of a data item belonging to each of two or more classes based on its input vector values. Typically, the probabilities are obtained from training data by observing how often a data item with each input value belongs to each class.
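As a minimal sketch of the counting step, the code below estimates class priors and per-feature conditional probabilities as plain relative frequencies. The training items, feature values and the helper name `estimate` are all hypothetical and only illustrate the idea. No smoothing is applied, so a value that never occurs with a class simply has no entry, which means a probability of zero.

```python
from collections import Counter

# Hypothetical training data: (input vector, class label).
# Each input vector holds categorical values for two features.
training = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "cool"), "yes"),
]

def estimate(training):
    """Return class priors and per-feature conditional probabilities,
    both estimated as simple relative frequencies (no smoothing)."""
    class_counts = Counter(label for _, label in training)
    # value_counts[(feature_index, value, label)] -> number of items
    value_counts = Counter(
        (i, value, label)
        for vector, label in training
        for i, value in enumerate(vector)
    )
    n = len(training)
    priors = {c: k / n for c, k in class_counts.items()}
    # P(value | class) = items of that class with that value / items of that class
    likelihoods = {
        key: k / class_counts[key[2]] for key, k in value_counts.items()
    }
    return priors, likelihoods

priors, likelihoods = estimate(training)
print(priors["yes"])                     # 4 of the 6 items are "yes"
print(likelihoods[(0, "rainy", "yes")])  # 2 of the 4 "yes" items are rainy
```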

Bayes’ theorem provides the means of combining multiple such probabilities into an overall probability that is weighted by the overall frequency with which each class occurs (its prior probability). For example, when interpreting a medical test for a certain type of cancer, this weighting corrects for the fact that, at any given time, a much larger proportion of the general population will not have the cancer than will have it.
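The medical-test example can be worked through numerically. The sketch below applies Bayes' theorem to a single positive test result; the prevalence, sensitivity and specificity figures are hypothetical and chosen only to show how a low prior pulls the posterior down.

```python
def posterior(prior, sensitivity, specificity):
    """P(cancer | positive test) via Bayes' theorem.

    prior:       P(cancer) in the population being tested
    sensitivity: P(positive | cancer)
    specificity: P(negative | no cancer)
    """
    p_pos_given_cancer = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # Total probability of a positive result, weighted by how common each class is.
    p_pos = p_pos_given_cancer * prior + p_pos_given_healthy * (1.0 - prior)
    return p_pos_given_cancer * prior / p_pos

# Hypothetical figures: 1% prevalence, 90% sensitivity, 95% specificity.
p = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"{p:.3f}")  # 0.154
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of having the cancer, because false positives from the 99% without it outnumber the true positives.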

The algorithm is called naive because it assumes that the input vector values are independent of one another within each class. Just as least-squares regression often still gives usable results when its assumption of no collinearity between predictors is only mildly violated, a naive Bayesian classifier will often still produce usable results provided that the dependencies between input values are not too strong. This is especially true when it is used to produce qualitative results (which class each data item belongs to) rather than quantitative ones (the probability that each data item belongs to each class). Where the independence assumption needs to be relaxed, discriminant analysis offers a valid alternative.
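A toy illustration of why the qualitative result is more robust than the quantitative one: the sketch below combines a prior with per-feature likelihoods by multiplying them (summing logs), as the independence assumption allows, and then repeats the calculation with the same evidence counted twice, as if a perfectly correlated copy of the feature had been added. The class names and likelihood values are hypothetical.

```python
import math

def naive_posterior(prior, likelihoods_per_class):
    """Combine a prior with per-feature likelihoods under the naive
    independence assumption: P(c | x) is proportional to P(c) * prod_i P(x_i | c).
    Sums logs to avoid underflow, then normalises over the classes."""
    log_scores = {
        c: math.log(prior[c]) + sum(math.log(p) for p in likelihoods_per_class[c])
        for c in prior
    }
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

prior = {"A": 0.5, "B": 0.5}
# Hypothetical likelihood of one observed feature value under each class.
one_feature = {"A": [0.8], "B": [0.4]}
# The same evidence counted twice: the independence assumption is violated.
duplicated = {"A": [0.8, 0.8], "B": [0.4, 0.4]}

p1 = naive_posterior(prior, one_feature)
p2 = naive_posterior(prior, duplicated)
# A remains the predicted class, but its probability rises from 2/3 to 0.8.
print(round(p1["A"], 3), round(p2["A"], 3))
```

The dependent copy makes the classifier overconfident, but the class ranking, and therefore the qualitative answer, is unchanged.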

Different types of naive Bayesian classifier are used with different types of input values.

For categorical input data:

  • If the input vector values are boolean (e.g. whether a text contains a given word), the probabilities are combined using the Bernoulli naive Bayes algorithm.
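A minimal from-scratch sketch of Bernoulli naive Bayes, assuming word-presence vectors over a small hypothetical vocabulary. The helper names are illustrative, and the additive smoothing term `alpha` is an assumption borrowed from common practice rather than something stated above; it keeps every probability strictly between 0 and 1.

```python
import math

def train_bernoulli(X, y, alpha=1.0):
    """X: list of 0/1 vectors (word present or absent); y: class labels.
    Returns log priors and P(word present | class), using additive smoothing
    (alpha) so that no estimate is exactly 0 or 1."""
    log_prior, p_present = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        p_present[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return log_prior, p_present

def predict_bernoulli(x, log_prior, p_present):
    """Pick the class with the highest log posterior. An absent word
    (x_j == 0) also contributes evidence, via log(1 - p)."""
    def score(c):
        return log_prior[c] + sum(
            math.log(p if xj else 1.0 - p) for xj, p in zip(x, p_present[c])
        )
    return max(log_prior, key=score)

# Hypothetical vocabulary: ["free", "meeting", "winner"]
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
y = ["spam", "spam", "ham", "ham", "ham"]
log_prior, p_present = train_bernoulli(X, y)
print(predict_bernoulli([1, 0, 1], log_prior, p_present))  # spam
```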

For quantitative input data:

  • If the input vector values are counts (e.g. the number of times a text contains a given word), the probabilities are combined using the multinomial naive Bayes algorithm. Zero probabilities, which arise when a given input value never occurs with a given class in the training data, are a problem for multinomial naive Bayes: because the per-feature probabilities are multiplied together, a single zero makes the whole product zero, however strong the other evidence is. They are therefore replaced with small positive values using a technique called additive smoothing.

  • If the input vector values are continuous and follow a Gaussian distribution, i.e. the input values for each class are normally distributed around a class-specific mean, the mean and standard deviation for each class can be plugged into the normal probability density function. This gives the likelihood of a given input value under each class, which Bayes’ theorem then converts into the probability of the data item belonging to each class.
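The multinomial case with additive smoothing can be sketched as follows, assuming each document is given as a dictionary of word counts. The documents, labels and helper names are hypothetical. With `alpha=1` this is Laplace smoothing; without it, "meeting" would get a probability of exactly zero for the spam class, since it never occurs there in training.

```python
import math
from collections import Counter

def train_multinomial(docs, labels, alpha=1.0):
    """docs: list of word-count dicts; labels: class per doc.
    Estimates P(word | class) from total word counts per class, with additive
    smoothing (alpha) so that no probability is exactly zero."""
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_lik = {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        totals = Counter()
        for d in class_docs:
            totals.update(d)  # adds the counts, not just the keys
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((totals[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict_multinomial(counts, log_prior, log_lik):
    """Each word's log probability is weighted by how many times it occurs.
    Words outside the training vocabulary are ignored."""
    def score(c):
        return log_prior[c] + sum(
            n * log_lik[c][w] for w, n in counts.items() if w in log_lik[c]
        )
    return max(log_prior, key=score)

docs = [{"free": 3, "winner": 1}, {"free": 1, "offer": 2},
        {"meeting": 2, "agenda": 1}, {"meeting": 1, "notes": 2}]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_lik = train_multinomial(docs, labels)
print(predict_multinomial({"free": 2, "meeting": 1}, log_prior, log_lik))  # spam
```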
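For the Gaussian case, a minimal single-feature sketch: it estimates a prior, mean and standard deviation per class, evaluates the normal density, and normalises the prior-weighted densities with Bayes' theorem. The height data and helper names are hypothetical. A class whose training values are all identical would give a standard deviation of zero here; library implementations typically guard against this (scikit-learn's `GaussianNB`, for instance, adds a small `var_smoothing` term to the variances).

```python
import math
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Normal probability density of x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def train_gaussian(values, labels):
    """Per-class prior, mean and (population) standard deviation of a single
    continuous feature."""
    stats = {}
    for c in set(labels):
        xs = [v for v, l in zip(values, labels) if l == c]
        stats[c] = (len(xs) / len(values), mean(xs), pstdev(xs))
    return stats

def posterior_gaussian(x, stats):
    """Density of x under each class, weighted by the class prior via Bayes'
    theorem, then normalised so the results sum to 1."""
    joint = {c: prior * gaussian_pdf(x, mu, sd) for c, (prior, mu, sd) in stats.items()}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

# Hypothetical heights (cm) for two classes.
heights = [160, 165, 170, 175, 180, 185]
labels = ["f", "f", "f", "m", "m", "m"]
stats = train_gaussian(heights, labels)
post = posterior_gaussian(172, stats)
print(post)  # "f" is the more probable class for 172 cm
```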

Naive Bayes
  Subtypes: Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes
  has input data type: IDT_Vector of categorical variables, IDT_Vector of quantitative variables
  has output data type: ODT_Classification, ODT_Probability