Multivariate adaptive regression splines

Algorithm

Multivariate adaptive regression splines, or MARS (also called Earth in many open-source implementations because MARS is trademarked), performs a similar function to least-squares regression, but is used when the relationship of one or more predictor variables to the dependent variable is thought to vary over the predictors' value range. A simple example would be the increase in air humidity caused by heating water, which becomes much more rapid once the boiling point of 100 degrees Celsius has been reached.

The output of a MARS regression is a set of basis functions whose outputs are added together. The most important of these are hinge functions with the form

max(0,x-c)

which contributes to the model only for values of x greater than c; the threshold value c is referred to as a knot.
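A hinge function is simple enough to sketch directly. The following minimal illustration (the function name `hinge` is just an illustrative choice, not part of any MARS library) shows how the basis function stays at zero until the knot and grows linearly afterwards:

```python
import numpy as np

def hinge(x, c):
    """Hinge basis function max(0, x - c): zero below the knot c, linear above it."""
    return np.maximum(0.0, x - c)

x = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
# Zero until the knot at c = 0.5, then rises as x - 0.5.
print(hinge(x, 0.5))
```

A full MARS model adds several such terms (and their mirrored counterparts max(0, c − x)) together, each weighted by a fitted coefficient.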

MARS builds its models using a type of stepwise regression where predictor variables with candidate knots are added to the model one by one. However, MARS does not share the weaknesses of general stepwise regression:

  • the set of predictor variables is still determined by the data scientist: MARS guarantees that all predictor variables will make it into the final model and aims only to find the optimal knots.
     
  • MARS contains an inbuilt solution to overfitting: following the stepwise regression step, all generated basis functions that do not contribute to the model's accuracy above a certain threshold are pruned out, so that only the more efficient predictors remain in the final version.
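The core move of the forward stepwise phase is searching candidate knots and keeping the one that reduces the residual error the most. The sketch below shows this for a single predictor and a single knot only (real MARS repeats the search across variables and existing terms); the helper name `best_knot` and the synthetic data are illustrative, not from any particular implementation:

```python
import numpy as np

def best_knot(x, y):
    """Try each interior x value as a candidate knot c, fit
    y ~ b0 + b1*max(0, x - c) + b2*max(0, c - x) by least squares,
    and keep the knot giving the smallest residual sum of squares."""
    best_sse, best_c, best_beta = np.inf, None, None
    for c in np.unique(x)[1:-1]:  # interior candidates only
        X = np.column_stack([np.ones_like(x),
                             np.maximum(0.0, x - c),
                             np.maximum(0.0, c - x)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = np.sum((y - X @ beta) ** 2)
        if sse < best_sse:
            best_sse, best_c, best_beta = sse, c, beta
    return best_sse, best_c, best_beta

# Synthetic data with a true kink at x = 1: flat, then slope 3.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = np.where(x > 1.0, 3.0 * (x - 1.0), 0.0) + rng.normal(0.0, 0.05, x.size)
sse, knot, beta = best_knot(x, y)
# The recovered knot should land close to the true kink at x = 1.
print(knot)
```

The full algorithm runs this greedy search repeatedly, adding the best hinge pair each pass, before the pruning phase removes unhelpful terms.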

The complexity of MARS means it requires considerably more training data than ordinary least-squares regression: at least an order of magnitude more. However, there are various optional constraints that the user can put on the procedure to simplify it:

  • the maximum number of basis functions that may be generated in the stepwise regression phase;
     
  • the accuracy threshold below which basis functions should be removed from the model in the pruning phase, expressed as a penalty number whose typical value lies between 2 and 3;
     
  • which predictor variables should be subject to MARS and which should just be modelled using normal linear functions;
     
  • whether basis functions should only consist of simple hinge functions or whether basis functions should also be allowed in which two or even more hinge functions are multiplied together (the permitted degree of interaction). Models with a degree of interaction greater than one can capture more complex relationships between predictor variables, but also tend to be harder to understand and to require still more training data to avoid overfitting.
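The penalty number mentioned above enters through the generalized cross-validation (GCV) score that MARS uses in the pruning phase: each basis function inflates the model's effective parameter count, so a larger model must earn its keep with a correspondingly lower residual error. The sketch below follows Friedman's commonly cited form of the score (the exact effective-parameter formula varies between implementations, so treat this as an assumption for illustration):

```python
import numpy as np

def gcv(rss, n_samples, n_terms, penalty=3.0):
    """GCV score used when pruning MARS terms; lower is better.
    Effective parameters grow with the number of basis functions,
    scaled by the penalty (typically between 2 and 3)."""
    eff_params = n_terms + penalty * (n_terms - 1) / 2.0
    return rss / (n_samples * (1.0 - eff_params / n_samples) ** 2)

# A bigger model with only slightly lower RSS can still lose on GCV:
small = gcv(rss=10.0, n_samples=100, n_terms=5)
large = gcv(rss=9.8, n_samples=100, n_terms=10)
print(small < large)
```

Raising the penalty makes the pruning phase more aggressive, trading a little in-sample accuracy for a simpler, less overfitted model.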

A MARS model is isomorphic to some types of decision tree used for value prediction, although the algorithms generate their models in different ways.

alias
MARS Earth
subtype
has functional building block
FBB_Value prediction
has input data type
IDT_Vector of quantitative variables
has internal model
INM_Function
has output data type
ODT_Quantitative variable
has learning style
LST_Supervised
has parametricity
PRM_Parametric
has relevance
REL_Relevant
uses
ALG_Least Squares Regression
sometimes supports
mathematically similar to
ALG_Decision tree