Predictive modeling


Predictive modeling uses statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modeling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive models are often used to detect crimes and identify suspects after the crime has taken place.

In many cases the model is chosen on the basis of detection theory to estimate the probability of an outcome given a set of input data; for example, given an email, determining how likely it is to be spam.

Models can use one or more classifiers to estimate the probability that a set of data belongs to a particular class, say spam or 'ham' (non-spam).
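As an illustration of a classifier that outputs a class probability, the following is a minimal sketch of a naive Bayes spam filter. Naive Bayes is chosen here only as a simple example and is not named in the text above; the function names `train` and `spam_probability` and the four training messages are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class. `messages` is a list of (text, label) pairs,
    where label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def spam_probability(text, model):
    """Estimate P(spam | text) under the naive Bayes independence assumption."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior of the class.
        score = math.log(label_counts[label] / total_messages)
        words_in_class = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (words_in_class + len(vocab))
            )
        log_scores[label] = score
    # Convert the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

# Hypothetical toy training set.
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train(training)
p_spam_like = spam_probability("free money", model)
p_ham_like = spam_probability("meeting tomorrow", model)
```

With this toy data, `p_spam_like` comes out above 0.5 and `p_ham_like` below 0.5. A real filter would need far more training data and better tokenization.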

Depending on definitional boundaries, predictive modeling is synonymous with, or largely overlaps with, the field of machine learning, as it is more commonly referred to in academic or research and development contexts. When deployed commercially, predictive modeling is often referred to as predictive analytics.

Nearly any regression model can be used for prediction purposes. Broadly speaking, there are two classes of predictive models: parametric and non-parametric. A third class, semi-parametric models, includes features of both. Parametric models make "specific assumptions with regard to one or more of the population parameters that characterize the underlying distribution(s)", while non-parametric regressions make fewer assumptions than their parametric counterparts.

The majority classifier takes non-anomalous data and incorporates it into its calculations, with the aim of keeping the results produced by the predictive modeling system as valid as possible.

Ordinary least squares (OLS) is a method that estimates a model's parameters by minimizing the sum of squared differences (residuals) between observed and predicted values.
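For a single predictor, the minimizing intercept and slope have a well-known closed form. The sketch below computes them and checks that nudging the coefficients away from the fitted values increases the sum of squared residuals. The function names and the data points are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that OLS minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line gives a larger sum of squared residuals.
nudged = sum_squared_residuals(xs, ys, intercept + 0.1, slope - 0.05)
```

For this data the fitted slope is approximately 1.99 and the intercept approximately 0.05, and `best` is smaller than `nudged`. With several predictors the same criterion is usually solved in matrix form.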

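The generalized linear model (GLM) is a flexible family of models unified under a single framework: the response is assumed to follow a distribution from the exponential family, and its mean is related to a linear combination of the predictors through a link function. Logistic regression is a notable special case of GLM. Other types of GLM include Poisson regression, gamma regression, and multinomial regression.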
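As a concrete instance, logistic regression models a binary response whose mean is tied to the linear predictor through the logit link. The following is a minimal sketch that fits one by plain gradient ascent on the log-likelihood. Gradient ascent is used here only for simplicity (statistical packages typically fit GLMs by iteratively reweighted least squares), and the data, function names, learning rate, and step count are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.5, steps=20000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean Bernoulli log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        # Gradient of the mean log-likelihood with respect to b0 and b1.
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 += learning_rate * grad_b0
        b1 += learning_rate * grad_b1
    return b0, b1

# Hypothetical data: the outcome becomes more likely as x grows, with one
# overlapping pair (x = 2.0 and x = 2.5) so the classes are not perfectly separable.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0.5)   # predicted probability at a small x
p_high = sigmoid(b0 + b1 * 4.0)  # predicted probability at a large x
```

The fitted slope `b1` is positive, so the predicted probability rises with x: `p_low` is below 0.5 and `p_high` is above 0.5. Replacing the logit link and Bernoulli response with a log link and Poisson response would give Poisson regression under the same framework.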


...
Wikipedia

...