The goal of regression is to find a function that describes the relationship between the dependent and independent variables of a data set.
Principles of Regression
Data Preparation
Before a regression is fitted, the data must be checked. First, it has to be established whether a regression makes sense for the given data at all, i.e. whether the data set contains a dependent variable; without one, no regression can be performed. Second, it has to be decided how to handle missing data, because missing values can influence the result.
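The two checks can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the column names "weight" and "consumption", and the use of None for a missing value are all invented here, and dropping rows or filling with the mean are just two of several possible ways to treat missing data.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"weight": 1200.0, "consumption": 5.8},
    {"weight": 1500.0, "consumption": None},
    {"weight": None, "consumption": 7.1},
    {"weight": 1800.0, "consumption": 8.0},
]

def has_target(records, target):
    """Check that the dependent variable is present in at least one record."""
    return any(r.get(target) is not None for r in records)

def drop_incomplete(records, columns):
    """Keep only records in which every listed column has a value."""
    return [r for r in records if all(r.get(c) is not None for c in columns)]

def fill_with_mean(records, column):
    """Replace missing values in one column with the mean of the observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values")
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]

regression_possible = has_target(rows, "consumption")
complete = drop_incomplete(rows, ["weight", "consumption"])
imputed = fill_with_mean(rows, "weight")
```

Dropping rows is the simpler choice but discards information; filling in a mean keeps every row at the cost of flattening the variation in that column.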
Model Generation
Once the data has been prepared, a suitable model can be generated. There are several procedures for representing the relationship between the variables. These include, among others:
Linear Regression
Linear regression is the simplest and most easily understood type of regression. It assumes a linear relationship between the input x and the output y, with the general form:
y = m * x + b
where m is the slope and b is the intercept.
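As a rough sketch, m and b can be estimated with ordinary least squares, which is one common fitting method and is assumed here because the text does not name one. The function name fit_line and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m * x + b; returns (m, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of the products of deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

# Illustrative points generated from y = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
```

Because the sample points lie exactly on a line, the fit recovers the slope and intercept they were generated from; with noisy data the result is the line that minimizes the squared vertical distances.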
Polynomial Regression
With polynomial regression, it is possible to describe non-linear relationships between different variables, such as the spread of disease epidemics. The general form for a polynomial regression equation is
y = a0 * x^0 + a1 * x^1 + a2 * x^2 + … + an * x^n
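The coefficients a0 to an could, for example, be estimated with NumPy's polynomial least-squares fit. The sketch below assumes NumPy is available and uses synthetic data generated from a known second-degree polynomial; the degree has to be chosen by the user.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Synthetic data from y = 1 - 2 * x + 0.5 * x^2, i.e. a0 = 1, a1 = -2, a2 = 0.5.
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x**2

# polyfit returns the coefficients ordered from lowest to highest power: [a0, a1, a2].
coeffs = P.polyfit(x, y, deg=2)

# Evaluate the fitted polynomial at the original x values.
y_hat = P.polyval(x, coeffs)
```

Note that numpy.polynomial.polynomial.polyfit orders the coefficients from a0 upwards, matching the formula, whereas the older numpy.polyfit returns them in the opposite order.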
Lasso
Lasso stands for “Least Absolute Shrinkage and Selection Operator”. The goal of the procedure is to form a function with as few non-zero coefficients as possible. This is done with a regularization parameter that penalizes the sum of the absolute values of the coefficients. Coefficients that contribute little to the regression are shrunk towards 0 and can become exactly 0. The function thus becomes simpler and therefore more general. The procedure can also be used to determine which variables have only a small influence on the result.
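The shrinking effect can be illustrated with a small coordinate-descent sketch. It assumes the usual Lasso objective, (1 / (2n)) * sum of squared errors + alpha * sum of |coefficients|, with no intercept; the names alpha, soft_threshold and lasso_coordinate_descent as well as the hand-picked data are illustrative choices, not part of any particular library.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value towards 0 by threshold; anything within the threshold becomes 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / (2n)) * ||y - X w||^2 + alpha * ||w||_1 one coefficient at a time."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back in.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Hand-picked data: the first feature matters, the second barely, the third not at all.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, -1.0],
              [-1.0, 1.0, -1.0],
              [-1.0, -1.0, 1.0]])
y = 3.0 * X[:, 0] + 0.05 * X[:, 1]

w = lasso_coordinate_descent(X, y, alpha=0.1)
```

With these inputs the coefficient of the strong feature is reduced slightly, while the coefficients of the weak and the irrelevant feature end up at exactly 0, which is the selection behaviour described above. A larger alpha removes more coefficients.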
Neural Networks
Neural networks can learn the relationships between the variables themselves. This is an advantage over statistical models because complex relationships do not have to be specified explicitly. Further information on how they work can be found in our blog article “What are Artificial Neural Networks?”.
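A very small network is enough to show the idea. The sketch below assumes NumPy and trains a network with one hidden layer by plain gradient descent on a quadratic target; the layer size, learning rate, number of steps and random seed are arbitrary choices made for this illustration, and real projects would normally use a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-linear target; the network is never told that the relationship is quadratic.
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = x ** 2

# One hidden layer with 8 tanh units.
W1 = rng.normal(scale=0.5, size=(1, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)

def forward(inputs):
    """Return the hidden activations and the network output."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

def mse(pred, target):
    """Mean squared error between prediction and target."""
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(forward(x)[1], y)

learning_rate = 0.1
for _ in range(2000):
    hidden, pred = forward(x)
    # Backpropagation of the mean squared error.
    grad_pred = 2.0 * (pred - y) / len(x)
    grad_W2 = hidden.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_hidden = grad_pred @ W2.T
    grad_z = grad_hidden * (1.0 - hidden ** 2)  # derivative of tanh
    grad_W1 = x.T @ grad_z
    grad_b1 = grad_z.sum(axis=0)
    # Gradient descent step.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

final_loss = mse(forward(x)[1], y)
```

Comparing final_loss with initial_loss shows how much the fit improved during training, without the quadratic form ever being written into the model.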
Application
With a regression model, it is possible to predict unknown values from a data set. There are a variety of use cases, such as predicting the fuel consumption of a vehicle depending on its weight and speed, or the output of goods depending on the resources used. In addition, regression can be used to evaluate studies, such as the correlation between tobacco consumption and an increased mortality rate.
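The fuel consumption case could look roughly like the following sketch, which fits a linear model with two independent variables using NumPy's least-squares solver. All numbers are invented for illustration and generated from a fixed formula without noise; they are not measurements, and the helper predict is a hypothetical name.

```python
import numpy as np

# Invented illustration data, generated from
# consumption = 1.0 + 0.003 * weight + 0.02 * speed (no noise).
weight = np.array([1000.0, 1200.0, 1500.0, 1800.0, 2000.0])  # kg
speed = np.array([50.0, 80.0, 100.0, 120.0, 90.0])           # km/h
consumption = 1.0 + 0.003 * weight + 0.02 * speed            # l/100 km

# Design matrix with a column of ones for the intercept.
A = np.column_stack([np.ones_like(weight), weight, speed])
coef, *_ = np.linalg.lstsq(A, consumption, rcond=None)

def predict(vehicle_weight, vehicle_speed):
    """Predict consumption for a vehicle that is not in the data set."""
    return float(coef[0] + coef[1] * vehicle_weight + coef[2] * vehicle_speed)

estimate = predict(1600.0, 110.0)
```

Since the data is noise-free, the estimate matches the generating formula; with real measurements the prediction would carry an error that should be checked on data not used for fitting.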