Regression

Regression analysis is used to determine or estimate model parameters. The starting point is a set of measured values and a model on which the measured values are assumed to be based. Since measured values are generally subject to error, regression analysis adjusts the model parameters for the best possible fit. The basic procedure is the method of least squares.

Least squares method

The method of least squares is a curve-fitting (adjustment) method. It computes an optimal compromise in which the sum of the squared deviations between the model function and the measured values is minimized.

For the measured values (x_i, y_i) and the model function f, the sum of squared deviations is minimized. To achieve this, the parameters a_i of the model function are determined so that the following condition is satisfied.

$\sum_{i = 1}^{n} {(f (x_{i}, \vec{a}) - y_{i})}^{2} \to \min$

Model function: Linear fit

To determine the regression line, a linear model function f is used in the least squares method.

$f (x, \vec{a}) = a_{0} + a_{1} x$

The deviations of the regression line from the measured values are then given as follows.

$\begin{matrix} a_{0} + a_{1} x_{1} - y_{1} = r_{1} \\ a_{0} + a_{1} x_{2} - y_{2} = r_{2} \\ ⋮ \\ a_{0} + a_{1} x_{n} - y_{n} = r_{n} \end{matrix}$

The goal is to make the sum of the squared deviations r_i from the straight line as small as possible.

$\sum_{i = 1}^{n} r_{i}^{2} = {(a_{0} + a_{1} x_{1} - y_{1})}^{2} + \dots + {(a_{0} + a_{1} x_{n} - y_{n})}^{2} \to \min$

The extremum is found by setting the partial derivatives with respect to a₀ and a₁ to zero.

$\frac{\partial}{\partial a_{0}} \left[ {(a_{0} + a_{1} x_{1} - y_{1})}^{2} + \dots + {(a_{0} + a_{1} x_{n} - y_{n})}^{2} \right] = 0$

$\frac{\partial}{\partial a_{1}} \left[ {(a_{0} + a_{1} x_{1} - y_{1})}^{2} + \dots + {(a_{0} + a_{1} x_{n} - y_{n})}^{2} \right] = 0$
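Written out, with all sums running over i = 1, ..., n and the common factor 2 divided out, these two conditions become the so-called normal equations:

$\sum_{i = 1}^{n} (a_{0} + a_{1} x_{i} - y_{i}) = 0 \;\Rightarrow\; n a_{0} + a_{1} \sum_{i = 1}^{n} x_{i} = \sum_{i = 1}^{n} y_{i}$

$\sum_{i = 1}^{n} (a_{0} + a_{1} x_{i} - y_{i}) x_{i} = 0 \;\Rightarrow\; a_{0} \sum_{i = 1}^{n} x_{i} + a_{1} \sum_{i = 1}^{n} x_{i}^{2} = \sum_{i = 1}^{n} x_{i} y_{i}$

Dividing the first equation by n gives $a_{0} + a_{1} \overline{x} = \overline{y}$, which is the source of the expression for a₀.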

Solving this system of equations yields the parameters a₀ and a₁ of the regression line.

$a_{0} = \overline{y} - a_{1} \overline{x}$

$a_{1} = \frac{\sum_{i = 1}^{n} (x_{i} - \overline{x}) (y_{i} - \overline{y})}{\sum_{i = 1}^{n} {(x_{i} - \overline{x})}^{2}}$
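The two closed-form expressions above translate almost directly into code. The following is a minimal sketch in Python using only the standard library. The function name `linear_fit` and the sample data are illustrative assumptions, not part of the original calculator.

```python
def linear_fit(xs, ys):
    """Return (a0, a1) of the least-squares line f(x) = a0 + a1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula for a1
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


# Illustrative data lying exactly on y = 1 + 2x
a0, a1 = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a0, a1)
```

For data that lie exactly on a straight line, the fit should reproduce the intercept and slope of that line up to rounding.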

Online-Calculator: Fitting linear line

Fitting of exponential functions

If the measured values follow an exponential relationship, the linear model for the best-fit straight line can still be used. To do so, take the logarithm of the measured values; a substitution then gives a linear equation.

$y = b \cdot a^{x}$

Taking the logarithm leads to a linear equation.

$\ln y = \ln b + x \ln a$

With the logarithms of the measured values y' = ln y and the substitutions a' = ln a and b' = ln b, the linear model is obtained.

$y' = b' + a' x$
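The substitution can be sketched in Python as follows: take ln y, fit a straight line to (x, ln y), and transform the slope and intercept back. The function name `exponential_fit` and the data are illustrative assumptions. This approach minimizes the squared deviations of ln y rather than of y itself, so for noisy data the result can differ from a direct nonlinear fit.

```python
import math


def exponential_fit(xs, ys):
    """Fit y = b * a**x via a straight-line fit to (x, ln y). Returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take the logarithm")
    n = len(xs)
    log_ys = [math.log(y) for y in ys]          # y' = ln y
    x_mean = sum(xs) / n
    ly_mean = sum(log_ys) / n
    sxy = sum((x - x_mean) * (ly - ly_mean) for x, ly in zip(xs, log_ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a_prime = sxy / sxx                         # slope     a' = ln a
    b_prime = ly_mean - a_prime * x_mean        # intercept b' = ln b
    return math.exp(a_prime), math.exp(b_prime)


# Illustrative data generated from y = 3 * 2**x
a, b = exponential_fit([0.0, 1.0, 2.0, 3.0], [3.0, 6.0, 12.0, 24.0])
print(a, b)
```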

Model function: power functions

The fit of a power function is likewise reduced to the linear model function.

$y = a \cdot x^{b}$

Taking the logarithm leads to a linear equation.

$\ln y = \ln a + b \ln x$

With the logarithms of the measured values y' = ln y and the substitutions a' = ln a and x' = ln x, the linear model is obtained.

$y' = a' + b x'$
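A minimal Python sketch of this log-log substitution is shown below. The function name `power_fit` and the data are illustrative assumptions. Both x and y must be positive for the logarithms to exist.

```python
import math


def power_fit(xs, ys):
    """Fit y = a * x**b via a straight-line fit to (ln x, ln y). Returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("all x and y values must be positive")
    n = len(xs)
    log_xs = [math.log(x) for x in xs]          # x' = ln x
    log_ys = [math.log(y) for y in ys]          # y' = ln y
    lx_mean = sum(log_xs) / n
    ly_mean = sum(log_ys) / n
    sxy = sum((lx - lx_mean) * (ly - ly_mean) for lx, ly in zip(log_xs, log_ys))
    sxx = sum((lx - lx_mean) ** 2 for lx in log_xs)
    b = sxy / sxx                               # slope is the exponent b
    a_prime = ly_mean - b * lx_mean             # intercept a' = ln a
    return math.exp(a_prime), b


# Illustrative data generated from y = 2 * x**3
a, b = power_fit([1.0, 2.0, 3.0, 4.0], [2.0, 16.0, 54.0, 128.0])
print(a, b)
```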

Online-Calculator: Power function

Fitting of the Gaussian distribution

The Gaussian distribution or normal distribution is defined as follows:

$f (x) = \frac{1}{\sqrt{2 π} σ} e^{- \frac{1}{2} \frac{{(x - μ)}^{2}}{σ^{2}}}$

The Gaussian distribution is fitted to the measured values by forming their weighted mean. The weighted mean corresponds to μ in the Gaussian distribution, and the standard deviation of the measured values about this mean is σ in the normal distribution.

$μ = \frac{\sum_{i = 1}^{n} x_{i} y_{i}}{\sum_{i = 1}^{n} y_{i}}$

$σ = \sqrt{\frac{\sum_{i = 1}^{n} {(x_{i} - μ)}^{2} y_{i}}{\sum_{i = 1}^{n} y_{i}}}$
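The two formulas for μ and σ can be sketched in Python as follows, treating the y_i as weights for the positions x_i. The function names `fit_gaussian` and `gaussian` and the data are illustrative assumptions. The density returned by `gaussian` is normalized to unit area, so comparing it with raw measured values may require an additional scale factor that is not covered here.

```python
import math


def fit_gaussian(xs, ys):
    """Return (mu, sigma) as the y-weighted mean and standard deviation of x."""
    total = sum(ys)
    if len(xs) != len(ys) or total <= 0:
        raise ValueError("need equal-length inputs with a positive weight sum")
    mu = sum(x * y for x, y in zip(xs, ys)) / total
    variance = sum((x - mu) ** 2 * y for x, y in zip(xs, ys)) / total
    return mu, math.sqrt(variance)


def gaussian(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


# Illustrative, symmetric data centered on x = 2
mu, sigma = fit_gaussian([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])
peak = gaussian(mu, mu, sigma)   # density at the center of the fitted curve
print(mu, sigma, peak)
```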

Online-Calculator: Normal distribution Normal Distribution Plot

Model function: Periodic (Fourier series)

Measured values can also be approximated by periodic functions. The procedure for this is expansion into a Fourier series, whose elements are sine and cosine functions. The expansion proceeds in ascending order of frequency.

The Fourier series is:

$s_{n} (x) = \frac{a_{0}}{2} + \sum_{k = 1}^{n} (a_{k} \cos (k ω x) + b_{k} \sin (k ω x))$

with the Fourier coefficients a_k and b_k and ω = 2π/T. Here T = b - a is the period, where a is the start and b the end of the interval.

The Fourier coefficients a_k and b_k satisfy the least squares condition for the associated sine or cosine function. The coefficients are calculated as follows.

$a_{k} = \frac{2}{T} \int_{a}^{b} f (x) \cos (k ω x) dx$

$b_{k} = \frac{2}{T} \int_{a}^{b} f (x) \sin (k ω x) dx$
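The coefficient integrals can be approximated numerically. The sketch below uses the midpoint rule with equally spaced sample points over one period, with the prefactor taken as 2 divided by the interval length b - a. The function name `fourier_coefficients`, the number of samples, and the test function `signal` are illustrative assumptions. How the original calculator evaluates the integrals is not stated.

```python
import math


def fourier_coefficients(f, a, b, n_terms, samples=1000):
    """Approximate a_0..a_n and b_0..b_n on [a, b] with the midpoint rule."""
    period = b - a
    omega = 2 * math.pi / period
    h = period / samples
    # Midpoints of the sub-intervals of width h
    xs = [a + (j + 0.5) * h for j in range(samples)]
    a_coef, b_coef = [], []
    for k in range(n_terms + 1):
        a_coef.append(2 / period * h * sum(f(x) * math.cos(k * omega * x) for x in xs))
        b_coef.append(2 / period * h * sum(f(x) * math.sin(k * omega * x) for x in xs))
    return a_coef, b_coef


def signal(x):
    """Illustrative periodic function with known coefficients."""
    return 1.0 + 2.0 * math.cos(x) + 3.0 * math.sin(2 * x)


a_coef, b_coef = fourier_coefficients(signal, 0.0, 2 * math.pi, n_terms=2)
print(a_coef, b_coef)
```

For this test function the expected values are a_0 = 2 (so a_0/2 = 1), a_1 = 2 and b_2 = 3, with all other coefficients close to zero.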

Online-Calculator: Fourier approximation

Polynomial approximation using the QR method

The linear least-squares problem is solved by QR decomposition. The calculator determines the coefficients of the n-th degree polynomial.

The starting point is the over-determined system of equations:

$A x = b$

with $x \in \mathbb{R}^{n}$, $b \in \mathbb{R}^{m}$ and $A \in \mathbb{R}^{m \times n}$, $m > n$

The QR decomposition leads to the factorization of the matrix A:

$A = Q R$

For the least-squares problem, the following holds:

${|| A x - b ||}_{2}^{2} = {|| Q R x - b ||}_{2}^{2} = {|| R^{*} x - Q^{T*} b ||}_{2}^{2}$

Here R and Q are reduced to their relevant parts: R^* is the upper triangular block of R, and Q^T* contains the corresponding rows of Q^T. The rows that are dropped contribute only a term that does not depend on x, so the minimizer is obtained from R^* x = Q^T* b.

Replacing A by the Vandermonde matrix built from the measured values x_i, and b by the measured values y_i, yields the coefficients of the best-fit polynomial as the solution of the system of equations.
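The procedure can be sketched in plain Python as follows. This example builds the reduced QR factorization with modified Gram-Schmidt orthogonalization, which is one of several ways to compute it and is an assumption here, since the source does not say which variant the calculator uses. The function name `polyfit_qr` and the data are illustrative. In practice a library routine based on Householder reflections is usually preferred for numerical stability.

```python
def polyfit_qr(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... + c[degree]*x**degree."""
    m, n = len(xs), degree + 1
    if m != len(ys) or m < n:
        raise ValueError("need at least degree + 1 (x, y) pairs of equal length")
    # Columns of the Vandermonde matrix A: column j holds x_i**j
    cols = [[x ** j for x in xs] for j in range(n)]
    q_cols = []                          # orthonormal columns of the reduced Q
    r = [[0.0] * n for _ in range(n)]    # upper triangular factor R*
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            # Modified Gram-Schmidt: project the already-updated vector v
            r[i][j] = sum(q * vv for q, vv in zip(q_cols[i], v))
            v = [vv - r[i][j] * q for vv, q in zip(v, q_cols[i])]
        r[j][j] = sum(vv * vv for vv in v) ** 0.5
        if r[j][j] == 0.0:
            raise ValueError("matrix is rank deficient (too few distinct x values?)")
        q_cols.append([vv / r[j][j] for vv in v])
    # Right-hand side Q^T* b
    rhs = [sum(q * y for q, y in zip(q_cols[i], ys)) for i in range(n)]
    # Back substitution for R* c = Q^T* b
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = rhs[i] - sum(r[i][k] * c[k] for k in range(i + 1, n))
        c[i] = s / r[i][i]
    return c


# Illustrative data generated from y = 1 - 2x + 0.5x^2
coeffs = polyfit_qr([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, -0.5, -1.0, -0.5, 1.0], degree=2)
print(coeffs)
```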

Online-Calculator: Polynomial fitting

Mean values and standard deviation

The arithmetic means

$\overline{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$

$\overline{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}$

Standard deviation from the mean

$σ = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(x_{i} - \overline{x})}^{2}}$

For the standard deviation about the regression line, the mean value is replaced by the corresponding function value of the straight line.
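The mean and the standard deviation with the factor 1/(n - 1) can be computed as in the following Python sketch. The function name `mean_and_std` and the data are illustrative assumptions. The standard library function `statistics.stdev` uses the same n - 1 normalization and serves as a cross-check.

```python
import math
import statistics


def mean_and_std(values):
    """Return the arithmetic mean and the sample standard deviation (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar, sigma = mean_and_std(data)
reference = statistics.stdev(data)   # library value for comparison
print(x_bar, sigma, reference)
```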

Weighted average and standard deviation

The weighted average μ is formed by multiplying each measured value x_i by its weight y_i and dividing the sum of these products by the sum of the weights.

$μ = \frac{\sum_{i = 1}^{n} x_{i} y_{i}}{\sum_{i = 1}^{n} y_{i}}$

In the standard deviation, the respective weights y_i must also be considered.

$σ = \sqrt{\frac{\sum_{i = 1}^{n} {(x_{i} - μ)}^{2} y_{i}}{\sum_{i = 1}^{n} y_{i}}}$

Related sites

Here is a list of further useful sites:

Calculator

The online calculator performs a least squares fit for the following functions: regression line, power function, polynomial, normal distribution and Fourier approximation. The measured values can be entered in a table or, alternatively, read in from a file. The parameters of the fitted function are calculated and the function is displayed graphically.

Curve fitting for: linear line, power function, polynomial, normal distribution, Fourier series Fourier series calculator

List of further sites:

Index, Trigonometric calculations, Normal Distribution Plot, NxN Gauss method, Derivation rules
