# Curve Fitting

Curve fitting constructs a curve from a data set so that data points not in the set can be approximated.

## Interpolation of Different Degree Polynomials

It is possible to fit a curve through a data set using a polynomial of any degree. The coefficients of the polynomial must be calculated; to do this, a matrix equation is formed and solved for the coefficients using Gaussian elimination.
Example:
Data:

| Displacement (m) | Velocity (m/s) |
|------------------|----------------|
| 0                | 0              |
| 2                | 1.3            |
| 4                | 3.7            |
| 6                | 8.1            |
| 8                | 13.9           |
| 10               | 20             |

A polynomial of degree 5 has 6 coefficients, so the 6 data points give 6 equations, each with a data point substituted for x and y.

a0 + a1(0)^1 + a2(0)^2 + a3(0)^3 + a4(0)^4 + a5(0)^5 = 0

a0 + a1(2)^1 + a2(2)^2 + a3(2)^3 + a4(2)^4 + a5(2)^5 = 1.3

...

a0 + a1(10)^1 + a2(10)^2 + a3(10)^3 + a4(10)^4 + a5(10)^5 = 20

These equations are then written in matrix form, with each row of the matrix holding the powers of one x value, and Gaussian elimination produces the final coefficient values.

This can be solved using the MATLAB command `>> A\b`, where A is the matrix of coefficients and b is the column vector of right-hand sides of the equations.

f(x) = 0.8125x - 0.251x^2 + 0.1021x^3 - 0.0091x^4 + 0.0003x^5
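The procedure above can be sketched in Python (a hedged illustration; `gaussian_elimination` is a helper written here for the example, not a library routine):

```python
def gaussian_elimination(A, b):
    """Solve A x = b by forward elimination with partial pivoting,
    then back substitution."""
    n = len(b)
    # Work on copies so the caller's data is untouched.
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        # Partial pivoting: swap in the row with the largest pivot.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

xs = [0, 2, 4, 6, 8, 10]           # displacement (m)
ys = [0, 1.3, 3.7, 8.1, 13.9, 20]  # velocity (m/s)

# Row i of the matrix is [1, x_i, x_i^2, ..., x_i^5] (a Vandermonde matrix).
V = [[x ** k for k in range(len(xs))] for x in xs]
coeffs = gaussian_elimination(V, ys)  # [a0, a1, ..., a5]

def f(x):
    """Evaluate the fitted degree-5 polynomial at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))
```

The fitted polynomial passes exactly through each of the 6 data points, which is the defining property of interpolation.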

## Lagrange Interpolating Polynomial

This type of interpolation uses the data points around the required approximated data point to calculate an equation. A 3rd-order Lagrange interpolating polynomial uses 4 data points (x0, y0), ..., (x3, y3):

f(x) = y0·L0(x) + y1·L1(x) + y2·L2(x) + y3·L3(x)

where each basis polynomial Li(x) is the product of (x - xj)/(xi - xj) over all j ≠ i, so that Li(x) equals 1 at xi and 0 at every other data point.

This method is disadvantageous as it requires a large amount of arithmetic, the whole calculation must be redone if the data points change, and the error is difficult to estimate.

## Least Square Regression

### Linear Regression

This type of curve fitting approximates the data as a straight line, y(x) = a + bx, using the given data points. Regression is different from interpolation in that the fitted curve is not required to pass through every data point, whereas interpolation requires the new graph to pass through each point.

Linear regression requires finding the coefficients, a and b, of the approximated line. These are found by solving the normal equations:

n·a + (Σx)·b = Σy

(Σx)·a + (Σx^2)·b = Σ(x·y)

where n is the number of data points.

First find the sums Σx, Σx^2, Σy and Σ(x·y) of the given data points. Assuming the data points are as follows:

| x    | 2    | 4    | 6     | 8     | 10    |
|------|------|------|-------|-------|-------|
| y(x) | 4000 | 8000 | 11000 | 13000 | 14000 |

Therefore, with n = 5, we get:

Σx = 30, Σx^2 = 220, Σy = 50000, Σ(x·y) = 350000

Substituting these into the normal equations gives

5a + 30b = 50000

30a + 220b = 350000

which can be solved using Gaussian elimination and back substitution to give a = 2500 and b = 1250, i.e. y(x) = 2500 + 1250x.

It should be noted that this approximation will give large errors if the data is not linear; such errors are expected and should not be alarming. This type of question is also the one most likely to be seen in tests.
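The whole least-squares calculation can be sketched in Python (a hedged illustration using the example data above; `linear_fit` is a helper written here, not a library function):

```python
def linear_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x,
    found by solving the 2x2 normal equations:
        n*a  + sx*b  = sy
        sx*a + sxx*b = sxy
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 system directly by Cramer's rule.
    det = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy * sxx - sx * sxy) / det
    return a, b

xs = [2, 4, 6, 8, 10]
ys = [4000, 8000, 11000, 13000, 14000]
a, b = linear_fit(xs, ys)  # a = 2500.0, b = 1250.0
```

Here a 2×2 system is small enough to solve by Cramer's rule rather than full Gaussian elimination, but the normal equations being solved are the same.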
