The cost function tells us how well the model is doing so we can try to improve it.

Model:

This is our training set for the model () :

: parameters (also referred to as coefficients / weights)

Parameters are the variables you can adjust during training to improve the model. Depending on the values you chose for , you’ll get a different function

Let’s graph different values for based on

You can intuit that w dictates the slope of f(x)

You want to find correct values for so that matches your training set.

is a training example. y is the target


The Squared Error Cost Function is one type of many cost functions that machine learning engineers could use depending on the model they are building. It is also the most commonly used Cost Function, especially for Linear Regression Models.

To build a cost function that doesn’t get too large based on the number of training examples we compute the average squared error instead of total squared error by dividing by

You can also write the cost function like this:

We can replace the prediction with

When using the model we want to find the parameters that best fit our training set.

We use the cost function for this.

The goal is to minimize which will give us our desired params

If we were to graph the squared cost function for a linear regression model with params it would look something like this, but slightly different based on training set. Both w and b are on the x axis below.

This is known as a contour plot, which is similar to topographical maps

The plot on the top right is the horizontal slices of the contour plot below. This is why the same J error rate is on the same colored slice.