part III: errors and error surfaces
when working with functions in machine learning, we often visualize data points in a graph. let’s consider a simple binary classification problem with four points whose labels follow the boolean OR of their coordinates (the only labeling consistent with the error counts and the zero-error region discussed below):
- (0,0) → class 0
- (0,1) → class 1
- (1,0) → class 1
- (1,1) → class 1
we want to find a linear boundary that separates the two classes. a common way to define such a boundary is with the condition:
w0 + w1 * x1 + w2 * x2 >= 0
this condition defines a line (or a hyperplane in higher dimensions): points that satisfy the inequality are assigned to one class, and points that do not are assigned to the other.
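to make the condition concrete, here is a minimal sketch in plain Python of how a point would be classified under this rule (the helper name `classify` and the particular weight values are illustrative, not from the text):

```python
# minimal sketch of the linear decision rule
def classify(w0, w1, w2, x1, x2):
    """return 1 (positive class) if w0 + w1*x1 + w2*x2 >= 0, else 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
# with w0 = -1, w1 = 1, w2 = 1 (one of the weight settings tried below),
# the rule reproduces the OR labels: 0, 1, 1, 1
for x1, x2 in points:
    print((x1, x2), "->", classify(-1, 1, 1, x1, x2))
```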
visualizing decision boundaries
the weight w0 acts as a bias: fixing w0 = -1 means a point is classified as positive exactly when w1*x1 + w2*x2 >= 1, i.e. the classification threshold is 1. let’s fix w0 = -1 and try different values for w1 and w2. for example:
- w1 = 1, w2 = 1 → decision boundary: -1 + 1*x1 + 1*x2 = 0
- w1 = -1, w2 = -1 → decision boundary: -1 - 1*x1 - 1*x2 = 0
- w1 = 1.5, w2 = 0 → decision boundary: -1 + 1.5*x1 + 0*x2 = 0
- w1 = 0.45, w2 = 0.45 → decision boundary: -1 + 0.45*x1 + 0.45*x2 = 0
for most of these choices, some points are misclassified. for example, when w1 = -1 and w2 = -1, three of the four points are classified incorrectly, while w1 = 1 and w2 = 1 classifies all four points correctly (these counts are reproduced in code after the table):
| w1 | w2 | errors |
|---|---|---|
| 1 | 1 | 0 |
| -1 | -1 | 3 |
| 1.5 | 0 | 1 |
| 0.45 | 0.45 | 3 |
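as a quick check, here is a minimal sketch in plain Python (the helper name `errors` is illustrative) that counts misclassified points for each candidate (w1, w2) with the bias fixed at w0 = -1, reproducing the table:

```python
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]  # boolean OR of the coordinates

def errors(w1, w2, w0=-1):
    """count points where w0 + w1*x1 + w2*x2 >= 0 disagrees with the label."""
    preds = [1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0 for x1, x2 in points]
    return sum(p != y for p, y in zip(preds, labels))

for w1, w2 in [(1, 1), (-1, -1), (1.5, 0), (0.45, 0.45)]:
    print(f"w1={w1:5}, w2={w2:5} -> {errors(w1, w2)} errors")
# prints 0, 3, 1, 3 errors respectively
```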
defining the error function
since different values of w1 and w2 lead to different numbers of errors, we can think of the error as a function of these weights. our goal is to find values of w1 and w2 that minimize the error.
by plotting the error for various values of w1 and w2, we obtain an error surface (a code sketch of such a plot follows the list below). this surface helps us visualize how different weight choices impact classification accuracy.
- areas where the error is high are shown in red.
- areas with lower errors are shown in yellow.
- the blue region represents weights that lead to zero classification errors.
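the snippet below sketches how such a surface could be generated: it evaluates the error count on a grid of (w1, w2) values with w0 fixed at -1. the grid range [-2, 2], the grid resolution, and the use of matplotlib are assumptions, not part of the text, and the actual colors depend on the chosen colormap rather than matching the red/yellow/blue scheme described above:

```python
# sketch: evaluate the error count on a grid of (w1, w2) values
# and plot the resulting error surface (grid range is an assumption)
import numpy as np
import matplotlib.pyplot as plt

points = np.array([(0, 0), (0, 1), (1, 0), (1, 1)])
labels = np.array([0, 1, 1, 1])  # boolean OR
w0 = -1.0  # fixed bias (threshold of 1)

w1s = np.linspace(-2, 2, 81)
w2s = np.linspace(-2, 2, 81)
err = np.zeros((len(w2s), len(w1s)))
for i, w2 in enumerate(w2s):
    for j, w1 in enumerate(w1s):
        preds = (w0 + w1 * points[:, 0] + w2 * points[:, 1] >= 0).astype(int)
        err[i, j] = np.sum(preds != labels)

plt.contourf(w1s, w2s, err, levels=[-0.5, 0.5, 1.5, 2.5, 3.5, 4.5])
plt.colorbar(label="number of errors")
plt.xlabel("w1"); plt.ylabel("w2")
plt.title("error surface with w0 fixed at -1")
plt.show()
```

with the OR labels and w0 = -1, the zero-error region is exactly the quadrant w1 >= 1 and w2 >= 1: the points (1,0) and (0,1) must each clear the threshold of 1, while (0,0) always falls below it.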
conclusion
understanding error surfaces is crucial in machine learning. optimization techniques like gradient descent navigate such a surface (in practice, a smooth surrogate of the raw error count, which is flat almost everywhere) to adjust the weights and find the best decision boundary. by minimizing error, we improve model accuracy and generalization.