SVM

We’ve already discussed the perceptron algorithm. However, this method has a few drawbacks:

It does not find the best separating hyperplane. Solution: large-margin classification.

The sigmoidal perceptron can only handle linearly separable data.

We shall tackle the first drawback now. The second drawback will be discussed in the next lecture.

Support Vector Machines

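Support vector machines give the perceptron algorithm some breathing space: rather than accepting any hyperplane that separates the classes, they require every point to lie on or beyond one of two displaced planes on either side of it.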

\[\begin{align} w^T\phi(x)+b \geq +1 & \text{ for } y(x) = +1 \\ w^T\phi(x)+b \leq -1 & \text{ for } y(x) = -1 \end{align}\]

In this case the margin, i.e. the distance between the two displaced planes $w^T\phi(x)+b = +1$ and $w^T\phi(x)+b = -1$, is $2/\vert\vert w\vert\vert$.
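
One way to see where $2/\vert\vert w\vert\vert$ comes from, as a short sketch: take any point $x_+$ on the plane $w^T\phi(x)+b = +1$ and any point $x_-$ on the plane $w^T\phi(x)+b = -1$ (the names $x_+$ and $x_-$ are introduced here only for illustration). Subtracting the two plane equations and projecting onto the unit normal $w/\vert\vert w\vert\vert$ gives the distance between the planes:

\[\begin{align} w^T\left(\phi(x_+)-\phi(x_-)\right) &= (+1-b)-(-1-b) = 2 \\ \frac{w^T\left(\phi(x_+)-\phi(x_-)\right)}{\vert\vert w\vert\vert} &= \frac{2}{\vert\vert w\vert\vert} \end{align}\]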

However, the points need not be linearly separable, in which case the constraints above cannot all hold. We therefore introduce a slack variable $\xi_i \geq 0$ for each point $x_i$, which measures how far that point is allowed to violate its constraint.

\[\begin{align} w^T\phi(x_i)+b \geq +1-\xi_i & \text{ for } y(x_i) = +1 \\ w^T\phi(x_i)+b \leq -1+\xi_i & \text{ for } y(x_i) = -1 \\ \xi_i \geq 0 & \text{ for all } i \end{align}\]
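
For a fixed $w$ and $b$, the smallest slack that satisfies the constraints above is $\xi_i = \max(0,\ 1 - y(x_i)(w^T\phi(x_i)+b))$. The following is a minimal sketch of that computation, assuming the identity feature map $\phi(x) = x$; the function name `slack` and the toy points are hypothetical and chosen only for illustration.

```python
def slack(w, b, X, y):
    """Smallest xi_i >= 0 with y_i * (w . x_i + b) >= 1 - xi_i.

    Assumes the identity feature map phi(x) = x (an assumption,
    not something fixed by the notes).
    """
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        slacks.append(max(0.0, 1.0 - y_i * score))
    return slacks


# Toy data (made up for illustration).
w = [1.0, 0.0]
b = 0.0
X = [
    [2.0, 0.0],   # y = +1, beyond the +1 plane: no slack needed
    [0.5, 1.0],   # y = +1, inside the margin: needs some slack
    [-1.0, 3.0],  # y = -1, exactly on the -1 plane: no slack needed
    [0.5, -2.0],  # y = -1, on the wrong side: slack greater than 1
]
y = [1.0, 1.0, -1.0, -1.0]

xi = slack(w, b, X, y)
total_slack = sum(xi)
```

A point that meets its margin constraint gets $\xi_i = 0$, a point inside the margin but correctly classified gets $0 < \xi_i \leq 1$, and a misclassified point gets $\xi_i > 1$.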

The SVM tries to maximize the margin $2/\vert\vert w\vert\vert$ while at the same time minimizing the total slack $\sum_i \xi_i$.
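
These two goals pull against each other, so they are usually combined into a single objective. One common way to write it is sketched below, assuming a trade-off constant $C > 0$ that these notes do not introduce, and using the fact that maximizing $2/\vert\vert w\vert\vert$ is equivalent to minimizing $\frac{1}{2}\vert\vert w\vert\vert^2$. The two constraint cases are merged by multiplying through by the label $y(x_i) \in \{+1, -1\}$.

\[\begin{align} \min_{w,\, b,\, \xi} \quad & \frac{1}{2}\vert\vert w\vert\vert^2 + C\sum_i \xi_i \\ \text{subject to} \quad & y(x_i)\left(w^T\phi(x_i)+b\right) \geq 1-\xi_i, \quad \xi_i \geq 0 \end{align}\]

Under this assumed formulation, a large $C$ penalizes slack heavily and pushes towards fitting every point, while a small $C$ favors a wider margin at the cost of more violations.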