SVM
We’ve discussed the perceptron algorithm already. However, this method has a few drawbacks:

Does not find the best separating hyperplane
Solution: large-margin classification

A sigmoidal perceptron can only handle linearly separable data
We shall tackle the first drawback now. The second drawback will be discussed in the next lecture.
Support Vector Machines
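SVMs give the perceptron's separating hyperplane some breathing space: instead of accepting any hyperplane that separates the two classes, we require every point to lie at least a fixed distance away from it, which yields a better separating plane.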
\[\begin{align} w^T\phi(x)+b \geq +1 & \text{ for } y(x) = +1 \\ w^T\phi(x)+b \leq -1 & \text{ for } y(x) = -1 \end{align}\]For such a case, the margin is given by $2/\vert\vert w\vert\vert$. The margin is the distance between the two displaced planes $w^T\phi(x)+b = +1$ and $w^T\phi(x)+b = -1$.
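With labels in $\{+1, -1\}$, the two constraints can be written as the single condition $y(x)\,(w^T\phi(x)+b) \geq 1$. The following is a minimal sketch of checking that condition and computing the margin width. The toy points, the weights `w` and `b`, and the choice of $\phi$ as the identity map are all assumptions made for illustration; they are not taken from the lecture.

```python
import math

def decision_value(w, b, phi_x):
    """Return w^T phi(x) + b for an already-computed feature vector phi_x."""
    return sum(wi * xi for wi, xi in zip(w, phi_x)) + b

def satisfies_hard_margin(w, b, points, labels):
    """Check y * (w^T phi(x) + b) >= 1 for every labelled point."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in zip(points, labels))

def margin_width(w):
    """Distance between the planes w^T phi(x) + b = +1 and w^T phi(x) + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical toy data; phi is assumed to be the identity map.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]
w, b = (0.5, 0.5), -1.0  # hand-picked, not learned

feasible = satisfies_hard_margin(w, b, points, labels)
width = margin_width(w)
print(feasible, width)
```

Here the points $(2, 2)$ and $(0, 0)$ lie exactly on the two displaced planes, so the computed width equals the distance between them along the direction of $w$.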
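However, the points need not be linearly separable, in which case no $w$ and $b$ can satisfy the above constraints for every point. We therefore introduce a slack variable $\xi_i \geq 0$ for each point $x_i$, which measures how far that point is allowed to violate its margin constraint.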
\[\begin{align} w^T\phi(x_i)+b \geq +1-\xi_i & \text{ for } y(x_i) = +1 \\ w^T\phi(x_i)+b \leq -1+\xi_i & \text{ for } y(x_i) = -1 \\ & \xi_i \geq 0 \\ \end{align}\]The SVM tries to maximize the margin while minimizing the total slack $\sum_i\xi_i$.
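For a fixed $w$ and $b$, the smallest slack that satisfies the constraints for point $x_i$ is $\xi_i = \max(0,\ 1 - y(x_i)(w^T\phi(x_i)+b))$. The sketch below computes these slacks and one common way of combining the two goals into a single objective, $\tfrac{1}{2}\vert\vert w\vert\vert^2 + C\sum_i\xi_i$. The trade-off constant `C`, the toy data, and the identity feature map are assumptions introduced here for illustration; the lecture text itself only states the two goals.

```python
def decision_value(w, b, phi_x):
    """Return w^T phi(x) + b for an already-computed feature vector phi_x."""
    return sum(wi * xi for wi, xi in zip(w, phi_x)) + b

def minimal_slacks(w, b, points, labels):
    """Smallest xi_i >= 0 with y_i * (w^T phi(x_i) + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - y * decision_value(w, b, x))
            for x, y in zip(points, labels)]

def soft_margin_objective(w, b, points, labels, C=1.0):
    """0.5 * ||w||^2 + C * sum(xi); C is an assumed trade-off constant."""
    slacks = minimal_slacks(w, b, points, labels)
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)

# Hypothetical toy data; the last point sits on the wrong side of the plane.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.5, 1.5)]
labels = [+1, +1, -1, -1]
w, b = (0.5, 0.5), -1.0  # hand-picked, not learned

slacks = minimal_slacks(w, b, points, labels)
objective = soft_margin_objective(w, b, points, labels, C=1.0)
print(slacks, objective)
```

Only the last point needs a non-zero slack, because it is the only one that violates its margin constraint. A larger `C` penalizes such violations more heavily, while a smaller `C` favors a wider margin.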