# Analysis of Perceptron Algorithm

The following claim can be proven mathematically.

**Claim.** If there exists a $w^*$ for which the given data is linearly separable, then the perceptron algorithm will converge to a value $\hat{w}$ which classifies the entire data correctly.

*Proof todo here*

## Stochastic Gradient Descent

In standard (batch) gradient descent, we compute the value of $\nabla \mathcal{E}(\Phi W, Y)$ over the entire dataset and update the weight vector accordingly.
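As a minimal sketch of this batch update (assuming NumPy, a squared-error choice of $\mathcal{E}$ for concreteness, and hypothetical names like `batch_gradient_descent`):

```python
import numpy as np

def batch_gradient_descent(Phi, Y, lr=0.1, epochs=2000):
    """Full-batch gradient descent: one update per pass, using the
    gradient of E(Phi W, Y) computed over ALL rows of Phi at once.
    Squared error is used here purely as an illustrative loss."""
    n, d = Phi.shape
    W = np.zeros(d)
    for _ in range(epochs):
        # Gradient of (1/2n) * ||Phi W - Y||^2 with respect to W
        grad = Phi.T @ (Phi @ W - Y) / n
        W -= lr * grad
    return W
```

Note that every iteration touches the whole design matrix $\Phi$, which is exactly the cost that stochastic gradient descent avoids.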

In stochastic gradient descent, we iterate over the data one example at a time and update the weight vector at the $i^{th}$ iteration according to $\nabla \mathcal{E}(W^\text{T}\phi_i, y_i)$. It can be seen that the perceptron update rule is exactly stochastic gradient descent with the **Hinge Loss Function**.
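A minimal sketch of this correspondence (assuming NumPy and a hypothetical `perceptron_sgd`; labels are $\pm 1$ and the learning rate is fixed at 1):

```python
import numpy as np

def perceptron_sgd(Phi, Y, epochs=20):
    """Perceptron viewed as SGD on the hinge-type loss max(0, -y * w.phi).
    Its gradient w.r.t. W is -y*phi when the example is misclassified and
    0 otherwise, so each SGD step W += y_i * phi_i is exactly the
    perceptron update rule."""
    W = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi_i, y_i in zip(Phi, Y):
            if y_i * (W @ phi_i) <= 0:  # loss is active: example misclassified
                W += y_i * phi_i        # perceptron update = SGD step, lr = 1
    return W
```

Note how the gradient vanishes on correctly classified examples, which is why the perceptron leaves the weights untouched in that case.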

### Deciding final weight vector

Simply using the weight vector from the last iteration is not always a good idea, because the process is iterative in nature and the final update may be atypical. Usually, one of these two methods is employed:

- **Voted Perceptron:** Take the weight vector which classified the most data points correctly.
- **Averaged Perceptron:** Calculate the weighted average of the weight vectors.
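The averaged variant can be sketched as follows (a minimal, hypothetical `averaged_perceptron` assuming NumPy; here every intermediate weight vector is weighted equally, i.e. by how many steps it survives):

```python
import numpy as np

def averaged_perceptron(Phi, Y, epochs=20):
    """Run the perceptron updates as usual, but return the average of the
    weight vector over all SGD steps instead of the final one."""
    W = np.zeros(Phi.shape[1])
    W_sum = np.zeros_like(W)
    n_steps = 0
    for _ in range(epochs):
        for phi_i, y_i in zip(Phi, Y):
            if y_i * (W @ phi_i) <= 0:
                W += y_i * phi_i   # usual perceptron update
            W_sum += W             # accumulate after every step
            n_steps += 1
    return W_sum / n_steps
```

Averaging smooths out the jumps the perceptron makes on late mistakes, which is the motivation given above for not trusting the final vector alone.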