Iterative Soft Thresholding Algorithm
This algorithm is used to fit the lasso regression model.
While the relative drop in the Lasso error between iterations $t=k$ and $t=k+1$ remains significant, the following two steps are repeated:
- LS Iterate: $w^{k+1}_{LS} = w^{k}_{Lasso} - \eta\nabla E_{LS}(w^{k}_{Lasso})$
- Proximal Step: if the absolute value of $\left[w^{k+1}_{LS}\right]_i$ is less than $\lambda\eta$, then the $i^{th}$ element of $w^{k+1}_{Lasso}$ is set to 0; otherwise, its magnitude is reduced by $\lambda\eta$ while keeping the sign intact. Compactly,
$\left[w^{k+1}_{Lasso}\right]_i = \operatorname{sign}\left(\left[w^{k+1}_{LS}\right]_i\right)\max\left(\left|\left[w^{k+1}_{LS}\right]_i\right| - \lambda\eta,\; 0\right)$
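As an illustration, here is a minimal NumPy sketch of this loop, assuming $E_{LS}(w) = \frac{1}{2}\|Xw - y\|^2$ for a design matrix `X` and targets `y`. The function name `ista` and the stopping tolerance are hypothetical choices, and $\eta$ must be small enough (roughly below $1/\|X^TX\|$) for the gradient step to be stable.

```python
import numpy as np

def ista(X, y, lam, eta, tol=1e-6, max_iter=1000):
    """Iterative soft thresholding for lasso: LS gradient step + proximal step."""
    w = np.zeros(X.shape[1])
    prev_err = np.inf
    for _ in range(max_iter):
        # LS iterate: gradient descent step on E_LS(w) = 0.5 * ||Xw - y||^2
        w_ls = w - eta * (X.T @ (X @ w - y))
        # Proximal step: soft-threshold each coordinate at lambda * eta
        w = np.sign(w_ls) * np.maximum(np.abs(w_ls) - lam * eta, 0.0)
        # Stop once the relative drop in the lasso error is no longer significant
        err = 0.5 * np.sum((X @ w - y) ** 2) + lam * np.sum(np.abs(w))
        if (prev_err - err) / max(prev_err, 1e-12) < tol:
            break
        prev_err = err
    return w
```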
Evaluating Performance
Method 1: Training Error
This idea is not good enough on its own. The training error of a regularized fit such as ridge regression will always be at least the ordinary least squares training error, since OLS minimizes that error by construction. More importantly, going by training error alone lets overfitting slip by.
Method 2: Test Error
Evaluate the model's loss on a separate dataset, held out from the training data. This is good for detecting whether overfitting is taking place.
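A small hypothetical illustration on synthetic data: fit OLS on a training split, then compare training and test mean squared errors. All the constants below are illustrative; a test error much larger than the training error suggests overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)

# Hold out the last 20 points as a test set, never shown to the fit
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]  # OLS on training data only

mse = lambda A, b: np.mean((A @ w - b) ** 2)
print(f"train MSE: {mse(X_train, y_train):.3f}")
print(f"test  MSE: {mse(X_test, y_test):.3f}")  # a large gap signals overfitting
```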
There tend to be three main sources of error:
- Bias - Difference between the true function and the average fitted function
- Variance - Deviation of the fits as the sample data changes
- Noise - Irreducible randomness in the observations
Bias-Variance Analysis
Before starting the analysis, we assume the following three points:
- Noise is Additive: $y = g(x) + \epsilon$
- Noise has mean 0 and variance $\sigma^2$; it need not be Gaussian
- We are performing a linear fit via OLS (Ordinary Least Squares)
Let $g$ be the true function, and $f$ be the model. If $\langle\hat{x},\hat{y}\rangle$ is a data point, then the expected value of the least squares error is:
$\mathcal{E}\left[(\hat{y} - f)^2\right] = (g - \bar{f})^2 + \mathcal{E}\left[(f - \bar{f})^2\right] + \sigma^2$
Here $f$ is $f(\hat{x})$, $g$ is $g(\hat{x})$, and $\bar{f}$ is $\mathcal{E}[f(\hat{x})]$. The three terms are the squared bias, the variance, and the noise; remember that $\sigma^2$ is the variance of the noise.
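One can check this decomposition empirically. The sketch below assumes a particular true function $g(x) = \sin x$ and a degree-1 OLS fit, refits on many freshly drawn datasets, and estimates the three terms at a fixed query point $\hat{x}$; all specific constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
g = lambda x: np.sin(x)     # assumed true function g
sigma = 0.3                 # noise standard deviation
x0 = 1.0                    # the fixed query point x-hat

# Fit a line via OLS on many independently drawn datasets
preds = []
for _ in range(5000):
    x = rng.uniform(0.0, 2 * np.pi, size=30)
    y = g(x) + rng.normal(scale=sigma, size=30)
    coef = np.polyfit(x, y, deg=1)       # linear fit via OLS
    preds.append(np.polyval(coef, x0))   # f(x-hat) for this dataset
preds = np.array(preds)

f_bar = preds.mean()                  # f-bar = E[f(x-hat)]
bias_sq = (g(x0) - f_bar) ** 2        # (g - f-bar)^2
variance = preds.var()                # E[(f - f-bar)^2]
noise = sigma ** 2                    # sigma^2
print(bias_sq, variance, noise)
print(bias_sq + variance + noise)     # approximates E[(y-hat - f)^2] at x-hat
```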
We divide the dataset into three categories:
- Train: to train the model
- Validation: to tune the model’s hyperparameters
- Test: for final testing
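A minimal sketch of such a split, assuming NumPy arrays; the helper name `three_way_split` and the 70/15/15 fractions are hypothetical choices, not prescribed by the notes.

```python
import numpy as np

def three_way_split(X, y, frac_train=0.7, frac_val=0.15, seed=0):
    """Shuffle and split a dataset into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                  # random order over all samples
    n_train = int(frac_train * len(X))
    n_val = int(frac_val * len(X))
    # Everything beyond train + validation becomes the test set
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```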