Iterative Soft Thresholding Algorithm
This algorithm is used to fit the lasso regression model.
While the relative drop in the Lasso error between iterations $t=k$ and $t=k+1$ remains significant, the following two steps are repeated:
- LS Iterate: $w^{k+1}_{LS} = w^{k}_{Lasso} - \eta\nabla E_{LS}(w^{k}_{Lasso})$
- Proximal Step: if the absolute value of $\left[w^{k+1}_{LS}\right]_i$ is less than $\lambda\eta$, then the $i^{th}$ element of $w^{k+1}_{Lasso}$ is set to 0; otherwise, its magnitude is reduced by $\lambda\eta$ while keeping the sign intact. Compactly,
$\left[w^{k+1}_{Lasso}\right]_i = \operatorname{sign}\left(\left[w^{k+1}_{LS}\right]_i\right)\max\left(\left|\left[w^{k+1}_{LS}\right]_i\right| - \lambda\eta,\; 0\right)$
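As an illustration, here is a minimal NumPy sketch of this loop, assuming $E_{LS}(w) = \frac{1}{2}\|Xw - y\|^2$ for a design matrix `X` and targets `y`. The function name `ista` and the stopping tolerance are hypothetical choices, and $\eta$ must be small enough (roughly below $1/\|X^TX\|$) for the gradient step to be stable.

```python
import numpy as np

def ista(X, y, lam, eta, tol=1e-6, max_iter=1000):
    """Iterative soft thresholding for lasso: LS gradient step + proximal step."""
    w = np.zeros(X.shape[1])
    prev_err = np.inf
    for _ in range(max_iter):
        # LS iterate: gradient descent step on E_LS(w) = 0.5 * ||Xw - y||^2
        w_ls = w - eta * (X.T @ (X @ w - y))
        # Proximal step: soft-threshold each coordinate at lambda * eta
        w = np.sign(w_ls) * np.maximum(np.abs(w_ls) - lam * eta, 0.0)
        # Stop once the relative drop in the lasso error is no longer significant
        err = 0.5 * np.sum((X @ w - y) ** 2) + lam * np.sum(np.abs(w))
        if (prev_err - err) / max(prev_err, 1e-12) < tol:
            break
        prev_err = err
    return w
```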
Evaluating Performance
Method 1: Training Error
This idea is not good enough on its own. The training error of a regularized fit such as ridge regression will always be at least the ordinary least squares training error, since OLS minimizes that error by construction. More importantly, going by training error alone lets overfitting slip by.
Method 2: Test Error
Evaluate the model's loss on a separate dataset, held out from the training data. This is good for detecting whether overfitting is taking place.
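A small hypothetical illustration on synthetic data: fit OLS on a training split, then compare training and test mean squared errors. All the constants below are illustrative; a test error much larger than the training error suggests overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + rng.normal(scale=0.5, size=100)

# Hold out the last 20 points as a test set, never shown to the fit
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]  # OLS on training data only

mse = lambda A, b: np.mean((A @ w - b) ** 2)
print(f"train MSE: {mse(X_train, y_train):.3f}")
print(f"test  MSE: {mse(X_test, y_test):.3f}")  # a large gap signals overfitting
```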
There tend to be three main sources of error:
- Bias - Difference between the true function and the average fitted function
- Variance - Deviation of the fits as the sample data changes
- Noise - Irreducible randomness in the observations
Bias-Variance Analysis
Before starting the analysis, we assume the following three points:
- Noise is Additive: $y = g(x) + \epsilon$
- Noise has mean 0 and variance $\sigma^2$; it need not be Gaussian
- We are performing a linear fit via OLS (Ordinary Least Squares)
Let $g$ be the true function, and $f$ be the model. If $\langle\hat{x},\hat{y}\rangle$ is a data point, then the expected value of the least squares error is:
$\mathcal{E}\left[(\hat{y} - f)^2\right] = (g - \bar{f})^2 + \mathcal{E}\left[(f - \bar{f})^2\right] + \sigma^2$
Here $f$ is $f(\hat{x})$, $g$ is $g(\hat{x})$, and $\bar{f}$ is $\mathcal{E}[f(\hat{x})]$. The three terms are the squared bias, the variance, and the noise; remember that $\sigma^2$ is the variance of the noise.
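One can check this decomposition empirically. The sketch below assumes a particular true function $g(x) = \sin x$ and a degree-1 OLS fit, refits on many freshly drawn datasets, and estimates the three terms at a fixed query point $\hat{x}$; all specific constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
g = lambda x: np.sin(x)     # assumed true function g
sigma = 0.3                 # noise standard deviation
x0 = 1.0                    # the fixed query point x-hat

# Fit a line via OLS on many independently drawn datasets
preds = []
for _ in range(5000):
    x = rng.uniform(0.0, 2 * np.pi, size=30)
    y = g(x) + rng.normal(scale=sigma, size=30)
    coef = np.polyfit(x, y, deg=1)       # linear fit via OLS
    preds.append(np.polyval(coef, x0))   # f(x-hat) for this dataset
preds = np.array(preds)

f_bar = preds.mean()                  # f-bar = E[f(x-hat)]
bias_sq = (g(x0) - f_bar) ** 2        # (g - f-bar)^2
variance = preds.var()                # E[(f - f-bar)^2]
noise = sigma ** 2                    # sigma^2
print(bias_sq, variance, noise)
print(bias_sq + variance + noise)     # approximates E[(y-hat - f)^2] at x-hat
```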
We divide the dataset into three categories:
- Train: to train the model
- Validation: to tune the model’s hyperparameters
- Test: for final testing
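A minimal sketch of such a split, assuming NumPy arrays; the helper name `three_way_split` and the 70/15/15 fractions are hypothetical choices, not prescribed by the notes.

```python
import numpy as np

def three_way_split(X, y, frac_train=0.7, frac_val=0.15, seed=0):
    """Shuffle and split a dataset into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                  # random order over all samples
    n_train = int(frac_train * len(X))
    n_val = int(frac_val * len(X))
    # Everything beyond train + validation becomes the test set
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```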