# Linear Regression

Regression is about learning to predict a set of output (*dependent*) variables as a function of input (*independent*) variables.

Consider the inputs to be of the form $\langle x_i, y_i \rangle$. *Attributes* of $x$ are (possibly non-linear) functions $\phi$ which operate on $x$. The form of the equation that linear regression optimizes is:

$$y = W^T \Phi(x) + b = \sum_j w_j \phi_j(x) + b$$

where $\Phi$ is a vector of all attributes, and $W$ of all weights.

Do note that $b$ can be dropped by defining $\widetilde{W}$ and $\widetilde{\Phi}$ with one additional element, $b$ and $1$ respectively, so that $\widetilde{W}^T \widetilde{\Phi}(x) = W^T \Phi(x) + b$.

Linear regression is linear in terms of weights and attributes, and (generally) non-linear in terms of $x$ owing to $\Phi$.

For example, $\phi_1$ could be the date of an investment, $\phi_2$ the value of the investment, and so on.

There are general classes of basis functions, such as:

- Radial basis functions
- Wavelet functions
- Fourier basis
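As an illustration (not part of the original notes), a Gaussian radial basis mapping can be sketched in NumPy; the centers and the width parameter `gamma` here are arbitrary choices:

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Radial basis attributes: phi_j(x) = exp(-gamma * (x - c_j)^2)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)              # shape (m, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)  # shape (1, n)
    return np.exp(-gamma * (x - centers) ** 2)                 # shape (m, n)

# Each input is mapped to one attribute per center.
phi = rbf_features([0.0, 0.5, 1.0], centers=[0.0, 1.0])
print(phi.shape)  # (3, 2)
```

Each row of `phi` is the attribute vector $\Phi(x_i)$ for one input; an input sitting exactly on a center contributes an attribute value of $1$ there.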

### Formal Notation

Dataset $\mathcal{D} = \{\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle\}$

Attribute/basis functions are $\phi_1, \ldots, \phi_n$, and the general $\Phi$ is redefined as the $m \times n$ *design matrix* with entries $\Phi_{ij} = \phi_j(x_i)$:

$$\Phi = \begin{bmatrix} \phi_1(x_1) & \cdots & \phi_n(x_1) \\ \vdots & \ddots & \vdots \\ \phi_1(x_m) & \cdots & \phi_n(x_m) \end{bmatrix}$$

Do note that we have redefined the value of $\Phi$ now, and we shall be using this definition from here on.

With the above redefinition, the linear equation for a given $W$ becomes $Y = \Phi W$, where $Y$ is the vector of all outputs $(y_1, \ldots, y_m)^T$.
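The matrix form $Y = \Phi W$ can be sketched numerically. Polynomial attributes $\phi_j(x) = x^j$ are a hypothetical choice here, with $\phi_0(x) = 1$ absorbing the bias into the weight vector:

```python
import numpy as np

# Hypothetical polynomial attributes phi_j(x) = x**j; phi_0(x) = 1 absorbs
# the bias b into the weight vector (the tilde trick above).
def design_matrix(x, degree):
    x = np.asarray(x, dtype=float)
    return np.vstack([x ** j for j in range(degree + 1)]).T  # shape (m, degree + 1)

x = np.array([0.0, 1.0, 2.0])
Phi = design_matrix(x, degree=2)  # row i is [1, x_i, x_i**2]
W = np.array([1.0, 2.0, 3.0])     # weights [b, w_1, w_2]
Y = Phi @ W                       # y_i = 1 + 2*x_i + 3*x_i**2
print(Y)                          # values 1, 6, 17
```

One matrix-vector product evaluates the model on the whole dataset at once, which is exactly what the closed-form analysis below exploits.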

**General regression** would be to find $\hat{f}$ such that:

$$\hat{f} = \arg\min_{f} \sum_{i=1}^{m} \text{error}(f(x_i), y_i)$$

where the minimization is over all possible functions $f$.

**Parameterized regression** is a bit more complex: $f$ is restricted to a parameterized family $f(\phi(x), w, b)$, and the weights in the above definition are optimized to minimize the error.

The error function determines the type of regression. Some examples are given below. These will be discussed later in the course.

- Least Squares Regression
- Ridge Regression
- Logistic Regression

## Least Squares Solution

Formally, the solution is given by:

$$W^* = \arg\min_{W} \|Y - \Phi W\|^2$$

If the “true” relation between $X$ and $Y$ were linear in the attributes, then zero error is attainable. That is, some $W$ with $Y = \Phi W$ exists, or equivalently **$Y$ belongs to the column space of $\Phi$**. We can then simply solve the linear system to get the optimal value of $W$.

If $Y$ is not in the column space of $\Phi$, the closed-form solution for the optimal weights $W^*$ is given by:

$$W^* = (\Phi^T \Phi)^{-1} \Phi^T Y$$

Do note that $\Phi^T\Phi$ is invertible iff $\Phi$ has full column rank. That is:

- All columns of $\Phi$ are linearly independent of each other
- This condition is data driven: the columns are the basis functions evaluated on the dataset, so a particular dataset can make them linearly dependent even when the basis functions themselves are distinct
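A minimal numerical sketch of the closed-form solution, assuming NumPy and a randomly generated $\Phi$ (which has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 3))  # random design matrix; full column rank almost surely
Y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)  # noisy targets

# Normal equations W* = (Phi^T Phi)^{-1} Phi^T Y, solved without forming the
# explicit inverse (np.linalg.solve is cheaper and more numerically stable).
W_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# Cross-check against NumPy's least-squares routine.
W_lstsq, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print(np.allclose(W_star, W_lstsq))  # True
```

Solving the linear system rather than inverting $\Phi^T\Phi$ outright is the usual practical choice; the formula with the explicit inverse is for analysis.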

It can be proven that Gradient Descent converges to the same solution as well.
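A sketch of this claim, comparing plain gradient descent on the squared error against the closed-form solution (the learning rate is chosen as $1/L$ for the gradient's Lipschitz constant $L$, a standard choice not specified in the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(50, 2))
Y = Phi @ np.array([3.0, -1.0])  # noiseless targets, so W* = [3, -1]

# Closed-form optimum for comparison.
W_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# Plain gradient descent on the squared error ||Y - Phi W||^2.
# Step size 1/L, where L = 2 * lambda_max(Phi^T Phi) is the gradient's
# Lipschitz constant, guarantees convergence.
lr = 1.0 / (2.0 * np.linalg.eigvalsh(Phi.T @ Phi).max())
W = np.zeros(2)
for _ in range(2000):
    grad = 2.0 * Phi.T @ (Phi @ W - Y)  # gradient of the squared error
    W -= lr * grad

print(np.allclose(W, W_star))  # True
```

Because the squared-error loss is convex in $W$, gradient descent with a small enough step size has a single optimum to converge to, matching the normal-equations solution.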