Neural Networks
We’ll be tackling non-linear classification now. We’ve already discussed the use of kernels for this purpose; however, selecting a proper kernel for a given problem requires domain knowledge, which limits its applicability.
Neural networks, on the other hand, act as universal function approximators and do not require as much domain knowledge.
In general, a series of mappings is used to obtain the desired result:
$$ x \xrightarrow{f} y \xrightarrow{g} z \xrightarrow{h} \{c_1, \ldots, c_k\} $$
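As a concrete illustration, here is a minimal NumPy sketch of such a chain of mappings; the layer sizes, random weights, and choice of ReLU are arbitrary assumptions for the example, not something fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes for illustration: 4 input features, two hidden
# representations of size 5, and k = 3 output classes.
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # f: x -> y
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)   # g: y -> z
W3, b3 = rng.normal(size=(3, 5)), np.zeros(3)   # h: z -> class scores

def relu(s):
    return np.maximum(0.0, s)

def predict(x):
    y = relu(W1 @ x + b1)          # first mapping  f
    z = relu(W2 @ y + b2)          # second mapping g
    scores = W3 @ z + b3           # final mapping  h
    return int(np.argmax(scores))  # index of the predicted class in {c_1, ..., c_k}

x = rng.normal(size=4)
print(predict(x))
```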
Activation Functions
$$ f(x) = g(w^T x) $$
Here, $g$ is called the activation function. Some examples of activation functions are:
- Sigmoid
- Tanh
- Linear
- ReLU: $\max(0, s)$
- Softplus: $\log(1 + e^s)$, the “differentiable version” of ReLU
The problem with sigmoid and tanh is that their outputs are bounded, so they saturate for large inputs; ReLU and Softplus do not have this problem.
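A small NumPy sketch of these activations, evaluated at a few arbitrary test points, makes the saturation behaviour concrete.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def relu(s):
    return np.maximum(0.0, s)

def softplus(s):
    # log(1 + e^s): a smooth ("differentiable") version of ReLU
    return np.log1p(np.exp(s))

s = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for name, g in [("sigmoid", sigmoid), ("tanh", np.tanh),
                ("relu", relu), ("softplus", softplus)]:
    print(f"{name:9s}", np.round(g(s), 3))

# sigmoid and tanh flatten out (saturate) as |s| grows,
# while ReLU and Softplus keep growing with s.
```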
VC Dimensions
The cardinality of the largest set of points that $f_w$ can shatter is its VC dimension. A function $f_w$ is said to shatter a given set of points if, for every assignment of labels to those points, there exists a $w$ such that $f_w$ classifies them perfectly.
The VC dimension of a linear separator in $\mathbb{R}^2$ is 3.
The VC dimension of a threshold classifier in $\mathbb{R}$ is 1.
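To make shattering concrete, here is a brute-force check of the linear-separator claim; it tests each labelling for linear separability by solving a feasibility LP (find $w, b$ with $y_i(w^T x_i + b) \ge 1$). SciPy is assumed to be available, and the two point sets are arbitrary examples of my choosing.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    # Feasibility LP: does some (w, b) satisfy labels_i * (w . x_i + b) >= 1 for all i?
    A_ub = np.array([-y * np.append(x, 1.0) for x, y in zip(points, labels)])
    b_ub = -np.ones(len(points))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(points):
    # Every possible +/-1 labelling must be linearly separable.
    return all(separable(points, labels)
               for labels in product([-1, 1], repeat=len(points)))

three = [np.array(p, float) for p in [(0, 0), (1, 0), (0, 1)]]
four = [np.array(p, float) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(shattered(three))  # True: 3 points in general position can be shattered
print(shattered(four))   # False: e.g. the XOR labelling is not separable
```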
Designing Neural Networks
Neural networks have great expressive power owing to:
- Non-linearity of the activation functions
- Cascading of these non-linear activations across layers (see the sketch below)
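A quick NumPy check of the second point: without a non-linearity in between, a cascade of linear layers collapses to a single linear map. The matrices below are arbitrary random examples.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two cascaded *linear* layers collapse into the single linear map W2 @ W1 ...
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True

# ... but a non-linearity (here ReLU) between the layers breaks the collapse,
# which is what gives the cascade its extra expressive power.
print(np.allclose(W2 @ np.maximum(0.0, W1 @ x), (W2 @ W1) @ x))  # False (almost surely)
```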
There are four main design choices to consider while coding up a neural network:
- Input layer
- Number of hidden layers and number of nodes per hidden layer
- Output layer
- Loss function
Do note that the activations at the hidden layers are not directly observed (no target values are given for them), even during training.
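Putting the four choices together, here is a minimal NumPy sketch of a one-hidden-layer classifier with a softmax output and cross-entropy loss; all sizes, and the particular output layer and loss, are illustrative assumptions rather than prescriptions from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four design choices (sizes here are illustrative assumptions):
n_in, n_hidden, n_classes = 4, 8, 3   # input layer, hidden layer width, output layer

W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_classes, n_hidden)), np.zeros(n_classes)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)   # hidden activations: never directly supervised
    scores = W2 @ h + b2               # output layer: one score per class
    scores = scores - scores.max()     # shift for numerical stability
    return np.exp(scores) / np.exp(scores).sum()   # softmax probabilities

def cross_entropy(probs, label):
    # Loss function: negative log-probability assigned to the true class.
    return -np.log(probs[label])

x, label = rng.normal(size=n_in), 2
probs = forward(x)
print(probs, cross_entropy(probs, label))
```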