Overfitting in Neural Networks

By the universal approximation theorem, a single hidden layer of sufficient size can represent any smooth function to any desired accuracy. This expressive power, however, is precisely what makes neural networks prone to overfitting.

We can counter this with methods we have already learned, such as L1 and L2 regularization. For neural networks, however, we can also avoid overfitting through smart network design. More specifically, we shall look at the use of Convolutional Neural Networks for image-related tasks.
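As a quick illustration, here is a minimal PyTorch-style sketch of both penalties; the layer sizes and coefficients are illustrative choices, not values from these notes. L2 regularization is applied through the optimizer's `weight_decay` argument, while an L1 penalty is added to the loss by hand.

```python
import torch
import torch.nn as nn

# Illustrative one-hidden-layer network (sizes are arbitrary)
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# L2 regularization: weight_decay adds an L2 penalty to every weight update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# L1 regularization: add the penalty to the loss explicitly
def l1_penalty(model, lam=1e-5):
    return lam * sum(p.abs().sum() for p in model.parameters())
```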

 

Convolutional Neural Networks

A convolutional layer is described by its filters, the stride with which they slide over the input, and the padding applied at the borders.

We would like to learn the weights of each filter. Convolution improves upon fully connected layers in the following ways:

  • sparse interactions
  • parameter sharing
  • equivariant representations, with $f(g(x)) = g(f(x))$ where $f$ is convolution and $g$ is a shift of the input (a toy numerical check follows this list)
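The check below verifies equivariance numerically, using a circular shift so that boundary effects do not interfere; the signal, filter weights, and shift amount are arbitrary choices for the demo.

```python
import numpy as np
from scipy.ndimage import convolve1d

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])   # input signal
w = np.array([0.25, 0.5, 0.25])                 # filter weights

f = lambda s: convolve1d(s, w, mode='wrap')     # f: (circular) convolution
g = lambda s: np.roll(s, 2)                     # g: shift by two positions

# Convolving the shifted input equals shifting the convolved input
assert np.allclose(f(g(x)), g(f(x)))
```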

 

Number of parameters

Let the input be of size $M\times N\times D$, upon which convolution is performed using $C$ filters (patches) of size $P\times Q$, each spanning the full depth $D$, with stride $S$ and padding $p$.

  1. Number of parameters: \((P\cdot Q\cdot D)\times C\) (plus \(C\) bias terms, if biases are used)

  2. Number of Neurons: \(\left(\frac{M-P+2p}{S}+1\right)\left(\frac{N-Q+2p}{S}+1\right)(C)\), taking the floor of each fraction when the stride does not divide evenly
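These two formulas are easy to check mechanically. Below is a small helper written for these notes (not a library function) that applies them; the arguments follow the notation above, with padding written as \(p\).

```python
def conv_layer_sizes(M, N, D, P, Q, S, p, C):
    """Parameter and neuron counts for one convolutional layer."""
    n_params = (P * Q * D) * C          # weights only; add C if biases are used
    out_h = (M - P + 2 * p) // S + 1    # output height
    out_w = (N - Q + 2 * p) // S + 1    # output width
    return n_params, out_h * out_w * C

# Example: 32x32x3 input, ten 5x5 filters, stride 1, no padding
print(conv_layer_sizes(32, 32, 3, 5, 5, 1, 0, 10))   # -> (750, 7840)
```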