Dropout From Scratch
Why Do We Need Dropout?
‣ Dropout is a regularization technique used to deal with overfitting.
‣ Overfitting is the state in which the model learns the noise of the training data instead of the underlying pattern.
‣ An overfitted model performs well on the training data but poorly on validation/test data.
Defining Dropout
Randomly selected neurons of a layer are switched off during training so that the model cannot memorize the noise in the training data; this whole process is called Dropout.
Process of Selecting Neurons
When defining the model architecture, we specify a rate for each dropout layer: the fraction of that layer's neurons to switch off during training.
Here is a sample with rate=0.5:
x=Dropout(0.5)(x)
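To see what rate=0.5 means in practice, here is a minimal NumPy sketch (the array size and seed are illustrative, not from the original) that masks out roughly half of a layer's activations:

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed, purely illustrative
activations = np.ones(10_000, dtype=np.float32)  # stand-in for a layer's output
rate = 0.5                                       # fraction of neurons to drop

# Each neuron survives with probability (1 - rate)
keep_mask = rng.uniform(0.0, 1.0, activations.shape) > rate
dropped = keep_mask.astype(np.float32) * activations

print(f"fraction zeroed: {(dropped == 0).mean():.2f}")  # close to 0.50
```

With 10,000 neurons the zeroed fraction lands very close to the rate; with only 10 neurons it can vary a lot from batch to batch.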
Edge Cases while Implementing From scratch
Here we will walk through the different values the rate can take to see how dropout actually works.
Case-1, rate = 0
If the rate is zero, no neuron gets switched off.
The layer returns all neurons as they are.
if rate == 0: return tensor
Case-2, rate = 1
If the rate is one, every neuron gets switched off.
The layer returns zeros for all neurons.
if rate == 1: return tf.zeros_like(tensor)
Case-3, 0<rate<1
If the rate is between 0 and 1, then that fraction of neurons is switched off at random.
For example, with a rate of 0.5 on 10 neurons, about 5 neurons get switched off on average.
mask = np.random.uniform(0, 1, tensor.shape) > rate
return mask.astype(np.float32) * tensor / (1.0 - rate)
The surviving neurons are scaled by 1/(1 - rate) (inverted dropout), so the expected value of the layer's output stays the same as it would be without dropout.
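Putting the three cases together, a from-scratch dropout function might look like the NumPy-only sketch below. The `training` flag is an addition not in the snippets above; it reflects the fact that dropout is disabled at inference time.

```python
import numpy as np

def dropout(tensor, rate, training=True):
    """Inverted dropout: zero out a `rate` fraction of values during training."""
    tensor = np.asarray(tensor, dtype=np.float32)

    if not training or rate == 0:        # Case-1: nothing to drop
        return tensor
    if rate == 1:                        # Case-2: everything dropped
        return np.zeros_like(tensor)

    # Case-3: drop a random fraction, rescale survivors by 1/(1 - rate)
    mask = np.random.uniform(0, 1, tensor.shape) > rate
    return mask.astype(np.float32) * tensor / (1.0 - rate)

x = np.ones((4, 5), dtype=np.float32)
print(dropout(x, 0.0))   # unchanged
print(dropout(x, 1.0))   # all zeros
print(dropout(x, 0.5))   # roughly half zeros, survivors scaled to 2.0
```

Because of the 1/(1 - rate) scaling, each surviving 1.0 becomes 2.0 at rate 0.5, so the mean activation stays close to 1.0 on average, matching the no-dropout case.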