DCGAN


As we explained in the previous blog about GAN and the mathematics behind it, DCGAN is an improved version of GAN that provides ways to train GANs effectively. The loss function and training algorithm in DCGAN are the same as in the original GAN (Ian Goodfellow). The changes DCGAN brought to GAN training were:


  • Replaced fully connected layers with strided convolutional layers in the discriminator and strided deconvolutional (fractionally-strided convolutional) layers in the generator, and concatenated the inputs with their respective labels (conditioning with class labels).
  • Applied batchnorm in all layers of the generator and discriminator except the first layer of the discriminator and the last layer of the generator, to avoid sample oscillation and model instability.
  • Applied ReLU activation in all layers of the generator except the output layer, which uses tanh.
  • Applied Leaky ReLU activation in all layers of the discriminator.
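The activation choices above can be sketched in NumPy (the function names and the example inputs here are illustrative, not from the paper's code; 0.2 is the leaky slope used in the DCGAN paper):

```python
import numpy as np

def relu(x):
    # Generator hidden layers: clip negatives to zero.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.2):
    # Discriminator layers: scale negatives instead of clipping them,
    # so gradients still flow for negative inputs.
    return np.where(x > 0, x, slope * x)

def tanh_out(x):
    # Generator output layer: squash into (-1, 1) to match images
    # normalized to that range.
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives become 0
print(leaky_relu(x))  # negatives scaled by 0.2
print(tanh_out(x))    # values squashed into (-1, 1)
```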

The idea behind labelling the discriminator and generator is derived from Conditional GAN. In a conditional GAN, both the generator and discriminator are conditioned on some extra information y, where y can be class labels, annotations, tags, or data from other modalities. The conditioning is performed by feeding y into both the generator and discriminator through an additional input layer.
The min-max game with value function V(D, G) then becomes:

    min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))]

where V is the value function.
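A toy NumPy sketch of the two ideas above: conditioning by concatenating a one-hot label y onto the input, and estimating the value function by replacing the expectations with batch means. The discriminator outputs here are made-up stand-ins for D(x|y) and D(G(z|y)), not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(labels, num_classes):
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Conditioning: feed y by concatenating it with x (or z) along the feature axis.
x = rng.normal(size=(4, 8))                    # toy "real" samples
y = one_hot(np.array([0, 1, 2, 1]), num_classes=3)
x_cond = np.concatenate([x, y], axis=1)        # shape (4, 11)

# Stand-in discriminator outputs in (0, 1) for real and generated samples.
d_real = np.array([0.9, 0.8, 0.95, 0.7])       # D(x|y)
d_fake = np.array([0.1, 0.3, 0.2, 0.25])       # D(G(z|y))

# Monte-Carlo estimate of V(D, G): expectations become batch means.
V = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
print(x_cond.shape, V)
```

The discriminator tries to make V large (d_real near 1, d_fake near 0), while the generator tries to make it small.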

Batch Normalization 


In a neural network, the output of one layer is the input to the next connected layer, and this continues until the last layer; at every layer an activation function transforms the input. Take, for example, a layer with a sigmoid activation function z = f(Wu + b), where u is the layer input, the weight matrix W and bias b are the layer parameters, and f(x) = 1/(1 + exp(−x)). As |x| increases, f′(x) tends to zero, which creates the vanishing gradient problem.
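The vanishing gradient of the sigmoid is easy to see numerically. Using the identity f′(x) = f(x)(1 − f(x)), the gradient peaks at 0.25 at x = 0 and shrinks rapidly as |x| grows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))  # the gradient shrinks toward 0 as |x| grows
```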

To avoid this problem, batch normalization was introduced.

It reduces internal covariate shift, which refers to the change in the distributions of the internal nodes of a neural network during training.

Batch Normalization is implemented in two steps.
In the first step, normalize each feature to zero mean and unit variance. For a layer with n-dimensional input x = (x^(1), x^(2), ..., x^(n)), each dimension is normalized as:

    x̂^(k) = (x^(k) − E[x^(k)]) / √Var[x^(k)]

where the mean and variance are computed over the training dataset.
This is done because the distribution of the input to each layer differs from that of the previous layer.
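The normalization step can be sketched in NumPy (the data here is synthetic; a small epsilon is added inside the square root for numerical stability, as is standard in practice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # batch of 256, 4 features

mean = x.mean(axis=0)                    # E[x^(k)], one value per feature
var = x.var(axis=0)                      # Var[x^(k)], one value per feature
x_hat = (x - mean) / np.sqrt(var + 1e-5)  # normalized features

print(x_hat.mean(axis=0))  # approximately 0 for each feature
print(x_hat.var(axis=0))   # approximately 1 for each feature
```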

Note that simply normalizing each input of a layer may change what the layer can represent. To make sure the transformation inserted into the network can represent the identity transform, we apply an affine transformation to the normalized values, with parameters 𝜸(k) and 𝞫(k) for each x^(k): y^(k) = 𝜸(k) x̂^(k) + 𝞫(k)
             Setting 𝜸(k) = √Var[x^(k)] and 𝞫(k) = E[x^(k)] recovers the original activations.

These parameters are learned along with the original model parameters, and they can restore the representational power of the network.

In the second step, note that using the whole training set to normalize activations would be impractical. So normalization is applied over mini-batches, where each mini-batch produces an estimate of the mean and variance for each activation.
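Putting both steps together, a minimal mini-batch batch-norm forward pass looks like the sketch below (a simplification: it omits the running statistics a real implementation keeps for inference). It also checks the identity-transform property: with 𝜸 = √Var[x] and 𝞫 = E[x], the output recovers the input:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Mini-batch estimates of mean and variance, then normalize
    # and apply the learnable affine transform.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))  # one mini-batch

# With gamma = sqrt(Var[x]) and beta = E[x], batch norm is (nearly) the identity.
gamma = np.sqrt(x.var(axis=0))
beta = x.mean(axis=0)
y = batch_norm(x, gamma, beta)
print(np.max(np.abs(y - x)))  # tiny: the identity transform is representable
```

In training, gamma and beta start at 1 and 0 and are updated by gradient descent like any other parameters.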
Batch normalization gives the following advantages:
  • Training is possible at higher learning rates.
  • It improves accuracy.
  • It regularizes the model and reduces the need for dropout.
  • It prevents the network from getting stuck in saturated modes.


The DCGAN paper is one of the very first papers that emphasized the use of batch normalization in training GANs. Generator collapse happens when the generator receives insufficient or zero gradients through the discriminator, and batch normalization was used to tackle this. That is why we explained the concept of batch normalization explicitly in this blog.

For the code, please refer to this link.
