Least Squares GAN
Introduction
In this blog, we share our views on the Least Squares GAN. In earlier GANs (other than the Wasserstein GAN), the discriminator was treated as a classifier that labels an image as either real or fake (i.e. generated by the generator). There is a decision boundary that divides the plane containing the two distributions into two parts: the real region (the red portion in the figure) and the fake region (the blue portion in the figure).
For successful GAN training, this decision boundary must pass through the real data distribution. Even a well-trained discriminator outputs predictions strictly between 0 and 1, so it can never say with 100% certainty that a sample is real or fake. Some level of ambiguity must therefore persist throughout training. This implies that the decision boundary passes through the real data distribution, leaving some real points in the region corresponding to the fake distribution (the blue area). This is precisely what the image below shows.
The problem with treating the discriminator as a classifier is that it does not tell the generator how bad its images are once the samples land in the real region (the red region). That is why the image quality isn't very good. The Least Squares GAN addresses this: the least-squares loss penalises samples that lie far from the decision boundary, even when they are on the correct side of it. It therefore tries to pull the generated samples towards the decision boundary. There is another advantage. With the cross-entropy loss, the generator receives almost zero gradient once the discriminator classifies its samples correctly. With the least-squares loss, those same samples still receive large gradients, precisely because they are far from the decision boundary. So training continues steadily in the Least Squares GAN.
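This contrast between the two losses can be sketched numerically. The snippet below is an illustrative assumption (not code from the LSGAN paper): it compares the generator's gradient under the non-saturating cross-entropy loss, −log σ(x), with the gradient under a least-squares loss, (x − c)², where x is the discriminator's score for a generated sample.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_grad(x):
    # Gradient of the non-saturating cross-entropy generator loss
    # w.r.t. the discriminator logit x:  d/dx [-log sigmoid(x)] = -(1 - sigmoid(x))
    return -(1.0 - sigmoid(x))

def lsq_grad(x, c=1.0):
    # Gradient of the least-squares generator loss w.r.t. the score x,
    # with target value c:  d/dx (x - c)^2 = 2 * (x - c)
    return 2.0 * (x - c)
```

For a sample the discriminator already scores as confidently real, say x = 10, the cross-entropy gradient has all but vanished (about −4.5e-5), while the least-squares gradient is still 18, so the generator keeps receiving a useful training signal that pulls the sample back towards the decision boundary.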
Loss function
The loss functions capturing this are:
min_D V(D) = E_{x∼P_data(x)}[(D(x) − a)²] + E_{z∼P_z(z)}[(D(G(z)) − b)²]

min_G V(G) = E_{z∼P_z(z)}[(D(G(z)) − c)²]

Here a and b are the target labels for real and fake samples respectively, and c is the value the generator wants the discriminator to output for generated samples (a common choice is a = c = 1 and b = 0).
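As a concrete sketch of the two objectives (with the assumed labels a = c = 1 and b = 0, and NumPy standing in for a real training framework), the expectations become batch means over the discriminator's scores:

```python
import numpy as np

def d_loss(d_real, d_fake, a=1.0, b=0.0):
    # Discriminator objective: push scores on real samples towards a
    # and scores on generated samples towards b.
    return np.mean((d_real - a) ** 2) + np.mean((d_fake - b) ** 2)

def g_loss(d_fake, c=1.0):
    # Generator objective: push the discriminator's scores on
    # generated samples towards c.
    return np.mean((d_fake - c) ** 2)
```

In a real training loop, d_real and d_fake would be the discriminator's outputs on a batch of real data and a batch of generated samples; here any arrays of scores will do.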
Advantages and improvements over the regular GAN
1. It penalises generated samples that lie far from the decision boundary, even when they fall in the real region, which opens the possibility of better image quality.
2. It alleviates the vanishing gradient problem.
3. It can also be viewed as a regression problem that pulls the generated samples towards the regression line.
Thanks to Mao Xudong and Augustinus Kristiadi who were very helpful in clarifying our doubts.
The code for this variant can be found at this link.