33

I hear from some sources that generative adversarial networks (GANs) are unsupervised ML, but I don't get it. Are GANs not in fact supervised?

1) Two-class case: real vs. fake

Indeed, one has to supply training data to the discriminator, and this has to be "real" data, meaning data that I would label with, for example, 1. Even though one doesn't label the data explicitly, one does so implicitly by presenting the discriminator in the first steps with training data that one tells it is authentic. In that way one effectively hands the discriminator a labeling of the training data, and, conversely, a labeling of the noise data generated in the first steps by the generator, which the generator knows to be unauthentic.
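To make the point concrete, here is a rough sketch of what I mean (PyTorch-style; the network sizes and names are just my own illustration): nobody annotates the dataset, but the training code itself assigns the label 1 to the real batch and 0 to the generated one.

    import torch
    import torch.nn as nn

    # Toy discriminator for flat 784-dimensional inputs (purely illustrative).
    D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

    real_batch = torch.randn(32, 784)   # stands in for the data I hand over as "authentic"
    fake_batch = torch.randn(32, 784)   # stands in for the generator's output

    # The implicit labeling I mean: real -> 1, fake -> 0, created by the code itself.
    real_targets = torch.ones(32, 1)
    fake_targets = torch.zeros(32, 1)
    d_loss = nn.BCELoss()(D(real_batch), real_targets) + nn.BCELoss()(D(fake_batch), fake_targets)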

2) Multi-class case

But it gets really strange in the multi-class case. One has to supply descriptions (e.g., class labels) with the training data. The obvious contradiction is that one supplies a response to a supposedly unsupervised ML algorithm.
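Concretely, what I have in mind for the multi-class case is something like a class-conditional setup (the layer sizes here are just my own illustration), where every real image has to arrive together with a human-provided label that is fed to the discriminator:

    import torch
    import torch.nn as nn

    num_classes = 10
    images = torch.randn(32, 784)                          # real training images
    class_labels = torch.randint(0, num_classes, (32,))    # labels someone had to supply

    # The label is fed to the discriminator alongside the image,
    # e.g. as a one-hot vector concatenated to the input.
    one_hot = nn.functional.one_hot(class_labels, num_classes).float()
    D = nn.Sequential(nn.Linear(784 + num_classes, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
    scores = D(torch.cat([images, one_hot], dim=1))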

scrimau
  • The input to the GAN is unlabeled real data. The algorithm introduces "fake" data that it distinguishes internally from the real, but no human labeling ("supervision") is required. In that way it serves the same role as other unsupervised methods for which no human labeling is needed / which can be applied to as large an unlabeled dataset as you can gather. – Alex Lew Jun 08 '17 at 21:24
  • A human being still has to select the training data and therefore implicitly labels it. You can't produce cat images with training data that only shows monuments. Hence the need for supervision in the preparation when someone wants to produce a specific kind of data. – scrimau Jun 08 '17 at 22:16
  • See https://ai.stackexchange.com/a/22815/2444. – nbro Aug 02 '20 at 00:18

2 Answers

54

GANs are unsupervised learning algorithms that use a supervised loss as part of the training. The latter appears to be where you are getting hung up.

When we talk about supervised learning, we are usually talking about learning to predict a label associated with the data. The goal is for the model to generalize to new data.

In the GAN case, you don't have either of these components. The data comes in with no labels, and we are not trying to generalize any kind of prediction to new data. The goal is for the GAN to model what the data looks like (i.e., density estimation), and be able to generate new examples of what it has learned.

The GAN sets up a supervised learning problem in order to do unsupervised learning: it generates fake / random-looking data and tries to determine whether a sample is generated fake data or real data. This is a supervised component, yes. But it is not the goal of the GAN, and the labels are trivial.
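To make that concrete, here is a minimal sketch of a GAN training loop (PyTorch-style; the architectures and hyperparameters are made up for illustration and are not from the original paper). The only "labels" anywhere are tensors the loop invents for itself; nothing has to come annotated with the data:

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
    D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    for step in range(100):
        real = torch.randn(32, 784)        # stand-in for an unlabeled batch of real data
        fake = G(torch.randn(32, 64))      # generated samples

        # Discriminator step: the supervised loss uses trivial, internally created labels.
        d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator step: reuses the same trivial labeling, asking D to score its samples as "real".
        g_loss = bce(D(fake), torch.ones(32, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()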

The idea of using a supervised component for an unsupervised task is not particularly new. Random Forests have done this for a long time for outlier detection (also trained on random data vs. real data), and the One-Class SVM for outlier detection is technically trained in a supervised fashion, with the original data being the real class and a single point at the origin of the space (i.e., the zero vector) treated as the outlier class.
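For concreteness, here is a rough scikit-learn sketch of both analogies (the data and hyperparameters are made up for illustration; scikit-learn's OneClassSVM handles the implicit "other class" internally rather than literally adding a point at the origin):

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X_real = rng.normal(size=(500, 5))      # stand-in for the real, unlabeled data

    # One-Class SVM: the user supplies only the real data; no labels are provided.
    ocsvm = OneClassSVM(nu=0.05).fit(X_real)
    print(ocsvm.predict(X_real[:5]))        # +1 = inlier, -1 = outlier

    # Random-forest outlier detection: generate random "fake" points and train an
    # ordinary supervised classifier real-vs-fake. The labels are a by-product of
    # the procedure, not something that came with the data.
    X_fake = rng.uniform(X_real.min(0), X_real.max(0), size=X_real.shape)
    X = np.vstack([X_real, X_fake])
    y = np.r_[np.ones(len(X_real)), np.zeros(len(X_fake))]
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(rf.predict_proba(X_real[:5])[:, 1])   # probability of "real", usable as an inlier score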

Raff.Edward
  • Therefore, GANs have both a supervised step and an unsupervised one. Saying that GANs are unsupervised is wrong. – nbro Aug 27 '18 at 16:03
  • As I elaborated in my answer, GANs do have a supervised component. However, the data comes in unlabeled and uncategorized. This is the crux of what makes it an unsupervised algorithm: no labeling is needed or provided to the GAN algorithm. If we called any algorithm that has a component normally used in supervised algorithms a supervised algorithm, then there would be almost no "unsupervised" algorithms by that definition. It's a function of what information needs to come with the data (i.e., supervision in the form of labels), not of how the mechanics operate. – Raff.Edward Aug 28 '18 at 17:40
  • "However, the data comes in unlabeled and uncategorized." To be precise, you should say "the data comes in unlabeled and uncategorized to the **trained** GAN", because an untrained GAN still needs a labeled dataset. – nbro Aug 28 '18 at 17:50
  • An untrained GAN does not need a labeled dataset. That is not correct. The training occurs by generating a classification problem between the two networks (hence the "Adversarial" in GAN). The labels of this problem are not provided with the data but are a trivial consequence of the training process. Images "generated" by one part of the network get the trivial label of fake, and the training data gets the trivial label of real. These are not provided with the data. No person at any point needs to label the data before training. Hence the process as a whole is unsupervised. – Raff.Edward Aug 28 '18 at 18:04
  • "...but a trivial consequence of the training process". You need labeled data to train one of these networks. So, GANs are supervised. I really don't understand why you persist in confusingly sharing this false information. It's like you want to protect them (in vain, though). – nbro Aug 28 '18 at 19:19
  • Because it's well accepted that GANs are unsupervised. The original paper directly implies it by explaining how a GAN could be used for semi-supervised learning, by using the GAN to learn the feature representation on unlabeled data. If we used your logic, auto-encoders & PCA would also be supervised, because they use a supervised loss, even though it's only to predict the input. I've explained in multiple different ways how supervised vs. unsupervised is a function of the labels coming with the data, not a function of the mechanism of learning. I'll be stopping this conversation at this point. – Raff.Edward Aug 29 '18 at 13:45
  • Hahaha... you are a patient one, Raff.Edward! From what I understand, GANs are unsupervised & I am using them for unsupervised classification. – sand Apr 04 '19 at 13:12
  • @Raff.Edward Would you please elaborate more or share information on how random forests and SVMs use a supervised component in an unsupervised task for outlier detection? – Edamame Dec 04 '19 at 06:13
  • Just got this ping and apparently missed an old Q by @Edamame. A one-class SVM (unsupervised) is exactly equivalent to training a two-class SVM where one "class" is a single point at the origin (i.e., the zero vector). Random forests can do outlier detection by generating random data as the "other" class and then training in a supervised fashion on "other" vs. real data. Almost all unsupervised methods use a "supervised" loss in some interpretation. It's where the labels come from that determines whether it is actually a supervised problem. – Raff.Edward Apr 09 '21 at 17:41
-13

Neither. Roughly, the hierarchy looks like the following:

               machine learning methodology
                             +
                             |
                             |
                             v
    +-----------------------------------------------+
    |                        |                      |
    |                        |                      |
    v                        v                      v
supervised              unsupervised           reinforcement
象嘉道