What is weakly supervised learning (bootstrapping)?

Question

I understand the differences between supervised and unsupervised learning:

Supervised Learning is a way of "teaching" the classifier, using labeled data.

Unsupervised Learning lets the classifier "learn by itself", for example, using clustering.

But what is "weakly supervised learning"? How does it classify its examples?

score 30 · Accepted Answer · edited Oct 15 '20 at 21:01

Updated answer

As several comments below mention, the situation is not as simple as I originally wrote in 2013.

The generally accepted view is that

weak supervision - supervision with noisy labels (wikipedia)
semi supervision - only a subset of training data has labels (wikipedia)

There are also classifications that are more in line with my original answer. For example, Zhi-Hua Zhou's 2017 paper "A brief introduction to weakly supervised learning" treats weak supervision as an umbrella term for

incomplete supervision - only a subset of training data has labels (same as above)
inexact supervision - where the training data are given with only coarse-grained labels
inaccurate supervision - where the given labels are not always ground-truth (weak supervision above).

Original answer

In short: In weakly supervised learning, you use a limited amount of labeled data.

How you select this data and what exactly you do with it depend on the method. In general, you use a small amount of data that is easy to get and/or makes a real difference, and then learn the rest. I consider bootstrapping to be a method that can be used in weakly supervised learning, but as the comment by Ben below shows, this is not a generally accepted view.

See, for example, Chris Biemann's 2007 dissertation for a nice overview; it says the following about bootstrapping/weakly-supervised learning:

Bootstrapping, also called self-training, is a form of learning that is designed to use even less training examples, therefore sometimes called weakly-supervised. Bootstrapping starts with a few training examples, trains a classifier, and uses thought-to-be positive examples as yielded by this classifier for retraining. As the set of training examples grows, the classifier improves, provided that not too many negative examples are misclassified as positive, which could lead to deterioration of performance.

For example, in the case of part-of-speech tagging, one usually trains an HMM (or maximum-entropy or whatever) tagger on tens of thousands of words, each with its POS tag. In the case of weakly supervised tagging, you might simply use a very small corpus of a few hundred words. You get some tagger, you use it to tag a corpus of thousands of words, you train a tagger on that and use it to tag an even bigger corpus. Obviously, you have to be smarter than this, but this is a good start. (See this paper for a more advanced example of a bootstrapped tagger.)
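To make the loop concrete, here is a minimal sketch of self-training in plain Python. It is not the tagger from the dissertation: the 1-D nearest-centroid "classifier", the names `centroids`, `predict` and `self_train`, the `min_margin` threshold and the toy data are all assumptions chosen to keep the example short. The one "smarter than this" idea it includes is that only confidently classified points are added back to the training set.

```python
def centroids(points, labels):
    """Mean of the 1-D points for each class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, margin): the nearest centroid's label and how much
    closer it is than the second-nearest centroid (needs >= 2 classes)."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    return y0, d1 - d0


def self_train(seed_x, seed_y, unlabeled, min_margin=3.0, max_rounds=10):
    """Bootstrapping: train on the seeds, pseudo-label confident points,
    retrain on the enlarged set, and repeat until nothing new is added."""
    train_x, train_y = list(seed_x), list(seed_y)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(train_x, train_y)
        still_unlabeled = []
        added = 0
        for x in pool:
            label, margin = predict(model, x)
            if margin >= min_margin:
                # Confident prediction: treat it as if it were a real label.
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0:
            break
    return centroids(train_x, train_y), len(pool)


# Toy data (hypothetical): two labeled seeds and ten unlabeled points.
seed_x, seed_y = [0.0, 10.0], ["A", "B"]
unlabeled = [0.2, 0.9, 1.5, 3.0, 4.4, 5.6, 7.0, 8.5, 9.1, 9.8]

model, n_left = self_train(seed_x, seed_y, unlabeled)
print(model, "points left unlabeled:", n_left)
```

With these numbers, the two points near the class boundary (4.4 and 5.6) never clear the margin and stay unlabeled. Lowering `min_margin` would pull them in, which is exactly where mislabeled examples start to creep into the training set.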

Note: weakly supervised learning can also refer to learning with noisy labels (such labels can be, but do not need to be, the result of bootstrapping).

thanks for your reply. I didn't entirely get the last part; the only difference is that you train your "machine" on a smaller data set? — Cheshie, Sep 22 '13 at 18:16
You train on a small data set, then you apply it on a bigger corpus and you re-train on that bigger corpus. — Jirka, Sep 22 '13 at 18:18
This is an example of bootstrapping, but not really weakly supervised learning (or at least, I've never heard bootstrapping called weakly supervised). Guess it just goes to show that there's little benefit in using these terms if their definitions are not clear. — Ben Allison, Sep 23 '13 at 08:29
I don't think this is right. Weakly supervised learning is when each of your training data points is partially annotated (incomplete ground-truth information); your corpus size is irrelevant. — IssamLaradji, Aug 25 '17 at 19:50
I think Tudor Achim's answer is the correct choice, while this one is not really on the point. Weakly supervised learning certainly is more than training on a limited amount of labeled data. — Michael, Jan 15 '20 at 19:06

score 30 · Answer 2 · answered Apr 24 '16 at 05:48

Weak supervision is supervision with noisy labels. An example is bootstrapping, where the bootstrapping procedure may mislabel some examples.
Distant supervision refers to training signals that do not directly label the examples; for example, learning semantic parsers from question-and-answer datasets.
Semi-supervised learning is when you have a dataset that is partially labeled and partially unlabeled.
Fully supervised learning is when you have ground truth labels for each datapoint.
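As a rough illustration of what "training signals that do not directly label the examples" can look like, here is a toy sketch in Python. It is only loosely inspired by the question-and-answer case: the "parses" are just operator names, and `OPS`, `consistent_parses` and the three QA pairs are made up for this example. The point is that the dataset gives only the answer, never the correct parse, so the learner has to infer which parses are consistent with it.

```python
import operator

# Hypothetical candidate "parses": each maps two numbers to a result.
OPS = {"plus": operator.add, "times": operator.mul, "minus": operator.sub}


def consistent_parses(a, b, answer):
    """Return the names of all candidate parses whose execution on (a, b)
    produces the given answer. The answer is the only supervision."""
    return [name for name, fn in OPS.items() if fn(a, b) == answer]


# Question arguments paired with the answer only (no gold parse given).
qa_pairs = [((2, 3), 5), ((2, 2), 4), ((4, 1), 3)]

for (a, b), answer in qa_pairs:
    print((a, b), answer, consistent_parses(a, b, answer))
```

For the pair (2, 2) with answer 4, both "plus" and "times" are consistent, so the indirect signal cannot tell them apart. That ambiguity is one reason this kind of supervision is weaker than having the parse labeled directly.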

Tudor Achim

This should be the top answer as it disentangles the different terms. However, I will say that bootstrapping shouldn't be listed only under weak supervision, because it's more of a technique that can be used with any of them: in distant supervision, where you bootstrap by retraining on the indirectly labeled examples; in semi-supervised learning, where you use "pseudo-labeling" to train with supervision on the unlabeled examples; or in full supervision, where you train on the examples you got wrong – physincubus Dec 16 '18 at 18:04

score 6 · Answer 3 · answered Jun 03 '19 at 14:52

This paper [1] defines 3 typical types of weak supervision:

incomplete supervision, where only a subset of training data is given with labels; (this is the same as semi-supervision, I think)
inexact supervision, where the training data are given with only coarse-grained labels;
and inaccurate supervision, where the given labels are not always ground-truth.
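To show how the three cases differ in terms of the data you actually hold, here is a small Python sketch. The six toy labels, the bag grouping and the flipped positions are all invented for illustration. The rule that a bag is positive if it contains at least one positive example is the usual multi-instance learning convention, used here as an assumption rather than something this answer states.

```python
# Six toy examples with their true binary labels (normally unknown to you).
true_labels = [1, 0, 1, 1, 0, 0]

# Incomplete supervision: only some examples carry a label (None = unlabeled).
incomplete = [1, None, None, 1, 0, None]

# Inexact supervision: labels exist only for groups ("bags") of examples.
# Assumed convention: a bag is positive if any example in it is positive.
bags = [[0, 1], [2, 3], [4, 5]]  # indices into true_labels
inexact = [int(any(true_labels[i] for i in bag)) for bag in bags]

# Inaccurate supervision: every example has a label, but some are wrong.
flipped = {1, 3}  # hypothetical positions where the label is noisy
inaccurate = [1 - y if i in flipped else y for i, y in enumerate(true_labels)]

print("incomplete:", incomplete)
print("inexact (per bag):", inexact)
print("inaccurate:", inaccurate)
```

In the first case you know which labels are missing, in the second you know the labels are coarse, and in the third nothing in the data tells you which labels are wrong.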

[1] Zhi-Hua Zhou, A brief introduction to weakly supervised learning, National Science Review, Volume 5, Issue 1, January 2018, Pages 44–53, https://doi.org/10.1093/nsr/nwx106

pythiest · Answer 4 · 2017-05-11T21:33:20.457

As described by Jirka, weak supervision entails initial (supervised) training on a small, labeled dataset, prediction on a larger set, and (unsupervised) incorporation of the positively identified instances (or their characteristics) into the model (either through retraining on the enlarged dataset or through direct update of the model). The process of (unsupervised) update is iterated until a certain goal is achieved. Obviously, this can easily go wrong if the initial predictor yields too many false positives, but there are certain situations in which the search space can be constrained so that the generalization obtained through weak supervision does not (often) run amok, or user input can be used to (weakly) supervise the learning process. To provide a complementary, highly successful example outside text mining, PSI-BLAST iteratively refines a protein sequence profile to identify distant homologs. A nice overview of what can go wrong with such an approach in this context can be found in this paper.
