boosting algorithm - classifiers yields the correct label

Question

I am a computer science student; I am studying the Algorithms course independently.

During the course, I saw this question:

Suppose we have a set X = {x1, . . . , xn} of elements, each with a label L(i) ∈ {0, 1} (think of x(i) as a picture, and the label indicates whether it is a cat or not). We also have a set of classifiers H, and an algorithm A that given any distribution D on X, outputs h ∈ H such that

Pr(i∼D)[h(x(i)) = L(i)] ≥ 0.51

Show an algorithm that produces a set of T = O(log n) classifiers h (1), . . . , h(T) ∈ H, such that the majority vote among these T classifiers yields the correct label for all 1 ≤ i ≤ n.

photo of the question

From what I can understand this is a question related to boosting. But it is not clear to me how to show an algorithm for this question.

I found an algorithm, but I do not know if it fits the problem:

Algorithm 1 Boost(D, A)
Let T ← 4 log n/ E^2 for E < 0.01.
Initialize a copy of polynomial weights to run over w^t ∈ ∆n.
for t = 1 to T do
  Let h^t = A(D, w^t)
  Let L^t ∈ [0, 1]^m be such that L^t_i = 1[h^t(xi) = yi].
  Pass L^t to the PW algorithm.
end for
Let pˆ =1/T(SIGMA^T_t=1 e_h^t )
Return fpˆ(x).

link to algorithm, page 16

to be perfectly honest, did not understand how to solve the question.

score 1 · Accepted Answer · answered Apr 07 '22 at 13:41

1

We can view this as a two-player zero-sum game. Carol (the “classifier” player) chooses a classifier, and Dave (the “data” player) chooses a labeled element. Carol wins if the classifier is correct on that element, and Dave wins if it’s incorrect.

Algorithm A implies that Carol can win this game at least 51% of the time. We can run algorithm ComputeEQ with ε = 0.005 (0.5%) to find a strategy for Carol where she chooses uniformly at random from O(log n) classifiers and wins at least 50.5% of the time regardless of Dave’s strategy. This implies that the majority vote is correct on all n elements.

(This is really a question for https://cs.stackexchange.com.)

answered Apr 07 '22 at 13:41

David Eisenstat

64,237
7
60
120

thanks for reply, what Algorithm A is doing? i did not understand why this Algorithm right for at least 51%. in the question with have log(n) classifiers, why those classifiers needs to be right for at least 51%? it seems for me that they need to be right 50% of the time, because they "simple" classifiers. – hch Apr 08 '22 at 06:47
1

@dust Algorithm A is given by the question. It implies that, for every mixed strategy that Dave could play, there exists an opposing pure strategy for Carol that wins 51% of the time. The duality theorem for two-player zero-sum games then implies that there exists a mixed strategy for Carol such that no mixed strategy of Dave's reduces Carol's win rate below 51%. As for your second question, any fixed epsilon in (0, 0.01) will work. 0.005 is basically arbitrary. – David Eisenstat Apr 08 '22 at 13:27

boosting algorithm - classifiers yields the correct label

1 Answers1