For classification, we usually use the output of the [CLS] token to predict labels. Now, however, I need to average-pool BERT's token outputs over each sentence instead. This seems a bit hard to me: sentences are separated by [SEP], but sentence lengths differ from sample to sample within a batch, so tf.split does not seem to fit this problem.
Here is an example (batch_size=2). How can I get the average-pooled vector of each sentence?
[CLS] w1 w2 w3 [SEP] w4 w5 [SEP]
[CLS] x1 x2 [SEP] x3 x4 x5 [SEP]
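One way around the ragged lengths, sketched here in NumPy rather than TensorFlow, is to avoid splitting altogether. Give each token a sentence index by taking a cumulative sum over the [SEP] markers, build a one-hot mask of shape [batch, max_sentences, seq_len], and compute every sentence mean at once as a batched matrix product divided by per-sentence token counts. Assumptions: `sentence_avg_pool` and `max_sentences` (a fixed cap on sentences per sample) are hypothetical names, the token ids are made up, and [CLS]/[SEP] are excluded from the averages. The same operations exist in TensorFlow (`tf.cumsum`, `tf.equal`, `tf.matmul`, `tf.reduce_sum`), so a direct port should be possible, but that port is untested here.

```python
import numpy as np


def sentence_avg_pool(hidden, input_ids, cls_id, sep_id, max_sentences):
    """Average-pool token vectors per [SEP]-delimited sentence (hypothetical helper).

    hidden:    [batch, seq_len, dim] float array (e.g. BERT's per-token output)
    input_ids: [batch, seq_len] int array of token ids
    Returns (means, valid):
      means: [batch, max_sentences, dim]; all-zero rows where a sentence is missing
      valid: [batch, max_sentences] bool, True where the sentence has tokens
    Sentences beyond max_sentences are silently ignored.
    """
    is_sep = (input_ids == sep_id).astype(np.int64)
    # Sentence index of each token = number of [SEP]s strictly before it.
    sent_idx = np.cumsum(is_sep, axis=1) - is_sep
    n_sep = is_sep.sum(axis=1, keepdims=True)
    # Keep ordinary tokens only: drop [CLS], [SEP], and anything after the
    # last [SEP] (e.g. padding), whose sentence index equals n_sep.
    keep = (input_ids != cls_id) & (is_sep == 0) & (sent_idx < n_sep)
    # onehot[b, s, t] is 1 when token t of sample b belongs to sentence s.
    sentences = np.arange(max_sentences)
    onehot = (sent_idx[:, None, :] == sentences[None, :, None]) & keep[:, None, :]
    onehot = onehot.astype(hidden.dtype)
    sums = onehot @ hidden                      # [batch, max_sentences, dim]
    counts = onehot.sum(axis=2, keepdims=True)  # [batch, max_sentences, 1]
    means = sums / np.maximum(counts, 1.0)      # avoid 0/0 for empty sentences
    return means, counts[..., 0] > 0


# The two rows from the question, with hypothetical token ids.
CLS, SEP = 101, 102
input_ids = np.array([
    [CLS, 1, 2, 3, SEP, 4, 5, SEP],   # [CLS] w1 w2 w3 [SEP] w4 w5 [SEP]
    [CLS, 6, 7, SEP, 8, 9, 10, SEP],  # [CLS] x1 x2 [SEP] x3 x4 x5 [SEP]
])
batch, seq_len, dim = 2, 8, 2
# Fake "BERT output" so the means are easy to check by hand.
hidden = np.arange(batch * seq_len * dim, dtype=np.float64).reshape(batch, seq_len, dim)
means, valid = sentence_avg_pool(hidden, input_ids, CLS, SEP, max_sentences=3)
print(means[0, 0], means[1, 1])  # [4. 5.] [26. 27.]
```

Because the output is padded to a fixed `max_sentences`, it keeps a static shape that downstream layers can consume, and `valid` marks which slots hold real sentences. Whether [CLS] and [SEP] should be averaged in is a modeling choice; to include them, drop the corresponding terms from `keep`.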