Let's say I want to train BERT with 2 sentences (query-answer) pair against a certain binary label (1,0) for the correctness of the answer, will BERT let me use 512 words/tokens each for the query and the answer or together(query+answer combined) they should be 512? [510 upon ignoring the [start] and [sep] token]
Thanks in advance!