
I'm trying to build a BERT model that takes a whole document as input. Because of BERT's 512-token limit, it's unable to give an accurate answer. Now I'm trying to find an NLP model/approach/algorithm that would help the BERT model find the correct answer.

I tried the full document as input and was expecting an accurate answer, like the ones I was getting with small passages.

Amrutha k
  • This is the code; it works well with small passages but not with 10+ passages: https://colab.research.google.com/drive/1Izqs5c2-3KqYPki1wOiWCPBpcC8T4kJl#scrollTo=lQJgO79NS2t8 – Amrutha k Apr 16 '23 at 14:01

1 Answer


Extractive Question Answering, where an answer is extracted from a context (in this case, your input document), is usually solved with a BERT-like model.

In your case, the limitation is that you have a long document to extract an answer from, but the model you used, bert-large, cannot handle such a long document: its maximum input length is just 512 tokens. That is why it cannot produce an accurate answer; bert-large can only look at 512 tokens at a time.

To obtain higher accuracy, my recommendation is to use a BERT-like model that can process longer sequences and thus handle a long document as input. You may consider:

  1. Longformer, a BERT-like model for long documents, which can be used for the Extractive QA task. I had a quick look at the QA models on Hugging Face and picked the most downloaded Longformer model, LONGFORMER-BASE-4096 fine-tuned on SQuAD v1, which can handle up to 4096 tokens (~8x the original BERT). Maybe you should try it first (see the sketch after this list). If you are interested in how it works, take a look at the paper Longformer: The Long-Document Transformer to get an idea of how the model's attention mechanism works.

  2. I also suggest reading the Transformer-XL paper, Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. It is one of the earliest models designed for processing long sequences and one of the few with no sequence-length limit, so it is worth studying how its attention mechanism was designed to handle such long inputs.
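Here is a minimal sketch of how you might plug a Longformer checkpoint into the Hugging Face question-answering pipeline. The exact model ID below is an assumption (any SQuAD-fine-tuned Longformer from the Hub should work the same way), and the question/context strings are placeholders for your own data:

```python
# Minimal sketch: extractive QA over a long document with a Longformer checkpoint.
# Assumption: the checkpoint "valhalla/longformer-base-4096-finetuned-squadv1" is
# the SQuAD v1 fine-tuned Longformer mentioned above; swap in whichever model you
# actually choose from the Hub.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="valhalla/longformer-base-4096-finetuned-squadv1",
    tokenizer="valhalla/longformer-base-4096-finetuned-squadv1",
)

# Your 10+ passage document goes here; Longformer can attend to up to ~4096 tokens.
long_document = "..."  # placeholder for the full document text

result = qa(question="Who wrote the report?", context=long_document)
print(result["answer"], result["score"])
```

The pipeline handles tokenization and answer-span extraction for you, so the only change from your current small-passage setup is the longer-context checkpoint.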

Hope this helps!

xty