
I'm using GPT-3 for some experiments where I prompt the language model with tests from cognitive science. The tests have the form of short text snippets. Now I'd like to check whether GPT-3 has already encountered these text snippets during training. Hence my question: Is there any way to sift through GPT-3's training text corpora? Can one find out whether a certain string is part of these text corpora?
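Suppose you had local plain-text copies of a corpus, for example a public dataset you suspect overlaps with the training data. The check itself would then be a simple normalized substring search. The sketch below rests on that assumption. The function names `normalize`, `snippet_in_text` and `find_snippet` are hypothetical illustrations, not part of any OpenAI tool.

```python
import re
from pathlib import Path


def normalize(text: str) -> str:
    # Hypothetical helper: lowercase and collapse runs of whitespace so that
    # line breaks or spacing differences don't hide an otherwise exact match.
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_in_text(snippet: str, text: str) -> bool:
    # Exact (normalized) substring match; paraphrases are NOT detected.
    return normalize(snippet) in normalize(text)


def find_snippet(snippet: str, corpus_dir: str) -> list[str]:
    """Return paths of .txt files under corpus_dir that contain the snippet.

    Assumes the corpus is a directory of UTF-8 text files (hypothetical layout).
    Each file is read fully into memory, which is fine for small corpora only.
    """
    hits = []
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if snippet_in_text(snippet, text):
            hits.append(str(path))
    return hits
```

For example, `find_snippet("your test item text", "/path/to/corpus")` returns the matching file paths. A miss only means the exact wording is absent from that particular copy. It says nothing about paraphrased versions, and nothing about data you don't have.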

Thanks for your help!

Frigoooo

1 Answer


I don't think that's possible, unfortunately. GPT-3's training corpora are not public.

But if that were possible, it would be great for detecting plagiarism. Maybe you could ask the model whether it knows where a certain line of text came from?

Brian MacKay