Search through GPT-3's training data

Question

I'm using GPT-3 for some experiments where I prompt the language model with tests from cognitive science. The tests have the form of short text snippets. Now I'd like to check whether GPT-3 has already encountered these text snippets during training. Hence my question: Is there any way to sift through GPT-3's training text corpora? Can one find out whether a certain string is part of these text corpora?

Thanks for your help!

Brian MacKay · Answer 1 · 2022-12-08T00:37:13.110


I don't think that's possible, unfortunately. GPT-3's training corpus has not been made public.

If it were possible, though, it would be great for detecting plagiarism. Maybe ask the model whether it knows where a certain line of text came from?
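If you do have a text collection you can search yourself (for instance a dataset you suspect overlaps with GPT-3's sources), checking your test snippets against it is a plain normalized substring search. Here's a minimal Python sketch under that assumption. The corpus, the `.txt` file layout, and the function names are all hypothetical, and a hit or miss here says nothing definitive about GPT-3's actual training data.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks or spacing don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_snippets(snippets: Iterable[str], documents: Iterable[str]) -> dict[str, bool]:
    """Report, for each snippet, whether it occurs verbatim (after normalization) in any document."""
    targets = {s: normalize(s) for s in snippets}
    found = {s: False for s in targets}
    for doc in documents:
        norm_doc = normalize(doc)
        for original, target in targets.items():
            if not found[original] and target in norm_doc:
                found[original] = True
        if all(found.values()):
            break  # every snippet already located; stop scanning
    return found


def iter_text_files(root: str) -> Iterator[str]:
    """Yield the text of every .txt file under root.

    Hypothetical corpus layout: GPT-3's real training data is not available,
    so this stands in for whatever local collection you can actually search.
    """
    for path in sorted(Path(root).rglob("*.txt")):
        yield path.read_text(encoding="utf-8", errors="ignore")


if __name__ == "__main__":
    corpus = ["The cat sat on the\nmat.", "Colorless green ideas sleep furiously."]
    tests = ["cat sat on the mat", "green ideas sleep", "a bat and a ball"]
    print(find_snippets(tests, corpus))
    # {'cat sat on the mat': True, 'green ideas sleep': True, 'a bat and a ball': False}
```

On real files you would call something like `find_snippets(my_tests, iter_text_files("some/corpus/dir"))`. Exact matching only catches verbatim copies (ignoring case and whitespace); paraphrased or lightly edited versions of a test will slip through, and for a web-scale corpus you would want a prebuilt index rather than a linear scan.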

edited Dec 08 '22 at 00:37

answered Dec 05 '22 at 18:25

