Need info about CLIP backbone models

I would like to know which datasets OpenAI used to train its CLIP (Contrastive Language-Image Pre-Training) models, so I can pick the one that most closely resembles my project's dataset. I have been searching for this information, but I can only find it for some of the models (the ones covered in the original paper).

The backbone models released for CLIP are (as of March 2023; see the snippet after the list for how I enumerate them):

  • RN50
  • RN101
  • RN50x4
  • RN50x16
  • RN50x64
  • ViT-B/32
  • ViT-B/16
  • ViT-L/14
  • ViT-L/14@336px

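For context, this is how I list and load the backbones; a minimal sketch assuming the official openai/clip package (pip install git+https://github.com/openai/CLIP.git) and PyTorch are installed:

    import torch
    import clip

    # The backbone names above are the ones the package itself reports.
    print(clip.available_models())
    # ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64',
    #  'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']

    # Loading one of them also returns its matching image preprocessing pipeline.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)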
Does anyone know the names of the datasets these models were trained on? Or, at least, a brief description of their characteristics (number of classes, distribution among classes and super-classes <<e.g. Honda, Opel, Fiat => car>>, image properties, ...)? I do not want to download the same dataset, nor train or test with it.

Thanks for your help!
