
I use the following code to get the most likely replacements for a masked word:

!pip install git+https://github.com/huggingface/transformers.git
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# fill-mask pipeline returns the top candidates for the [MASK] token with their scores
unmasker = pipeline('fill-mask', model='bert-base-uncased', top_k=100)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

results = unmasker("The sun is [MASK].")
for i in results:
    print(i["token_str"], i["score"] * 100)

For example, the most likely replacements for "[MASK]" in the sequence "The sun is [MASK]." are "rising" (33.61%), "shining" (9.33%), and "up" (7.38%).
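For reference, the same scores can also be computed directly with the tokenizer and model loaded above; this is a minimal sketch of what the fill-mask pipeline does internally:

import torch

inputs = tokenizer("The sun is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# locate the [MASK] position and softmax over the vocabulary at that position
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_index].softmax(dim=-1)

# top 5 candidate tokens with their probabilities
top = probs.topk(5)
for score, token_id in zip(top.values[0], top.indices[0]):
    print(tokenizer.decode(token_id.item()), round(score.item() * 100, 2))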

My question: is there a way to achieve the same with GPT-3? There are "complete" and "insert" presets in the OpenAI Playground; however, they give me full sentences (instead of single words) and no probabilities. Can someone help?

diggi2395

1 Answer


First of all, I don't think you can access properties like tokens or scores in GPT-3; all you have is the generated text.

Second of all, in my experience GPT-3 is ALL about the correct prompt. You just have to give it instructions as if you were talking to a human being.

In your specific case, I would use a prompt like this:

Prompt:

The sun is [MASK].

Replace [MASK] with the most probable 5 words to replace, and give me their probabilities.

Result:

The sun is shining.

  1. shining - 0.47
  2. bright - 0.18
  3. sunny - 0.13
  4. hot - 0.10
  5. beautiful - 0.09

If you want to do that programmatically, here's the code:

import openai

openai.organization = "your org key, if you have one"
openai.api_key = "your API key"
openai.Engine.list()  # optional: verifies the credentials and lists the available engines

my_prompt = '''The sun is [MASK].

Replace [MASK] with the most probable 5 words to replace, and give me their probabilities.'''

# Set the parameters here as you like
response = openai.Completion.create(
  engine="text-davinci-002",
  prompt=my_prompt,
  temperature=0,    # 0 makes the output as deterministic as possible
  max_tokens=500,
  # top_p=1,
  # frequency_penalty=0.0,
  # presence_penalty=0.0,
  # stop=["\n"]
)

print(response['choices'][0]['text'])
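The completion comes back as plain text, so if you want the words and probabilities as Python values you still have to parse the model's answer yourself. A rough sketch, assuming GPT-3 sticks to the numbered "word - probability" format shown above (it is not guaranteed to, so adjust the pattern if needed):

import re

completion_text = response['choices'][0]['text']

# pull lines like "1. shining - 0.47" into (word, probability) pairs
pairs = re.findall(r'\d+\.\s*(\w+)\s*-\s*([\d.]+)', completion_text)
candidates = [(word, float(prob)) for word, prob in pairs]
print(candidates)  # e.g. [('shining', 0.47), ('bright', 0.18), ...]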
SilentCloud
Thank you very much for that detailed response! I assumed that there was no direct access to tokens/scores, but I wanted to make sure I wasn't missing anything. Your solution gives the exact output that I wanted, thanks! – diggi2395 Aug 16 '22 at 09:16