
As stated in the title, my fine-tuned multiclass classification model doesn't return the classes I defined in the training set. Instead, the top prediction is returned and the other entries in logprobs are just variations of it.

Example request:

curl https://api.openai.com/v1/completions \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "curie:model_id",
  "prompt": "test_sample \n\n###\n\n",
  "max_tokens": 1,
  "logprobs": 7
}'

Example response:

    "id": "xxx",
    "object": "text_completion",
    "created": 1675633654,
    "model": "curie:modle_id",
    "choices": [{
        "text": " 6",
        "index": 0,
        "logprobs": {
            "tokens": [" 6"],
            "token_logprobs": [-0.000016165199],
            "top_logprobs": [{
                "6": -11.555985,
                " six": -13.56059,
                " 625": -15.326343,
                " 6": -0.000016165199,
                " 7": -12.376487
            }],
            "text_offset": [27]
        },
        "finish_reason": "length"
    }],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 1,
        "total_tokens": 10
    }
}

As we can see from the response, the top_logprobs entries are just variations of the top class (" 6", "6", " six", " 625") rather than my other trained classes.

I have a dataset of 1000 samples and 7 classes, which is around 143 samples/class, more than the 100 samples/class recommended by the documentation.

I've defined the classes just as the documentation recommends (ensuring each class is a single token preceded by a space, etc.). In fact, I tried several implementations of the classes, all of which returned the same results. One implementation I tested was converting the classes from single-token words to numbers, which yielded the same result, as shown here (https://community.openai.com/t/multiple-labels-in-the-file-for-multi-class-classification-task/3541).
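For reference, a training line in my JSONL file looks something like this (the sample text is a hypothetical placeholder; the numeric label is a single token with a leading space):

{"prompt": "some sample text \n\n###\n\n", "completion": " 6"}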

Training samples are defined like this:

df['training_sample'] = df['training_sample'].apply(lambda x: x + '\n\n###\n\n')
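The completion column is built the same way, assuming a hypothetical df['label'] column holding the numeric classes (each completion being a single token with a leading space, as the documentation recommends):

df['completion'] = df['label'].apply(lambda x: ' ' + str(x))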

The expected behavior is for the classification response to return the class with the highest confidence as the completion, followed by the confidences of all the other classes in logprobs.

The actual behavior is shown in the response above; that example is from when I changed the labels to numbers, which produced the same unwanted behavior as the original labels.

1 Answer

Setting temperature=0 is recommended when using a fine-tuned classifier. This will reduce the number of weird classes appearing in logprobs.
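For example, with the pre-1.0 openai Python package, the request from your question with temperature pinned to 0 would look like this (the model id and prompt are placeholders taken from the question):

import openai

openai.api_key = "KEY"  # your API key

response = openai.Completion.create(
    model="curie:model_id",  # placeholder fine-tuned model id
    prompt="test_sample \n\n###\n\n",
    max_tokens=1,
    temperature=0,  # deterministic: always pick the most likely token
    logprobs=7,
)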

However, in my experience, it's not guaranteed that logprobs will always return the classes you trained the model with (especially with multiclass problems and text very different from the training data).

So it's safer to filter the returned tokens down to your known class names and apply some kind of threshold on the probabilities.
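Here's a minimal sketch of that post-processing, assuming the response shape from your question; the CLASSES set and THRESHOLD value are hypothetical and should match your own labels:

import math

# Hypothetical: the 7 trained class tokens, each a single token with a leading space.
CLASSES = {" 1", " 2", " 3", " 4", " 5", " 6", " 7"}
THRESHOLD = 0.5  # hypothetical minimum probability to accept a prediction

# Log probabilities of the top tokens for the first (only) completion token.
top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]

# Keep only trained class tokens and convert logprobs to probabilities.
class_probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items() if tok in CLASSES}

if class_probs:
    predicted, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    if confidence < THRESHOLD:
        predicted = None  # too uncertain; treat as "unknown"
else:
    predicted = None  # none of the returned tokens is a trained class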

Pedro Oliveira