How can I keep track of the expenses of each program separately when calling the same OpenAI GPT model deployed in an Azure instance?

Question

I have an OpenAI GPT model deployed in an instance belonging to a resource in my Azure subscription. I have two programs that use this OpenAI GPT model. How can I keep track of the expenses of each program separately?

Example: I deployed the OpenAI GPT model "GPT 4 32k" as gpt-4-32k-viet. Program A and program B both use this deployment. How can I keep track of the expenses incurred by program A and program B separately?

I use the code from the Azure OpenAI tutorial:

import tiktoken
import openai
import os
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = "https://[resourcename].openai.azure.com/"  # Your Azure OpenAI resource's endpoint value.
openai.api_key = "[my instance key]"


system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
token_limit = 4096
conversation = []
conversation.append(system_message)


def num_tokens_from_messages(messages, model="gpt-4-32k"):
    encoding = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens


user_input = 'Hi there. What is the difference between Facebook and TikTok?'
conversation.append({"role": "user", "content": user_input})
conv_history_tokens = num_tokens_from_messages(conversation)

while (conv_history_tokens + max_response_tokens >= token_limit):
    del conversation[1]
    conv_history_tokens = num_tokens_from_messages(conversation)

response = openai.ChatCompletion.create(
    engine="gpt-4-32k-viet",  # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
    messages=conversation,
    temperature=.7,
    max_tokens=max_response_tokens,
)

conversation.append({"role": "assistant", "content": response['choices'][0]['message']['content']})
print("\n" + response['choices'][0]['message']['content'] + "\n")

We will start implementing OpenAI in the next few weeks. I work in FinOps, so I will probably hit this problem a month from now. I will update you if I find a solution. – Francesco Mantovani, May 08 '23 at 09:38
I disagree with the closure: it's a coding question about the Azure OpenAI API. How can I keep track of the expenses separately when I make the API calls? – Franck Dernoncourt, May 12 '23 at 13:58
To me it sounds like a billing question, not a programming question. – Mark Rotteveel, May 14 '23 at 12:17
@FranckDernoncourt, I updated my reply. I don't have an easy solution for you. Enclose each app in its own Resource Group and track that Resource Group. – Francesco Mantovani, Jul 24 '23 at 12:30

score 1 · Accepted Answer · answered Jul 24 '23 at 11:54

You have to enclose them in different Resource Groups.

You can then target that Resource Group and group by Service Name if you want a more granular view.

We are having a hard time understanding the real costs of GPT, and the only approach I can suggest is to test like crazy.

If you feed that language model 500 characters, you incur a certain cost.

But if you feed it 5,000 characters, don't expect the cost to be exactly 10 times higher.
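One likely reason for the non-linear scaling is that billing is per token, with prompt and completion tokens priced separately, so a longer prompt does not necessarily produce a proportionally longer (or more expensive) reply. The sketch below illustrates this. The prices, the `estimate_cost` helper, and the rough "4 characters per token" conversion are all assumptions for illustration; take real prices from the Azure pricing page.

```python
# Placeholder prices in USD per 1,000 tokens (assumed values, not official figures).
PROMPT_PRICE_PER_1K = 0.06
COMPLETION_PRICE_PER_1K = 0.12


def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price=PROMPT_PRICE_PER_1K,
                  completion_price=COMPLETION_PRICE_PER_1K):
    """Estimate the cost of one call from its token counts (hypothetical helper)."""
    return (prompt_tokens / 1000 * prompt_price
            + completion_tokens / 1000 * completion_price)


# Assume roughly 4 characters per token, and a reply of 250 tokens in both cases.
short_cost = estimate_cost(prompt_tokens=125, completion_tokens=250)   # ~500 characters in
long_cost = estimate_cost(prompt_tokens=1250, completion_tokens=250)   # ~5,000 characters in

# The prompt grew 10x, but the total cost grew far less because the reply did not grow.
print(f"short: ${short_cost:.4f}, long: ${long_cost:.4f}, ratio: {long_cost / short_cost:.1f}x")
```

With these assumed numbers the 10x longer prompt costs under 3x as much, which is why measuring real calls is more reliable than extrapolating from input size.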

It is difficult to forecast, so what I suggest is to isolate each program in its own Resource Group. This technology is not designed to be multi-tenant, so with a shared deployment you will lose track of the costs. If you want to know how much each customer has consumed, the only way is to go single-tenant.

Otherwise you have to create an ID per customer and link each token to that ID. And good luck with that.
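For what it's worth, the "ID per customer" approach could be sketched roughly like this. It relies on the `usage` field (`prompt_tokens`, `completion_tokens`, `total_tokens`) that chat completion responses carry. The `UsageLedger` class and the program IDs are hypothetical names made up for this example, not part of the `openai` package, and the totals are token counts that you would still have to convert to money yourself.

```python
from collections import defaultdict


class UsageLedger:
    """Accumulates token usage per program ID (hypothetical helper for illustration)."""

    def __init__(self):
        self._totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})

    def record(self, program_id, usage):
        """Add the usage of one API call to the running totals of program_id."""
        totals = self._totals[program_id]
        totals["prompt_tokens"] += usage["prompt_tokens"]
        totals["completion_tokens"] += usage["completion_tokens"]

    def totals(self, program_id):
        """Return a copy of the accumulated token counts for program_id."""
        return dict(self._totals[program_id])


ledger = UsageLedger()

# In a real program you would pass response["usage"] after each
# openai.ChatCompletion.create(...) call; plain dicts stand in for it here.
ledger.record("program_a", {"prompt_tokens": 30, "completion_tokens": 120, "total_tokens": 150})
ledger.record("program_a", {"prompt_tokens": 20, "completion_tokens": 80, "total_tokens": 100})
ledger.record("program_b", {"prompt_tokens": 500, "completion_tokens": 40, "total_tokens": 540})

print(ledger.totals("program_a"))
print(ledger.totals("program_b"))
```

This only works if every program reports its usage honestly to one shared store, so it is bookkeeping on top of the API rather than something Azure bills separately.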

Francesco Mantovani


Thanks! "You have to enclose them in different Resource Groups." that's indeed what I was doing to some extent, but there used to be a limit of 3 Resource Groups per location per Azure account, while I had >20 programs to track. Looks like the limit is 30 now, so more feasible (but now I have >30 programs and only 1 possible location...) – Franck Dernoncourt Jul 24 '23 at 17:03
@FranckDernoncourt, I know. But this is a new technology and, as you can see, they already changed the limit from 3 to 30. There are 2 locations as far as I can recall, but Microsoft certainly wants to make more $$$ with GPT, so expect this service to be extended. – Francesco Mantovani Jul 25 '23 at 08:11
There's not much you can do, and even C-level executives and board members need to understand that this is a new technology. Azure Support doesn't even have a category for GPT, just to give you an idea of how new this is. And good luck finding people with experience in GPT. It just came out in November and was massively updated in February. No one is an expert in this field. – Francesco Mantovani Jul 25 '23 at 08:14

score 0 · Answer 2 · answered May 25 '23 at 11:26

Perfect for anyone who needs to quickly calculate the number of tokens in ChatGPT prompts for their project.

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/

Ramprasad


OpenAI provides the same solution: https://platform.openai.com/tokenizer – Francesco Mantovani Jul 24 '23 at 11:40
