
I am making predictions with my model like this:

def predict_label(text, username):
    model = getmodelfromusername(username)
    # Tokenize the incoming text into a batch of model inputs.
    inputs = tokenizer(text, padding=True, truncation=True, max_length=500, return_tensors="pt")
    logits = model(**inputs)[0]
    # Convert the raw logits into per-class probabilities.
    probs = torch.nn.functional.softmax(logits, dim=1)
    return probs
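
The caller then just picks the highest-probability class from those probabilities, roughly like this (an illustrative sketch, not my exact handler code):

# Illustrative only: turn the returned probabilities into a class index.
probs = predict_label("some incoming text", "test")
predicted_class = probs.argmax(dim=1).item()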

I am training like this:

def train_and_update_model(model, parseddata, code, username, number):
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5)

    lr_scheduler = get_scheduler(
        name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=2
    )

    input_ids = torch.tensor([tokenizer.encode(str(parseddata), add_special_tokens=True)])
    labels = torch.tensor([number])

    model.train()

    # Two gradient steps on the single new example.
    for i in range(2):
        outputs = model(input_ids, labels=labels)
        loss = outputs[0]
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    with lock:
        model.save_pretrained(username + "/CustomModel")

    with lock:
        model = BertForSequenceClassification.from_pretrained(username + "/CustomModel")
    changemodelwithcode(code, model)
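
For reference, `lock` here is a module-level threading lock shared by the save and load paths (shown as a minimal sketch, assuming everything runs in one process):

import threading

# One process-wide lock so the checkpoint file isn't read
# while it is being written.
lock = threading.Lock()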

In my application I sometimes need to run predictions on a lot of incoming text and, at the same time, be able to train at any moment. However, when I try to do that I get this error:

at model.save_pretrained(username + "/CustomModel"):
RuntimeError: File test/CustomModel\pytorch_model.bin cannot be opened.

Any help would be extremely appreciated, thanks!


1 Answer


Try using the AutoModel classes; they help a lot with model saving/loading, e.g.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save the model to a local directory...
model.save_pretrained('mybert')

# ...and load it back from that directory.
mybert_model = AutoModelForSequenceClassification.from_pretrained('./mybert')
mybert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
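
Optionally, you could also save the tokenizer into the same directory so the whole checkpoint loads from one place (a small addition to the snippet above):

tokenizer.save_pretrained('mybert')

# Now both the model and the tokenizer load from the local directory.
mybert_model = AutoModelForSequenceClassification.from_pretrained('./mybert')
mybert_tokenizer = AutoTokenizer.from_pretrained('./mybert')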

Then, for inference, you can use a pipeline (https://huggingface.co/docs/transformers/main_classes/pipelines), e.g.:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import pipeline

mybert_model = AutoModelForSequenceClassification.from_pretrained('./mybert')
mybert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

classifier = pipeline("text-classification", model=mybert_model, tokenizer=mybert_tokenizer)
print(classifier(['this is a sentence', 'foo bar']))
print(classifier(['this is a sentence', 'foo bar']))
print(classifier(['this is a sentence', 'foo bar']))

[out]:

[{'label': 'LABEL_1', 'score': 0.5042521357536316},
 {'label': 'LABEL_0', 'score': 0.5395587682723999}]

[{'label': 'LABEL_1', 'score': 0.5042521357536316},
 {'label': 'LABEL_0', 'score': 0.5395587682723999}]

[{'label': 'LABEL_1', 'score': 0.5042521357536316},
 {'label': 'LABEL_0', 'score': 0.5395587682723999}]
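
Note that the pipeline runs the model in eval mode for you. If you call the model directly instead, make sure to set eval mode and disable gradient tracking yourself, otherwise dropout can stay active and repeated calls can return different scores (a minimal sketch):

import torch

mybert_model.eval()  # disable dropout so repeated calls give stable scores
with torch.no_grad():  # no gradients are needed at inference time
    inputs = mybert_tokenizer('this is a sentence', return_tensors='pt')
    probs = torch.nn.functional.softmax(mybert_model(**inputs).logits, dim=1)
print(probs)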
alvas
  • So I've changed the model to an AutoModel; however, after training the model, I am still getting different scores for the same input – tidekis doritos Apr 08 '23 at 20:16
  • Did you use pipeline to load the model? Or set the model to eval mode? – alvas Apr 08 '23 at 20:19
  • Yes, I used pipeline. How can I set it to eval mode? – tidekis doritos Apr 08 '23 at 20:23
  • Are you sure it's the BERT uncased model? And the data is the same? Could you post your code and a sample of the data you're testing on? – alvas Apr 08 '23 at 20:26
  • Yup, I'm sure it's BERT. With any text data I put in, after training it gives different predictions every time and the scores are not constant. – tidekis doritos Apr 08 '23 at 20:29
  • Wait a minute, are you training or using the model? – alvas Apr 08 '23 at 20:45
  • If you're running inference (i.e. using the model), you don't need to train; just use the pipeline. But if you're training the model in different runs, it's very possible that there's some randomness involved, and determinism is not guaranteed unless you go to all the places that randomness can exist in the model and standardize it. – alvas Apr 08 '23 at 20:47
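
One way to standardize those sources of randomness before training is to fix the seeds up front, e.g. with the helper that transformers ships (a minimal sketch; full determinism on GPU may additionally require deterministic CUDA settings):

from transformers import set_seed

# Seeds Python's random module, NumPy, and torch (CPU and all GPUs) in one call.
set_seed(42)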