
I am using a GCP VM (e2-highmem-4, an efficient instance with 4 vCPUs and 32 GB RAM) to load the model and use it. Here is the code I have written:

import torch
import transformers
from transformers import AutoTokenizer, pipeline

# Load the model config; trust_remote_code is required for MPT's custom model code.
config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True,
)
# config.attn_config['attn_impl'] = 'flash'

# Load MPT-7B-Instruct in bfloat16.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  torch_dtype=torch.bfloat16,
  trust_remote_code=True,
  cache_dir="./cache"
)

# MPT uses the GPT-NeoX tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b", cache_dir="./cache")

text_gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
text_gen(text_inputs="what is 2+2?")

Now the code is taking far too long to generate text. Am I doing something wrong, or is there any way to make things faster? Also, when creating the pipeline, I get the following warning:

The model 'MPTForCausalLM' is not supported for text-generation

I tried generating text with it anyway, but it was stuck for a long time.

– DD111

1 Answer


You might want to try a GPU instance; trying to run larger LLMs like this on CPUs is pretty much a lost cause at this point.
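For reference, here is a minimal sketch of loading the same model onto a GPU instance (this assumes a CUDA-capable machine and the accelerate package installed; the model name and dtype are taken from the question):

import torch
import transformers
from transformers import AutoTokenizer, pipeline

# Load MPT-7B-Instruct in half precision directly onto the GPU.
# device_map="auto" requires the `accelerate` package.
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  torch_dtype=torch.bfloat16,
  trust_remote_code=True,
  device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
text_gen = pipeline("text-generation", model=model, tokenizer=tokenizer)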

Anyhow, I also got the "The model 'MPTForCausalLM' is not supported for text-generation" warning, which is why I ended up in this thread. Text generation did work for me despite the warning.
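As a rough sketch of what that looks like, reusing the text_gen pipeline from the question (the max_new_tokens value is just an illustrative cap, not something required):

# The warning is harmless; generation still runs.
# Capping max_new_tokens keeps the call from running for a very long time.
output = text_gen("what is 2+2?", max_new_tokens=50)
print(output[0]["generated_text"])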

– Thusitha
  • I am having the same issue as OP (both the warning message and the apparent hanging during text generation), and I'm putting the model on a pair of A100s. So lack of a GPU instance does not seem to be the problem. – Ceph Jun 19 '23 at 17:38