I am working on using a transformers pipeline to get BERT embeddings for my input. Without a pipeline I am able to get constant-size outputs, but not with the pipeline, since I was not able to pass tokenizer arguments to it.
How can I pass tokenizer-related arguments to my pipeline?
# These are the BERT model and tokenizer definitions
from transformers import AutoTokenizer, AutoModel, pipeline
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
inputs = ['hello world']
# Normally I would tokenize like this and get a result with a constant output size
tokens = tokenizer(inputs, padding='max_length', truncation=True, max_length=500, return_tensors="pt")
model(**tokens)[0].detach().numpy().shape  # constant shape regardless of input length
# Using the pipeline
nlp = pipeline("feature-extraction", model=model, tokenizer=tokenizer, device=0)
# Or, as another option, passing the tokenizer arguments to from_pretrained
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT", padding='max_length', truncation=True, max_length=500, return_tensors="pt")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
nlp=pipeline("feature-extraction", model=model, tokenizer=tokenizer, device=0)
# To call the pipeline
nlp("hello world")
I have tried several approaches, like the options listed above, but was not able to get results with a constant output size. I can achieve a constant output size by setting the tokenizer arguments directly, but I have no idea how to pass those arguments through the pipeline.
Any ideas?
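For reference, here is a minimal sketch of one approach that may work: some recent transformers versions let the feature-extraction pipeline accept a `tokenize_kwargs` argument that is forwarded to the tokenizer on every call. Whether your installed version supports this parameter is an assumption, so check your version's pipeline documentation; the heavy model loading is wrapped in a function here so the kwargs themselves are easy to inspect.

```python
# Sketch: forwarding tokenizer arguments through a feature-extraction pipeline.
# Assumption: the installed transformers version supports `tokenize_kwargs`
# for the feature-extraction task.

# The same arguments that give a constant output size outside the pipeline.
tokenize_kwargs = {"padding": "max_length", "truncation": True, "max_length": 500}

def build_pipeline():
    # Imports kept local so the kwargs above can be inspected without
    # downloading the model.
    from transformers import AutoModel, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    # tokenize_kwargs is applied to every tokenizer call the pipeline makes,
    # so each input should be padded/truncated to exactly 500 tokens.
    return pipeline("feature-extraction", model=model, tokenizer=tokenizer,
                    tokenize_kwargs=tokenize_kwargs)

if __name__ == "__main__":
    nlp = build_pipeline()
    features = nlp("hello world")  # nested list shaped [1, 500, hidden_size]
```

If `tokenize_kwargs` is not available in your version, another thing worth trying is passing the same keyword arguments directly at call time, e.g. `nlp("hello world", padding='max_length', truncation=True, max_length=500)`; support for call-time kwargs also varies by version.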