Questions tagged [sentence-transformers]

Sentence Transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. These embeddings are used to find sentences with similar meanings. Sentence Transformers supports creating word and text embeddings in more than 100 languages.

You can install it using pip:

pip install -U sentence-transformers

Pre-trained models can be found here: https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models

Sentence Transformers Paper: https://arxiv.org/abs/1908.10084
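
As a rough illustration of how such embeddings are compared, the sketch below ranks vectors by cosine similarity using only the Python standard library. The three-dimensional vectors and the helper names `cosine_similarity` and `rank_by_similarity` are made up for this example; in practice the vectors would come from a model, for instance via `SentenceTransformer(...).encode(...)`, and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for the corpus vectors, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings (illustrative only).
toy_corpus = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
]
toy_query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(toy_query, toy_corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

Here the first and third toy vectors point in nearly the same direction as the query, so they rank ahead of the second. Identical vectors give a cosine similarity of 1.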

171 questions
13 votes, 2 answers

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-mean-tokens. I have an application that will be…
9 votes, 3 answers

ModuleNotFoundError: No module named 'setuptools.command.build'

I am trying to pip install sentence transformers. I am working on a MacBook Pro with an M1 chip. I am using the following command: pip3 install -U sentence-transformers. When I run this, I get this error/output and I do not know how to fix…
Joey O'Neill • 124 • 1 • 6
5 votes, 0 answers

Fine-tuning BERT sentence transformer model

I am using a pre-trained BERT sentence transformer model, as described here https://www.sbert.net/docs/training/overview.html , to get embeddings for sentences. I want to fine-tune these pre-trained embeddings, and I am following the instructions in…
Fiori • 181 • 1 • 12
5 votes, 2 answers

Same sentences produces a different vector in XLNet

I have computed the vectors for two identical sentences using XLNet embedding-as-service. However, the model produces different embeddings for the two identical sentences, so the cosine similarity is not 1 and the Euclidean distance is not 0. In…
4 votes, 1 answer

Restrict Vocab for BERT Encoder-Decoder Text Generation

Is there any way to restrict the vocabulary of the decoder in a Huggingface BERT encoder-decoder model? I'd like to force the decoder to choose from a small vocabulary when generating text rather than BERT's entire ~30k vocabulary.
4 votes, 1 answer

Token indices sequence length Issue

I am running a sentence transformer model and trying to truncate my tokens, but it doesn't appear to be working. My code is from transformers import AutoModel, AutoTokenizer model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2" model =…
GSA • 751 • 8 • 12
3 votes, 1 answer

How to improve/preprocess text (in special cases) so the embeddings and LLM will have better context?

I have been working on setting up local documents to be ingested into a vector DB and then used (as embeddings) as context for the LLM. The problem is that the local documents are very high level (see more details below). After it's chunked with…
3 votes, 0 answers

What should I do to load all-mpnet-base-v2 model from sentence-transformers?

I am trying to run the all-mpnet-base-v2 model with sentence-transformers 1.2.1 from my Anaconda framework (in Python 3.8). My first prompt for the model is like this: bert_model = SentenceTransformer('all-mpnet-base-v2'). However, when I run this…
3 votes, 1 answer

sentence transformer use of evaluator

I came across this script, which is the second link on this page, and this explanation. I am using all-mpnet-base-v2 (link) with my custom data. I am having a hard time understanding the use of evaluator =…
user2543622 • 5,760 • 25 • 91 • 159
3 votes, 1 answer

How to create a dataset object with multiple text inputs for the SetFit model?

The SetFit library accepts two inputs, "text" and "label" (https://huggingface.co/blog/setfit). My goal is to train SetFit using two similarity inputs with a binary label (similar or not similar): ("text1", "text2", "similar/not"). The example of dataset…
3 votes, 4 answers

How to save a SetFit trainer locally after training

I am working on an HPC with no internet access on worker nodes, and the only option for saving a SetFit trainer after training is to push it to the Hugging Face Hub. How do I save it locally to disk? https://github.com/huggingface/setfit
3 votes, 1 answer

Sentence Transformers in Python: "[E1002] Span index out of range"

As a programming noob, I am trying to find similar sentences in several hundreds of newspaper articles. I have tried my code with a smaller text sample which has worked brilliantly. Now, with a larger text file (using the same code), I get the error…
3 votes, 1 answer

The last dimension of the inputs to a Dense layer should be defined. Found None. Full input shape received:

I am having trouble when switching a model from some local dummy data to using a TF dataset. Sorry for the long model code, I have tried to shorten it as much as possible. The following works fine: import tensorflow as tf import…
3 votes, 1 answer

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd when importing sentence-transformers

I want to use sentence-transformers. To do this, I installed it as follows: pip install sentence-transformers. Then I did my import as follows: from sentence_transformers import SentenceTransformer, which resulted in the following…
Emil • 1,531 • 3 • 22 • 47
3 votes, 1 answer

how to use sentence bert with transformers and torch

I would like to use sentence_transformers, but due to policy restrictions I cannot install the sentence-transformers package. I do have the transformers and torch packages, though. I went to this page and tried to run the code below. Before doing that, I went to…