What license governs the use of sentence-transformers/multi-qa-mpnet-base-dot-v1? Apache 2.0, MIT, or something else? Where can we find more information on this?

Apparently you're [not the only one who wonders](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1/discussions/2). The short form is: unless you get some clarification, you'll have to assume that it's not licensed to you at all. Reach out to get clarification from the owner(s). – Joachim Sauer Jun 07 '23 at 08:13
1 Answer
Disclaimer: I'm not a lawyer; it's best to ask your organization's legal department.
TL;DR
Unfortunately, based on the data it was trained on, the sentence-transformers/multi-qa-mpnet-base-dot-v1
pretrained model is covered by a mix of licenses and cannot be used commercially.
If you were to reset all the model weights (which would make the model useless), it should technically fall back to the Apache 2.0 license of the Sentence Transformers library.
In Long
From the documentation, https://www.sbert.net/docs/pretrained_models.html#multi-qa-models
The following models have been trained on 215M question-answer pairs from various sources and domains, including StackExchange, Yahoo Answers, Google & Bing search queries and many more. These model perform well across many search tasks and domains.
From the data config, https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1/blob/main/data_config.json
    [
      ...,
      {
        "name": "stackexchange_title_body/stackoverflow.com-Posts.jsonl.gz",
        "lines": 18562443,
        "weight": 226
      },
      {
        "name": "searchQA_question_top5_snippets_merged.jsonl.gz",
        "lines": 582261,
        "weight": 263
      },
      {
        "name": "amazon-qa-train-pairs.jsonl.gz",
        "lines": 2448839,
        "weight": 451
      },
      {
        "name": "gooaq_pairs.jsonl.gz",
        "lines": 3012496,
        "weight": 451
      },
      {
        "name": "msmarco-query_passage_negative_v2.jsonl.gz",
        "lines": 17579773,
        "weight": 1000
      }
    ]
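To see at a glance which sources dominate, the excerpt can be loaded with the standard library. This is only a minimal sketch: it assumes the file is a JSON list of objects with `name`, `lines`, and `weight` keys, as in the excerpt; the `summarize` helper is hypothetical, and only the five entries shown here are included (the elided entries are left out).

```python
import json

# Entries copied from the data_config.json excerpt in this answer.
# The elided ("...") entries are intentionally not reproduced.
CONFIG_EXCERPT = """
[
  {"name": "stackexchange_title_body/stackoverflow.com-Posts.jsonl.gz", "lines": 18562443, "weight": 226},
  {"name": "searchQA_question_top5_snippets_merged.jsonl.gz", "lines": 582261, "weight": 263},
  {"name": "amazon-qa-train-pairs.jsonl.gz", "lines": 2448839, "weight": 451},
  {"name": "gooaq_pairs.jsonl.gz", "lines": 3012496, "weight": 451},
  {"name": "msmarco-query_passage_negative_v2.jsonl.gz", "lines": 17579773, "weight": 1000}
]
"""


def summarize(config_text):
    """Hypothetical helper: return (name, lines) pairs, largest source first."""
    entries = json.loads(config_text)
    pairs = [(entry["name"], entry["lines"]) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


summary = summarize(CONFIG_EXCERPT)
for name, lines in summary:
    print(f"{lines:>12,}  {name}")
```

Each source listed this way has its own terms, so every name in the full file would need to be checked individually.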
The full list of datasets used for training is at https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-dot-v1#training. Licenses for some of them:

| Dataset | License | Commercial Use | Notes |
|---|---|---|---|
| GooAQ | Apache License v2 | ❌ | "This dataset should not be used for any commercial purposes. See the license for the detailed terms." |
| Stack Overflow dump | CC-BY-SA | ? | |
| Amazon QA | ? | ? | |
| MS MARCO | ? | ❌ | "The MS MARCO datasets are intended for non-commercial research purposes" |
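
If you want to keep track of what still needs checking, the rows above can be turned into a small checklist. This is just an illustrative sketch: the `DATASETS` list is transcribed by hand from the table (not fetched from anywhere), and the `triage` helper is a hypothetical name.

```python
# Rows transcribed from the table in this answer.
# Third field: False = commercial use prohibited, None = not determined ("?").
DATASETS = [
    ("GooAQ", "Apache License v2", False),
    ("Stack Overflow dump", "CC-BY-SA", None),
    ("Amazon QA", None, None),
    ("MS MARCO", None, False),
]


def triage(rows):
    """Hypothetical helper: split datasets into prohibited vs. still unclear."""
    prohibited = [name for name, _license, ok in rows if ok is False]
    unknown = [name for name, _license, ok in rows if ok is None]
    return prohibited, unknown


prohibited, unknown = triage(DATASETS)
print("Commercial use prohibited:", ", ".join(prohibited))
print("Needs clarification:", ", ".join(unknown))
```

A single dataset in the first group is enough to rule out commercial use of the pretrained weights under the reasoning in the TL;DR; the second group is what to raise with the dataset owners or your legal department.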
