Questions tagged [transformer-model]

This tag refers to the Transformer model, used especially for Natural Language Understanding and Processing, and popularized by the paper "Attention Is All You Need".

1093 questions
29 votes · 3 answers

How to understand masked multi-head attention in transformer

I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention in the decoder. The paper says it is there to prevent seeing the word being generated, but I cannot understand: if the words after the word being generated have not…
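For orientation, the mask in question is the causal (look-ahead) mask: position i may attend only to positions ≤ i. A minimal PyTorch sketch of how such a mask is typically built and applied to raw attention scores (names are illustrative, not from any particular codebase):

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores, i.e. QK^T / sqrt(d_k)

# Upper-triangular mask: True where key position j > query position i (future tokens)
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Set future positions to -inf so softmax assigns them zero weight
scores = scores.masked_fill(causal_mask, float("-inf"))
weights = scores.softmax(dim=-1)  # each row i now only attends to positions <= i
```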
28 votes · 7 answers

How to download a model from Hugging Face?

https://huggingface.co/models For example, I want to download 'bert-base-uncased', but can't find a 'Download' link. Please help. Or is it not downloadable?
marlon · 6,029 · 8 gold · 42 silver · 76 bronze
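For reference, models on the Hub are normally fetched programmatically rather than via a download button. A minimal sketch using the transformers and huggingface_hub libraries (the cache location mentioned is the usual default, which may differ per setup):

```python
from transformers import AutoModel, AutoTokenizer

# Downloads and caches (typically under ~/.cache/huggingface) the weights and vocab
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Alternatively, fetch the raw repository files to a local directory:
from huggingface_hub import snapshot_download
local_dir = snapshot_download("bert-base-uncased")
```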
25 votes · 3 answers

Difference between src_mask and src_key_padding_mask

I am having a difficult time understanding transformers. Everything is becoming clear bit by bit, but one thing that makes me scratch my head is: what is the difference between src_mask and src_key_padding_mask, which are passed as arguments to forward…
Leo · 480 · 1 gold · 6 silver · 9 bronze
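In short: src_mask is an (L, L) mask over attention positions (e.g. a causal mask), while src_key_padding_mask is an (N, L) mask marking padding tokens per batch element. A hedged sketch with nn.TransformerEncoder, using toy dimensions:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(5, 2, 16)  # (seq_len, batch, d_model)

# src_mask: (L, L), True means "do not attend" - here a causal mask
src_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)

# src_key_padding_mask: (N, L), True marks padding positions per sequence
src_key_padding_mask = torch.tensor([[False, False, False, True, True],
                                     [False, False, False, False, True]])

out = encoder(src, mask=src_mask, src_key_padding_mask=src_key_padding_mask)
```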
25 votes · 5 answers

How to create a StreamTransformer in Dart?

I'm trying to build a custom StreamTransformer class; however, a lot of the examples out there seem to be out of date, and the one found in the documentation isn't (what some typed languages might consider, anyway) a class (found here:…
Will Squire · 6,127 · 7 gold · 45 silver · 57 bronze
20 votes · 2 answers

What is the difference between attn_mask and key_padding_mask in MultiheadAttention?

What is the difference between attn_mask and key_padding_mask in PyTorch's MultiheadAttention? key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the…
one · 2,205 · 1 gold · 15 silver · 37 bronze
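These are the same two concepts at the nn.MultiheadAttention level: attn_mask has shape (L, S) and masks individual query-key pairs, while key_padding_mask has shape (N, S) and masks whole padded keys for every query. A minimal sketch:

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
q = k = v = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)

# attn_mask (L, S): True forbids query i from attending to key j
attn_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)

# key_padding_mask (N, S): True marks padded keys, ignored by every query
key_padding_mask = torch.zeros(2, 5, dtype=torch.bool)
key_padding_mask[:, -1] = True  # last position of each sequence is padding

out, weights = mha(q, k, v, attn_mask=attn_mask, key_padding_mask=key_padding_mask)
```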
19 votes · 4 answers

AttributeError when using ColumnTransformer in a pipeline

This is my first machine learning project and the first time that I use ColumnTransformer. My aim is to perform two steps of data preprocessing, and use ColumnTransformer for each of them. In the first step, I want to replace the missing values in…
Giulia · 205 · 1 gold · 2 silver · 5 bronze
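One frequent cause of such errors (though the question's exact traceback isn't shown here) is chaining two ColumnTransformers: after the first, the data becomes a plain NumPy array, so the second can no longer select columns by name. A hedged sketch of the usual pattern, one ColumnTransformer with per-column sub-pipelines; the column names are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame with one numeric and one categorical column
X = pd.DataFrame({"age": [25, np.nan, 40], "city": ["a", "b", np.nan]})

numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])

# Both preprocessing steps live in ONE ColumnTransformer, applied in parallel
pre = ColumnTransformer([("num", numeric, ["age"]),
                         ("cat", categorical, ["city"])])
X_t = pre.fit_transform(X)
```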
15 votes · 1 answer

Implementation of the Dense Synthesizer

I’m trying to understand the Synthesizer paper (https://arxiv.org/pdf/2005.00743.pdf) and there’s a description of the dense synthesizer mechanism that should replace the traditional attention model as described in the Transformer…
alvas · 115,346 · 109 gold · 446 silver · 738 bronze
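For orientation, the Dense Synthesizer replaces the QK^T dot product with attention weights predicted from each token alone by a small feed-forward network. A minimal single-head sketch of that mechanism as described in the paper (class and dimension names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizer(nn.Module):
    """Single-head Dense Synthesizer: B = W2 ReLU(W1 x) replaces QK^T."""
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_model)
        self.w2 = nn.Linear(d_model, max_len)   # one score per attended position
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        scores = self.w2(F.relu(self.w1(x)))     # (batch, seq_len, max_len)
        scores = scores[..., :seq_len]           # trim to the actual length
        attn = scores.softmax(dim=-1)            # synthesized attention weights
        return attn @ self.value(x)              # weighted sum of values

out = DenseSynthesizer(d_model=16, max_len=8)(torch.randn(2, 5, 16))
```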
14 votes · 2 answers

gcc ON arm/android

I just got an Eee Pad Transformer. Like any hardware I own, I'd like to have a C compiler on it. I know I can cross-compile, but I'd like to do development ON the device itself. I've searched Google and all I can seem to find are pages on how to…
Tim · 279 · 2 gold · 3 silver · 10 bronze
14 votes · 3 answers

Setting namespaces and prefixes in a Java DOM document

I'm trying to convert a ResultSet to an XML file. I first used this example for the serialization. import org.w3c.dom.bootstrap.DOMImplementationRegistry; import org.w3c.dom.Document; import org.w3c.dom.ls.DOMImplementationLS; import …
TrashCan · 817 · 4 gold · 13 silver · 20 bronze
13 votes · 1 answer

How to get immediate next word probability using GPT2 model?

I was trying the Hugging Face GPT-2 model. I have seen the run_generation.py script, which generates a sequence of tokens given a prompt. I am aware that we can use GPT2 for NLG. In my use case, I wish to determine the probability distribution for…
Gaurang Tandon · 6,504 · 11 gold · 47 silver · 84 bronze
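For context, the distribution over the immediate next token can be read off the logits at the final position of the input. A hedged sketch with the Hugging Face API (the prompt string is made up):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)

probs = logits[0, -1].softmax(dim=-1)          # distribution over the next token
top = probs.topk(5)
print([(tokenizer.decode(i.item()), p.item()) for i, p in zip(top.indices, top.values)])
```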
12 votes · 2 answers

Why must embed_dim be divisible by num_heads in MultiheadAttention?

I am learning the Transformer. Here is the PyTorch documentation for MultiheadAttention. In their implementation, I saw there is a constraint: assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads". Why require…
jason · 1,998 · 3 gold · 22 silver · 42 bronze
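The reason for the constraint is mechanical: the embedding is split evenly across heads, each head working on a slice of size head_dim = embed_dim // num_heads, and the slices are concatenated back afterwards. A small sketch of the reshape that requires exact divisibility:

```python
import torch

batch, seq_len, embed_dim, num_heads = 2, 5, 16, 4
head_dim = embed_dim // num_heads      # 4; must divide exactly
assert head_dim * num_heads == embed_dim

x = torch.randn(batch, seq_len, embed_dim)

# Split the last dimension into (num_heads, head_dim) slices
heads = x.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
# heads: (batch, num_heads, seq_len, head_dim) - each head attends independently

# After attention, the inverse reshape concatenates the heads again
merged = heads.transpose(1, 2).reshape(batch, seq_len, embed_dim)
```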
12 votes · 3 answers

InvalidArgumentError: Mismatch between the current graph and the graph from the checkpoint

So I am basically using this transformer implementation for my project: https://github.com/Kyubyong/transformer. It works great on the German-to-English translation it was originally written for, and I modified the processing Python script in order…
noob · 5,954 · 6 gold · 20 silver · 32 bronze
11 votes · 2 answers

How to train BERT from scratch on a new domain for both MLM and NSP?

I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model. In the original paper, it is stated that: “BERT is…
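For reference, the Hugging Face class that matches the original pretraining setup (MLM plus NSP heads) is BertForPreTraining. A hedged configuration sketch with freshly initialized weights, the data pipeline left out:

```python
from transformers import BertConfig, BertForPreTraining

# Architecture of the original BERT-base, with randomly initialized weights
config = BertConfig(vocab_size=30522, hidden_size=768,
                    num_hidden_layers=12, num_attention_heads=12,
                    intermediate_size=3072)
model = BertForPreTraining(config)

# The model's outputs expose both heads: prediction_logits for MLM and
# seq_relationship_logits for NSP. Training additionally needs a dataset
# that yields masked inputs and sentence-pair labels, e.g. via
# DataCollatorForLanguageModeling plus an NSP-style pair sampler.
```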
11 votes · 3 answers

How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?

I've been looking to use Hugging Face's Pipelines for NER (named entity recognition). However, it is returning the entity labels in inside-outside-beginning (IOB) format but without the IOB labels. So I'm not able to map the output of the pipeline…
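For orientation, newer versions of the pipeline can merge subword/IOB pieces into whole entity spans; a hedged sketch (the keyword argument changed names across transformers versions, and the example sentence is made up):

```python
from transformers import pipeline

# aggregation_strategy="simple" merges B-/I- pieces into whole entities;
# older transformers versions used grouped_entities=True instead.
ner = pipeline("ner", aggregation_strategy="simple")

for ent in ner("Hugging Face is based in New York City"):
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```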
11 votes · 2 answers

Get probability of multi-token word in MASK position

It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, and then find the probability of your requested…
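One common, if approximate, approach is to insert as many [MASK] tokens as the word has pieces and multiply the per-position probabilities, treating the pieces as independent. A hedged sketch (the sentence and target word are made up for illustration):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# The target word may split into one or more wordpieces
word_ids = tokenizer.encode("reindeer", add_special_tokens=False)
masks = " ".join([tokenizer.mask_token] * len(word_ids))
inputs = tokenizer("Santa rides a sled pulled by a " + masks, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().squeeze(-1)
probs = logits[0, mask_pos].softmax(dim=-1)

# Independence approximation: product of each piece's probability at its slot
word_prob = torch.prod(probs[torch.arange(len(word_ids)), word_ids]).item()
print(word_prob)
```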