Questions tagged [transformer-model]

This tag refers to the Transformer model, used especially for Natural Language Understanding and Processing, and popularized by the paper "Attention Is All You Need".

1093 questions
29 votes · 3 answers

How to understand masked multi-head attention in transformer

I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention in the decoder. The paper says it is there to prevent seeing the word being generated, but I cannot understand: if the words after the word being generated have not…
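For orientation, the mask in question is the causal (look-ahead) mask: position i may attend only to positions ≤ i. A minimal PyTorch sketch of how such a mask is typically built and applied to raw attention scores (names are illustrative, not from any particular codebase):

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores, i.e. QK^T / sqrt(d_k)

# Upper-triangular mask: True where key position j > query position i (future tokens)
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Set future positions to -inf so softmax assigns them zero weight
scores = scores.masked_fill(causal_mask, float("-inf"))
weights = scores.softmax(dim=-1)  # each row i now only attends to positions <= i
```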
28 votes · 7 answers

How to download a model from Hugging Face?

https://huggingface.co/models For example, I want to download 'bert-base-uncased', but can't find a 'Download' link. Please help. Or is it not downloadable?
marlon · 6,029 · 8 gold · 42 silver · 76 bronze
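For reference, models on the Hub are normally fetched programmatically rather than via a download button. A minimal sketch using the transformers and huggingface_hub libraries (the cache location mentioned is the usual default, which may differ per setup):

```python
from transformers import AutoModel, AutoTokenizer

# Downloads and caches (typically under ~/.cache/huggingface) the weights and vocab
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Alternatively, fetch the raw repository files to a local directory:
from huggingface_hub import snapshot_download
local_dir = snapshot_download("bert-base-uncased")
```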
25 votes · 3 answers

Difference between src_mask and src_key_padding_mask

I am having a difficult time understanding transformers. Everything is becoming clear bit by bit, but one thing that makes me scratch my head is: what is the difference between src_mask and src_key_padding_mask, which are passed as arguments to forward…
Leo · 480 · 1 gold · 6 silver · 9 bronze
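In short: src_mask is an (L, L) mask over attention positions (e.g. a causal mask), while src_key_padding_mask is an (N, L) mask marking padding tokens per batch element. A hedged sketch with nn.TransformerEncoder, using toy dimensions:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(5, 2, 16)  # (seq_len, batch, d_model)

# src_mask: (L, L), True means "do not attend" - here a causal mask
src_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)

# src_key_padding_mask: (N, L), True marks padding positions per sequence
src_key_padding_mask = torch.tensor([[False, False, False, True, True],
                                     [False, False, False, False, True]])

out = encoder(src, mask=src_mask, src_key_padding_mask=src_key_padding_mask)
```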
25 votes · 5 answers

How to create a StreamTransformer in Dart?

I'm trying to build a custom StreamTransformer class; however, a lot of the examples out there seem to be out of date, and the one found in the documentation isn't (what some typed languages might consider, anyway) a class (found here:…
Will Squire · 6,127 · 7 gold · 45 silver · 57 bronze
20 votes · 2 answers

What is the difference between attn_mask and key_padding_mask in MultiheadAttention?

What is the difference between attn_mask and key_padding_mask in PyTorch's MultiheadAttention? key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the…
one · 2,205 · 1 gold · 15 silver · 37 bronze
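These are the same two concepts at the nn.MultiheadAttention level: attn_mask has shape (L, S) and masks individual query-key pairs, while key_padding_mask has shape (N, S) and masks whole padded keys for every query. A minimal sketch:

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
q = k = v = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)

# attn_mask (L, S): True forbids query i from attending to key j
attn_mask = torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1)

# key_padding_mask (N, S): True marks padded keys, ignored by every query
key_padding_mask = torch.zeros(2, 5, dtype=torch.bool)
key_padding_mask[:, -1] = True  # last position of each sequence is padding

out, weights = mha(q, k, v, attn_mask=attn_mask, key_padding_mask=key_padding_mask)
```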
19 votes · 4 answers

AttributeError when using ColumnTransformer in a pipeline

This is my first machine learning project and the first time that I use ColumnTransformer. My aim is to perform two steps of data preprocessing, and use ColumnTransformer for each of them. In the first step, I want to replace the missing values in…
Giulia · 205 · 1 gold · 2 silver · 5 bronze
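One frequent cause of such errors (though the question's exact traceback isn't shown here) is chaining two ColumnTransformers: after the first, the data becomes a plain NumPy array, so the second can no longer select columns by name. A hedged sketch of the usual pattern, one ColumnTransformer with per-column sub-pipelines; the column names are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame with one numeric and one categorical column
X = pd.DataFrame({"age": [25, np.nan, 40], "city": ["a", "b", np.nan]})

numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])

# Both preprocessing steps live in ONE ColumnTransformer, applied in parallel
pre = ColumnTransformer([("num", numeric, ["age"]),
                         ("cat", categorical, ["city"])])
X_t = pre.fit_transform(X)
```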
15 votes · 1 answer

Implementation of the Dense Synthesizer

I’m trying to understand the Synthesizer paper (https://arxiv.org/pdf/2005.00743.pdf) and there’s a description of the dense synthesizer mechanism that should replace the traditional attention model as described in the Transformer…
alvas · 115,346 · 109 gold · 446 silver · 738 bronze
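For orientation, the Dense Synthesizer replaces the QK^T dot product with attention weights predicted from each token alone by a small feed-forward network. A minimal single-head sketch of that mechanism as described in the paper (class and dimension names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizer(nn.Module):
    """Single-head Dense Synthesizer: B = W2 ReLU(W1 x) replaces QK^T."""
    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_model)
        self.w2 = nn.Linear(d_model, max_len)   # one score per attended position
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        scores = self.w2(F.relu(self.w1(x)))     # (batch, seq_len, max_len)
        scores = scores[..., :seq_len]           # trim to the actual length
        attn = scores.softmax(dim=-1)            # synthesized attention weights
        return attn @ self.value(x)              # weighted sum of values

out = DenseSynthesizer(d_model=16, max_len=8)(torch.randn(2, 5, 16))
```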
14 votes · 2 answers

gcc ON arm/android

I just got an Eee Pad Transformer. Like any hardware I own, I'd like to have a C compiler on it. I know I can cross-compile, but I'd like to do development ON the device itself. I've searched Google and all I can seem to find are pages on how to…
Tim · 279 · 2 gold · 3 silver · 10 bronze
14 votes · 3 answers

Setting namespaces and prefixes in a Java DOM document

I'm trying to convert a ResultSet to an XML file. I first used this example for the serialization. import org.w3c.dom.bootstrap.DOMImplementationRegistry; import org.w3c.dom.Document; import org.w3c.dom.ls.DOMImplementationLS; import …
TrashCan · 817 · 4 gold · 13 silver · 20 bronze
13 votes · 1 answer

How to get immediate next word probability using GPT2 model?

I was trying the Hugging Face GPT-2 model. I have seen the run_generation.py script, which generates a sequence of tokens given a prompt. I am aware that we can use GPT2 for NLG. In my use case, I wish to determine the probability distribution for…
Gaurang Tandon · 6,504 · 11 gold · 47 silver · 84 bronze
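For context, the distribution over the immediate next token can be read off the logits at the final position of the input. A hedged sketch with the Hugging Face API (the prompt string is made up):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)

probs = logits[0, -1].softmax(dim=-1)          # distribution over the next token
top = probs.topk(5)
print([(tokenizer.decode(i.item()), p.item()) for i, p in zip(top.indices, top.values)])
```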
12 votes · 2 answers

Why must embed_dim be divisible by num_heads in MultiheadAttention?

I am learning the Transformer. Here is the PyTorch documentation for MultiheadAttention. In their implementation, I saw there is a constraint: assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads". Why require…
jason · 1,998 · 3 gold · 22 silver · 42 bronze
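The reason for the constraint is mechanical: the embedding is split evenly across heads, each head working on a slice of size head_dim = embed_dim // num_heads, and the slices are concatenated back afterwards. A small sketch of the reshape that requires exact divisibility:

```python
import torch

batch, seq_len, embed_dim, num_heads = 2, 5, 16, 4
head_dim = embed_dim // num_heads      # 4; must divide exactly
assert head_dim * num_heads == embed_dim

x = torch.randn(batch, seq_len, embed_dim)

# Split the last dimension into (num_heads, head_dim) slices
heads = x.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
# heads: (batch, num_heads, seq_len, head_dim) - each head attends independently

# After attention, the inverse reshape concatenates the heads again
merged = heads.transpose(1, 2).reshape(batch, seq_len, embed_dim)
```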
12 votes · 3 answers

InvalidArgumentError: Mismatch between the current graph and the graph from the checkpoint

So I am basically using this transformer implementation for my project: https://github.com/Kyubyong/transformer. It works great on the German-to-English translation it was originally written for, and I modified the processing Python script in order…
noob · 5,954 · 6 gold · 20 silver · 32 bronze
11 votes · 2 answers

How to train BERT from scratch on a new domain for both MLM and NSP?

I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model in a way that it has the exact architecture of the original BERT model. In the original paper, it is stated that: “BERT is…
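For reference, the Hugging Face class that matches the original pretraining setup (MLM plus NSP heads) is BertForPreTraining. A hedged configuration sketch with freshly initialized weights, the data pipeline left out:

```python
from transformers import BertConfig, BertForPreTraining

# Architecture of the original BERT-base, with randomly initialized weights
config = BertConfig(vocab_size=30522, hidden_size=768,
                    num_hidden_layers=12, num_attention_heads=12,
                    intermediate_size=3072)
model = BertForPreTraining(config)

# The model's outputs expose both heads: prediction_logits for MLM and
# seq_relationship_logits for NSP. Training additionally needs a dataset
# that yields masked inputs and sentence-pair labels, e.g. via
# DataCollatorForLanguageModeling plus an NSP-style pair sampler.
```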
11 votes · 3 answers

How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?

I've been looking to use Hugging Face's Pipelines for NER (named entity recognition). However, it is returning the entity labels in inside-outside-beginning (IOB) format but without the IOB labels. So I'm not able to map the output of the pipeline…
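For orientation, newer versions of the pipeline can merge subword/IOB pieces into whole entity spans; a hedged sketch (the keyword argument changed names across transformers versions, and the example sentence is made up):

```python
from transformers import pipeline

# aggregation_strategy="simple" merges B-/I- pieces into whole entities;
# older transformers versions used grouped_entities=True instead.
ner = pipeline("ner", aggregation_strategy="simple")

for ent in ner("Hugging Face is based in New York City"):
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```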
11 votes · 2 answers

Get probability of multi-token word in MASK position

It is relatively easy to get a token's probability according to a language model, as the snippet below shows. You can get the output of a model, restrict yourself to the output of the masked token, and then find the probability of your requested…
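One common, if approximate, approach is to insert as many [MASK] tokens as the word has pieces and multiply the per-position probabilities, treating the pieces as independent. A hedged sketch (the sentence and target word are made up for illustration):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# The target word may split into one or more wordpieces
word_ids = tokenizer.encode("reindeer", add_special_tokens=False)
masks = " ".join([tokenizer.mask_token] * len(word_ids))
inputs = tokenizer("Santa rides a sled pulled by a " + masks, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().squeeze(-1)
probs = logits[0, mask_pos].softmax(dim=-1)

# Independence approximation: product of each piece's probability at its slot
word_prob = torch.prod(probs[torch.arange(len(word_ids)), word_ids]).item()
print(word_prob)
```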