
I am trying to use a GPT2 architecture for musical applications and consequently need to train it from scratch. After a bit of googling I found that issue #1714 on Hugging Face's GitHub had already "solved" the question. When I try to run the proposed solution:

import numpy as np
from transformers import GPT2Config, GPT2Model

NUMLAYER = 4
NUMHEAD = 4
SIZEREDUCTION = 10 #the factor by which we reduce the size of the velocity argument.
VELSIZE = int(np.floor(127/SIZEREDUCTION)) + 1 
SEQLEN=40 #size of data sequences.
EMBEDSIZE = 5 

config = GPT2Config(vocab_size = VELSIZE, n_positions = SEQLEN, n_embd = EMBEDSIZE, n_layer = NUMLAYER, n_ctx = SEQLEN, n_head = NUMHEAD)  
model = GPT2Model(config)

I get the following error:

Traceback (most recent call last):

  File "<ipython-input-7-b043a7a2425f>", line 1, in <module>
    runfile('C:/Users/cnelias/Desktop/PHD/Swing project/code/script/GPT2.py', wdir='C:/Users/cnelias/Desktop/PHD/Swing project/code/script')

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
    execfile(filename, namespace)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/cnelias/Desktop/PHD/Swing project/code/script/GPT2.py", line 191, in <module>
    model = GPT2Model(config)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 355, in __init__
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 355, in <listcomp>
    self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 223, in __init__
    self.attn = Attention(nx, n_ctx, config, scale)

  File "C:\Users\cnelias\Anaconda3\lib\site-packages\transformers\modeling_gpt2.py", line 109, in __init__
    assert n_state % config.n_head == 0

What does this mean and how can I solve it?

Also, more generally, is there documentation on how to do a forward call with GPT2? Can I define my own train() function, or do I have to use the model's built-in one? Am I forced to use a Dataset for training, or can I feed it individual tensors? I looked for answers to these in the docs but couldn't find any, but maybe I missed something.

PS: I have already read the blog post from huggingface.co, but it omits too much information and detail to be useful for my application.


1 Answer


I think the error message is pretty clear:

assert n_state % config.n_head == 0

Tracing it back through the code, we can see

n_state = nx # in Attention: n_state=768

which indicates that n_state represents the embedding dimension (generally 768 by default in BERT-like models). Looking at the GPT-2 documentation, the parameter that specifies this is n_embd, which you are setting to 5. As the error indicates, the embedding dimension has to be evenly divisible by the number of attention heads, which you specified as 4. So, choosing an embedding dimension that is a multiple of 4 should solve the problem. Of course, you could also change the number of heads instead, but it seems that odd embedding dimensions are not supported.
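For instance, keeping your 4 heads but bumping the embedding size from 5 to 8 should let the model build without the assertion error; any multiple of 4 works, and the concrete value here is just an illustration:

import numpy as np
from transformers import GPT2Config, GPT2Model

NUMLAYER = 4
NUMHEAD = 4
SIZEREDUCTION = 10
VELSIZE = int(np.floor(127 / SIZEREDUCTION)) + 1  # 13
SEQLEN = 40
EMBEDSIZE = 8  # multiple of NUMHEAD (4), unlike the original 5

config = GPT2Config(vocab_size=VELSIZE, n_positions=SEQLEN, n_embd=EMBEDSIZE,
                    n_layer=NUMLAYER, n_ctx=SEQLEN, n_head=NUMHEAD)
model = GPT2Model(config)  # 8 % 4 == 0, so the assert passes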

  • Thanks a lot!! I am a bit of a GitHub noob; would you mind telling me how you found the piece of code the error was coming from? I tried to do that myself, and after a while of doing ```ctrl + f``` on every file of the repo I gave up. – Johncowk Mar 24 '20 at 15:14
  • No worries, good to see you working on problems by yourself. For this specifically, I worked my way backwards through the error message. Right above the `assert ...` in the error message you can see the path to the file where this specific line happens to be. From there it was quite easy to find, although I can otherwise recommend using the GitHub search bar (which by default searches *in the current repository* and thus works as a neat alternative). Note that this is a one-off response, since the formal thing would be to open a new question on this (unrelated) topic. – dennlinger Mar 24 '20 at 18:03
  • I had the same problem and error message. Making `n_embd` a multiple of the number of attention heads indeed solved it. – Giorgio Dec 31 '20 at 19:08