
I am completely new to transformers. I built a transformer-based model that uses only the encoder and positional-embedding parts, stacked 12 blocks deep, to classify around 1 million samples of time-series data. The model is very slow (around half an hour per epoch). My GPU is an RTX 3080 on a laptop. Is it normal for transformers to train this slowly? Is there any way to improve the performance? Is there an easy way to tune the hyperparameters with a highly skewed and very noisy dataset?

I tried different learning rates to speed up the process; 0.001 gives me not-bad results, but training is still very slow. I implemented the model following the TensorFlow implementation.
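For context, here is a minimal sketch of what my model roughly looks like, written with the Keras functional API; all the shapes and sizes below are placeholders rather than my real values:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder shapes/sizes -- my real values differ
SEQ_LEN = 128      # time steps per sample
N_FEATURES = 8     # features per time step
N_CLASSES = 2      # labels (highly skewed)
D_MODEL = 64       # model (embedding) dimension
N_HEADS = 4
FF_DIM = 128
N_BLOCKS = 12      # 12 stacked encoder blocks

class PositionalEmbedding(layers.Layer):
    """Adds a learned positional embedding to the projected input sequence."""
    def __init__(self, seq_len, d_model, **kwargs):
        super().__init__(**kwargs)
        self.pos_emb = layers.Embedding(input_dim=seq_len, output_dim=d_model)

    def call(self, x):
        positions = tf.range(start=0, limit=tf.shape(x)[1], delta=1)
        return x + self.pos_emb(positions)

def encoder_block(x):
    # Multi-head self-attention with residual connection and layer norm
    attn = layers.MultiHeadAttention(num_heads=N_HEADS, key_dim=D_MODEL // N_HEADS)(x, x)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn)
    # Position-wise feed-forward network with residual connection and layer norm
    ff = layers.Dense(FF_DIM, activation="relu")(x)
    ff = layers.Dense(D_MODEL)(ff)
    return layers.LayerNormalization(epsilon=1e-6)(x + ff)

inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
x = layers.Dense(D_MODEL)(inputs)            # project raw features to the model dimension
x = PositionalEmbedding(SEQ_LEN, D_MODEL)(x)

for _ in range(N_BLOCKS):                    # stack 12 encoder blocks
    x = encoder_block(x)

x = layers.GlobalAveragePooling1D()(x)       # pool over time steps
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

I then train this with `model.fit` on the full ~1M-sample dataset, which is where each epoch takes about half an hour.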

