The computational complexity of a self-attention layer grows quadratically with sequence length: every token attends to every other token, so both time and memory scale as O(n²) for a sequence of n tokens.
Please watch the video below for more details:
What is Self Attention Bottleneck for Transformers Deep Learning Models
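To see where the quadratic cost comes from, here is a minimal NumPy sketch (not taken from the video; the function name and shapes are illustrative). It computes the attention score matrix QKᵀ, which has shape (n, n), so doubling the sequence length quadruples the memory it occupies.

```python
import numpy as np

def attention_scores(q, k):
    """Score matrix QK^T / sqrt(d): an (n, n) array, hence O(n^2) cost."""
    d = q.shape[-1]
    return (q @ k.T) / np.sqrt(d)

d = 64  # per-head feature dimension (illustrative value)
for n in (128, 256, 512):
    q = np.random.randn(n, d)
    k = np.random.randn(n, d)
    scores = attention_scores(q, k)
    # scores.nbytes quadruples each time n doubles
    print(f"n={n:4d}  scores shape={scores.shape}  bytes={scores.nbytes}")
```

Running this prints byte counts of roughly 131 KB, 524 KB, and 2.1 MB: each doubling of n multiplies the score matrix's size by four, which is the bottleneck the video discusses.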