
The Self-Attention Bottleneck in Transformer Deep Learning Models
The computational complexity of self-attention layers grows quadratically with sequence length: for a sequence of n tokens, the layer computes an n × n matrix of pairwise attention scores, so its time and memory cost scales as O(n²). Please watch the video below for more details: What is Self Attention Bottleneck for Transformers Deep Learning Models
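To make the quadratic cost concrete, here is a minimal NumPy sketch of scaled dot-product self-attention (the function name, shapes, and weights are illustrative assumptions, not code from the video). The intermediate score matrix has shape (n, n), so doubling the sequence length quadruples the time and memory spent in this layer.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single sequence.

    X: (n, d) input embeddings; Wq, Wk, Wv: (d, d) projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # each (n, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) -- this is the bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
    return weights @ V                               # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (1024, 64); the intermediate score matrix was (1024, 1024)
```

With n = 1024 the score matrix holds about a million entries; at n = 4096 it holds about sixteen million, which is why long sequences make this layer the dominant cost in a Transformer.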