The Self-Attention Bottleneck in Transformer Deep Learning Models

The computational complexity of self-attention layers grows quadratically with sequence length: for a sequence of n tokens, the layer computes an n × n matrix of attention scores, so both time and memory scale as O(n²). Please watch the video below for more details: What is Self Attention Bottleneck for Transformers Deep Learning Models https://youtu.be/ByYaJ3k0SAY
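To make the bottleneck concrete, here is a minimal NumPy sketch of scaled dot-product self-attention (the function name, dimensions, and random weights are illustrative assumptions, not from the source). The n × n score matrix it builds is where the quadratic cost comes from: doubling the sequence length quadruples its size.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence (illustrative sketch).

    X: (n, d) token embeddings; Wq, Wk, Wv: (d, d) projection weights.
    Returns the attended outputs and the shape of the score matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # The (n, n) score matrix: this is the quadratic bottleneck in time and memory.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax (subtract the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, scores.shape

rng = np.random.default_rng(0)
n, d = 128, 16                      # hypothetical sequence length and model width
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, score_shape = self_attention(X, Wq, Wk, Wv)
print(score_shape)                  # (128, 128): n x n scores for n = 128 tokens
```

At n = 128 the score matrix holds 16,384 entries; at n = 4,096 it holds over 16 million, which is why long sequences are expensive for standard transformers.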