Self-Attention Bottleneck in Transformer Deep Learning Models

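The computational complexity of self-attention layers grows quadratically with sequence length. Every token attends to every other token. For a sequence of n tokens with model dimension d, the layer therefore builds an n × n matrix of attention scores. That costs O(n²·d) time and O(n²) memory for the score matrix, so doubling the sequence length roughly quadruples the cost. This is why self-attention becomes the bottleneck for long inputs.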
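To make the quadratic growth concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention, along with a rough cost estimate. It makes two simplifying assumptions. First, Q = K = V = the input itself, with no learned projection matrices. Second, the cost count covers only the two n × n × d matrix products and ignores the projections and the softmax. The helper names (`attention_scores`, `softmax_rows`, `self_attention`, `attention_cost`) are hypothetical, chosen for illustration.

```python
import math
import random


def attention_scores(q, k):
    """Scaled dot-product scores: one entry per (query, key) pair, giving an n x n matrix."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    return [[scale * sum(qi * ki for qi, ki in zip(q_row, k_row)) for k_row in k]
            for q_row in q]


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x):
    """Single-head self-attention with Q = K = V = x (assumption: no learned projections)."""
    weights = softmax_rows(attention_scores(x, x))  # n x n attention matrix
    d = len(x[0])
    # Each output row is a weighted sum of all n input rows.
    out = [[sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
           for w_row in weights]
    return out, weights


def attention_cost(n, d):
    """Approximate multiply-adds (Q @ K^T plus weights @ V) and score-matrix entries."""
    qk_flops = n * n * d   # Q @ K^T
    av_flops = n * n * d   # softmax(scores) @ V
    score_entries = n * n  # entries held in the attention matrix
    return qk_flops + av_flops, score_entries


random.seed(0)
x = [[random.random() for _ in range(8)] for _ in range(16)]  # 16 tokens, d = 8
out, weights = self_attention(x)
print(len(weights), len(weights[0]))  # 16 16

for n in (512, 1024, 2048):
    flops, entries = attention_cost(n, d=64)
    print(n, flops, entries)
# 512 33554432 262144
# 1024 134217728 1048576
# 2048 536870912 4194304
```

Each doubling of n multiplies both the operation count and the size of the score matrix by four, while d stays fixed. Optimized libraries run the same computation far faster, but the underlying scaling with n is the same.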

Please watch the video below for more details:

What is Self Attention Bottleneck for Transformers Deep Learning Models
