Encoder-only models - classification tasks such as sentiment analysis (e.g., BERT)
Encoder-decoder models - sequence-to-sequence tasks such as translation (e.g., T5)
Decoder-only models - text generation (the GPT family of models)
The paper “Attention Is All You Need” (Vaswani et al., 2017) introduced the Transformer, an attention-based architecture that replaced the recurrent neural networks (RNNs) and convolutional neural networks (CNNs) previously dominant in sequence modeling.
The Transformer architecture consists of an encoder and a decoder, each composed of a stack of identical layers. Each encoder layer has two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward network (each decoder layer adds a third sub-layer that attends over the encoder's output). The multi-head self-attention mechanism lets every position attend to every other position in the input sequence, while the feed-forward network applies the same fully connected transformation to each position separately and identically.
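To make those two sub-layers concrete, here is a minimal NumPy sketch of multi-head self-attention, which computes softmax(QK^T / sqrt(d_k)) V per head, followed by the position-wise feed-forward network. The weights are random and untrained, and all names (multi_head_self_attention, num_heads, d_ff, and so on) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """x: (seq_len, d_model). Illustrative sketch: projections are random, untrained."""
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Separate query/key/value projections per head (assumed shapes).
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv        # (seq_len, d_k) each
        scores = Q @ K.T / np.sqrt(d_k)         # scaled dot-product attention
        heads.append(softmax(scores) @ V)       # every position attends to every position
    concat = np.concatenate(heads, axis=-1)     # (seq_len, d_model)
    Wo = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return concat @ Wo                          # final output projection

def position_wise_ffn(x, d_ff, rng):
    """The same two-layer network applied to each position separately and identically."""
    d_model = x.shape[-1]
    W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
    W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
    return np.maximum(0, x @ W1) @ W2           # ReLU between the two linear layers

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 64))               # 10 tokens, d_model = 64
h = multi_head_self_attention(x, num_heads=8, rng=rng)
y = position_wise_ffn(h, d_ff=256, rng=rng)
print(y.shape)                                  # (10, 64)
```

Because the feed-forward weights are shared across positions, only the attention step mixes information between tokens; the FFN transforms each position independently.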
The Transformer model also wraps each sub-layer in a residual connection followed by layer normalization, i.e., LayerNorm(x + Sublayer(x)), which stabilizes the training of deep stacks. In addition, the authors introduce a positional encoding scheme that encodes the position of each token in the input sequence, enabling the model to capture word order without recurrent or convolutional operations.
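The paper's positional encoding uses sine and cosine functions of different frequencies, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), added to the token embeddings before the first layer. Here is a small sketch of that sinusoidal scheme; the function name and shapes are my own.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] uses cos."""
    positions = np.arange(seq_len)[:, None]               # (seq_len, 1)
    div = 10000 ** (np.arange(0, d_model, 2) / d_model)   # one frequency per dim pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)                 # even dimensions
    pe[:, 1::2] = np.cos(positions / div)                 # odd dimensions
    return pe

# Added element-wise to the token embeddings, so each position carries a
# unique, smoothly varying signature of where it sits in the sequence.
pe = sinusoidal_positional_encoding(seq_len=10, d_model=64)
print(pe.shape)  # (10, 64)
```

Each dimension pair oscillates at a different wavelength, so relative offsets between positions correspond to fixed linear transformations of the encoding, which is one reason the authors give for choosing this scheme.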