Attention Mechanism
A neural network component that computes weighted combinations of value vectors based on the compatibility between query and key vectors. Self-attention (used in transformers) allows each position in a sequence to attend to all other positions, capturing long-range dependencies. Cross-attention enables one sequence to attend to another (e.g., language tokens attending to visual features).
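The weighted-combination idea above can be sketched as scaled dot-product attention, the variant used in transformers. This is a minimal NumPy illustration with toy dimensions; the function names and shapes are assumptions for the example, not a reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (num_queries, d_k), K: (num_keys, d_k), V: (num_keys, d_v)
    d_k = Q.shape[-1]
    # Compatibility of each query with each key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row of weights sums to 1: a distribution over positions.
    weights = softmax(scores, axis=-1)
    # Output is a weighted combination of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))  # a toy sequence: 4 positions, dimension 8

# Self-attention: queries, keys, and values all derive from the same sequence,
# so every position can attend to every other position.
out, w = scaled_dot_product_attention(X, X, X)

# Cross-attention: queries come from one sequence, keys/values from another
# (e.g., 4 language tokens attending to 6 visual features).
Y = rng.standard_normal((6, 8))
cross_out, cross_w = scaled_dot_product_attention(X, Y, Y)
```

In practice Q, K, and V are produced by learned linear projections of the inputs; the sketch omits those projections to keep the core weighting step visible.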