Towards Compressed Transformers for Piano Transcription
Thomas Prätzlich, Richhiey Thomas, Sebastian Stober
Abstract
The sequence-to-sequence transformer model for piano transcription (TPT) has recently shown state-of-the-art performance. However, its memory and compute requirements still limit its use in constrained environments such as mobile phones. To address these limitations, we explore model compression techniques. First, we apply importance-based attention pruning to reduce the number of model weights. We compare pruning the model's attention components individually and jointly, and find that the encoder's self-attention component is the most sensitive to pruning. Next, we fine-tune the pruned models with a knowledge distillation loss to recover performance lost during pruning. Finally, we apply dynamic weight quantization and evaluate its impact on model storage size and transcription metrics. With these compression techniques, the TPT model can be made about 4x smaller than the baseline without a drastic reduction in transcription performance.
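To illustrate the quantization step, the following is a minimal sketch of weight quantization to 8-bit integers. The abstract does not specify the scheme used, so the symmetric per-tensor int8 mapping, the function names `quantize_int8` and `dequantize`, and the example weights are all assumptions made for illustration. Dynamic quantization in practice typically also quantizes activations at inference time, which this sketch leaves out.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to int8 codes (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto the int8 limit 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]


# Hypothetical weight values, not taken from the TPT model.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage: 4 bytes per float32 weight versus 1 byte per int8 code,
# plus a single float32 scale for the whole tensor.
bytes_fp32 = 4 * len(weights)
bytes_int8 = 1 * len(weights) + 4
```

For large weight tensors the single scale is negligible, so int8 storage approaches one quarter of the float32 size per quantized tensor. How much of the overall reduction reported above comes from quantization as opposed to pruning is not stated in the abstract.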
Acceptance details
- Accepted contribution
- Presentation: Talk (structured session)
- Session: Music Information Retrieval (MIR) 1
- Day / Time: 26.03.2026, 09:20-09:40
- Abstract ID: DAGA2026/505
- DOI: 10.71568/daga2026.505
- Manuscript: PDF download