Towards Compressed Transformers for Piano Transcription

Thomas Prätzlich, Richhiey Thomas, Sebastian Stober

Abstract

The sequence-to-sequence transformer model for piano transcription (TPT) has recently shown state-of-the-art performance. However, its memory and compute requirements still limit its use in constrained environments such as mobile phones. To address these limitations, we explore model compression techniques that reduce memory and compute demands. First, we apply importance-based attention pruning to reduce the number of model weights. We compare pruning the model’s attention components individually and jointly, and find the encoder’s self-attention component to be the most sensitive to pruning. Next, we fine-tune the pruned models with a knowledge distillation loss to recover the performance lost during pruning. Finally, we apply dynamic weight quantization and evaluate its impact on model storage size and transcription metrics. With these compression techniques, the TPT model can be made about 4x smaller than the baseline without a drastic reduction in transcription performance.
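
To illustrate the weight quantization step, the following is a minimal sketch and not the authors' implementation. It assumes symmetric per-tensor 8-bit quantization of 32-bit float weights; the abstract does not state the bit width or scheme, and the function names and toy weight values are hypothetical. It covers only the weight side: in common usage, dynamic quantization additionally quantizes activations on the fly at inference time. Under these assumptions, each stored weight shrinks from 4 bytes to 1 byte. Whether the reported overall reduction of about 4x comes from quantization alone or from its combination with pruning is not stated in the abstract.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Toy weights standing in for one attention projection matrix (hypothetical values).
weights = [0.12, -0.5, 0.33, 0.0, 0.49, -0.07]

fp32 = array("f", weights)          # 32-bit float storage
q, scale = quantize_int8(weights)   # 8-bit integer storage plus one float scale
restored = dequantize(q, scale)

fp32_bytes = len(fp32) * fp32.itemsize
int8_bytes = len(q) * q.itemsize
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_err:.6f}")
```

The reconstruction error of each weight is bounded by half a quantization step (`scale / 2`), which is why transcription metrics need to be re-evaluated after quantization rather than assumed unchanged.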

Acceptance details

Accepted contribution
Presentation: Talk (structured session)
Session: Music Information Retrieval (MIR) 1
Day / Time: 26.03.2026, 09:20-09:40
Abstract ID: DAGA2026/505
DOI: 10.71568/daga2026.505
Manuscript: PDF download