Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25% | Synced
In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
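The core idea is to skip computation for less important tokens in the middle layers of the encoder and restore the full sequence before the final layers. The sketch below is a minimal illustration of that idea, not the authors' released implementation; the helper functions, tensor shapes, the keep ratio, and the random importance scores are all illustrative assumptions (the paper derives token importance from the cumulative masked-language-modeling loss).

```python
# Illustrative sketch of token dropping for a BERT-style encoder (not the paper's code).
# Only the highest-scoring tokens are passed through the middle layers, then the
# processed tokens are scattered back into the full-length sequence.
import torch

def drop_tokens(hidden, importance, keep_ratio=0.5):
    """Keep the top-scoring tokens per sequence; return kept states and their indices."""
    batch, seq_len, dim = hidden.shape
    num_keep = max(1, int(seq_len * keep_ratio))
    keep_idx = importance.topk(num_keep, dim=1).indices            # (batch, num_keep)
    kept = hidden.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
    return kept, keep_idx

def restore_tokens(kept, keep_idx, full_hidden):
    """Scatter the processed tokens back into the full sequence before the last layers."""
    dim = kept.shape[-1]
    restored = full_hidden.clone()
    restored.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim), kept)
    return restored

# Toy usage: 2 sequences of 8 tokens, hidden size 4.
hidden = torch.randn(2, 8, 4)
importance = torch.rand(2, 8)        # stand-in for per-token importance scores
kept, idx = drop_tokens(hidden, importance, keep_ratio=0.5)
# ... run the middle encoder layers on `kept` (half the tokens, so less compute) ...
full = restore_tokens(kept, idx, hidden)
print(kept.shape, full.shape)        # torch.Size([2, 4, 4]) torch.Size([2, 8, 4])
```

Because the dropped tokens rejoin the sequence before the final layers and the output heads, the model still produces predictions for every position, which is what lets the savings come without changing the pretraining objective.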