CrystalASR: Hierarchical Phoneme-Grounded Speech Decoding
- Po-Ting Lin (Independent Researcher)
We investigate a design principle for automatic speech recognition (ASR) in which linguistic structure is explicitly enforced through intermediate representations. The resulting system, CrystalASR, decomposes decoding into three modular layers: a 3.3M-parameter phoneme CTC head, a zero-parameter rule-based word decoder, and an optional language model (LM) for disambiguation. A defining constraint is strict upward information flow: higher layers modulate, but never override, lower-level acoustic evidence. On LibriSpeech dev-clean, CrystalASR achieves 17.44% WER with 21× fewer trainable parameters and 14× faster inference than a comparable end-to-end baseline. Error attribution shows that word-level errors originate primarily from subtle phoneme inaccuracies amplified by downstream segmentation. A sweep over the LM weight reveals a sharp phase transition: beyond a narrow tiebreaker regime (w_LM > 0.03), WER rises from 17% to 96% as the LM's score scale overwhelms the acoustic evidence. These findings suggest that explicitly decoupling acoustic and lexical processing yields interpretable error diagnostics and substantial parameter savings, at a moderate accuracy cost relative to end-to-end models.
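The tiebreaker regime described above can be illustrated with a minimal sketch. This is not the authors' code; the function name, candidate tuples, and score values are hypothetical, assuming only the standard linear interpolation of acoustic and LM log-scores with weight w_LM.

```python
# Illustrative sketch (hypothetical names/values): rescoring candidate word
# sequences with combined score = acoustic + w_lm * lm. With a small w_lm the
# LM only breaks near-ties in acoustic score; with a large w_lm the LM's
# score scale overwhelms the acoustic evidence.

def rescore(candidates, w_lm):
    """Return the candidate maximizing acoustic_score + w_lm * lm_score.

    candidates: list of (hypothesis, acoustic_log_score, lm_log_score).
    """
    return max(candidates, key=lambda c: c[1] + w_lm * c[2])

# Candidate "a" has much stronger acoustic evidence; the LM prefers "b".
cands = [("a", -10.0, -30.0),
         ("b", -15.0, -2.0)]

print(rescore(cands, w_lm=0.02)[0])  # tiebreaker regime: acoustics win -> "a"
print(rescore(cands, w_lm=1.0)[0])   # LM-dominated regime: LM wins    -> "b"

# With a genuine acoustic near-tie, even a tiny w_lm lets the LM decide:
tie = [("x", -10.00, -30.0), ("y", -10.01, -5.0)]
print(rescore(tie, w_lm=0.02)[0])    # -> "y"
```

At w_lm = 0.02, candidate "a" scores -10.6 versus -15.04 for "b", so acoustic evidence dominates; at w_lm = 1.0 the ordering flips (-40.0 versus -17.0), mirroring the phase transition reported in the abstract.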
- automatic speech recognition
- phoneme CTC
- modular decoding
- rule-based word decoder
- language model weighting
- interpretability
@misc{lin2026crystalasr,
title = {CrystalASR: Hierarchical Phoneme-Grounded Speech Decoding},
author = {Po-Ting Lin},
year = {2026},
howpublished = {Zenodo},
doi = {10.5281/zenodo.19317074}
}