Research

Publications

Peer-reviewed papers and preprints. For the most current list, see my Google Scholar.

  • 2026

    CrystalASR: Hierarchical Phoneme-Grounded Speech Decoding

Po-Ting Lin

Zenodo (preprint)

We investigate a design principle for automatic speech recognition in which linguistic structure is explicitly enforced through intermediate representations. The resulting system, CrystalASR, decomposes decoding into three modular layers: a 3.3M-parameter phoneme CTC head, a zero-parameter rule-based word decoder, and an optional language model (LM) for disambiguation. A defining constraint is strict upward information flow, ensuring higher layers modulate but never override lower-level acoustic evidence. Experiments on LibriSpeech dev-clean show that CrystalASR achieves 17.44% WER while using 21× fewer trainable parameters and running 14× faster at inference than a comparable end-to-end baseline. Error attribution reveals that word-level errors primarily originate from subtle phoneme inaccuracies amplified by downstream segmentation. Furthermore, a language model weight sweep exposes a sharp phase transition: beyond a narrow tiebreaker role (w_LM > 0.03), WER rises from 17% to 96% as the LM's score scale overwhelms acoustic evidence. These findings suggest that explicitly decoupling acoustic and lexical processing yields interpretable error diagnostics and substantial parameter savings, at a moderate accuracy cost relative to end-to-end models.
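The abstract does not spell out how the LM score is combined with the acoustic score, so the following is only a toy sketch assuming a standard log-linear (shallow-fusion) combination, with hypothetical candidates and made-up log-probabilities. It illustrates the mechanism behind the reported phase transition: at a small w_LM the LM only breaks near-ties, while at a large w_LM the LM's score scale swamps the acoustic evidence.

```python
def fuse(acoustic_logp: float, lm_logp: float, w_lm: float) -> float:
    """Log-linear combination of acoustic and LM scores (an assumed
    shallow-fusion form, not necessarily CrystalASR's exact rule)."""
    return acoustic_logp + w_lm * lm_logp

def decode(candidates, w_lm: float) -> str:
    """Pick the candidate word with the best fused score.

    candidates: list of (word, acoustic_logp, lm_logp) tuples.
    """
    return max(candidates, key=lambda c: fuse(c[1], c[2], w_lm))[0]

# Two near-homophones with invented scores: the acoustics slightly
# favor "their"; the LM strongly favors "there".
candidates = [
    ("their", -1.0, -8.0),
    ("there", -1.2, -1.0),
]

print(decode(candidates, w_lm=0.01))  # tiebreaker regime: acoustics win -> "their"
print(decode(candidates, w_lm=0.5))   # LM scale dominates -> "there"
```

With w_lm = 0.01 the fused scores are −1.08 vs. −1.21, so the acoustically favored word survives; with w_lm = 0.5 they flip to −5.0 vs. −1.7, the LM overriding the acoustics, which is the failure mode the weight sweep quantifies.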

  • 2026

    Structural Crystallization: A Unified Computational Framework for Memory Formation, Persistence, and Modification

Po-Ting Lin

Psychological Review (in review)

    Prevailing models of memory treat strength as a unitary quantity that increases with learning and decreases with forgetting or extinction. We argue that this conflation obscures a fundamental distinction: memory has separable dimensions of structural accumulation and representational fidelity, which are differentially modified by retrieval. Structural accumulation captures how much of a memory trace has been consolidated; representational fidelity captures how faithfully that trace preserves its original encoding. The two can change independently — a memory can retain its full structural extent while becoming progressively distorted, or dissolve while remaining perfectly faithful. This distinction, combined with a second distinction between structurally protective retrieval (externally guided, high-constraint) and structurally risky retrieval (internally generated, low-constraint), resolves several phenomena that have resisted unified explanation: why extinction is temporary but retrieval-extinction can produce lasting change, why stress-enhanced fear memories persist for orders of magnitude longer than ordinary memories, and why fear extinction is fragile while appetitive extinction is durable. We formalize these distinctions in a system of four coupled differential equations (the Structural Crystallization framework), calibrate it to two datasets, and show that it correctly predicts — without parameter adjustment — outcomes across five independent benchmarks where competing models (Rescorla–Wagner; latent cause) fail. Extensions to human declarative memory capture the testing effect crossover, while an explicit failure on the spacing effect reveals interpretable boundary conditions. The results demonstrate that separating accumulation from fidelity is not merely a modeling convenience but a theoretical necessity for any account of both the persistence and the modifiability of memory.