LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
arXiv, May 2026
I am a second-year Ph.D. student at the University of Maryland, College Park, advised by Prof. Heng Huang. Previously, I worked as a research assistant at the Natural Language Processing Laboratory of Northeastern University (China), supervised by Prof. Tong Xiao. I received my B.E. degree in Computer Science and Engineering from Northeastern University in 2021.
My research focuses on enhancing LLM reasoning and test-time scaling.
Reasoning and Test-Time Scaling: Parallel-R1 (ICLR'26) introduces the first RL-based framework for parallel thinking in LLMs, moving beyond sequential chain-of-thought. AutoTTS (arXiv'26) opens a new direction for test-time scaling: automatically discovering scaling strategies via agentic search instead of hand-crafted inference heuristics. Parallel-Probe (ICML'26) enables efficient parallel thinking through 2D probing. MoT (ICLR'26) explores mixture-of-thought representations for logical reasoning.
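To make "parallel thinking" concrete for readers outside the area, here is a minimal Python sketch of its simplest form: sample several reasoning paths independently and majority-vote their answers, in the style of self-consistency. This is a generic illustration only, not the Parallel-R1 or AutoTTS method; `generate_answer` is a hypothetical stand-in for a stochastically sampled LLM call.

```python
# Generic sketch of parallel test-time scaling via majority voting.
# NOT the Parallel-R1/AutoTTS algorithm; generate_answer is a toy
# stand-in for one sampled LLM reasoning path.
import random
from collections import Counter

def generate_answer(question: str, seed: int) -> str:
    """Hypothetical LLM call: returns the final answer of one sampled path."""
    rng = random.Random(seed)
    return rng.choice(["42", "42", "41"])  # toy answer distribution

def parallel_think(question: str, n_paths: int = 8) -> str:
    """Sample n_paths independent reasoning paths and majority-vote."""
    answers = [generate_answer(question, seed=i) for i in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

print(parallel_think("What is 6 * 7?"))  # -> "42" with high probability
```

More paths buy accuracy at the cost of compute, which is exactly the trade-off that test-time scaling strategies navigate.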
Efficient Training & Inference: Multi-Draft Speculative Decoding (ICLR'25) improves the inference speed–quality trade-off by drafting and verifying multiple candidate continuations in parallel. Asymmetric MMT (ACL Findings'25) studies conflict and synergy in post-training for multilingual machine translation.
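For background, the sketch below shows plain single-draft speculative decoding with greedy verification, using deterministic toy functions in place of real models; the multi-draft setting generalizes this loop to verify several candidate drafts at once. `draft_next` and `target_next` are illustrative assumptions, not any real API.

```python
# Toy sketch of single-draft speculative decoding with greedy verification.
def draft_next(prefix: list[int]) -> int:
    """Stand-in for a cheap draft model: next token = last + 1 (mod 10)."""
    return (prefix[-1] + 1) % 10

def target_next(prefix: list[int]) -> int:
    """Stand-in for the expensive target model; disagrees when next would be 7."""
    nxt = (prefix[-1] + 1) % 10
    return 0 if nxt == 7 else nxt

def speculative_decode(prefix: list[int], n_new: int, k: int = 4) -> list[int]:
    """Generate n_new tokens, drafting k at a time and verifying greedily."""
    out, target_len = list(prefix), len(prefix) + n_new
    while len(out) < target_len:
        ctx, draft = list(out), []
        for _ in range(k):                 # draft model proposes k tokens
            draft.append(draft_next(ctx))
            ctx.append(draft[-1])
        for tok in draft:                  # target verifies left to right
            if len(out) >= target_len:
                break
            expected = target_next(out)
            out.append(expected)           # target's token is always kept
            if tok != expected:            # first mismatch: discard the rest
                break
    return out

print(speculative_decode([0], n_new=12))
```

In a real system the target model scores the whole draft in one batched forward pass, so tokens where the draft agrees come almost for free; the speedup depends on how often the draft(s) match the target.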
Foundation Models: UMST (ICML'22) builds multiscale Transformers over sub-word, word, and phrase units with word-boundary and phrase-level structure. EIT (ACL'24) enhances multi-head self-attention by encouraging consensus across heads via inner- and cross-subspace interactions. PartialFormer (ACL Findings'24) replaces monolithic FFNs with multiple partial FFNs for parameter-efficient Transformers.
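To make the PartialFormer idea concrete, here is a rough PyTorch sketch in which one monolithic feed-forward block is replaced by several smaller ones whose outputs are averaged. The head count, hidden sizes, and averaging rule are my illustrative assumptions, not the paper's exact design.

```python
# Rough sketch of the "many small FFNs instead of one big FFN" idea.
# Dimensions and the mean-combination are illustrative assumptions.
import torch
import torch.nn as nn

class PartialFFN(nn.Module):
    """Several small FFN heads whose outputs are averaged."""
    def __init__(self, d_model: int = 512, n_heads: int = 4, d_hidden: int = 256):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_heads)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack head outputs along a new dim and average them.
        return torch.stack([h(x) for h in self.heads], dim=0).mean(dim=0)

x = torch.randn(2, 16, 512)      # (batch, seq_len, d_model)
print(PartialFFN()(x).shape)     # torch.Size([2, 16, 512])
```

With 4 heads of hidden size 256 this matches the weight budget of a single FFN of hidden size 1024, which is the kind of accounting behind the parameter-efficiency claim.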
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling (AutoTTS). arXiv, May 2026
Parallel-Probe. ICML 2026
ICLR 2026 Main Conference; NeurIPS ER Workshop (Spotlight)
ICLR 2026 Main Conference; NeurIPS ER Workshop
Asymmetric MMT. ACL 2025 Findings
Multi-Draft Speculative Decoding. ICLR 2025
EMNLP 2024 Main Conference
PartialFormer. ACL 2024 Findings
EIT. ACL 2024 Main Conference
UMST. ICML 2022
Finished in August 2021