Publications

2026

ICML
ExVerus: Verus Proof Repair via Counterexample Reasoning

Jun Yang, Yuechun Sun, Yi Wu, Rodrigo Caridad, Yongwei Yuan, Jianan Yao, Shan Lu, and Kexin Pei

In Forty-third International Conference on Machine Learning, 2026

Abs arXiv Bib HTML

Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
@inproceedings{yang2026exverus, title = {ExVerus: Verus Proof Repair via Counterexample Reasoning}, author = {Yang, Jun and Sun, Yuechun and Wu, Yi and Caridad, Rodrigo and Yuan, Yongwei and Yao, Jianan and Lu, Shan and Pei, Kexin}, booktitle = {Forty-third International Conference on Machine Learning}, year = {2026}, url = {https://openreview.net/forum?id=FNDkXA0OUJ}, }
ACM CAIS
AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting

Jibang Wu, Chenghao Yang, Yi Wu, Simon Mahns, Chaoqi Wang, Hao Zhu, Fei Fang, and Haifeng Xu

In Proceedings of the ACM Conference on AI and Agentic Systems, 2026

Abs arXiv Bib HTML

This paper develops an agentic framework that employs large language models (LLMs) for grounded persuasive language generation in automated copywriting, with real estate marketing as a focal application. Our method is designed to align the generated content with user preferences while highlighting useful factual attributes. This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. We conduct systematic human-subject experiments in the domain of real estate marketing, with a focus group of potential house buyers. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted copywriting while ensuring factuality of content generation.
@inproceedings{10.1145/3786335.3813172, author = {Wu, Jibang and Yang, Chenghao and Wu, Yi and Mahns, Simon and Wang, Chaoqi and Zhu, Hao and Fang, Fei and Xu, Haifeng}, title = {AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting}, year = {2026}, isbn = {9798400724152}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3786335.3813172}, booktitle = {Proceedings of the ACM Conference on AI and Agentic Systems}, pages = {419–444}, numpages = {26} }

2025

Reason&Plan4LLM@ICLR
LLMs Are Not Good Strategists, Yet Memory-Enhanced Agency Boosts Reasoning

Yi Wu, and Zhimin Hu

In Workshop on Reasoning and Planning for Large Language Models, 2025

Abs Bib HTML PDF

Strategic reasoning in dynamic environments, such as games, requires a balance between long-term strategy and short-term adaptations. Although specially trained agents can achieve superhuman performance, they often lack explainability and are highly dependent on extensive data for training. In contrast, approaches that leverage large language models (LLMs) benefit from few-shot learning but struggle to maintain strategic consistency. Drawing inspiration from existing cognitive models of human decision-making, which utilize various forms of memory, we introduce EpicStar, an LLM-based agent with cognitively inspired episodic and working memory modules. Episodic memory enables agents to draw on past experiences to formulate coherent long-term strategies, while working memory modulates active observation and decision variables essential for adaptation. We evaluated EpicStar in the strategy game StarCraft II, where it competes effectively against built-in agents at Level 6 difficulty, surpassing its predecessor at Level 5 with a smaller token budget. Our approach not only enhances adaptability, but also ensures strategic consistency, demonstrating the pivotal role that cognitive memory can play in strategic reasoning.
@inproceedings{wu2025llms, title = {{LLM}s Are Not Good Strategists, Yet Memory-Enhanced Agency Boosts Reasoning}, author = {Wu, Yi and Hu, Zhimin}, booktitle = {Workshop on Reasoning and Planning for Large Language Models}, year = {2025}, url = {https://openreview.net/forum?id=34NjywGCdx}, }

2023

DistShift@NeurIPS
Evolving Domain Adaptation of Pretrained Language Models for Text Classification

Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, and Junjie Hu

In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2023

Abs arXiv Bib HTML

Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language.
@inproceedings{chuang2023evolving, title = {Evolving Domain Adaptation of Pretrained Language Models for Text Classification}, author = {Chuang, Yun-Shiuan and Wu, Yi and Gupta, Dhruv and Uppaal, Rheeya and Kumar, Ananya and Sun, Luhang and Sreedhar, Makesh Narsimhan and Yang, Sijia and Rogers, Timothy T. and Hu, Junjie}, booktitle = {NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models}, year = {2023}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://openreview.net/forum?id=HSBV4cCheG}, }

WMT

KnowComp Submission for WMT23 Word-Level AutoCompletion Task

Yi Wu, Haochen Shi, Weiqi Wang, and Yangqiu Song

In Proceedings of the Eighth Conference on Machine Translation, Dec 2023

Bib HTML

@inproceedings{wu-etal-2023-knowcomp,
  title = {{K}now{C}omp Submission for {WMT}23 Word-Level {A}uto{C}ompletion Task},
  author = {Wu, Yi and Shi, Haochen and Wang, Weiqi and Song, Yangqiu},
  editor = {Koehn, Philipp and Haddow, Barry and Kocmi, Tom and Monz, Christof},
  booktitle = {Proceedings of the Eighth Conference on Machine Translation},
  month = dec,
  year = {2023},
  address = {Singapore},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.wmt-1.79},
  doi = {10.18653/v1/2023.wmt-1.79},
  pages = {882--889}
}