Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.
@inproceedings{yang2026exverus,title={ExVerus: Verus Proof Repair via Counterexample Reasoning},author={Yang, Jun and Sun, Yuechun and Wu, Yi and Caridad, Rodrigo and Yuan, Yongwei and Yao, Jianan and Lu, Shan and Pei, Kexin},booktitle={Forty-third International Conference on Machine Learning},year={2026},url={https://openreview.net/forum?id=FNDkXA0OUJ},}
ACM CAIS
AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting
Jibang Wu, Chenghao Yang, Yi Wu, Simon Mahns, Chaoqi Wang, Hao Zhu, Fei Fang, and Haifeng Xu
In Proceedings of the ACM Conference on AI and Agentic Systems, 2026
This paper develops an agentic framework that employs large language models (LLMs) for grounded persuasive language generation in automated copywriting, with real estate marketing as a focal application. Our method is designed to align the generated content with user preferences while highlighting useful factual attributes. This agent consists of three key modules: (1) Grounding Module, mimicking expert human behavior to predict marketable features; (2) Personalization Module, aligning content with user preferences; (3) Marketing Module, ensuring factual accuracy and the inclusion of localized features. We conduct systematic human-subject experiments in the domain of real estate marketing, with a focus group of potential house buyers. The results demonstrate that marketing descriptions generated by our approach are preferred over those written by human experts by a clear margin while maintaining the same level of factual accuracy. Our findings suggest a promising agentic approach to automate large-scale targeted copywriting while ensuring factuality of content generation.
@inproceedings{10.1145/3786335.3813172,author={Wu, Jibang and Yang, Chenghao and Wu, Yi and Mahns, Simon and Wang, Chaoqi and Zhu, Hao and Fang, Fei and Xu, Haifeng},title={AI Realtor: Towards Grounded Persuasive Language Generation for Automated Copywriting},year={2026},isbn={9798400724152},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3786335.3813172},booktitle={Proceedings of the ACM Conference on AI and Agentic Systems},pages={419–444},numpages={26}}
Reason&Plan4LLM@ICLR
LLMs Aren’t Good Strategists, Yet Can Accumulate Episodes for Improved Planning
Yi Wu, and Zhimin Hu
In Workshop on Reasoning and Planning for Large Language Models, 2025
Strategic reasoning in dynamic environments, such as games, requires a balance between long-term strategy and short-term adaptations. Although specially trained agents can achieve superhuman performance, they often lack explainability and are highly dependent on extensive data for training. In contrast, approaches that leverage large language models (LLMs) benefit from few-shot learning but struggle to maintain strategic consistency. Drawing inspiration from existing cognitive models of human decision-making, which utilize various forms of memory, we introduce EpicStar, an LLM-based agent with cognitively inspired episodic and working memory modules. Episodic memory enables agents to draw on past experiences to formulate coherent long-term strategies, while working memory modulates active observation and decision variables essential for adaptation. We evaluated EpicStar in the strategy game StarCraft II, where it competes effectively against built-in agents at Level 6 difficulty, surpassing its predecessor at Level 5 with a smaller token budget. Our approach not only enhances adaptability, but also ensures strategic consistency, demonstrating the pivotal role that cognitive memory can play in strategic reasoning.
@inproceedings{wu2025llms,title={{LLM}s Aren't Good Strategists, Yet Can Accumulate Episodes for Improved Planning},author={Wu, Yi and Hu, Zhimin},booktitle={Workshop on Reasoning and Planning for Large Language Models},year={2025},url={https://openreview.net/forum?id=34NjywGCdx},}
DistShift@NeurIPS
Evolving Domain Adaptation of Pretrained Language Models for Text Classification
Yun-Shiuan Chuang, Yi Wu, Dhruv Gupta, Rheeya Uppaal, Ananya Kumar, Luhang Sun, Makesh Narsimhan Sreedhar, Sijia Yang, Timothy T. Rogers, and Junjie Hu
In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2023
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method. Our analysis across various datasets reveals that this incremental method excels at adapting PLMs to EDS, outperforming traditional domain adaptation techniques. These findings highlight the importance of continually updating PLMs to ensure their effectiveness in real-world applications, paving the way for future research into PLM robustness against the natural temporal evolution of language.
@inproceedings{chuang2023evolving,title={Evolving Domain Adaptation of Pretrained Language Models for Text Classification},author={Chuang, Yun-Shiuan and Wu, Yi and Gupta, Dhruv and Uppaal, Rheeya and Kumar, Ananya and Sun, Luhang and Sreedhar, Makesh Narsimhan and Yang, Sijia and Rogers, Timothy T. and Hu, Junjie},booktitle={NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models},year={2023},archiveprefix={arXiv},primaryclass={cs.CL},url={https://openreview.net/forum?id=HSBV4cCheG},}
This page is made possible with the help of the al-folio theme.