Dialogue
04 Jun 2026
[*] = found in both arXiv and HF search
[HF] = found via HF semantic search
written on 2026-06-06
| title | authors | categories | displaydate | upvotes |
|---|---|---|---|---|
| An Infectious Disease Spread Simulation Based on Large Language Model Decision Making | Yonchanok Khaokaew, Ruochen Kong, Andreas Zufle, Hao Xue, Taylor Anderson, Chandini Raina MacIntyre, Matthew Scotch, Flora D. Salim, David J Heslop | cs.AI | 2026-06-04 | |
| Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems | Yingzhuo Liu | cs.CL | 2026-06-04 | |
| VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents | Yunhao Yang, Neel P. Bhatt, Kevin Wang, Samuel Tetteh, Zhangyang Wang, Ufuk Topcu | cs.RO, cs.AI | 2026-06-03 | |
| Context-as-AI-Service: Surfacing Cross-File Dependency Chains for LLM-Generated Developer Documentation | Ameya Gawde, Vyzantinos Repantis, Harshvardhan Singh, Lucy Moys | cs.SE, cs.IR | 2026-06-03 | |
| Rethinking Sales Lead Scoring with LLM-based Hierarchical Preference Ranking | Chenyu Zhang, Yiwen Liu, Yin Sun, Xinyuan Zhang, Yuji Cao, Junming Jiao, Juyi Qiao | cs.IR, cs.AI | 2026-06-03 | |
| Organizational Control Layer: Governance Infrastructure at the Execution Boundary of LLM Agent Systems | Tianyu Shi, Yang Mo, Yiou Liu, Zhuonan Hao, Yin Wang, Wenzhuo Hu, Nan Yu, Meng Zhou, Jiangbo Yu | cs.MA | 2026-06-03 | |
| Efficient ASR Training with Conversations that Never Happened | Máté Gedeon, Péter Mihajlik | cs.CL, cs.AI, cs.SD, eess.AS | 2026-06-02 | |
| StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems | Taiyu Zhu, Yifan Wu, Weilin Jin, Ying Li, Gang Huang | cs.AI | 2026-06-02 | |
| RUBAS: Rubric-Based Reinforcement Learning for Agent Safety | Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang | cs.LG, cs.AI, cs.CR | 2026-06-02 | |
| Chatbots Output Meaningful (but Problematic) Language | Matthew Stone, Una Stojnić | cs.CL | 2026-06-02 | |
| Topics as Proxies for Sociodemographics: How Conversational Context Affects LLM Answers | Vera Neplenbroek, Gabriele Sarti, Arianna Bisazza, Raquel Fernández | cs.CL | 2026-06-01 | |
| Trust-Calibrated Code Review: A Participatory Design Study of Review Workflows for LLM-Generated Multi-File Changes | Lo Gullstrand Heander, Agnia Sergeyuk, Ilya Zakharov, Emma Söderberg, Nikita Mukhortov | cs.SE, cs.HC | 2026-06-01 | |
| BraveGuard: From Open-World Threats to Safer Computer-Use Agents | Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding, Ming Wen, Yixu Wang, Yuxiang Xie, Baihui Zheng, Yingshui Tan, Yige Li, Yutao Wu, Kerui Cao, Wenke Huang, Yanming Guo, Xingjun Ma, Yu-Gang Jiang | cs.CR, cs.CL | 2026-05-31 | |
| SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision | Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang, Qing Zong, Jiahe Guo, Zhongwei Xie, Yiyan Ji, Yauwai Yim, Hongyu Luo, Xiyu Ren, Ruan Chenyu, Haoran Li, Yangqiu Song | cs.AI | 2026-05-31 | |
| Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding | Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah, Sebastian Rogawski, Pawel Morkisz, Anahita Bhiwandiwalla, Phillip Howard | cs.CL, cs.AI | 2026-05-31 | |
| Agentic Authoring of Interactive Multiview Visualizations in Genomics | Astrid van den Brandt, Kiroong Choe, Sehi L’Yi, Devin Lange, Nils Gehlenborg | cs.HC, cs.AI | 2026-05-29 | |
| Preference-Aware Rubric Learning for Personalized Evaluation | Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yuxin Chen, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Yoko Yamakata, Tat-Seng Chua | cs.CL | 2026-05-29 | |
| Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization | Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin, Wenhao Li | cs.MA, cs.AI | 2026-05-28 | |
| Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents | Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù, Leila Kosseim | cs.CL, cs.AI, cs.LG | 2026-05-28 | |
| Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs | Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic | cs.CR, cs.CL, cs.SE | 2026-05-28 | |
| AgentCVR: Active Multi-Agent Cross-Video Reasoning via Script-Simulated Reinforcement Learning | Yilun Qiu, Jiahe Wang, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, Chun Yuan | cs.CV, cs.MA | 2026-05-28 | |
| Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models | Arturo Valdivia, Paolo Burelli | cs.AI | 2026-05-28 | |
| LLM-ALSO: LLM-Driven Adaptive Learning-Signal Optimization for Multi-Agent Reinforcement Learning | Xiaoguang Wu, Zhi Zheng, Hui Xiong | cs.MA | 2026-05-28 | |
| When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL | Youting Wang, Yuan Tang, Bowen Liu, Xuan Liu, Dingyan Shang | cs.LG, cs.AI, cs.IR | 2026-05-27 | |
| Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News | Alejandro Buitrago López, Alberto Ortega Pastor, Javier Pastor-Galindo, José A. Ruipérez-Valiente | cs.CL, cs.AI | 2026-05-27 | |
| Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents | Jihyeong Park, Ingeol Baek, Jeonghyun Park, Hwanhee Lee | cs.CL | 2026-05-27 | |
| OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents | Chenyu Zhou, Xinyun Lu, Jiangyue Zhao, Jianghao Lin, Dongdong Ge, Yinyu Ye | cs.AI | 2026-05-27 | |
| Personality, Role, and Expressive Style in Large Language Models: An Interactionist Analysis | Moe Nagao, Koichiro Terao, Mikio Nakano, Naoto Iwahashi | cs.CL | 2026-05-27 | |
| Keyphrase Generative Representation of Youth Crisis Conversations Beyond Static Taxonomies | Abeer Badawi, Will Aitken, Lydia Sequeira, Jocelyn Rankin, Maia Norman, Elham Dolatabadi | cs.CL, cs.HC | 2026-05-26 | |
| Agentic Separation Logic Specification Synthesis | Tarun Suresh, David Korczynski, Julien Vanegue | cs.PL, cs.CL, cs.SE | 2026-05-26 | |
| TADDLE: A Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews | Hanqi Duan, Xiang Li | cs.AI | 2026-05-26 | |
| Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations | Madhulatha Mandarapu, Sandeep Kunkunuru | cs.DB, cs.AI, cs.LG | 2026-05-26 | |
| Causal methods for LLM development and evaluation | Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma, Yuchen Ma, Abdurahman Maarouf, Maresa Schröder, Jonas Schweisthal, Yuxin Wang, Athiya Deviyani, Sonali Parbhoo, Rahul G. Krishnan, Stefan Feuerriegel | cs.LG | 2026-05-25 | |
| Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing | Shugang Hao, Lingjie Duan | cs.LG, cs.AI | 2026-05-22 | |
| HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation | Zewei Deng, Tinghan Ye, Liyan Xie | cs.CL, stat.ML | 2026-05-21 | |
| Self-Evolving Multi-Agent Systems via Decentralized Memory | Guangya Hao, Yunbo Long, Zhuokai Zhao | cs.MA | 2026-05-21 | |
| Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety | Piercosma Bisconti, Matteo Prandi, Federico Pierucci, Federico Sartore, Enrico Panai, Laura Caroli, Yue Zhu, Adam Leon Smith, Luca Nannini, Marcello Galisai, Susanna Cifani, Francesco Giarrusso, Marcantonio Bracale Syrnikov, Daniele Nardi | cs.CL | 2026-05-21 | |
| Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents | Ting Liu | cs.SE, cs.AI | 2026-05-21 | |
| Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation | Md. Asaduzzaman Shuvo, Mahedi Hasan, Md. Tashin Parvez, Azizul Haque Noman, Md. Shafayet Hossain Ovi | cs.CL | 2026-05-21 | |
| SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering? | Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang | cs.SE, cs.AI | 2026-05-21 | |
| RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator | Zhenwei Tang, Zhaoyan Liu, Rasa Hosseinzadeh, Tongzi Wu, Keyvan Golestan, Jesse C. Cresswell | cs.CL | 2026-05-20 | |
| OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection | Jeffrey Flynt | cs.CR, cs.LG | 2026-03-23 | |
| Optimizing Multi-Agent Weather Captioning via Text Gradient Descent: A Training-Free Approach with Consensus-Aware Gradient Fusion | Shixu Liu | cs.CL | 2026-03-23 | |
| Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains | Octavian Untila | cs.SE, cs.AI, cs.MA | 2026-03-22 | |
| Reasoning Gets Harder for LLMs Inside A Dialogue | Ivan Kartáč, Mateusz Lango, Ondřej Dušek | cs.CL | 2026-03-20 | |
| An Agentic Approach to Generating XAI-Narratives | Yifan He, David Martens | cs.CL | 2026-03-20 | |
| Semantic Delta: An Interpretable Signal Differentiating Human and LLMs Dialogue | Riccardo Scantamburlo, Mauro Mezzanzana, Giacomo Buonanno, Francesco Bertolotti | cs.CL, cs.AI | 2026-03-20 | |
| Skilled AI Agents for Embedded and IoT Systems Development | Yiming Li, Yuhan Cheng, Mingchen Ma, Yihang Zou, Ningyuan Yang, Wei Cheng, Hai “Helen” Li, Yiran Chen, Tingjun Chen | cs.SE, cs.AI | 2026-03-20 | |
| Mi:dm K 2.5 Pro | KT Tech innovation Group | cs.CL, cs.AI | 2026-03-19 | |
| Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM | Zizhao Hu, Mohammad Rostami, Jesse Thomason | cs.AI | 2026-03-19 | |
| When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution | Yi Nian, Haosen Cao, Shenzhe Zhu, Henry Peng Zou, Qingqing Luan, Yue Zhao | cs.AI, cs.CL | 2026-03-18 | |
| Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures | Young Bin Park | cs.AI, cs.IR, cs.LO | 2026-03-18 | |
| Evaluating LLM-Simulated Conversations in Modeling Inconsistent and Uncollaborative Behaviors in Human Social Interaction | Ryo Kamoi, Ameya Godbole, Longqi Yang, Rui Zhang, Mengting Wan, Pei Zhou | cs.CL | 2026-03-17 | |
| Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure | Caglar Yildirim | cs.AI | 2026-03-17 | |
| Proactive Rejection and Grounded Execution: A Dual-Stage Intent Analysis Paradigm for Safe and Efficient AIoT Smart Homes | Xinxin Jin, Zhengwei Ni, Zhengguo Sheng, Victor C. M. Leung | cs.AI | 2026-03-17 | |
| VIBEPASS: Can Vibe Coders Really Pass the Vibe Check? | Srijan Bansal, Jiao Fangkai, Yilun Zhou, Austin Xu, Shafiq Joty, Semih Yavuz | cs.SE, cs.AI | 2026-03-16 | |
| Practicing with Language Models Cultivates Human Empathic Communication | Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Bruce Lambert, Matthew Groh | cs.CL, cs.HC | 2026-03-16 | |
| OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora | Jeffrey Flynt | cs.CL, cs.AI, cs.IR | 2026-03-16 | |
| GNNVerifier: Graph-based Verifier for LLM Task Planning | Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi | cs.LG | 2026-03-16 | |
| GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation | Wei Zeng, Fengwei An, Zhen Liu, Jian Zhao | cs.AI | 2026-03-16 | |
| CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language | Junhang Cheng, Fang Liu, Jia Li, Chengru Wu, Nanxiang Jiang, Li Zhang | cs.SE, cs.AI, cs.CL | 2026-03-15 | |
| Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows | Aditya Sharan, Sriram Hebbale, Dhruv Kumar | cs.CL, cs.AI | 2026-03-15 | |
| QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate | Jihao Zhao, Daixuan Li, Pengfei Li, Shuaishuai Zu, Biao Qin, Hongyan Liu | cs.CL | 2026-03-12 | |
| [HF] End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering | Nhi Dang, Tung Le, Huy Tien Nguyen | | 2026-03-11 | |
| SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models | Hsiao-Ying Huang, Cheng-Han Chiang, Hung-yi Lee | cs.CL, eess.AS | 2026-03-10 | |
| SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding | Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara | cs.LG | 2026-03-10 | |
| Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers | Pengfei Du | cs.AI | 2026-03-08 | |
| FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications | Yunfan Zhang, Yijie Bei, Jetashree Ravi, Pawel Garbacki | cs.CL, cs.SE | 2026-03-05 | |
| EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue | Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir, Ananth Kandala | cs.AI | 2026-03-05 | |
| Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows | Alfio Massimiliano Gliozzo, Junkyu Lee, Nahuel Defosse | cs.AI, cs.LG | 2026-03-04 | |
| Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy | Navdeep Singh Bedi, Ana-Maria Bucur, Noriko Kando, Fabio Crestani | cs.CL | 2026-03-04 | |
| BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages | Jason Lucas, Matt Murtagh-White, Adaku Uchendu, Ali Al-Lawati, Michiharu Yamashita, Dominik Macko, Ivan Srba, Robert Moro, Dongwon Lee | cs.CL | 2026-02-28 | |
| LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning | Yu Zhu, Kai Yang | cs.CL, cs.AI | 2026-02-27 | |
| Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains | Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, Cheng Ji | cs.CR, cs.AI | 2026-02-23 | |
| TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots | Fangrui Huang, Souhad Chbeir, Arpandeep Khatua, Sheng Wang, Sijun Tan, Kenan Ye, Lily Bailey, Merryn Daniel, Ryan Louie, Sanmi Koyejo, Ehsan Adeli | cs.CL, cs.AI, cs.CY | 2026-02-23 | |
| NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs | Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti | cs.LG, cs.AI, cs.CL | 2026-02-20 | |
| What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data | Dimitri Staufer, Kirsten Morehouse | cs.HC, cs.AI, cs.CL, cs.CY | 2026-02-19 | |
| From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan’s Humanities and Social Sciences | Yi-Chih Huang | cs.AI, cs.CL, cs.CY | 2026-02-19 | |
| Evaluating Collective Behaviour of Hundreds of LLM Agents | Richard Willis, Jianing Zhao, Yali Du, Joel Z. Leibo | cs.MA | 2026-02-18 | |
| AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models | Adib Sakhawat, Fardeen Sadab | cs.CL | 2026-02-18 | |
| LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models | Ahmed Khaled Khamis, Hesham Ali | cs.CL | 2026-02-17 | |
| AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents | Zhixing Zhang, Jesen Zhang, Hao Liu, Qinhan Lv, Jing Yang, Kaitong Cai, Keze Wang | cs.AI | 2026-02-17 | |
| Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 | Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao | cs.AI, cs.CL, cs.CV, cs.CY, cs.LG | 2026-02-16 | |
| TruthStance: An Annotated Dataset of Conversations on Truth Social | Fathima Ameen, Danielle Brown, Manusha Malgareddy, Amanul Haque | cs.CL, cs.AI | 2026-02-16 | |
| An end-to-end agentic pipeline for smart contract translation and quality evaluation | Abhinav Goel, Chaitya Shah, Agostino Capponi, Alfio Gliozzo | cs.AI, cs.SE | 2026-02-14 | |
| Never say never: Exploring the effects of available knowledge on agent persuasiveness in controlled physiotherapy motivation dialogues | Stephan Vonschallen, Rahel Häusler, Theresa Schmiedel, Friederike Eyssel | cs.HC, cs.AI | 2026-02-13 | |
| WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models | Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haorong Ying, Yule Wang, Junbo Li, Jun Fang, Zhou Zhao | cs.CL | 2026-02-12 | |
| Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? | Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev | cs.SE, cs.AI | 2026-02-12 | |
| Do Large Language Models Adapt to Language Variation across Socioeconomic Status? | Elisa Bassignana, Mike Zhang, Dirk Hovy, Amanda Cercas Curry | cs.CL | 2026-02-12 | |
| RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation | Jinfang Wang, Jiajie Liu, Jianwei Wu, Ziqin Luo, Zhen Chen, Chunlei Li, Biao Han, Tao Deng, Yi Li, Shuanglong Li, Lin Liu | cs.AI | 2026-02-12 | |
| AIR: Improving Agent Safety through Incident Response | Zibo Xiao, Jun Sun, Junjie Chen | cs.AI | 2026-02-12 | |
| TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning | Sina Tayebati, Divake Kumar, Nastaran Darabi, Davide Ettori, Ranganath Krishnan, Amit Ranjan Trivedi | cs.AI | 2026-02-11 | |
| Learning to Compose for Cross-domain Agentic Workflow Generation | Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen | cs.MA, cs.AI, cs.LG, cs.SE | 2026-02-11 | |
| AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models | Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun | q-fin.TR, cs.AI | 2026-02-10 | |
| Towards Poisoning Robustness Certification for Natural Language Generation | Mihnea Ghitu, Matthew Wicker | cs.LG | 2026-02-10 | |
| Large Language Models for Designing Participatory Budgeting Rules | Nguyen Thach, Xingchen Sha, Hau Chan | cs.LG | 2026-02-10 | |
| Accelerating Social Science Research via Agentic Hypothesization and Experimentation | Jishu Sen Gupta, Harini SI, Somesh Kumar Singh, Syed Mohamad Tawseeq, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah, Balaji Krishnamurthy | cs.AI, cs.CL | 2026-02-08 | |
| Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction | Pavithren V S Pakianathan, Rania Islambouli, Diogo Branco, Albrecht Schmidt, Tiago Guerreiro, Jan David Smeddinck | cs.HC, cs.AI | 2026-02-05 | |
| Generative Ontology: When Structured Knowledge Learns to Create | Benny Cheung | cs.AI, cs.CL | 2026-02-05 | |
| Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning | John Yan, Michael Yu, Yuqi Sun, Alexander Duffy, Tyler Marques, Matthew Lyle Olson | cs.LG, cs.AI | 2026-02-05 | |
| RA-QA: Towards Respiratory Audio-based Health Question Answering | Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo | cs.SD, cs.LG, eess.AS | 2026-02-04 | |
| ProxyWar: Dynamic Assessment of LLM Code Generation in Game Arenas | Wenjun Peng, Xinyu Wang, Qi Wu | cs.SE, cs.AI | 2026-02-04 | |
| A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model | Xiaolin Hu, Hang Yuan, Xinzhu Sang, Binbin Yan, Zhou Yu, Cong Huang, Kai Chen | cs.LG, cs.AI, cs.SD | 2026-02-04 | |
| From Crafting Text to Crafting Thought: Grounding AI Writing Support to Writing Center Pedagogy | Yijun Liu, John Gallagher, Sarah Sterman, Tal August | cs.HC | 2026-02-03 | |
| The Necessity of a Unified Framework for LLM-Based Agent Evaluation | Pengyu Zhu, Li Sun, Philip S. Yu, Sen Su | cs.AI | 2026-02-03 | |
| GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIs | Chengguang Gan, Yoshihiro Tsujii, Yunhao Liang, Tatsunori Mori, Shiwen Ni, Hiroki Itoh | cs.CL | 2026-02-02 | |
| Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles | Shaohan Wang, Benfeng Xu, Licheng Zhang, Mingxuan Du, Chiwei Zhu, Xiaorui Wang, Zhendong Mao, Yongdong Zhang | cs.CL | 2026-02-02 | |
| PedagoSense: A Pedology Grounded LLM System for Pedagogical Strategy Detection and Contextual Response Generation in Learning Dialogues | Shahem Sultan, Shahem Fadi, Yousef Melhim, Ibrahim Alsarraj, Besher Hassan | cs.CL | 2026-02-01 | |
| PaperBanana: Automating Academic Illustration for AI Scientists | Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon | cs.CL, cs.CV | 2026-01-30 | |
| WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | Yao Zhang, Shijie Tang, Zeyu Li, Zhen Han, Volker Tresp | cs.AI | 2026-01-29 | |
| Embodied Task Planning via Graph-Informed Action Generation with Large Language Model | Xiang Li, Ning Yan, Masood Mortazavi | cs.CL | 2026-01-29 | |
| More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests | Haoming Huang, Pongchai Jaisri, Shota Shimizu, Lingfeng Chen, Sota Nakashima, Gema Rodríguez-Pérez | cs.SE, cs.AI, cs.HC | 2026-01-29 | |
| Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement | Kaiyuan Wu, Aditya Nagori, Rishikesan Kamaleswaran | cs.AI, cs.MA | 2026-01-28 | |
| A Dialectic Pipeline for Improving LLM Robustness | Sara Candussio | cs.CL, cs.MA | 2026-01-28 | |
| RobustExplain: Evaluating Robustness of LLM-Based Explanation Agents for Recommendation | Guilin Zhang, Kai Zhao, Jeffrey Friedman, Xu Chu | cs.IR, cs.AI, cs.LG | 2026-01-27 | |
| Assessing the Quality of Mental Health Support in LLM Responses through Multi-Attribute Human Evaluation | Abeer Badawi, Md Tahmid Rahman Laskar, Elahe Rahimi, Sheri Grach, Lindsay Bertrand, Lames Danok, Frank Rudzicz, Jimmy Huang, Elham Dolatabadi | cs.AI, cs.HC | 2026-01-26 | |
| LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval | Yunhan Li, Mingjie Xie, Gaoli Kang, Zihan Gong, Gengshen Wu, Min Yang | cs.IR, cs.CL | 2026-01-25 | |
| Status Hierarchies in Language Models | Emilio Barkett | cs.HC, cs.AI, cs.CL | 2026-01-24 | |
| The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents | Chen Chen, Kim Young Il, Yuan Yang, Wenhao Su, Yilin Zhang, Xueluan Gong, Qian Wang, Yongsen Zheng, Ziyao Liu, Kwok-Yan Lam | cs.CL | 2026-01-24 | |
| On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification | David Condrey | cs.CR, cs.AI, cs.HC | 2026-01-24 | |
| LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation | Stergios Chatzikyriakidis | cs.CL | 2026-01-14 | |
| Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models | Santiago Martínez Novoa, Nicolás Rozo Fajardo, Diego Alejandro González Vargas, Nicolás Bedoya Figueroa | cs.CL | 2026-01-14 | |
| Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity | Samhita Bollepally, Aurora Sloman-Moll, Takashi Yamauchi | cs.CL, cs.AI | 2026-01-14 | |
| OpenMic: A Multi-Agent-Based Stand-Up Comedy Generation System | Yuyang Wu, Hanzhong Cao, Jianhao Chen, Yufei Li | cs.AI | 2026-01-13 | |
| Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends | Jing Yang, Nils Feldhus, Salar Mohtaj, Leonhard Hennig, Qianli Wang, Eleni Metheniti, Sherzod Hakimov, Charlott Jakob, Veronika Solopova, Konrad Rieck, David Schlangen, Sebastian Möller, Vera Schmitt | cs.CL | 2026-01-12 | |
| PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling | Huachuan Qiu, Zhaoming Chen, Yuqian Chen, Yuan Xie, Yu Lu, Zhenzhong Lan | cs.CL | 2026-01-12 | |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) | Aja Khanal, Kaushik T. Ranade, Rishabh Agrawal, Kalyan S. Basu, Apurva Narayan | cs.MA | 2026-01-12 | |
| Can a Unimodal Language Agent Provide Preferences to Tune a Multimodal Vision-Language Model? | Sazia Tabasum Mim, Jack Morris, Manish Dhakal, Yanming Xiu, Maria Gorlatova, Yi Ding | cs.CL | 2026-01-10 | |
| STELP: Secure Transpilation and Execution of LLM-Generated Programs | Swapnil Shinde, Sahil Wadhwa, Andy Luo, Akshay Gupta, Mohammad Shahed Sorower | cs.SE, cs.AI | 2026-01-09 | |
| A Preliminary Agentic Framework for Matrix Deflation | Paimon Goulart, Evangelos E. Papalexakis | cs.LG | 2026-01-06 | |
| The Path Ahead for Agentic AI: Challenges and Opportunities | Nadia Sibai, Yara Ahmed, Serry Sibaee, Sawsan AlHalawani, Adel Ammar, Wadii Boulila | cs.AI | 2026-01-06 | |
| AgentMark: Utility-Preserving Behavioral Watermarking for Agents | Kaibo Huang, Jin Tan, Yukun Wei, Wanling Li, Zipei Zhang, Hui Tian, Zhongliang Yang, Linna Zhou | cs.CR, cs.AI | 2026-01-05 | |
| WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics | Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie | cs.SE, cs.AI | 2026-01-05 | |
| CaveAgent: Transforming LLMs into Stateful Runtime Operators | Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, Wangbo Zhao, Lijie Yang, Lang Feng, Fuchao Yang, Jingxuan Wu, Yiqiao Huang, Chendong Ma, Dailing Jiang, Jianbo Deng, Sihui Han, Bo An, Yike Guo, Jun Song | cs.AI, cs.SE | 2026-01-04 | |
| MAMA-Memeia! Multi-Aspect Multi-Agent Collaboration for Depressive Symptoms Identification in Memes | Siddhant Agarwal, Adya Dhuler, Polly Ruhnke, Melvin Speisman, Md Shad Akhtar, Shweta Yadav | cs.CL | 2025-12-31 | |
| Do Large Language Models Know What They Are Capable Of? | Casey O. Barkan, Sid Black, Oliver Sourbut | cs.CL, cs.AI | 2025-12-31 | |
| The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models | Giuseppe Canale, Kashyap Thimmaraju | cs.CR, cs.AI, cs.CY, cs.HC | 2025-12-30 | |
| Web World Models | Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang | cs.AI, cs.CL, cs.CV | 2025-12-29 | |
| TCEval: Using Thermal Comfort to Assess Cognitive and Perceptual Abilities of AI | Jingming Li | cs.AI | 2025-12-29 | |
| AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents | Bhanu Prakash Vangala, Ali Adibifar, Tanu Malik, Ashish Gehani | cs.SE, cs.AI, cs.MA | 2025-12-26 | |
| Emotion Diffusion in Real and Simulated Social Graphs: Structural Limits of LLM-Based Social Simulation | Qiqi Qiang | cs.SI | 2025-12-24 | |
| NVIDIA Nemotron 3: Efficient and Open Intelligence | NVIDIA: Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Amy Shen, Anahita Bhiwandiwalla, Andrew Tao, Anjulie Agrusa, Ankur Verma, Ann Guan, Anubhav Mandarwal, Arham Mehta, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asma Kuriparambil Thekkumpate, Ayush Dattagupta, Banghua Zhu, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Besmira Nushi, Bilal Kartal, Bita Darvish Rouhani, Boris Ginsburg, Brandon Norick, Brandon Soubasis, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Carlo del Mundo, Chantal Hwang, Charles Wang, Cheng-Ping Hsieh, Chenghao Zhang, Chenhan Yu, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christopher Parisien, Collin Neale, Cyril Meurillon, Damon Mosk-Aoyama, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daria Gitman, Daria Levy, Darko Stosic, David Mosallanezhad, Deepak Narayanan, Dhruv Nathawani, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dong Ahn, Duncan Riach, Dusan Stosic, Edgar Minasyan, Edward Lin, Eileen Long, Eileen Peters Long, Elad Segal, Elena Lantz, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Tramel, Erick Galinkin, Erik Pounds, Evan Briones, Evelina Bakhturina, Evgeny Tsykunov, Faisal Ladhak, Fay Wang, Fei Jia, Felipe Soares, Feng Chen, Ferenc Galko, Frank Sun, Frankie Siino, Gal Hubara Agam, Ganesh Ajjanagadde, Gantavya Bhatt, Gargi Prasad, George Armstrong, Gerald Shen, Gorkem Batmaz, Grigor Nalbandyan, Haifeng Qian, Harsh Sharma, Hayley Ross, Helen Ngo, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huizi Mao, Huy C Nguyen, Huy Q Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igor Gitman, Ilya Loshchilov, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jian Zhang, Jiaqi Zeng, Jie Lou, Jimmy Zhang, Jinhang Choi, Jining Huang, Joey Conway, Joey Guman, John Kamalu, Johnny Greco, Jonathan Cohen, Joseph Jennings, Joyjit Daw, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kai Xu, Kan Zhu, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kevin Shih, Kezhi Kong, Khushi Bhardwaj, Kirthi Shankar, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Lawrence McAfee, Laya Sleiman, Leon Derczynski, Li Ding, Lizzie Wei, Lucas Liebenwein, Luis Vega, Maanu Grover, Maarten Van Segbroeck, Maer Rodrigues de Melo, Mahdi Nazemi, Makesh Narsimhan Sreedhar, Manoj Kilaru, Maor Ashkenazi, Marc Romeijn, Marcin Chochowski, Mark Cai, Markus Kliegl, Maryam Moosaei, Matt Kulka, Matvei Novikov, Mehrzad Samadi, Melissa Corpuz, Mengru Wang, Meredith Price, Michael Andersch, Michael Boone, Michael Evans, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Najeeb Nabwani, Natalie Hereth, Nave Assaf, Negar Habibi, Neta Zmora, Netanel Haber, Nicola Sessions, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nishant Sharma, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Puny, Oren Tropp, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Pawel Morkisz, Peter Dykas, Peter Jin, Pinky Xu, Piotr Januszewski, Pranav Prashant Thombre, Prasoon Varshney, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Rabeeh Karimi Mahabadi, Rachit Garg, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Rich Harang, Rick Izzo, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Hesse, Roger Waleffe, Rohit Watve, Roi Koren, Ruoxi Zhang, Russell Hewett, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Sadegh Mahdavi, Sahil Modi, Samuel Kriman, Sangkug Lim, Sanjay Kariyappa, Sanjeev Satheesh, Saori Kaji, Satish Pasumarthi, Saurav Muralidharan, Sean Narentharen, Sean Narenthiran, Seonmyeong Bak, Sergey Kashirsky, Seth Poulos, Shahar Mor, Shanmugam Ramasamy, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Shelby Thomas, Shiqing Fan, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuoyang Ding, Siddharth Singh, Simeng Sun, Smita Ithape, Somshubra Majumdar, Soumye Singhal, Stas Sergienko, Stefania Alborghetti, Stephen Ge, Sugam Dipak Devare, Sumeet Kumar Barua, Suseella Panguluri, Suyog Gupta, Sweta Priyadarshi, Syeda Nahida Akter, Tan Bui, Teodor-Dumitru Ene, Terry Kong, Thanh Do, Tijmen Blankevoort, Tim Moon, Tom Balough, Tomer Asida, Tomer Bar Natan, Tomer Ronen, Tugrul Konuk, Twinkle Vashishth, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vinay Rao, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wenfei Zhou, Will Jennings, William Zhang, Wojciech Prazuch, Xiaowei Ren, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Yigong Qin, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Subara, Yoshi Suhara, Yubo Gao, Zach Moshe, Zhen Dong, Zhongbo Zhu, Zihan Liu, Zijia Chen, Zijie Yan | cs.CL, cs.AI, cs.LG | 2025-12-24 | |
| AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent | Haipeng Luo, Huawen Feng, Qingfeng Sun, Can Xu, Kai Zheng, Yufei Wang, Tao Yang, Han Hu, Yansong Tang, Di Wang | cs.AI, cs.CL, cs.LG | 2025-12-23 | |
| SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention | Alexandros Christoforos, Chadbourne Davis | cs.CL, cs.AI | 2025-12-23 | |
| MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts | Alexandros Christoforos, Chadbourne Davis | cs.CL | 2025-12-23 | |
| Distilling to Hybrid Attention Models via KL-Guided Layer Selection | Yanhong Li, Songlin Yang, Shawn Tan, Mayank Mishra, Rameswar Panda, Jiawei Zhou, Yoon Kim | cs.CL, cs.AI | 2025-12-23 | |
| LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators | Mateusz Lango, Ondřej Dušek | cs.CL, cs.AI | 2025-12-20 | |
| ShareChat: A Dataset of Chatbot Conversations in the Wild | Yueru Yan, Tuc Nguyen, Bo Su, Melissa Lieffers, Thai Le | cs.CL, cs.AI, cs.HC | 2025-12-19 | |
| Polypersona: Persona-Grounded LLM for Synthetic Survey Responses | Tejaswani Dash, Dinesh Karri, Anudeep Vurity, Gautam Datla, Tazeem Ahmad, Saima Rafi, Rohith Tangudu | cs.CL, cs.AI | 2025-12-16 | |
| Evaluation of AI Ethics Tools in Language Models: A Developers’ Perspective Case Study | Jhessica Silva, Diego A. B. Moreira, Gabriel O. dos Santos, Alef Ferreira, Helena Maia, Sandra Avila, Helio Pedrini | cs.CY, cs.AI, cs.CL | 2025-12-16 | |
| Workflow is All You Need: Escaping the “Statistical Smoothing Trap” via High-Entropy Information Foraging and Adversarial Pacing | Zhongjie Jiang | cs.CL, cs.AI, cs.CY, q-fin.GN | 2025-12-10 | |
| Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making | Qingyuan Zhang, Yuxi Wang, Cancan Hua, Yulin Huang, Ning Lyu | cs.CL | 2025-12-10 | |
| The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing? | Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee | cs.LG, cs.AI | 2025-12-04 | |
| Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning | Wenlong Tang | cs.LG, cs.AI | 2025-11-28 | |
| Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach | Shuqi Liu, Han Wu, Guanzhi Deng, Jianshu Chen, Xiaoyang Wang, Linqi Song | cs.CL, cs.AI | 2025-11-28 | |
| Adaptive LLM Agents: Toward Personalized Empathetic Care | Priyanka Singh, Sebastian Von Mammen | cs.HC | 2025-11-25 | |
| Deep Research: A Systematic Survey | Zhengliang Shi, Yiqun Chen, Haitao Li, Weiwei Sun, Shiyu Ni, Yougang Lyu, Run-Ze Fan, Bowen Jin, Yixuan Weng, Minjun Zhu, Qiujie Xie, Xinyu Guo, Qu Yang, Jiayi Wu, Jujia Zhao, Xiaqiang Tang, Xinbei Ma, Cunxiang Wang, Jiaxin Mao, Qingyao Ai, Jen-Tse Huang, Wenxuan Wang, Yue Zhang, Yiming Yang, Zhaopeng Tu, Zhaochun Ren | cs.CL, cs.AI, cs.IR | 2025-11-24 | |
| MindEval: Benchmarking Language Models on Multi-turn Mental Health Support | José Pombal, Maya D’Eon, Nuno M. Guerreiro, Pedro Henrique Martins, António Farinhas, Ricardo Rei | cs.CL, cs.AI | 2025-11-23 | |
| NAMeGEn: Creative Name Generation via A Novel Agent-based Multiple Personalized Goal Enhancement Framework | Shanlin Zhou, Xinpeng Wang, Jianxun Lian, Zhenghao Liu, Laks V. S. Lakshmanan, Xiaoyuan Yi, Yongtao Hao | cs.CL, cs.AI, cs.IR, cs.MA, cs.NE | 2025-11-19 | |
| AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR | Gabrial Zencha Ashungafac, Mardhiyah Sanni, Busayo Awobade, Alex Gichamba, Tobi Olatunji | cs.CL | 2025-11-18 | |
| Generalist Foundation Models Are Not Clinical Enough for Hospital Operations | Lavender Y. Jiang, Angelica Chen, Xu Han, Xujin Chris Liu, Radhika Dua, Kevin Eaton, Frederick Wolff, Robert Steele, Jeff Zhang, Anton Alyakin, Qingkai Pan, Yanbing Chen, Karl L. Sangwon, Daniel A. Alber, Jaden Stryker, Jin Vivian Lee, Yindalon Aphinyanaphongs, Kyunghyun Cho, Eric Karl Oermann | cs.CL, cs.AI, cs.LG | 2025-11-17 | |
| Prompt-Based Value Steering of Large Language Models | Giulio Antonio Abbo, Tony Belpaeme | cs.CL, cs.AI | 2025-11-14 | |
| Self-Correcting Large Language Models: Generation vs. Multiple Choice | Hossein A. Rahmani, Satyapriya Krishna, Xi Wang, Mohammadmehdi Naghiaei, Emine Yilmaz | cs.CL, cs.AI | 2025-11-12 | |
| HalluClean: A Unified Framework to Combat Hallucinations in LLMs | Yaxin Zhao, Yu Zhang | cs.CL | 2025-11-12 | |
| Simulating Students with Large Language Models: A Review of Architecture, Mechanisms, and Role Modelling in Education with Generative AI | Luis Marquez-Carpintero, Alberto Lopez-Sellers, Miguel Cazorla | cs.CY, cs.AI, cs.CL | 2025-11-08 | |
| Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance | Mashrur Rahman, Mantaqa abedin, Monowar Zamil Abir, Faizul Islam Ansari, Adib Reza, Farig Yousuf Sadeque, Niloy Farhan | cs.IR, cs.CL | 2025-11-06 | |
| Multi-Agent Collaborative Framework For Math Problem Generation | Kia Karbasi, Kevin Hong, Mohammad Amin Samadi, Gregory Pottie | cs.MA, cs.CL, cs.HC | 2025-11-06 | |
| Bayesian Evaluation of Large Language Model Behavior | Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth | cs.CL, cs.LG, stat.AP, stat.ML | 2025-11-04 | |
| Hybrid Quantum Transformer for Language Generation | Desheng Kong, Xiangshuo Cui, Jiaying Jin, Jing Xu, Donglin Wang | cs.CL, cs.AI, quant-ph | 2025-11-02 | |
| Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations | Birat Poudel, Satyam Ghimire, Er. Prakash Chandra Prasad | cs.CL | 2025-11-01 | |
| Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, Natasha Jaques | cs.CL, cs.AI | 2025-10-31 | |
| CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions | Lingyue Fu, Xin Ding, Yaoming Zhu, Shao Zhang, Lin Qiu, Weiwen Liu, Weinan Zhang, Xuezhi Cao, Xunliang Cai, Jiaxin Ding, Yong Yu | cs.AI, cs.CL | 2025-10-30 | |
| Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations | Syed Zohaib Hassan, Pål Halvorsen, Miriam S. Johnson, Pierre Lison | cs.CL | 2025-10-28 | |
| Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning) | Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß | cs.CL | 2025-10-23 | |
| Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety | Vamshi Krishna Bonagiri, Ponnurangam Kumaraguru, Khanh Nguyen, Benjamin Plaut | cs.CL | 2025-10-18 | |
| Efficient Seq2seq Coreference Resolution Using Entity Representations | Matt Grenander, Shay B. Cohen, Mark Steedman | cs.CL | 2025-10-16 | |
| Generating Fair Consensus Statements with Social Choice on Token-Level MDPs | Carter Blair, Kate Larson | cs.AI, cs.CL, cs.GT | 2025-10-15 | |
| [HF] Deflanderization for Game Dialogue: Balancing Character Authenticity with Task Execution in LLM-based NPCs | Pasin Buakhaw, Kun Kerdthaisong, Phuree Phenhiran, Pitikorn Khlaisamniang, Supasate Vorathammathorn, Piyalitt Ittichaiwong, Nutchanon Yongsatianchot | | 2025-10-15 | 1 |
| MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation | Jiin Park, Misuk Kim | cs.IR, cs.AI | 2025-10-15 | |
| CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation | Yee Man Choi, Xuehang Guo, Yi R. Fung, Qingyun Wang | cs.DL | 2025-10-15 | |
| GOAT: A Training Framework for Goal-Oriented Agent with Tools | Hyunji Min, Sangwon Jung, Junyoung Sung, Dosung Lee, Leekyeung Han, Paul Hongsuck Seo | cs.AI | 2025-10-14 | |
| ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory | Yunzhong Xiao, Yangmin Li, Hewei Wang, Yunlong Tang, Zora Zhiruo Wang | cs.CL | 2025-10-08 | |
| What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems | Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel Fernando Garcia Contreras, Koichiro Yoshino | cs.CL | 2025-08-06 | |
| Investigating Hallucination in Conversations for Low Resource Languages | Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha | cs.CL | 2025-07-30 | |
| Teaching Language Models To Gather Information Proactively | Tenghao Huang, Sihao Chen, Muhao Chen, Jonathan May, Longqi Yang, Mengting Wan, Pei Zhou | cs.AI, cs.CL | 2025-07-28 | |
| [HF] RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing | Hao Xiang, Tianyi Tang, Yang Su, Bowen Yu, An Yang, Fei Huang, Yichang Zhang, Yaojie Lu, Hongyu Lin, Xianpei Han, Jingren Zhou, Junyang Lin, Le Sun | | 2025-07-27 | |
| AI-Driven Generation of Old English: A Framework for Low-Resource Languages | Rodrigo Gabriel Salazar Alva, Matías Nuñez, Cristian López, Javier Martín Arista | cs.CL, cs.AI | 2025-07-27 | |
| CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards | Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li | cs.CL | 2025-07-23 | |
| [HF] DialogueForge: LLM Simulation of Human-Chatbot Dialogue | Ruizhe Zhu, Hao Zhu, Yaxuan Li, Syang Zhou, Shijing Cai, Malgorzata Lazuka, Elliott Ash | | 2025-07-21 | 1 |
| On the Semantics of Large Language Models | Martin Schuele | cs.CL, cs.AI | 2025-07-07 | |
| SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge | Yuxiang Mei, Yuang Zheng, Dongxing Xu, Yanhua Long | cs.CL, eess.AS | 2025-07-04 | |
| The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | Reza Yousefi Maragheh, Yashar Deldjoo | cs.IR | 2025-07-02 | |
| Decision-Oriented Text Evaluation | Yu-Shiang Huang, Chuan-Ju Wang, Chung-Chi Chen | cs.CL | 2025-07-02 | |
| [HF] SPADE: Systematic Prompt Framework for Automated Dialogue Expansion in Machine-Generated Text Detection | Haoyi Li, Angela Yifei Yuan, Soyeon Caren Han, Christopher Leckie | | 2025-03-19 | |
| [HF] Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation | Ahmed Njifenjou, Virgile Sucal, Bassam Jabaian, Fabrice Lefèvre | | 2025-03-05 | |
| [HF] Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs | Reham Omar, Omij Mangukiya, Essam Mansour | | 2025-01-17 | |
| [HF] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling | Minzheng Wang, Xinghua Zhang, Kun Chen, Nan Xu, Haiyang Yu, Fei Huang, Wenji Mao, Yongbin Li | | 2024-12-06 | 8 |
| [HF] DiaSynth – Synthetic Dialogue Generation Framework | Sathya Krishnan Suresh, Wu Mengjun, Tushar Pranav, Eng Siong Chng | | 2024-09-25 | 20 |
| [HF] J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling | Wataru Nakata, Kentaro Seki, Hitomi Yanaka, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari | | 2024-07-22 | |
| [HF] PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models | Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang, Kyung-Ah Sohn | | 2024-04-01 | |
| [HF] StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation | Jinpeng Li, Zekai Zhang, Quan Tu, Xin Cheng, Dongyan Zhao, Rui Yan | | 2024-03-18 | |
| [HF] KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark | Seongbo Jang, Seonghyeon Lee, Hwanjo Yu | | 2024-02-27 | |
| [HF] Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue | Jian Wang, Chak Tou Leong, Jiashuo Wang, Dongding Lin, Wenjie Li, Xiao-Yong Wei | | 2024-02-10 | |
| [HF] Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | Dennis Ulmer, Elman Mansimov, Kaixiang Lin, Justin Sun, Xibin Gao, Yi Zhang | | 2024-01-10 | 18 |
| [HF] Faithful Persona-based Conversational Dataset Generation with Large Language Models | Pegah Jandaghi, XiangHai Sheng, Xinyi Bai, Jay Pujara, Hakim Sidahmed | | 2023-12-15 | 11 |
| [HF] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models | Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang | | 2023-11-28 | 1 |
| [HF] PRODIGy: a PROfile-based DIalogue Generation dataset | Daniela Occhipinti, Serra Sinem Tekiroglu, Marco Guerini | | 2023-11-09 | |
| [HF] Learning From Free-Text Human Feedback – Collect New Datasets Or Extend Existing Ones? | Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych | | 2023-10-24 | |
| [HF] MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control | Zhenyi Lu, Wei Wei, Xiaoye Qu, XianLing Mao, Dangyang Chen, Jixiong Chen | | 2023-10-22 | |
| [HF] BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues | Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen | | 2023-10-20 | |
| [HF] Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations | Jihyoung Jang, Minseong Boo, Hyounghun Kim | | 2023-10-20 | 2 |
| [HF] We are what we repeatedly do: Inducing and deploying habitual schemas in persona-based responses | Benjamin Kane, Lenhart Schubert | | 2023-10-10 | 1 |
| [HF] Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities | Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee | | 2023-10-07 | |
| [HF] Towards human-like spoken dialogue generation between AI agents from written dialogue | Kentaro Mitsui, Yukiya Hono, Kei Sawada | | 2023-10-02 | |
| [HF] ChatHaruhi: Reviving Anime Character in Reality via Large Language Model | Cheng Li, Ziang Leng, Chenxi Yan, Junyi Shen, Hao Wang, Weishi MI, Yaying Fei, Xiaoyang Feng, Song Yan, HaoSheng Wang, Linkang Zhan, Yaokai Jia, Pingyu Wu, Haozhen Sun | | 2023-08-18 | 1 |
| [HF] Three Ways of Using Large Language Models to Evaluate Chat | Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek | | 2023-08-12 | |
| [HF] DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations | Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf | | 2023-07-13 | 17 |
| [HF] Prompted LLMs as Chatbot Modules for Long Open-domain Conversation | Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, Kangwook Lee | | 2023-05-08 | |
| [HF] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili | | 2023-04-26 | 1 |
| [HF] ChatLLM Network: More brains, More intelligence | Rui Hao, Linmei Hu, Weijian Qi, Qingliu Wu, Yirui Zhang, Liqiang Nie | | 2023-04-24 | |
| [HF] Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data | Jing Wei, Sungdong Kim, Hyunhoon Jung, Young-Ho Kim | | 2023-01-14 | |
| [HF] Controllable Dialogue Simulation with In-Context Learning | Zekun Li, Wenhu Chen, Shiyang Li, Hong Wang, Jing Qian, Xifeng Yan | | 2022-10-09 | |
| [HF] A Benchmark for Understanding and Generating Dialogue between Characters in Stories | Jianzhu Yao, Ziqi Liu, Jian Guan, Minlie Huang | | 2022-09-18 | |
| [HF] MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages | Qingyu Zhang, Xiaoyu Shen, Ernie Chang, Jidong Ge, Pengke Chen | | 2022-08-27 | |
| [HF] Building a Personalized Dialogue System with Prompt-Tuning | Tomohito Kasahara, Daisuke Kawahara, Nguyen Tung, Shengzhe Li, Kenta Shinzato, Toshinori Sato | | 2022-06-11 | |
| [HF] A Mixture-of-Expert Approach to RL-based Dialogue Management | Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier | | 2022-05-31 | 1 |
| [HF] Towards a Progression-Aware Autonomous Dialogue Agent | Abraham Sanders, Tomek Strzalkowski, Mei Si, Albert Chang, Deepanshu Dey, Jonas Braasch, Dakuo Wang | | 2022-05-07 | |
| [HF] Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models | Sanghwan Bae, Donghyun Kwak, Sungdong Kim, Donghoon Ham, Soyoung Kang, Sang-Woo Lee, Woomyoung Park | | 2022-04-30 | |
| [HF] Meet Your Favorite Character: Open-domain Chatbot Mimicking Fictional Characters with only a Few Utterances | Seungju Han, Beomsu Kim, Jin Yong Yoo, Seokjun Seo, Sangbum Kim, Enkhbayar Erdenee, Buru Chang | | 2022-04-22 | 1 |
| [HF] Multimodal Dialogue Response Generation | Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang | | 2021-10-16 | |
| [HF] CharacterChat: Supporting the Creation of Fictional Characters through Conversation and Progressive Manifestation with a Chatbot | Oliver Schmitt, Daniel Buschek | | 2021-06-23 | |
| [HF] Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey | Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria | | 2021-05-10 | |
| [HF] Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions | Bodhisattwa Prasad Majumder, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Julian McAuley | | 2020-10-07 | |
| [HF] A Large-Scale Chinese Short-Text Conversation Dataset | Yida Wang, Pei Ke, Yinhe Zheng, Kaili Huang, Yong Jiang, Xiaoyan Zhu, Minlie Huang | | 2020-08-10 | |
| [HF] Recipes for building an open-domain chatbot | Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston | | 2020-04-28 | 1 |
| [HF] A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data | Yinhe Zheng, Rongsheng Zhang, Xiaoxi Mao, Minlie Huang | | 2019-11-12 | |
| [HF] DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan | | 2019-11-01 | 2 |
| [HF] ALOHA: Artificial Learning of Human Attributes for Dialogue Agents | Aaron W. Li, Veronica Jiang, Steven Y. Feng, Julia Sprague, Wei Zhou, Jesse Hoey | | 2019-10-18 | |
| [HF] PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable | Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang | | 2019-10-17 | |
| [HF] Towards Deep Conversational Recommendations | Raymond Li, Samira Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, Chris Pal | | 2018-12-18 | |