[*] = found in both arXiv and HF search; [HF] = found via HF semantic search
written on 2026-03-28
| title | authors | categories | displaydate | upvotes |
|---|---|---|---|---|
| OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection | Jeffrey Flynt | cs.CR, cs.LG | 2026-03-23 | |
| Optimizing Multi-Agent Weather Captioning via Text Gradient Descent: A Training-Free Approach with Consensus-Aware Gradient Fusion | Shixu Liu | cs.CL | 2026-03-23 | |
| Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains | Octavian Untila | cs.SE, cs.AI, cs.MA | 2026-03-22 | |
| Reasoning Gets Harder for LLMs Inside A Dialogue | Ivan Kartáč, Mateusz Lango, Ondřej Dušek | cs.CL | 2026-03-20 | |
| An Agentic Approach to Generating XAI-Narratives | Yifan He, David Martens | cs.CL | 2026-03-20 | |
| Semantic Delta: An Interpretable Signal Differentiating Human and LLMs Dialogue | Riccardo Scantamburlo, Mauro Mezzanzana, Giacomo Buonanno, Francesco Bertolotti | cs.CL, cs.AI | 2026-03-20 | |
| Skilled AI Agents for Embedded and IoT Systems Development | Yiming Li, Yuhan Cheng, Mingchen Ma, Yihang Zou, Ningyuan Yang, Wei Cheng, Hai “Helen” Li, Yiran Chen, Tingjun Chen | cs.SE, cs.AI | 2026-03-20 | |
| Mi:dm K 2.5 Pro | KT Tech innovation Group | cs.CL, cs.AI | 2026-03-19 | |
| Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM | Zizhao Hu, Mohammad Rostami, Jesse Thomason | cs.AI | 2026-03-19 | |
| When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution | Yi Nian, Haosen Cao, Shenzhe Zhu, Henry Peng Zou, Qingqing Luan, Yue Zhao | cs.AI, cs.CL | 2026-03-18 | |
| Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures | Young Bin Park | cs.AI, cs.IR, cs.LO | 2026-03-18 | |
| Evaluating LLM-Simulated Conversations in Modeling Inconsistent and Uncollaborative Behaviors in Human Social Interaction | Ryo Kamoi, Ameya Godbole, Longqi Yang, Rui Zhang, Mengting Wan, Pei Zhou | cs.CL | 2026-03-17 | |
| Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure | Caglar Yildirim | cs.AI | 2026-03-17 | |
| Proactive Rejection and Grounded Execution: A Dual-Stage Intent Analysis Paradigm for Safe and Efficient AIoT Smart Homes | Xinxin Jin, Zhengwei Ni, Zhengguo Sheng, Victor C. M. Leung | cs.AI | 2026-03-17 | |
| VIBEPASS: Can Vibe Coders Really Pass the Vibe Check? | Srijan Bansal, Jiao Fangkai, Yilun Zhou, Austin Xu, Shafiq Joty, Semih Yavuz | cs.SE, cs.AI | 2026-03-16 | |
| Practicing with Language Models Cultivates Human Empathic Communication | Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Bruce Lambert, Matthew Groh | cs.CL, cs.HC | 2026-03-16 | |
| OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora | Jeffrey Flynt | cs.CL, cs.AI, cs.IR | 2026-03-16 | |
| GNNVerifier: Graph-based Verifier for LLM Task Planning | Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi | cs.LG | 2026-03-16 | |
| GameUIAgent: An LLM-Powered Framework for Automated Game UI Design with Structured Intermediate Representation | Wei Zeng, Fengwei An, Zhen Liu, Jian Zhao | cs.AI | 2026-03-16 | |
| CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language | Junhang Cheng, Fang Liu, Jia Li, Chengru Wu, Nanxiang Jiang, Li Zhang | cs.SE, cs.AI, cs.CL | 2026-03-15 | |
| Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows | Aditya Sharan, Sriram Hebbale, Dhruv Kumar | cs.CL, cs.AI | 2026-03-15 | |
| QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate | Jihao Zhao, Daixuan Li, Pengfei Li, Shuaishuai Zu, Biao Qin, Hongyan Liu | cs.CL | 2026-03-12 | |
| [HF] End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering | Nhi Dang, Tung Le, Huy Tien Nguyen | | 2026-03-11 | |
| SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models | Hsiao-Ying Huang, Cheng-Han Chiang, Hung-yi Lee | cs.CL, eess.AS | 2026-03-10 | |
| SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding | Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara | cs.LG | 2026-03-10 | |
| Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers | Pengfei Du | cs.AI | 2026-03-08 | |
| FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications | Yunfan Zhang, Yijie Bei, Jetashree Ravi, Pawel Garbacki | cs.CL, cs.SE | 2026-03-05 | |
| EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue | Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir, Ananth Kandala | cs.AI | 2026-03-05 | |
| Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows | Alfio Massimiliano Gliozzo, Junkyu Lee, Nahuel Defosse | cs.AI, cs.LG | 2026-03-04 | |
| Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy | Navdeep Singh Bedi, Ana-Maria Bucur, Noriko Kando, Fabio Crestani | cs.CL | 2026-03-04 | |
| BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages | Jason Lucas, Matt Murtagh-White, Adaku Uchendu, Ali Al-Lawati, Michiharu Yamashita, Dominik Macko, Ivan Srba, Robert Moro, Dongwon Lee | cs.CL | 2026-02-28 | |
| LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning | Yu Zhu, Kai Yang | cs.CL, cs.AI | 2026-02-27 | |
| Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains | Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, Cheng Ji | cs.CR, cs.AI | 2026-02-23 | |
| TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots | Fangrui Huang, Souhad Chbeir, Arpandeep Khatua, Sheng Wang, Sijun Tan, Kenan Ye, Lily Bailey, Merryn Daniel, Ryan Louie, Sanmi Koyejo, Ehsan Adeli | cs.CL, cs.AI, cs.CY | 2026-02-23 | |
| NIMMGen: Learning Neural-Integrated Mechanistic Digital Twins with LLMs | Zihan Guan, Rituparna Datta, Mengxuan Hu, Shunshun Liu, Aiying Zhang, Prasanna Balachandran, Sheng Li, Anil Vullikanti | cs.LG, cs.AI, cs.CL | 2026-02-20 | |
| What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data | Dimitri Staufer, Kirsten Morehouse | cs.HC, cs.AI, cs.CL, cs.CY | 2026-02-19 | |
| From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research Perspectives in Taiwan’s Humanities and Social Sciences | Yi-Chih Huang | cs.AI, cs.CL, cs.CY | 2026-02-19 | |
| Evaluating Collective Behaviour of Hundreds of LLM Agents | Richard Willis, Jianing Zhao, Yali Du, Joel Z. Leibo | cs.MA | 2026-02-18 | |
| AREG: Adversarial Resource Extraction Game for Evaluating Persuasion and Resistance in Large Language Models | Adib Sakhawat, Fardeen Sadab | cs.CL | 2026-02-18 | |
| LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models | Ahmed Khaled Khamis, Hesham Ali | cs.CL | 2026-02-17 | |
| AgriWorld: A World Tools Protocol Framework for Verifiable Agricultural Reasoning with Code-Executing LLM Agents | Zhixing Zhang, Jesen Zhang, Hao Liu, Qinhan Lv, Jing Yang, Kaitong Cai, Keze Wang | cs.AI | 2026-02-17 | |
| Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5 | Dongrui Liu, Yi Yu, Jie Zhang, Guanxu Chen, Qihao Lin, Hanxi Zhu, Lige Huang, Yijin Zhou, Peng Wang, Shuai Shao, Boxuan Zhang, Zicheng Liu, Jingwei Sun, Yu Li, Yuejin Xie, Jiaxuan Guo, Jia Xu, Chaochao Lu, Bowen Zhou, Xia Hu, Jing Shao | cs.AI, cs.CL, cs.CV, cs.CY, cs.LG | 2026-02-16 | |
| TruthStance: An Annotated Dataset of Conversations on Truth Social | Fathima Ameen, Danielle Brown, Manusha Malgareddy, Amanul Haque | cs.CL, cs.AI | 2026-02-16 | |
| An end-to-end agentic pipeline for smart contract translation and quality evaluation | Abhinav Goel, Chaitya Shah, Agostino Capponi, Alfio Gliozzo | cs.AI, cs.SE | 2026-02-14 | |
| Never say never: Exploring the effects of available knowledge on agent persuasiveness in controlled physiotherapy motivation dialogues | Stephan Vonschallen, Rahel Häusler, Theresa Schmiedel, Friederike Eyssel | cs.HC, cs.AI | 2026-02-13 | |
| WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models | Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haorong Ying, Yule Wang, Junbo Li, Jun Fang, Zhou Zhao | cs.CL | 2026-02-12 | |
| Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? | Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev | cs.SE, cs.AI | 2026-02-12 | |
| Do Large Language Models Adapt to Language Variation across Socioeconomic Status? | Elisa Bassignana, Mike Zhang, Dirk Hovy, Amanda Cercas Curry | cs.CL | 2026-02-12 | |
| RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation | Jinfang Wang, Jiajie Liu, Jianwei Wu, Ziqin Luo, Zhen Chen, Chunlei Li, Biao Han, Tao Deng, Yi Li, Shuanglong Li, Lin Liu | cs.AI | 2026-02-12 | |
| AIR: Improving Agent Safety through Incident Response | Zibo Xiao, Jun Sun, Junjie Chen | cs.AI | 2026-02-12 | |
| TRACER: Trajectory Risk Aggregation for Critical Episodes in Agentic Reasoning | Sina Tayebati, Divake Kumar, Nastaran Darabi, Davide Ettori, Ranganath Krishnan, Amit Ranjan Trivedi | cs.AI | 2026-02-11 | |
| Learning to Compose for Cross-domain Agentic Workflow Generation | Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen | cs.MA, cs.AI, cs.LG, cs.SE | 2026-02-11 | |
| AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models | Wentao Zhang, Mingxuan Zhao, Jincheng Gao, Jieshun You, Huaiyu Jia, Yilei Zhao, Bo An, Shuo Sun | q-fin.TR, cs.AI | 2026-02-10 | |
| Towards Poisoning Robustness Certification for Natural Language Generation | Mihnea Ghitu, Matthew Wicker | cs.LG | 2026-02-10 | |
| Large Language Models for Designing Participatory Budgeting Rules | Nguyen Thach, Xingchen Sha, Hau Chan | cs.LG | 2026-02-10 | |
| Accelerating Social Science Research via Agentic Hypothesization and Experimentation | Jishu Sen Gupta, Harini SI, Somesh Kumar Singh, Syed Mohamad Tawseeq, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah, Balaji Krishnamurthy | cs.AI, cs.CL | 2026-02-08 | |
| Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction | Pavithren V S Pakianathan, Rania Islambouli, Diogo Branco, Albrecht Schmidt, Tiago Guerreiro, Jan David Smeddinck | cs.HC, cs.AI | 2026-02-05 | |
| Generative Ontology: When Structured Knowledge Learns to Create | Benny Cheung | cs.AI, cs.CL | 2026-02-05 | |
| Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning | John Yan, Michael Yu, Yuqi Sun, Alexander Duffy, Tyler Marques, Matthew Lyle Olson | cs.LG, cs.AI | 2026-02-05 | |
| RA-QA: Towards Respiratory Audio-based Health Question Answering | Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo | cs.SD, cs.LG, eess.AS | 2026-02-04 | |
| ProxyWar: Dynamic Assessment of LLM Code Generation in Game Arenas | Wenjun Peng, Xinyu Wang, Qi Wu | cs.SE, cs.AI | 2026-02-04 | |
| A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model | Xiaolin Hu, Hang Yuan, Xinzhu Sang, Binbin Yan, Zhou Yu, Cong Huang, Kai Chen | cs.LG, cs.AI, cs.SD | 2026-02-04 | |
| From Crafting Text to Crafting Thought: Grounding AI Writing Support to Writing Center Pedagogy | Yijun Liu, John Gallagher, Sarah Sterman, Tal August | cs.HC | 2026-02-03 | |
| The Necessity of a Unified Framework for LLM-Based Agent Evaluation | Pengyu Zhu, Li Sun, Philip S. Yu, Sen Su | cs.AI | 2026-02-03 | |
| GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIs | Chengguang Gan, Yoshihiro Tsujii, Yunhao Liang, Tatsunori Mori, Shiwen Ni, Hiroki Itoh | cs.CL | 2026-02-02 | |
| Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles | Shaohan Wang, Benfeng Xu, Licheng Zhang, Mingxuan Du, Chiwei Zhu, Xiaorui Wang, Zhendong Mao, Yongdong Zhang | cs.CL | 2026-02-02 | |
| PedagoSense: A Pedology Grounded LLM System for Pedagogical Strategy Detection and Contextual Response Generation in Learning Dialogues | Shahem Sultan, Shahem Fadi, Yousef Melhim, Ibrahim Alsarraj, Besher Hassan | cs.CL | 2026-02-01 | |
| PaperBanana: Automating Academic Illustration for AI Scientists | Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon | cs.CL, cs.CV | 2026-01-30 | |
| WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents | Yao Zhang, Shijie Tang, Zeyu Li, Zhen Han, Volker Tresp | cs.AI | 2026-01-29 | |
| Embodied Task Planning via Graph-Informed Action Generation with Large Language Model | Xiang Li, Ning Yan, Masood Mortazavi | cs.CL | 2026-01-29 | |
| More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests | Haoming Huang, Pongchai Jaisri, Shota Shimizu, Lingfeng Chen, Sota Nakashima, Gema Rodríguez-Pérez | cs.SE, cs.AI, cs.HC | 2026-01-29 | |
| Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement | Kaiyuan Wu, Aditya Nagori, Rishikesan Kamaleswaran | cs.AI, cs.MA | 2026-01-28 | |
| A Dialectic Pipeline for Improving LLM Robustness | Sara Candussio | cs.CL, cs.MA | 2026-01-28 | |
| RobustExplain: Evaluating Robustness of LLM-Based Explanation Agents for Recommendation | Guilin Zhang, Kai Zhao, Jeffrey Friedman, Xu Chu | cs.IR, cs.AI, cs.LG | 2026-01-27 | |
| Assessing the Quality of Mental Health Support in LLM Responses through Multi-Attribute Human Evaluation | Abeer Badawi, Md Tahmid Rahman Laskar, Elahe Rahimi, Sheri Grach, Lindsay Bertrand, Lames Danok, Frank Rudzicz, Jimmy Huang, Elham Dolatabadi | cs.AI, cs.HC | 2026-01-26 | |
| LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval | Yunhan Li, Mingjie Xie, Gaoli Kang, Zihan Gong, Gengshen Wu, Min Yang | cs.IR, cs.CL | 2026-01-25 | |
| Status Hierarchies in Language Models | Emilio Barkett | cs.HC, cs.AI, cs.CL | 2026-01-24 | |
| The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents | Chen Chen, Kim Young Il, Yuan Yang, Wenhao Su, Yilin Zhang, Xueluan Gong, Qian Wang, Yongsen Zheng, Ziyao Liu, Kwok-Yan Lam | cs.CL | 2026-01-24 | |
| On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification | David Condrey | cs.CR, cs.AI, cs.HC | 2026-01-24 | |
| LLMs Got Rhythm? Hybrid Phonological Filtering for Greek Poetry Rhyme Detection and Generation | Stergios Chatzikyriakidis | cs.CL | 2026-01-14 | |
| Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models | Santiago Martínez Novoa, Nicolás Rozo Fajardo, Diego Alejandro González Vargas, Nicolás Bedoya Figueroa | cs.CL | 2026-01-14 | |
| Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity | Samhita Bollepally, Aurora Sloman-Moll, Takashi Yamauchi | cs.CL, cs.AI | 2026-01-14 | |
| OpenMic: A Multi-Agent-Based Stand-Up Comedy Generation System | Yuyang Wu, Hanzhong Cao, Jianhao Chen, Yufei Li | cs.AI | 2026-01-13 | |
| Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends | Jing Yang, Nils Feldhus, Salar Mohtaj, Leonhard Hennig, Qianli Wang, Eleni Metheniti, Sherzod Hakimov, Charlott Jakob, Veronika Solopova, Konrad Rieck, David Schlangen, Sebastian Möller, Vera Schmitt | cs.CL | 2026-01-12 | |
| PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling | Huachuan Qiu, Zhaoming Chen, Yuqian Chen, Yuan Xie, Yu Lu, Zhenzhong Lan | cs.CL | 2026-01-12 | |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) | Aja Khanal, Kaushik T. Ranade, Rishabh Agrawal, Kalyan S. Basu, Apurva Narayan | cs.MA | 2026-01-12 | |
| Can a Unimodal Language Agent Provide Preferences to Tune a Multimodal Vision-Language Model? | Sazia Tabasum Mim, Jack Morris, Manish Dhakal, Yanming Xiu, Maria Gorlatova, Yi Ding | cs.CL | 2026-01-10 | |
| STELP: Secure Transpilation and Execution of LLM-Generated Programs | Swapnil Shinde, Sahil Wadhwa, Andy Luo, Akshay Gupta, Mohammad Shahed Sorower | cs.SE, cs.AI | 2026-01-09 | |
| A Preliminary Agentic Framework for Matrix Deflation | Paimon Goulart, Evangelos E. Papalexakis | cs.LG | 2026-01-06 | |
| The Path Ahead for Agentic AI: Challenges and Opportunities | Nadia Sibai, Yara Ahmed, Serry Sibaee, Sawsan AlHalawani, Adel Ammar, Wadii Boulila | cs.AI | 2026-01-06 | |
| AgentMark: Utility-Preserving Behavioral Watermarking for Agents | Kaibo Huang, Jin Tan, Yukun Wei, Wanling Li, Zipei Zhang, Hui Tian, Zhongliang Yang, Linna Zhou | cs.CR, cs.AI | 2026-01-05 | |
| WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics | Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie | cs.SE, cs.AI | 2026-01-05 | |
| CaveAgent: Transforming LLMs into Stateful Runtime Operators | Maohao Ran, Zhenglin Wan, Cooper Lin, Yanting Zhang, Hongyu Xin, Hongwei Fan, Yibo Xu, Beier Luo, Yaxin Zhou, Wangbo Zhao, Lijie Yang, Lang Feng, Fuchao Yang, Jingxuan Wu, Yiqiao Huang, Chendong Ma, Dailing Jiang, Jianbo Deng, Sihui Han, Bo An, Yike Guo, Jun Song | cs.AI, cs.SE | 2026-01-04 | |
| MAMA-Memeia! Multi-Aspect Multi-Agent Collaboration for Depressive Symptoms Identification in Memes | Siddhant Agarwal, Adya Dhuler, Polly Ruhnke, Melvin Speisman, Md Shad Akhtar, Shweta Yadav | cs.CL | 2025-12-31 | |
| Do Large Language Models Know What They Are Capable Of? | Casey O. Barkan, Sid Black, Oliver Sourbut | cs.CL, cs.AI | 2025-12-31 | |
| The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models | Giuseppe Canale, Kashyap Thimmaraju | cs.CR, cs.AI, cs.CY, cs.HC | 2025-12-30 | |
| Web World Models | Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang | cs.AI, cs.CL, cs.CV | 2025-12-29 | |
| TCEval: Using Thermal Comfort to Assess Cognitive and Perceptual Abilities of AI | Jingming Li | cs.AI | 2025-12-29 | |
| AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents | Bhanu Prakash Vangala, Ali Adibifar, Tanu Malik, Ashish Gehani | cs.SE, cs.AI, cs.MA | 2025-12-26 | |
| Emotion Diffusion in Real and Simulated Social Graphs: Structural Limits of LLM-Based Social Simulation | Qiqi Qiang | cs.SI | 2025-12-24 | |
| NVIDIA Nemotron 3: Efficient and Open Intelligence | NVIDIA: Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Amy Shen, Anahita Bhiwandiwalla, Andrew Tao, Anjulie Agrusa, Ankur Verma, Ann Guan, Anubhav Mandarwal, Arham Mehta, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asma Kuriparambil Thekkumpate, Ayush Dattagupta, Banghua Zhu, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Besmira Nushi, Bilal Kartal, Bita Darvish Rouhani, Boris Ginsburg, Brandon Norick, Brandon Soubasis, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Carlo del Mundo, Chantal Hwang, Charles Wang, Cheng-Ping Hsieh, Chenghao Zhang, Chenhan Yu, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christopher Parisien, Collin Neale, Cyril Meurillon, Damon Mosk-Aoyama, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daria Gitman, Daria Levy, Darko Stosic, David Mosallanezhad, Deepak Narayanan, Dhruv Nathawani, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dong Ahn, Duncan Riach, Dusan Stosic, Edgar Minasyan, Edward Lin, Eileen Long, Eileen Peters Long, Elad Segal, Elena Lantz, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Tramel, Erick Galinkin, Erik Pounds, Evan Briones, Evelina Bakhturina, Evgeny Tsykunov, Faisal Ladhak, Fay Wang, Fei Jia, Felipe Soares, Feng Chen, Ferenc Galko, Frank Sun, Frankie Siino, Gal Hubara Agam, Ganesh Ajjanagadde, Gantavya Bhatt, Gargi Prasad, George Armstrong, Gerald Shen, Gorkem Batmaz, Grigor Nalbandyan, Haifeng Qian, Harsh Sharma, Hayley Ross, Helen Ngo, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huizi Mao, Huy C Nguyen, Huy Q Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igor Gitman, Ilya Loshchilov, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jian Zhang, Jiaqi Zeng, Jie Lou, Jimmy Zhang, Jinhang Choi, Jining Huang, Joey Conway, Joey Guman, John Kamalu, Johnny Greco, Jonathan Cohen, Joseph Jennings, Joyjit Daw, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kai Xu, Kan Zhu, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kevin Shih, Kezhi Kong, Khushi Bhardwaj, Kirthi Shankar, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Lawrence McAfee, Laya Sleiman, Leon Derczynski, Li Ding, Lizzie Wei, Lucas Liebenwein, Luis Vega, Maanu Grover, Maarten Van Segbroeck, Maer Rodrigues de Melo, Mahdi Nazemi, Makesh Narsimhan Sreedhar, Manoj Kilaru, Maor Ashkenazi, Marc Romeijn, Marcin Chochowski, Mark Cai, Markus Kliegl, Maryam Moosaei, Matt Kulka, Matvei Novikov, Mehrzad Samadi, Melissa Corpuz, Mengru Wang, Meredith Price, Michael Andersch, Michael Boone, Michael Evans, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Najeeb Nabwani, Natalie Hereth, Nave Assaf, Negar Habibi, Neta Zmora, Netanel Haber, Nicola Sessions, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nishant Sharma, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Puny, Oren Tropp, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Pawel Morkisz, Peter Dykas, Peter Jin, Pinky Xu, Piotr Januszewski, Pranav Prashant Thombre, Prasoon Varshney, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Rabeeh Karimi Mahabadi, Rachit Garg, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Rich Harang, Rick Izzo, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Hesse, Roger Waleffe, Rohit Watve, Roi Koren, Ruoxi Zhang, Russell Hewett, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Sadegh Mahdavi, Sahil Modi, Samuel Kriman, Sangkug Lim, Sanjay Kariyappa, Sanjeev Satheesh, Saori Kaji, Satish Pasumarthi, Saurav Muralidharan, Sean Narentharen, Sean Narenthiran, Seonmyeong Bak, Sergey Kashirsky, Seth Poulos, Shahar Mor, Shanmugam Ramasamy, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Shelby Thomas, Shiqing Fan, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuoyang Ding, Siddharth Singh, Simeng Sun, Smita Ithape, Somshubra Majumdar, Soumye Singhal, Stas Sergienko, Stefania Alborghetti, Stephen Ge, Sugam Dipak Devare, Sumeet Kumar Barua, Suseella Panguluri, Suyog Gupta, Sweta Priyadarshi, Syeda Nahida Akter, Tan Bui, Teodor-Dumitru Ene, Terry Kong, Thanh Do, Tijmen Blankevoort, Tim Moon, Tom Balough, Tomer Asida, Tomer Bar Natan, Tomer Ronen, Tugrul Konuk, Twinkle Vashishth, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vinay Rao, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wenfei Zhou, Will Jennings, William Zhang, Wojciech Prazuch, Xiaowei Ren, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Yigong Qin, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Subara, Yoshi Suhara, Yubo Gao, Zach Moshe, Zhen Dong, Zhongbo Zhu, Zihan Liu, Zijia Chen, Zijie Yan | cs.CL, cs.AI, cs.LG | 2025-12-24 | |
| AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent | Haipeng Luo, Huawen Feng, Qingfeng Sun, Can Xu, Kai Zheng, Yufei Wang, Tao Yang, Han Hu, Yansong Tang, Di Wang | cs.AI, cs.CL, cs.LG | 2025-12-23 | |
| SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention | Alexandros Christoforos, Chadbourne Davis | cs.CL, cs.AI | 2025-12-23 | |
| MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts | Alexandros Christoforos, Chadbourne Davis | cs.CL | 2025-12-23 | |
| Distilling to Hybrid Attention Models via KL-Guided Layer Selection | Yanhong Li, Songlin Yang, Shawn Tan, Mayank Mishra, Rameswar Panda, Jiawei Zhou, Yoon Kim | cs.CL, cs.AI | 2025-12-23 | |
| LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators | Mateusz Lango, Ondřej Dušek | cs.CL, cs.AI | 2025-12-20 | |
| ShareChat: A Dataset of Chatbot Conversations in the Wild | Yueru Yan, Tuc Nguyen, Bo Su, Melissa Lieffers, Thai Le | cs.CL, cs.AI, cs.HC | 2025-12-19 | |
| Polypersona: Persona-Grounded LLM for Synthetic Survey Responses | Tejaswani Dash, Dinesh Karri, Anudeep Vurity, Gautam Datla, Tazeem Ahmad, Saima Rafi, Rohith Tangudu | cs.CL, cs.AI | 2025-12-16 | |
| Evaluation of AI Ethics Tools in Language Models: A Developers’ Perspective Case Study | Jhessica Silva, Diego A. B. Moreira, Gabriel O. dos Santos, Alef Ferreira, Helena Maia, Sandra Avila, Helio Pedrini | cs.CY, cs.AI, cs.CL | 2025-12-16 | |
| Workflow is All You Need: Escaping the “Statistical Smoothing Trap” via High-Entropy Information Foraging and Adversarial Pacing | Zhongjie Jiang | cs.CL, cs.AI, cs.CY, q-fin.GN | 2025-12-10 | |
| Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making | Qingyuan Zhang, Yuxi Wang, Cancan Hua, Yulin Huang, Ning Lyu | cs.CL | 2025-12-10 | |
| The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing? | Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee | cs.LG, cs.AI | 2025-12-04 | |
| Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning | Wenlong Tang | cs.LG, cs.AI | 2025-11-28 | |
| Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach | Shuqi Liu, Han Wu, Guanzhi Deng, Jianshu Chen, Xiaoyang Wang, Linqi Song | cs.CL, cs.AI | 2025-11-28 | |
| Adaptive LLM Agents: Toward Personalized Empathetic Care | Priyanka Singh, Sebastian Von Mammen | cs.HC | 2025-11-25 | |
| Deep Research: A Systematic Survey | Zhengliang Shi, Yiqun Chen, Haitao Li, Weiwei Sun, Shiyu Ni, Yougang Lyu, Run-Ze Fan, Bowen Jin, Yixuan Weng, Minjun Zhu, Qiujie Xie, Xinyu Guo, Qu Yang, Jiayi Wu, Jujia Zhao, Xiaqiang Tang, Xinbei Ma, Cunxiang Wang, Jiaxin Mao, Qingyao Ai, Jen-Tse Huang, Wenxuan Wang, Yue Zhang, Yiming Yang, Zhaopeng Tu, Zhaochun Ren | cs.CL, cs.AI, cs.IR | 2025-11-24 | |
| MindEval: Benchmarking Language Models on Multi-turn Mental Health Support | José Pombal, Maya D’Eon, Nuno M. Guerreiro, Pedro Henrique Martins, António Farinhas, Ricardo Rei | cs.CL, cs.AI | 2025-11-23 | |
| NAMeGEn: Creative Name Generation via A Novel Agent-based Multiple Personalized Goal Enhancement Framework | Shanlin Zhou, Xinpeng Wang, Jianxun Lian, Zhenghao Liu, Laks V. S. Lakshmanan, Xiaoyuan Yi, Yongtao Hao | cs.CL, cs.AI, cs.IR, cs.MA, cs.NE | 2025-11-19 | |
| AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR | Gabrial Zencha Ashungafac, Mardhiyah Sanni, Busayo Awobade, Alex Gichamba, Tobi Olatunji | cs.CL | 2025-11-18 | |
| Generalist Foundation Models Are Not Clinical Enough for Hospital Operations | Lavender Y. Jiang, Angelica Chen, Xu Han, Xujin Chris Liu, Radhika Dua, Kevin Eaton, Frederick Wolff, Robert Steele, Jeff Zhang, Anton Alyakin, Qingkai Pan, Yanbing Chen, Karl L. Sangwon, Daniel A. Alber, Jaden Stryker, Jin Vivian Lee, Yindalon Aphinyanaphongs, Kyunghyun Cho, Eric Karl Oermann | cs.CL, cs.AI, cs.LG | 2025-11-17 | |
| Prompt-Based Value Steering of Large Language Models | Giulio Antonio Abbo, Tony Belpaeme | cs.CL, cs.AI | 2025-11-14 | |
| Self-Correcting Large Language Models: Generation vs. Multiple Choice | Hossein A. Rahmani, Satyapriya Krishna, Xi Wang, Mohammadmehdi Naghiaei, Emine Yilmaz | cs.CL, cs.AI | 2025-11-12 | |
| HalluClean: A Unified Framework to Combat Hallucinations in LLMs | Yaxin Zhao, Yu Zhang | cs.CL | 2025-11-12 | |
| Simulating Students with Large Language Models: A Review of Architecture, Mechanisms, and Role Modelling in Education with Generative AI | Luis Marquez-Carpintero, Alberto Lopez-Sellers, Miguel Cazorla | cs.CY, cs.AI, cs.CL | 2025-11-08 | |
| Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance | Mashrur Rahman, Mantaqa Abedin, Monowar Zamil Abir, Faizul Islam Ansari, Adib Reza, Farig Yousuf Sadeque, Niloy Farhan | cs.IR, cs.CL | 2025-11-06 | |
| Multi-Agent Collaborative Framework For Math Problem Generation | Kia Karbasi, Kevin Hong, Mohammad Amin Samadi, Gregory Pottie | cs.MA, cs.CL, cs.HC | 2025-11-06 | |
| Bayesian Evaluation of Large Language Model Behavior | Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth | cs.CL, cs.LG, stat.AP, stat.ML | 2025-11-04 | |
| Hybrid Quantum Transformer for Language Generation | Desheng Kong, Xiangshuo Cui, Jiaying Jin, Jing Xu, Donglin Wang | cs.CL, cs.AI, quant-ph | 2025-11-02 | |
| Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations | Birat Poudel, Satyam Ghimire, Er. Prakash Chandra Prasad | cs.CL | 2025-11-01 | |
| Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, Natasha Jaques | cs.CL, cs.AI | 2025-10-31 | |
| CATArena: Evaluation of LLM Agents through Iterative Tournament Competitions | Lingyue Fu, Xin Ding, Yaoming Zhu, Shao Zhang, Lin Qiu, Weiwen Liu, Weinan Zhang, Xuezhi Cao, Xunliang Cai, Jiaxin Ding, Yong Yu | cs.AI, cs.CL | 2025-10-30 | |
| Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations | Syed Zohaib Hassan, Pål Halvorsen, Miriam S. Johnson, Pierre Lison | cs.CL | 2025-10-28 | |
| Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning) | Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß | cs.CL | 2025-10-23 | |
| Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety | Vamshi Krishna Bonagiri, Ponnurangam Kumaraguru, Khanh Nguyen, Benjamin Plaut | cs.CL | 2025-10-18 | |
| Efficient Seq2seq Coreference Resolution Using Entity Representations | Matt Grenander, Shay B. Cohen, Mark Steedman | cs.CL | 2025-10-16 | |
| Generating Fair Consensus Statements with Social Choice on Token-Level MDPs | Carter Blair, Kate Larson | cs.AI, cs.CL, cs.GT | 2025-10-15 | |
| MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation | Jiin Park, Misuk Kim | cs.IR, cs.AI | 2025-10-15 | |
| CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation | Yee Man Choi, Xuehang Guo, Yi R. Fung, Qingyun Wang | cs.DL | 2025-10-15 | |
| GOAT: A Training Framework for Goal-Oriented Agent with Tools | Hyunji Min, Sangwon Jung, Junyoung Sung, Dosung Lee, Leekyeung Han, Paul Hongsuck Seo | cs.AI | 2025-10-14 | |
| ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory | Yunzhong Xiao, Yangmin Li, Hewei Wang, Yunlong Tang, Zora Zhiruo Wang | cs.CL | 2025-10-08 | |
| What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems | Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel Fernando Garcia Contreras, Koichiro Yoshino | cs.CL | 2025-08-06 | |
| Investigating Hallucination in Conversations for Low Resource Languages | Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha | cs.CL | 2025-07-30 | |
| Teaching Language Models To Gather Information Proactively | Tenghao Huang, Sihao Chen, Muhao Chen, Jonathan May, Longqi Yang, Mengting Wan, Pei Zhou | cs.AI, cs.CL | 2025-07-28 | |
| AI-Driven Generation of Old English: A Framework for Low-Resource Languages | Rodrigo Gabriel Salazar Alva, Matías Nuñez, Cristian López, Javier Martín Arista | cs.CL, cs.AI | 2025-07-27 | |
| CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards | Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li | cs.CL | 2025-07-23 | |
| [HF] DialogueForge: LLM Simulation of Human-Chatbot Dialogue | Ruizhe Zhu, Hao Zhu, Yaxuan Li, Syang Zhou, Shijing Cai, Malgorzata Lazuka, Elliott Ash | | 2025-07-21 | 1 |
| On the Semantics of Large Language Models | Martin Schuele | cs.CL, cs.AI | 2025-07-07 | |
| SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge | Yuxiang Mei, Yuang Zheng, Dongxing Xu, Yanhua Long | cs.CL, eess.AS | 2025-07-04 | |
| The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | Reza Yousefi Maragheh, Yashar Deldjoo | cs.IR | 2025-07-02 | |
| Decision-Oriented Text Evaluation | Yu-Shiang Huang, Chuan-Ju Wang, Chung-Chi Chen | cs.CL | 2025-07-02 | |