[*] = found in both arXiv and HF search; [HF] = found via HF semantic search.
Written on 2026-06-06.
| title | authors | categories | display date | upvotes |
|---|---|---|---|---|
| An Infectious Disease Spread Simulation Based on Large Language Model Decision Making | Yonchanok Khaokaew, Ruochen Kong, Andreas Zufle, Hao Xue, Taylor Anderson, Chandini Raina MacIntyre, Matthew Scotch, Flora D. Salim, David J Heslop | cs.AI | 2026-06-04 | |
| NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models | Andrey Fomenko, Maksim Kryzhanovskiy, Svetlana Glazyrina, Roman Ischenko | cs.CL | 2026-06-04 | |
| Interpreting Style Representations via Style-Eliciting Prompts | Junghwan Kim, David Jurgens | cs.CL | 2026-06-04 | |
| From Attack Simulation to SIEM Rule: Deterministic Detection-as-Code Synthesis with Probe-Level Traceability | Alexandre Cristovão Maiorano | cs.CR, cs.AI | 2026-06-03 | |
| BEATS: Bootstrapping E-commerce Attribute Taxonomies for Search through Iterative Human-AI Collaboration | Yung-Yu Shih, Shang-Yu Su, Tzu-I Ho, Dongzhe Wang, Yun-Nung Chen | cs.IR, cs.CL | 2026-06-03 | |
| VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark | Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari, Yasamin Medghalchi, Ilker Hacihaliloglu, Mesrob Ohannessian, Lele Wang, Giuseppe Carenini | cs.AI, cs.CL, cs.CV, cs.LG | 2026-06-02 | |
| Building Reliable Long-Form Generation via Hallucination Rejection Sampling | Lin Li, Georgia Channing, Suhaas M Bhat, Gabriel Davis Jones, Yarin Gal | cs.CL, cs.AI, cs.LG | 2026-06-02 | |
| Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks | Malia Barker, Bishal Lakha, Edoardo Serra, Francesco Gullo | cs.CR, cs.AI | 2026-06-02 | |
| StepFinder: A Temporal Semantic Framework for Failure Attribution in Multi-Agent Systems | Taiyu Zhu, Yifan Wu, Weilin Jin, Ying Li, Gang Huang | cs.AI | 2026-06-02 | |
| RUBAS: Rubric-Based Reinforcement Learning for Agent Safety | Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang | cs.LG, cs.AI, cs.CR | 2026-06-02 | |
| AI as a Tool for Simulation-Based Experiments in Literary Studies | Matthew Wilkens | cs.CL | 2026-06-01 | |
| Argument Collapse: LLMs Flatten Long-Form Public Debate | Yekyung Kim, Yapei Chang, Chau Minh Pham, Mohit Iyyer | cs.CL, cs.AI | 2026-06-01 | |
| When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval | Zhicheng Zhang, Jiwei Tang, Kuicai Dong, Xiaopeng Li, Jieming Zhu, Jingyu Li, Qianhui Zhu, Fengyuan Lu, Wang Jiaheng, Gang Wang, Hai-Tao Zheng, Zhaocheng Du | cs.LG | 2026-05-31 | |
| Linguistics-Aware Non-Distortionary LLM Watermarking | Shinwoo Park, Hyejin Park, Hyeseon An, Yo-Sub Han | cs.CL, cs.AI | 2026-05-30 | |
| Agentic Authoring of Interactive Multiview Visualizations in Genomics | Astrid van den Brandt, Kiroong Choe, Sehi L’Yi, Devin Lange, Nils Gehlenborg | cs.HC, cs.AI | 2026-05-29 | |
| Coupling Language Models with Physics-based Simulation for Synthesis of Inorganic Materials | Edward W. Staley, Tom Arbaugh, Michael Pekala, Alexander New, Christopher D. Stiles, Nam Q. Le, Gregory Bassen, Wyatt Bunstine, Tyrel McQueen | cs.AI, cond-mat.mtrl-sci | 2026-05-29 | |
| Effects of Varying LLM Access on Essay Writing Behavior | Julia Christenson, Karin de Langis, Shirley Anugrah Hayati, Dongyeop Kang | cs.CL, cs.AI, cs.HC | 2026-05-29 | |
| Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models | Pedro Dal Bianco, Jean Paul Nunes Reinhold, Oscar Stanchi, Facundo Quiroga, Franco Ronchetti, Ulisses Brisolara Corrêa | cs.CL, cs.AI | 2026-05-29 | |
| Generating Reports or Repeating Templates? Measuring and Mitigating Template Collapse in 3D CT Report Generation | Tom Maye-Lasserre, Yitong Li, Bailiang Jian, Morteza Ghahremani, Benedikt Wiestler, Christian Wachinger | cs.CV, cs.AI, cs.CL | 2026-05-29 | |
| Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization | Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin, Wenhao Li | cs.MA, cs.AI | 2026-05-28 | |
| Projectional Decoding: Towards Semantic-Aware LLM Generation | Boqi Chen, José Antonio Hernández López, Aren A. Babikian | cs.SE, cs.AI | 2026-05-28 | |
| Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs | Vinay Samuel, Yapei Chang, Mohit Iyyer | cs.CL | 2026-05-28 | |
| Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents | Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù, Leila Kosseim | cs.CL, cs.AI, cs.LG | 2026-05-28 | |
| Inferring Code Correctness from Specification | Tambon Florian, Papadakis Mike | cs.SE, cs.AI | 2026-05-28 | |
| LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs | Jung Hyun Lee, June Yong Yang, Jungwook Choi, Eunho Yang | cs.AI | 2026-05-28 | |
| Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models | Arturo Valdivia, Paolo Burelli | cs.AI | 2026-05-28 | |
| Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback | Evgeny S. Saveliev, Samuel Holt, Nabeel Seedat, David L. Bentley, Jim Weatherall, Mihaela van der Schaar | cs.LG, cs.AI | 2026-05-27 | |
| Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents | Jihyeong Park, Ingeol Baek, Jeonghyun Park, Hwanhee Lee | cs.CL | 2026-05-27 | |
| SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising | Akshaj Murhekar, Abhijit Mishra | cs.LG | 2026-05-27 | |
| Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases | Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee | cs.AI, cs.CL, cs.LG | 2026-05-26 | |
| Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering | Hunter McNichols, Alexander Scarlatos, Mihai Dascalu, Danielle McNamara, Andrew Lan | cs.AI, cs.CL | 2026-05-26 | |
| Generating Robust Portfolios of Optimization Models using Large Language Models | Eleni Straitouri, Cheol Woo Kim, Milind Tambe | cs.AI | 2026-05-26 | |
| Accountable Human-AI Deliberation with LLMs: Scaling Collective Intelligence through Symbiotic Scaffolding | Wajdi Zaghouani | cs.CL | 2026-05-26 | |
| LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation | Jiabei Xiao, Yizhou Wang, Chen Tang, Pengze Li, Wanli Ouyang, Shixiang Tang | cs.AI | 2026-05-25 | |
| QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability | Bo Zou, Chao Xu | cs.CL, cs.AI, cs.LG | 2026-05-25 | |
| AutoSG: LLM-Driven Solver Generation Solely from Task Prompts for Expensive Optimization | Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang | cs.CL, cs.AI | 2026-05-25 | |
| Guess the Unified Model: How Much Can We Recover from Generated Images? | Jasin Cekinmez, Ryo Mitsuhashi, Addison J. Wu, Yida Yin | cs.CV, cs.AI | 2026-05-24 | |
| DTO: a Differentiable Training Objective for Effective Counterfactual Story Rewriting | Amelia Girard, Massimo Piccardi | cs.CL | 2026-05-24 | |
| TIGER: Text-Informed Generalized Enzyme-Reaction Retrieval | Yuhang Zhang, Keyan Ding, Peilin Chen, Han Liu, Can Lin, Ruixi Chen, Shiqi Wang, Qi Song | cs.AI, q-bio.BM | 2026-05-23 | |
| Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion | Jaihoon Kim, Taehoon Yoon, Prin Phunyaphibarn, Seungjun Kim, Morteza Mardani, Minhyuk Sung | cs.LG | 2026-05-22 | |
| Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing | Shugang Hao, Lingjie Duan | cs.LG, cs.AI | 2026-05-22 | |
| Graph Alignment Topology as an Inductive Bias for Grounding Detection | Paul Landes, Pranav Herur, Adam Cross, Jimeng Sun | cs.CL, cs.AI | 2026-05-21 | |
| Self-Evolving Multi-Agent Systems via Decentralized Memory | Guangya Hao, Yunbo Long, Zhuokai Zhao | cs.MA | 2026-05-21 | |
| A Multi-Source Framework for Relational Validation of Large Language Models Using Expert-Curated Encyclopedic Sources | Moses Boudourides | cs.SI | 2026-05-21 | |
| Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation | Md. Asaduzzaman Shuvo, Mahedi Hasan, Md. Tashin Parvez, Azizul Haque Noman, Md. Shafayet Hossain Ovi | cs.CL | 2026-05-21 | |
| SWE-Mutation: Can LLMs Generate Reliable Test Suites in Software Engineering? | Yuxuan Sun, Yuze Zhao, Yufeng Wang, Yao Du, Zhiyuan Ma, Jinbo Wang, Mengdi Zhang, Kai Zhang, Zhenya Huang | cs.SE, cs.AI | 2026-05-21 | |
| [HF] NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment | Wenqing Wu, Yi Zhao, Yuzhuo Wang, Siyou Li, Juexi Shao, Yunfei Long, Chengzhi Zhang | | 2026-04-13 | |
| LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics | Farhan Ahmed, Yuya Jeremy Ong, Chad DeLuca | cs.AI, cs.CL, cs.IT | 2026-03-26 | |
| MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination | Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie Hu, Yu Qin, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang | cs.CL | 2026-03-25 | |
| Evaluating LLM-Based Test Generation Under Software Evolution | Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar | cs.SE, cs.AI | 2026-03-24 | |
| Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models | Nasser A Alsadhan | cs.CL, cs.LG | 2026-03-24 | |
| Parametric Knowledge and Retrieval Behavior in RAG Fine-Tuning for Electronic Design Automation | Julian Oestreich, Maximilian Bley, Frank Binder, Lydia Müller, Maksym Sydorenko, André Alcalde | cs.CL, cs.AI, cs.CE | 2026-03-24 | |
| LLM-guided headline rewriting for clickability enhancement without clickbait | Yehudit Aperstein, Linoy Halifa, Sagiv Bar, Alexander Apartsin | cs.CL, cs.AI | 2026-03-23 | |
| Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch | Stella Eva Tsiapali, Cong-Thanh Do, Kate Knill | cs.CL | 2026-03-23 | |
| Optimizing Multi-Agent Weather Captioning via Text Gradient Descent: A Training-Free Approach with Consensus-Aware Gradient Fusion | Shixu Liu | cs.CL | 2026-03-23 | |
| LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search | Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao | cs.SE, cs.AI | 2026-03-23 | |
| SafePilot: A Framework for Assuring LLM-enabled Cyber-Physical Systems | Weizhe Xu, Mengyu Liu, Fanxin Kong | cs.RO, cs.AI | 2026-03-23 | |
| MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages | Anri Lombard, Simbarashe Mawere, Temi Aina, Ethan Wolff, Sbonelo Gumede, Elan Novick, Francois Meyer, Jan Buys | cs.CL | 2026-03-21 | |
| D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding | Jonathan Lys, Vincent Gripon, Bastien Pasdeloup, Axel Marmoret, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene | cs.AI, cs.LG | 2026-03-19 | |
| Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity | Qiawen Ella Liu, Marina Dubova, Henry Conklin, Takumi Harada, Thomas L. Griffiths | cs.AI, cs.CL | 2026-03-19 | |
| Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM | Zizhao Hu, Mohammad Rostami, Jesse Thomason | cs.AI | 2026-03-19 | |
| The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices | Esteban Garces Arias, Nurzhan Sapargali, Christian Heumann, Matthias Aßenmacher | cs.CL, cs.LG, stat.ML | 2026-03-19 | |
| How LLMs Distort Our Written Language | Marwa Abdulhai, Isadora White, Yanming Wan, Ibrahim Qureshi, Joel Leibo, Max Kleiman-Weiner, Natasha Jaques | cs.CL, cs.AI | 2026-03-18 | |
| VeriGrey: Greybox Agent Validation | Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury | cs.AI | 2026-03-18 | |
| Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions | Madhav S. Baidya, S. S. Baidya, Chirag Chawla | cs.CL, cs.AI | 2026-03-18 | |
| Prompt Engineering for Scale Development in Generative Psychometrics | Lara Lee Russell-Lasalandra, Hudson Golino | cs.AI, cs.CL, cs.HC | 2026-03-16 | |
| LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation | AI Scientists, Xinyi Lin, Danqing Yin, Ying Guo | cond-mat.mtrl-sci, cs.AI | 2026-03-16 | |
| GNNVerifier: Graph-based Verifier for LLM Task Planning | Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi | cs.LG | 2026-03-16 | |
| CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language | Junhang Cheng, Fang Liu, Jia Li, Chengru Wu, Nanxiang Jiang, Li Zhang | cs.SE, cs.AI, cs.CL | 2026-03-15 | |
| Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows | Aditya Sharan, Sriram Hebbale, Dhruv Kumar | cs.CL, cs.AI | 2026-03-15 | |
| Creative Convergence or Imitation? Genre-Specific Homogeneity in LLM-Generated Chinese Literature | Yuanchi Ma, Kaize Shi, Hui He, Zhihua Zhang, Zhongxiang Lei, Ziliang Qiu, Renfen Hu, Jiamou Liu | cs.CL | 2026-03-15 | |
| AI Model Modulation with Logits Redistribution | Zihan Wang, Zhongkui Ma, Xinguo Feng, Zhiyang Mei, Ethan Ma, Derui Wang, Minhui Xue, Guangdong Bai | cs.AI | 2026-03-13 | |
| Experimental evidence of progressive ChatGPT models self-convergence | Konstantinos F. Xylogiannopoulos, Petros Xanthopoulos, Panagiotis Karampelas, Georgios A. Bakamitsos | cs.CL, cs.AI | 2026-03-13 | |
| RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection | He Zhu, Yanshu Li, Wen Liu, Haitian Yang | cs.CL, cs.CR | 2026-03-13 | |
| FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning | Yijun Pan, Weikang Qiu, Qiyao Ma, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying | cs.LG | 2026-03-12 | |
| In the LLM era, Word Sense Induction remains unsolved | Anna Mosolova, Marie Candito, Carlos Ramisch | cs.CL | 2026-03-12 | |
| KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation | Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang | cs.LG, cs.AI, cs.CR | 2026-03-12 | |
| Is this Idea Novel? An Automated Benchmark for Judgment of Research Ideas | Tim Schopf, Michael Färber | cs.CL, cs.AI | 2026-03-11 | |
| Writing literature reviews with AI: principles, hurdles and some lessons learned | Saadi Lahlou, Annabelle Gouttebroze, Atrina Oraee, Julian Madera | cs.CY, cs.AI, cs.HC | 2026-03-08 | |
| AgriPath: A Systematic Exploration of Architectural Trade-offs for Crop Disease Classification | Hamza Mooraj, George Pantazopoulos, Alessandro Suglia | cs.CV, cs.LG | 2026-03-08 | |
| Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement | Yakov Pyotr Shkolnikov | cs.CV, cs.AI | 2026-03-06 | |
| Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models | Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish | cs.CL, cs.AI | 2026-03-05 | |
| Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference | Patrick Wilhelm, Thorsten Wittkopp, Odej Kao | cs.CL | 2026-03-04 | |
| Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation | Chengkai Wang, Baisong Liu | cs.AI | 2026-03-03 | |
| SIGMA: A Semantic-Grounded Instruction-Driven Generative Multi-Task Recommender at AliExpress | Yang Yu, Lei Kou, Huaikuan Yi, Bin Chen, Yayu Cao, Lei Shen, Chao Zhang, Bing Wang, Xiaoyi Zeng | cs.IR, cs.LG | 2026-02-26 | |
| Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking | Haodong Chen, Shengyao Zhuang, Zheng Yao, Guido Zuccon, Teerapong Leelanupab | cs.IR | 2026-02-26 | |
| Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs | Jiří Milička, Hana Bednářová | cs.CL, cs.AI | 2026-02-25 | |
| When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models | Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang | cs.HC, cs.AI, cs.CL | 2026-02-25 | |
| Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs | Heng Wang, Changxing Wu | cs.CL | 2026-02-25 | |
| Sparsity Induction for Accurate Post-Training Pruning of Large Language Models | Minhao Jiang, Zhikai Li, Xuewen Liu, Jing Zhang, Mengjuan Chen, Qingyi Gu | cs.CL, cs.AI | 2026-02-25 | |
| The Design Space of Tri-Modal Masked Diffusion Models | Louis Bethune, Victor Turrisi, Bruno Kacper Mlodozeniec, Pau Rodriguez Lopez, Lokesh Boominathan, Nikhil Bhendawade, Amitis Shidani, Joris Pelemans, Theo X. Olausson, Devon Hjelm, Paul Dixon, Joao Monteiro, Pierre Ablin, Vishnu Banna, Arno Blaas, Nick Henderson, Kari Noriy, Dan Busbridge, Josh Susskind, Marco Cuturi, Irina Belousova, Luca Zappella, Russ Webb, Jason Ramapuram | cs.LG | 2026-02-25 | |
| PaperTrail: A Claim-Evidence Interface for Grounding Provenance in LLM-based Scholarly Q&A | Anna Martin-Boyle, Cara A. C. Leckey, Martha C. Brown, Harmanpreet Kaur | cs.HC, cs.CL | 2026-02-24 | |
| HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG | Yuqi Huang, Ning Liao, Kai Yang, Anning Hu, Shengchao Hu, Xiaoxing Wang, Junchi Yan | cs.AI | 2026-02-24 | |
| Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content | Simon Münker, Nils Schwager, Kai Kugler, Michael Heseltine, Achim Rettinger | cs.CL, cs.AI | 2026-02-22 | |
| IDLM: Inverse-distilled Diffusion Language Models | David Li, Nikita Gushchin, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin | cs.LG, cs.AI | 2026-02-22 | |
| Feedback-based Automated Verification in Vibe Coding of CAS Adaptation Built on Constraint Logic | Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka | cs.AI | 2026-02-20 | |
| TFL: Targeted Bit-Flip Attack on Large Language Model | Jingkai Guo, Chaitali Chakrabarti, Deliang Fan | cs.CR, cs.CL, cs.LG | 2026-02-19 | |
| MusicSem: A Semantically Rich Language–Audio Dataset of Natural Music Descriptions | Rebecca Salganik, Teng Tu, Fei-Yueh Chen, Xiaohao Liu, Keifeng Lu, Ethan Luvisia, Zhiyao Duan, Guillaume Salha-Galvan, Anson Kahng, Yunshan Ma, Jian Kang | cs.MM, cs.SD, eess.AS | 2026-02-19 | |
| How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment | Hang Li, Kaiqi Yang, Xianxuan Long, Fedor Filippov, Yucheng Chu, Yasemin Copur-Gencturk, Peng He, Cory Miller, Namsoo Shin, Joseph Krajcik, Hui Liu, Jiliang Tang | cs.AI | 2026-02-17 | |
| FeDecider: An LLM-Based Framework for Federated Cross-Domain Recommendation | Xinrui He, Ting-Wei Li, Tianxin Wei, Xuying Ning, Xinyu He, Wenxuan Bao, Hanghang Tong, Jingrui He | cs.IR | 2026-02-17 | |
| LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models | Ahmed Khaled Khamis, Hesham Ali | cs.CL | 2026-02-17 | |
| CCiV: A Benchmark for Structure, Rhythm and Quality in LLM-Generated Chinese *Ci* Poetry | Shangqing Zhao, Yupei Ren, Yuhao Zhou, Xiaopeng Bai, Man Lan | cs.CL | 2026-02-15 | |
| EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition | Xiao Wang, Xingxing Xiong, Jinfeng Gao, Xufeng Lou, Bo Jiang, Si-bao Chen, Yaowei Wang, Yonghong Tian | cs.CV, cs.AI, cs.NE | 2026-02-13 | |
| WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models | Yangzhuo Li, Shengpeng Ji, Yifu Chen, Tianle Liang, Haorong Ying, Yule Wang, Junbo Li, Jun Fang, Zhou Zhao | cs.CL | 2026-02-12 | |
| Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards | Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri | cs.LG | 2026-02-12 | |
| Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? | Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev | cs.SE, cs.AI | 2026-02-12 | |
| Do Large Language Models Adapt to Language Variation across Socioeconomic Status? | Elisa Bassignana, Mike Zhang, Dirk Hovy, Amanda Cercas Curry | cs.CL | 2026-02-12 | |
| Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making | Shreya Chappidi, Jatinder Singh, Andra V. Krauze | cs.HC, cs.AI | 2026-02-12 | |
| Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning | Haoran Dang, Cuiling Lan, Hai Wan, Xibin Zhao, Yan Lu | cs.LG | 2026-02-12 | |
| Same Feedback, Different Source: How AI vs. Human Feedback Shapes Learner Engagement | Caitlin Morris, Pattie Maes | cs.HC | 2026-02-11 | |
| Just on Time: Token-Level Early Stopping for Diffusion Language Models | Zahar Kohut, Severyn Shykula, Dmytro Khamula, Mykola Vysotskyi, Taras Rumezhak, Volodymyr Karpiv | cs.LG, cs.CL | 2026-02-11 | |
| Learning to Compose for Cross-domain Agentic Workflow Generation | Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen | cs.MA, cs.AI, cs.LG, cs.SE | 2026-02-11 | |
| Beyond Confidence: The Rhythms of Reasoning in Generative Models | Deyuan Liu, Zecheng Wang, Zhanyue Qin, Zhiying Tu, Dianhui Chu, Dianbo Sui | cs.CL, cs.AI | 2026-02-11 | |
| Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity | Haihui Pan, Yuzhong Hong, Shaoke Lv, Junwei Bao, Hongfei Jiang, Yang Song | cs.CL, cs.LG | 2026-02-11 | |
| Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets | Bo Xue, Yunchong Song, Fanghao Shao, Xuekai Zhu, Lin Chen, Luoyi Fu, Xinbing Wang, Zhouhan Lin | cs.AI | 2026-02-11 | |
| Identifying Evidence-Based Nudges in Biomedical Literature with Large Language Models | Jaydeep Chauhan, Mark Seidman, Pezhman Raeisian Parvari, Zhi Zheng, Zina Ben-Miled, Cristina Barboi, Andrew Gonzalez, Malaz Boustani | cs.LG | 2026-02-10 | |
| AmharicIR+Instr: A Two-Dataset Resource for Neural Retrieval and Instruction Tuning | Tilahun Yeshambel, Moncef Garouani, Josiane Mothe | cs.CL, cs.IR | 2026-02-10 | |
| Accelerating Post-Quantum Cryptography via LLM-Driven Hardware-Software Co-Design | Yuchao Liao, Tosiron Adegbija, Roman Lysecky | cs.AR, cs.AI | 2026-02-10 | |
| Large Language Models for Designing Participatory Budgeting Rules | Nguyen Thach, Xingchen Sha, Hau Chan | cs.LG | 2026-02-10 | |
| Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion | Minghan Li, Ercong Nie, Siqi Zhao, Tongna Chen, Huiping Huang, Guodong Zhou | cs.IR, cs.AI | 2026-02-09 | |
| OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation | Yehua Huang, Penglei Sun, Zebin Chen, Zhenheng Tang, Xiaowen Chu | cs.IR, cs.AI | 2026-02-09 | |
| Accelerating Social Science Research via Agentic Hypothesization and Experimentation | Jishu Sen Gupta, Harini SI, Somesh Kumar Singh, Syed Mohamad Tawseeq, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah, Balaji Krishnamurthy | cs.AI, cs.CL | 2026-02-08 | |
| Echoes in the Loop: Diagnosing Risks in LLM-Powered Recommender Systems under Feedback Loops | Donguk Park, Dongwon Lee, Yeon-Chang Lee | cs.HC, cs.IR | 2026-02-07 | |
| Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math | Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu | cs.CL | 2026-02-06 | |
| [HF] Creative Image Generation with Diffusion Model | Kunpeng Song, Ahmed Elgammal | | 2026-01-29 | |
| [HF] DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing | Qian Cao, Yahui Liu, Wei Bi, Yi Zhao, Ruihua Song, Xiting Wang, Ruiming Tang, Guorui Zhou, Han Li | | 2026-01-14 | 3 |
| Efficient Maintenance of Leiden Communities in Large Dynamic Graphs | Chunxu Lin, Yumao Xie, Yixiang Fang, Yongmin Hu, Yingqian Hu, Chen Cheng | cs.SI, cs.DB, cs.GR | 2026-01-13 | |
| GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation | Dimple Vijay Kochar, Nathaniel Pinckney, Guan-Ting Liu, Chia-Tung Ho, Chenhui Deng, Haoxing Ren, Brucek Khailany | cs.AR, cs.CL, cs.LG | 2026-01-12 | |
| Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing | Minerva Suvanto, Andrea McGlinchey, Mattias Wahde, Peter J Barclay | cs.CL | 2026-01-12 | |
| PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling | Huachuan Qiu, Zhaoming Chen, Yuqian Chen, Yuan Xie, Yu Lu, Zhenzhong Lan | cs.CL | 2026-01-12 | |
| Document-Level Zero-Shot Relation Extraction with Entity Side Information | Mohan Raj Chanthran, Soon Lay Ki, Ong Huey Fang, Bhawani Selvaretnam | cs.CL | 2026-01-12 | |
| Can Large Language Models Understand, Reason About, and Generate Code-Switched Text? | Genta Indra Winata, David Anugraha, Patrick Amadeus Irawan, Anirban Das, Haneul Yoo, Paresh Dashore, Shreyas Kulkarni, Ruochen Zhang, Haruki Sakajo, Frederikus Hudi, Anaelia Ovalle, Syrielle Montariol, Felix Gaschi, Michael Anugraha, Rutuj Ravindra Puranik, Zawad Hayat Ahmed, Adril Putra Merin, Emmanuele Chersoni | cs.CL, cs.AI | 2026-01-12 | |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation (Extended Version) | Aja Khanal, Kaushik T. Ranade, Rishabh Agrawal, Kalyan S. Basu, Apurva Narayan | cs.MA | 2026-01-12 | |
| LLM Performance Predictors: Learning When to Escalate in Hybrid Human-AI Moderation Systems | Or Bachar, Or Levi, Sardhendu Mishra, Adi Levi, Manpreet Singh Minhas, Justin Miller, Omer Ben-Porat, Eilon Sheetrit, Jonathan Morra | cs.AI | 2026-01-11 | |
| MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education | Dongsuk Jang, Ziyao Shangguan, Kyle Tegtmeyer, Anurag Gupta, Jan Czerminski, Sophie Chheang, Arman Cohan | cs.CL | 2026-01-11 | |
| BiasLab: A Multilingual, Dual-Framing Framework for Robust Measurement of Output-Level Bias in Large Language Models | William Guey, Wei Zhang, Pei-Luen Patrick Rau, Pierrick Bougault, Vitor D. de Moura, Bertan Ucar, Jose O. Gomes | cs.CL, cs.AI | 2026-01-11 | |
| AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs | Huatao Xu, Zihe Liu, Zilin Zeng, Baichuan Li, Mo Li | cs.HC, cs.AI, cs.CV | 2026-01-11 | |
| Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization | Mizanur Rahman, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque | cs.CL | 2026-01-08 | |
| AI Generated Text Detection | Adilkhan Alikhanov, Aidar Amangeldi, Diar Demeubay, Dilnaz Akhmetzhan, Nurbek Moldakhmetov, Omar Polat, Galymzhan Zharas | cs.CL, cs.AI | 2026-01-07 | |
| Evaluation Framework for AI Creativity: A Case Study Based on Story Generation | Pharath Sathya, Yin Jou Huang, Fei Cheng | cs.CL | 2026-01-07 | |
| A Preliminary Agentic Framework for Matrix Deflation | Paimon Goulart, Evangelos E. Papalexakis | cs.LG | 2026-01-06 | |
| ReTreVal: Reasoning Tree with Validation – A Hybrid Framework for Enhanced LLM Multi-Step Reasoning | Abhishek HS, Pavan C Shekar, Arpit Jain, Ashwanth Krishnan | cs.AI, cs.CL | 2026-01-06 | |
| FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions | Kris W Pan, Yongmin Yoo | cs.CL, cs.AI | 2026-01-05 | |
| Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows | Yingte Shu, Yuchuan Tian, Chao Xu, Yunhe Wang, Hanting Chen | cs.CL, cs.AI | 2026-01-05 | |
| MORE: Multi-Objective Adversarial Attacks on Speech Recognition | Xiaoxue Gao, Zexin Li, Yiming Chen, Nancy F. Chen | eess.AS, cs.AI, cs.LG | 2026-01-05 | |
| WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics | Chenxu Liu, Yingjie Fu, Wei Yang, Ying Zhang, Tao Xie | cs.SE, cs.AI | 2026-01-05 | |
| Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation | Meiman Xiao, Ante Wang, Qingguo Hu, Zhongjian Miao, Huangjun Shen, Longyue Wang, Weihua Luo, Jinsong Su | cs.CL | 2026-01-05 | |
| QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models | Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique | cs.NE, cs.AI, cs.LG | 2026-01-02 | |
| How Large Language Models Systematically Misrepresent American Climate Opinions | Sola Kim, Jieshu Wang, Marco A. Janssen, John M. Anderies | cs.CY, cs.AI | 2025-12-29 | |
| Web World Models | Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, Mengdi Wang | cs.AI, cs.CL, cs.CV | 2025-12-29 | |
| [*] Divergent-Convergent Thinking in Large Language Models for Creative Problem Generation | Manh Hung Nguyen, Adish Singla | cs.AI | 2025-12-29 | |
| Anka: A Domain-Specific Language for Reliable LLM Code Generation | Saif Khalfan Saif Al Mazrouei | cs.CL, cs.LG, cs.PL, cs.SE | 2025-12-29 | |
| Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process | Zhijun Chen, Zeyu Ji, Qianren Mao, Junhang Cheng, Bangjie Qin, Hao Wu, Zhuoran Li, Jingzheng Li, Kai Sun, Zizhe Wang, Yikun Ban, Zhu Sun, Xiangyang Ji, Hailong Sun | cs.CL, cs.AI | 2025-12-29 | |
| Not too long do read: Evaluating LLM-generated extreme scientific summaries | Zhuoqi Lyu, Qing Ke | cs.CL, cs.AI | 2025-12-29 | |
| BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks | Md. Rakibul Islam, Md. Kamrozzaman Bhuiyan, Safwan Muntasir, Arifur Rahman Jawad, Most. Sharmin Sultana Samu | cs.CV, cs.AI | 2025-12-25 | |
| Quadrupped-Legged Robot Movement Plan Generation using Large Language Model | Muhtadin, Vincentius Gusti Putu A. B. M., Ahmad Zaini, Mauridhi Hery Purnomo, I Ketut Eddy Purnama, Chastine Fatichah | cs.RO, cs.HC | 2025-12-24 | |
| Emotion Diffusion in Real and Simulated Social Graphs: Structural Limits of LLM-Based Social Simulation | Qiqi Qiang | cs.SI | 2025-12-24 | |
| NVIDIA Nemotron 3: Efficient and Open Intelligence | NVIDIA, Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Amy Shen, Anahita Bhiwandiwalla, Andrew Tao, Anjulie Agrusa, Ankur Verma, Ann Guan, Anubhav Mandarwal, Arham Mehta, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asma Kuriparambil Thekkumpate, Ayush Dattagupta, Banghua Zhu, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Besmira Nushi, Bilal Kartal, Bita Darvish Rouhani, Boris Ginsburg, Brandon Norick, Brandon Soubasis, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Carlo del Mundo, Chantal Hwang, Charles Wang, Cheng-Ping Hsieh, Chenghao Zhang, Chenhan Yu, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christopher Parisien, Collin Neale, Cyril Meurillon, Damon Mosk-Aoyama, Dan Su, Dane Corneil, Daniel Afrimi, Daniel Lo, Daniel Rohrer, Daniel Serebrenik, Daria Gitman, Daria Levy, Darko Stosic, David Mosallanezhad, Deepak Narayanan, Dhruv Nathawani, Dima Rekesh, Dina Yared, Divyanshu Kakwani, Dong Ahn, Duncan Riach, Dusan Stosic, Edgar Minasyan, Edward Lin, Eileen Long, Eileen Peters Long, Elad Segal, Elena Lantz, Ellie Evans, Elliott Ning, Eric Chung, Eric Harper, Eric Tramel, Erick Galinkin, Erik Pounds, Evan Briones, Evelina Bakhturina, Evgeny Tsykunov, Faisal Ladhak, Fay Wang, Fei Jia, Felipe Soares, Feng Chen, Ferenc Galko, Frank Sun, Frankie Siino, Gal Hubara Agam, Ganesh Ajjanagadde, Gantavya Bhatt, Gargi Prasad, George Armstrong, Gerald Shen, Gorkem Batmaz, Grigor Nalbandyan, Haifeng Qian, Harsh Sharma, Hayley Ross, Helen Ngo, Herbert Hum, Herman Sahota, Hexin Wang, Himanshu Soni, Hiren Upadhyay, Huizi Mao, Huy C Nguyen, Huy Q Nguyen, Iain Cunningham, Ido Galil, Ido Shahaf, Igor Gitman, Ilya Loshchilov, Itamar Schen, Itay Levy, Ivan Moshkov, Izik Golan, Izzy Putterman, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jatin Mitra, Jeffrey Glick, Jenny Chen, Jesse Oliver, Jian Zhang, Jiaqi Zeng, Jie Lou, Jimmy Zhang, Jinhang Choi, Jining Huang, Joey Conway, Joey Guman, John Kamalu, Johnny Greco, Jonathan Cohen, Joseph Jennings, Joyjit Daw, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kai Xu, Kan Zhu, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kevin Shih, Kezhi Kong, Khushi Bhardwaj, Kirthi Shankar, Krishna C. Puvvada, Krzysztof Pawelec, Kumar Anik, Lawrence McAfee, Laya Sleiman, Leon Derczynski, Li Ding, Lizzie Wei, Lucas Liebenwein, Luis Vega, Maanu Grover, Maarten Van Segbroeck, Maer Rodrigues de Melo, Mahdi Nazemi, Makesh Narsimhan Sreedhar, Manoj Kilaru, Maor Ashkenazi, Marc Romeijn, Marcin Chochowski, Mark Cai, Markus Kliegl, Maryam Moosaei, Matt Kulka, Matvei Novikov, Mehrzad Samadi, Melissa Corpuz, Mengru Wang, Meredith Price, Michael Andersch, Michael Boone, Michael Evans, Miguel Martinez, Mikail Khona, Mike Chrzanowski, Minseok Lee, Mohammad Dabbah, Mohammad Shoeybi, Mostofa Patwary, Nabin Mulepati, Najeeb Nabwani, Natalie Hereth, Nave Assaf, Negar Habibi, Neta Zmora, Netanel Haber, Nicola Sessions, Nidhi Bhatia, Nikhil Jukar, Nikki Pope, Nikolai Ludwig, Nima Tajbakhsh, Nir Ailon, Nirmal Juluru, Nishant Sharma, Oleksii Hrinchuk, Oleksii Kuchaiev, Olivier Delalleau, Oluwatobi Olabiyi, Omer Ullman Argov, Omri Puny, Oren Tropp, Ouye Xie, Parth Chadha, Pasha Shamis, Paul Gibbons, Pavlo Molchanov, Pawel Morkisz, Peter Dykas, Peter Jin, Pinky Xu, Piotr Januszewski, Pranav Prashant Thombre, Prasoon Varshney, Pritam Gundecha, Przemek Tredak, Qing Miao, Qiyu Wan, Rabeeh Karimi Mahabadi, Rachit Garg, Ran El-Yaniv, Ran Zilberstein, Rasoul Shafipour, Rich Harang, Rick Izzo, Rima Shahbazyan, Rishabh Garg, Ritika Borkar, Ritu Gala, Riyad Islam, Robert Hesse, Roger Waleffe, Rohit Watve, Roi Koren, Ruoxi Zhang, Russell Hewett, Russell J. Hewett, Ryan Prenger, Ryan Timbrook, Sadegh Mahdavi, Sahil Modi, Samuel Kriman, Sangkug Lim, Sanjay Kariyappa, Sanjeev Satheesh, Saori Kaji, Satish Pasumarthi, Saurav Muralidharan, Sean Narentharen, Sean Narenthiran, Seonmyeong Bak, Sergey Kashirsky, Seth Poulos, Shahar Mor, Shanmugam Ramasamy, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Shelby Thomas, Shiqing Fan, Shreya Gopal, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shuoyang Ding, Siddharth Singh, Simeng Sun, Smita Ithape, Somshubra Majumdar, Soumye Singhal, Stas Sergienko, Stefania Alborghetti, Stephen Ge, Sugam Dipak Devare, Sumeet Kumar Barua, Suseella Panguluri, Suyog Gupta, Sweta Priyadarshi, Syeda Nahida Akter, Tan Bui, Teodor-Dumitru Ene, Terry Kong, Thanh Do, Tijmen Blankevoort, Tim Moon, Tom Balough, Tomer Asida, Tomer Bar Natan, Tomer Ronen, Tugrul Konuk, Twinkle Vashishth, Udi Karpas, Ushnish De, Vahid Noorozi, Vahid Noroozi, Venkat Srinivasan, Venmugil Elango, Victor Cui, Vijay Korthikanti, Vinay Rao, Vitaly Kurin, Vitaly Lavrukhin, Vladimir Anisimov, Wanli Jiang, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wenfei Zhou, Will Jennings, William Zhang, Wojciech Prazuch, Xiaowei Ren, Yashaswi Karnati, Yejin Choi, Yev Meyer, Yi-Fu Wu, Yian Zhang, Yigong Qin, Ying Lin, Yonatan Geifman, Yonggan Fu, Yoshi Subara, Yoshi Suhara, Yubo Gao, Zach Moshe, Zhen Dong, Zhongbo Zhu, Zihan Liu, Zijia Chen, Zijie Yan | cs.CL, cs.AI, cs.LG | 2025-12-24 | |
| AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration | Ruiqi Wang, Xinchen Wang, Cuiyun Gao, Chun Yong Chong, Xin Xia, Qing Liao | cs.SE, cs.AI | 2025-12-23 | |
| CodeSimpleQA: Scaling Factuality in Code Large Language Models | Jian Yang, Wei Zhang, Yizhi Li, Shawn Guo, Haowen Wang, Aishan Liu, Ge Zhang, Zili Wang, Zhoujun Li, Xianglong Liu, Weifeng Lv | cs.CL | 2025-12-22 | |
| VIGOR+: Iterative Confounder Generation and Validation via LLM-CEVAE Feedback Loop | JiaWei Zhu, ZiHeng Liu | cs.AI, cs.LG | 2025-12-22 | |
| Identifying Features Associated with Bias Against 93 Stigmatized Groups in Language Models and Guardrail Model Safety Mitigation | Anna-Maria Gueorguieva, Aylin Caliskan | cs.CL, cs.AI, cs.LG | 2025-12-22 | |
| Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding | Ruiqi Ma, Yu Yan, Chunhong Zhang, Minghao Yin, XinChao Liu, Zhihong Jin, Zheng Hu | cs.CV, cs.CL | 2025-12-22 | |
| MemEvolve: Meta-Evolution of Agent Memory Systems | Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, Shuicheng Yan | cs.CL, cs.MA | 2025-12-21 | |
| LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators | Mateusz Lango, Ondřej Dušek | cs.CL, cs.AI | 2025-12-20 | |
| Inflation Attitudes of Large Language Models | Nikoleta Anesti, Edward Hill, Andreas Joseph | cs.CL, econ.EM | 2025-12-16 | |
| Embedding-Based Rankings of Educational Resources based on Learning Outcome Alignment: Benchmarking, Expert Validation, and Learner Performance | Mohammadreza Molavi, Mohammad Moein, Mohammadreza Tavakoli, Abdolali Faraji, Stefan T. Mol, Gábor Kismihók | cs.CY, cs.AI | 2025-12-15 | |
| MineTheGap: Automatic Mining of Biases in Text-to-Image Models | Noa Cohen, Nurit Spingarn-Eliezer, Inbar Huberman-Spiegelglas, Tomer Michaeli | cs.CV, cs.LG | 2025-12-15 | |
| Pre-review to Peer review: Pitfalls of Automating Reviews using Large Language Models | Akhil Pandey Akella, Harish Varma Siravuri, Shaurya Rohatgi | cs.DL, cs.AI, cs.CY | 2025-12-14 | |
| HyperEdit: Unlocking Instruction-based Text Editing in LLMs via Hypernetworks | Yiming Zeng, Jinghan Cao, Zexin Li, Wanhao Yu, Zhankai Ye, Dawei Xiang, Ting Hua, Xin Liu, Shangqian Gao, Tingting Yu | cs.CL, cs.LG | 2025-12-14 | |
| Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality | Lingjing Kong, Shaoan Xie, Guangyi Chen, Yuewen Sun, Xiangchen Song, Eric P. Xing, Kun Zhang | cs.LG | 2025-12-11 | |
| LLM-Auction: Generative Auction towards LLM-Native Advertising | Chujie Zhao, Qun Hu, Shiping Song, Dagui Chen, Han Zhu, Jian Xu, Bo Zheng | cs.GT, cs.AI, cs.LG | 2025-12-11 | |
| Semantic Reconstruction of Adversarial Plagiarism: A Context-Aware Framework for Detecting and Restoring “Tortured Phrases” in Scientific Literature | Agniva Maiti, Prajwal Panth, Suresh Chandra Satapathy | cs.CL | 2025-12-11 | |
| INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT | Idan Tankel, Nir Mazor, Rafi Brada, Christina LeBedis, Guy ben-Yosef | cs.LG, cs.AI, cs.CV, eess.IV | 2025-12-10 | |
| PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset | Moonsoo Park, Jeongseok Yun, Bohyung Kim | cs.CL, cs.AI | 2025-12-10 | |
| Generate-Then-Validate: A Novel Question Generation Approach Using Small Language Models | Yumou Wei, John Stamper, Paulo F. Carvalho | cs.CL, cs.HC | 2025-12-10 | |
| Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition | João Lucas Luz Lima Sarcinelli, Diego Furtado Silva | cs.LG | 2025-12-10 | |
| Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection | Paloma Piot, David Otero, Patricia Martín-Rodilla, Javier Parapar | cs.CL, cs.AI | 2025-12-10 | |
| ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation | Boyin Yang, Puming Jiang, Per Ola Kristensson | cs.HC, cs.AI, cs.CV | 2025-12-10 | |
| Large Language Models for Education and Research: An Empirical and User Survey-based Analysis | Md Mostafizer Rahman, Ariful Islam Shiplu, Md Faizul Ibne Amin, Yutaka Watanobe, Lu Peng | cs.AI | 2025-12-08 | |
| MINES: Explainable Anomaly Detection through Web API Invariant Inference | Wenjie Zhang, Yun Lin, Chun Fung Amos Kwok, Xiwen Teoh, Xiaofei Xie, Frank Liauw, Hongyu Zhang, Jin Song Dong | cs.SE, cs.CR, cs.DB, cs.LG | 2025-12-07 | |
| LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts | Krunal Jesani, Dmitry Ignatov, Radu Timofte | cs.LG, cs.AI, cs.CL, cs.CV | 2025-12-07 | |
| Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains | Ben Malin, Tatiana Kalganova, Nikolaos Boulgouris | cs.CL, cs.AI | 2025-12-05 | |
| Decoding the Black Box: Discerning AI Rhetorics About and Through Poetic Prompting | P. D. Edgar, Alia Hall | cs.CL, cs.CY | 2025-12-04 | |
| Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models | Haidong Kang, Wei Wu, Hanling Wang | cs.LG | 2025-12-03 | |
| LLM-Generated Ads: From Personalization Parity to Persuasion Superiority | Elyas Meguellati, Stefano Civelli, Lei Han, Abraham Bernstein, Shazia Sadiq, Gianluca Demartini | cs.CY, cs.CL | 2025-12-03 | |
| ASCIIBench: Evaluating Language-Model-Based Understanding of Visually-Oriented Text | Kerry Luo, Michael Fu, Joshua Peguero, Husnain Malik, Anvay Patil, Joyce Lin, Megan Van Overborg, Ryan Sarmiento, Kevin Zhu | cs.LG | 2025-12-02 | |
| PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models | Robert Belanec, Ivan Srba, Maria Bielikova | cs.CL | 2025-12-02 | |
| DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses | Han Luo, Guy Laban | cs.AI, cs.HC, cs.MA | 2025-12-01 | |
| InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages | Mamadou K. Keita, Sebastien Diarra, Christopher Homan, Seydou Diallo | cs.LG | 2025-12-01 | |
| First, do NOHARM: towards clinically safe large language models | David Wu, Fateme Nateghi Haredasht, Saloni Kumar Maharaj, Priyank Jain, Jessica Tran, Matthew Gwiazdon, Arjun Rustagi, Jenelle Jindal, Jacob M. Koshy, Vinay Kadiyala, Anup Agarwal, Bassman Tappuni, Brianna French, Sirus Jesudasen, Christopher V. Cosgriff, Rebanta Chakraborty, Jillian Caldwell, Susan Ziolkowski, David J. Iberri, Robert Diep, Rahul S. Dalal, Kira L. Newman, Kristin Galetta, J. Carl Pallais, Nancy Wei, Kathleen M. Buchheit, David I. Hong, Ernest Y. Lee, Allen Shih, Vartan Pahalyants, Tamara B. Kaplan, Vishnu Ravi, Sarita Khemani, April S. Liang, Daniel Shirvani, Advait Patil, Nicholas Marshall, Kanav Chopra, Joel Koh, Adi Badhwar, Liam G. McCoy, David J. H. Wu, Yingjie Weng, Sumant Ranji, Kevin Schulman, Nigam H. Shah, Jason Hom, Arnold Milstein, Adam Rodman, Jonathan H. Chen, Ethan Goh | cs.CY, cs.AI | 2025-12-01 | |
| Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics | Jinu Lee, Kyoung-Woon On, Simeng Han, Arman Cohan, Julia Hockenmaier | cs.AI, cs.CL | 2025-11-30 | |
| [HF] Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey | Fatemeh Shahhosseini, Arash Marioriyad, Ali Momen, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban, Shaghayegh Haghjooy Javanmard | | 2025-11-05 | 3 |
| [HF] VLM-Guided Adaptive Negative Prompting for Creative Generation | Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik | | 2025-10-12 | 4 |
| [HF] Combinatorial Creativity: A New Frontier in Generalization Abilities | Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney | | 2025-09-25 | 3 |
| [HF] Evaluating the Creativity of LLMs in Persian Literary Text Generation | Armin Tourajmehr, Mohammad Reza Modarres, Yadollah Yaghoobzadeh | | 2025-09-22 | |
| [HF] Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards | Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin | | 2025-08-29 | 4 |
| [HF] The Ramon Llull’s Thinking Machine for Automated Ideation | Xinran Zhao, Boyuan Zheng, Chenglei Si, Haofei Yu, Ken Liu, Runlong Zhou, Ruochen Li, Tong Chen, Xiang Li, Yiming Zhang, Tongshuang Wu | | 2025-08-26 | |
| [HF] ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline | Morris Alper, Moran Yanuka, Raja Giryes, Gašper Beguš | | 2025-08-08 | |
| [HF] The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas | Chenglei Si, Tatsunori Hashimoto, Diyi Yang | | 2025-06-25 | |
| [HF] Harnessing Large Language Models for Scientific Novelty Detection | Yan Liu, Zonglin Yang, Soujanya Poria, Thanh-Son Nguyen, Erik Cambria | | 2025-05-30 | 5 |
| [HF] Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science | Xiao Liu, Xinyi Dong, Xinyang Gao, Yansong Feng, Xun Pang | | 2025-05-27 | |
| [HF] Creative Preference Optimization | Mete Ismayilzada, Antonio Laverghetta Jr., Simone A. Luchini, Reet Patel, Antoine Bosselut, Lonneke van der Plas, Roger Beaty | | 2025-05-20 | |
| [HF] Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination | Moran Mizrahi, Chen Shani, Gabriel Stanovsky, Dan Jurafsky, Dafna Shahaf | | 2025-04-29 | |
| [HF] Spark: A System for Scientifically Creative Idea Generation | Aishik Sanyal, Samuel Schapiro, Sumuk Shashidhar, Royce Moon, Lav R. Varshney, Dilek Hakkani-Tur | | 2025-04-25 | |
| [HF] AI Idea Bench 2025: AI Research Idea Generation Benchmark | Yansheng Qiu, Haoquan Zhang, Zhaopan Xu, Ming Li, Diping Song, Zheng Wang, Kaipeng Zhang | | 2025-04-19 | |
| [HF] Modifying Large Language Model Post-Training for Diverse Creative Writing | John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele, Yuqian Sun, Max Kreminski | | 2025-03-21 | 36 |
| [HF] Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | Xinyu Fang, Zhijian Chen, Kai Lan, Shengyuan Ding, Yingji Liang, Xiangyu Zhao, Farong Wen, Zicheng Zhang, Guofeng Zhang, Haodong Duan, Kai Chen, Dahua Lin | | 2025-03-18 | 48 |
| [HF] Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art | Hayato Ikoma, Teruko Mitamura | | 2025-02-10 | |
| [HF] Self-reflecting Large Language Models: A Hegelian Dialectical Approach | Sara Abdali, Can Goksen, Saeed Amizadeh, Kazuhito Koishida | | 2025-01-24 | |
| [HF] LiveIdeaBench: Evaluating LLMs’ Scientific Creativity and Idea Generation with Minimal Context | Kai Ruan, Xuan Wang, Jixiang Hong, Hao Sun | | 2024-12-23 | 6 |
| [HF] Learning to Generate Research Idea with Dynamic Control | Ruochen Li, Liqiang Jing, Chi Han, Jiawei Zhou, Xinya Du | | 2024-12-19 | |
| [HF] Benchmarking Linguistic Diversity of Large Language Models | Yanzhu Guo, Guokan Shang, Chloé Clavel | | 2024-12-13 | |
| [HF] Large Language Models show both individual and collective creativity comparable to humans | Luning Sun, Yuzhuo Yuan, Yuan Yao, Yanyan Li, Hao Zhang, Xing Xie, Xiting Wang, Fang Luo, David Stillwell | | 2024-12-04 | |
| [HF] Evaluating Creative Short Story Generation in Humans and Large Language Models | Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas | | 2024-11-04 | 1 |
| [HF] IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Sikun Guo, Amir Hassan Shariatmadari, Guangzhi Xiong, Albert Huang, Eric Xie, Stefan Bekiranov, Aidong Zhang | | 2024-10-31 | |
| [HF] SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye | | 2024-10-30 | |
| [HF] LLM Tree Search | Dylan Wilson | | 2024-10-24 | 1 |
| [HF] Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems | Junyi Ye, Jingyi Gu, Xinyun Zhao, Wenpeng Yin, Guiling Wang | | 2024-10-24 | |
| [HF] On the Diversity of Synthetic Data and its Impact on Training Large Language Models | Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, Marah I. Abdin | | 2024-10-19 | |
| [HF] Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas | Xiang Hu, Hongyu Fu, Jinge Wang, Yifeng Wang, Zhikun Li, Renjun Xu, Yu Lu, Yaochu Jin, Lili Pan, Zhenzhong Lan | | 2024-10-18 | |
| [HF] AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text | Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi | | 2024-10-05 | |
| [HF] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho, Jaeyoung Kim, Hyun-seok Min, Sungchul Choi | | 2024-09-25 | |
| [HF] A Character-Centric Creative Story Generation via Imagination | Kyeongman Park, Minbeom Kim, Kyomin Jung | | 2024-09-25 | |
| [HF] MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models | Sarfaroz Yunusov, Hamza Sidat, Ali Emami | | 2024-09-20 | |
| [HF] Can Large Language Models Unlock Novel Scientific Research Ideas? | Sandeep Kumar, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal | | 2024-09-10 | 15 |
| [HF] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | Chenglei Si, Diyi Yang, Tatsunori Hashimoto | | 2024-09-06 | 48 |
| [HF] Controllable Text Generation for Large Language Models: A Survey | Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li | | 2024-08-22 | 65 |
| [HF] Benchmarking Language Model Creativity: A Case Study on Code Generation | Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Daniel Khashabi | | 2024-07-12 | 4 |
| [HF] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing | Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng | | 2024-06-17 | |
| [HF] CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts | Zhen Tao, Zhiyu Li, Dinghao Xi, Wei Xu | | 2024-06-13 | |
| [HF] Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin | | 2024-06-09 | 3 |
| [HF] Creativity Has Left the Chat: The Price of Debiasing Language Models | Behnam Mohammadi | | 2024-06-08 | 1 |
| [HF] SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation | Abhishek Divekar, Greg Durrett | | 2024-05-16 | |
| [HF] Divergent Creativity in Humans and Large Language Models | Antoine Bellemare-Pepin, François Lespinasse, Philipp Thölke, Yann Harel, Kory Mathewson, Jay A. Olson, Yoshua Bengio, Karim Jerbi | | 2024-05-13 | |
| [HF] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun | | 2024-05-10 | |
| [HF] Is Temperature the Creativity Parameter of Large Language Models? | Max Peeperkorn, Tom Kouwenhoven, Dan Brown, Anna Jordanous | | 2024-05-01 | |
| [HF] CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation | Matthew DeLorenzo, Vasudev Gohil, Jeyavijayan Rajendran | | 2024-04-12 | |
| [HF] Assessing and Understanding Creativity in Large Language Models | Yunpu Zhao, Rui Zhang, Wenyi Li, Di Huang, Jiaming Guo, Shaohui Peng, Yifan Hao, Yuanbo Wen, Xing Hu, Zidong Du, Qi Guo, Ling Li, Yunji Chen | | 2024-01-23 | |
| [HF] Let’s Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation | Shanshan Zhong, Zhongzhan Huang, Shanghua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou | | 2023-12-05 | 1 |
| [HF] Evaluating Large Language Model Creativity from a Literary Perspective | Murray Shanahan, Catherine Clarke | | 2023-11-30 | |
| [HF] How Far Can We Extract Diverse Perspectives from Large Language Models? | Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang | | 2023-11-16 | |
| [HF] A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing | Carlos Gómez-Rodríguez, Paul Williams | | 2023-10-12 | 2 |
| [HF] GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence | Zhihua Wen, Zhiliang Tian, Wei Wu, Yuxin Yang, Yanqi Shi, Zhen Huang, Dongsheng Li | | 2023-10-09 | 4 |
| [HF] Teach LLMs to Personalize – An Approach inspired by Writing Education | Cheng Li, Mingyang Zhang, Qiaozhu Mei, Yaqing Wang, Spurthi Amba Hombaiah, Yi Liang, Michael Bendersky | | 2023-08-15 | 26 |
| [HF] ConceptLab: Creative Generation using Diffusion Prior Constraints | Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or | | 2023-08-03 | 25 |
| [HF] RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting | Lei Shu, Liangchen Luo, Jayakumar Hoskere, Yun Zhu, Canoee Liu, Simon Tong, Jindong Chen, Lei Meng | | 2023-05-25 | 4 |
| [HF] On the Creativity of Large Language Models | Giorgio Franceschelli, Mirco Musolesi | | 2023-03-27 | |
| [HF] The Next Chapter: A Study of Large Language Models in Storytelling | Zhuohan Xie, Trevor Cohn, Jey Han Lau | | 2023-01-24 | |
| [HF] GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation | Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen | | 2022-11-18 | |
| [HF] Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing | Tuhin Chakrabarty, Vishakh Padmakumar, He He | | 2022-10-25 | |