Image2text

written on 2024-11-28

Title Authors Categories Date
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman cs.CV, cs.CL 2024-11-25
MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model Yifan Wu, Min Zeng, Yang Li, Yang Zhang, Min Li cs.ET, cs.CL 2024-11-23
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye cs.CV, cs.LG, eess.IV 2024-11-23
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas, Guillermo Gabrielli cs.CL 2024-11-22
From Text to Pose to Image: Improving Diffusion Model Control and Quality Clément Bonnet, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru cs.CV, cs.AI, cs.LG 2024-11-19
Debias your Large Multi-Modal Model at Test-Time with Non-Contrastive Visual Attribute Steering Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard cs.CV, cs.LG 2024-11-15
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions Moran Yanuka, Assaf Ben Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes cs.CV, cs.CL, cs.LG 2024-11-13
Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Yijian Gao, Junzhi Ning, Zhiling Yue, Zhi Li, Simon LF Walsh, Guang Yang cs.CV, cs.AI, cs.CL, cs.LG 2024-11-08
PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation Daniel C. Castro, Aurelia Bustos, Shruthi Bannur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores Sánchez-Valverde, Lara Jaques-Pérez, Lourdes Pérez-Rodríguez, Kenji Takeda, José María Salinas, Javier Alvarez-Valle, Joaquín Galant Herrero, Antonio Pertusa cs.AI, cs.CL, cs.CV 2024-11-07
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering Hui Lin, Danfeng Hong, Shuhang Ge, Chuyao Luo, Kai Jiang, Hao Jin, Congcong Wen cs.CV, cs.AI 2024-11-03
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models Georgia Gabriela Sampaio, Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Josh Susskind, Navdeep Jaitly, Yizhe Zhang cs.CV, cs.AI 2024-11-02
Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics Colin Conwell, Christopher Hamblin, Chelsea Boccagno, David Mayo, Jesse Cummings, Leyla Isik, Andrei Barbu cs.CV, cs.CL 2024-10-31
Private Synthetic Text Generation with Diffusion Models Sebastian Ochs, Ivan Habernal cs.CL 2024-10-30
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data Badr AlKhamissi, Yingtian Tang, Abdülkadir Gökce, Johannes Mehrer, Martin Schrimpf cs.CV, cs.LG 2024-10-29
Towards Visual Text Design Transfer Across Languages Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu cs.CV, cs.AI 2024-10-24
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim cs.CV, cs.AI, cs.CL, cs.LG 2024-09-26
MIO: A Foundation Model on Multimodal Tokens Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang cs.CL, cs.AI, cs.LG 2024-09-26
Copying style, Extracting value: Illustrators’ Perception of AI Style Transfer and its Impact on Creative Labor Julien Porquet, Sitong Wang, Lydia B. Chilton cs.HC 2024-09-25
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning Siddharth Betala, Ishan Chokshi cs.CL, cs.AI 2024-09-23
Recommendation with Generative Models Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, Rene Vidal, Maheswaran Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci cs.IR 2024-09-18
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li cs.CV, cs.AI, cs.GR 2024-09-16
Spatio-Temporal Context Prompting for Zero-Shot Action Detection Wei-Jhe Huang, Min-Hung Chen, Shang-Hong Lai cs.CV, cs.AI 2024-08-28
DIAGen: Diverse Image Augmentation with Generative Models Tobias Lingenberg, Markus Reuter, Gopika Sudhakaran, Dominik Gojny, Stefan Roth, Simone Schaub-Meyer cs.CV, cs.AI 2024-08-26
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara cs.CV, cs.AI, cs.CL, cs.MM 2024-08-26
Cap2Sum: Learning to Summarize Videos by Generating Captions Cairong Zhao, Chutian Wang, Zifan Song, Guosheng Hu, Haonan Chen, Xiaofan Zhai cs.MM 2024-08-23
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation Xiangyu Zhao, Yuehan Zhang, Wenlong Zhang, Xiao-Ming Wu cs.CV, cs.AI 2024-08-21
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao cs.LG, cs.AI 2024-08-19
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma cs.CV, cs.AI 2024-08-09
Wolf: Captioning Everything with a World Summarization Framework Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone cs.LG, cs.CL, cs.CV 2024-07-26
Guided Latent Slot Diffusion for Object-Centric Learning Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth cs.CV, cs.LG 2024-07-25
Generative artificial intelligence in dentistry: Current approaches and future challenges Fabián Villena, Claudia Véliz, Rosario García-Huidobro, Sebastián Aguayo cs.CL 2024-07-24
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez cs.CL, cs.AI, cs.CR, cs.CV, cs.LG 2024-07-21
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer Jinfeng Wei, Xiaofeng Zhang cs.CL, cs.AI 2024-07-21
KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai cs.AI 2024-07-19
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo cs.CR, cs.AI 2024-07-17
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts Sam Yu-Te Lee, Aryaman Bahukhandi, Dongyu Liu, Kwan-Liu Ma cs.HC 2024-07-16
CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly cs.CV, cs.AI, cs.CL, cs.LG 2024-07-16
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion Philipp Allgeuer, Kyra Ahrens, Stefan Wermter cs.CV, cs.AI, cs.CL 2024-07-15
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation Seonghoon Yu, Paul Hongsuck Seo, Jeany Son cs.CV, cs.AI 2024-07-10
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification Laura Fieback, Jakob Spiegelberg, Hanno Gottschalk cs.CV, cs.CL, cs.LG, I.4 2024-05-29
Alt4Blind: A User Interface to Simplify Charts Alt-Text Creation Omar Moured, Shahid Ali Farooqui, Karin Muller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, Rainer Stiefelhagen cs.CV, cs.HC 2024-05-29
Automatic detection of cognitive impairment in elderly people using an entertainment chatbot with Natural Language Processing capabilities Francisco de Arriba-Pérez, Silvia García-Méndez, Francisco J. González-Castaño, Enrique Costa-Montenegro cs.AI, cs.CL, cs.HC, cs.LG 2024-05-28
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text Han Yu, Peikun Guo, Akane Sano eess.SP, cs.AI 2024-05-26
VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha cs.CV, cs.AI, cs.CL 2024-05-24
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence Ali Kashefi physics.flu-dyn, cs.LG 2024-05-24
Calibrated Self-Rewarding Vision Language Models Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao cs.LG, cs.CL, cs.CV 2024-05-23
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim cs.CV, cs.AI 2024-04-23
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Charith Chandra Sai Balne, Sreyoshi Bhaduri, Tamoghna Roy, Vinija Jain, Aman Chadha cs.LG, cs.AI, cs.CL 2024-04-21
Data Alignment for Zero-Shot Concept Generation in Dermatology AI Soham Gadgil, Mahtab Bigverdi cs.CV, cs.CL, cs.LG 2024-04-19
Incubating Text Classifiers Following User Instruction with Nothing but LLM Letian Peng, Jingbo Shang cs.CL 2024-04-16
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun cs.AI, cs.CL, cs.CV 2024-04-16
Contextual Chart Generation for Cyber Deception David D. Nguyen, David Liebowitz, Surya Nepal, Salil S. Kanhere, Sharif Abuadbba cs.LG, cs.AI, cs.CR 2024-04-07
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching Yang Yang cs.CV, cs.AI, cs.LG 2024-03-26
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang cs.CV, cs.AI 2024-03-26
UrbanVLP: A Multi-Granularity Vision-Language Pre-Trained Foundation Model for Urban Indicator Prediction Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang cs.CV, cs.AI 2024-03-25
Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT Rohit Raju, Peeta Basa Pati, SA Gandheesh, Gayatri Sanjana Sannala, Suriya KS cs.CL 2024-03-25
Visually Guided Generative Text-Layout Pre-training for Document Intelligence Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong cs.CL, cs.CV 2024-03-25
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo cs.CV, cs.AI 2024-03-25
Dia-LLaMA: Towards Large Language Model-driven CT Report Generation Zhixuan Chen, Luyang Luo, Yequan Bie, Hao Chen cs.CV, cs.AI 2024-03-25
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang cs.CV, cs.CL 2024-03-24
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content Zhicheng Du, Zhaotian Xie, Huazhang Ying, Likun Zhang, Peiwu Qin cs.CV, cs.AI 2024-03-23
InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection Thales Bertaglia, Lily Heisig, Rishabh Kaushal, Adriana Iamnitchi cs.CY, cs.CL, cs.SI 2024-03-22
SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki cs.CL 2024-03-12
Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel cs.CV, cs.LG 2024-03-12
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis cs.NE 2024-03-11
One Category One Prompt: Dataset Distillation using Diffusion Models Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri cs.CV, cs.CL, cs.LG 2024-03-11
Narrating Causal Graphs with Large Language Models Atharva Phatak, Vijay K. Mago, Ameeta Agrawal, Aravind Inbasekaran, Philippe J. Giabbanelli cs.CL 2024-03-11
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback Adarsh N L, Arun P V, Aravindh N L cs.CV, cs.AI 2024-03-11
Defending Against Unforeseen Failure Modes with Latent Adversarial Training Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell cs.CR, cs.AI, cs.LG 2024-03-08
Enhancing Court View Generation with Knowledge Injection and Guidance Ang Li, Yiquan Wu, Yifei Liu, Fei Wu, Ming Cai, Kun Kuang cs.AI 2024-03-07
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee cs.CV, cs.LG 2024-03-05
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models Neta Shaul, Uriel Singer, Ricky T. Q. Chen, Matthew Le, Ali Thabet, Albert Pumarola, Yaron Lipman cs.LG, cs.AI, cs.CV 2024-03-02
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee cs.CV, cs.AI, cs.CL, cs.LG 2024-02-20
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang cs.CL, cs.AI 2024-01-24
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology Mahsa Shamsabadi, Jennifer D’Souza, Sören Auer cs.CL, cs.AI, cs.DL, cs.IT, math.IT 2024-01-18
Textual Summarisation of Large Sets: Towards a General Approach Kittipitch Kuptavanich, Ehud Reiter, Kees Van Deemter, Advaith Siddharthan cs.CL 2024-01-17
Jewelry Recognition via Encoder-Decoder Models José M. Alcalde-Llergo, Enrique Yeguas-Bolívar, Andrea Zingoni, Alejandro Fuerte-Jurado cs.CV, cs.AI 2024-01-15
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng cs.CL, cs.AI 2024-01-14
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller cs.CL 2024-01-12
Zur Darstellung eines mehrstufigen Prototypbegriffs in der multilingualen automatischen Sprachgenerierung: vom Korpus über word embeddings bis hin zum automatischen Wörterbuch María José Domínguez Vázquez cs.CL 2023-12-26
Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models Ling Li, Shaohua Li, Winda Marantika, Alex C. Kot, Huijing Zhan cs.IR, cs.AI 2023-12-24
Continuous Diffusion for Mixed-Type Tabular Data Markus Mueller, Kathrin Gruber, Dennis Fok cs.LG, stat.ML 2023-12-16
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji cs.CL 2023-12-15
Fast Sampling via De-randomization for Discrete Diffusion Models Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu cs.LG, cs.AI, stat.ML 2023-12-14
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer Xinpeng Wang, Xiaoyuan Yi, Han Jiang, Shanlin Zhou, Zhihua Wei, Xing Xie cs.CL, cs.AI 2023-12-13
Multimodal Sentiment Analysis: Perceived vs Induced Sentiments Aditi Aggarwal, Deepika Varshney, Saurabh Patel cs.CV, cs.LG, cs.SI 2023-12-12
Adaptive Compression of the Latent Space in Variational Autoencoders Gabriela Sejnova, Michal Vavrecka, Karla Stepanova cs.LG, cs.AI 2023-12-11
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation Atoosa Chegini, Soheil Feizi cs.CV, cs.LG 2023-12-09
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo cs.CR, cs.AI, cs.CL 2023-12-07
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation Yifu Qiu, Varun Embar, Shay B. Cohen, Benjamin Han cs.CL, cs.AI 2023-11-16
GRIM: GRaph-based Interactive narrative visualization for gaMes Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan cs.CL 2023-11-15
Zero-shot audio captioning with audio-language model guidance and audio context keywords Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata eess.AS, cs.AI, cs.CL, cs.SD 2023-11-14
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia cs.LG, cs.AI, cs.CV 2023-11-07
Grounded Intuition of GPT-Vision’s Abilities with Scientific Images Alyssa Hwang, Andrew Head, Chris Callison-Burch cs.CL 2023-11-03
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres cs.CV, cs.LG 2023-11-02
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning Gaoang Wang, Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li cs.CV, cs.AI 2023-11-02
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal cs.LG 2023-11-01
Woodpecker: Hallucination Correction for Multimodal Large Language Models Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen cs.CV, cs.AI, cs.CL, cs.LG 2023-10-24
GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, Ting-Hao K. Huang cs.CL 2023-10-23
HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models Vibhor Agarwal, Yu Chen, Nishanth Sastry cs.CL 2023-10-21
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering Yuduo Wang, Pedram Ghamisi cs.CV, cs.LG 2023-10-19
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua cs.CL, cs.MM 2023-10-19
Motion2Language, Unsupervised learning of synchronized semantic motion segmentation Karim Radouane, Andon Tchechmedjiev, Sylvie Ranwez, Julien Lagarde cs.CV, cs.CL 2023-10-16
BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation Ji Qi, Kaixuan Ji, Jifan Yu, Duokang Wang, Bin Xu, Lei Hou, Juanzi Li cs.CV, cs.CL 2023-10-16
Prompting for Discovery: Flexible Sense-Making for AI Art-Making with Dreamsheets Shm Garanganao Almeda, J. D. Zamfirescu-Pereira, Kyu Won Kim, Pradeep Mani Rathnam, Bjoern Hartmann cs.HC 2023-10-15
VLIS: Unimodal Language Models Guide Multimodal Language Generation Jiwan Chung, Youngjae Yu cs.CL, cs.AI 2023-10-15
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models Yuanchun Shen, Ruotong Liao, Zhen Han, Yunpu Ma, Volker Tresp cs.CL 2023-10-12
CP-KGC: Constrained-Prompt Knowledge Graph Completion with Large Language Models Rui Yang, Li Fang, Yi Zhou cs.CL, cs.AI 2023-10-12
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning Junyu Lu, Dixiang Zhang, Xiaojun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang cs.CL 2023-10-12
Multimodal Graph Learning for Generative Tasks Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov cs.AI 2023-10-11
Video-CSR: Complex Video Digest Creation for Visual-Language Models Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang cs.CV, cs.AI 2023-10-08
InstructProtein: Aligning Human and Protein Language via Knowledge Instruction Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen q-bio.BM, cs.CL 2023-10-05
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo cs.CV, cs.AI, cs.CL 2023-09-10
Zero-Shot Audio Captioning via Audibility Guidance Tal Shaharabany, Ariel Shaulov, Lior Wolf cs.SD, cs.CL, eess.AS 2023-09-07
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz cs.CL, cs.MM, cs.SD 2023-09-06
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts Hongyang Du, Guangyuan Liu, Dusit Niyato, Jiayi Zhang, Jiawen Kang, Zehui Xiong, Bo Ai, Dong In Kim eess.IV, cs.LG, cs.NI 2023-09-05
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan cs.LG, cs.CL, cs.CV 2023-09-05
Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface Atieh Taheri, Mohammad Izadi, Gururaj Shriram, Negar Rostamzadeh, Shaun Kane cs.HC, J.5; J.6; I.2.7 2023-09-05
PromptTTS 2: Describing and Generating Voices with Text Prompt Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian eess.AS, cs.CL, cs.LG, cs.SD 2023-09-05
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang cs.CL, cs.AI, cs.CV 2023-08-27
GeoExplainer: A Visual Analytics Framework for Spatial Modeling Contextualization and Report Generation Fan Lei, Yuxin Ma, Stewart Fotheringham, Elizabeth Mack, Ziqi Li, Mehak Sachdeva, Sarah Bardin, Ross Maciejewski cs.HC, cs.LG 2023-08-25
Manipulating Embeddings of Stable Diffusion Prompts Niklas Deckers, Julia Peters, Martin Potthast cs.CV, cs.LG 2023-08-23
CgT-GAN: CLIP-guided Text GAN for Image Captioning Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He cs.CV, cs.AI, cs.CL, cs.MM 2023-08-23
Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings Eugene Bagdasaryan, Vitaly Shmatikov cs.CR, cs.AI, cs.LG 2023-08-22
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan cs.SD, cs.AI, cs.CL, cs.MM, eess.AS 2023-08-22
Random Word Data Augmentation with CLIP for Zero-Shot Anomaly Detection Masato Tamura cs.CV, cs.LG 2023-08-22
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang cs.CV, cs.AI, cs.CL, cs.LG 2023-08-18
Can Knowledge Graphs Simplify Text? Anthony Colas, Haodi Ma, Xuanli He, Yang Bai, Daisy Zhe Wang cs.CL 2023-08-14
Mirror Diffusion Models Jaesung Tae cs.LG 2023-08-11
Generative Forests Richard Nock, Mathieu Guillame-Bert cs.LG, I.2.6 2023-08-07
FAST: Font-Agnostic Scene Text Editing Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein cs.CV, cs.MM 2023-08-05
Guiding Image Captioning Models Toward More Specific Captions Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen cs.CV, cs.LG 2023-07-31
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng cs.CV, cs.CL 2023-07-31
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin cs.MM, cs.CV 2023-07-31
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Kun Yuan, Vinkle Srivastav, Tong Yu, Joel Lavanchy, Pietro Mascagni, Nassir Navab, Nicolas Padoy cs.CV, cs.AI 2023-07-27
A Transformer-based Approach for Arabic Offline Handwritten Text Recognition Saleh Momeni, Bagher BabaAli cs.CV, cs.LG 2023-07-27
Evaluating Generative Models for Graph-to-Text Generation Shuzhou Yuan, Michael Färber cs.CL, cs.AI 2023-07-27
XDLM: Cross-lingual Diffusion Language Model for Machine Translation Linyao Chen, Aosong Feng, Boming Yang, Zihui Li cs.CL 2023-07-25
Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O’Connor cs.CV, cs.AI, cs.CL, cs.LG 2023-07-21
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr cs.CV, cs.CL 2023-07-21
Generating Image-Specific Text Improves Fine-grained Image Classification Emily Mu, Kathleen M. Lewis, Adrian V. Dalca, John Guttag cs.CV, cs.CL 2023-07-21
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi cs.CL, cs.CV, cs.LG 2023-07-20
Improving Multimodal Datasets with Image Captioning Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt cs.LG, cs.CV 2023-07-19
PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen cs.AI, cs.HC 2023-07-18
Reading Radiology Imaging Like The Radiologist Yuhao Wang cs.CV, cs.AI 2023-07-12
Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging Heejong Kim, Victor Ion Butoi, Adrian V. Dalca, Mert R. Sabuncu eess.IV, cs.CV, cs.LG 2023-07-06
Vision Language Transformers: A Survey Clayton Fields, Casey Kennington cs.CV, cs.AI, cs.CL, cs.LG 2023-07-06
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo cs.CV, cs.CL 2023-07-05
A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu eess.IV, cs.CV, cs.LG 2023-07-05
More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data Andrew Kean Gao cs.CV, cs.LG, I.4.9, I.2.10 2023-07-01
Concept-Oriented Deep Learning with Large Language Models Daniel T. Chang cs.LG, cs.CL 2023-06-29
Joint Level Generation and Translation Using Gameplay Videos Negar Mirgati, Matthew Guzdial cs.CV, cs.LG 2023-06-29
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles Haoqin Tu, Bowen Yang, Xianfeng Zhao cs.CL 2023-06-29
You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting Xuan Ren, Lingqiao Liu cs.CL, cs.AI, cs.LG 2023-06-28
FunQA: Towards Surprising Video Comprehension Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu cs.CV, cs.AI, cs.CL, cs.MM 2023-06-26
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin cs.CL 2023-06-23
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion Simone Bianco, Luigi Celona, Marco Donzella, Paolo Napoletano cs.CV, cs.AI, cs.CL, cs.DB, cs.LG 2023-06-20
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye cs.CV, cs.AI, cs.CL, cs.LG 2023-06-16
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li cs.CV, cs.AI, cs.DB 2023-06-15
GBSD: Generative Bokeh with Stage Diffusion Jieren Deng, Xin Zhou, Hao Tian, Zhihong Pan, Derek Aguiar cs.CV, cs.AI 2023-06-14
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models Raz Lapid, Moshe Sipper cs.CV, cs.NE 2023-06-13
Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation Yinda Chen, Che Liu, Wei Huang, Sibo Cheng, Rossella Arcucci, Zhiwei Xiong cs.CV, cs.AI 2023-06-07
On the Difference of BERT-style and CLIP-style Text Encoders Zhihong Chen, Guiming Hardy Chen, Shizhe Diao, Xiang Wan, Benyou Wang cs.CL 2023-06-06
Putting Humans in the Image Captioning Loop Aliki Anagnostopoulou, Mareike Hartmann, Daniel Sonntag cs.CL, cs.CV 2023-06-06
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding Hang Zhang, Xin Li, Lidong Bing cs.CL, cs.CV, cs.SD, eess.AS 2023-06-05
Identifying the style by a qualified reader on a short fragment of generated poetry Boris Orekhov cs.CL, cs.AI, cs.LG 2023-06-05
Multilingual Conceptual Coverage in Text-to-Image Models Michael Saxon, William Yang Wang cs.CL, cs.AI, cs.CV, eess.IV 2023-06-02
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee cs.LG, cs.AI 2023-06-01
CapText: Large Language Model-based Caption Generation From Image Context and Description Shinjini Ghosh, Sagnik Anupam cs.LG, cs.CL 2023-06-01
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting Rita Ramos, Bruno Martins, Desmond Elliott cs.CL, cs.CV 2023-05-31
Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards Guian Fang, Zutao Jiang, Jianhua Han, Guansong Lu, Hang Xu, Xiaodan Liang cs.CV, cs.AI 2023-05-31
Fine-grained Text Style Transfer with Diffusion-Based Language Models Yiwei Lyu, Tiange Luo, Jiacheng Shi, Todd C. Hollon, Honglak Lee cs.CL, cs.AI, cs.LG 2023-05-31
Learning to Imagine: Visually-Augmented Natural Language Generation Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen cs.CL 2023-05-26
Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Furu Wei cs.CL 2023-05-24
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan cs.CL, cs.AI, cs.CV, cs.HC 2023-05-24
Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng cs.CL 2023-05-24
Process-To-Text: A Framework for the Quantitative Description of Processes in Natural Language Yago Fontenla-Seco, Alberto Bugarín-Diz, Manuel Lama cs.CL 2023-05-23
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models Dao Xuan-Quy, Le Ngoc-Bich, Vo The-Duy, Phan Xuan-Dung, Ngo Bac-Bien, Nguyen Van-Tien, Nguyen Thi-My-Thanh, Nguyen Hong-Phuoc cs.CL 2023-05-20
STOAT: Structured Data to Analytical Text With Controls Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer cs.CL, cs.AI 2023-05-19
Generating Visual Spatial Description via Holistic 3D Scene Understanding Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua cs.CV, cs.CL 2023-05-19
Brain Captioning: Decoding human brain activity into images and text Matteo Ferrante, Furkan Ozcelik, Tommaso Boccato, Rufin VanRullen, Nicola Toschi cs.CV, cs.AI 2023-05-19
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang cs.CL 2023-05-18
AIwriting: Relations Between Image Generation and Digital Writing Scott Rettberg, Talan Memmott, Jill Walker Rettberg, Jason Nelson, Patrick Lichty cs.AI, cs.CL, cs.HC, cs.MM, J.5 2023-05-18
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang cs.CL, cs.IR, cs.LG 2023-05-18
What You See is What You Read? Improving Text-Image Alignment Evaluation Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor cs.CL, cs.CV 2023-05-17
Equivariant Few-Shot Learning from Pretrained Models Sourya Basu, Pulkit Katdare, Prasanna Sattigeri, Vijil Chenthamarakshan, Katherine Driggs-Campbell, Payel Das, Lav R. Varshney cs.LG, cs.AI, cs.CL, cs.CV 2023-05-17
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen cs.CL 2023-05-16
Generative AI: Implications and Applications for Education Anastasia Olga Tzirides, Akash Saini, Gabriela Zapata, Duane Searsmith, Bill Cope, Mary Kalantzis, Vania Castro, Theodora Kourkoulou, John Jones, Rodrigo Abrantes da Silva, Jen Whiting, Nikoleta Polyxeni Kastania cs.CY, cs.AI 2023-05-12
Two-in-One: A Model Hijacking Attack Against Text Generation Models Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem cs.CR, cs.CL, cs.LG 2023-05-12
Vision-Language Models in Remote Sensing: Current Progress and Future Trends Congcong Wen, Yuan Hu, Xiang Li, Zhenghang Yuan, Xiao Xiang Zhu cs.CV, cs.AI 2023-05-09
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese Doanh C. Bui, Nghia Hieu Nguyen, Khang Nguyen cs.CV, cs.CL 2023-05-07
Simulating H.P. Lovecraft horror literature with the ChatGPT large language model Eduardo C. Garrido-Merchán, José Luis Arroyo-Barrigüete, Roberto Gozalo-Brihuela cs.CL 2023-05-05
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oğuz, Yashar Mehdad, Wen-tau Yih cs.CV, cs.CL 2023-05-04
Image Captioners Sometimes Tell More Than Images They See Honori Udo, Takafumi Koshinaka cs.CV, cs.MM 2023-05-04
Governance of the AI, by the AI, and for the AI Andrew W. Torrance, Bill Tomlinson cs.CY, cs.AI 2023-05-04
Controlled Text Generation with Natural Language Instructions Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan cs.CL, cs.AI, cs.LG 2023-04-27
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang cs.CV, cs.CL, cs.LG 2023-04-26
RenderDiffusion: Text Generation as Image Generation Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen cs.CL, cs.CV, cs.LG 2023-04-25
Token Imbalance Adaptation for Radiology Report Generation Yuexin Wu, I-Chan Huang, Xiaolei Huang cs.CL, cs.AI 2023-04-18
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu cs.LG, cs.CL, cs.CV, cs.MM, eess.AS 2023-04-17
Improving Diffusion Models for Scene Text Editing with Dual Encoders Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang cs.CV, cs.AI 2023-04-12
ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment Eslam Mohamed Bakr, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny cs.CV, cs.AI, cs.LG 2023-04-10
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny cs.CV, cs.AI 2023-04-09
Opinion Mining from YouTube Captions Using ChatGPT: A Case Study of Street Interviews Polling the 2023 Turkish Elections Tuğrulcan Elmas, İlker Gül cs.SI, cs.CY 2023-04-07
DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun cs.CV, cs.AI 2023-04-06
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza cs.CV, cs.CL 2023-04-04
Cross-Domain Image Captioning with Discriminative Finetuning Roberto Dessì, Michele Bevilacqua, Eleonora Gualdoni, Nathanael Carraz Rakotonirina, Francesca Franzon, Marco Baroni cs.CV, cs.AI, cs.CL 2023-04-04
Can AI Put Gamma-Ray Astrophysicists Out of a Job? Samuel T. Spencer, Vikas Joshi, Alison M. W. Mitchell physics.pop-ph, astro-ph.HE, cs.CL 2023-03-31
Prefix tuning for automated audio captioning Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh eess.AS, cs.MM, cs.SD 2023-03-30
GPT is becoming a Turing machine: Here are some ways to program it Ana Jojic, Zhen Wang, Nebojsa Jojic cs.CL 2023-03-25
CoBIT: A Contrastive Bi-directional Image-Text Generation Model Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu cs.CV, cs.CL 2023-03-23
Open-Vocabulary Object Detection using Pseudo Caption Labels Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh cs.CV, cs.AI 2023-03-23
HIVE: Harnessing Human Feedback for Instructional Visual Editing Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu cs.CV, cs.AI, cs.CL, cs.HC, cs.LG 2023-03-16
Text-to-image Diffusion Model in Generative AI: A Survey Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon cs.CV, cs.AI, cs.LG 2023-03-14
Diffusion Models in NLP: A Survey Yuansong Zhu, Yu Zhao cs.CL, cs.AI 2023-03-14
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, Yaowei Wang, David A. Clifton cs.CL, cs.AI, cs.CV 2023-03-11
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions Bill Noble, Nikolai Ilinykh cs.CL 2023-03-07
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training Wei Li, Linchao Zhu, Longyin Wen, Yi Yang cs.CV, cs.AI, cs.CL 2023-03-06
Interactive Text Generation Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan cs.CL 2023-03-02
Few-Shot Table-to-Text Generation with Prompt-based Adapter Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang cs.CL 2023-02-24
Improved Training of Mixture-of-Experts Language GANs Yekun Chai, Qiyue Yin, Junge Zhang cs.CL 2023-02-23
Improving User Controlled Table-To-Text Generation Robustness Hanxu Hu, Yunqing Liu, Zhongyi Yu, Laura Perez-Beltrachini cs.CL 2023-02-20
Large Scale Multi-Lingual Multi-Modal Summarization Dataset Yash Verma, Anubhav Jangra, Raghvendra Kumar, Sriparna Saha cs.CL, cs.MM 2023-02-13
Plan-then-Seam: Towards Efficient Table-to-Text Generation Liang Li, Ruiying Geng, Chengyang Fang, Bing Li, Can Ma, Binhua Li, Yongbin Li cs.CL 2023-02-10
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar cs.CV, cs.AI, cs.CL, cs.IR, cs.LG 2023-02-09
Few-Shot Table-to-Text Generation with Prompt Planning and Knowledge Memorization Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang cs.CL, cs.AI 2023-02-09
Adversarial Prompting for Black Box Foundation Models Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner cs.LG 2023-02-08
GPTScore: Evaluate as You Desire Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu cs.CL 2023-02-08
Grounding Language Models to Images for Multimodal Generation Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried cs.CL, cs.AI, cs.CV, cs.LG 2023-01-31
Semi-Parametric Video-Grounded Text Generation Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo cs.CV, cs.CL, cs.LG 2023-01-27
Explaining Visual Biases as Words by Generating Captions Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin cs.LG, cs.CV 2023-01-26
MTTN: Multi-Pair Text to Text Narratives for Prompt Generation Archan Ghosh, Debgandhar Ghosh, Madhurima Maji, Suchinta Chanda, Kalporup Goswami cs.CL, cs.LG 2023-01-21
Regeneration Learning: A Learning Paradigm for Data Generation Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio cs.LG, cs.AI, cs.CL, cs.CV, eess.AS 2023-01-21
Universal Multimodal Representation for Language Understanding Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao cs.CL, cs.AI, cs.CV 2023-01-09
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) Rana Adnan Ahmad, Muhammad Azhar, Hina Sattar cs.CV, cs.AI 2023-01-06
Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach Miao Chen, Xinjiang Lu, Tong Xu, Yanyan Li, Jingbo Zhou, Dejing Dou, Hui Xiong cs.CL, cs.AI 2023-01-05
eVAE: Evolutionary Variational Autoencoder Zhangkai Wu, Longbing Cao, Lei Qi cs.NE, cs.LG 2023-01-01
MAUVE Scores for Generative Models: Theory and Practice Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui cs.LG, cs.AI, cs.CL 2022-12-30
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh cs.CV, cs.AI 2022-12-27
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang cs.AI, cs.LG 2022-12-24
Do DALL-E and Flamingo Understand Each Other? Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp cs.CV, cs.LG 2022-12-23
A survey on text generation using generative adversarial networks Gustavo Henrique de Rosa, João Paulo Papa cs.CL, cs.AI, cs.LG 2022-12-20
SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang cs.CL 2022-12-20
One Embedder, Any Task: Instruction-Finetuned Text Embeddings Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu cs.CL 2022-12-19
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning Ukyo Honda, Taro Watanabe, Yuji Matsumoto cs.CV, cs.CL 2022-12-06
Towards Generating Diverse Audio Captions via Adversarial Training Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang eess.AS, cs.AI, cs.MM, cs.SD 2022-12-05
Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation Faeze Brahman, Baolin Peng, Michel Galley, Sudha Rao, Bill Dolan, Snigdha Chaturvedi, Jianfeng Gao cs.CL 2022-12-04
Learning Automata-Based Task Knowledge Representation from Large-Scale Generative Language Models Yunhao Yang, Jean-Raphaël Gaglione, Ufuk Topcu cs.FL, cs.CL 2022-12-04
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation Zutao Jiang, Guangsong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu cs.CV, cs.AI 2022-12-02
On the Importance of Image Encoding in Automated Chest X-Ray Report Generation Otabek Nazarov, Mohammad Yaqub, Karthik Nandakumar cs.CV, cs.AI 2022-11-24
Retrieval-Augmented Multimodal Language Modeling Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih cs.CV, cs.CL, cs.LG 2022-11-22
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun cs.CV, cs.LG 2022-11-22
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation Jie Ruan, Yue Wu, Xiaojun Wan, Yuesheng Zhu cs.CV, cs.CL 2022-11-20
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired Kazuya Ohata, Shunsuke Kitada, Hitoshi Iyatomi cs.CV, cs.AI, cs.CL, cs.HC, cs.LG 2022-11-17
PromptCap: Prompt-Guided Task-Aware Image Captioning Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo cs.CV, cs.CL 2022-11-15
CCPrompt: Counterfactual Contrastive Prompt-Tuning for Many-Class Classification Yang Li, Canran Xu, Tao Shen, Jing Jiang, Guodong Long cs.CL 2022-11-11
Self-conditioned Embedding Diffusion for Text Generation Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond cs.CL, cs.LG 2022-11-08
Semantic Metadata Extraction from Dense Video Captioning Johannes Scherer, Ansgar Scherp, Deepayan Bhowmik cs.CV, cs.CL 2022-11-05
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu cs.CV, cs.LG 2022-11-02
CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation Yihong Dong, Ge Li cs.SE, cs.AI 2022-11-02
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov cs.CL, cs.LG 2022-10-31
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang eess.AS, cs.AI, cs.MM, cs.SD 2022-10-28
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen, Emily Tsai, Omar Almusa, Curtis P. Langlotz cs.CL, cs.AI 2022-10-21
Image Semantic Relation Generation Mingzhe Du cs.CV, cs.CL 2022-10-19
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu cs.CL, cs.AI 2022-10-19
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective Zheng Ma, Shi Zong, Mianzhi Pan, Jianbing Zhang, Shujian Huang, Xinyu Dai, Jiajun Chen cs.CL 2022-10-18
UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image Dani Valevski, Matan Kalman, Yossi Matias, Yaniv Leviathan cs.CV, cs.GR, cs.LG 2022-10-17
Social Biases in Automatic Evaluation Metrics for NLG Mingqi Gao, Xiaojun Wan cs.CL, cs.AI 2022-10-17
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung cs.CL, cs.CV 2022-10-14
Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das cs.LG, cs.CL 2022-10-13
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis Wenda Xu, Yilin Tuan, Yujie Lu, Michael Saxon, Lei Li, William Yang Wang cs.CL, cs.AI 2022-10-10
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Shitong Xu cs.CV, cs.LG 2022-10-10
ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu cs.CL 2022-10-09
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang cs.CL, cs.AI 2022-10-07
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth cs.CL 2022-10-07
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, Richard Evans cs.HC, cs.CL 2022-09-29
XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma cs.CL 2022-09-22
Distribution Aware Metrics for Conditional Natural Language Generation David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, John Canny cs.CL, cs.AI, cs.CV, cs.LG 2022-09-15
PaLI: A Jointly-Scaled Multilingual Language-Image Model Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut cs.CV, cs.CL 2022-09-14
Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori cs.CL, cs.AI 2022-09-13
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation Peining Zhang, Junliang Guo, Linli Xu, Mu You, Junming Yin cs.SD, cs.CL, eess.AS 2022-09-05
Every picture tells a story: Image-grounded controllable stylistic story generation Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung cs.CL 2022-09-04
Understanding Attention for Vision-and-Language Tasks Feiqi Cao, Soyeon Caren Han, Siqu Long, Changwei Xu, Josiah Poon cs.CV, cs.CL 2022-08-17
ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model Rao Fu, Xiao Zhan, Yiwen Chen, Daniel Ritchie, Srinath Sridhar cs.CV, cs.AI 2022-07-19
A Baseline for Detecting Out-of-Distribution Examples in Image Captioning Gabi Shalev, Gal-Lev Shalev, Joseph Keshet cs.CV, cs.LG 2022-07-12
Towards Multimodal Vision-Language Models Generating Non-Generic Text Wes Robbins, Zanyar Zohourianshahzadi, Jugal Kalita cs.CV, cs.AI 2022-07-09
Dual-Stream Transformer for Generic Event Boundary Captioning Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen cs.CV, cs.CL 2022-07-07
Syntax Controlled Knowledge Graph-to-Text Generation with Order and Semantic Consistency Jin Liu, Chongfeng Fan, Fengyu Zhou, Huijuan Xu cs.AI 2022-07-02
Automatic Controllable Product Copywriting for E-Commerce Xiaojie Guo, Qingkai Zeng, Meng Jiang, Yun Xiao, Bo Long, Lingfei Wu cs.AI, cs.LG 2022-06-21
niksss at HinglishEval: Language-agnostic BERT-based Contextual Embeddings with Catboost for Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text Nikhil Singh cs.CL 2022-06-17
Prefix Language Models are Unified Modal Learners Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang cs.CV, cs.CL, cs.LG 2022-06-15
Exploring industrial safety knowledge via Zipf law Zhenhua Wang, Ming Ren, Dong Gao, Zhuang Li cs.CL 2022-05-25
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training Gi-Cheon Kang, Sungdong Kim, Jin-Hwa Kim, Donghyun Kwak, Byoung-Tak Zhang cs.CV, cs.CL, cs.LG 2022-05-25
Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh cs.CL, cs.AI, cs.CV, cs.LG 2022-05-24
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasovic cs.CL, cs.CV 2022-05-24
What Makes Data-to-Text Generation Hard for Pretrained Language Models? Moniba Keymanesh, Adrian Benton, Mark Dredze cs.CL, cs.AI, cs.IR, cs.LG 2022-05-23
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji cs.CV, cs.AI 2022-05-22
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics Elisa Kreiss, Cynthia Bennett, Shayan Hooshmand, Eric Zelikman, Meredith Ringel Morris, Christopher Potts cs.CL 2022-05-21
It Isn’t Sh!tposting, It’s My CAT Posting Parthsarthi Rawat, Sayan Das, Jorge Aguirre, Akhil Daphara cs.CV, cs.AI, cs.LG 2022-05-18
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training Constantin Seibold, Simon Reiß, M. Saquib Sarfraz, Rainer Stiefelhagen, Jens Kleesiek cs.CV, cs.LG 2022-05-14
Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning Fei Wang, Zhewei Xu, Pedro Szekely, Muhao Chen cs.CL, cs.AI, cs.LG 2022-05-08
RoViST: Learning Robust Metrics for Visual Storytelling Eileen Wang, Caren Han, Josiah Poon cs.CV, cs.AI 2022-05-08
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang cs.CL, cs.CV, cs.MM 2022-05-07
Language Models Can See: Plugging Visual Controls in Text Generation Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier cs.CV, cs.CL 2022-05-05
Diverse Image Captioning with Grounded Style Franz Klein, Shweta Mahajan, Stefan Roth cs.CV, cs.LG 2022-05-03
Cross-modal Memory Networks for Radiology Report Generation Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan cs.CL 2022-04-28
Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR) Amelie Wührl, Roman Klinger cs.CL, cs.IR 2022-04-21
Evaluating Mixed-initiative Conversational Search Systems via User Simulation Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani cs.CL, cs.IR 2022-04-17
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures Giovanni Bonetta, Matteo Ribero, Rossella Cancelliere cs.CL, cs.AI 2022-04-11
Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention Yu Yang, Seungbae Kim, Jungseock Joo cs.CV, cs.AI, cs.LG 2022-04-10
On Distinctive Image Captioning via Comparing and Reweighting Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan cs.CV, cs.AI 2022-04-08
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata cs.CV, cs.CL 2022-04-05
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Andy Zeng, Adrian Wong, Stefan Welker, Krzysztof Choromanski, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence cs.CV, cs.AI, cs.CL, cs.LG 2022-04-01
Neural Pipeline for Zero-Shot Data-to-Text Generation Zdeněk Kasner, Ondřej Dušek cs.CL 2022-03-30
GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models Changye Li, David Knopman, Weizhe Xu, Trevor Cohen, Serguei Pakhomov cs.CL 2022-03-25
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization Shankar Kanthara, Rixie Tiffany Ko Leong, Xiang Lin, Ahmed Masry, Megh Thakkar, Enamul Hoque, Shafiq Joty cs.CL 2022-03-12
Compilable Neural Code Generation with Compiler Feedback Xin Wang, Yasheng Wang, Yao Wan, Fei Mi, Yitong Li, Pingyi Zhou, Jin Liu, Hao Wu, Xin Jiang, Qun Liu cs.CL, cs.AI, cs.PL 2022-03-10
How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity Chengyue Gong, Lemeng Wu, Qiang Liu cs.LG, cs.CV 2022-02-16
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation Ahmad Hammoudeh, Bastein Vanderplaetse, Stéphane Dupont cs.CV, cs.AI 2022-02-11
Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang cs.CV, cs.CL 2022-02-07
XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages Tushar Abhishek, Shivprasad Sagare, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma cs.CL 2022-02-01
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas, Noel E. O’Connor cs.CV, cs.LG 2022-01-25
Pre-Trained Language Transformers are Universal Image Classifiers Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy cs.CV, cs.AI 2022-01-25
An Integrated Approach for Video Captioning and Applications Soheyla Amirian, Thiab R. Taha, Khaled Rasheed, Hamid R. Arabnia cs.CV, cs.AI 2022-01-23
Inferring Commonsense Explanations as Prompts for Future Event Generation Li Lin, Yixin Cao, Lifu Huang, Shuang Li, Xuming Hu, Lijie Wen, Jianmin Wang cs.CL, cs.LG, I.2.7; I.2.4 2022-01-18
Local Information Assisted Attention-free Decoder for Audio Captioning Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Haiyan Lan, Wenwu Wang cs.SD, cs.LG, eess.AS 2022-01-10
Self-Training Vision Language BERTs with a Unified Conditional Model Xiaofeng Yang, Fengmao Lv, Fayao Liu, Guosheng Lin cs.CV, cs.CL 2022-01-06
Compact Bidirectional Transformer for Image Captioning Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang cs.CV, cs.CL 2022-01-06
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams Chengxi Li, Brent Harrison cs.CV, cs.AI, cs.CL 2022-01-04
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang cs.CV, cs.CL 2021-12-31
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment Shuxin Yang, Xian Wu, Shen Ge, Xingwang Wu, S. Kevin Zhou, Li Xiao eess.IV, cs.CL, cs.CV 2021-12-30
Automatic Product Copywriting for E-Commerce Xueying Zhang, Yanyan Zou, Hainan Zhang, Jing Zhou, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Xueqi He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu cs.CL, cs.AI 2021-12-15
Contextualized Scene Imagination for Generative Commonsense Reasoning PeiFeng Wang, Jonathan Zamora, Junfeng Liu, Filip Ilievski, Muhao Chen, Xiang Ren cs.CL 2021-12-12
Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation Ao Liu, Congjian Luo, Naoaki Okazaki cs.CL 2021-12-12
Show and Write: Entity-aware News Generation with Image Information Zhongping Zhang, Yiwen Gu, Bryan A. Plummer cs.CL 2021-12-11
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang cs.CV, cs.CL, cs.LG 2021-12-10
Self-Supervised Image-to-Text and Text-to-Image Synthesis Anindya Sundar Das, Sriparna Saha cs.CV, cs.CL, cs.LG 2021-12-09
Search and Learn: Improving Semantic Coverage for Data-to-Text Generation Shailza Jolly, Zi Xuan Zhang, Andreas Dengel, Lili Mou cs.CL 2021-12-06
Protecting Intellectual Property of Language Generation APIs with Lexical Watermark Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang cs.CR, cs.CL 2021-12-05
Representation Learning for Conversational Data using Discourse Mutual Information Maximization Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal cs.CL 2021-12-04
LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training Ningyu Zhang, Hongbin Ye, Jiacheng Yang, Shumin Deng, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang, Huajun Chen cs.CL, cs.AI 2021-12-02
Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi cs.CV, cs.CL, cs.LG 2021-12-01
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf cs.CV, cs.AI, cs.CL 2021-11-29
LAFITE: Towards Language-Free Training for Text-to-Image Generation Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun cs.CV, cs.LG 2021-11-27
Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences Moritz Ibing, Gregor Kobsik, Leif Kobbelt cs.CV, cs.GR, cs.LG 2021-11-24
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan cs.CV, cs.AI 2021-11-24
Scaling Up Vision-Language Pre-training for Image Captioning Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang cs.CV, cs.CL 2021-11-24
L-Verse: Bidirectional Generation Between Image and Text Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae cs.CV, cs.CL, cs.LG 2021-11-22
