Image2text
25 Nov 2024, written on 2024-11-28
title | authors | categories | display date |
---|---|---|---|
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman | cs.CV, cs.CL | 2024-11-25 |
MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model | Yifan Wu, Min Zeng, Yang Li, Yang Zhang, Min Li | cs.ET, cs.CL | 2024-11-23 |
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation | Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye | cs.CV, cs.LG, eess.IV | 2024-11-23 |
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Yurii Paniv, Artur Kiulian, Dmytro Chaplynskyi, Mykola Khandoga, Anton Polishko, Tetiana Bas, Guillermo Gabrielli | cs.CL | 2024-11-22 |
From Text to Pose to Image: Improving Diffusion Model Control and Quality | Clément Bonnet, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru | cs.CV, cs.AI, cs.LG | 2024-11-19 |
Debias your Large Multi-Modal Model at Test-Time with Non-Contrastive Visual Attribute Steering | Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Estelle Aflalo, Shao-Yen Tseng, Vasudev Lal, Phillip Howard | cs.CV, cs.LG | 2024-11-15 |
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Moran Yanuka, Assaf Ben Kish, Yonatan Bitton, Idan Szpektor, Raja Giryes | cs.CV, cs.CL, cs.LG | 2024-11-13 |
Decoding Report Generators: A Cyclic Vision-Language Adapter for Counterfactual Explanations | Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Yijian Gao, Junzhi Ning, Zhiling Yue, Zhi Li, Simon LF Walsh, Guang Yang | cs.CV, cs.AI, cs.CL, cs.LG | 2024-11-08 |
PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation | Daniel C. Castro, Aurelia Bustos, Shruthi Bannur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores Sánchez-Valverde, Lara Jaques-Pérez, Lourdes Pérez-Rodríguez, Kenji Takeda, José María Salinas, Javier Alvarez-Valle, Joaquín Galant Herrero, Antonio Pertusa | cs.AI, cs.CL, cs.CV | 2024-11-07 |
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Hui Lin, Danfeng Hong, Shuhang Ge, Chuyao Luo, Kai Jiang, Hao Jin, Congcong Wen | cs.CV, cs.AI | 2024-11-03 |
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models | Georgia Gabriela Sampaio, Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu, Josh Susskind, Navdeep Jaitly, Yizhe Zhang | cs.CV, cs.AI | 2024-11-02 |
Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics | Colin Conwell, Christopher Hamblin, Chelsea Boccagno, David Mayo, Jesse Cummings, Leyla Isik, Andrei Barbu | cs.CV, cs.CL | 2024-10-31 |
Private Synthetic Text Generation with Diffusion Models | Sebastian Ochs, Ivan Habernal | cs.CL | 2024-10-30 |
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data | Badr AlKhamissi, Yingtian Tang, Abdülkadir Gökce, Johannes Mehrer, Martin Schrimpf | cs.CV, cs.LG | 2024-10-29 |
Towards Visual Text Design Transfer Across Languages | Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu | cs.CV, cs.AI | 2024-10-24 |
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim | cs.CV, cs.AI, cs.CL, cs.LG | 2024-09-26 |
MIO: A Foundation Model on Multimodal Tokens | Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang | cs.CL, cs.AI, cs.LG | 2024-09-26 |
Copying style, Extracting value: Illustrators’ Perception of AI Style Transfer and its Impact on Creative Labor | Julien Porquet, Sitong Wang, Lydia B. Chilton | cs.HC | 2024-09-25 |
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning | Siddharth Betala, Ishan Chokshi | cs.CL, cs.AI | 2024-09-23 |
Recommendation with Generative Models | Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, Rene Vidal, Maheswaran Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci | cs.IR | 2024-09-18 |
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models | Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li | cs.CV, cs.AI, cs.GR | 2024-09-16 |
Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang, Min-Hung Chen, Shang-Hong Lai | cs.CV, cs.AI | 2024-08-28 |
DIAGen: Diverse Image Augmentation with Generative Models | Tobias Lingenberg, Markus Reuter, Gopika Sudhakaran, Dominik Gojny, Stefan Roth, Simone Schaub-Meyer | cs.CV, cs.AI | 2024-08-26 |
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization | Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | cs.CV, cs.AI, cs.CL, cs.MM | 2024-08-26 |
Cap2Sum: Learning to Summarize Videos by Generating Captions | Cairong Zhao, Chutian Wang, Zifan Song, Guosheng Hu, Haonan Chen, Xiaofan Zhai | cs.MM | 2024-08-23 |
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao, Yuehan Zhang, Wenlong Zhang, Xiao-Ming Wu | cs.CV, cs.AI | 2024-08-21 |
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao | cs.LG, cs.AI | 2024-08-19 |
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description | Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma | cs.CV, cs.AI | 2024-08-09 |
Wolf: Captioning Everything with a World Summarization Framework | Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone | cs.LG, cs.CL, cs.CV | 2024-07-26 |
Guided Latent Slot Diffusion for Object-Centric Learning | Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth | cs.CV, cs.LG | 2024-07-25 |
Generative artificial intelligence in dentistry: Current approaches and future challenges | Fabián Villena, Claudia Véliz, Rosario García-Huidobro, Sebastián Aguayo | cs.CL | 2024-07-24 |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? | Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez | cs.CL, cs.AI, cs.CR, cs.CV, cs.LG | 2024-07-21 |
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer | Jinfeng Wei, Xiaofeng Zhang | cs.CL, cs.AI | 2024-07-21 |
KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models | Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai | cs.AI | 2024-07-19 |
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks | Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo | cs.CR, cs.AI | 2024-07-17 |
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts | Sam Yu-Te Lee, Aryaman Bahukhandi, Dongyu Liu, Kwan-Liu Ma | cs.HC | 2024-07-16 |
CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation | Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly | cs.CV, cs.AI, cs.CL, cs.LG | 2024-07-16 |
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | Philipp Allgeuer, Kyra Ahrens, Stefan Wermter | cs.CV, cs.AI, cs.CL | 2024-07-15 |
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | Seonghoon Yu, Paul Hongsuck Seo, Jeany Son | cs.CV, cs.AI | 2024-07-10 |
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification | Laura Fieback, Jakob Spiegelberg, Hanno Gottschalk | cs.CV, cs.CL, cs.LG, I.4 | 2024-05-29 |
Alt4Blind: A User Interface to Simplify Charts Alt-Text Creation | Omar Moured, Shahid Ali Farooqui, Karin Muller, Sharifeh Fadaeijouybari, Thorsten Schwarz, Mohammed Javed, Rainer Stiefelhagen | cs.CV, cs.HC | 2024-05-29 |
Automatic detection of cognitive impairment in elderly people using an entertainment chatbot with Natural Language Processing capabilities | Francisco de Arriba-Pérez, Silvia García-Méndez, Francisco J. González-Castaño, Enrique Costa-Montenegro | cs.AI, cs.CL, cs.HC, cs.LG | 2024-05-28 |
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text | Han Yu, Peikun Guo, Akane Sano | eess.SP, cs.AI | 2024-05-26 |
VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap | Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha | cs.CV, cs.AI, cs.CL | 2024-05-24 |
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence | Ali Kashefi | physics.flu-dyn, cs.LG | 2024-05-24 |
Calibrated Self-Rewarding Vision Language Models | Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao | cs.LG, cs.CL, cs.CV | 2024-05-23 |
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim | cs.CV, cs.AI | 2024-04-23 |
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications | Charith Chandra Sai Balne, Sreyoshi Bhaduri, Tamoghna Roy, Vinija Jain, Aman Chadha | cs.LG, cs.AI, cs.CL | 2024-04-21 |
Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil, Mahtab Bigverdi | cs.CV, cs.CL, cs.LG | 2024-04-19 |
Incubating Text Classifiers Following User Instruction with Nothing but LLM | Letian Peng, Jingbo Shang | cs.CL | 2024-04-16 |
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun | cs.AI, cs.CL, cs.CV | 2024-04-16 |
Contextual Chart Generation for Cyber Deception | David D. Nguyen, David Liebowitz, Surya Nepal, Salil S. Kanhere, Sharif Abuadbba | cs.LG, cs.AI, cs.CR | 2024-04-07 |
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching | Yang Yang | cs.CV, cs.AI, cs.LG | 2024-03-26 |
The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Dian Chao, Xin Song, Shupeng Zhong, Boyuan Wang, Xiangyu Wu, Chen Zhu, Yang Yang | cs.CV, cs.AI | 2024-03-26 |
UrbanVLP: A Multi-Granularity Vision-Language Pre-Trained Foundation Model for Urban Indicator Prediction | Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang | cs.CV, cs.AI | 2024-03-25 |
Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju, Peeta Basa Pati, SA Gandheesh, Gayatri Sanjana Sannala, Suriya KS | cs.CL | 2024-03-25 |
Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong | cs.CL, cs.CV | 2024-03-25 |
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation | Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo | cs.CV, cs.AI | 2024-03-25 |
Dia-LLaMA: Towards Large Language Model-driven CT Report Generation | Zhixuan Chen, Luyang Luo, Yequan Bie, Hao Chen | cs.CV, cs.AI | 2024-03-25 |
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models | Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang | cs.CV, cs.CL | 2024-03-24 |
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content | Zhicheng Du, Zhaotian Xie, Huazhang Ying, Likun Zhang, Peiwu Qin | cs.CV, cs.AI | 2024-03-23 |
InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection | Thales Bertaglia, Lily Heisig, Rishabh Kaushal, Adriana Iamnitchi | cs.CY, cs.CL, cs.SI | 2024-03-22 |
SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes | Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki | cs.CL | 2024-03-12 |
Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning | Mark D. McDonnell, Dong Gong, Ehsan Abbasnejad, Anton van den Hengel | cs.CV, cs.LG | 2024-03-12 |
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains | Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis | cs.NE | 2024-03-11 |
One Category One Prompt: Dataset Distillation using Diffusion Models | Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri | cs.CV, cs.CL, cs.LG | 2024-03-11 |
Narrating Causal Graphs with Large Language Models | Atharva Phatak, Vijay K. Mago, Ameeta Agrawal, Aravind Inbasekaran, Philippe J. Giabbanelli | cs.CL | 2024-03-11 |
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Adarsh N L, Arun P V, Aravindh N L | cs.CV, cs.AI | 2024-03-11 |
Defending Against Unforeseen Failure Modes with Latent Adversarial Training | Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell | cs.CR, cs.AI, cs.LG | 2024-03-08 |
Enhancing Court View Generation with Knowledge Injection and Guidance | Ang Li, Yiquan Wu, Yifei Liu, Fei Wu, Ming Cai, Kun Kuang | cs.AI | 2024-03-07 |
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity | Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee | cs.CV, cs.LG | 2024-03-05 |
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Neta Shaul, Uriel Singer, Ricky T. Q. Chen, Matthew Le, Ali Thabet, Albert Pumarola, Yaron Lipman | cs.LG, cs.AI, cs.CV | 2024-03-02 |
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples | Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee | cs.CV, cs.AI, cs.CL, cs.LG | 2024-02-20 |
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models | Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang | cs.CL, cs.AI | 2024-01-24 |
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology | Mahsa Shamsabadi, Jennifer D’Souza, Sören Auer | cs.CL, cs.AI, cs.DL, cs.IT, math.IT | 2024-01-18 |
Textual Summarisation of Large Sets: Towards a General Approach | Kittipitch Kuptavanich, Ehud Reiter, Kees Van Deemter, Advaith Siddharthan | cs.CL | 2024-01-17 |
Jewelry Recognition via Encoder-Decoder Models | José M. Alcalde-Llergo, Enrique Yeguas-Bolívar, Andrea Zingoni, Alejandro Fuerte-Jurado | cs.CV, cs.AI | 2024-01-15 |
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic | Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng | cs.CL, cs.AI | 2024-01-14 |
PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes | Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller | cs.CL | 2024-01-12 |
Zur Darstellung eines mehrstufigen Prototypbegriffs in der multilingualen automatischen Sprachgenerierung: vom Korpus über word embeddings bis hin zum automatischen Wörterbuch | María José Domínguez Vázquez | cs.CL | 2023-12-26 |
Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models | Ling Li, Shaohua Li, Winda Marantika, Alex C. Kot, Huijing Zhan | cs.IR, cs.AI | 2023-12-24 |
Continuous Diffusion for Mixed-Type Tabular Data | Markus Mueller, Kathrin Gruber, Dennis Fok | cs.LG, stat.ML | 2023-12-16 |
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning | Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji | cs.CL | 2023-12-15 |
Fast Sampling via De-randomization for Discrete Diffusion Models | Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu | cs.LG, cs.AI, stat.ML | 2023-12-14 |
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer | Xinpeng Wang, Xiaoyuan Yi, Han Jiang, Shanlin Zhou, Zhihua Wei, Xing Xie | cs.CL, cs.AI | 2023-12-13 |
Multimodal Sentiment Analysis: Perceived vs Induced Sentiments | Aditi Aggarwal, Deepika Varshney, Saurabh Patel | cs.CV, cs.LG, cs.SI | 2023-12-12 |
Adaptive Compression of the Latent Space in Variational Autoencoders | Gabriela Sejnova, Michal Vavrecka, Karla Stepanova | cs.LG, cs.AI | 2023-12-11 |
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation | Atoosa Chegini, Soheil Feizi | cs.CV, cs.LG | 2023-12-09 |
Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks | Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Ling Cai, Nathalie Baracaldo | cs.CR, cs.AI, cs.CL | 2023-12-07 |
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation | Yifu Qiu, Varun Embar, Shay B. Cohen, Benjamin Han | cs.CL, cs.AI | 2023-11-16 |
GRIM: GRaph-based Interactive narrative visualization for gaMes | Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan | cs.CL | 2023-11-15 |
Zero-shot audio captioning with audio-language model guidance and audio context keywords | Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata | eess.AS, cs.AI, cs.CL, cs.SD | 2023-11-14 |
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion | Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia | cs.LG, cs.AI, cs.CV | 2023-11-07 |
Grounded Intuition of GPT-Vision’s Abilities with Scientific Images | Alyssa Hwang, Andrew Head, Chris Callison-Burch | cs.CL | 2023-11-03 |
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images | Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres | cs.CV, cs.LG | 2023-11-02 |
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning | Gaoang Wang, Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li | cs.CV, cs.AI | 2023-11-02 |
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements | Peter A. Zachares, Vahan Hovhannisyan, Alan Mosca, Yarin Gal | cs.LG | 2023-11-01 |
Woodpecker: Hallucination Correction for Multimodal Large Language Models | Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen | cs.CV, cs.AI, cs.CL, cs.LG | 2023-10-24 |
GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions | Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, Ting-Hao K. Huang | cs.CL | 2023-10-23 |
HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models | Vibhor Agarwal, Yu Chen, Nishanth Sastry | cs.CL | 2023-10-21 |
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Yuduo Wang, Pedram Ghamisi | cs.CV, cs.LG | 2023-10-19 |
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter | Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua | cs.CL, cs.MM | 2023-10-19 |
Motion2Language, Unsupervised learning of synchronized semantic motion segmentation | Karim Radouane, Andon Tchechmedjiev, Sylvie Ranwez, Julien Lagarde | cs.CV, cs.CL | 2023-10-16 |
BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation | Ji Qi, Kaixuan Ji, Jifan Yu, Duokang Wang, Bin Xu, Lei Hou, Juanzi Li | cs.CV, cs.CL | 2023-10-16 |
Prompting for Discovery: Flexible Sense-Making for AI Art-Making with Dreamsheets | Shm Garanganao Almeda, J. D. Zamfirescu-Pereira, Kyu Won Kim, Pradeep Mani Rathnam, Bjoern Hartmann | cs.HC | 2023-10-15 |
VLIS: Unimodal Language Models Guide Multimodal Language Generation | Jiwan Chung, Youngjae Yu | cs.CL, cs.AI | 2023-10-15 |
GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models | Yuanchun Shen, Ruotong Liao, Zhen Han, Yunpu Ma, Volker Tresp | cs.CL | 2023-10-12 |
CP-KGC: Constrained-Prompt Knowledge Graph Completion with Large Language Models | Rui Yang, Li Fang, Yi Zhou | cs.CL, cs.AI | 2023-10-12 |
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | Junyu Lu, Dixiang Zhang, Xiaojun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang | cs.CL | 2023-10-12 |
Multimodal Graph Learning for Generative Tasks | Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov | cs.AI | 2023-10-11 |
Video-CSR: Complex Video Digest Creation for Visual-Language Models | Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang | cs.CV, cs.AI | 2023-10-08 |
InstructProtein: Aligning Human and Protein Language via Knowledge Instruction | Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen | q-bio.BM, cs.CL | 2023-10-05 |
Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning | Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo | cs.CV, cs.AI, cs.CL | 2023-09-10 |
Zero-Shot Audio Captioning via Audibility Guidance | Tal Shaharabany, Ariel Shaulov, Lior Wolf | cs.SD, cs.CL, eess.AS | 2023-09-07 |
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation | Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz | cs.CL, cs.MM, cs.SD | 2023-09-06 |
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts | Hongyang Du, Guangyuan Liu, Dusit Niyato, Jiayi Zhang, Jiawen Kang, Zehui Xiong, Bo Ai, Dong In Kim | eess.IV, cs.LG, cs.NI | 2023-09-05 |
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan | cs.LG, cs.CL, cs.CV | 2023-09-05 |
Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface | Atieh Taheri, Mohammad Izadi, Gururaj Shriram, Negar Rostamzadeh, Shaun Kane | cs.HC, J.5; J.6; I.2.7 | 2023-09-05 |
PromptTTS 2: Describing and Generating Voices with Text Prompt | Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian | eess.AS, cs.CL, cs.LG, cs.SD | 2023-09-05 |
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP | Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang | cs.CL, cs.AI, cs.CV | 2023-08-27 |
GeoExplainer: A Visual Analytics Framework for Spatial Modeling Contextualization and Report Generation | Fan Lei, Yuxin Ma, Stewart Fotheringham, Elizabeth Mack, Ziqi Li, Mehak Sachdeva, Sarah Bardin, Ross Maciejewski | cs.HC, cs.LG | 2023-08-25 |
Manipulating Embeddings of Stable Diffusion Prompts | Niklas Deckers, Julia Peters, Martin Potthast | cs.CV, cs.LG | 2023-08-23 |
CgT-GAN: CLIP-guided Text GAN for Image Captioning | Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He | cs.CV, cs.AI, cs.CL, cs.MM | 2023-08-23 |
Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings | Eugene Bagdasaryan, Vitaly Shmatikov | cs.CR, cs.AI, cs.LG | 2023-08-22 |
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan | cs.SD, cs.AI, cs.CL, cs.MM, eess.AS | 2023-08-22 |
Random Word Data Augmentation with CLIP for Zero-Shot Anomaly Detection | Masato Tamura | cs.CV, cs.LG | 2023-08-22 |
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control | Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang | cs.CV, cs.AI, cs.CL, cs.LG | 2023-08-18 |
Can Knowledge Graphs Simplify Text? | Anthony Colas, Haodi Ma, Xuanli He, Yang Bai, Daisy Zhe Wang | cs.CL | 2023-08-14 |
Mirror Diffusion Models | Jaesung Tae | cs.LG | 2023-08-11 |
Generative Forests | Richard Nock, Mathieu Guillame-Bert | cs.LG, I.2.6 | 2023-08-07 |
FAST: Font-Agnostic Scene Text Editing | Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein | cs.CV, cs.MM | 2023-08-05 |
Guiding Image Captioning Models Toward More Specific Captions | Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen | cs.CV, cs.LG | 2023-07-31 |
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng | cs.CV, cs.CL | 2023-07-31 |
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences | Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin | cs.MM, cs.CV | 2023-07-31 |
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Kun Yuan, Vinkle Srivastav, Tong Yu, Joel Lavanchy, Pietro Mascagni, Nassir Navab, Nicolas Padoy | cs.CV, cs.AI | 2023-07-27 |
A Transformer-based Approach for Arabic Offline Handwritten Text Recognition | Saleh Momeni, Bagher BabaAli | cs.CV, cs.LG | 2023-07-27 |
Evaluating Generative Models for Graph-to-Text Generation | Shuzhou Yuan, Michael Färber | cs.CL, cs.AI | 2023-07-27 |
XDLM: Cross-lingual Diffusion Language Model for Machine Translation | Linyao Chen, Aosong Feng, Boming Yang, Zihui Li | cs.CL | 2023-07-25 |
Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts | Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O’Connor | cs.CV, cs.AI, cs.CL, cs.LG | 2023-07-21 |
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? | Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr | cs.CV, cs.CL | 2023-07-21 |
Generating Image-Specific Text Improves Fine-grained Image Classification | Emily Mu, Kathleen M. Lewis, Adrian V. Dalca, John Guttag | cs.CV, cs.CL | 2023-07-21 |
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi | cs.CL, cs.CV, cs.LG | 2023-07-20 |
Improving Multimodal Datasets with Image Captioning | Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt | cs.LG, cs.CV | 2023-07-19 |
PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation | Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen | cs.AI, cs.HC | 2023-07-18 |
Reading Radiology Imaging Like The Radiologist | Yuhao Wang | cs.CV, cs.AI | 2023-07-12 |
Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging | Heejong Kim, Victor Ion Butoi, Adrian V. Dalca, Mert R. Sabuncu | eess.IV, cs.CV, cs.LG | 2023-07-06 |
Vision Language Transformers: A Survey | Clayton Fields, Casey Kennington | cs.CV, cs.AI, cs.CL, cs.LG | 2023-07-06 |
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo | cs.CV, cs.CL | 2023-07-05 |
A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis | Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu | eess.IV, cs.CV, cs.LG | 2023-07-05 |
More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data | Andrew Kean Gao | cs.CV, cs.LG, I.4.9, I.2.10 | 2023-07-01 |
Concept-Oriented Deep Learning with Large Language Models | Daniel T. Chang | cs.LG, cs.CL | 2023-06-29 |
Joint Level Generation and Translation Using Gameplay Videos | Negar Mirgati, Matthew Guzdial | cs.CV, cs.LG | 2023-06-29 |
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles | Haoqin Tu, Bowen Yang, Xianfeng Zhao | cs.CL | 2023-06-29 |
You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting | Xuan Ren, Lingqiao Liu | cs.CL, cs.AI, cs.LG | 2023-06-28 |
FunQA: Towards Surprising Video Comprehension | Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu | cs.CV, cs.AI, cs.CL, cs.MM | 2023-06-26 |
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation | Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin | cs.CL | 2023-06-23 |
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion | Simone Bianco, Luigi Celona, Marco Donzella, Paolo Napoletano | cs.CV, cs.AI, cs.CL, cs.DB, cs.LG | 2023-06-20 |
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models | Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye | cs.CV, cs.AI, cs.CL, cs.LG | 2023-06-16 |
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis | Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, Hongsheng Li | cs.CV, cs.AI, cs.DB | 2023-06-15 |
GBSD: Generative Bokeh with Stage Diffusion | Jieren Deng, Xin Zhou, Hao Tian, Zhihong Pan, Derek Aguiar | cs.CV, cs.AI | 2023-06-14 |
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models | Raz Lapid, Moshe Sipper | cs.CV, cs.NE | 2023-06-13 |
Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation | Yinda Chen, Che Liu, Wei Huang, Sibo Cheng, Rossella Arcucci, Zhiwei Xiong | cs.CV, cs.AI | 2023-06-07 |
On the Difference of BERT-style and CLIP-style Text Encoders | Zhihong Chen, Guiming Hardy Chen, Shizhe Diao, Xiang Wan, Benyou Wang | cs.CL | 2023-06-06 |
Putting Humans in the Image Captioning Loop | Aliki Anagnostopoulou, Mareike Hartmann, Daniel Sonntag | cs.CL, cs.CV | 2023-06-06 |
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Hang Zhang, Xin Li, Lidong Bing | cs.CL, cs.CV, cs.SD, eess.AS | 2023-06-05 |
Identifying the style by a qualified reader on a short fragment of generated poetry | Boris Orekhov | cs.CL, cs.AI, cs.LG | 2023-06-05 |
Multilingual Conceptual Coverage in Text-to-Image Models | Michael Saxon, William Yang Wang | cs.CL, cs.AI, cs.CV, eess.IV | 2023-06-02 |
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization | Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee | cs.LG, cs.AI | 2023-06-01 |
CapText: Large Language Model-based Caption Generation From Image Context and Description | Shinjini Ghosh, Sagnik Anupam | cs.LG, cs.CL | 2023-06-01 |
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting | Rita Ramos, Bruno Martins, Desmond Elliott | cs.CL, cs.CV | 2023-05-31 |
Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards | Guian Fang, Zutao Jiang, Jianhua Han, Guansong Lu, Hang Xu, Xiaodan Liang | cs.CV, cs.AI | 2023-05-31 |
Fine-grained Text Style Transfer with Diffusion-Based Language Models | Yiwei Lyu, Tiange Luo, Jiacheng Shi, Todd C. Hollon, Honglak Lee | cs.CL, cs.AI, cs.LG | 2023-05-31 |
Learning to Imagine: Visually-Augmented Natural Language Generation | Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen | cs.CL | 2023-05-26 |
Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing | Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Furu Wei | cs.CL | 2023-05-24 |
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors | Tuhin Chakrabarty, Arkadiy Saakyan, Olivia Winn, Artemis Panagopoulou, Yue Yang, Marianna Apidianaki, Smaranda Muresan | cs.CL, cs.AI, cs.CV, cs.HC | 2023-05-24 |
Gender Biases in Automatic Evaluation Metrics: A Case Study on Image Captioning | Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng | cs.CL | 2023-05-24 |
Process-To-Text: A Framework for the Quantitative Description of Processes in Natural Language | Yago Fontenla-Seco, Alberto Bugarín-Diz, Manuel Lama | cs.CL | 2023-05-23 |
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | Dao Xuan-Quy, Le Ngoc-Bich, Vo The-Duy, Phan Xuan-Dung, Ngo Bac-Bien, Nguyen Van-Tien, Nguyen Thi-My-Thanh, Nguyen Hong-Phuoc | cs.CL | 2023-05-20 |
STOAT: Structured Data to Analytical Text With Controls | Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer | cs.CL, cs.AI | 2023-05-19 |
Generating Visual Spatial Description via Holistic 3D Scene Understanding | Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua | cs.CV, cs.CL | 2023-05-19 |
Brain Captioning: Decoding human brain activity into images and text | Matteo Ferrante, Furkan Ozcelik, Tommaso Boccato, Rufin VanRullen, Nicola Toschi | cs.CV, cs.AI | 2023-05-19 |
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation | Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang | cs.CL | 2023-05-18 |
AIwriting: Relations Between Image Generation and Digital Writing | Scott Rettberg, Talan Memmott, Jill Walker Rettberg, Jason Nelson, Patrick Lichty | cs.AI, cs.CL, cs.HC, cs.MM, J.5 | 2023-05-18 |
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval | Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang | cs.CL, cs.IR, cs.LG | 2023-05-18 |
What You See is What You Read? Improving Text-Image Alignment Evaluation | Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor | cs.CL, cs.CV | 2023-05-17 |
Equivariant Few-Shot Learning from Pretrained Models | Sourya Basu, Pulkit Katdare, Prasanna Sattigeri, Vijil Chenthamarakshan, Katherine Driggs-Campbell, Payel Das, Lav R. Varshney | cs.LG, cs.AI, cs.CL, cs.CV | 2023-05-17 |
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen | cs.CL | 2023-05-16 |
Generative AI: Implications and Applications for Education | Anastasia Olga Tzirides, Akash Saini, Gabriela Zapata, Duane Searsmith, Bill Cope, Mary Kalantzis, Vania Castro, Theodora Kourkoulou, John Jones, Rodrigo Abrantes da Silva, Jen Whiting, Nikoleta Polyxeni Kastania | cs.CY, cs.AI | 2023-05-12 |
Two-in-One: A Model Hijacking Attack Against Text Generation Models | Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem | cs.CR, cs.CL, cs.LG | 2023-05-12 |
Vision-Language Models in Remote Sensing: Current Progress and Future Trends | Congcong Wen, Yuan Hu, Xiang Li, Zhenghang Yuan, Xiao Xiang Zhu | cs.CV, cs.AI | 2023-05-09 |
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese | Doanh C. Bui, Nghia Hieu Nguyen, Khang Nguyen | cs.CV, cs.CL | 2023-05-07 |
Simulating H.P. Lovecraft horror literature with the ChatGPT large language model | Eduardo C. Garrido-Merchán, José Luis Arroyo-Barrigüete, Roberto Gozalo-Brihuela | cs.CL | 2023-05-05 |
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oğuz, Yashar Mehdad, Wen-tau Yih | cs.CV, cs.CL | 2023-05-04 |
Image Captioners Sometimes Tell More Than Images They See | Honori Udo, Takafumi Koshinaka | cs.CV, cs.MM | 2023-05-04 |
Governance of the AI, by the AI, and for the AI | Andrew W. Torrance, Bill Tomlinson | cs.CY, cs.AI | 2023-05-04 |
Controlled Text Generation with Natural Language Instructions | Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan | cs.CL, cs.AI, cs.LG | 2023-04-27 |
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang | cs.CV, cs.CL, cs.LG | 2023-04-26 |
RenderDiffusion: Text Generation as Image Generation | Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen | cs.CL, cs.CV, cs.LG | 2023-04-25 |
Token Imbalance Adaptation for Radiology Report Generation | Yuexin Wu, I-Chan Huang, Xiaolei Huang | cs.CL, cs.AI | 2023-04-18 |
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu | cs.LG, cs.CL, cs.CV, cs.MM, eess.AS | 2023-04-17 |
Improving Diffusion Models for Scene Text Editing with Dual Encoders | Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang | cs.CV, cs.AI | 2023-04-12 |
ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment | Eslam Mohamed Bakr, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny | cs.CV, cs.AI, cs.LG | 2023-04-10 |
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions | Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny | cs.CV, cs.AI | 2023-04-09 |
Opinion Mining from YouTube Captions Using ChatGPT: A Case Study of Street Interviews Polling the 2023 Turkish Elections | Tuğrulcan Elmas, İlker Gül | cs.SI, cs.CY | 2023-04-07 |
DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model | Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun | cs.CV, cs.AI | 2023-04-06 |
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data | Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza | cs.CV, cs.CL | 2023-04-04 |
Cross-Domain Image Captioning with Discriminative Finetuning | Roberto Dessì, Michele Bevilacqua, Eleonora Gualdoni, Nathanael Carraz Rakotonirina, Francesca Franzon, Marco Baroni | cs.CV, cs.AI, cs.CL | 2023-04-04 |
Can AI Put Gamma-Ray Astrophysicists Out of a Job? | Samuel T. Spencer, Vikas Joshi, Alison M. W. Mitchell | physics.pop-ph, astro-ph.HE, cs.CL | 2023-03-31 |
Prefix tuning for automated audio captioning | Minkyu Kim, Kim Sung-Bin, Tae-Hyun Oh | eess.AS, cs.MM, cs.SD | 2023-03-30 |
GPT is becoming a Turing machine: Here are some ways to program it | Ana Jojic, Zhen Wang, Nebojsa Jojic | cs.CL | 2023-03-25 |
CoBIT: A Contrastive Bi-directional Image-Text Generation Model | Haoxuan You, Mandy Guo, Zhecan Wang, Kai-Wei Chang, Jason Baldridge, Jiahui Yu | cs.CV, cs.CL | 2023-03-23 |
Open-Vocabulary Object Detection using Pseudo Caption Labels | Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh | cs.CV, cs.AI | 2023-03-23 |
HIVE: Harnessing Human Feedback for Instructional Visual Editing | Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu | cs.CV, cs.AI, cs.CL, cs.HC, cs.LG | 2023-03-16 |
Text-to-image Diffusion Model in Generative AI: A Survey | Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon | cs.CV, cs.AI, cs.LG | 2023-03-14 |
Diffusion Models in NLP: A Survey | Yuansong Zhu, Yu Zhao | cs.CL, cs.AI | 2023-03-14 |
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, Yaowei Wang, David A. Clifton | cs.CL, cs.AI, cs.CV | 2023-03-11 |
Describe me an Aucklet: Generating Grounded Perceptual Category Descriptions | Bill Noble, Nikolai Ilinykh | cs.CL | 2023-03-07 |
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training | Wei Li, Linchao Zhu, Longyin Wen, Yi Yang | cs.CV, cs.AI, cs.CL | 2023-03-06 |
Interactive Text Generation | Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan | cs.CL | 2023-03-02 |
Few-Shot Table-to-Text Generation with Prompt-based Adapter | Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang | cs.CL | 2023-02-24 |
Improved Training of Mixture-of-Experts Language GANs | Yekun Chai, Qiyue Yin, Junge Zhang | cs.CL | 2023-02-23 |
Improving User Controlled Table-To-Text Generation Robustness | Hanxu Hu, Yunqing Liu, Zhongyi Yu, Laura Perez-Beltrachini | cs.CL | 2023-02-20 |
Large Scale Multi-Lingual Multi-Modal Summarization Dataset | Yash Verma, Anubhav Jangra, Raghvendra Kumar, Sriparna Saha | cs.CL, cs.MM | 2023-02-13 |
Plan-then-Seam: Towards Efficient Table-to-Text Generation | Liang Li, Ruiying Geng, Chengyang Fang, Bing Li, Can Ma, Binhua Li, Yongbin Li | cs.CL | 2023-02-10 |
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning | Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar | cs.CV, cs.AI, cs.CL, cs.IR, cs.LG | 2023-02-09 |
Few-Shot Table-to-Text Generation with Prompt Planning and Knowledge Memorization | Zhixin Guo, Minyxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Zhouhan Lin, Guanjie Zheng, Xinbing Wang | cs.CL, cs.AI | 2023-02-09 |
Adversarial Prompting for Black Box Foundation Models | Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner | cs.LG | 2023-02-08 |
GPTScore: Evaluate as You Desire | Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu | cs.CL | 2023-02-08 |
Grounding Language Models to Images for Multimodal Generation | Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried | cs.CL, cs.AI, cs.CV, cs.LG | 2023-01-31 |
Semi-Parametric Video-Grounded Text Generation | Sungdong Kim, Jin-Hwa Kim, Jiyoung Lee, Minjoon Seo | cs.CV, cs.CL, cs.LG | 2023-01-27 |
Explaining Visual Biases as Words by Generating Captions | Younghyun Kim, Sangwoo Mo, Minkyu Kim, Kyungmin Lee, Jaeho Lee, Jinwoo Shin | cs.LG, cs.CV | 2023-01-26 |
MTTN: Multi-Pair Text to Text Narratives for Prompt Generation | Archan Ghosh, Debgandhar Ghosh, Madhurima Maji, Suchinta Chanda, Kalporup Goswami | cs.CL, cs.LG | 2023-01-21 |
Regeneration Learning: A Learning Paradigm for Data Generation | Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio | cs.LG, cs.AI, cs.CL, cs.CV, eess.AS | 2023-01-21 |
Universal Multimodal Representation for Language Understanding | Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao | cs.CL, cs.AI, cs.CV | 2023-01-09 |
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) | Rana Adnan Ahmad, Muhammad Azhar, Hina Sattar | cs.CV, cs.AI | 2023-01-06 |
Towards Table-to-Text Generation with Pretrained Language Model: A Table Structure Understanding and Text Deliberating Approach | Miao Chen, Xinjiang Lu, Tong Xu, Yanyan Li, Jingbo Zhou, Dejing Dou, Hui Xiong | cs.CL, cs.AI | 2023-01-05 |
eVAE: Evolutionary Variational Autoencoder | Zhangkai Wu, Longbing Cao, Lei Qi | cs.NE, cs.LG | 2023-01-01 |
MAUVE Scores for Generative Models: Theory and Practice | Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui | cs.LG, cs.AI, cs.CL | 2022-12-30 |
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning | Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh | cs.CV, cs.AI | 2022-12-27 |
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective | Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang | cs.AI, cs.LG | 2022-12-24 |
Do DALL-E and Flamingo Understand Each Other? | Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp | cs.CV, cs.LG | 2022-12-23 |
A survey on text generation using generative adversarial networks | Gustavo Henrique de Rosa, João Paulo Papa | cs.CL, cs.AI, cs.LG | 2022-12-20 |
SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers | Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang | cs.CL | 2022-12-20 |
One Embedder, Any Task: Instruction-Finetuned Text Embeddings | Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu | cs.CL | 2022-12-19 |
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning | Ukyo Honda, Taro Watanabe, Yuji Matsumoto | cs.CV, cs.CL | 2022-12-06 |
Towards Generating Diverse Audio Captions via Adversarial Training | Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang | eess.AS, cs.AI, cs.MM, cs.SD | 2022-12-05 |
Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation | Faeze Brahman, Baolin Peng, Michel Galley, Sudha Rao, Bill Dolan, Snigdha Chaturvedi, Jianfeng Gao | cs.CL | 2022-12-04 |
Learning Automata-Based Task Knowledge Representation from Large-Scale Generative Language Models | Yunhao Yang, Jean-Raphaël Gaglione, Ufuk Topcu | cs.FL, cs.CL | 2022-12-04 |
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation | Zutao Jiang, Guangsong Lu, Xiaodan Liang, Jihua Zhu, Wei Zhang, Xiaojun Chang, Hang Xu | cs.CV, cs.AI | 2022-12-02 |
On the Importance of Image Encoding in Automated Chest X-Ray Report Generation | Otabek Nazarov, Mohammad Yaqub, Karthik Nandakumar | cs.CV, cs.AI | 2022-11-24 |
Retrieval-Augmented Multimodal Language Modeling | Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih | cs.CV, cs.CL, cs.LG | 2022-11-22 |
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning | Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun | cs.CV, cs.LG | 2022-11-22 |
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation | Jie Ruan, Yue Wu, Xiaojun Wan, Yuesheng Zhu | cs.CV, cs.CL | 2022-11-20 |
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired | Kazuya Ohata, Shunsuke Kitada, Hitoshi Iyatomi | cs.CV, cs.AI, cs.CL, cs.HC, cs.LG | 2022-11-17 |
PromptCap: Prompt-Guided Task-Aware Image Captioning | Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo | cs.CV, cs.CL | 2022-11-15 |
CCPrompt: Counterfactual Contrastive Prompt-Tuning for Many-Class Classification | Yang Li, Canran Xu, Tao Shen, Jing Jiang, Guodong Long | cs.CL | 2022-11-11 |
Self-conditioned Embedding Diffusion for Text Generation | Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond | cs.CL, cs.LG | 2022-11-08 |
Semantic Metadata Extraction from Dense Video Captioning | Johannes Scherer, Ansgar Scherp, Deepayan Bhowmik | cs.CV, cs.CL | 2022-11-05 |
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers | Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu | cs.CV, cs.LG | 2022-11-02 |
CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation | Yihong Dong, Ge Li | cs.SE, cs.AI | 2022-11-02 |
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control | Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov | cs.CL, cs.LG | 2022-10-31 |
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention | Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang | eess.AS, cs.AI, cs.MM, cs.SD | 2022-10-28 |
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards | Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen, Emily Tsai, Omar Almusa, Curtis P. Langlotz | cs.CL, cs.AI | 2022-10-21 |
Image Semantic Relation Generation | Mingzhe Du | cs.CV, cs.CL | 2022-10-19 |
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu | cs.CL, cs.AI | 2022-10-19 |
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective | Zheng Ma, Shi Zong, Mianzhi Pan, Jianbing Zhang, Shujian Huang, Xinyu Dai, Jiajun Chen | cs.CL | 2022-10-18 |
UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image | Dani Valevski, Matan Kalman, Yossi Matias, Yaniv Leviathan | cs.CV, cs.GR, cs.LG | 2022-10-17 |
Social Biases in Automatic Evaluation Metrics for NLG | Mingqi Gao, Xiaojun Wan | cs.CL, cs.AI | 2022-10-17 |
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training | Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung | cs.CL, cs.CV | 2022-10-14 |
Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models | Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das | cs.LG, cs.CL | 2022-10-13 |
Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis | Wenda Xu, Yilin Tuan, Yujie Lu, Michael Saxon, Lei Li, William Yang Wang | cs.CL, cs.AI | 2022-10-10 |
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning | Shitong Xu | cs.CV, cs.LG | 2022-10-10 |
ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models | Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu | cs.CL | 2022-10-09 |
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation | Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang | cs.CL, cs.AI | 2022-10-07 |
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters | Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth | cs.CL | 2022-10-07 |
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals | Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, Richard Evans | cs.HC, cs.CL | 2022-09-29 |
XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages | Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma | cs.CL | 2022-09-22 |
Distribution Aware Metrics for Conditional Natural Language Generation | David M Chan, Yiming Ni, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, John Canny | cs.CL, cs.AI, cs.CV, cs.LG | 2022-09-15 |
PaLI: A Jointly-Scaled Multilingual Language-Image Model | Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut | cs.CV, cs.CL | 2022-09-14 |
Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows | Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori | cs.CL, cs.AI | 2022-09-13 |
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation | Peining Zhang, Junliang Guo, Linli Xu, Mu You, Junming Yin | cs.SD, cs.CL, eess.AS | 2022-09-05 |
Every picture tells a story: Image-grounded controllable stylistic story generation | Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung | cs.CL | 2022-09-04 |
Understanding Attention for Vision-and-Language Tasks | Feiqi Cao, Soyeon Caren Han, Siqu Long, Changwei Xu, Josiah Poon | cs.CV, cs.CL | 2022-08-17 |
ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model | Rao Fu, Xiao Zhan, Yiwen Chen, Daniel Ritchie, Srinath Sridhar | cs.CV, cs.AI | 2022-07-19 |
A Baseline for Detecting Out-of-Distribution Examples in Image Captioning | Gabi Shalev, Gal-Lev Shalev, Joseph Keshet | cs.CV, cs.LG | 2022-07-12 |
Towards Multimodal Vision-Language Models Generating Non-Generic Text | Wes Robbins, Zanyar Zohourianshahzadi, Jugal Kalita | cs.CV, cs.AI | 2022-07-09 |
Dual-Stream Transformer for Generic Event Boundary Captioning | Xin Gu, Hanhua Ye, Guang Chen, Yufei Wang, Libo Zhang, Longyin Wen | cs.CV, cs.CL | 2022-07-07 |
Syntax Controlled Knowledge Graph-to-Text Generation with Order and Semantic Consistency | Jin Liu, Chongfeng Fan, Fengyu Zhou, Huijuan Xu | cs.AI | 2022-07-02 |
Automatic Controllable Product Copywriting for E-Commerce | Xiaojie Guo, Qingkai Zeng, Meng Jiang, Yun Xiao, Bo Long, Lingfei Wu | cs.AI, cs.LG | 2022-06-21 |
niksss at HinglishEval: Language-agnostic BERT-based Contextual Embeddings with Catboost for Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text | Nikhil Singh | cs.CL | 2022-06-17 |
Prefix Language Models are Unified Modal Learners | Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang | cs.CV, cs.CL, cs.LG | 2022-06-15 |
Exploring industrial safety knowledge via Zipf law | Zhenhua Wang, Ming Ren, Dong Gao, Zhuang Li | cs.CL | 2022-05-25 |
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | Gi-Cheon Kang, Sungdong Kim, Jin-Hwa Kim, Donghyun Kwak, Byoung-Tak Zhang | cs.CV, cs.CL, cs.LG | 2022-05-25 |
Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh | cs.CL, cs.AI, cs.CV, cs.LG | 2022-05-24 |
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasovic | cs.CL, cs.CV | 2022-05-24 |
What Makes Data-to-Text Generation Hard for Pretrained Language Models? | Moniba Keymanesh, Adrian Benton, Mark Dredze | cs.CL, cs.AI, cs.IR, cs.LG | 2022-05-23 |
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji | cs.CV, cs.AI | 2022-05-22 |
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics | Elisa Kreiss, Cynthia Bennett, Shayan Hooshmand, Eric Zelikman, Meredith Ringel Morris, Christopher Potts | cs.CL | 2022-05-21 |
It Isn’t Sh!tposting, It’s My CAT Posting | Parthsarthi Rawat, Sayan Das, Jorge Aguirre, Akhil Daphara | cs.CV, cs.AI, cs.LG | 2022-05-18 |
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training | Constantin Seibold, Simon Reiß, M. Saquib Sarfraz, Rainer Stiefelhagen, Jens Kleesiek | cs.CV, cs.LG | 2022-05-14 |
Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning | Fei Wang, Zhewei Xu, Pedro Szekely, Muhao Chen | cs.CL, cs.AI, cs.LG | 2022-05-08 |
RoViST: Learning Robust Metrics for Visual Storytelling | Eileen Wang, Caren Han, Josiah Poon | cs.CV, cs.AI | 2022-05-08 |
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information | Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang | cs.CL, cs.CV, cs.MM | 2022-05-07 |
Language Models Can See: Plugging Visual Controls in Text Generation | Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier | cs.CV, cs.CL | 2022-05-05 |
Diverse Image Captioning with Grounded Style | Franz Klein, Shweta Mahajan, Stefan Roth | cs.CV, cs.LG | 2022-05-03 |
Cross-modal Memory Networks for Radiology Report Generation | Zhihong Chen, Yaling Shen, Yan Song, Xiang Wan | cs.CL | 2022-04-28 |
Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR) | Amelie Wührl, Roman Klinger | cs.CL, cs.IR | 2022-04-21 |
Evaluating Mixed-initiative Conversational Search Systems via User Simulation | Ivan Sekulić, Mohammad Aliannejadi, Fabio Crestani | cs.CL, cs.IR | 2022-04-17 |
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures | Giovanni Bonetta, Matteo Ribero, Rossella Cancelliere | cs.CL, cs.AI | 2022-04-11 |
Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention | Yu Yang, Seungbae Kim, Jungseock Joo | cs.CV, cs.AI, cs.LG | 2022-04-10 |
On Distinctive Image Captioning via Comparing and Reweighting | Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan | cs.CV, cs.AI | 2022-04-08 |
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations | Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata | cs.CV, cs.CL | 2022-04-05 |
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Andy Zeng, Adrian Wong, Stefan Welker, Krzysztof Choromanski, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence | cs.CV, cs.AI, cs.CL, cs.LG | 2022-04-01 |
Neural Pipeline for Zero-Shot Data-to-Text Generation | Zdeněk Kasner, Ondřej Dušek | cs.CL | 2022-03-30 |
GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models | Changye Li, David Knopman, Weizhe Xu, Trevor Cohen, Serguei Pakhomov | cs.CL | 2022-03-25 |
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization | Shankar Kanthara, Rixie Tiffany Ko Leong, Xiang Lin, Ahmed Masry, Megh Thakkar, Enamul Hoque, Shafiq Joty | cs.CL | 2022-03-12 |
Compilable Neural Code Generation with Compiler Feedback | Xin Wang, Yasheng Wang, Yao Wan, Fei Mi, Yitong Li, Pingyi Zhou, Jin Liu, Hao Wu, Xin Jiang, Qun Liu | cs.CL, cs.AI, cs.PL | 2022-03-10 |
How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity | Chengyue Gong, Lemeng Wu, Qiang Liu | cs.LG, cs.CV | 2022-02-16 |
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation | Ahmad Hammoudeh, Bastein Vanderplaetse, Stéphane Dupont | cs.CV, cs.AI | 2022-02-11 |
Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang | cs.CV, cs.CL | 2022-02-07 |
XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages | Tushar Abhishek, Shivprasad Sagare, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma | cs.CL | 2022-02-01 |
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment | Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas, Noel E. O’Connor | cs.CV, cs.LG | 2022-01-25 |
Pre-Trained Language Transformers are Universal Image Classifiers | Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy | cs.CV, cs.AI | 2022-01-25 |
An Integrated Approach for Video Captioning and Applications | Soheyla Amirian, Thiab R. Taha, Khaled Rasheed, Hamid R. Arabnia | cs.CV, cs.AI | 2022-01-23 |
Inferring Commonsense Explanations as Prompts for Future Event Generation | Li Lin, Yixin Cao, Lifu Huang, Shuang Li, Xuming Hu, Lijie Wen, Jianmin Wang | cs.CL, cs.LG, I.2.7; I.2.4 | 2022-01-18 |
Local Information Assisted Attention-free Decoder for Audio Captioning | Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Haiyan Lan, Wenwu Wang | cs.SD, cs.LG, eess.AS | 2022-01-10 |
Self-Training Vision Language BERTs with a Unified Conditional Model | Xiaofeng Yang, Fengmao Lv, Fayao Liu, Guosheng Lin | cs.CV, cs.CL | 2022-01-06 |
Compact Bidirectional Transformer for Image Captioning | Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang | cs.CV, cs.CL | 2022-01-06 |
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams | Chengxi Li, Brent Harrison | cs.CV, cs.AI, cs.CL | 2022-01-04 |
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang | cs.CV, cs.CL | 2021-12-31 |
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment | Shuxin Yang, Xian Wu, Shen Ge, Xingwang Wu, S. Kevin Zhou, Li Xiao | eess.IV, cs.CL, cs.CV | 2021-12-30 |
Automatic Product Copywriting for E-Commerce | Xueying Zhang, Yanyan Zou, Hainan Zhang, Jing Zhou, Shiliang Diao, Jiajia Chen, Zhuoye Ding, Zhen He, Xueqi He, Yun Xiao, Bo Long, Han Yu, Lingfei Wu | cs.CL, cs.AI | 2021-12-15 |
Contextualized Scene Imagination for Generative Commonsense Reasoning | PeiFeng Wang, Jonathan Zamora, Junfeng Liu, Filip Ilievski, Muhao Chen, Xiang Ren | cs.CL | 2021-12-12 |
Improving Logical-Level Natural Language Generation with Topic-Conditioned Data Augmentation and Logical Form Generation | Ao Liu, Congjian Luo, Naoaki Okazaki | cs.CL | 2021-12-12 |
Show and Write: Entity-aware News Generation with Image Information | Zhongping Zhang, Yiwen Gu, Bryan A. Plummer | cs.CL | 2021-12-11 |
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation | Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang | cs.CV, cs.CL, cs.LG | 2021-12-10 |
Self-Supervised Image-to-Text and Text-to-Image Synthesis | Anindya Sundar Das, Sriparna Saha | cs.CV, cs.CL, cs.LG | 2021-12-09 |
Search and Learn: Improving Semantic Coverage for Data-to-Text Generation | Shailza Jolly, Zi Xuan Zhang, Andreas Dengel, Lili Mou | cs.CL | 2021-12-06 |
Protecting Intellectual Property of Language Generation APIs with Lexical Watermark | Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang | cs.CR, cs.CL | 2021-12-05 |
Representation Learning for Conversational Data using Discourse Mutual Information Maximization | Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal | cs.CL | 2021-12-04 |
LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training | Ningyu Zhang, Hongbin Ye, Jiacheng Yang, Shumin Deng, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang, Huajun Chen | cs.CL, cs.AI | 2021-12-02 |
Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation | Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi | cs.CV, cs.CL, cs.LG | 2021-12-01 |
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf | cs.CV, cs.AI, cs.CL | 2021-11-29 |
LAFITE: Towards Language-Free Training for Text-to-Image Generation | Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun | cs.CV, cs.LG | 2021-11-27 |
Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences | Moritz Ibing, Gregor Kobsik, Leif Kobbelt | cs.CV, cs.GR, cs.LG | 2021-11-24 |
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan | cs.CV, cs.AI | 2021-11-24 |
Scaling Up Vision-Language Pre-training for Image Captioning | Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang | cs.CV, cs.CL | 2021-11-24 |
L-Verse: Bidirectional Generation Between Image and Text | Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae | cs.CV, cs.CL, cs.LG | 2021-11-22 |