Image2text
20 Nov 2021 (written on 2024-11-28)
Title | Authors | Categories | Date |
---|---|---|---|
RDF-to-Text Generation with Reinforcement Learning Based Graph-augmented Structural Neural Encoders | Hanning Gao, Lingfei Wu, Po Hu, Zhihua Wei, Fangli Xu, Bo Long | cs.CL, cs.AI | 2021-11-20 |
Transparent Human Evaluation for Image Captioning | Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith | cs.CL, cs.CV | 2021-11-17 |
Explaining Face Presentation Attack Detection Using Natural Language | Hengameh Mirzaalian, Mohamed E. Hussein, Leonidas Spinoulas, Jonathan May, Wael Abd-Almageed | cs.CV, cs.AI, cs.CL, cs.CR | 2021-11-08 |
Machine-in-the-Loop Rewriting for Creative Image Captioning | Vishakh Padmakumar, He He | cs.CL | 2021-11-07 |
Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning | Yang Yang, Hongchen Wei, Hengshu Zhu, Dianhai Yu, Hui Xiong, Jian Yang | cs.CV, cs.AI | 2021-10-22 |
SciCap: Generating Captions for Scientific Figures | Ting-Yao Hsu, C. Lee Giles, Ting-Hao ‘Kenneth’ Huang | cs.CL, cs.AI, cs.CV | 2021-10-22 |
SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation | Hong Chen, Hiroya Takamura, Hideki Nakayama | cs.CL | 2021-10-20 |
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation | Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu | cs.CV, cs.CL, cs.MM | 2021-10-19 |
Unifying Multimodal Transformer for Bi-directional Image and Text Generation | Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu | cs.CV, cs.CL, cs.MM | 2021-10-19 |
Self-Annotated Training for Controllable Image Captioning | Zhangzi Zhu, Tianlei Wang, Hong Qu | cs.AI, cs.CV | 2021-10-16 |
How Well Do You Know Your Audience? Reader-aware Question Generation | Ian Stewart, Rada Mihalcea | cs.CL, I.7 | 2021-10-16 |
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation | Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero | cs.CV, cs.AI, 68T07, I.2.10 | 2021-10-06 |
CIDEr-R: Robust Consensus-based Image Description Evaluation | Gabriel Oliveira dos Santos, Esther Luna Colombini, Sandra Avila | cs.CV, cs.CL | 2021-09-28 |
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation | An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu | cs.CL | 2021-09-25 |
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog | Xingyao Wang, David Jurgens | cs.CL, cs.CV, cs.CY | 2021-09-24 |
Style Control for Schema-Guided Natural Language Generation | Alicia Y. Tsai, Shereen Oraby, Vittorio Perera, Jiun-Yu Kao, Yuheng Du, Anjali Narayan-Chen, Tagyoung Chung, Dilek Hakkani-Tur | cs.CL | 2021-09-24 |
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei | cs.CL, cs.CV | 2021-09-21 |
PluGeN: Multi-Label Conditional Generation From Pre-Trained Models | Maciej Wołczyk, Magdalena Proszewska, Łukasz Maziarka, Maciej Zięba, Patryk Wielopolski, Rafał Kurczab, Marek Śmieja | cs.LG | 2021-09-18 |
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning | Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Joonmo Kim, Moongu Jeon | cs.CV, cs.AI | 2021-09-16 |
UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation | Zhengkun Zhang, Xiaojun Meng, Yasheng Wang, Xin Jiang, Qun Liu, Zhenglu Yang | cs.CL | 2021-09-13 |
COSMic: A Coherence-Aware Generation Metric for Image Descriptions | Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone, Malihe Alikhani | cs.CL, cs.AI, cs.CV, cs.LG | 2021-09-11 |
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models | Steven Y. Feng, Kevin Lu, Zhuofu Tao, Malihe Alikhani, Teruko Mitamura, Eduard Hovy, Varun Gangal | cs.CL, cs.AI, cs.LG | 2021-09-08 |
Sequence Level Contrastive Learning for Text Summarization | Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei | cs.CL | 2021-09-08 |
Multimodal Conditionality for Natural Language Generation | Michael Sollami, Aashish Jain | cs.CL, cs.LG | 2021-09-02 |
Goal-driven text descriptions for images | Ruotian Luo | cs.CV, cs.CL | 2021-08-28 |
Automatic Text Evaluation through the Lens of Wasserstein Barycenters | Pierre Colombo, Guillaume Staerman, Chloe Clavel, Pablo Piantanida | cs.CL, cs.AI | 2021-08-27 |
CGEMs: A Metric Model for Automatic Code Generation using GPT-3 | Aishwarya Narasimhan, Krishna Prasad Agara Venkatesha Rao, Veena M B | cs.AI | 2021-08-23 |
Group-based Distinctive Image Captioning with Memory Attention | Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan | cs.CV, cs.CL, cs.LG | 2021-08-20 |
CIGLI: Conditional Image Generation from Language & Image | Xiaopeng Lu, Lynnette Ng, Jared Fernandez, Hao Zhu | cs.CV, cs.CL | 2021-08-20 |
Table Caption Generation in Scholarly Documents Leveraging Pre-trained Language Models | Junjie H. Xu, Kohei Shinden, Makoto P. Kato | cs.CL | 2021-08-18 |
Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards | Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite | cs.DB, cs.CL | 2021-08-16 |
AutoChart: A Dataset for Chart-to-Text Generation Task | Jiawen Zhu, Jinye Ran, Roy Ka-wei Lee, Kenny Choo, Zhi Li | cs.CL, cs.AI, cs.MM | 2021-08-16 |
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation | Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang | cs.CL, cs.IR | 2021-08-15 |
Generating Diverse Descriptions from Semantic Graphs | Jiuzhou Han, Daniel Beck, Trevor Cohn | cs.CL | 2021-08-12 |
ICECAP: Information Concentrated Entity-aware Image Captioning | Anwen Hu, Shizhe Chen, Qin Jin | cs.CV, cs.MM | 2021-08-04 |
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation | Zaid Khan, Yun Fu | cs.CL, cs.CV | 2021-08-03 |
Logic-Consistency Text Generation from Semantic Parses | Chang Shu, Yusen Zhang, Xiangyu Dong, Peng Shi, Tao Yu, Rui Zhang | cs.CL | 2021-08-02 |
Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining | Guowei Xu, Wenbiao Ding, Weiping Fu, Zhongqin Wu, Zitao Liu | cs.CL, cs.AI | 2021-07-15 |
From Show to Tell: A Survey on Image Captioning | Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara | cs.CV, cs.CL | 2021-07-14 |
Between Flexibility and Consistency: Joint Generation of Captions and Subtitles | Alina Karakanta, Marco Gaido, Matteo Negri, Marco Turchi | cs.CL | 2021-07-13 |
Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals | Guillaume Cabanac, Cyril Labbé, Alexander Magazinov | cs.DL, cs.CL, cs.CY, cs.IR | 2021-07-12 |
Structured Denoising Diffusion Models in Discrete State-Spaces | Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, Rianne van den Berg | cs.LG, cs.AI, cs.CL, cs.CV | 2021-07-07 |
Don’t Take It Literally: An Edit-Invariant Sequence Loss for Text Generation | Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Zhen Li, Bowen Zhou, Shuguang Cui, Zhiting Hu | cs.CL, cs.AI | 2021-06-29 |
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning | Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Kyomin Jung | cs.CL, cs.CV | 2021-06-26 |
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation | Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang | cs.CL, cs.AI, cs.CV | 2021-06-10 |
Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation | Peng Wang, Junyang Lin, An Yang, Chang Zhou, Yichang Zhang, Jingren Zhou, Hongxia Yang | cs.CL | 2021-05-31 |
Dependent Multi-Task Learning with Causal Intervention for Image Captioning | Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin | cs.LG, cs.CV, eess.IV | 2021-05-18 |
Multi-Modal Image Captioning for the Visually Impaired | Hiba Ahsan, Nikita Bhalla, Daivat Bhatt, Kaivankumar Shah | cs.CL | 2021-05-17 |
Passage Retrieval for Outside-Knowledge Visual Question Answering | Chen Qu, Hamed Zamani, Liu Yang, W. Bruce Croft, Erik Learned-Miller | cs.IR | 2021-05-09 |
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks | Maxime Kayser, Oana-Maria Camburu, Leonard Salewski, Cornelius Emde, Virginie Do, Zeynep Akata, Thomas Lukasiewicz | cs.CV, cs.CL, cs.LG | 2021-05-08 |
Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning | Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto | cs.CL, cs.CV | 2021-04-28 |
MusCaps: Generating Captions for Music Audio | Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas | cs.SD, cs.CL, cs.LG, eess.AS | 2021-04-24 |
Imaginative Walks: Generative Random Walk Deviation Loss for Improved Unseen Learning Representation | Mohamed Elhoseiny, Divyansh Jha, Kai Yi, Ivan Skorokhodov | cs.CV, cs.AI | 2021-04-20 |
Learning to Reason for Text Generation from Scientific Tables | Nafise Sadat Moosavi, Andreas Rücklé, Dan Roth, Iryna Gurevych | cs.CL | 2021-04-16 |
IGA: An Intent-Guided Authoring Assistant | Simeng Sun, Wenlong Zhao, Varun Manjunatha, Rajiv Jain, Vlad Morariu, Franck Dernoncourt, Balaji Vasan Srinivasan, Mohit Iyyer | cs.CL | 2021-04-14 |
Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning | Soheyla Amirian, Khaled Rasheed, Thiab R. Taha, Hamid R. Arabnia | cs.CV, cs.AI, cs.LG | 2021-04-07 |
On Hallucination and Predictive Uncertainty in Conditional Language Generation | Yijun Xiao, William Yang Wang | cs.CL | 2021-03-28 |
Relationship-based Neural Baby Talk | Fan Fu, Tingting Xie, Ioannis Patras, Sepehr Jalali | cs.CV, cs.AI | 2021-03-08 |
Controllable and Diverse Text Generation in E-commerce | Huajie Shao, Jun Wang, Haohong Lin, Xuezhou Zhang, Aston Zhang, Heng Ji, Tarek Abdelzaher | cs.LG | 2021-02-23 |
Progressive Transformer-Based Generation of Radiology Reports | Farhad Nooralahzadeh, Nicolas Perez Gonzalez, Thomas Frauenfelder, Koji Fujimoto, Michael Krauthammer | cs.CL | 2021-02-19 |
Annotation Cleaning for the MSR-Video to Text Dataset | Haoran Chen, Jianmin Li, Simone Frintrop, Xiaolin Hu | cs.CV, cs.LG, 68T45, 68T50, I.2.10; I.2.7 | 2021-02-12 |
SG2Caps: Revisiting Scene Graphs for Image Captioning | Subarna Tripathi, Kien Nguyen, Tanaya Guha, Bang Du, Truong Q. Nguyen | cs.CV, cs.CL | 2021-02-09 |
Controlling Hallucinations at Word Level in Data-to-Text Generation | Clément Rebuffel, Marco Roberti, Laure Soulier, Geoffrey Scoutheeten, Rossella Cancelliere, Patrick Gallinari | cs.CL, cs.AI, cs.LG, cs.NE, 68T50 (Primary), 68T07 (Secondary), 68T05, I.2.6; I.2.7 | 2021-02-04 |
Unifying Vision-and-Language Tasks via Text Generation | Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal | cs.CL, cs.AI, cs.CV, cs.LG | 2021-02-04 |
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics | Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou | cs.CL, cs.AI, cs.LG | 2021-02-02 |
Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search | Federico A. Galatolo, Mario G. C. A. Cimino, Gigliola Vaglini | cs.NE, cs.AI, cs.LG | 2021-02-02 |
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs | Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani | cs.CV, cs.CL | 2021-01-28 |
GUIGAN: Learning to Generate GUI Designs Using Generative Adversarial Networks | Tianming Zhao, Chunyang Chen, Yuanning Liu, Xiaodong Zhu | cs.HC, cs.CV, cs.LG, cs.SE | 2021-01-25 |
Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts | Dae Hyun Kim, Vidya Setlur, Maneesh Agrawala | cs.HC | 2021-01-20 |
Narration Generation for Cartoon Videos | Nikos Papasarantopoulos, Shay B. Cohen | cs.CL | 2021-01-17 |
Zero-shot Learning by Generating Task-specific Adapters | Qinyuan Ye, Xiang Ren | cs.CL, cs.LG | 2021-01-02 |
Neural Text Generation with Artificial Negative Examples | Keisuke Shirai, Kazuma Hashimoto, Akiko Eriguchi, Takashi Ninomiya, Shinsuke Mori | cs.CL, cs.AI | 2020-12-28 |
Few-Shot Text Generation with Pattern-Exploiting Training | Timo Schick, Hinrich Schütze | cs.CL, cs.LG | 2020-12-22 |
Fork or Fail: Cycle-Consistent Training with Many-to-One Mappings | Qipeng Guo, Zhijing Jin, Ziyu Wang, Xipeng Qiu, Weinan Zhang, Jun Zhu, Zheng Zhang, David Wipf | cs.LG, cs.AI, cs.CL | 2020-12-14 |
Video Generative Adversarial Networks: A Review | Nuha Aldausari, Arcot Sowmya, Nadine Marcus, Gelareh Mohammadi | cs.CV, cs.LG, eess.IV | 2020-11-04 |
Personalized Multimodal Feedback Generation in Education | Haochen Liu, Zitao Liu, Zhongqin Wu, Jiliang Tang | cs.CL, cs.AI | 2020-10-31 |
Fusion Models for Improved Visual Captioning | Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow | cs.CV, cs.AI, cs.CL, cs.LG | 2020-10-28 |
Safe Handover in Mixed-Initiative Control for Cyber-Physical Systems | Frederik Wiehr, Anke Hirsch, Florian Daiber, Antonio Kruger, Alisa Kovtunova, Stefan Borgwardt, Ernie Chang, Vera Demberg, Marcel Steinmetz, Jorg Hoffmann | cs.HC | 2020-10-21 |
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation | Yasuhide Miura, Yuhao Zhang, Curtis P. Langlotz, Dan Jurafsky | cs.CL | 2020-10-20 |
RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling | Jun Quan, Shian Zhang, Qian Cao, Zizhong Li, Deyi Xiong | cs.CL | 2020-10-17 |
Dissecting the components and factors of Neural Text Generation | Khyathi Raghavi Chandu, Alan W Black | cs.CL | 2020-10-14 |
Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays | Jianmo Ni, Chun-Nan Hsu, Amilcare Gentili, Julian McAuley | cs.CV, cs.CL | 2020-10-06 |
Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network | Junyi Li, Siqing Li, Wayne Xin Zhao, Gaole He, Zhicheng Wei, Nicholas Jing Yuan, Ji-Rong Wen | cs.CL, cs.AI | 2020-10-04 |