AICoE Project

Project Name

Learning from Cross-Modality Data for Image Semantics Understanding, Description, Synthesis, & Manipulation

Project Goal

• Learning from data across modalities such as images and text typically requires proper data and label supervision. Learning AI models from cross-modality data without such supervision is a challenging yet practical problem to tackle.

• As a core technology project, we target four distinct yet mutually related vision-and-language tasks: novel object captioning, unsupervised image manipulation, image scene graph understanding & expansion, and semantics-guided image completion.


Project Description

1. Scene Graph (SG) Expansion

* Unknown semantic inference

* Self-attention and masked language modelling (a sketch follows below)
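
As an illustration of the masked-modelling idea, here is a minimal PyTorch sketch: a self-attention encoder is trained to recover a masked predicate of a scene-graph triplet from its subject and object. All names and sizes (SGMaskedModel, VOCAB_SIZE, MASK_ID, layer widths) are hypothetical placeholders, not the project's actual configuration.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # hypothetical: joint object + predicate vocabulary
MASK_ID = 0         # hypothetical: reserved [MASK] token id
D_MODEL = 128

class SGMaskedModel(nn.Module):
    """Self-attention encoder that predicts masked scene-graph tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, tokens):            # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))
        return self.head(h)               # logits over the SG vocabulary

# Usage: flatten (subject, predicate, object) triplets into sequences,
# mask the predicate, and train the model to recover it.
model = SGMaskedModel()
triplets = torch.randint(1, VOCAB_SIZE, (8, 3))   # toy batch of triplets
inputs = triplets.clone()
inputs[:, 1] = MASK_ID                            # hide the predicate
logits = model(inputs)
loss = nn.functional.cross_entropy(logits[:, 1, :], triplets[:, 1])
```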

2. Semantics-Guided Image Completion

* Scene graph to layout and layout to image

* Conditional graph convolutional network (GCN); see the sketch below
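
To make the scene-graph-to-layout step concrete, below is a minimal sketch of a GCN that propagates object-node features along scene-graph edges and regresses one layout box per object; a layout-to-image generator would then consume these boxes. The module names, feature sizes, and the mean-aggregation rule are illustrative assumptions, and the extra conditioning inputs of the project's conditional GCN are not modelled here.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution step: mean-aggregate neighbour features via A."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, feats, adj):        # feats: (N, d_in), adj: (N, N)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.linear(adj @ feats / deg))

class SGToLayout(nn.Module):
    """Map object-node embeddings to layout boxes (x, y, w, h)."""
    def __init__(self, d=64):
        super().__init__()
        self.gc1 = GraphConv(d, d)
        self.gc2 = GraphConv(d, d)
        self.box_head = nn.Linear(d, 4)   # one box per object node

    def forward(self, feats, adj):
        h = self.gc2(self.gc1(feats, adj), adj)
        return torch.sigmoid(self.box_head(h))  # boxes in [0, 1] coords

# Usage: 5 object nodes connected by toy scene-graph relations.
feats = torch.randn(5, 64)
adj = torch.eye(5) + torch.rand(5, 5).round()  # self-loops + toy edges
boxes = SGToLayout()(feats, adj)               # (5, 4) layout boxes
```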

3. Novel Object Captioning

* Pseudo-captions and self-retrieval cycle consistency (sketched below)

* Self-attention
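
The self-retrieval cycle consistency can be sketched as a retrieval loss in a joint embedding space: a pseudo-caption generated for an image should retrieve that same image among the batch, and vice versa. The function below is a minimal, assumed formulation (a symmetric InfoNCE-style loss); the encoders producing img_emb and cap_emb, and the temperature value, are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def self_retrieval_loss(img_emb, cap_emb, temperature=0.07):
    """Cycle consistency via retrieval: each pseudo-caption should
    retrieve the image it was generated from (the diagonal of the
    similarity matrix), and each image its own caption."""
    img = F.normalize(img_emb, dim=-1)
    cap = F.normalize(cap_emb, dim=-1)
    sim = img @ cap.t() / temperature              # (B, B) similarities
    targets = torch.arange(sim.size(0))            # matched pairs on diagonal
    return (F.cross_entropy(sim, targets) +        # image -> caption
            F.cross_entropy(sim.t(), targets)) / 2 # caption -> image

# Usage with toy embeddings standing in for encoder outputs.
img_emb = torch.randn(8, 256)   # image features
cap_emb = torch.randn(8, 256)   # features of generated pseudo-captions
loss = self_retrieval_loss(img_emb, cap_emb)
```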

4. Text-to-Image Manipulation

* Learning how to modify images

* Learning where to modify images (see the sketch after this list)
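
A minimal sketch of this how/where decomposition, under assumed inputs: a text-conditioned branch predicts a spatial mask ("where") and a residual edit ("how"), and the two are blended so only the masked regions change. TextGuidedEditor, the convolutional heads, and the feature sizes are illustrative placeholders rather than the project's actual architecture.

```python
import torch
import torch.nn as nn

class TextGuidedEditor(nn.Module):
    """'Where' = a spatial mask from image/text features;
    'How' = a text-conditioned residual edit, blended by the mask."""
    def __init__(self, c_img=3, d_text=64):
        super().__init__()
        self.where = nn.Conv2d(c_img + d_text, 1, kernel_size=3, padding=1)
        self.how = nn.Conv2d(c_img + d_text, c_img, kernel_size=3, padding=1)

    def forward(self, image, text_emb):
        # Broadcast the sentence embedding over all spatial locations.
        b, _, h, w = image.shape
        t = text_emb[:, :, None, None].expand(b, -1, h, w)
        x = torch.cat([image, t], dim=1)
        mask = torch.sigmoid(self.where(x))      # where to modify
        edit = torch.tanh(self.how(x))           # how to modify
        return mask * edit + (1 - mask) * image  # edit only masked regions

# Usage: a toy 64x64 image batch and a sentence embedding.
image = torch.rand(2, 3, 64, 64)
text_emb = torch.randn(2, 64)
edited = TextGuidedEditor()(image, text_emb)     # (2, 3, 64, 64)
```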