In this paper, we propose an unsupervised prompt learning method to improve Generalization of Image Captioning (GeneIC), which learns a domain-specific prompt vector for the target domain without requiring annotated data by aligning visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
In this work, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator that generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements.
A deep learning library to perform inference in pure C++. Models in ONNX format can be converted to a simple format compatible with the library.
With industrial partners, we have developed various technologies for AI-Generated Content (AIGC), including Text-Image-to-Video Generation and Image-Video-to-Text Generation.
We have developed various technologies for AI-based image and video compression, including NNVC, JPEG-AI, Video Coding for Machines, and IEEE 1857.11.
Versatile Video Coding (VVC), also known as ITU-T H.266, is the most recent international video-compression standard of ITU-T and ISO/IEC.
Inspired by human vision, we have developed computational models that have achieved superior performance in various machine vision tasks.