In this paper, we propose an unsupervised prompt learning method to improve Generalization of Image Captioning (GeneIC), which learns a domain-specific prompt vector for the target domain without requiring annotated data by aligning visual and language modalities with a pre-trained Contrastive Language-Image Pre-Training (CLIP) model.
In this work, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator that generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements.
A deep learning library to perform inference in pure C++. Models in ONNX format can be converted to a simple format compatible with the library.
Inspired by human vision, we have developed computational models that have achieved superior performance in various machine vision tasks.
With industrial partners, we have developed various technologies for AI-Generated Content (AIGC), including Text-Image-to-Video Generation and Image-Video-to-Text Generation.
We have developed various technologies for AI-based image and video compression, including NNVC, JPEG-AI, Video Coding for Machines, and IEEE 1857.11.
Versatile Video Coding (VVC), also known as ITU-T H.266, is the most recent international video-compression standard of ITU-T and ISO/IEC.