Changqian Yu
I build multimodal AI that sees, understands, and creates.
I lead Kling Image at Kuaishou Technology, overseeing all image foundation models powering Kling AI’s visual generation and understanding products. My research focuses on Diffusion Models, Vision-Language Models, and making AI see, think, and create. PhD from HUST (🏆 CSIG Top-10 Dissertation Award), Stanford’s Top 2% Scientists (3 consecutive years).
As Director of Kling Image, I oversee all image foundation models powering visual generation and understanding at scale. Current focus: unified multimodal models that bridge understanding and generation in a single architecture.
Led the Kling-Image-Omni team, shipping products that power visual generation and understanding at scale. Key launches include Kling-Image-O1 — bringing visual reasoning to image generation — and Kling-Image 3.0 & 3.0 Omni, the latest generation of Kling AI’s omni-image foundation models.
Led multimodal AI research. Shipped Skywork-VL-32B, a vision-language model integrating vision encoders with large language models, and built the storyboard generation model powering cinematic shot planning in SkyReels. Also built a scalable Diffusion training pipeline (MoE) for text-to-image generation.
Research Scientist at the Autonomous Delivery Department, developing trajectory prediction and motion planning models for the autonomous delivery fleet. The transformer-based prediction model was deployed on real vehicles serving millions of orders.
PhD focusing on semantic and panoptic segmentation. Won 1st place in the COCO & Mapillary Panoptic Segmentation Challenge 2018 at ECCV. Built TorchSeg, a widely-used PyTorch segmentation codebase (2 000+ GitHub stars). Visiting student at the University of Adelaide. Interned at Microsoft Research Asia (Stars of Tomorrow) and Megvii (Face++) Research.
We are building unified multimodal models that bridge visual understanding and generation, and extending toward Visual Agentic Intelligence — models that perceive, reason, and act in the visual world.
- Unified Understanding & Generation
- Visual Tokenizer & Representation Learning
- Visual Agentic Intelligence
- Multimodal Data & Infrastructure
We are continuously seeking outstanding talents to join us. Feel free to reach out! yuchangqian@kuaishou.com
- VQRAE [paper] — Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction.
- SkyReels-V1 — Human-Centric Video Foundation Model. 2 700+ ⭐
- SkyReels-A1 [paper] — Expressive Portrait Animation in Video Diffusion Transformers. 500+ ⭐
- LiteHRNet [paper] — A Lightweight High-Resolution Network. 900+ ⭐
- TorchSeg — PyTorch semantic segmentation codebase — BiSeNet, DFN, DenseASPP and more. 1 400+ ⭐
-
2026
Paper GCPO (Group Chunking Policy Optimization) accepted at ICML 2026 on May 1.
-
2026
Paper CoTyle accepted as an Oral presentation at CVPR 2026.
-
2026
Paper VQRAE accepted as a Poster presentation at CVPR 2026.
-
2025
Received the Second Prize in Natural Science of the Science and Technology Award from the Chinese Institute of Electronics (CIE).