Changqian Yu

I build multimodal AI that sees, understands, and creates.

I lead the Kling-Image-Omni team at Kling AI, Kuaishou Technology, building multimodal foundation models for visual understanding and generation. My research focuses on Diffusion Models, Vision-Language Models, and making AI see, think, and create. I hold a PhD from HUST (🏆 CSIG Top-10 Dissertation Award) and have been listed among Stanford's Top 2% Scientists for three consecutive years.


Timeline

I lead the Kling-Image-Omni team at Kling AI, Kuaishou Technology, shipping products that power visual generation and understanding at scale. Key launches include Kling-Image-O1 — bringing visual reasoning to image generation — and Kling-Image 3.0 & 3.0 Omni, the latest generation of Kling AI’s omni-image foundation models.

I led multimodal AI research at Kunlun Tech (Skywork). Shipped Skywork-VL-32B, a vision-language model integrating vision encoders with large language models, and built the storyboard generation model powering cinematic shot planning in SkyReels. I also built a scalable Mixture-of-Experts (MoE) Diffusion training pipeline for text-to-image generation.

Research Scientist at Meituan’s Autonomous Delivery Department, developing trajectory prediction and motion planning models for the autonomous delivery fleet. The transformer-based prediction model was deployed on real vehicles serving millions of orders.

PhD at HUST, focusing on semantic and panoptic segmentation. Won 1st place in the COCO & Mapillary Panoptic Segmentation Challenge 2018 at ECCV. Built TorchSeg, a widely used PyTorch segmentation codebase (2,000+ GitHub stars). Visiting student at the University of Adelaide. Interned at Microsoft Research Asia (Stars of Tomorrow) and Megvii (Face++) Research.

Open Source
  • VQRAE [paper] Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction.
  • SkyReels-V1 Human-Centric Video Foundation Model. 2,700+ ⭐
  • SkyReels-A1 [paper] Expressive Portrait Animation in Video Diffusion Transformers. 500+ ⭐
  • LiteHRNet [paper] A Lightweight High-Resolution Network. 900+ ⭐
  • TorchSeg PyTorch semantic segmentation codebase — BiSeNet, DFN, DenseASPP and more. 1,400+ ⭐
Selected Publications · View all →
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
arXiv · Paper · Code
Scalable Diffusion Models with State Space Backbone
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
arXiv · Paper · Code
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao†, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
IJCV · ESI Highly Cited Paper · 1,900+ citations · Paper · arXiv · Code
Lite-HRNet: A Lightweight High-Resolution Network
Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
CVPR · 600+ citations · Paper · arXiv · Code
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Changqian Yu*, Jingbo Wang*, Chao Peng, Changxin Gao†, Gang Yu, Nong Sang
ECCV · ECCV 2018 Top-10 Influential Papers · 3,700+ citations · Paper · arXiv · Code
Learning a Discriminative Feature Network for Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
CVPR · 1,000+ citations · Paper · arXiv · Code
Latest Posts · View all →
Feb 26, 2026 · General

Hello World: A New Beginning

Welcome to my new blog. I'll be sharing thoughts on AI research, engineering insights, and lessons learned from building large-scale models.

News