Changqian Yu

I build multimodal AI that sees, understands, and creates.

I lead Kling Image at Kuaishou Technology, overseeing all image foundation models powering Kling AI’s visual generation and understanding products. My research focuses on Diffusion Models, Vision-Language Models, and making AI see, think, and create. I hold a PhD from HUST (🏆 CSIG Top-10 Dissertation Award) and have been listed among Stanford’s Top 2% Scientists for 3 consecutive years.


Timeline
Kling AI, Kuaishou Technology 2026.04 – Present
Director

As Director of Kling Image, I oversee all image foundation models powering visual generation and understanding at scale. Current focus: unified multimodal models that bridge understanding and generation in a single architecture.

Head of Algorithms

Led the Kling-Image-Omni team, shipping products that power visual generation and understanding at scale. Key launches include Kling-Image-O1 — bringing visual reasoning to image generation — and Kling-Image 3.0 & 3.0 Omni, the latest generation of Kling AI’s omni-image foundation models.

Kunlun Tech (Skywork) 2023 – 2025

Led multimodal AI research. Shipped Skywork-VL-32B, a vision-language model integrating vision encoders with large language models, and built the storyboard generation model powering cinematic shot planning in SkyReels. Also built a scalable Diffusion training pipeline (MoE) for text-to-image generation.

Meituan 2021 – 2023

Research Scientist at the Autonomous Delivery Department, developing trajectory prediction and motion planning models for the autonomous delivery fleet. The transformer-based prediction model was deployed on real vehicles serving millions of orders.

Huazhong University of Science and Technology 2016 – 2021

PhD focusing on semantic and panoptic segmentation. Won 1st place in the COCO & Mapillary Panoptic Segmentation Challenge 2018 at ECCV. Built TorchSeg, a widely used PyTorch segmentation codebase (2 000+ GitHub stars). Visiting student at the University of Adelaide. Interned at Microsoft Research Asia (Stars of Tomorrow) and Megvii (Face++) Research.

We're Hiring

We are building unified multimodal models that bridge visual understanding and generation, and extending toward Visual Agentic Intelligence — models that perceive, reason, and act in the visual world.

  • Unified Understanding & Generation
  • Visual Tokenizer & Representation Learning
  • Visual Agentic Intelligence
  • Multimodal Data & Infrastructure

We are always looking for outstanding people to join us. Feel free to reach out: yuchangqian@kuaishou.com

Open Source
  • VQRAE [paper] Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction.
  • SkyReels-V1 Human-Centric Video Foundation Model. 2 700+ ⭐
  • SkyReels-A1 [paper] Expressive Portrait Animation in Video Diffusion Transformers. 500+ ⭐
  • LiteHRNet [paper] A Lightweight High-Resolution Network. 900+ ⭐
  • TorchSeg PyTorch semantic segmentation codebase — BiSeNet, DFN, DenseASPP and more. 1 400+ ⭐
Selected Publications
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
arXiv
Scalable Diffusion Models with State Space Backbone
Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
arXiv, 49 citations
BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
Changqian Yu, Changxin Gao†, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
IJCV, ESI Highly Cited Paper, 1560 citations
Lite-HRNet: A Lightweight High-Resolution Network
Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang
CVPR, 401 citations
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
Changqian Yu*, Jingbo Wang*, Chao Peng, Changxin Gao†, Gang Yu, Nong Sang
ECCV, ECCV 2018 Top-10 Influential Papers, 2629 citations
Learning a Discriminative Feature Network for Semantic Segmentation
Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
CVPR, 812 citations
Latest Posts
Feb 26, 2026, General

Hello World: A New Beginning

Welcome to my new blog. I'll be sharing thoughts on AI research, engineering insights, and lessons learned from building large-scale models.

News