Changqian Yu

I build multimodal AI that sees, understands, and creates.

I lead Kling Image at Kuaishou Technology, overseeing all image foundation models powering Kling AI’s visual generation and understanding products. My research focuses on Diffusion Models, Vision-Language Models, and making AI see, think, and create. PhD from HUST (🏆 CSIG Top-10 Dissertation Award), Stanford’s Top 2% Scientists (3 consecutive years).

Timeline

As Director of Kling Image, I oversee all image foundation models powering visual generation and understanding at scale. Current focus: unified multimodal models that bridge understanding and generation in a single architecture.

Led the Kling-Image-Omni team, shipping products that power visual generation and understanding at scale. Key launches include Kling-Image-O1 — bringing visual reasoning to image generation — and Kling-Image 3.0 & 3.0 Omni, the latest generation of Kling AI’s omni-image foundation models.

Led multimodal AI research. Shipped Skywork-VL-32B, a vision-language model integrating vision encoders with large language models, and built the storyboard generation model powering cinematic shot planning in SkyReels. Also built a scalable Diffusion training pipeline (MoE) for text-to-image generation.

Research Scientist at the Autonomous Delivery Department, developing trajectory prediction and motion planning models for the autonomous delivery fleet. The transformer-based prediction model was deployed on real vehicles serving millions of orders.

PhD focusing on semantic and panoptic segmentation. Won 1st place in the COCO & Mapillary Panoptic Segmentation Challenge 2018 at ECCV. Built TorchSeg, a widely-used PyTorch segmentation codebase (2 000+ GitHub stars). Visiting student at the University of Adelaide. Interned at Microsoft Research Asia (Stars of Tomorrow) and Megvii (Face++) Research.

We're Hiring

We are building unified multimodal models that bridge visual understanding and generation, and extending toward Visual Agentic Intelligence — models that perceive, reason, and act in the visual world.

Unified Understanding & Generation
Visual Tokenizer & Representation Learning
Visual Agentic Intelligence
Multimodal Data & Infrastructure

We are continuously seeking outstanding talents to join us. Feel free to reach out! yuchangqian@kuaishou.com

Open Source

CoTyle [demo] — CVPR 2026 Award Candidate. Open-source code-to-style image generation with a discrete style space.
VQRAE [paper] — Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction.
SkyReels-V1 — Human-Centric Video Foundation Model. 2 700+ ⭐
SkyReels-A1 [paper] — Expressive Portrait Animation in Video Diffusion Transformers. 500+ ⭐
LiteHRNet [paper] — A Lightweight High-Resolution Network. 900+ ⭐
TorchSeg — PyTorch semantic segmentation codebase — BiSeNet, DFN, DenseASPP and more. 1 400+ ⭐

Selected Publications View all →

A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu†, Guoliang Kang†

CVPR CVPR 2026 Award Candidate Paper arXiv Code Demo

Scaling Diffusion Transformers to 16 Billion Parameters

Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

arXiv 49 citations arXiv Code

Scalable Diffusion Models with State Space Backbone

Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

arXiv 49 citations arXiv Code

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

Changqian Yu, Changxin Gao†, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

IJCV ESI Highly Cited Paper 1592 citations Paper arXiv Code

Lite-HRNet: A Lightweight High-Resolution Network

Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

CVPR 405 citations Paper arXiv Code

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Changqian Yu*, Jingbo Wang*, Chao Peng, Changxin Gao†, Gang Yu, Nong Sang

ECCV ECCV 2018 Top-10 Influential Papers 2670 citations Paper arXiv Code

Hello World: A New Beginning

Welcome to my new blog. I'll be sharing thoughts on AI research, engineering insights, and lessons learned from building large-scale models.

News

2026
Paper GCPO (Group Chunking Policy Optimization) accepted at ICML 2026 on May 1.
2026
Paper CoTyle was selected as a CVPR 2026 Award Candidate. Code · Demo · Kling official announcement.
2026
Paper VQRAE accepted as a Poster presentation at CVPR 2026.
2025
Received the Second Prize in Natural Science of the Science and Technology Award from the Chinese Institute of Electronics (CIE).