Qihang ZHANG

Qihang Zhang is a final-year Ph.D. student at the Multimedia Lab (MMLab), The Chinese University of Hong Kong, advised by Bolei Zhou and Dahua Lin. He is currently a visiting student at Stanford, advised by Gordon Wetzstein.

His research interests focus on generative models, particularly in the 3D and video domains. He has been fortunate to gain extensive industry experience, including research internships at Apple MLR, Snap Research, Shanghai AI Lab, and SenseTime Research.

I will graduate in summer 2025. Feel free to reach out if you have suitable openings.

News

Feb 25, 2025	WVD (video and 3D joint foundation model) is accepted to CVPR25 as a Highlight.
Jan 22, 2025	3DitScene (3D-aware image editing) and DART (diffusion + autoregressive model) are accepted to ICLR 25.
May 13, 2024	I start my internship at Apple! See you in New York City. 🗽
Feb 27, 2024	Two papers (BerfScene, SceneWiz3D) focused on 3D scene generation are accepted to CVPR24.
Sep 30, 2023	One paper on GAN architecture is accepted to NeurIPS23.
Jun 30, 2023	One paper on 3D-aware pretraining is accepted to ICCV23.
Jun 18, 2023	I start my internship at Snap Inc.! See you in Los Angeles. 🌴
Jan 21, 2023	Our paper on video generation (StyleSV) is accepted to ICLR 23. 🎞
Jul 5, 2022	Our paper on policy pretraining and visuomotor policy learning (ACO) is accepted to ECCV 22. 🛣

Selected Publications

Report

Causal World Modeling for Robot Control

Lin Li, Qihang Zhang, Yiming Luo, Shuai Yang, Ruilin Wang, Fei Han, Mingrui Yu, Zelin Gao, Nan Xue, Xing Zhu, Yujun Shen, and Yinghao Xu

Report (Report), 2026

PDF Website
CVPR Highlight

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista Martin, Kevin Miao, Alexander Toshev, Josh Susskind, and Jiatao Gu

Computer Vision and Pattern Recognition (CVPR Highlight), 2025

PDF Website
ICLR

DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, and Shuangfei Zhai

International Conference on Learning Representations (ICLR), 2025

PDF
ICLR

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, and Ceyuan Yang

International Conference on Learning Representations (ICLR), 2025

PDF Code Website
CVPR

SceneWiz3D: Towards Text-guided 3D Scene Composition

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, and Hsin-Ying Lee

Computer Vision and Pattern Recognition (CVPR), 2024

PDF Code Website
CVPR

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, and Ceyuan Yang

Computer Vision and Pattern Recognition (CVPR), 2024

PDF Code Website
NeurIPS

Learning Modulated Transformation in GANs

Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, and Bo Dai

Neural Information Processing Systems (NeurIPS), 2023

PDF Code
ICLR

Towards Smooth Video Composition

Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, and Bolei Zhou

International Conference on Learning Representations (ICLR), 2023

PDF Code Website
ECCV

Learn to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining

Qihang Zhang, Zhenghao Peng, and Bolei Zhou

European Conference on Computer Vision (ECCV), 2022

PDF Code Video Website
TPAMI

MetaDrive: Composing Diverse Driving Scenarios for Generalizable Learning

Quanyi Li*, Zhenghao Peng*, Lan Feng, Qihang Zhang, Zhenghai Xue, and Bolei Zhou

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

PDF Code Video Website
IEEE TIP

F³A-GAN: Facial Flow for Face Animation With Generative Adversarial Networks

Xintian Wu, Qihang Zhang, Yiming Wu, Huanyu Wang, Songyuan Li, Lingyun Sun, and Xi Li

IEEE Transactions on Image Processing (IEEE TIP), 2021

PDF