Publications

2025

CVPR Highlight

World-consistent Video Diffusion with Explicit 3D Modeling

Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista Martin, Kevin Miao, Alexander Toshev, Josh Susskind, and Jiatao Gu

Computer Vision and Pattern Recognition (CVPR Highlight), 2025

PDF Website

ICLR

DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, and Shuangfei Zhai

International Conference on Learning Representations (ICLR), 2025

PDF

ICLR

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, and Ceyuan Yang

International Conference on Learning Representations (ICLR), 2025

PDF Code Website

2024

Preprint

DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models

Kevin Miao, Harsh Agrawal, Qihang Zhang, Federico Semeraro, Marco Cavallo, Jiatao Gu, and Alexander Toshev

Preprint, 2024

PDF

Preprint

Urban Scene Diffusion through Semantic Occupancy Map

Junge Zhang, Qihang Zhang, Li Zhang, Ramana Rao Kompella, Gaowen Liu, and Bolei Zhou

Preprint, 2024

PDF Website

CVPR

SceneWiz3D: Towards Text-guided 3D Scene Composition

Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, and Hsin-Ying Lee

Computer Vision and Pattern Recognition (CVPR), 2024

PDF Code Website

CVPR

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, and Ceyuan Yang

Computer Vision and Pattern Recognition (CVPR), 2024

PDF Code Website

2023

NeurIPS

Learning Modulated Transformation in GANs

Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, and Bo Dai

Neural Information Processing Systems (NeurIPS), 2023

PDF Code

ICCV

Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, and Hongsheng Li

International Conference on Computer Vision (ICCV), 2023

PDF Code

ICLR

Towards Smooth Video Composition

Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, and Bolei Zhou

International Conference on Learning Representations (ICLR), 2023

PDF Code Website

2022

CoRL

Generative Category-Level Shape and Pose Estimation with Semantic Primitives

Guanglin Li, Yifeng Li, Zhichao Ye, Qihang Zhang, Tao Kong, Zhaopeng Cui, and Guofeng Zhang

Conference on Robot Learning (CoRL), 2022

PDF Code Website

ECCV

Learn to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining

Qihang Zhang, Zhenghao Peng, and Bolei Zhou

European Conference on Computer Vision (ECCV), 2022

PDF Code Video Website

TPAMI

MetaDrive: Composing Diverse Driving Scenarios for Generalizable Learning

Quanyi Li*, Zhenghao Peng*, Lan Feng, Qihang Zhang, Zhenghai Xue, and Bolei Zhou

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

PDF Code Video Website

2021

IEEE TIP

F^3A-GAN: Facial Flow for Face Animation With Generative Adversarial Networks

Xintian Wu, Qihang Zhang, Yiming Wu, Huanyu Wang, Songyuan Li, Lingyun Sun, and Xi Li

IEEE Transactions on Image Processing (IEEE TIP), 2021

PDF

CVPR workshop

Improving the Generalization of End-to-End Driving through Procedural Generation

Quanyi Li*, Zhenghao Peng*, Qihang Zhang, Chunxiao Liu, and Bolei Zhou

CVPR Embodied AI Workshop (CVPR workshop), 2021

PDF Poster