Publications

* equal contribution · † project lead

2026
RepWAM: World Action Modeling with Representation Visual-Action Tokenizers
Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo, Yujun Shen, Zuxuan Wu, Yu-Gang Jiang, Yinghao Xu · Preprint
Next Forcing: Causal World Modeling with Multi-Chunk Prediction
Gangwei Xu, Qihang Zhang†, Jiaming Zhou, Xing Zhu, Yujun Shen, Xin Yang, Yinghao Xu · Preprint
Causal World Modeling for Robot Control
Lin Li*, Qihang Zhang*†, Yiming Luo*, Shuai Yang, Ruilin Wang, Luyao Zhang, Mingrui Yu, Zelin Gao, Nan Xue, Boyu Zhou, Xing Zhu, Mingyu Ding, Yujun Shen, Yinghao Xu · RSS 2026
2025
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista Martin, Kevin Miao, Alexander Toshev, Josh Susskind, Jiatao Gu · CVPR 2025, Highlight
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang · ICLR 2025
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai · ICLR 2025
2024
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models
Kevin Miao, Harsh Agrawal, Qihang Zhang, Federico Semeraro, Marco Cavallo, Jiatao Gu, Alexander Toshev · Preprint
Urban Scene Diffusion through Semantic Occupancy Map
Junge Zhang, Qihang Zhang, Li Zhang, Ramana Rao Kompella, Gaowen Liu, Bolei Zhou · Preprint
SceneWiz3D: Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee · CVPR 2024
BerfScene: BEV-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang · CVPR 2024
2023
Learning Modulated Transformation in GANs
Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, Bo Dai · NeurIPS 2023
Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
Jihao Liu, Tai Wang, Boxiao Liu, Qihang Zhang, Yu Liu, Hongsheng Li · ICCV 2023
Towards Smooth Video Composition
Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou · ICLR 2023
2022
Generative Category-Level Shape and Pose Estimation with Semantic Primitives
Guanglin Li, Yifeng Li, Zhichao Ye, Qihang Zhang, Tao Kong, Zhaopeng Cui, Guofeng Zhang · CoRL 2022
Learn to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining
Qihang Zhang, Zhenghao Peng, Bolei Zhou · ECCV 2022
MetaDrive: Composing Diverse Driving Scenarios for Generalizable Learning
Quanyi Li*, Zhenghao Peng*, Lan Feng, Qihang Zhang, Zhenghai Xue, Bolei Zhou · TPAMI 2022
2021
F³A-GAN: Facial Flow for Face Animation with Generative Adversarial Networks
Xintian Wu, Qihang Zhang, Yiming Wu, Huanyu Wang, Songyuan Li, Lingyun Sun, Xi Li · IEEE TIP 2021
Improving the Generalization of End-to-End Driving through Procedural Generation
Quanyi Li*, Zhenghao Peng*, Qihang Zhang, Chunxiao Liu, Bolei Zhou · CVPR 2021, Embodied AI Workshop