Generating large-scale 3D scenes cannot be done by simply applying existing 3D object synthesis techniques, since 3D scenes usually exhibit complex spatial configurations and contain numerous objects at varying scales. We thus propose a practical and efficient 3D representation that incorporates an equivariant radiance field guided by a bird's-eye-view (BEV) map. Concretely, objects in synthesized 3D scenes can be easily manipulated by steering the corresponding BEV maps. Moreover, by adequately incorporating positional encoding and low-pass filters into the generator, the representation becomes equivariant to the given BEV map. Such equivariance allows us to produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and then stitching them together with smooth consistency.
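
To make the equivariance property concrete, below is a minimal PyTorch sketch, not the official BerfScene code: a toy BEV-conditioned convolutional generator whose output feature plane shifts exactly along with the input BEV map. The class and variable names are our own, circular padding stands in for proper boundary handling, and a fixed blur kernel stands in for the paper's low-pass filtering.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BEVEquivariantGenerator(nn.Module):
    """Toy BEV-conditioned generator: purely convolutional (no absolute
    x/y positional encoding on the ground plane, which would break
    translation equivariance) followed by a fixed low-pass filter, so
    translating the BEV map translates the output feature plane."""

    def __init__(self, bev_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        # Circular padding makes the toy equivariance check below exact;
        # the paper instead relies on low-pass filtering to keep the
        # generator (approximately) equivariant to sub-pixel shifts.
        def conv(c_in: int, c_out: int) -> nn.Conv2d:
            return nn.Conv2d(c_in, c_out, 3, padding=1, padding_mode="circular")
        self.net = nn.Sequential(
            conv(bev_channels, 64), nn.ReLU(),
            conv(64, 64), nn.ReLU(),
            conv(64, feat_channels),
        )
        # Fixed 3x3 binomial blur, applied depthwise, as the low-pass filter.
        k1d = torch.tensor([1.0, 2.0, 1.0])
        k2d = (k1d[:, None] * k1d[None, :]) / 16.0
        self.register_buffer("blur", k2d.expand(feat_channels, 1, 3, 3).clone())

    def forward(self, bev: torch.Tensor) -> torch.Tensor:
        feat = self.net(bev)
        feat = F.pad(feat, (1, 1, 1, 1), mode="circular")
        return F.conv2d(feat, self.blur, groups=self.blur.shape[0])

# Equivariance check: shifting the BEV map shifts the features identically.
g = BEVEquivariantGenerator()
bev = torch.randn(1, 3, 64, 64)
err = (torch.roll(g(bev), 8, dims=-1) - g(torch.roll(bev, 8, dims=-1))).abs().max()
print(f"max equivariance error: {err:.2e}")  # ~0 up to floating point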
Our method can synthesize high-quality local 3D scenes:
Beyond local 3D scenes, our method can synthesize large-scale global 3D scenes:
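
As an illustration of the stitching step, here is a hypothetical sketch (our own simplification, not the released pipeline) that runs a translation-equivariant generator over overlapping tiles of a large BEV map and linearly blends the overlaps. Equivariance guarantees that content shared by two tiles maps to the same features, so the blend introduces no seams; the function and parameter names are assumptions for illustration.

import torch

def stitch_features(generator, bev, tile=64, overlap=16):
    """Cover a large BEV map with overlapping tiles, run `generator` on
    each, and blend the per-tile feature planes with a linear ramp in
    the overlaps. Assumes (H - tile) and (W - tile) are multiples of
    (tile - overlap) so the tiles cover the map exactly."""
    _, _, H, W = bev.shape
    step = tile - overlap
    C = generator(bev[:, :, :tile, :tile]).shape[1]
    out = torch.zeros(1, C, H, W)
    weight = torch.zeros(1, 1, H, W)
    # 1D weight profile: 1 in the tile interior, fading linearly across
    # each overlap band (endpoints trimmed to avoid exact-zero weights).
    fade = torch.linspace(0.0, 1.0, overlap + 2)[1:-1]
    ramp = torch.ones(tile)
    ramp[:overlap] = fade
    ramp[-overlap:] = fade.flip(0)
    w2d = (ramp[:, None] * ramp[None, :])[None, None]
    for y in range(0, H - tile + 1, step):
        for x in range(0, W - tile + 1, step):
            feat = generator(bev[:, :, y:y + tile, x:x + tile])
            out[:, :, y:y + tile, x:x + tile] += feat * w2d
            weight[:, :, y:y + tile, x:x + tile] += w2d
    return out / weight

# Example with a trivial (but translation-equivariant) stand-in generator.
gen = torch.nn.Conv2d(3, 32, 3, padding=1, padding_mode="circular")
big_bev = torch.randn(1, 3, 64, 256)
with torch.no_grad():
    print(stitch_features(gen, big_bev).shape)  # torch.Size([1, 32, 64, 256])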
As our 3D representation is conditioned on a bird's-eye-view (BEV) map, we can edit the scene flexibly:
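
A minimal sketch of what such an edit might look like on a label-style BEV map (a hypothetical helper, not from the released code; the actual conditioning format may differ): translating one object's footprint in the map and regenerating the scene moves the corresponding object in 3D.

import torch

def move_object(bev: torch.Tensor, obj_id: int, dx: int, dy: int) -> torch.Tensor:
    """Translate the footprint of object `obj_id` inside an integer
    label BEV map of shape (H, W); 0 is assumed to denote free ground."""
    mask = bev == obj_id
    edited = bev.clone()
    edited[mask] = 0                                     # clear old footprint
    shifted = torch.roll(mask, shifts=(dy, dx), dims=(0, 1))
    edited[shifted] = obj_id                             # stamp new footprint
    return edited

# Example: a 5x5 object (label 7) moved 10 cells to the right; re-running
# the generator on `edited` would move the object in the rendered scene.
bev = torch.zeros(32, 32, dtype=torch.long)
bev[4:9, 4:9] = 7
edited = move_object(bev, obj_id=7, dx=10, dy=0)
print(edited[4:9, 14:19].unique())  # tensor([7])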
@inproceedings{zhang2023berfscene,
    author    = {Qihang Zhang and Yinghao Xu and Yujun Shen and Bo Dai and Bolei Zhou and Ceyuan Yang},
    title     = {{BerfScene}: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation},
    booktitle = {CVPR},
    year      = {2024}
}
We borrow the source code of this website from GeNVS.