BerfScene: Bev-conditioned Equivariant Radiance Fields
for Infinite 3D Scene Generation

Abstract

Generating large-scale 3D scenes cannot simply reuse existing 3D object synthesis techniques, since 3D scenes usually have complex spatial configurations and contain many objects at varying scales. We therefore propose a practical and efficient 3D representation that incorporates an equivariant radiance field guided by a bird's-eye view (BEV) map. Concretely, objects in the synthesized 3D scenes can be easily manipulated by editing the corresponding BEV maps. Moreover, by properly incorporating positional encoding and low-pass filters into the generator, we make the representation equivariant to the given BEV map. This equivariance lets us produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and then stitching them together with smooth consistency.
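Why equivariance makes stitching seamless can be illustrated with a toy stand-in for the generator. The sketch below is not the BerfScene architecture: it assumes a hypothetical 2D BEV grid of floats and replaces the whole generator with a single 3x3 box (low-pass) filter, so each output cell depends only on a small BEV neighbourhood. Because of that locality, processing each local tile together with a one-cell halo and stitching the results reproduces exactly what processing the whole map at once would give. The names `box_filter`, `synthesize_tile` and `stitch` are illustrative only.

```python
from typing import List

Grid = List[List[float]]  # hypothetical BEV grid: one float per ground cell


def box_filter(bev: Grid) -> Grid:
    """3x3 mean filter with zero padding (a toy low-pass 'generator').

    Output cell (i, j) depends only on bev[i-1..i+1][j-1..j+1], so the
    operation commutes with shifts of the map away from its borders.
    """
    h, w = len(bev), len(bev[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # skip zero-padded cells
                        s += bev[y][x]
            out[i][j] = s / 9.0
    return out


def synthesize_tile(bev: Grid, top: int, left: int, size: int, halo: int = 1) -> Grid:
    """Process one local BEV crop plus a halo, then drop the halo.

    The halo must be at least the filter radius (1 here) so that border
    cells of the tile see the same neighbourhood as in the global map.
    """
    h, w = len(bev), len(bev[0])
    r0, r1 = max(top - halo, 0), min(top + size + halo, h)
    c0, c1 = max(left - halo, 0), min(left + size + halo, w)
    crop = [row[c0:c1] for row in bev[r0:r1]]
    feat = box_filter(crop)
    return [row[left - c0:left - c0 + size] for row in feat[top - r0:top - r0 + size]]


def stitch(bev: Grid, size: int) -> Grid:
    """Synthesize the map tile by tile and paste the tiles into one grid."""
    h, w = len(bev), len(bev[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, size):
        for left in range(0, w, size):
            tile = synthesize_tile(bev, top, left, size)
            for i, row in enumerate(tile):
                out[top + i][left:left + len(row)] = row
    return out
```

For any rectangular grid `bev`, `stitch(bev, 4) == box_filter(bev)` holds exactly, i.e. the tiled result has no seams. A real generator with a larger receptive field would need a correspondingly wider halo, and an unbounded scene could be produced by calling `synthesize_tile` lazily on demand.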

Local Scene Synthesis

Our method can synthesize high-quality local 3D scenes:



Global Scene Synthesis

Beyond local 3D scenes, our method can synthesize large-scale global 3D scenes:

 

Scene Editing

Because our 3D representation is conditioned on a bird's-eye view (BEV) map, we can edit the scene flexibly:
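As a rough illustration of what "editing the BEV map" means at the input level, the sketch below moves one object by relabelling cells of a BEV grid. It is a minimal sketch under stated assumptions: the BEV map is taken to be a hypothetical integer grid in which 0 is empty ground and each positive value is an object id, and `move_object` is an illustrative helper, not part of the BerfScene code. The edited map would then be passed back to the generator, which is not shown.

```python
from typing import List

BEV = List[List[int]]  # hypothetical: 0 = empty ground, k > 0 = object id k


def move_object(bev: BEV, obj_id: int, dy: int, dx: int) -> BEV:
    """Return a copy of `bev` with every cell labelled `obj_id` shifted by (dy, dx).

    Cells vacated by the object become ground (0); cells shifted off the
    map are dropped; the moved object overwrites anything at its target cells.
    """
    h, w = len(bev), len(bev[0])
    cells = [(i, j) for i in range(h) for j in range(w) if bev[i][j] == obj_id]
    # Copy the map with the object erased, leaving the input untouched.
    out = [[0 if v == obj_id else v for v in row] for row in bev]
    for i, j in cells:
        ni, nj = i + dy, j + dx
        if 0 <= ni < h and 0 <= nj < w:
            out[ni][nj] = obj_id
    return out
```

Removing an object works the same way, by setting its cells to 0 without re-placing them.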

 

 

Because the BEV map is easy to modify, users can take part in the generation process through a user-friendly UI:

 

Real-time editing on CLEVR

Citation

@inproceedings{zhang2023berfscene,
              author = {Qihang Zhang and Yinghao Xu and Yujun Shen and Bo Dai and Bolei Zhou and Ceyuan Yang},
              title = {{BerfScene}: {Bev}-conditioned Equivariant Radiance Fields for Infinite {3D} Scene Generation},
              booktitle = {CVPR},
              year = {2024}
            }

Acknowledgments

We borrow the source code of this website from GeNVS.