BerfScene: Bev-conditioned Equivariant Radiance Fields
for Infinite 3D Scene Generation


Existing 3D object synthesis techniques cannot simply be applied to large-scale 3D scene generation, since 3D scenes usually have complex spatial configurations and consist of many objects at varying scales. We thus propose a practical and efficient 3D representation that incorporates an equivariant radiance field guided by a bird's-eye view (BEV) map. Concretely, objects in the synthesized 3D scenes can be easily manipulated by steering the corresponding BEV maps. Moreover, by adequately incorporating positional encoding and low-pass filters into the generator, the representation becomes equivariant to the given BEV map. This equivariance allows us to produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and then stitching them together with smooth consistency.
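To make the role of equivariance in stitching more concrete, here is a minimal 1D toy sketch. It is not the BerfScene generator: `low_pass` is a hypothetical stand-in (a 3-tap filter over a strip of BEV cells), and `synthesize_tiled` is an illustrative helper. The sketch only shows that when the output at a location depends on the BEV content around that location and not on the absolute position, locally synthesized tiles agree with a single global pass.

```python
def low_pass(signal):
    """Toy stand-in for a generator: 3-tap low-pass filter, valid region only.

    Output j depends only on signal[j], signal[j + 1], signal[j + 2],
    so the operation does not depend on absolute position.
    """
    return [
        0.25 * signal[i - 1] + 0.5 * signal[i] + 0.25 * signal[i + 1]
        for i in range(1, len(signal) - 1)
    ]


def synthesize_tiled(bev, window):
    """Synthesize local tiles from overlapping BEV windows and concatenate them.

    Each window of length `window` yields `window - 2` outputs, so the
    windows advance by `window - 2` cells and overlap by 2 input cells.
    """
    stride = window - 2
    if window < 3 or (len(bev) - 2) % stride != 0:
        raise ValueError("BEV length must be 2 + k * (window - 2)")
    stitched = []
    for start in range(0, len(bev) - window + 1, stride):
        stitched.extend(low_pass(bev[start:start + window]))
    return stitched


# Hypothetical 1D BEV strip: 0 is empty ground, other values mark objects.
bev = [0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0]

global_pass = low_pass(bev)
tiled_pass = synthesize_tiled(bev, window=6)

# Stitched local results match the global result exactly.
stitching_is_consistent = tiled_pass == global_pass

# Shifting the BEV strip (cropping 3 cells) shifts the output by the same amount.
shift_is_equivariant = low_pass(bev[3:]) == global_pass[3:]
```

In this toy setting the strip could be extended indefinitely and synthesized window by window, which is the intuition behind producing arbitrarily large scenes from local ones.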

Local Scene Synthesis

Our method can synthesize high-quality local 3D scenes:

Global Scene Synthesis

Beyond local 3D scenes, our method can synthesize large-scale global 3D scenes:


Scene Editing

Because our 3D representation is conditioned on a bird's-eye view (BEV) map, we can edit the scene flexibly:
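As a rough illustration of what editing through the BEV map could look like, the sketch below treats a BEV map as a small grid of integer labels (0 for empty ground, other values for objects). This encoding and the helper names `move_object` and `remove_object` are assumptions made for the example, not the format used by BerfScene. An edited map would then be passed to the generator to re-synthesize the scene, which is not shown here.

```python
def move_object(bev, label, d_row, d_col):
    """Return a copy of `bev` with every cell equal to `label` translated.

    Cells that would leave the grid are dropped. A moved cell overwrites
    whatever label was at its destination.
    """
    rows, cols = len(bev), len(bev[0])
    # Start from a copy with the object erased from its old position.
    edited = [[0 if cell == label else cell for cell in row] for row in bev]
    for r in range(rows):
        for c in range(cols):
            if bev[r][c] == label:
                nr, nc = r + d_row, c + d_col
                if 0 <= nr < rows and 0 <= nc < cols:
                    edited[nr][nc] = label
    return edited


def remove_object(bev, label):
    """Return a copy of `bev` with every cell equal to `label` cleared."""
    return [[0 if cell == label else cell for cell in row] for row in bev]


# Hypothetical 3x3 BEV map with two objects, labelled 1 and 2.
bev_map = [
    [0, 1, 0],
    [0, 0, 2],
    [0, 0, 0],
]

moved = move_object(bev_map, label=1, d_row=1, d_col=-1)
removed = remove_object(bev_map, label=2)
```

Both helpers return new grids and leave the input untouched, so an interactive tool could keep earlier BEV states around for undo.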



Because the BEV map is easy to modify, users can take part in the generation process through a user-friendly UI:


Real-time editing on CLEVR


              author = {Qihang Zhang and Yinghao Xu and Yujun Shen and Bo Dai and Bolei Zhou and Ceyuan Yang},
              title = {{BerfScene}: {Bev}-conditioned Equivariant Radiance Fields for Infinite {3D} Scene Generation},
              booktitle = {arXiv},
              year = {2023}


