3DitScene

Method

Optimization Framework: Given an input view, we initialize a 3D Gaussian Splatting (3DGS) representation by lifting pixels into 3D space, then expand it over novel views via RGB and depth inpainting. We then distill semantic features into the 3D Gaussians to achieve object-level disentanglement.
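As a rough illustration of the pixel-lifting step, the sketch below back-projects a depth map into camera-space 3D points with a pinhole camera model. The function name `lift_pixels_to_3d`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are assumptions made for this example; the page does not specify how depth is obtained or how the Gaussians are parameterized after lifting.

```python
def lift_pixels_to_3d(depth, fx, fy, cx, cy):
    """Back-project every pixel of a depth map to a 3D point in camera space.

    depth: list of rows, where depth[v][u] is the z-depth at pixel (u, v).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    Returns a list of (x, y, z) tuples in row-major order.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map (hypothetical values) with the principal point at the origin.
depth = [[1.0, 2.0],
         [1.0, 2.0]]
points = lift_pixels_to_3d(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Each resulting point could serve as the initial center of one Gaussian, with the pixel color as its initial appearance; regions not visible in the input view would still need to be filled in by the inpainting step.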

Inference Framework: The user can query an object of interest via a language prompt. Enabled by the disentangled 3D representation, the user can change the camera viewpoint and flexibly manipulate the object of interest.
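Once an object's Gaussians have been selected, object-level manipulation amounts to transforming only that subset. The sketch below is a minimal illustration under stated assumptions: it works on Gaussian centers only, the boolean `mask` marking the object is given, and the helper names (`translate_object`, `rotate_object_y`, `remove_object`) are hypothetical rather than taken from the 3DitScene code. A full implementation would also need to rotate each Gaussian's orientation and handle regions revealed after an edit.

```python
import math


def translate_object(centers, mask, offset):
    """Shift the selected Gaussian centers by `offset`; leave the rest untouched."""
    return [
        tuple(c + o for c, o in zip(center, offset)) if selected else center
        for center, selected in zip(centers, mask)
    ]


def rotate_object_y(centers, mask, angle_rad):
    """Rotate the selected centers about the vertical (y) axis through their centroid."""
    selected = [c for c, m in zip(centers, mask) if m]
    if not selected:
        return list(centers)
    cx = sum(c[0] for c in selected) / len(selected)
    cz = sum(c[2] for c in selected) / len(selected)
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    rotated = []
    for (x, y, z), m in zip(centers, mask):
        if m:
            dx, dz = x - cx, z - cz
            x = cx + cos_a * dx + sin_a * dz
            z = cz - sin_a * dx + cos_a * dz
        rotated.append((x, y, z))
    return rotated


def remove_object(centers, mask):
    """Drop the selected Gaussians from the scene."""
    return [c for c, m in zip(centers, mask) if not m]


# Toy scene: the first two Gaussians belong to the object of interest.
centers = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (5.0, 0.0, 5.0)]
mask = [True, True, False]

moved = translate_object(centers, mask, (1.0, 0.0, 0.0))
rotated = rotate_object_y(centers, mask, math.pi)
remaining = remove_object(centers, mask)
```

Because the background Gaussians are never touched, the same edited scene can then be rendered from any camera viewpoint.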

We propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. We first incorporate 3D Gaussians that are refined through generative priors and optimization techniques. Language features from CLIP then introduce semantics into the 3D geometry for object disentanglement. With the disentangled Gaussians, our method allows manipulation at both the global and individual levels, giving users creative freedom and control over both scenes and objects. Experimental results demonstrate the effectiveness and versatility of 3DitScene in scene image editing.

Semantic Feature Visualization

In our 3DGS-based scene representation, each Gaussian includes a semantic embedding. Integrating semantics into 3DGS helps disentangle the overall scene, providing precise control over local objects.
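To make the idea of a per-Gaussian semantic embedding concrete, the sketch below attaches a feature vector to each Gaussian and selects the ones that match a query embedding by cosine similarity. Everything here is an assumption for illustration: the `SemanticGaussian` class, the 3-dimensional toy embeddings, the `threshold` value, and the use of a plain similarity cutoff as the selection rule. In practice the query vector would come from a text encoder such as CLIP's and the embeddings would be much higher-dimensional.

```python
import math
from dataclasses import dataclass


@dataclass
class SemanticGaussian:
    position: tuple   # (x, y, z) center of the Gaussian
    embedding: tuple  # semantic feature vector attached to this Gaussian


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def query_gaussians(gaussians, text_embedding, threshold=0.8):
    """Return a boolean mask marking Gaussians whose embedding matches the query."""
    return [
        cosine_similarity(g.embedding, text_embedding) >= threshold
        for g in gaussians
    ]


# Toy scene with hypothetical embeddings: two Gaussians on one object, one elsewhere.
scene = [
    SemanticGaussian((0.0, 0.0, 1.0), (0.9, 0.1, 0.0)),
    SemanticGaussian((0.1, 0.0, 1.0), (1.0, 0.0, 0.1)),
    SemanticGaussian((3.0, 0.0, 4.0), (0.0, 1.0, 0.0)),
]
query = (1.0, 0.0, 0.0)  # stand-in for an encoded language prompt
mask = query_gaussians(scene, query)
```

The resulting mask identifies the Gaussians belonging to the queried object, which is what makes local edits possible without disturbing the rest of the scene.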

More Samples: Camera Control

Camera control over various scenes.

More Samples: Local Object Control

Various controls over the boy in the scene.

More Samples: Local Object Editing + Camera Control

Move the bear, and control the camera.

Move / remove the person, and control the camera.

Remove the headscarf / hat, and control the camera.

Rotate the Samoyed, and control the camera.

Rotate / move / remove the bear, and control the camera.

Remove the sheep, and control the camera.

Citation

@inproceedings{zhang20243DitScene,
  author    = {Qihang Zhang and Yinghao Xu and Chaoyang Wang and Hsin-Ying Lee and Gordon Wetzstein and Bolei Zhou and Ceyuan Yang},
  title     = {{3DitScene}: Editing Any Scene via Language-guided Disentangled Gaussian Splatting},
  booktitle = {arXiv},
  year      = {2024}
}

Acknowledgments

We borrow the source code of this website from GeNVS.
