DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

Abstract

We present DreamScene4D, the first method capable of lifting multi-object monocular videos to 4D using dynamic Gaussian Splatting. It can handle large and complex motions observed in challenging real-life videos, thanks to object-scene decomposition and a motion factorization scheme.

DreamScene4D can generate arbitrary novel views of dynamic multi-object scenes across occlusions. It also enables 2D point motion tracking by projecting the inferred 3D Gaussian trajectories to 2D, despite never being explicitly trained to do so.

Arbitrary View Synthesis

DreamScene4D can synthesize views from arbitrary camera poses at any given timestep. The examples below render the scene from the reference camera (outlined in blue) and from an orbital camera that varies the azimuth and elevation angles (outlined in orange). The original input video is shown in the leftmost column for reference.

Columns: Input Video | Original View | Novel View 1 | Novel View 2
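
As a rough illustration of the orbital camera mentioned above, the sketch below places a camera on a sphere around a scene center from azimuth and elevation angles, then builds a look-at orientation. The function names, the y-up axis convention, and the default target are assumptions made for this example; none of them are taken from the DreamScene4D code.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

def orbit_camera_position(azimuth_deg, elevation_deg, radius,
                          target=(0.0, 0.0, 0.0)):
    """Camera center on a sphere of `radius` around `target`.

    Assumed convention: y is up, azimuth 0 puts the camera on the +z axis.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.sin(az)
    y = radius * math.sin(el)
    z = radius * math.cos(el) * math.cos(az)
    return (target[0] + x, target[1] + y, target[2] + z)

def look_at_axes(eye, target, world_up=(0.0, 1.0, 0.0)):
    """Return (right, up, forward) unit axes for a camera at `eye` facing `target`.

    Degenerate when the viewing direction is parallel to `world_up`
    (elevation of +/-90 degrees); this sketch does not handle that case.
    """
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, world_up))
    up = _cross(right, forward)
    return right, up, forward

# Sweep a few azimuth angles at a fixed elevation, as an orbital camera would.
for azimuth in (0.0, 30.0, 60.0):
    eye = orbit_camera_position(azimuth, 15.0, radius=2.0)
    right, up, forward = look_at_axes(eye, (0.0, 0.0, 0.0))
```

Parameterizing the camera by two angles and a radius keeps it at a fixed distance from the scene center, so changing the azimuth or elevation changes only the viewpoint and not the apparent scale.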

Gaussian Motion Trajectories

We visualize the 3D Gaussian trajectories from DreamScene4D that correspond to the pixels rendered in each camera view by projecting them to 2D. The Gaussian deformations provide reasonable 2D point motion trajectories in both the original view (outlined in blue) and novel, unseen views (outlined in orange). The rendered 3D Gaussians are selected independently for each view.

Columns: Original View | Novel View 1 | Novel View 2
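
To make the projection step concrete, here is a minimal sketch of mapping one 3D trajectory (for example, a Gaussian center over time) to 2D pixel coordinates with a pinhole camera. The function names, the world-to-camera convention (`R`, `t`), the +z viewing direction, and the intrinsics (`fx`, `fy`, `cx`, `cy`) are assumptions for illustration and do not describe the DreamScene4D implementation.

```python
def project_point(p_world, R, t, fx, fy, cx, cy, eps=1e-8):
    """Project one 3D world point to pixel coordinates.

    R is a 3x3 world-to-camera rotation (list of rows) and t a 3-vector,
    so that p_cam = R @ p_world + t. The camera is assumed to look along +z.
    Returns None when the point lies at or behind the camera plane.
    """
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i]
             for i in range(3)]
    if p_cam[2] <= eps:
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return (u, v)

def project_trajectory(trajectory, R, t, fx, fy, cx, cy):
    """Project a sequence of 3D positions (one per timestep) to a 2D track."""
    return [project_point(p, R, t, fx, fy, cx, cy) for p in trajectory]

# A point moving to the right at constant depth, seen by an identity camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
trajectory_3d = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (1.0, 0.0, 2.0)]
track_2d = project_trajectory(trajectory_3d, identity, (0.0, 0.0, 0.0),
                              fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

Because the same 3D trajectory can be projected with any camera pose, the resulting 2D tracks are available in novel views as well as in the original one.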

Qualitative Comparisons

We compare DreamScene4D with existing video-to-4D baselines. Our method can handle challenging videos with multiple objects undergoing large and complex motions. The original input video is shown in the leftmost column for reference.

Columns: Input Video | Consistent4D | DreamGaussian4D | DreamScene4D (Ours)

BibTeX

@article{dreamscene4d,
  title={DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos},
  author={Chu, Wen-Hsuan and Ke, Lei and Fragkiadaki, Katerina},
  journal={NeurIPS},
  year={2024}
}

DreamScene4D can produce dynamic scene-level Gaussian representations (bottom row) from monocular videos (top row) with multiple objects undergoing large and complex motions.