DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

Carnegie Mellon University
*Equal Contribution

DreamScene4D can produce dynamic scene-level Gaussian representations (bottom row) from monocular
videos (top row) with multiple objects undergoing large and complex motions.

Abstract

We present DreamScene4D, the first method capable of lifting multi-object monocular videos to 4D using dynamic Gaussian Splatting. It can handle large and complex motions observed in challenging real-life videos, thanks to object-scene decomposition and a motion factorization scheme.

DreamScene4D can render arbitrary novel views of dynamic multi-object scenes across occlusions. It also enables 2D point motion tracking by projecting the inferred 3D Gaussian trajectories to 2D, despite never being explicitly trained for tracking.


(a) We decompose and amodally complete objects and the background in the video, then use DreamGaussian to obtain static 3D Gaussian representations. (b) Next, we factorize the object motion into multiple components and optimize them independently. (c) Finally, we re-compose the objects using monocular depth prediction guidance.
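Step (b) can be illustrated with a small sketch. The page does not say which components the motion is factorized into, so the split used here (a per-point object-centric deformation followed by a per-frame rigid transform into the world frame) is an assumption made for illustration. `rotation_z` and `compose_motion` are hypothetical names, not functions from the DreamScene4D code.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_motion(canonical, deformation, rotation, translation):
    """Map object-centric Gaussian centers to world space for one frame.

    Assumed factorization (not confirmed by the source):
      1. object-centric deformation: canonical + deformation
      2. object-to-world rigid transform: rotation and translation

    canonical, deformation: (N, 3) arrays
    rotation: (3, 3) array, translation: (3,) array
    """
    deformed = canonical + deformation          # non-rigid part, object frame
    return deformed @ rotation.T + translation  # rigid part, world frame

# One Gaussian center that lifts by 0.5 along z in the object frame,
# while the object turns 90 degrees and moves 2 units along z.
centers = np.array([[1.0, 0.0, 0.0]])
offsets = np.array([[0.0, 0.0, 0.5]])
world = compose_motion(centers, offsets, rotation_z(np.pi / 2),
                       np.array([0.0, 0.0, 2.0]))
```

Keeping the components separate means each one can be given its own parameters and optimized on its own, which is the property the overview attributes to the factorization.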

Arbitrary View Synthesis

DreamScene4D can synthesize views from arbitrary camera poses at any given timestep. We present some examples where we render the scene using the reference camera (outlined in blue) and an orbital camera that changes the azimuth and elevation angle (outlined in orange). The original input video is presented in the leftmost column for reference.
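An orbital camera of this kind can be sketched as a look-at pose parameterized by azimuth and elevation. The axis conventions below (world y axis up, camera x right, y down, z forward) and the name `orbit_camera` are assumptions for illustration; the page does not state which conventions the released code uses.

```python
import numpy as np

def orbit_camera(azimuth_deg, elevation_deg, radius, target=(0.0, 0.0, 0.0)):
    """Return a 4x4 world-to-camera matrix for a camera orbiting `target`.

    Assumed conventions: world y is up; the camera looks along its +z axis
    with x to the right and y down. Degenerate at elevation = +/-90 degrees,
    where the viewing direction is parallel to the up vector.
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    target = np.asarray(target, dtype=float)

    # Camera position on a sphere of the given radius around the target.
    eye = target + radius * np.array([np.cos(el) * np.sin(az),
                                      np.sin(el),
                                      np.cos(el) * np.cos(az)])

    forward = (target - eye) / np.linalg.norm(target - eye)
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)

    world_to_cam = np.eye(4)
    world_to_cam[:3, :3] = np.stack([right, down, forward])  # rows = camera axes
    world_to_cam[:3, 3] = -world_to_cam[:3, :3] @ eye
    return world_to_cam

# Reference-like view and a novel view with changed azimuth and elevation.
reference_view = orbit_camera(0.0, 0.0, radius=2.0)
novel_view = orbit_camera(45.0, 30.0, radius=2.0)
```

Because the scene representation is explicit 3D Gaussians, changing the view only changes this matrix; the Gaussians at a given timestep stay the same.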

Columns (left to right): Input Video, Original View, Novel View 1, Novel View 2.

Gaussian Motion Trajectories

We visualize the 3D Gaussian trajectories from DreamScene4D corresponding to the pixels rendered from camera views by projecting them to 2D. We show that the Gaussian deformations provide reasonable 2D point motion trajectories in both the original view (outlined in blue) and in novel unseen views (outlined in orange). The rendered 3D Gaussians are selected independently per view.
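Projecting a 3D trajectory to 2D amounts to applying a camera model at every timestep. The sketch below assumes a simple pinhole camera with a single focal length in pixels; `project_trajectories` and its parameters are hypothetical and are not taken from the DreamScene4D code.

```python
import numpy as np

def project_trajectories(points_world, world_to_cam, focal, principal_point):
    """Project 3D point trajectories to 2D pixel trajectories.

    points_world: (T, N, 3) positions of N Gaussian centers over T timesteps
    world_to_cam: (4, 4) rigid world-to-camera matrix (camera looks along +z)
    focal: focal length in pixels (assumed equal for x and y)
    principal_point: (cx, cy) in pixels

    Returns (uv, in_front): uv is (T, N, 2) with NaN where a point lies
    behind the camera, in_front is a (T, N) boolean mask.
    """
    R, t = world_to_cam[:3, :3], world_to_cam[:3, 3]
    cam = points_world @ R.T + t                 # world frame to camera frame
    z = cam[..., 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)          # avoid dividing by zero
    uv = focal * cam[..., :2] / safe_z[..., None] + np.asarray(principal_point)
    uv = np.where(in_front[..., None], uv, np.nan)
    return uv, in_front

# One point moving one unit to the right over two timesteps,
# seen by a camera placed two units in front of it.
world_to_cam = np.eye(4)
world_to_cam[2, 3] = 2.0
trajectory = np.array([[[0.0, 0.0, 0.0]],
                       [[1.0, 0.0, 0.0]]])       # shape (T=2, N=1, 3)
uv, in_front = project_trajectories(trajectory, world_to_cam,
                                    focal=100.0, principal_point=(64.0, 64.0))
```

Passing a different `world_to_cam` matrix yields the track as seen from a novel view, since the underlying 3D trajectory does not depend on the camera.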

Columns (left to right): Original View, Novel View 1, Novel View 2.

Qualitative Comparisons

We compare DreamScene4D with other video-to-4D baselines. Our method can handle challenging videos with multiple objects undergoing large and complex motion. The original input video is presented in the leftmost column for reference.

Columns (left to right): Input Video, Consistent4D, DreamGaussian4D, DreamScene4D (Ours).

BibTeX

@article{dreamscene4d,
  title={DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos},
  author={Chu, Wen-Hsuan and Ke, Lei and Fragkiadaki, Katerina},
  journal={arXiv preprint arXiv:2405.02280},
  year={2024}
}