3D Gaussian Splatting for 3D reconstruction: progress and challenges

In the realm of 3D reconstruction, radiance fields have seen a surge in research activity since the foundational NeRF paper was published in 2020. Several methods, including Instant-NGP, MERF, Mip-NeRF, and MobileNeRF, have sought to improve overall quality while also reducing training and rendering time. However, optimizing quality, training speed, and rendering speed simultaneously has remained a challenge.

3D Gaussian Splatting (3DGS) is a game-changer: it combines state-of-the-art visual quality and real-time interactive rendering with training times that rival the fastest existing methods.

In 3D Gaussian Splatting, you represent the scene with a 3D point cloud in which each point is the center of a 3D Gaussian. During rendering, each Gaussian is projected onto the 2D image plane, which determines where it appears on the screen.
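The projection step can be sketched with a standard pinhole camera model. This is a minimal illustration, assuming the point is already expressed in camera coordinates and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; the function name and parameters are hypothetical. A real 3DGS renderer first transforms each Gaussian center from world to camera space using the camera pose, and also projects the Gaussian's covariance, which this sketch omits.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (x right, y down, z forward)
    to pixel coordinates (u, v) with a pinhole camera model.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division by depth, then scale and shift into pixel units.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


# Example: a point 2 units in front of a 640x480 camera, slightly off-axis.
u, v = project_point((0.5, -0.25, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)  # 445.0 177.5
```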

When you splat the projected Gaussian onto the 2D image plane, instead of simply putting a dot there, you “spread” its value using a 2D Gaussian. The center of the splat (where the Gaussian's center was projected) has the highest value (the peak of the Gaussian), and the value tapers off as you move away from the center. Because each Gaussian has its own size and orientation, the resulting splat is generally an ellipse rather than a circle.
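The falloff of a splat can be sketched as an unnormalized 2D Gaussian evaluated at a pixel, with a 2x2 covariance matrix encoding the splat's size and orientation on screen. This is a minimal sketch, assuming the 2D covariance is already known (in 3DGS it is derived by projecting each Gaussian's 3D covariance); the function name is hypothetical.

```python
import math


def gaussian_weight(pixel, center, cov):
    """Unnormalized 2D Gaussian falloff exp(-0.5 * d^T cov^-1 d).

    pixel, center: (x, y) screen positions.
    cov: symmetric 2x2 covariance ((a, b), (b, c)) describing the splat's shape.
    Returns 1.0 at the splat center and decays toward 0 away from it.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of a symmetric 2x2 matrix: (1/det) * ((c, -b), (-b, a)).
    inv_a, inv_b, inv_c = c / det, -b / det, a / det
    dx = pixel[0] - center[0]
    dy = pixel[1] - center[1]
    # Squared Mahalanobis distance from the splat center.
    mahalanobis_sq = inv_a * dx * dx + 2.0 * inv_b * dx * dy + inv_c * dy * dy
    return math.exp(-0.5 * mahalanobis_sq)


# An elongated splat: variance 4 along x, 1 along y.
cov = ((4.0, 0.0), (0.0, 1.0))
print(round(gaussian_weight((12, 10), (10, 10), cov), 4))  # 0.6065
print(round(gaussian_weight((10, 12), (10, 10), cov), 4))  # 0.1353
```

Moving two pixels along the long axis keeps most of the weight, while the same offset along the short axis drops it sharply, which is how an anisotropic splat stretches across the screen.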

As you project more and more Gaussians onto the 2D image plane, some of the splats will overlap. When this happens, the overlapping splats are sorted by depth and alpha-blended from front to back, each contributing according to its Gaussian weight and its opacity. This blending yields a smooth and faithful image.
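A minimal sketch of this blending for a single pixel follows, assuming each overlapping splat is given as a `(depth, color, alpha)` tuple where `alpha` is the splat's opacity multiplied by its Gaussian weight at that pixel. It follows the front-to-back compositing used by 3DGS, but the function name, tuple layout, and early-termination threshold are illustrative assumptions; the actual renderer performs this per screen tile on the GPU.

```python
def composite_pixel(splats, background=(0.0, 0.0, 0.0), min_transmittance=1e-4):
    """Front-to-back alpha compositing of the splats covering one pixel.

    splats: iterable of (depth, (r, g, b), alpha) with alpha in [0, 1],
            alpha = opacity * Gaussian weight at this pixel.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = alpha * transmittance
        for i in range(3):
            color[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:  # pixel is effectively opaque
            break
    # Whatever light remains comes from the background.
    return tuple(c + transmittance * b for c, b in zip(color, background))


# A half-transparent red splat in front of a half-transparent blue one,
# passed in back-to-front order to show that sorting handles it.
print(composite_pixel([(2.0, (0.0, 0.0, 1.0), 0.5), (1.0, (1.0, 0.0, 0.0), 0.5)]))
# (0.5, 0.0, 0.25)
```

The nearer splat always receives the larger share of the pixel, and the early exit skips splats hidden behind an effectively opaque front layer.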

The 3D Gaussian Splatting reconstruction (or training) process is therefore the process that generates the set of Gaussians and their parameters (orientation, size, color, …). The process is iterative. It starts from an initial point cloud, typically the sparse point cloud produced by Structure-from-Motion, in which each point becomes the center of a Gaussian. At each training iteration, the positions of the Gaussians are refined together with their other parameters to minimize a loss function that measures the error between rendered images and the corresponding input images. Along the way, a Gaussian may be split into two Gaussians, thereby adding detail to the representation.
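To make the loop concrete, here is a deliberately simplified 1D analogue, built on loudly stated assumptions. The "image" is a 1D signal. Every Gaussian has a fixed width `sigma`, and only its position and amplitude are trained. The loss is mean squared error, and plain gradient descent with hand-derived gradients stands in for the automatic differentiation and Adam optimizer used in practice. The original 3DGS method also uses a different loss (a mix of L1 and D-SSIM) and performs densification (cloning and splitting), which this toy omits. All names are hypothetical.

```python
import math


def render(gaussians, xs, sigma):
    """Render a 1D 'image': each sample is the sum of all Gaussians evaluated there."""
    return [
        sum(amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for mu, amp in gaussians)
        for x in xs
    ]


def mse(pred, target):
    """Mean squared error between the rendered and the input signal."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train_step(gaussians, xs, target, sigma, lr):
    """One gradient-descent step on every Gaussian's position and amplitude."""
    pred = render(gaussians, xs, sigma)
    n = len(xs)
    updated = []
    for mu, amp in gaussians:
        grad_mu = grad_amp = 0.0
        for x, p, t in zip(xs, pred, target):
            d_loss = 2.0 * (p - t) / n  # dL/dp for the MSE loss
            g = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            grad_amp += d_loss * g  # dp/damp = g
            grad_mu += d_loss * amp * g * (x - mu) / sigma**2  # dp/dmu
        updated.append((mu - lr * grad_mu, amp - lr * grad_amp))
    return updated


def train(gaussians, xs, target, sigma=0.5, lr=0.5, steps=500):
    """Iterate render -> compare -> update, recording the loss at each step."""
    losses = []
    for _ in range(steps):
        losses.append(mse(render(gaussians, xs, sigma), target))
        gaussians = train_step(gaussians, xs, target, sigma, lr)
    losses.append(mse(render(gaussians, xs, sigma), target))
    return gaussians, losses


xs = [i * 0.1 for i in range(81)]  # 81 samples on [0, 8]
target = render([(2.0, 1.0), (6.0, 1.0)], xs, sigma=0.5)  # the "input image"
init = [(3.0, 0.5), (5.0, 0.5)]  # initialization, e.g. from a point cloud
fitted, losses = train(init, xs, target)
print(f"loss {losses[0]:.4f} -> {losses[-1]:.6f}")
print("fitted (position, amplitude):", fitted)
```

The initial Gaussians start off-target and the optimizer pulls them onto the structure in the input signal, which is the same render, compare, and update cycle the full method runs over millions of Gaussians and many camera views.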

The advantages of 3D Gaussian Splatting are so significant that the technique is rapidly supplanting NeRF-based reconstruction methods in many applications. However, numerous challenges remain to be solved:

  1. Precise Camera Poses: During the iterative reconstruction process, the accuracy of camera poses is critical. The loss function measures the difference between a rendered image and the corresponding input image. If the pose used to render the image does not match the true pose of the input image, the two images are misaligned and the loss is computed incorrectly, and these errors propagate to the parameters of the Gaussians.
  1. Simplified Representation: 3D Gaussian Splatting essentially encodes intricate scene details as geometry, that is, as additional Gaussians. This can lead to a very large number of Gaussians, which increases file size and slows down rendering.
  1. Initialization and Iteration Count: The quality of the output depends heavily on the initialization and on the number of training iterations. Better initialization can lead to more efficient training and higher-quality results.
  1. Editing and Post-processing: As in traditional photography, capturing the scene is just the beginning of the process. Additional tasks such as cropping, denoising, recoloring, and relighting are also needed in 3D.
  1. Motion and Dynamic Scenes: Dynamic scenes introduce a host of challenges, including the capture process (especially in monocular settings), precise camera pose estimation, temporal consistency of the representation, and the management of large data volumes, to name a few.
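The file-size concern behind the Simplified Representation challenge can be made concrete with a back-of-the-envelope estimate. This sketch assumes the per-Gaussian parameters of the original 3DGS method stored as uncompressed 32-bit floats: 3 for position, 3 for scale, 4 for a rotation quaternion, 1 for opacity, and 48 for color encoded as degree-3 spherical harmonics (16 coefficients per RGB channel). Actual files may contain extra fields, and compressed formats store much less; the names below are hypothetical.

```python
# Per-Gaussian parameters, assuming the layout of the original 3DGS method.
SH_DEGREE = 3
FLOATS_PER_GAUSSIAN = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_color": 3 * (SH_DEGREE + 1) ** 2,  # 16 coefficients per RGB channel
}
BYTES_PER_FLOAT = 4  # uncompressed float32


def bytes_per_gaussian():
    """Storage for one Gaussian under the assumed layout."""
    return sum(FLOATS_PER_GAUSSIAN.values()) * BYTES_PER_FLOAT


def scene_size_mb(num_gaussians):
    """Approximate uncompressed scene size in megabytes (1 MB = 10**6 bytes)."""
    return num_gaussians * bytes_per_gaussian() / 1e6


for n in (500_000, 1_000_000, 3_000_000):
    print(f"{n:>9,} Gaussians -> {scene_size_mb(n):6.0f} MB")
```

At 59 floats, or 236 bytes, per Gaussian, a scene with a few million Gaussians occupies several hundred megabytes before any compression, which is why more compact representations are an active research topic.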

Stay tuned as progress is made on all these fronts. These challenges are the focal points of ongoing research and innovation, and as they are addressed, we can anticipate even more remarkable advancements in the realm of 3D reconstruction and visualization.