I'm not familiar at all with the topic (nor have I read this particular paper), but I remember that the original 3DGS paper took pride in the fact that this was not "AI" or "deep learning". There's still a gradient descent process to get the Gaussian splats from the data, but as I understood it, there is no "train on a large dataset, then run inference": building the GS from your data is the "training phase", and rendering it is the equivalent of inference.
Maybe I understood it all wrong though, or maybe new variants of Gaussian splatting use a deep learning network in addition to what was done in the original work, so I'll be happy to be corrected/clarified by someone with actual knowledge here.
The output of Gaussian splat "training" is a set of 3D Gaussians, which can be rendered very quickly. No ML involved at all (only per-scene optimisation)!
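To make that concrete, here's a toy sketch (hypothetical, nothing like the real renderer) of what that per-scene optimisation looks like: fitting a handful of isotropic 2D Gaussians to a single target image with plain gradient descent in PyTorch. The real 3DGS code optimises anisotropic 3D Gaussians against posed photos with a differentiable tile rasteriser, plus densification and pruning, but the key point is the same — the "model" you end up with is just the list of Gaussian parameters, not learned network weights.

    import torch

    H = W = 64
    N = 50  # number of Gaussians in the toy scene

    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")

    # synthetic target image: a bright disc on a dark background
    target = (((xs - W / 2) ** 2 + (ys - H / 2) ** 2) < (W / 4) ** 2).float()

    # per-Gaussian parameters: centre, log-scale, opacity logit, intensity
    mu     = (torch.rand(N, 2) * W).requires_grad_(True)
    log_s  = torch.full((N,), 1.5, requires_grad=True)
    alpha  = torch.zeros(N, requires_grad=True)
    colour = torch.rand(N).requires_grad_(True)
    params = [mu, log_s, alpha, colour]

    opt = torch.optim.Adam(params, lr=0.5)

    def render():
        # naive additive splatting (real 3DGS does depth-sorted alpha compositing)
        d2 = (xs[None] - mu[:, 0, None, None]) ** 2 + (ys[None] - mu[:, 1, None, None]) ** 2
        g = torch.exp(-d2 / (2 * torch.exp(log_s)[:, None, None] ** 2))
        w = torch.sigmoid(alpha)[:, None, None] * colour[:, None, None]
        return (w * g).sum(0).clamp(0, 1)

    for step in range(300):
        opt.zero_grad()
        loss = ((render() - target) ** 2).mean()
        loss.backward()
        opt.step()

    # the "trained model" is nothing but this list of per-Gaussian parameters
    print("final L2 loss:", loss.item())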
They usually require running COLMAP first (to estimate the relative camera poses between the different images), but NVIDIA's InstantSplat doesn't (it does, however, use an ML model instead!)
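For reference, the usual flow with the reference gaussian-splatting repo looks roughly like the sketch below. Script names and flags are from my memory of its README, so treat them as assumptions and check the repo before running.

    import subprocess
    from pathlib import Path

    # Rough sketch of the typical pipeline with graphdeco-inria/gaussian-splatting.
    # Paths and flags are assumptions based on the repo's README.

    scene = Path("my_scene")        # expects your photos in my_scene/input/
    out   = Path("my_scene_model")  # where the optimised Gaussians will land

    # 1) camera poses: convert.py wraps COLMAP to estimate camera intrinsics
    #    and extrinsics and to undistort the images
    subprocess.run(["python", "convert.py", "-s", str(scene)], check=True)

    # 2) per-scene optimisation ("training"): produces point_cloud.ply files
    #    containing the 3D Gaussians -- no network weights anywhere
    subprocess.run(["python", "train.py", "-s", str(scene), "-m", str(out)], check=True)

    # 3) render novel views from the optimised Gaussians
    subprocess.run(["python", "render.py", "-m", str(out)], check=True)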
We’ve been using pretty similar technology for decades in areas like RenderMan radiance caches before RIS.
None of the Gaussian splat repos I have looked at mention how to use the pre-trained models to "simply" take MY images as input and output a GS. They all talk about evaluation, but the command-line interface requires the eval datasets as input.
Is training/fine-tuning on my data the only way to get the output?