
My friend doesn't quite grasp this yet; can someone explain? Is the reconstructed detail all "real" and extracted from the blurred input, or is there some model at work here, filling in the image with plausible details but basically making up stuff that wasn't really there to start with?

That's accurate. What's worth noting, though, is that everything we 'see' with our own eyes is constructed from sampling our environment. The image we construct is what we expected to see given the sample data. This is one reason why eyewitness testimony can be vivid and false without any foul play.
No, it does not "make up things" using generative AI. Current GS implementations assume camera poses are static. This paper assigns a linear motion trajectory to the camera during training.
So can it handle cases where both the camera and multiple objects in the scene are moving along different trajectories?
Not with traditional 3D Gaussian splatting, but it is potentially possible to separate the time axis and do a 4D Gaussian splatting with some regularization to accommodate dynamic scenes.

Here's some early work in this area which seems promising: https://guanjunwu.github.io/4dgs/
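
The rough idea, sketched very loosely below, is to give the Gaussians a time axis. This is only an illustration of that concept, not necessarily the linked paper's actual method; the names and the linear motion model are made up for the example. Each Gaussian's center is allowed to move over time, and a regularizer keeps static parts of the scene still:

```python
import torch

# Hedged sketch of "separate the time axis": each Gaussian gets a learned
# per-Gaussian velocity so its center moves over time. A regularizer
# penalizes motion so only Gaussians that must move actually move.

N = 100_000                                          # number of Gaussians
means = torch.randn(N, 3, requires_grad=True)        # centers at t = 0
velocities = torch.zeros(N, 3, requires_grad=True)   # learned linear motion

def means_at(t: float) -> torch.Tensor:
    """Gaussian centers at time t (linear motion, purely for illustration)."""
    return means + t * velocities

def motion_regularizer(weight: float = 1e-2) -> torch.Tensor:
    """Penalize motion magnitude so static regions stay static."""
    return weight * velocities.pow(2).sum(dim=1).mean()
```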

I skimmed the Overview and am not an expert.

It seems to me they don't use any ML at all. They use backpropagation to jointly optimise the entire physics/motion model, which models the camera motion and the generated blurry images: they generate multiple images for each camera frame along the camera's path of motion, then merge them, simulating motion blur.
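
A minimal sketch of that scheme (hypothetical names throughout; `render` stands in for a differentiable renderer such as a Gaussian-splatting rasterizer, and poses are simplified to plain translations):

```python
import torch

# Learnable camera motion over a single exposure.
pose_start = torch.zeros(3, requires_grad=True)  # pose when the shutter opens
pose_delta = torch.zeros(3, requires_grad=True)  # linear motion over the exposure

def render_blurred(render, n_samples=8):
    """Average renders along the trajectory to simulate motion blur."""
    frames = []
    for t in torch.linspace(0.0, 1.0, n_samples):
        pose = pose_start + t * pose_delta  # pose at fraction t of the exposure
        frames.append(render(pose))
    return torch.stack(frames).mean(dim=0)

def training_step(render, observed_blurry, optimizer):
    """One step: the synthesized blur should match the real blurry photo."""
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(render_blurred(render), observed_blurry)
    loss.backward()  # gradients reach the poses and the scene jointly
    optimizer.step()
```

The point being: nothing here learns from external data. The optimizer just fits the trajectory and the scene to this one capture.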

It is ML in the sense of optimizing a nonconvex loss function over a dataset. It is not a fancy diffusion model or even a generative model, but it is no less a machine learning problem.
“Not ML” as in “not learning from data to apply in new situations” but rather they do “mathematical optimisation”.

The data they optimise over is just the images of the current camera trajectory (as far as I understand)

Gaussian Splatting creates an "approximation" of a 3D scene (captured from a video) using hundreds of thousands (or even millions) of tiny gaussian clouds. Each gaussian might be as small as a couple of pixels, and all these 3D gaussians get projected onto the 2D image plane (fast on the GPU) to realize a single image (i.e. a single pose of the video camera). These gaussians are in 3D, so they explicitly represent the scene geometry, e.g. real physical surfaces, and an approximation of physical textures.

When a camera blurs an image, the physical surface / object gets blurred across many pixels. But if you can reconstruct the 3D scene accurately, then you can re-project the 3D gaussians into 2D images that end up not blurry. Another way to view the OP is that this technique is a tweak to the "sharp images only" Gaussian Splatting work from last year to deal with blurry images.
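
For the curious, the projection step looks roughly like the standard EWA-splatting math sketched below (a numpy illustration, not any particular codebase; `project_gaussian` and the pinhole parameters are my own naming):

```python
import numpy as np

def project_gaussian(mean3d, cov3d, R, t, fx, fy):
    """Project one 3D Gaussian into a 2D Gaussian on the image plane.

    R, t: world-to-camera rotation and translation.
    fx, fy: focal lengths in pixels (simple pinhole model).
    """
    # Transform the center into camera coordinates.
    m = R @ mean3d + t
    x, y, z = m

    # Jacobian of the perspective projection (u, v) = (fx*x/z, fy*y/z),
    # linearized around the Gaussian's center.
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])

    # 2D image-plane covariance: Sigma' = J R Sigma R^T J^T
    cov2d = J @ R @ cov3d @ R.T @ J.T
    mean2d = np.array([fx * x / z, fy * y / z])
    return mean2d, cov2d
```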

The OP paper is cool but isn't alone, here's some concurrent work: https://github.com/SpectacularAI/3dgs-deblur

Also related, from a couple of years ago: using NeRF methods (another area of current 3D research) to denoise night images and recover HDR: https://bmild.github.io/rawnerf/

NeRF, like Gaussian Splatting, seeks to reconstruct the scene in 3D, and RawNeRF adapts the approach to deal with noisy images as well as large exposure variation.

In terms of Gaussian Splats vs GenAI: GenAI models have usually been trained on a prior of millions of images, so that they can impute / infer some part of the 3D scene or some part of the input images. However, Gaussian Splats (and NeRF) lack those priors.

Gaussian blur is in principle a reversible operation, but in practice inversion isn't feasible on a single still image: the blur attenuates high frequencies so strongly that noise swamps them. With multiple pictures you might have enough information.
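
To make that concrete, here's a minimal Wiener-deconvolution sketch (assuming the blur kernel is known and the blur is circular convolution; `noise_to_signal` is an illustrative regularization constant). A Gaussian's spectrum is nonzero everywhere, so a naive inverse exists in theory, but it blows up wherever the spectrum is tiny:

```python
import numpy as np

def wiener_deblur(blurred, kernel, noise_to_signal=1e-3):
    """Invert a known blur kernel in the frequency domain (Wiener filter).

    Assumes circular convolution with the kernel's origin at pixel (0, 0).
    The naive inverse 1/H explodes where |H| is tiny; the NSR term in the
    denominator keeps the filter bounded there.
    """
    H = np.fft.fft2(kernel, s=blurred.shape)  # kernel transfer function
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)  # regularized inverse
    return np.real(np.fft.ifft2(B * W))
```
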
Both. The paper mentions using a deblurrer and novel view synthesis model (ExBluRF).
