Comment by daemonologist

daemonologist Dec 9, 2025 parent

I agree - the results on the finetunes are not very surprising. The trained-from-scratch ResNets (Figure 2 and Section 3.2.1) are definitely more interesting, though somewhat limited in scope.

In any case, my impression is that this is not immediately more useful than a LoRA (and is probably not intended to be), but is maybe an avenue for further research.

augment_me Dec 9, 2025

I don't think its that surprising actually. And I think the paper in general completely oversells the idea.

The ResNet results hold from scratch because strict local constraints (e.g., 3x3 convolutions) force the emergence of fundamental signal-processing features (Gabor/Laplacian filters) regardless of the dataset. The architecture itself enforces the subspace.

The Transformer/ViT results rely on fine-tunes because of permutation symmetry. If you trained two ViTs from scratch, "Attention Head 4" in Model A might be functionally identical to "Head 7" in Model B, but mathematically orthogonal.

Because the authors' method (SVD) lacks a neuron-alignment step, scratch-trained ViTs would not look aligned. They had to use pre-trained models to ensure the weights shared a coordinate system. Effectively, I think that they proved that CNNs converge due to it's arch, but for Transformers, they mostly just confirmed that fine-tuning doesn't drift far from the parent model.

mlpro Dec 9, 2025

I think its very surprising, although I would like the paper to show more experiments (they already have a lot, i know).

The ViT models are never really trained from scratch - they are always finetuned as they require large amounts of data to converge nicely. The pretraining just provides a nice initialization. Why would one expect two ViT's finetuned on two different things - image and text classification end up in the same subspace as they show? I think this is groundbreaking.

I don't really agree with the drift far from the parent model idea. I think they drift pretty far in terms of their norms. Even the small LoRA adapters drift pretty far from the base model.

rhaps0dy Dec 9, 2025

Thank you for saving me a skim

swivelmaster Dec 9, 2025

You’ve explained this in plain and simple language far more directly than the linked study. Score yet another point for the theory that academic papers are deliberately written to be obtuse to laypeople rather than striving for accessibility.

bmacho Dec 9, 2025

Vote for the Party that promises academic grants for people that write 1k character long forum posts for the laypeople instead of other experts of the field.

mapt Dec 9, 2025

We have this already. It's called an abstract. Some do it better than others.

Perhaps we need to revisit the concept and have a narrow abstract and a lay abstract, given how niche science has become.

rocqua Dec 9, 2025

I don't think the parent post is complaining that academics are writing proposals (e.g as opposed to people with common sense). Instead, it seems to me that he is complaining that academics are writing proposals and papers to impress funding committees and journal editors, and to some extend to increase their own clout among their peers. Instead of writing to communicate clearly and honestly to their peers, or occasionally to laymen.

And this critique is likely not aimed at academics so much as the systems and incentives of academia. This is partially on the parties managing grants (caring much more about impact and visibility than actually moving science forwards, which means everyone is scrounging for or lying about low hanging fruit). It is partially on those who set (or rather maintain) the culture at academic institutions of gathering clout by getting 'impactful' publications. And those who manage journals also share blame, by trying to defend their moat, very much hamming up "high impact", and aggressively rent-seeking.

swivelmaster Dec 9, 2025

I’m not sure that’s something we get to vote on.

eru Dec 9, 2025

On the margin, you can let anything influence your voting decision.

2 More Comments →

shoubidouwah Dec 9, 2025

and hope for a president that can do both

mlpro Dec 11, 2025 (dead)

This item has no comments currently.

Preferences

Keyboard Shortcuts

Story Lists

Navigation

Miscellaneous