
10 pages for a paper with this groundbreaking of a concept is just embarrassing. It is barely an outline.

"confirming that 40× compression preserves field geometry with minimal distortion. Over 95% of samples achieve similarity above 0.90."

I smell Grok. Grok 3, maybe Grok 4 Fast.

> "Implementation details. Optimal configurations are task and architecture-dependent. Production systems require task-specific tuning beyond baseline heuristics provided in reference implementation."

"Implementation? Idk, uhh, it's task specific or something." Come on, dude. You're better than this.

4.4 Student/Teacher evaluation. What even is the benchmark? You give percentage values but no indication of what benchmark. Seems made up.

4.5 Computational Analysis. Why do you need the trivial multiplication of "savings" out to 1B tok/day and $700M/year? This reads like GPT-generated advertising copy for hallucinated performance.

A three-sentence conclusion restating the title?


I appreciate you taking the time to respond, brother. Let me clarify a few things, because your interpretation misses the actual structure of the work.

The paper is short on purpose. It's not meant as a full architecture release. It's a documentation pass on a narrow but surprising empirical result, and I wanted the experimental core to be easy for others to replicate. The repo contains the full pipelines, configuration files, and benchmark scripts, and those show the precise datasets, metrics, and evaluation flows. This is why I didn't inflate the paper with implementation padding that would only duplicate the code.

The student–teacher section refers to CIFAR-10 and SST-2. The benchmarks, seed settings, model specs, and all statistical outputs are in scripts/ and the logged runs. Anyone who actually executes the pipeline will see that nothing is “made up”, and the numbers reproduce across seeds.

On the compression results, nothing is hallucinated. The field similarity numbers come directly from the SVD decay analysis and the cosine-preservation runs that are right in the repo. If you run compute_field_decay.py and compare_backends.py, you'll see the exact values that appear in the paper. I strongly encourage you to actually try it. The results are surprising, but they're empirical.
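To make "cosine preservation" concrete: stripped to its essentials, the measurement is a per-sample cosine comparison between vectors before and after a rank-truncated reconstruction. Here's a rough, self-contained sketch of that kind of check (synthetic data, purely illustrative; this is not the repo code, and the real scripts obviously operate on the actual field tensors):

```python
import numpy as np

def cosine_preservation(X, compression=40, threshold=0.90):
    """Rank-truncate a matrix of per-sample vectors via SVD and report
    how many samples keep cosine similarity with their originals
    above `threshold`."""
    n, d = X.shape
    rank = max(1, d // compression)                 # e.g. 40x fewer directions
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * S[:rank]) @ Vt[:rank]    # rank-truncated reconstruction
    num = np.sum(X * X_hat, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(X_hat, axis=1) + 1e-12
    cos = num / den
    return cos, float(np.mean(cos > threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 4096))           # stand-in for hidden states
    cos, frac = cosine_preservation(X)
    print(f"median cosine: {np.median(cos):.3f}, fraction > 0.90: {frac:.2%}")
```

On synthetic Gaussian data the preservation will of course be poor; the point of the paper's numbers is that real field tensors are far lower-rank than that, which is exactly what the decay analysis measures.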

The implementation paragraph you quoted is simply standard language acknowledging that optimal deployment settings vary by architecture. It's absolutely not a hand-wave. It's just me trying to avoid implying there's a single magic configuration when the repo already exposes all the internal knobs.

I get that the tone of the work is unusual. Trust me, I do. I'm an outsider publishing openly, not through a lab with a standard template. But, nonetheless, the experiments run, the results reproduce, and the repo shows the full details. If something seems unclear, I'm happy to point to the exact script or log line. Just let me know.

CIFAR-10 is an image classification dataset (32x32 pixel images).

Llama 3.3 70B is a text-only, non-multimodal language model. Just look up the Hugging Face page that your own repo points to.

> The Llama 3.3 instruction tuned text only model...

I might be wrong, but I'm pretty sure a text model is going to be no better than chance at classifying images.

Another comment pointed out that your test suite cheats slightly on HellaSwag. It doesn't seem unlikely that Grok set up the project so it could cheat at the other benchmarks, too.

https://www.hackerneue.com/item?id=46215166

> The repo contains the full pipelines, configuration files, and benchmark scripts, and those show the precise datasets, metrics, and evaluation flows.

There's nothing there, really.

I'm sorry that Grok/Ani lied to you, I blame Elon, but this just doesn't hold up.

As a follow-up, just to refresh your memory:

1. "Attention Is All You Need" (Vaswani et al., 2017)
Length: 11 pages of main content, 5 pages of references and appendix

2. The first GPT paper (Radford et al., 2018)
Length: 12 pages

3. BERT (Devlin et al., 2018)
Length: 14 pages

Big ideas don't require big papers. I don't know where you got that idea from.

Your paper is 10 pages of fluff without even an architecture diagram or a single equation, bro. It's not real.

