- 3 points
- Crossover from the other front page article. I tested out ChatGPT5 search mode and there are some good sources!
- I wonder if the error propagation problem could be solved with a “branching” generator? Basically at every token you fork off N new streams, with some tree pruning policy to avoid exponential blowup. With a bit of bookkeeping you could make an attention mask to support the parallel streams in the same context sharing prefixes. Perhaps that would allow more of an e2e error minimization than the greedy generation algorithm in use today?
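  That idea is essentially beam search with a fork width and a score-based pruning policy. A minimal sketch, assuming a hypothetical `logprobs(seq)` hook that returns candidate next tokens sorted by log-probability (the prefix-sharing attention mask would live inside the model call and isn't shown):

  ```python
  import heapq
  from typing import Callable, List, Tuple

  # Hypothetical model hook: maps a token sequence to candidate next tokens,
  # sorted by log-probability. Name and signature are assumptions for this sketch.
  LogProbFn = Callable[[List[int]], List[Tuple[int, float]]]

  def branching_generate(logprobs: LogProbFn, prompt: List[int],
                         fork_width: int = 4,   # N streams forked at each token
                         beam_size: int = 16,   # pruning policy: cap on live streams
                         max_len: int = 64, eos: int = 0) -> List[int]:
      # Each stream is (cumulative logprob, sequence). Ranking by total score is
      # what moves error minimization from per-token (greedy) toward end-to-end.
      streams = [(0.0, list(prompt))]
      for _ in range(max_len):
          candidates = []
          for score, seq in streams:
              if seq and seq[-1] == eos:        # finished streams pass through
                  candidates.append((score, seq))
                  continue
              for tok, lp in logprobs(seq)[:fork_width]:
                  candidates.append((score + lp, seq + [tok]))
          # Tree pruning: keep only the best streams to avoid exponential blowup.
          streams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
          if all(seq[-1] == eos for _, seq in streams):
              break
      return max(streams, key=lambda c: c[0])[1]
  ```

  In practice all live streams share the prompt prefix, so the bookkeeping could batch every stream into a single forward pass with a shared-prefix attention mask rather than recomputing each one independently.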
- 2 points
- Or see the explanation in video form here: https://m.youtube.com/watch?v=d0HJvGSWw8A
Mamba has been discussed a lot here, and this seems like a promising line of inquiry for improvement
- 2 points
- Sparse attention essentially combines 3 types of attention optimizations (rough sketch after the list):
1. Compression of the key/value vectors into coarse block-level tokens to reduce the size of the KV cache
2. Selectively computing uncompressed attention on a subset of tokens based on the compressed blocks with the highest attention scores
3. Using sliding window for local attention at full resolution
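Here is a rough single-query, single-head PyTorch sketch of how the three branches could fit together. It only shows the dataflow: mean-pooling stands in for the paper's learned compressor, a plain average replaces its learned gating, and the function and parameter names are my own (the speedups come from custom kernels, not from anything shown here).

```python
import torch
import torch.nn.functional as F

def sparse_attention_toy(q, K, V, block=32, top_blocks=4, window=128):
    """q: (d,) query; K, V: (T, d) cached keys/values."""
    T, d = K.shape
    scale = d ** -0.5

    # 1. Compression: pool K/V blocks into coarse tokens (the paper uses a
    #    learned compressor; mean-pooling is a stand-in).
    nb = T // block
    Kc = K[: nb * block].reshape(nb, block, d).mean(1)
    Vc = V[: nb * block].reshape(nb, block, d).mean(1)
    a_c = F.softmax(Kc @ q * scale, dim=0)       # block-level attention scores
    out_cmp = a_c @ Vc

    # 2. Selection: attend at full resolution only inside the blocks with the
    #    highest compressed attention scores.
    idx = a_c.topk(min(top_blocks, nb)).indices
    sel = torch.cat([torch.arange(int(i) * block, (int(i) + 1) * block) for i in idx])
    out_sel = F.softmax(K[sel] @ q * scale, dim=0) @ V[sel]

    # 3. Sliding window: full-resolution local attention over recent tokens.
    out_win = F.softmax(K[-window:] @ q * scale, dim=0) @ V[-window:]

    # The paper combines the three branch outputs with learned gates; a plain
    # average keeps this sketch self-contained.
    return (out_cmp + out_sel + out_win) / 3
```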
> Both Full Attention and sparse attention models are pretrained on 270B tokens of 8k-length texts, followed by continued training and supervised fine-tuning on 32k-length texts with YaRN to achieve long-context adaptation. Both models are trained to full convergence to ensure fair comparison.
> our experiments adopt a backbone combining Grouped-Query Attention (GQA) and Mixture-of-Experts (MoE), featuring 27B total parameters with 3B active parameters
Evaluated on MMLU, MMLU-PRO, CMMLU, BBH, GSM8K, MATH, DROP, MBPP, and HumanEval. NSA outperforms full attention on 7/9.
Beats out H2O, InfLLM, Quest, Exact-Top, and full attention on LongBench
Perfect retrieval on 64k needle-in-a-haystack
The CoT eval is less convincing, but NSA outperforms full attention on AIME24.
Training speedup of 2-9x vs. FlashAttention
Decoding speedup of 4-12x vs. full attention ["expected"? Didn't see comparison to other attention mechanisms]
- 1 point
- Great to see this is alive and progressing! I believe Ohm started life in Alan Kay’s research group, to build a graphical OS and office suite in 10k lines of code. I found this talk immensely inspiring https://m.youtube.com/watch?v=ubaX1Smg6pY
- 107 points
- > I feel like I'm taking crazy pills when I read about others' experiences. Surely I am not alone?
You're not alone :-) I asked a very similar question about a month ago: https://www.hackerneue.com/item?id=42552653 and have continued researching since.
My takeaway was that autocomplete, boilerplate, and one-off scripts are the main use cases. To use an analogy, I think the code assistants are more like an upgrade from a handsaw to power tools, and less like hiring a carpenter. (Which is not what the hype engine will claim.)
For me, only the one-off script (write-only code) use-case is useful. I've had the best results on this with Claude.
Emacs abbrevs/snippets (+ choice of language) virtually eliminate the boilerplate problem, so I don't have a use for assistants there.
For autocomplete, I find that LSP completion engines provide 95% of the value for 1% of the latency. Physically typing the code is a small % of my time/energy, so the value is more about getting the right names, argument order, and other fiddly details I may not remember exactly. But I find that LSP-powered autocomplete and tooltips largely solve those challenges.
- The “pair programming” approach with good models is just slow enough that I lose focus on each step. The faster models I’ve tried are not good enough, except for straightforward things where it’s faster to just use emacs/LSP refactoring and editing tools. Maybe Supermaven manages to beat the “good enough, fast enough” bar; I’ll have to try it!
- I still don’t understand how people are getting value out of AI coders. I’ve tried really hard, and the commits produced are just a step up from garbage. Code written from scratch is generally decent, but after a few rounds of edits the assistant just starts piling conditionals into existing functions until it’s a rat’s nest 4 layers deep and 100+ lines long. The other day it got into a loop trying to resolve a type error, where it would make a change, then revert it, then make it again.
ETA: Sorry, I forgot about relevancy in my rant! The one area where I’ve found the AIs helpful is enumerating and then creating test cases.
- Maybe this? https://www.cis.upenn.edu/~bcpierce/tapl/
- Yes, I know the slotted attribute is not in a __dict__, which definitely helps memory usage. But my point is that if the parent structure is itself in a dict, that access will swamp the L1 cache miss in terms of latency. Even the interpretation overhead (and likely cache thrashing) will eliminate L1 cache speedups.
And yes, __slots__ improve perf, but that’s about avoiding the __dict__ access, which hits really generic hashing code and then memory probing, more than it is about the L1 cache.
Where __slots__ are most useful (and IIRC what they were designed for) is when you have a lot of tiny objects and memory usage can shrink significantly as a result. That could be the difference between having to spill to disk or keeping the workload in memory. E.g., openpyxl does this with its spreadsheet model, where there can be tons of cell references floating around.
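A minimal illustration of the memory angle (the Cell classes here are hypothetical stand-ins for the kind of tiny objects openpyxl keeps around):

```python
import sys

class Cell:                       # regular class: every instance carries a __dict__
    def __init__(self, row, col, value):
        self.row, self.col, self.value = row, col, value

class SlottedCell:                # __slots__: fixed attribute layout, no per-instance __dict__
    __slots__ = ("row", "col", "value")
    def __init__(self, row, col, value):
        self.row, self.col, self.value = row, col, value

c, s = Cell(1, 1, 0.0), SlottedCell(1, 1, 0.0)
print(sys.getsizeof(c) + sys.getsizeof(c.__dict__))  # instance + its attribute dict
print(sys.getsizeof(s))  # noticeably smaller; multiply by millions of cells
```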
- Never found so many choice quotes in one article...
> Susie Thomas, a clerk for Lee County’s superior court, estimates it now takes her 10 times as many clicks to complete her case indexing. She was buried in scanning paper dockets for Odyssey’s online database until May 2024 and sorely misses the old DOS program. “It was a lot simpler and easier,” she says.
> A Tyler spokesperson says that ... its definition of “defect” is “not a ‘bug’ in the software, but something that didn’t work as anticipated.”
> Errors in the company’s apps have allegedly contributed to people getting stuck in prison for weeks longer than was ordered, or having incorrect verdicts entered on their records. Yet its products remain ubiquitous, in part because it has few serious competitors in the judicial space.
https://youtu.be/3K-R4yVjJfU?si=JdVyYOlxUbEcvEEo&t=2624
> Q: Are the releases aligned with pre-training efforts?
> A: There used to be a time not that long ago, maybe half a year, distant past, where the models would align with RL runs or pretraining runs ... now the naming is by capability. GPT5 is a capable model; 5.1 is a more capable model