fovc
1,165 karma

  1. Łukasz Kaiser basically confirmed it in a podcast:

    https://youtu.be/3K-R4yVjJfU?si=JdVyYOlxUbEcvEEo&t=2624

    > Q: Are the releases aligned with pre-training efforts?

    > A: There used to be a time not that long ago, maybe half a year, distant past, where the models would align with RL runs or pretraining runs ... now the naming is by capability. GPT5 is a capable model; 5.1 is a more capable model

  2. > I also think it’s important to notice that a lot of these challenges they happen with humans too. The concept of prompt injection isn’t that different from social engineering, right? When somebody calls in and says, “Oh, I forgot my password, can you just help me this one time?”
  3. Crossover from the other front page article. I tested out ChatGPT5 search mode and there are some good sources!

    https://chatgpt.com/s/t_68bd82908c0c8191b142b860ff91c9dc

  4. Very nice! For other readers, vc is short for verification condition and wp is short for weakest precondition.
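
    As a tiny worked example (mine, not from the article): for an assignment, wp substitutes the assigned expression into the postcondition, and the verification condition says the precondition implies the result.

        wp(x := x + 1, x > 0)  ≡  x + 1 > 0  ≡  x > -1

    So to verify {x >= 0} x := x + 1 {x > 0}, the VC is x >= 0 ⟹ x > -1, which a solver discharges trivially.
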
  5. I wonder if the error propagation problem could be solved with a “branching” generator? Basically at every token you fork off N new streams, with some tree pruning policy to avoid exponential blowup. With a bit of bookkeeping you could make an attention mask to support the parallel streams in the same context sharing prefixes. Perhaps that would allow more of an e2e error minimization than the greedy generation algorithm in use today?
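
    A minimal sketch of that branching idea (my own illustration, not an existing implementation), assuming a hypothetical score_next_tokens(prefix) that returns per-token log-probabilities; pruning keeps the highest-scoring streams, so it’s essentially beam search minus the shared-prefix attention-mask trick:

    ```python
    import heapq

    def branching_generate(score_next_tokens, prompt, n_branch=4, n_keep=8,
                           max_len=32, eos=None):
        """Fork each stream into n_branch continuations per step, then prune to
        the n_keep streams with the highest total log-probability."""
        streams = [(0.0, list(prompt))]          # (total logprob, token list)
        for _ in range(max_len):
            candidates = []
            for logp, toks in streams:
                if eos is not None and toks and toks[-1] == eos:
                    candidates.append((logp, toks))   # finished stream passes through
                    continue
                # score_next_tokens is assumed to return {token: logprob} for this prefix
                next_scores = score_next_tokens(toks)
                best = heapq.nlargest(n_branch, next_scores.items(), key=lambda kv: kv[1])
                for tok, tok_logp in best:
                    candidates.append((logp + tok_logp, toks + [tok]))
            # Pruning policy: keep only the globally best n_keep streams
            # to avoid exponential blowup.
            streams = heapq.nlargest(n_keep, candidates, key=lambda c: c[0])
        return max(streams, key=lambda c: c[0])[1]    # best stream by total score
    ```

    Sharing prefixes in one context would then mean batching all live streams and building a tree-structured attention mask so the shared prefix tokens are only computed once.
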
  6. Or see the explanation in video form here: https://m.youtube.com/watch?v=d0HJvGSWw8A

    Mamba has been discussed a lot here, and this seems like a promising line of inquiry for improvement

  7. And also make sure that the slot is a symbol in the correct package. Or do as Elisp does and go without packages, but then have a 16-character prefix.
  8. Having the data structures is nice and all, but using them is kind of painful. They are certainly second class.

    Having to use accessor functions or destructuring macros instead of just a period or -> is often annoying too. The lack of syntax has cons as well as pros.

  9. Sparse attention essentially combines 3 types of attention optimizations:

    1. Compression of the key/value input vectors to reduce the size of the KV cache

    2. Selectively computing uncompressed attention on a subset of tokens based on the compressed blocks with the highest attention scores

    3. Using sliding window for local attention at full resolution

    > Both Full Attention and sparse attention models are pretrained on 270B tokens of 8k-length texts, followed by continued training and supervised fine-tuning on 32k-length texts with YaRN to achieve long-context adaptation. Both models are trained to full convergence to ensure fair comparison.

    > our experiments adopt a backbone combining Grouped-Query Attention (GQA) and Mixture-of-Experts (MoE), featuring 27B total parameters with 3B active parameters

    Evaluated on MMLU, MMLU-PRO, CMMLU, BBH, GSM8K, MATH, DROP, MBPP, and HumanEval. NSA outperforms full attention on 7/9.

    Beats out H2O, InfLLM, Quest, Exact-Top, and full attention on LongBench

    Perfect retrieval on 64k needle-in-a-haystack

    The CoT eval is less convincing, but NSA outperforms full attention on AIME24.

    Training speed of 2-9x vs. FlashAttention

    Decoding speedup of 4-12x vs. full attention ["expected"? Didn't see comparison to other attention mechanisms]
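
    A toy single-query sketch of how the three pieces fit together (my paraphrase of the structure, not the paper’s kernels; mean-pooling stands in for the learned compression, and fixed gates stand in for the learned gating):

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def nsa_like_attention(q, K, V, block=8, top_blocks=2, window=16,
                           gates=(1/3, 1/3, 1/3)):
        """Compressed-block attention + selected-block full attention +
        sliding-window attention, mixed with (normally learned) gates."""
        T, d = K.shape
        n_blocks = T // block

        # 1. Compressed branch: pool each K/V block and attend over block summaries.
        Kc = K[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
        Vc = V[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
        block_scores = Kc @ q / np.sqrt(d)
        out_cmp = softmax(block_scores) @ Vc

        # 2. Selection branch: take the blocks the compressed scores rank highest,
        #    then run full-resolution attention over just their tokens.
        chosen = np.argsort(block_scores)[-top_blocks:]
        idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen])
        out_sel = softmax(K[idx] @ q / np.sqrt(d)) @ V[idx]

        # 3. Sliding-window branch: full attention over the most recent tokens.
        out_win = softmax(K[-window:] @ q / np.sqrt(d)) @ V[-window:]

        g1, g2, g3 = gates
        return g1 * out_cmp + g2 * out_sel + g3 * out_win

    # Example: one 32-dim query over a 64-token context.
    rng = np.random.default_rng(0)
    q, K, V = rng.standard_normal(32), rng.standard_normal((64, 32)), rng.standard_normal((64, 32))
    print(nsa_like_attention(q, K, V).shape)  # (32,)
    ```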

  10. Ah thanks for the clarification! Do you happen to know if Nile/Gezira went anywhere?
  11. Great to see this is alive and progressing! I believe Ohm started life in Alan Kay’s research group, as part of an effort to build a graphical OS and office suite in 10k lines of code. I found this talk immensely inspiring: https://m.youtube.com/watch?v=ubaX1Smg6pY
  12. > I feel like I'm taking crazy pills when I read about others' experiences. Surely I am not alone?

    You're not alone :-) I asked a very similar question about a month ago: https://www.hackerneue.com/item?id=42552653 and have continued researching since.

    My takeaway was that autocomplete, boilerplate, and one-off scripts are the main use cases. To use an analogy, I think the code assistants are more like an upgrade from handsaw to power tools and less like hiring a carpenter. (Which is not what the hype engine will claim.)

    For me, only the one-off script (write-only code) use-case is useful. I've had the best results on this with Claude.

    Emacs abbrevs/snippets (+ choice of language) virtually eliminate the boilerplate problem, so I don't have a use for assistants there.

    For autocomplete, I find that LSP completion engines provide 95% of the value for 1% of the latency. Physically typing the code is a small % of my time/energy, so the value is more about getting the right names, argument order, and other fiddly details I may not remember exactly. But I find that LSP-powered autocomplete and tooltips largely solve those challenges.

  13. The “pair programming” approach with good models is just slow enough that I lose focus on each step. The faster models I’ve tried are not good enough except for straightforward things where it’s faster to just use emacs/LSP refactoring and editing tools. Maybe supermaven manages to beat the “good enough, fast enough” bar; I’ll have to try it!
  14. I still don’t understand how people are getting value out of AI coders. I’ve tried really hard and the commits produced are just a step up from garbage. Writing code from scratch is generally decent. But after a few rounds of edits the assistant just starts piling conditionals into existing functions until it’s a rat’s nest 4 layers deep and 100+ lines long. The other day it got into a loop trying to resolve a type error, where it would make a change, then revert it, then make it again.

    ETA: Sorry forgot about the relevancy in my rant! The one area where I’ve found the AIs helpful is enumerating and then creating test cases

  15. Makes a ton of sense!

    Is this for completions, patches, or new files?

  16. Agree it’s not a toy. AWS implemented a large chunk of IAM in Dafny. Though IIRC they have their own non-public compiler to Java
  17. It’s also useful in typed languages to introduce an existentially quantified type
  18. Having to keep two copies of the table simultaneously, on systems that make it easy to add storage but not to remove it. Otherwise pg_repack has worked really well.

    We solved the 2x storage with partitions, but it feels like the tail wagging the dog

  19. Yes I know the slotted attribute is not in a __dict__, which definitely helps memory usage. But my point is that if the parent structure is itself in a dict, that access will swamp the L1 cache miss in terms of latency. Even the interpretation overhead (and likely cache thrashing) will eliminate L1 cache speedups.

    And yes, __slots__ improve perf, but that’s about avoiding the __dict__ access, which hits really generic hashing code and then memory probing, more than it is about the L1 cache

    Where __slots__ are most useful (and IIRC what they were designed for) is when you have a lot of tiny objects and memory usage can shrink significantly as a result. That could be the difference between having to spill to disk and keeping the workload in memory. E.g., openpyxl does this with its spreadsheet model, where there could be tons of cell references floating around
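
    A quick way to see the per-instance saving (class names are mine, just for illustration):

    ```python
    import sys

    class PlainCell:
        def __init__(self, row, col, value):
            self.row, self.col, self.value = row, col, value

    class SlottedCell:
        __slots__ = ("row", "col", "value")
        def __init__(self, row, col, value):
            self.row, self.col, self.value = row, col, value

    p, s = PlainCell(1, 1, 42), SlottedCell(1, 1, 42)
    print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance plus its per-instance __dict__
    print(sys.getsizeof(s))                              # no __dict__ to pay for
    print(hasattr(s, "__dict__"))                        # False: attributes live in fixed slots
    ```

    Multiply that delta by millions of tiny objects and it can decide whether the workload stays in memory at all.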

  20. POSIX is weird, but NodeJS streams are designed to be misused
  21. It’s Python. Does L1 matter at all? I assume anything you’re accessing is behind a few pointers and __dict__ accesses anyway.

    For me it’s mostly about .attribute being more in line with the rest of the language. Kwargs aside, I find overuse of dicts to be clunky in Python

  22. Never found so many choice quotes in one article...

    > Susie Thomas, a clerk for Lee County’s superior court, estimates it now takes her 10 times as many clicks to complete her case indexing. She was buried in scanning paper dockets for Odyssey’s online database until May 2024 and sorely misses the old DOS program. “It was a lot simpler and easier,” she says.

    > A Tyler spokesperson says that ... its definition of “defect” is “not a ‘bug’ in the software, but something that didn’t work as anticipated.”

    > Errors in the company’s apps have allegedly contributed to people getting stuck in prison for weeks longer than was ordered, or having incorrect verdicts entered on their records. Yet its products remain ubiquitous, in part because it has few serious competitors in the judicial space.
