- Well, language is subject to a 'fashion' one-upmanship game: people want to demonstrate their sophistication, often by copying some "cool" patterns, but then the over-used patterns become "uncool" cliches.
So it might just be a natural reaction to the over-use of a particular pattern. This kind of thing has been driving language evolution for millennia. Besides that, a pompous style is often used in 'copy' (slogans and ads), which is something most people don't like.
- It is not a decoration. Karpathy juxtaposes ChatGPT (which feels like a "better Google" to most people) with Claude Code, which apparently feels different to him. It's a comparison between the two.
You might find this statement uninformative, but without both parts there's no comparison. That's really the semantics of the statement Karpathy is trying to express.
The ChatGPT-ish "it's not just" is annoying because the first part is usually a strawman, something the reader considers trite. But that's not the case here.
- I have seen this part. In fact, I checked the paper itself, where they provide more detailed numbers: it's still almost double the size of the base Gemma; reusing the embeddings and attention doesn't make that much difference, as most of the weights are in the MLPs.
- They are comparing the 1B Gemma to the 1+1B T5Gemma 2. Obviously a model with twice as many parameters can do better. That says absolutely nothing about the benefits of the architecture.
- > On desktop (using a mouse or trackpad), drag and drop actually works quite well.
Strong disagree here. It is intuitive, and it is easy to demonstrate. But it's not really convenient, especially on a trackpad. I have enough mouse agility to play RTS games, but not enough for reliable drag-and-drop, especially in complicated cases - across windows, with scrolling, etc.
- Nice. I actually did something similar 25 years ago; I called my thing "pick-and-put".
At that time I had switched from the MS-DOS environment to Windows 98. And as I was trying out the new UI features, I found drag-and-drop incredibly annoying - especially between different windows, where it requires a lot of movement, etc.
I had an idea that going further into skeuomorphism could make things better, so I started experimenting with 3D UI - in particular, a file manager with a 3D UI. And as an alternative to drag-and-drop I designed pick-and-put.
It's actually very simple: the right mouse button picks up an object, and you get a symbol of that object next to the cursor. Then a click onto empty space puts it there. Or you can click a copy button, which would copy it, etc.
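For concreteness, here's a toy 2D approximation of that interaction in Python/tkinter (the original was a 3D file manager; this only sketches the input scheme, and all widget details are my own invention): right-click picks an object, a small marker then follows the cursor, and a left-click on empty space puts the object down.

```python
# Toy "pick-and-put" sketch: right-click picks, left-click on empty space puts.
import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=480, height=320, bg="white")
canvas.pack()

# A couple of objects to move around.
items = [canvas.create_rectangle(40, 40, 90, 90, fill="steelblue"),
         canvas.create_oval(150, 60, 200, 110, fill="tomato")]

state = {"picked": None, "marker": None}

def pick(event):
    hit = [i for i in canvas.find_overlapping(event.x, event.y, event.x, event.y)
           if i in items]
    if hit and state["picked"] is None:
        state["picked"] = hit[-1]
        # A marker next to the cursor signals what is currently "in hand".
        state["marker"] = canvas.create_text(event.x + 12, event.y - 12,
                                             text="*", fill="gray")

def follow(event):
    if state["marker"] is not None:
        canvas.coords(state["marker"], event.x + 12, event.y - 12)

def put(event):
    if state["picked"] is None:
        return
    hit = [i for i in canvas.find_overlapping(event.x, event.y, event.x, event.y)
           if i in items]
    if not hit:  # only put onto empty space
        x1, y1, x2, y2 = canvas.coords(state["picked"])
        w, h = x2 - x1, y2 - y1
        canvas.coords(state["picked"], event.x - w / 2, event.y - h / 2,
                      event.x + w / 2, event.y + h / 2)
        canvas.delete(state["marker"])
        state["picked"] = state["marker"] = None

canvas.bind("<Button-3>", pick)   # right button picks (Button-2 on some Macs)
canvas.bind("<Motion>", follow)
canvas.bind("<Button-1>", put)    # left click on empty space puts
root.mainloop()
```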
I think it could work really well if we had a convention that some mouse button always picks an object. But we don't.
I don't think there's a way to make it work the same way on desktop and mobile that would be good on both. On desktop you have a mouse pointer, and you can easily represent the point of insertion.
For mobile you came up with this scroll trick, but I think many people would find it unintuitive and annoying - especially on desktop.
- Hmm, suppose Claude Code can make a prototype.
But who's going to deploy it, make backups, integrate authentication, review security?..
Now, perhaps, it would be nice to have some kind of ERP-like framework which would host AI-generated apps and connect them to each other. Is there anything like that?
- This reminds me of Idiocracy: "Ah, you talk like a fag, and your shit's all retarded" as a response to a normal speech.
- Well, you get a better UX with the native remote. And I can always add an external dongle, unless the TV is bricked.
- FFS. When I bought my LG OLED TV, it was quite snappy. A year later, it asked to update webOS. OK. Now we are crawling through molasses...
All TV software seems to be an absolute fucking scam.
- ML experiment: "skill capsules" for LLMs. Capsules can be cheaply extracted from successful episodes (from as little as a single episode) and then applied to improve success on similar tasks.
I see it as a "poor man's continual learning".
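To make the idea concrete, here's a minimal sketch under my own assumptions about the capsule format (the prompts, the model name, and the in-memory storage are all made up for illustration): an LLM distills one successful transcript into a few reusable instructions, which then get prepended when solving similar tasks.

```python
# Sketch of "skill capsules": distill a successful episode into reusable hints.
from openai import OpenAI

client = OpenAI()
capsules = {}  # task_type -> capsule text

def extract_capsule(task_type: str, episode_transcript: str) -> str:
    """Compress one successful episode into a short, reusable 'skill capsule'."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Below is a transcript of a successfully solved task. "
                   "Write 3-5 short, general instructions that would help solve "
                   "similar tasks. Do not mention task-specific values.\n\n"
                   + episode_transcript}],
    )
    capsules[task_type] = resp.choices[0].message.content
    return capsules[task_type]

def solve_with_capsule(task_type: str, task_prompt: str) -> str:
    """Apply a stored capsule (if any) to a new, similar task."""
    capsule = capsules.get(task_type, "")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Relevant skills:\n" + capsule},
                  {"role": "user", "content": task_prompt}],
    )
    return resp.choices[0].message.content
```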
- For most papers, the main idea can be described in 1-2 sentences, sort of "we did X using Y".
That doesn't work for HOPE - a short summary can't explain what it actually does besides "self-modifying" and "continuum memory".
So it seems to be an innovation of Transformer calibre - really big (if true). It's definitely not "a transformer with such-and-such modification".
Gemini came up with the following visual metaphor for the difference:
> Transformer is a series of frozen glass panes (the weights) and a scratchpad (the attention) where it writes notes about the current text.
> The HOPE architecture involves no scratchpad. Instead, the glass panes themselves are made of smart liquid. As the data flows through, the first pane reshapes itself instantly. The second pane reshapes itself slowly. And the mechanism deciding how to reshape them is itself a tiny, intelligent machine, not just a basic math rule.
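To be clear, the following is not the HOPE architecture from the paper - it's only the metaphor above rendered as a toy PyTorch module, with every shape and update rule invented by me, just to make the "fast pane / slow pane / learned rewrite rule" distinction concrete:

```python
# NOT HOPE itself -- only the "smart liquid panes" metaphor made runnable.
import torch
import torch.nn as nn

class LiquidPanes(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.fast = torch.zeros(d, d)   # pane that reshapes itself instantly
        self.slow = torch.zeros(d, d)   # pane that reshapes itself gradually
        # the "tiny intelligent machine" that decides how to reshape the panes,
        # instead of a fixed math rule
        self.rewriter = nn.Sequential(nn.Linear(2 * d, d), nn.Tanh(), nn.Linear(d, d))

    def forward(self, x):               # x: (d,) -- one "token" of the stream
        h = x @ self.fast.T
        y = h + x @ self.slow.T
        # outer-product style update, shaped by the learned rewriter network
        delta = torch.outer(self.rewriter(torch.cat([x, y])), x)
        self.fast = self.fast + delta          # fast timescale
        self.slow = self.slow + 0.01 * delta   # slow timescale
        return y

panes = LiquidPanes()
with torch.no_grad():
    for _ in range(5):                  # a stream of "tokens"
        out = panes(torch.randn(64))
```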
- Well, back in the day many of the people making buying decisions were tech enthusiasts who liked the idea of upgradeability, etc. Computers were quite expensive, and people didn't want to waste money on a box which could only do one thing.
Besides that, "app store" was just not feasible with tech of the day.
When the vast majority of customers don't care, you can ship a locked-down device.
You can buy a hackable phone, but it's a niche.
- I hate this kind of writing, which is rather common in science reporting. Is it bad on purpose?
It seems like the purpose is to keep the reader confused about some point, to maximize time spent on the page. And I'm quite certain an LLM could do a lot better.
- He shares progress on Twitter quite often. In the last year they shifted focus away from raw performance (as beating existing stuff is rather daunting) and toward rather unique work on code synthesis, perhaps relevant to formal verification of vibe-coded code, etc.
- There's a model of computation called 'interaction nets' / 'interaction calculus', which reduces in a more physically meaningful, local, topologically smooth way.
I.e., you can see from these animations that LC reductions have some "jumping" parts. And that does reflect the nature of LC, as a single reduction 'updates' many places at once.
INs basically fix this problem, and this locality can enable parallelism. And there's an easy way to translate LC to INs, as far as I understand.
I'm a noob, but I feel like INs are severely under-rated. I dunno if there are any good interaction net animations. I know only one person doing serious R&D with interaction nets - that's Victor Taelin.
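To make the "updates many places at once" point concrete, here's a toy beta-reduction step in Python (my own throwaway encoding, unrelated to any actual IN implementation): a single reduction of (λx. x x x) M rewrites three separate positions of the term in one step, which is exactly the non-local part that an interaction-net encoding replaces with explicit, local duplicator nodes.

```python
# Minimal untyped lambda-calculus terms: variables, lambdas, applications.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Lam:
    param: str
    body: object

@dataclass
class App:
    fn: object
    arg: object

def subst(term, name, value):
    """Replace every free occurrence of `name` in `term` with `value`.
    (Capture avoidance is omitted to keep the sketch short.)"""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:
            return term  # `name` is shadowed below this binder
        return Lam(term.param, subst(term.body, name, value))
    return App(subst(term.fn, name, value), subst(term.arg, name, value))

def beta_step(term):
    """One beta step at the root: (λx. body) arg -> body[x := arg]."""
    if isinstance(term, App) and isinstance(term.fn, Lam):
        return subst(term.fn.body, term.fn.param, term.arg)
    return term

# (λx. x x x) M: one step rewrites three separate places in the term at once,
# which is the "jumping" visible in the animations. An interaction-net encoding
# would instead copy M incrementally through local duplicator nodes.
M = Var("M")
redex = App(Lam("x", App(App(Var("x"), Var("x")), Var("x"))), M)
print(beta_step(redex))
```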
- tl;dr: If you calculate "the human time horizon using the same methodology as we do for models", it's only 1.5 hours @ 50% success rate for the baseline experts METR hired, and it was surpassed by o3 in April 2025, 6 months ahead of METR's prediction.
METR considers this "raw baseline" largely irrelevant, as it might be affected by people getting bored / not being paid enough, etc. But they admit this introduces a bias which makes the reported numbers less relevant for human-vs-AI comparison.
- I got really interested in LLMs in 2020, after the GPT-3 release demonstrated in-context learning. But I had tried running an LLM a year before that: trying out AI Dungeon 2 (based on GPT-2).
Back in 2020 people were discussing how transformer-based language models are limited in all sorts of ways (operating on a tiny context, etc.). But as I learned how transformers work, I got really excited: it's possible to use raw vectors as input, not just text. So I got the idea that all kinds of modules could be implemented on top of pre-trained transformers via adapters which translate arbitrary data into the representations of a particular model. E.g., you can make a new token representing some command, etc.
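For illustration, here's roughly what that "raw vectors instead of token embeddings" idea looks like with the Hugging Face transformers API (the model and the random vector are placeholders; in actual soft-prompt tuning the injected vectors would be learned):

```python
# Inject an arbitrary vector as a "virtual token" via inputs_embeds.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Translate to French: cat", return_tensors="pt").input_ids
text_embeds = model.get_input_embeddings()(ids)          # (1, seq, d_model)

# Any vector of the right width can go here, e.g. the output of an adapter
# that encodes an image, a command, a document...
soft_token = torch.randn(1, 1, model.config.hidden_size)

inputs = torch.cat([soft_token, text_embeds], dim=1)
with torch.no_grad():
    logits = model(inputs_embeds=inputs).logits
```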
A lack of memory was one of the hot topics, so I did a little experiment: since the KV cache has to encode 'run-time' memory, I tried transplanting parts of the KV cache from one model forward pass into another - and apparently only a few mid layers were sufficient to make the model recall a name from the prior pass. But I didn't go further, as it was too time-consuming for a hobby project. So that's where I left it.
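A rough sketch of that transplant experiment, assuming a transformers version where past_key_values is still a plain tuple of (key, value) tensors per layer (newer versions wrap it in a Cache object, so the packing would need adapting); the prompts, model, and choice of "mid" layers are arbitrary:

```python
# Transplant a few mid layers of one pass's KV cache into another pass's cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def kv_cache(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, use_cache=True).past_key_values

cache_a = kv_cache("My name is Alice and I live in Prague.")        # pass with the fact
cache_b = kv_cache("The weather today is cloudy with some rain.")   # unrelated pass

# Truncate both caches to the same number of positions so they can be mixed.
n = min(cache_a[0][0].shape[2], cache_b[0][0].shape[2])
trim = lambda kv: tuple((k[:, :, :n], v[:, :, :n]) for k, v in kv)
cache_a, cache_b = trim(cache_a), trim(cache_b)

# Hybrid cache: a few middle layers come from pass A, everything else from pass B.
mid = range(4, 9)  # arbitrary "mid layers" for GPT-2's 12 layers
hybrid = tuple(cache_a[i] if i in mid else cache_b[i] for i in range(len(cache_b)))

# Ask the model to recall the name with the hybrid cache as its only "memory".
q = tok(" What is my name? My name is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(q, past_key_values=hybrid).logits
print(tok.decode([logits[0, -1].argmax().item()]))  # does it say " Alice"?
```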
Over the years, academic researchers went through the same ideas I had and gave them names:
* arbitrary vectors injected in place of fixed token embeddings are called a "soft prompt"
* custom KV-prefix added before normal context is called "prefix tuning"
* "soft prompt" to generate KV prefix which encodes a memory is called "gisting"
* KV prefix encoding a specific collection of documents was recently called "cartridge"
Opus 4.5 running in Claude Code can pretty much run an experiment of this kind on its own, starting from a general idea. But it still needs some help - to make sure the prompts and formats actually make sense, to look for the best dataset, etc.