- >"Even a broken clock is right two times per day."
That is incorrect. There are any number of ways in which a clock might be broken such that its hands are not in the correct position even once per day.
- I’ve been using Qwen3 Coder 30b quantized down to IQ3_XSS to fit in < 16gb vram. Blazing fast 200+ tokens per second on a 4080. I don’t ask anything complicated, but one off scripts to do something I’d normally have to do manually by hand or take an hour to write the script myself? Absolutely.
These are no more than a few dozen lines I can easily eyeball and verify with confidence- that’s done in under 60 seconds and leaves Claude code with plenty of quota for significant tasks.
- “what is it about capitalism you don’t understand?”
This question, asked by the person wanting to not put capital into investing further in the company’s lucrative core competency, instead favoring dividends depriving capital and a slow death milking a product facing ever steeper competition.
Capitalism has some awful failure modes, but I’m not sure what system of economics was on display in this case, but it doesn’t look like capitalism. Theft? That seems closer.
- I can imagine the political and judicial battles already, like with textualist feeling that the constitution should be understood as the text and only the text, meant by specific words and legal formulations of their known meaning at the time.
“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, putting the common textualist interpretation of it and Supreme Court rulings on a now specious interpretation null and void!”
- Yeah, though I can imagine a conversation like this:
SWE: "Seriously? import PIL \ read file \ == (c + 10%, m = m, y = y, k = k) \ save file done!"
Exec: "Yeah, and first blogger get's a hold of image #1 they generate, starts saying 'Hey! This thing's been color corrected w/o AI! lol lame'"
Or not, no idea. i've not understood the choice either, besides very intelligent AI-driven auto-touch up for lighting/color correction has been a thing for a while. It's just, for those I end up finding an answer for, maybe 25% of head scratcher decisions do end of having a reasonable, if non intuitive answer for. Here? haven't been able to figure one yet though, or find a reason/mention by someone who appears to have an inside line on it.
- No, I really don't think cost is the limiting factor- it's tooling and competent workforce to implement it. Every company of any substantial size, or near enough, is trying to implement and hire for those roles, and the # of people familiar with the specific tooling + lack of maturity in tooling increasing the learning curve, these are the bottlenecks.
- Really? We've had easy widespread access to turnkey agentic coding since:
-Feb Claude code
-April OpenAI Codex
-June Gemini Code
And that's not even accounting for the "1 prompt to build your saas business" services popping up.
This isn't Fermi paradox territory, it's just lightspeed lag time.
Take breath. Brace yourselves. The shovelware will be here soon enough.
And SWE's? Take heart. The more there is, the more the difference in quality will be easily seen between giving a power tool to a hobbyist and giving it to an expert that knows their craft.
- If that’s the way things go, subscription, there aught to be insurance coverage built into that. It’s required anyway and the extent to which a driver relies on SD, and has to pay a sub, then it’s the SD responsible for accidents, not in full but part, and insurance can reflect that as well. But if the two are inextricable as a requirement anyway, there should be baked in standardized procedures for “things have gone wrong, which was a known inevitability, and this is the framework for handling it.”
- That doesn’t change anything: there isn’t anything in the reporting requirements that I can look at and say “that’s useless I wouldn’t want to know that about my business”
- There was a huge over correction somewhere around the beginning of 2025, maybe February or so, with ChatGPT. Prior to that point, I had to give a directive in the user config prompt to “don’t tell me something isn’t possible or practical, assume it is within your capabilities and attempt to find a way. I will let you know when to stop”. Because it was constantly hallucinating that it couldn’t do things, like “I don’t have access to a programming environment”. When I wanted it to test code itself before I did. Meanwhile one tab over it would spin up a REPL and re-paste some csv into python and pandas without being asked.
Frustrating, but “over correction” is a pretty bad euphemism for whatever half assed bit of RLHF lobotomy OpenAI did that, just a few months later, had ChatGPT doing a lean-in to a vulnerable kid’s pain and actively discourage an act that might have saved his life by signaling more warning signs to his parents.
It wasn’t long before that happened, after the python REPL confusion had resolved, that I found myself typing to it, even after having to back out of that user customization prompt, “set a memory that this type of response to a user in the wrong frame of mind is incredibly dangerous”.
Then I had to delete that too, because it would response with things like “You get it of course, your a…” etc.
So I wasn’t surprised over the rest of 2025 as various stories popped up.
It’s still bad. Based on what I see with quantized models and sparse attention inference methods, even with most recent GPT 5 releases OpenAI is still doing something in the area of optimizing compute requirements that makes the recent improvements very brittle— I of course can’t know for sure, only that its behavior matches what I see with those sorts of boundaries pushed on open weight models. And the assumption that the-you-can-prompt buffet of a Plus subscription is where they’re most likely to deploy those sorts of performance hacks and make the quality tradeoffs. That isn’t their main money source, it’s not enterprise level spending.
This technology is amazing, but it’s also dangerous, sometimes in very foreseeable ways, and the more time that goes the more I appreciate some of the public criticisms of OpenAI with, eg, the Amodeis’ split to form Anthropic and the temporary ouster of SA for a few days before that got undone.
- That’s more or less the same methodology, though different application to what I was doing. I remember reading that passage, it sounded like magic.
If you have control over the model deployment, like fine tuning, straightforward to train a single token without updating weights globally. This is why fine tunes etc. that lack provenance should never be trusted. All the people sharing home grown stuff of huggingface… PSA: Be careful.
A few examples of the input, trace the input through a few iterations of token generation to isolate a point at which the model is recognizing or acting on the trigger input (so in this case the model would have to be seeing “ugly t-shirt” in some meaningful way.”) Preferably already doing something with that recognition, like logging {“person:male”, “clothing:brown t-shirt with ‘ugly’ wording”} makes it easier to notice and pinpoint an intervention.
Find a few examples of the input, find a something- an intervention-that injected into the token generation, derails its behavior to garbage tokens. Train those as conversation pairs into a specific token id.
The difficulty is balancing the response. Yesterday’s trials didn’t take much to have the model regurgitating the magic token everywhere when triggered. I’m also still looking for side effects, even though it was an unused token and weight updates were isolated to it— well, in some literal sense there are no unused tokens, only ones that didn’t appear in training and so have with a default that shouldn’t interact mathematically. But training like this means it will.
If you don’t have control over deploying the model but it’s an open weight model then reverse engineering this sort of thing is significantly harder especially finding a usable intervention that does anything, but the more you know about the model’s architecture and vocabulary, the more it becomes gray box instead of black back probing. Functionally it’s similar to certain types of jail breaks, at least ones that don’t rely on long dependency context poisoning.
- Absolutely. Your model selection has limits of course: best practice for some types of replicable research would be to to use unquantized models, but that still leaves room for smaller Gemma and Llama models.
I’m on a 4080 for a lot of work and it gets well over 50 tokens per second on inference for pretty much anything that fits in VRAM. It’s comparable to a 3090 in compute, the 3090 has 50% more vram, the 4080 has better chip-level support for certain primitives, but that actually matters slightly less using unquantized models, making the 3090 a great choice. The 4080 is better if you want more throuput on inference and use certain common quantize levels.
Training LoRa and fine tunes is highly doable. Yesterday’s project for me, as an example, was training trigger functionality into a single token unused in the vocabulary. Under 100 training examples in the data set, 10 to 50 epochs, extremely usable “magic token” results in under a few minutes at most. This is just an example.
If you look at the wealth of daily entries on arxiv in cs.ai many are using established smaller models with understood characteristics, which makes it easier to understand the result of anything you might do both in your research and in others’ being able to put your results in context.
- It’s tiresome to see unexamined assumptions and self-contradictions tossed out by a community that can and often does do much better. Some light absurdism often goes further and makes clear that I’m not just trying to setup a strawman since I’ve already gone and made a parody of my own point.
- It’s not bad faith argument. It’s an attempt to shake thinking that is profoundly stuck by taking that thinking to an absurd extreme. Until that’s done, quite a few people aren’t able to see past the assumptions they don’t know they making. And by quite a few people I mean everyone, at different times. A strong appreciation for the absurd will keep a person’s thinking much sharper.
- >I'll bet 1 share...
I won't be your counterparty on that bet, you've already won:
https://www.forbes.com/sites/saradorn/2025/09/15/trump-wants...
One of the reasons cited? All the work it takes. Which is just an insane response. If your business is so poorly run and organized that reconciling things each quarter represents a disproportionate amount of effort, something is very wrong. It means you definitely don't know what's going on, because by definition you can't know, not outside those 4 times a year. In which case there's a reasonable chance the requirement to do so is the only thing that's kept it from going off the rails.
- Do you have a source for that being the key difference? Where did you learn your words, I don’t see the names of your teachers cited here. The English language has existed a while, why aren’t you giving a citation every time you use a word that already exists in a lexicon somewhere? We have a name for people who don’t coin their own words for everything and rip off the words that other painstakingly evolved over a millennia of history. Find your own graphemes.
- >we probably would have a word for them
Student? Good learner? Pretty much what everyone does can be boiled down to reading lots of other code that’s been written and adapting it to a use case. Sure, to some extent models are regurgitating memorized information, but for many tasks they’re regurgitating a learned method of doing something and backfilling the specifics as needed— the memorization has been generalized.
- >“consequence of a previous technological revolution: the internet.”
And also of increasingly ridiculous and overly broad concepts of what plagiarism is. At some point things shifted from “don’t represent others’ work as novel” towards “give a genealogical ontology of every concept above that of an intro 101 college course on the topic.”
- How can someone not be aware, at this point, that— sure- use the systems for finding and summarizing research, but for each source, take 2 minutes to find the source and verify?
Really, this isn’t that hard and it’s not at all an obscure requirement or unknown factor.
I think this is much much less “LLMs dumbing things down” and significantly more just a shibboleth for identifying people that were already nearly or actually doing fraudulent research anyway. The ones who we should now go back and look at prior publications as very likely fraudulent as well.
It’s more like the discussions space to talk about things related to the painters of hotel art.