- Really only prefixes, without a significant loss in accuracy. The point is that because later tokens can't influence earlier ones, the post-attention embeddings for those first tokens can't change. But the post-attention embeddings for "and then tell me what" would be wildly different for every prompt, because the embeddings for those tokens are affected by what came earlier.
My favorite not-super-accurate mental model of what's going on with attention is that the model is sort of compressing the whole preceding context into each token. So the word "tell" would include a representation not just of the concept of telling, but also of what it is that's supposed to be told. That's explicitly what you don't want to cache.
> So if I were running a provider I would be caching popular prefixes for questions across all users
Unless you're injecting user context before the question. You can have a pre-baked cache with the base system prompt, but not beyond that. Imagine that the prompt always starts with "SYSTEM: You are ChatGPT, a helpful assistant. The time is 6:51 ET on December 19, 2025. The user's name is John Smith. USER: Hi, I was wondering..." You can't cache the "Hi, I was wondering" part because it comes after a high-entropy component (timestamp and user name).
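To make the constraint concrete, here's a minimal sketch (hypothetical cache structure, not any provider's actual implementation) of why only an exact leading prefix is reusable: the cache can only be keyed on token prefixes, and the first high-entropy token invalidates everything after it.

```python
# Hypothetical prefix cache: maps an exact token prefix to its precomputed
# attention state. Because each token's state depends on everything before
# it, only the longest cached prefix that exactly matches the start of the
# prompt can be reused.
def longest_cached_prefix(cache, tokens):
    for n in range(len(tokens), 0, -1):
        if tuple(tokens[:n]) in cache:
            return n
    return 0

cache = {("SYSTEM:", "You", "are", "ChatGPT,"): "precomputed state..."}

# Everything after the cached system-prompt prefix must be recomputed.
prompt = ["SYSTEM:", "You", "are", "ChatGPT,", "The", "time", "is", "6:51"]
assert longest_cached_prefix(cache, prompt) == 4

# A prompt that diverges at the first token gets no cache hit at all.
assert longest_cached_prefix(cache, ["USER:", "Hi"]) == 0
```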
- Go with Bazel gives you a couple of options:
* You can use gazelle to auto-generate Bazel rules across many modules - I think the most up-to-date usage guide is https://github.com/bazel-contrib/rules_go/blob/master/docs/g....
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
> a beefy VM to host CI
Unless you really need to self-host, GitHub Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which keeps builds quite snappy since Bazel doesn't have to rebuild any targets that haven't changed.
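Wiring up a shared cache is usually just a couple of flags in `.bazelrc` (the endpoint below is hypothetical):

```
# .bazelrc -- point builds at a shared remote cache
build --remote_cache=grpcs://bazel-cache.example.com:443
build --remote_upload_local_results=true
```

With that in place, CI runners and developer machines all hit the same cache, so only targets whose inputs actually changed get rebuilt.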
- > if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong that library should be published as a v2 or dependents should pin to a specific version
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all the changes that went into it - i.e., the commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
- Internal and external have wildly different requirements. Google internally can't update a library unless the update is either backward-compatible for all current users or part of the same change that updates all those users, and that's enforced by the build/test harness. That was an explicit choice, and I think an excellent one, for that scenario: it's more important to be certain that you're done when you move forward, so that it's obvious when a feature no longer needs support, than it is to enable moving faster in "isolation" when you all work for the same company anyway.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
- I worked on building this at $PREV_EMPLOYER. We used a single repo for many services, so that you could run tests on all affected binaries/downstream libraries when a library changed.
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom GitHub Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in GitHub, but of course it took a fair amount of effort (a few SWE-months) to build all the tooling.
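The affected-target computation is essentially a reverse-reachability query over the dependency graph. A minimal sketch of the idea (toy graph and target names, not actual `bazel query` output):

```python
from collections import deque

# Toy dependency graph: target -> direct dependencies.
deps = {
    "//app:server": ["//lib/auth", "//lib/db"],
    "//app:cli": ["//lib/auth"],
    "//lib/auth": ["//lib/util"],
    "//lib/db": ["//lib/util"],
    "//lib/util": [],
}

def affected_targets(changed, deps):
    """Everything that transitively depends on a changed target --
    conceptually what `bazel query "rdeps(//..., <target>)"` reports."""
    # Invert the edges, then BFS outward from the changed targets.
    rdeps = {t: [] for t in deps}
    for t, ds in deps.items():
        for d in ds:
            rdeps[d].append(t)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        for dependent in rdeps[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Touching the base utility library affects everything above it.
assert affected_targets({"//lib/util"}, deps) == {
    "//lib/util", "//lib/auth", "//lib/db", "//app:server", "//app:cli",
}
```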
- "Ignoring the code entirely and only prompting" is the only definition of vibe-coding I'm aware of. It's from a Karpathy tweet (https://x.com/karpathy/status/1886192184808149383):
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists... I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension.
It specifically doesn't mean "using an LLM as a code assistant". It definitely doesn't mean asking the LLM questions about code which you'll then use to write your own code. Those are LLM-assisted activities, and it's totally fine if you're using the LLM that way. But it's not what the term "vibe coding" means. "Vibe coding" is giving up on any pretense that you're in control, and letting the LLM take the wheel. It's fun for getting quick projects done, but it's also now becoming a distressingly common practice for people who literally do not know how to program in order to get a "product" to market.
- Turning around a track definitely dissipates some energy as heat through increased friction with the rails. Imagine taking a semicircle turn and making it tighter and tighter. At the limit, the train is basically hitting a solid wall and rebounding in the other direction, which would certainly transfer some energy.
The energy question is this: going from a 100kmh-due-north momentum to a 100kmh-due-south momentum via slowing, stopping, and accelerating again clearly takes energy. You can also switch the momentum vector by driving in a semicircle. Turning around a semicircle takes some energy, but how much - and where does it come from? Does it depend on how tight the circle is - or does that just spread it out over a wider time/distance? If you had an electric train with zero loss from battery to wheels, and you needed to get it from going north to going south, what would be the most efficient way to do it?
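For what it's worth, the idealized bookkeeping (point mass, constant speed, no friction) separates the two quantities cleanly: the momentum flips, but the kinetic energy is unchanged, because the centripetal force is always perpendicular to the velocity and so does no work.

```latex
W = \int \vec{F} \cdot d\vec{s} = 0, \qquad
\Delta KE = \tfrac{1}{2}mv^2 - \tfrac{1}{2}mv^2 = 0, \qquad
|\Delta\vec{p}| = 2mv
```

So in the ideal case the turn itself is free; the real cost is the rail friction described above, which grows with the lateral force $mv^2/r$ as the curve tightens.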
- I like natural keys... if you can prove that they're actually immutable and unique for the thing they're representing. Credit card number is a decent natural key for a table of payment instruments, not for users. Even for a natural-key-believer, users pretty much always need a synthetic ID, because anything you might possibly believe to be constant about humans turns out not to be.
- I hadn't thought about GitHub - I'm guessing the authors of the bill didn't either - but you're right, that is somewhat concerning. Still, I don't think it's the end of the world...
> The requirement is also that developers will request the signal. No scoping to developers that have a reason to care?
I don't see that requirement. Here's the sum total of the developer's responsibilities (emphasis added):
> A developer with actual knowledge that a user is a child via receipt of a signal regarding a user’s age shall, to the extent technically feasible, provide readily available features for parents to support a child user with respect to the child user’s use of the service and as appropriate given the risks that arise from use of the application, including features to do all of the following:
> (A) Help manage which accounts are affirmatively linked to the user under 18 years of age.
> (B) Manage the delivery of age-appropriate content.
> (C) Limit the amount of time that the user who is under 18 years of age spends daily on the application.
It would be nice if it had specific carve-outs for things that aren't expected to interact with this system, but it seems like they're leaving it up to court judgment instead, with just enough wiggle room in the phrasing to make that possible.
If your application doesn't have a concept of "accounts", then A is obviously moot. If you don't deliver age-inappropriate content, then B is moot. The only thing that can matter is C, but I'd expect that (a) nobody is going to complain about the amount of time their kids are spending on Vim and (b) the OS would just provide that control at a higher level.
- It's always possible that they'll say it, but it would be a lie based on my reading of this bill. Sideloaded apps can choose whether or not to respect the OS's advice about the age of the user; it's not on the OS or device to enforce their honesty.
- Bill text: https://legiscan.com/CA/text/AB1043/id/3193837
This seems... not terrible? The typical counter-argument to any "think of the children!" hand-wringing is that parents should instead install parental controls or generally monitor what their own kids are up to. Having a standardized way to actually do that, without getting into the weirdness of third-party content controls (which are themselves a privacy/security nightmare), is not an awful idea. It's also limited to installed applications, so doesn't break the web.
This is basically just going to require all smartphones to have a "don't let this device download rated-M apps" mode. There's no actual data being provided - and the bill explicitly says so; it just wants a box to enter a birth date or age, not link it to an actual ID. I'm not clear on how you stop the kid from just flipping the switch back to the other mode; maybe the big manufacturers would have a lock such that changing the user's birthdate when they're a minor requires approval from a parent's linked account?
That said, on things like this I'm never certain whether to consider it a win that a reasonable step was taken instead of an extreme step, or to be worried that it's the first toe in the door that will lead to insanity.
- Yeah... I'm far from an expert on state-of-the-art ML, but it feels like a new embedding would invalidate any of the layers you keep.

Taking off a late layer makes sense to me, like in cases where you want to use an LLM with a different kind of output head for scoring or something like that, because the basic "understanding" layers are still happening in the same numerical space - they're still producing the same "concepts", that are just used in a different way, like applying a different algorithm to the same data structure.

But if you have a brand new embedding, then you're taking the bottom layer off. Everything else is based on those dimensions. I suppose it's possible that this "just works", in that there's enough language-agnostic structure in the intermediate layers that the model can sort of self-heal over the initial embeddings... but that intuitively seems kind of incredible to me. A transformation over vectors from a completely different basis space feels vanishingly unlikely to do anything useful. And doubly so given that we're talking about a low-resource language, which might be more likely to have unusual grammatical or linguistic quirks which self-attention may not know how to handle.
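A toy illustration of that basis-mismatch intuition (numpy, with a fixed random matrix standing in for a "frozen" intermediate layer - not a real model): feeding the same frozen transformation a vector expressed in a rotated basis produces an unrelated output.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Stand-in for a frozen intermediate layer, "trained" against the
# original embedding space.
W = rng.standard_normal((d, d))

# An embedding vector, and the same information expressed in a
# different basis (a random rotation of the original space).
x = rng.standard_normal(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
x_rotated = Q @ x

# The frozen layer's output only means anything for the original basis;
# the rotated input produces a completely different result.
assert not np.allclose(W @ x, W @ x_rotated)
```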
- It's much weirder now.
The current holder of that domain is using it to host a single page that pushes anti-vax nonsense under the guise of fighting censorship... but also links to the actual PuTTY site. Very weird mix of maybe-well-meaning and nonsense.
- > MCP promises to standardize AI-tool interactions as the “USB-C for AI.”
Ironically, it's achieved this - but that's an indictment of USB-C, not an accomplishment of MCP. Just like USB-C, MCP is a nigh-universal connector with very poorly enforced standards for what actually goes across it. MCP's inconsistent JSON parsing and lack of protocol standardization are closely analogous to USB-C's proliferation of cable types (https://en.wikipedia.org/wiki/USB-C#Cable_types); the superficial interoperability is a very leaky abstraction over a much more complicated reality, which IMO is worse than just having explicitly different APIs/protocols.
- Uptime and reliability are not the same thing. Designing a bridge doesn't require that the engineer be working 99.9% of minutes in a day, but it does require that they be right in 99.9% of the decisions they make.
- Yeah, I know that's how it works under the hood - and why you have things like all integers with values in [-5, 256] being assigned to the pre-allocated objects - but I don't think it's a particularly useful model for actually programming. "Pass-by-reference with copy-on-write" is semantically indistinguishable from "pass-by-value".
- Your first example has to do with the fact that tuples, being immutable, behave as if copied by value, whereas lists are shared by reference. This is a special case of an even larger (IMO) misfeature, which is that the language tries very, very hard to hide the concept of a pointer from you. This is a rampant problem in memory-managed languages; Java has similar weirdness (although it's at least a bit more consistent since there are fewer primitives), and Go is doubly odd because it does have a user-controllable value vs. pointer distinction but then hides it in a lot of cases (with the . operator working through pointers, and anything to do with interfaces).
I think the whole thing does a disservice to novice or unwary programmers. It's supposed to be easier to use because you "don't have to worry about it" - but you really, really do. If you're not familiar with most of these details, it's way too easy to wander into code that behaves incorrectly.
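The classic traps, in a few lines (CPython behavior):

```python
# Lists are shared by reference: mutating through one name is
# visible through every other name for the same object.
a = [1, 2]
b = a          # no copy; b is another name for the same list
b.append(3)
assert a == [1, 2, 3]

# Tuples are immutable, so "modifying" one actually creates a new
# object and leaves the original untouched - it *feels* like by-value.
t = (1, 2)
u = t
u = u + (3,)   # builds a brand-new tuple
assert t == (1, 2)

# CPython pre-allocates the integers in [-5, 256], so identity
# checks on small ints "work" in a way they don't for big ones.
x = 256
y = 256
assert x is y
```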
- SF city and county are actually the same legal entity, not just the same land. It's officially called the City and County of San Francisco, and it's just as unusual as it sounds. The mayor also has the powers of a county executive with both a sheriff's department (county police to run the jails) and police department (city law enforcement) reporting to him; the city government runs elections like other counties; the Board of Supervisors - which is the typical county legislative structure - also serves as city council. (Denver, Colorado works the same way, I think.)
- The only thing I can think of that Go uses a lot of generation for that other languages have other solutions for is mocks. But in many languages the solution is "write the mocks by hand", so that's hardly fair.