jrk
1,464 karma
Associate professor of computer science, MIT. Graphics, compilers, systems, architecture.

  1. Why are GPUs necessarily higher latency than TPUs? Both require roughly the same arithmetic intensity and use the same memory technology at roughly the same bandwidth.
  2. Not sure when you last tried, but Gemini, Claude, and ChatGPT have all supported pretty effective PDF input for quite a while.
  3. It is not only not that much more complex, it is often less complex.

    Higher-level services like PaaS (Heroku and above) genuinely do abstract a number of details. But EC2 is just renting pseudo-bare computers: they save no complexity, and they add more by being diskless and requiring networked storage (EBS). The main thing they give you is the ability to spin up arbitrarily many more identical instances at a moment’s notice (usually, at least theoretically, though the amount of time you actually hit unavailability or shadow quotas is surprisingly high).

  4. You know that AWS will come back up. You definitely don’t know whether your own instances will come back, or whether you’ll need to redeploy everything.
  5. Yes, but you can also do the same thing with autoregressive models just by making them smaller. This tradeoff always exists; the question is whether the Pareto curve for diffusion models ever crosses or dominates the best autoregressive option at the same throughput (or quality).
  6. I think they weren’t asking “why can’t Gemini 3, the model, just do good transcription,” they were asking “why can’t Gemini, the API/app, recognize the task as something best solved not by a single generic model call, but by breaking it down into an initial subtask for a specialized ASR model followed by LLM cleanup, automatically, rather than me having to manually break down the task to achieve that result.”
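
    A sketch of the kind of routing being described; the function names and the keyword check below are entirely hypothetical, not any real Gemini API:

    ```python
    # Hypothetical task router: recognize a transcription request and
    # split it into a specialized ASR pass plus an LLM cleanup pass,
    # instead of one generic multimodal model call.

    def asr_transcribe(audio_path: str) -> str:
        # Stand-in for a call to a dedicated speech-to-text model.
        return f"raw transcript of {audio_path}"

    def llm_cleanup(raw_transcript: str) -> str:
        # Stand-in for an LLM pass: punctuation, speaker labels, fixes.
        return f"cleaned: {raw_transcript}"

    def handle_request(task: str, audio_path: str) -> str:
        if "transcri" in task.lower():  # crude task recognition
            return llm_cleanup(asr_transcribe(audio_path))
        raise NotImplementedError("fall back to a generic model call")

    print(handle_request("Please transcribe this meeting", "meeting.wav"))
    ```
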
  7. I think the point was not that gem-grade synthetic diamonds are ugly, but that, as industry masters gem-grade production, presumably below-gem-grade production (“ugly synthetic diamonds”) would become cheap enough to deploy in more engineering settings where diamond’s other unique properties were the key concern.
  8. This is an established, though advanced, idea.

    Sourcegraph Amp (https://sourcegraph.com/amp) has had this exact feature built in for quite a while: "ask the oracle" triggered an O1 Pro sub-agent (now, I believe, GPT-5 High), and searching can be delegated to cheaper, faster, longer-context sub-agents based on Gemini 2.5 Flash.

  9. It is a few generations behind: Blackwell is still on N4, which is an N5 variant. Meanwhile TSMC has been shipping N3 family processes in large volume products (Apple) for more than 2 years already, and is starting to ramp the next major node family (N2) for Apple et al. next year.

    NVIDIA has often lagged on process, since they drive such large dies, but having the first major project demo wafer on N4 now is literally two generations behind TSMC’s leading edge in Taiwan.

  10. If you go just a few posts back in Peter's own blog, he has a video of himself doing exactly this:

    https://steipete.me/posts/2025/live-coding-session-building-...

    He has posted others over the past few months, but they don't seem to be on his blog currently.

    As @simonw mentions in a peer comment, Armin Ronacher also has several great streams (and he's less caffeinated and frenetic than Peter :)

  11. People of course often do read (and even modify) the model-generated code, but doing so is specifically not “vibe coding” according to the original definition, which was not meant to encompass “any programming with an LLM” but something much more specific: https://simonwillison.net/2025/Mar/19/vibe-coding/
  12. Many people see top-line rate increases and assume the issue is supply cost, but transmission and distribution have become over 50% of cost everywhere I’ve lived, and are growing fast, regardless of underlying generation or fuel costs. Distribution alone (the neighborhood/local grid) is now roughly matching the supply cost on my MA bill, and though I last lived in CA in 2019, I would be surprised if PG&E weren’t similar.
  13. eInk devices are very much not converging to 16:9 or wider aspect ratios. This device is intentionally the size and shape of a reporter's notebook, but there are virtually no other eInk tablets which diverge significantly from more common paper aspect ratios – they all (ReMarkable, Supernote, Boox, Kindle, etc.) are and continue to be exactly what you say you want.
  14. The OP is talking about fabrication technology, not end products. Even years into their delays getting to 10nm, Intel had more advanced fabrication technology than TSMC until N7 reached volume in 2018.
  15. Intel’s situation in 2025 is not comparable to the rest of big tech. They have lost technical leadership, bled market share, and started losing a ton of money in a hugely capital-intensive business. They are actually in need of major triage to survive, not just hopping on a belt-tightening trend among still-massively-profitable software companies.
  16. This is indeed a reasonable cost estimate for competitive short-term H100 rentals (source: much SemiAnalysis coverage, and my own exploration of the market), but there is a critical error (besides the formatting glitch with `*`):

    It was 24 days (576 hours), not 24 hours: $663,552 @ $3/hr.
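
    A quick sanity check of the corrected arithmetic (the 384-GPU count is inferred from the quoted total, not stated above):

    ```python
    # Corrected figure: 24 days of rental, not 24 hours.
    # The GPU count is inferred from the quoted total, not stated above.
    rate_per_gpu_hour = 3.00   # $/H100-hour, competitive short-term rate
    hours = 24 * 24            # 24 days -> 576 hours
    gpus = 384                 # implied by 663_552 / (3 * 576)
    total = rate_per_gpu_hour * hours * gpus
    print(f"${total:,.0f}")    # -> $663,552
    ```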

  17. Simon Willison nailed exactly this 2 years ago:

    > I've been thinking about generative AI tools as "bicycles for the mind" (to borrow an old Steve Jobs line), but I think "electric bicycles for the mind" might be more appropriate.

    > They can accelerate your natural abilities, you have to learn how to use them, they can give you a significant boost that some people might feel is a bit of a cheat, and they're also quite dangerous if you're not careful with them!

    https://simonwillison.net/2023/Feb/13/ebikes/

  18. The notion that Google is worse at carefully managing PII than a Wild West place like OpenAI (or Meta, or almost any major alternative) is…not an accurate characterization, in my experience. Ad tech companies (and AI companies) obsessively capture data, but Google internally has always been equally obsessive about isolating and protecting that data. Almost no one can touch it; access is highly restricted and carefully managed; anything that even smells adjacent to ML on personal data has gotten high-level employees fired.

    Fully private and local inference is indeed great, but of the centralized players, Google, Microsoft, and Apple are leagues ahead of the newer generation in conservatism and care around personal data.

  19. They don't "have your data," even at an aggregated and noised level, due to the homomorphic encryption part.

    Restating the layers above, in reverse:

    - They don't see either your data or the results of the query (it's fully encrypted even from the servers that compute the query -- this is what homomorphic encryption means)

    - Even if they broke the encryption and had your query data / the query result, they don't know who "you" are (the relay part)

    - Even if they had your query hash and your identity, they couldn't reverse the hash to identify which specific photos you have in your library (the client-side vectorization + differential privacy part), though by this point they could know which records in the places database were hits. So they could know that you took a photo of a landmark, but only if the encryption and relay were both broken.
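
    A who-sees-what schematic of those layers; every cryptographic primitive here is a labeled stub (the real system uses actual homomorphic encryption and an oblivious relay), so this only illustrates what each party can observe at each hop:

    ```python
    # Schematic only: encrypt() and server_lookup() stand in for
    # homomorphic encryption, and relay() models identity-stripping.
    # None of this is real cryptography.

    def vectorize(photo: str) -> int:
        # Client-side embedding/hash (noised for differential privacy
        # in the real system); the server never sees the photo itself.
        return hash(photo) % 1000

    def encrypt(x: int) -> str:
        return f"<enc:{x}>"  # opaque to the server

    def relay(sender_identity: str, payload: str) -> str:
        del sender_identity  # the relay strips who sent it
        return payload

    def server_lookup(ciphertext: str) -> str:
        # Homomorphic-evaluation stand-in: the server computes on the
        # ciphertext and returns a ciphertext, learning neither the
        # query nor the result.
        return f"<enc-result:{ciphertext}>"

    query = encrypt(vectorize("IMG_0042.jpg"))
    anonymous_query = relay("alice@example.com", query)
    encrypted_result = server_lookup(anonymous_query)
    # Only the client, holding the key, can decrypt encrypted_result.
    ```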

  20. My first point was not that economics as a discipline is infallible, just that this was the opposite of a “hot take.”

    Regarding some costs (especially housing) growing faster than the overall rate of inflation: of course they do, just as some grow more slowly or even deflate. The index here (the consumer price index) is weighted by the average share people spend on these different categories, so the need to spend a large share on housing (or toilet paper) *is already included* in the inflation measure.
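
    A toy illustration of that weighting (all weights and rates below are invented, not actual CPI data):

    ```python
    # Toy CPI-style index: headline inflation is a spend-weighted
    # average, so a heavily weighted category like housing is already
    # "counted". All weights and rates are invented for illustration.
    categories = {
        # name: (share of spending, yearly price change)
        "housing":   (0.35,  0.06),
        "food":      (0.15,  0.03),
        "transport": (0.15,  0.02),
        "goods":     (0.20, -0.01),  # some categories deflate
        "other":     (0.15,  0.02),
    }
    headline = sum(w * r for w, r in categories.values())
    print(f"headline inflation: {headline:.2%}")  # ~2.95%
    ```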

  21. Your main point is the well-established modern consensus among economists, and nicely put. But what makes you think salaries don't keep up with inflation? In the postwar US, they absolutely have kept up over the long term. There are ups and downs depending on the strength of the labor market, and they don't always react instantly to short-term bursts of inflation (as in 2021-22), but over the long term they are steadily up even in the "great stagnation" period of the past 50 years:

    https://fred.stlouisfed.org/series/MEPAINUSA672N (individual)
    https://fred.stlouisfed.org/series/MEHOINUSA672N (household)

  22. Rust was started in 2006 and launched publicly, I believe, in 2009, the same year as Go. The point stands that these are still fairly new, but it’s not nearly that new.
  23. SRAM is scaling significantly more slowly than logic in recent process nodes.
  24. It is a coincidence. They are unrelated hardware blocks and very different architectures.
  25. They have built and operated a growing number of their own data centers for years. Presumably this will go into those.
  26. If you annotate PDFs in Preview it is very common to find that they disappear when the PDF is shared with someone else (e.g. when giving feedback on a document), or even when it’s reopened in Preview in the future. Frequent inexplicable data loss is not great.

    But indeed, as a pure reader, Preview remains mostly fine.

  27. Within the market of discrete gaming GPUs, there has been a hierarchy from "low-" to "high-end" for decades -- since before integrated GPUs even existed. For almost 20 years, things with 192- or 256-bit memory busses have been "mid-range" (vs. 384- or occasionally 512-bit memory busses at the high end, and smaller at the low-end). NVIDIA's "7"-tier GPUs have historically been the top of the mid-range in this world.

    Within this world, "midrange" has been creeping upward not via big shifts in these fundamental characteristics but via:

    1. Prices increasing steadily across the board, due to shortages and market power

    2. Power budgets (and corresponding board/cooler sizes) increasing across the board

    The fundamentals (memory bus width -- still 256-bit; die size and performance relative to the top of the line) remain "mid-range" in exactly this same sense.
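
    For context on why bus width tracks tier so directly: peak memory bandwidth scales linearly with it. The per-pin data rate below is an illustrative GDDR-class figure, not tied to any specific card:

    ```python
    # Peak memory bandwidth = (bus width in bytes) x (per-pin data rate).
    # 28 Gb/s per pin is an illustrative GDDR-class rate.
    def peak_bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
        return bus_bits / 8 * gbps_per_pin

    for bus_bits in (128, 192, 256, 384, 512):  # low-end through high-end
        print(f"{bus_bits:3d}-bit -> {peak_bandwidth_gb_s(bus_bits, 28):6.0f} GB/s")
    ```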

  28. I was curious to follow your development, but the lack of an RSS feed for your blog means I basically can't.
