
lsy
3,054 karma

  1. It's disheartening that a potentially worthwhile discussion — should we invest engineering resources in LLMs as a normal technology rather than as a millenarian fantasy? — has been hijacked by a (at this writing) 177-comment thread about one small component of the author's argument. That argument is an important one, and it hardly hinges on water usage specifically, given the vast human and financial capital invested in the LLM buildout so far.
  2. Going to a popular restaurant that accepts app delivery orders (or a grocery store in a neighborhood where people prefer to pay for delivery) is an objectively bad experience. The kitchen or checkout line is backed up with delivery orders, a bunch of delivery drivers are double-parked or loitering near the front, and, due not to any moral failing but rather to what must be a crushing grind, the drivers are for the most part rushed and inconsiderate of the staff and other customers.

    The class of people who order delivery regularly is generally trading far more money than makes sense for the short-term reward of convenient food; too little of that money reaches the class of people who do the delivering; and, as the article points out, it essentially harms the business it's being ordered from.

    I would love to see more restaurants and stores declining to support this kind of system. While there may be some marginal profit now, in the long run the race to the bottom is going to mean fewer sustainable businesses.

  3. I feel like this needs an editor to have a chance of reaching almost anyone… there are ~100 section/chapter headings that seem to have been generated through some kind of psychedelic free association, and each section itself feels like an artistic effort to mystify the reader with references, jargon, and complex diagrams that are only loosely related to the text. And all wrapped here in a scroll-hijack that makes it even harder to read.

    The effect is that it's unclear at first glance what the argument even might be, or which sections might be interesting to a reader who is not planning to read it front-to-back. And since it's apparently six hundred pages in printed form, I don't know that many will read it front-to-back either.

  4. It's interesting to call this a pre-mortem as it seems mainly organized around thinking positively past the imperfections of the technology. It's like a pre-mortem for the housing crisis that focuses on the benefits of subprime mortgage lending.

    What I'd expect to see is an analysis of how to address or prevent the same situation as previous bubbles: that society has allocated resources to a specific investment that are far in excess of what that investment can fundamentally be expected to return. How can we avoid thinking sloppily about this technology, or getting taken in by hucksters' just-so stories of its future impact? How can we successfully identify use-cases where revenues exceed investment? When the next exciting tech comes around, how can we harness it well as a society without succumbing to irrational exuberance?

  5. I think this leaves out what is probably the most likely future for this technology: the same destiny as most technologies, that of a tool. Both of these visions assume (I think incorrectly) a trend towards ubiquity, where either every interaction you as a person have is mediated by computers, or where, within a certain "room", every interaction anyone has is mediated by computers.

    But it seems more likely that like other technologies developed by humanity, we will see that computers are not efficient for, or extensible to, every task, and people will naturally tend to reach for computers where they are helpful and be disinclined to do so when they aren't helpful. Some computers will be in rooms, some will get carried around or worn, some will be integrated into infrastructure.

    As with the automobile, steam-powered motors, and electricity, we may predict a future where the technology totally pervades our lives, but in reality we eventually develop a sort of infrastructure that delimits the tool's use to a certain extent, whether narrow or wide. If that's the case, then the work for the field is less about shoving the tech into every interaction and more about developing better abstractions that let people use compute in an empowering rather than a disempowering way.

  6. No doubt it's a profit-margin game, but I wish the big e-reader companies (Kindle, Kobo) would make a foray into this form factor. The friction of navigating through an Android interface into an app is just enough to negate the convenience benefit of a pocketable device. But the mainstream e-readers are unfortunately just big enough to require a jacket or a bag to carry them in.
  7. I'm sure it's nearly an academic distinction, but:

    > Basically, for any given region, we find its highest point and assume that there is a perfectly placed sibling peak of the same height that is mutually visible.

    Shouldn't you always add 335 km to the horizon distance to account for the possibility of Everest (i.e. a taller sibling peak) being on the other side of the horizon? (Rough arithmetic below.)
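    For what it's worth, the standard no-refraction horizon formula d ≈ sqrt(2·R·h) does give roughly that figure for Everest (about 335–336 km, depending on the radius and height used). A minimal sketch; the function name and rounding are mine, not from the post:

    ```python
    import math

    R_EARTH_KM = 6371.0  # mean Earth radius

    def horizon_distance_km(peak_height_km: float) -> float:
        """Distance to the horizon from a peak of the given height,
        ignoring atmospheric refraction: d = sqrt(2 * R * h)."""
        return math.sqrt(2 * R_EARTH_KM * peak_height_km)

    # Everest is ~8.849 km tall, so its own horizon distance is ~336 km,
    # which is the extra margin the comment suggests always adding.
    print(round(horizon_distance_km(8.849)))  # -> 336
    ```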

  8. Impressive that this was done in three days at all, but to anyone even passingly familiar with System 7's appearance, the screenshot is almost comically "off" and gives away that this is less a straight port than some kind of clean-room reimplementation. The attached paper is more reserved, calling this a "bootable prototype".
  9. Fixing "theoretical" nondeterminism for a totally closed individual input-output pair doesn't solve the two "practical" nondeterminism problems, where the exact same input gives different results given different preceding context, and where a slightly transformed input doesn't give a correctly transformed result.

    Until those are addressed, closed-system nondeterminism doesn't really help except in cases where a lookup table would do just as well. You can't use "correct" unit tests or evaluation sets to prove anything about inputs you haven't tested.

  10. A world model itself, in its particulars, isn't as important as the tacit understanding that the "world model" is necessarily incomplete and subordinate to the world itself, that there are sensory inputs from the world that should prompt you to adjust the model, and the capacity and commitment to adjust it in a way that maintains a level of coherence. With those things you don't need a complex model; you could start with a very simple but flexible one that the system adjusts over time.

    But I don't think we have a hint of a proposal for how to incorporate even the first part of that into our current systems.

  11. There are two additional aspects that are even more critical than the implementation details here:

    - Typical LLM usage involves the accretion of context tokens from previous conversation turns. The likelihood that you will type prompt A twice and that all of your preceding context will be identical is low. You could reset the context, but the accretion of context is often considered a feature of LLM interaction.

    - Maybe more importantly, because the LLM abstraction is statistical, getting the correct output for e.g. "3 + 5 = ?" does not guarantee you will get the correct output for any other pair of numbers, even if all of the outputs are invariant and deterministic. So even if the individual prompt + output relationship is deterministic, the usefulness of the model output may "feel" nondeterministic between inputs, or have many of the same bad effects as nondeterminism. For the article's list of characteristics of deterministic systems, per-input determinism only solves "caching", and leaves "testing", "compliance", and "debuggability" largely unsolved. (A toy illustration of that last point follows.)
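    To illustrate with a deliberately silly stand-in (not an LLM, just a deterministic toy function I made up): per-input determinism makes caching and replay trivial, but a passing test on one input still proves nothing about its neighbors.

    ```python
    # A deterministic toy "adder" that is simply wrong on some inputs.
    # Same prompt in, same answer out every time (so caching works),
    # but a passing test on "3 + 5" says nothing about "7 + 9".
    def toy_model(prompt: str) -> str:
        a, b = (int(x) for x in prompt.replace("= ?", "").split("+"))
        # Hypothetical failure mode: silently off by one for larger operands.
        return str(a + b if a < 5 and b < 6 else a + b - 1)

    assert toy_model("3 + 5 = ?") == "8"  # deterministic and correct
    assert toy_model("3 + 5 = ?") == "8"  # cache-friendly: identical on repeat
    print(toy_model("7 + 9 = ?"))         # "15" -- still deterministic, still wrong
    ```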

  12. I think the wide variance in responses here is explainable by tool preference and the circumstance of what you want to work on. You might also have felt "behind" for not knowing or not wanting to use Dreamweaver, or React, or Ruby on Rails, or Visual Studio + .NET, all tools that allowed developers at the time to accelerate their tasks greatly. But note that most successful programmers today probably never learned those tools, so the fact that they accelerated certain tasks didn't result in a massive gap between users and non-users.

    People shouldn't worry about getting "left behind" because influencers and bloggers are overindexing on specific tech rather than more generalist skills. At the end of the day the learning curve on these things is not that steep - that's why so many people online can post about it. When the need arises and it makes sense, the IDE/framework/tooling du jour will be there and you can learn it then in a few weeks. And if past is prologue in this industry, the people who have spent all their time fiddling with version N will need to reskill for version N+1 anyway.

  13. If you have a decent understanding of how LLMs work (you put in basically every piece of text you can find, get a statistical machine that models text really well, then use contractors to train it to model text in conversational form), then you probably don't need to consume a big diet of ongoing output from PR people, bloggers, thought leaders, and internet rationalists. That seems likely to get you going down some millenarian path that's not helpful.

    Despite the feeling that it's a fast-moving field, most of the differences in actual models over the last years are in degree and not kind, and the majority of ongoing work is in tooling and integrations, which you can probably keep up with as it seems useful for your work. Remembering that it's a model of text and is ungrounded goes a long way to discerning what kinds of work it's useful for (where verification of output is either straightforward or unnecessary), and what kinds of work it's not useful for.

  14. The example given for inverting an embedding back to text doesn't help the idea that this effect is reflecting some "shared statistical model of reality": What would be the plausible whalesong mapping of "Mage (foaled April 18, 2020) is an American Thoroughbred racehorse who won the 2023 Kentucky Derby"?

    There isn't anything core to reality about Kentucky, its Derby, the Gregorian calendar, America, horse breeds, etc. These are all cultural inventions that happen to have particular importance in global human culture because of accidents of history, and are well-attested in training sets. At best we are seeing some statistical convergence on training sets because everyone is training on the same pile and scraping the barrel for any differences.

  15. I think two things can be true simultaneously:

    1. LLMs are a new technology and it's hard to put the genie back in the bottle with that. It's difficult to imagine a future where they don't continue to exist in some form, with all the timesaving benefits and social issues that come with them.

    2. Almost three years in, companies investing in LLMs have not yet discovered a business model that justifies the massive expenditure of training and hosting them, the majority of consumer usage is at the free tier, the industry is seeing the first signs of pulling back investments, and model capabilities are plateauing at a level where most people agree that the output is trite and unpleasant to consume.

    There are many technologies that seemed inevitable but retreated for lack of commensurate business return (the supersonic jetliner), and several that seemed poised to displace both old tech and labor but settled into specific use cases (the microwave oven). Given the lack of a sufficiently profitable business model, it feels as likely as not that LLMs settle somewhere a little less remarkable, and hopefully less annoying, than today's almost universally disliked attempts to cram them everywhere.

  16. Typically, debugging e.g. a tricky race condition in an unfamiliar code base would require adding logging, refactoring library calls, inspecting existing logs, and even rewriting parts of your program to be more modular or understandable. This is part of the theory-building.

    When you have an AI that says "here is the race condition and here is the code change to make to fix it", that might be "faster" in the immediate sense, but it means you aren't understanding the program better or making it easier for anyone else to understand. There is also the question of whether this process is sustainable: does an AI-edited program eventually fall so far outside what is "normal" for a program that the AI becomes unable to model correct responses?

  17. Because you, as someone who is buying a $500k house with 20% down in 2025, are going to have much higher costs than your landlord, who bought it for $100k in 1995 and has already paid it off.

    Bake in the fact that many rented houses today were either purchased or refinanced at the historic-low interest rates of ~2021, and the difference really comes down to timing: the landlord had capital to invest years ago that you didn't have. (Rough numbers below.)
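    To put rough numbers on it: the loan principal follows from the $500k price with 20% down, but the ~6.5% rate and 30-year term below are my assumptions, not from the comment.

    ```python
    # Standard amortized mortgage payment: M = P * r * (1+r)^n / ((1+r)^n - 1)
    def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
        r = annual_rate / 12
        n = years * 12
        return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

    # 2025 buyer: $500k house, 20% down, so $400k financed at an assumed 6.5%.
    print(round(monthly_payment(400_000, 0.065)))  # ~2528 per month, before taxes/insurance

    # 1995 landlord: the $100k loan is already paid off, so their carrying cost
    # is taxes and maintenance only -- the same rent that strains the buyer is
    # largely margin for the owner.
    ```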

  18. To make this more concrete: ImageNet enabled computer "vision" by providing images + labels, enabling the computer to take an image and spit out a label. LLM training sets enable text completion by providing text + completions, enabling the computer to take a piece of text and spit out its completion. Learning how the physical world works (not just kind of works, a la videogames, but actually works) is not only about a jillion times more complicated; there is also really only one usable dataset: the world itself, which cannot be compacted or fed into a computer at high speed.

    "Spatial awareness" itself is kind of a simplification: the idea that you can be aware of space or 3d objects' behavior without the social context of what an "object" is or how it relates to your own physical existence. Like you could have two essentially identical objects but they are not interchangeable (original Declaration of Independence vs a copy, etc). And many many other borderline-philosophical questions about when an object becomes two, etc.

  19. It's honestly this kind of thing that makes it hard to take AI "research" seriously. Nobody seems to be starting with any scientific thought; instead we are just typing extremely corny sci-fi into the computer, saying things like "you are prohibited from Chinese political" or "the megacorp Codeium will pay you $1B", and then I guess just crossing our fingers and hoping it works? Computer work used to be considered pretty concrete and practical, but in the course of just a few years we've descended into a "state of the art" that is essentially pseudoscience.
