ComplexSystems
1,070 karma

  1. I am as tired of AI slop as everyone else, but I think the backlash to this is way overblown. Commercials are already "slop." There is no expectation of quality at all. The average Christmas commercial involves a bunch of elves singing "Taking Care of Business" while dancing in front of office supplies.

    This commercial sucked because nobody wants to hear "it's the most terrible time of year." I don't really care if they used AI.

  2. I became even better off when I installed an ad blocker.
  3. Ultimately, the main things that will stop it from solving literally "all the others" are the impossibility of solving the halting problem, considerations like P ≠ NP, etc. But as we have just seen, despite these impossibility theorems, AI systems are still able to make substantive progress on solving important open real-world problems.
  4. > We've had automated theorem proving since the 60s.

    By that logic, we've had LLMs since the 60s!

    > What we need is automated theorem discovery.

    I don't see any reason you couldn't train a model to do this. You'd have to focus it on generating follow-up questions to ask after reading a corpus of literature, playing around with some toy examples in Python and making a conjecture out of it. This seems much easier than training it to actually complete an entire proof.

    > Erdős discovered these theorems even if he wasn't really able to prove them. Euler and Gauss discovered a ton of stuff they couldn't prove. It is weird that nobody considers this to be intelligence.

    Who says they don't? I wouldn't be surprised if HarmonicMath, DeepMind, etc. have also thought about this kind of thing.

    > Automated axiom creation seems a lot harder. How is an LLM supposed to know that "between any two points there is a line" formalizes an important property of physical space?

    That's a good question! It would be interesting to see if this is an emergent property of multimodal LLMs trained specifically on this kind of thing. You would need mathematical reasoning, visual information and language encoded into some shared embedding space where similar things are mapped right next to each other geometrically.
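
    For concreteness, here is a toy, purely hypothetical sketch of what a "shared embedding space" means - random matrices stand in for trained encoders, and relatedness becomes geometric closeness (roughly the CLIP-style contrastive setup):

    ```python
    # Toy illustration: in a real multimodal model, W_text and W_diagram would be
    # learned encoder networks for text, diagrams, formal statements, etc., trained
    # so that matching pairs land close together in the shared space.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 8  # dimension of the shared embedding space

    W_text = rng.normal(size=(d, 16))     # stand-in "text encoder": 16-dim features -> shared space
    W_diagram = rng.normal(size=(d, 32))  # stand-in "diagram encoder": 32-dim features -> shared space

    def embed(W, x):
        z = W @ x
        return z / np.linalg.norm(z)  # unit-normalize so a dot product is cosine similarity

    text = rng.normal(size=16)     # features for, say, "between any two points there is a line"
    diagram = rng.normal(size=32)  # features for a picture of two points joined by a segment

    print(embed(W_text, text) @ embed(W_diagram, diagram))
    # A contrastive training objective would push this score up for matching
    # text/diagram pairs and down for mismatched ones.
    ```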

  5. Are you kidding? This is an incredible result. Stuff like this is the most important work happening in AI right now. Automated theorem proving? It's not a stretch to say the entire point of the technology was to get us to this.
  6. What are some interesting things he has used the SVD to solve?
  7. He quite literally says that the dollars spent on scaling LLMs in the past few years are a waste.
  8. Of course not - it makes no sense to gather that from my post. Were you trying to respond to someone else?
  9. The author quite literally says that the last few years were a "detour" that has wasted a trillion dollars. He explicitly lists building new LLMs, building larger LLMs and scaling LLMs as the problem and source of the waste. So I don't think I am strawmanning his position at all.

    It is one thing to say that OpenAI has overpromised on revenues in the short term and another to say that the entire experiment was a waste of time because it hasn't led to AGI, which is quite literally the stance that Marcus has taken in this article.

  10. I think there is broad agreement that new models and architectures are needed, but I don't see it as a waste to also scale the stack that we currently have. That's what Silicon Valley has been doing for the past 50 years - scaling things out while inventing the next set of things - and I don't see this as any different. Maybe current architectures will go the way of the floppy disk, but it wasn't a waste to scale up production of floppy disk drives while they were relevant. And ChatGPT was released only 3 years ago!
  11. I think the article makes decent points but I don't agree with the general conclusion here, which is that all of this investment is wasted unless it "reaches AGI." Maybe it isn't necessary for every single dollar we spend on AI/LLM products and services to go exclusively toward the goal of "reaching AGI?" Perhaps it's alright if these dollars instead go to building out useful services and applications based on the LLM technologies we already have.

    The author, for whatever reason, views it as a foregone conclusion that every dollar spent in this way is a waste of time and resources, but I wouldn't view any of that as wasted investment at all. It isn't any different from any other trend - by this logic, we may as well view the cloud/SaaS craze of the last decade as a waste of time. After all, the last decade was also fueled by lots of unprofitable companies, speculative investment and so on, and failed to reach any pie-in-the-sky Renaissance-level civilization-altering outcome. Was it all a waste of time?

    It's ultimately just another thing industry is doing as demand keeps evolving. There is demand for building the current AI stack out, and demand for improving it. None of it seems wasted.

  12. And a drummer here in the States!
  13. Matrices represent linear transformations. Linear transformations are very natural and "beautiful" things. They are also very clearly not commutative: f(g(x)) is not the same as g(f(x)). The matrix algebra perfectly represents all of this, and as a result, FGx is not the same as GFx. It's only not "beautiful" if you believe that matrix multiplication is a random operation that exists for no reason.
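
    A quick numerical sanity check of both claims - matrix multiplication is composition of the maps, and the order of composition matters - using numpy:

    ```python
    import numpy as np

    # Two linear maps on R^2: F rotates by 90 degrees, G doubles the x-coordinate.
    F = np.array([[0.0, -1.0],
                  [1.0,  0.0]])
    G = np.array([[2.0, 0.0],
                  [0.0, 1.0]])

    x = np.array([1.0, 1.0])

    print(F @ (G @ x))  # apply g, then f               -> [-1.  2.]
    print((F @ G) @ x)  # the single matrix FG, applied -> [-1.  2.]
    print(G @ (F @ x))  # apply f, then g               -> [-2.  1.]  (different!)
    ```
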
  14. A few things:

    First, modern LLMs can be thought of, abstractly, as a kind of Markov model. We are taking the entire previous output as one state vector, and from there we have a distribution over the next state vector, which is the updated output with another token added. The point is that there is some subtlety in what a "state" is. So that's one thing.

    But the catch with the usual Markov chain is that we need to figure out the next conditional probability based on the entire previous history. Making a lookup table over an exponentially growing set of possible token histories is impossible, so we make a lookup table on the last N tokens instead - this is an N-gram language model, or an Nth-order Markov chain, where the states are now individual tokens. It is much easier, but it doesn't give great results.

    The main reason is that sometimes the last N words (or tokens, whatever) simply do not have sufficient info about what the next word should be. Oftentimes some fragment of context way back at the beginning is much more relevant. You can increase N, but then sometimes there are a bunch of intervening grammatical filler words that are useless, and the table also gets exponentially large. So the 5 most important words to look at, given the current word, could be 5 words scattered across the history, rather than the last 5. And this is always evolving and differs for each new word.

    Attention solves this problem. Instead of always looking at the last 5 words, or the last N words, we have a dynamically varying "score" for how relevant each of the previous words is to the current one we want to predict. This idea is closer to the way humans parse real language. A Markov model can be thought of as a very primitive version of this, where we always just attend evenly to the last N tokens and ignore everything else. So you can think of attention as kind of like an infinite-order Markov chain, but with variable weights representing how important past tokens are, weights that keep adjusting dynamically as the text stream goes on (there is a toy sketch contrasting the two at the end of this comment).

    The other difference is that we can no longer have a simple lookup table like we do with n-gram Markov models. Instead, we need to somehow build some complex function that takes in the previous context and outputs the correct next-token distribution. We cannot just store the distribution of tokens given every possible combination of previous ones (and with variable weights on top of it!), as there are infinitely many. It's kind of like we need to "compress" the hypothetically exponentially large lookup table into some kind of simple expression that lets us compute what the lookup table would say without having to store every possible output at once.

    Both of these things - computing attention scores, and figuring out some formula for the next-token distribution - are currently solved with deep networks that just learn from data by gradient descent until they magically start giving good results. But if the network isn't powerful enough, it won't give good results - maybe only comparable to a more primitive n-gram model. So that's why you see what you are seeing.
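
    To make the contrast concrete, here is a toy sketch (purely illustrative: made-up embeddings, no learned query/key/value projections) of the two ideas - an n-gram model is literally a lookup table keyed on the last token(s), while an attention step scores every previous token for relevance and blends the whole history with softmax weights:

    ```python
    from collections import Counter, defaultdict
    import numpy as np

    corpus = "the cat sat on the mat the cat ate".split()

    # (1) Bigram model: a literal lookup table from the last token to next-token counts.
    bigram = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram[prev][nxt] += 1
    print(dict(bigram["the"]))  # {'cat': 2, 'mat': 1} -- the last token is all it ever sees

    # (2) Toy attention: score every previous token against the current one,
    # softmax the scores into weights, and blend the whole history accordingly.
    rng = np.random.default_rng(0)
    vecs = {w: rng.normal(size=4) for w in set(corpus)}  # stand-in token embeddings

    def attend(history, current):
        scores = np.array([vecs[w] @ vecs[current] for w in history])
        weights = np.exp(scores) / np.exp(scores).sum()               # softmax over the history
        context = sum(a * vecs[w] for a, w in zip(weights, history))  # weighted blend
        return weights, context

    weights, context = attend(corpus[:-1], corpus[-1])
    for w, a in zip(corpus[:-1], weights):
        print(f"{w:>4s} {a:.2f}")  # weights vary per token instead of being fixed to the last N
    ```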

  15. Don't new cars just directly record your location as you drive them?
  16. This article was written weirdly:

    > For decades, Google has developed privacy-enhancing technologies (PETs) to improve a wide range of AI-related use cases.

    They introduce this random "PETs" acronym after a random string of three words and then never use it. In general this article makes some weird stylistic choices (WSCs).

  17. Why? This seems like a reasonable task to benchmark on.
  18. And only in the very particular scenario of a national-only cartel which has not successfully roped in other international pharma companies.
  19. I don't know what to make of this. Is it all just security theater? The idea of consumer networking hardware that isn't riddled with security vulnerabilities seems to be a ship that sailed long ago. I doubt this move will prevent major nation-states from hacking into whatever they want.
  20. I don't feel that you're going to get a lot of engagement with this attitude. It doesn't come off like a good-faith effort to have an honest intellectual conversation, which is what this forum is about.

    There are clearly policies on that page that break from the NYC status quo (like freezing the rent). Perhaps you are interested in explaining to us why you think these are economically sound ideas, rather than insisting they aren't controversial?

This user hasn’t submitted anything.