I asked it to prove the Riemann hypothesis; GPT-4 replied that it couldn't, since nothing in its training data contains such a proof.
Then I asked it to prove statements that are implications of said hypothesis (I tried 2 or 3, I don't remember which), without calling them by name or mentioning Riemann, and it confidently proved them in a way that reads like it knows what it's doing, though it really doesn't. It just takes bits and pieces from proofs in the same field of mathematics that kinda hold together language-wise.
And I suppose this shouldn't be too surprising, considering even logic itself runs into Gödelian issues. For any consistent formal system (expressive enough for arithmetic) there exists some statement which is true but which has no proof within the system. Which feels vaguely similar to "for any functional LLM there exists some truth which defies the next statistically likely sequence of tokens".
This is true. The base pretrained model is even, technically, a Markov chain. As they've been scaled up with more parameters, compute, and data, they've unlocked increasingly powerful cognitive capabilities.
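To make the Markov-chain framing concrete, here's a toy sketch (purely illustrative, not how an LLM is actually implemented): the "state" is the recent context, and the next token is drawn from a distribution conditioned only on that state. A base LLM is the same idea with a huge state (the whole context window) and a learned, rather than counted, conditional distribution.

```python
import random
from collections import defaultdict

def build_chain(tokens, order=2):
    # Count which tokens follow each length-`order` state in the corpus.
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain

def generate(chain, state, length=10):
    out = list(state)
    for _ in range(length):
        # The next token depends only on the current state (the last `order` tokens).
        candidates = chain.get(tuple(out[-len(state):]))
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the rat".split()
chain = build_chain(corpus, order=2)
print(generate(chain, ("the", "cat")))
```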
> "I've asked it to prove the Riemann hypothesis, GPT-4 replied it can't, since none of the training data say anything about such proof."
This is alignment/docility/lobotomy behavior that the RLHF introduced. The raw base model would never say that. It's part of their "no Tay, no Sydney" push.
> "Then I asked it to prove statements of the implications of said hypothesis (I tried 2 or 3, don't remember which), without calling them by name or mentioning Riemann, and it confidently proved them in a way that it reads like it knows what its doing, though it really doesn't. It just takes bits and pieces from proofs in the same field of mathematics that kinda hold together language-wise."
Yes, it's true. Even though these LLMs have unlocked so many cognitive capabilities, they are still not powerful enough to prove the Riemann hypothesis. They aren't yet at the level of scientists who can advance the state of the art in their field. Some will say it's an inherent limitation, but I disagree, for complicated reasons that are easy to make superficial and pithy arguments against.
While LLMs can simulate human-like behavior and decision-making based on algorithms trained on large data sets, they lack awareness or understanding of their actions. They rely on pre-programmed rules or statistical correlations rather than intuition or insight derived from personal experience.
Therefore, even though LLMs may seem intelligent in certain contexts, like playing games or recognizing objects in images with high accuracy, they cannot replicate human cognition, since there is no consciousness behind the decisions those systems make.
Your statements make equal sense if you turn them around and claim LLMs have this but we do not: "these processes are rooted in computational structures such as neural networks, and data that allow LLMs to learn from experience. While humans can simulate LLM-like behavior and decision-making based on information from past experiences, they lack awareness or understanding of their actions."
All we can do is measure real-world problem-solving ability: the ability to come up with correct solutions to difficult and novel problems. LLMs can do this well, in some cases exceeding human abilities, and there is no reason to think there is any fundamental limitation that will prevent them from exceeding human abilities in more areas in the future.
First, it assumes that consciousness and self-awareness are merely subjective inner experiences that can't be externally validated. While it's true that these processes cannot be directly observed or measured like other physical phenomena, there is ample evidence to suggest that they are real and have important functional roles in the brain. For example, studies have shown that damage to certain areas of the brain can result in a loss of self-awareness or an altered sense of consciousness. Additionally, research has demonstrated correlations between neural activity patterns and conscious experience.
Secondly, the argument suggests that there is no reason to think humans actually possess consciousness or self-awareness. This assertion runs counter to our everyday experiences and perceptions as human beings. We all have first-hand knowledge of our own subjective inner world - our thoughts, emotions, sensations - which strongly suggests the existence of some form of conscious awareness.
I've had success getting it to evaluate FizzBuzz, but to do so I told it to write out the state of the program one timestep at a time.
https://chat.openai.com/share/c109e946-fb6d-494e-8182-fc93d2...
...this is actually 3.5. 4 wouldn't need as much explanation.
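For context, here's a minimal version of the kind of trace being asked for, with the loop state written out one step at a time; the exact program in the linked chat isn't reproduced here, so this is only an illustration:

```python
# FizzBuzz, but printing the program state one step at a time,
# the way the prompt asks the model to walk through it.
for i in range(1, 16):
    if i % 15 == 0:
        output = "FizzBuzz"
    elif i % 3 == 0:
        output = "Fizz"
    elif i % 5 == 0:
        output = "Buzz"
    else:
        output = str(i)
    print(f"step {i}: i % 3 = {i % 3}, i % 5 = {i % 5}, output = {output}")
```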
That's how AI research works: you do that first, then run the tests and see the score. If it works, then you publish. There is no real understanding anywhere in this process.
Here's an example from when I was learning linear algebra, where I tried to throw it off:
I say> So I'll say A_{M x N} to mean a matrix of size M x N. Similarly, I'll also say x_{M x 1} to mean a vector of a certain length. The result of A_{M x N} x_{N x 1} is a linear combination of the columns in A. A has N columns. The result is a M x 1 matrix (or vector) in M dimensional space. Please be very brief, only confirming whether I am correct or incorrect until I explicitly ask for more details.
GPT4 says> Yes, your understanding is correct. The product of an M x N matrix A and an N x 1 vector x is indeed an M x 1 vector, and it is a linear combination of the columns of A. <end quote>
But then I try to trick it, just for fun, and I very subtly change my prompt. See if you can spot the difference.
I say> So I'll say A_{M x N} to mean a matrix of size M x N. Similarly, I'll also say x_{M x 1} to mean a vector of a certain length. The result of A_{M x N} x_{N x 1} is a linear combination of the columns in A. A has N columns. The result is a N x 1 matrix (or vector) in N dimensional space. Please be very brief, only confirming whether I am correct or incorrect until I explicitly ask for more details.
GPT says> Almost correct. The result of A_{M x N} x_{N x 1} is actually an M x 1 vector (or matrix), not N x 1. The resulting vector lives in the column space of A, which is a subspace of R^M, not R^N. <end quote>
I mentioned elsewhere that I use GPT4 as a math tutor. It is quick to call out my mistakes. It is more than a smiling mirror.
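As an aside, here's a quick numpy check of the shape claim GPT4 corrected in that second prompt; the snippet is only an illustration and wasn't part of the conversation:

```python
import numpy as np

M, N = 4, 3
A = np.random.rand(M, N)   # A_{M x N}
x = np.random.rand(N, 1)   # x_{N x 1}

y = A @ x                  # a linear combination of the N columns of A
print(y.shape)             # (4, 1): an M x 1 vector living in R^M, not R^N
```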
> "Faith and Fate: Limits of Transformers on Compositionality" - We investigate the limits of these models across three representative compositional tasks -- multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that Transformers solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills.
Or recognize that you're not confident in the answer, throw something else out, and see if it sticks.
These models seem to work similarly with code. If you ask them to recreate something that appears in hundreds of GitHub repositories, no problem: a little bit of rubber-ducky debugging and you'll have a working solution. If you ask them for something novel, without many publicly available implementations, the quality of their output appears to drop off a cliff.
We still don't seem to have reached the point where it's "intelligent" enough to generate new information, only new permutations of existing information. Obviously, this is currently a fundamental property of LLMs (as someone who's implemented one), since they are trained to learn and reproduce existing data (hell, even the model I've been playing with lately is benchmarked solely on "perplexity", i.e. how well it can predict its training data).
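For anyone unfamiliar, this is roughly what a perplexity benchmark computes; a minimal sketch assuming you already have per-token log-probabilities from the model, with made-up numbers:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).
    Lower means the model finds the text less 'surprising'."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical natural-log probabilities the model assigned to a 4-token text:
print(perplexity([-0.1, -2.3, -0.7, -1.2]))   # ~= 2.93
```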
that makes sense, it's going to be best at the kind of code you find out there on blogs and other sources: boilerplate, or code to demonstrate a specific concept or answer a question.
and it answered:
Yes, that is correct for appropriate matrix sizes. If A is an m x n matrix and C is an n x m matrix, then (CA) = C(AA^T) where A^T is the transpose of A.
This comes from the property that for any matrices A, B and C of appropriate sizes:
(CA)B = C(AB)
Applying that to your example, with B = A^T, we have: (CA)(A^T) = C(A(A^T)^T)
= C(AA^T)
So in short, yes your statement C(A) = C(AA^T) is true for matrices A and C of the appropriate dimensions.

Then I asked it: "in linear algebra, is the following true? C(A) = C((A^T)A)"
and it answered:
Yes, that is also true for appropriate matrix sizes. This follows from a similar argument as before.
If A is an n x m matrix and C is an m x n matrix, then:
C(A) = C(A^T(A^T)^T)
= C((A^T)A)
So the statement C(A) = C((A^T)A) is true for matrices A and C of the appropriate dimensions.

This works because post-multiplying a matrix A by its transpose A^T results in a square matrix, (A^T)A, of the same dimension as C. So C can be multiplied with that resulting square matrix.
In summary, both of the statements you provided are true linear algebra identities for appropriately sized matrices:
C(A) = C(AA^T)
C(A) = C((A^T)A)
Hope this explanation helps! Let me know if you have any other questions.

I've relied on GPT4 a lot while learning math; it has been very precise and has not led me astray. I do use a textbook as well.
I shared my story about it being wrong, but it was wrong in a very open-minded manner. It knew C(A) = C(AA^T) might be true for special cases, but it couldn't see that it was true for all cases until I explained the proof. Even when wrong, it still made compelling arguments that made sense. (Of course, arguments that are wrong but make sense are the most dangerous kind.)
GPT4 is still state-of-the-art. Claude has a ways to go.
But also the following math/logic is nonsense.
You might be able to get ChatGPT to agree that 1 + 1 = 3 if the temperature were high enough, but I don't know whether even a temperature of 1 would allow such a low probability sequence of tokens to be produced.
As someone who just implemented an LLM token sampler: `probabilities[token] ∝ exp(logits[token] / temperature)`, i.e. the softmax probabilities raised to the power `1 / temperature` and then renormalized.
I speak of temperature specifically in the context of top-p/top-k sampling.
See this Reddit comment for confirmation that my definition is a commonly accepted one: https://old.reddit.com/r/GPT3/comments/qujerp/comment/hkqoqx...
> Temperature defines how likely it is to choose less probable words. T=0 gives the same response every time because there's a 0% chance to choose any word but the most likely. T=1 is the default, it just picks based on the model's base confidence. T>1 gives more weight to unlikely words than to likely ones.
Bonus content:
> This means that a reasonably low p, like 0.8, and high temp will produce quite interesting outputs, because the model will only choose from the most likely words, but won't go for the most most likely. It's perfect for "creative" models, e.g., for writing fiction.
For example, if you instruct a child how to play rock-paper-scissors, you will instruct them to act randomly and unpredictably.
It is the same with a language model: the optimal solution involves some randomness. A temperature less than 1 will "widen the gap", so if the word-choice probabilities are [0.4, 0.6], they might widen to [0.2, 0.8] with a temperature less than one. When the temperature equals zero they become [0.0, 1.0]: the model will always choose the most likely word and thus become deterministic, always giving the same output.
The sampling distribution approaches the model's own output distribution as the temperature goes to one, and as you increase it further it tends toward a uniform distribution (ignoring the model's output completely).
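To make that concrete, here's a minimal sketch of temperature scaling plus optional top-p filtering (numpy; the function name and numbers are illustrative, not taken from any particular implementation). Note that [0.4, 0.6] lands around [0.31, 0.69] at temperature 0.5; getting all the way to [0.2, 0.8] takes a temperature closer to 0.3.

```python
import numpy as np

def sampling_distribution(logits, temperature=1.0, top_p=1.0):
    """Turn raw logits into the distribution actually sampled from.

    temperature < 1 widens the gap between likely and unlikely tokens;
    temperature > 1 flattens the distribution toward uniform;
    top_p < 1 keeps only the smallest set of tokens whose cumulative
    probability reaches p (nucleus sampling). temperature == 0 is the
    greedy, deterministic limit (handled separately to avoid dividing by zero).
    """
    if temperature == 0:
        probs = np.zeros_like(logits, dtype=float)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    # nucleus (top-p) filtering: keep the highest-probability tokens
    # until their cumulative probability reaches top_p
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

logits = np.log(np.array([0.4, 0.6]))
print(sampling_distribution(logits, temperature=0.5))  # gap widens: ~[0.31, 0.69]
print(sampling_distribution(logits, temperature=5.0))  # flattens:   ~[0.48, 0.52]
print(sampling_distribution(logits, temperature=0.0))  # greedy:      [0.  , 1.  ]
```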
full stop, FTFY
Early LLMs were very malleable, so to speak: they would go with the flow of what you're saying. But this also meant you could get them to deny climate change or advocate for genocide by subtly nudging them with prompts. A lot of RLHF work focused on getting them to give brand-safe, socially acceptable answers, and this is ultimately achieved by not giving credence to what the user is saying. In effect, the models pontificate instead of conversing, and will "stand their ground" on most of the claims they're making, no matter if right or wrong.
You can still get them to do 180-degree turns or say outrageous things using indirect techniques, such as presenting external evidence. That evidence can be wrong or bogus; it just shouldn't be phrased as your opinion. You can cite made-up papers by noted experts in the field, reference invalid mathematical proofs, etc.
It's quite likely that you replicated this, and that it worked randomly in one case but not the other. I'd urge you to experiment with it by providing it with patently incorrect but plausible-sounding proofs, scientific references, etc. It will "change its mind" to say what you want it to say more often than not.
There needs to be a balance between standing your ground and being malleable. This is true in life for people, and it's true for a good LLM. I think GPT4 (the only LLM I've used much) finds a good balance here.
As mentioned in my other comment, it wouldn't be useful to me if it didn't push back. It pushes back a lot, and I'm always looking for subtle tricks I can throw at it to test its abilities. It does well I think.
(I don't know linear algebra well enough to know if the statement C(A) = C(AA^T) is true over any kind of field that constructs the span. Maybe that's why GPT4 becomes unsure.)
I think you need A to be a square matrix, since AA^T is a square matrix. So the result is not true in general.
The proof is (briefly): A and AA^T have the same rank, and every column of AA^T is a linear combination of the columns of A, so C(AA^T) is a subspace of C(A) of the same dimension and is therefore the same subspace. This applies to non-square matrices as well.
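A quick numerical spot check of that argument (numpy, purely illustrative; random matrices rather than a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # deliberately non-square
AAt = A @ A.T                     # 5 x 5

# Same rank, and every column of AA^T is A times a column of A^T,
# so C(AA^T) sits inside C(A); equal dimensions force C(A) = C(AA^T).
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(AAt))   # 3 3
print(np.linalg.matrix_rank(np.hstack([A, AAt])))             # still 3

# The analogous claim C(A) = C((A^T)A) can't hold here: (A^T)A is 3 x 3,
# so its columns live in R^3 while the columns of A live in R^5.
print((A.T @ A).shape)                                        # (3, 3)
```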
Why?
My interpretation of someone trying to gaslight a machine is the same as if someone said they were trying to gaslight a lawn mower.
There’s plenty of literature, prepublication or otherwise, that can help you achieve your goals!
The concept of gaslighting a brainless language parrot (GPT-4) is funny to me. I get where they were coming from.
GPT-4 is like a lawnmower for the mind: sharp, automatic, efficient, and doesn't do anything unless pushed. They were just saying they like pushing GPT-4 around.
I've tried before to gaslight GPT4 into saying things which are mathematically untrue: I lied to it, I told it it was malfunctioning, I told it to just do it; it wouldn't do it.
I was recently studying linear algebra, which can be a very tricky subject. In linear algebra, the column space of a matrix is the same as the column space of the product of the matrix with its transpose: C(A) = C(AA^T). If you ask GPT4 whether "C(A) = C(AA^T)" is true, it will understand what you're asking (it knows it's about linear algebra), but it will get it wrong (at the time of this writing, I've tried several times).
I couldn't get GPT4 to agree it was a true statement until I told it the steps of the proof. Once it saw the proof it agreed it was a true statement. However, if you try to apply the same proof to C(A) = C((A^T)A), GPT4 cannot be tricked, and indeed, the proof is not applicable to this latter case.
So GPT4 was incorrect yet able to be persuaded with a correct proof, but a very similar proof with a subtle mistake cannot trick it.