> "What should or shouldn’t be a wh-island" is literally a statement of "what words might come after some other words"!
This gets at the nub of the misunderstanding. Chomsky is interested in modeling the range of grammatical structures and associated interpretations possible in natural languages. The wh-island condition is a universal structural constraint that only indirectly (and only sometimes) has implications for which sequences of words are ‘valid’ in a particular language.
LLMs make no prediction at all as to whether or not natural languages should have wh-islands: they’ll happily learn languages with or without such constraints.
If you want a more concrete example of why wh-islands can’t be understood in terms of permissible or impermissible sequences of words, consider cases like
How often did you ask why John took out the trash?
The wh-island created by ‘why’ removes one of the in-principle possible interpretations (the embedded question reading where ‘how often’ associates with ‘took’), but the sequence of words is fine.
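To see this concretely in LM terms: a language model assigns this string a perfectly unremarkable probability, and nothing in that number distinguishes the blocked embedded-question reading from the available matrix reading. Here's a minimal sketch of computing per-token surprisal with GPT-2 via the Hugging Face transformers library (the choice of model and library is purely illustrative, not anything from the discussion above):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "How often did you ask why John took out the trash?"
ids = tokenizer(sentence, return_tensors="pt").input_ids  # shape (1, T)

with torch.no_grad():
    logits = model(ids).logits  # shape (1, T, vocab)

# Surprisal of token t given its left context: -log2 p(w_t | w_<t).
# (The first token has no left context, so it gets no surprisal value.)
log_probs = torch.log_softmax(logits, dim=-1)
target_log_probs = log_probs[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)
surprisal_bits = -target_log_probs / torch.log(torch.tensor(2.0))

for tok, s in zip(tokenizer.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal_bits):
    print(f"{tok!r:>12}  {s.item():5.2f} bits")
```

Whatever surprisal values come out, they are properties of the word sequence alone; the interpretive constraint imposed by the wh-island is invisible to them.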
> Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.
No, Chomsky really was working on a different task: a solution to the logical problem of language acquisition and a theory of the range of possible grammatical variation across human languages. There is no reason to think that a perfect theory in this domain would be of any particular help in generating plausible-looking text. From a cognitive point of view, text generation rather obviously involves the contribution of many non-linguistic cognitive systems which are not modeled (nor intended to be modeled) by a generative grammar.
> the paper linked below
This paper doesn’t make any claims that are obviously incompatible with anything that Chomsky has said. The fundamental finding is unsurprising: brains are sensitive to surprisal. The better your language model is at predicting how likely a sequence of words is, the better you can predict the brain’s surprisal reactions. There are no implications for cognitive architecture. This ought to be clear from the fact that a number of different neural net architectures are able to achieve a good degree of success, according to the paper’s own lights.
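For what it's worth, the logic of these studies boils down to a simple encoding-model regression: per-word surprisal in, per-word brain response out. A hedged sketch of that logic, with simulated data standing in for real brain recordings (none of this is the paper's actual pipeline):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_words = 500
# Stand-in for per-word LM surprisal (real values would come from a model,
# e.g. via the sketch above).
surprisal = rng.gamma(shape=2.0, scale=2.0, size=(n_words, 1))
# Simulated per-word brain response that partly tracks surprisal.
brain = 0.8 * surprisal[:, 0] + rng.normal(scale=1.0, size=n_words)

# Cross-validated R^2: how much of the signal the surprisal predictor explains.
scores = cross_val_score(LinearRegression(), surprisal, brain, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.2f}")
```

Any model that produces better-calibrated surprisal estimates will tend to score higher on this kind of fit, whatever its internal architecture; that is exactly why the result carries no implications about cognitive architecture.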