- Their problem is with false positives they find, not true positives you find. My application for a credit card was somehow flagged as fraudulent. Chase repeatedly asked for additional forms of ID, then told me the scans I sent were illegible. (The scans were fine; I think they just needed an excuse.) I went to a branch with the physical documents, and they said they couldn't look at them. The branch put me in an office and called the same telephone support, with the same result. I eventually gave up.
I guess I'm lucky they rejected me before any money changed hands. I've heard horror stories from people with significant assets at their bank, locked out until an actual lawsuit (the letter from a lawyer didn't work) finally got their attention. I think it's like Google support, usually fine but catastrophic when it's not.
- I think those optoisolators are indeed sold mostly for switching power supplies. That's probably why someone cared enough about aging to write an app note, since the ambient temperature is high there and the exact CTR matters more when it's in that analog feedback loop. I've also seen them for digital inputs in industrial control systems, where speeds are slow and the wires might be coming from far away on a noisy ground.
That said, I believe optical isolation is typical for these "data diode" applications, even between two computers in the same rack. I don't think it provides any security benefit, but it's cheap and customers expect it; so there's no commercial incentive to do anything else.
- The "RS-232" part is important here, since directly connecting the UART pins for the two MCUs without the RS-232 level shifters may trivially permit bidirectional dataflow, for example by reconfiguring the pins to GPIO and bit-banging a UART in the reverse direction, as already noted below. That wouldn't be directly exploitable (since you'd need to somehow bootstrap that reconfiguration in), but it would widen the attack surface.
If the cable wires control signals like DTR and RTS, then you'd need to cut those too. The goal in any case is one wire (plus ground) out of the transmitter and one wire into the receiver, with something in between that enforces data flow in only one direction. An optoisolator can do that, but a buffer without galvanic isolation (like the RS-232 level shifters) can do that too.
- > over a few years, the LED gets dimmer and dimmer
That shouldn't happen unless the LED is driven near the top of its current rating, which shouldn't be necessary unless you're pushing the limits of its rise/fall times (in which case a different part would be advisable as you say).
A random app note shows 95% of initial current transfer ratio after 25 years at If = 5 mA, and depending on the necessary bit rate we could probably design for at least 2x initial margin on that CTR. Such a design would last effectively forever.
https://www.we-online.com/catalog/media/o303314v410%20ANO006...
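The margin arithmetic is simple; with made-up but plausible numbers (a part with 100% CTR at If = 5 mA, and a receiver that needs 2 mA to switch reliably):

    IF_MA = 5.0                     # LED forward current, per the app note
    CTR_INITIAL = 1.00              # 100% current transfer ratio (assumed)
    CTR_AGED = 0.95 * CTR_INITIAL   # the app note's 25-year figure
    NEEDED_MA = 2.0                 # collector current the receiver needs (assumed)

    print(IF_MA * CTR_INITIAL / NEEDED_MA)  # 2.5x margin when new
    print(IF_MA * CTR_AGED / NEEDED_MA)     # ~2.4x margin after 25 years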
I think the galvanic isolation is mostly a feel-good measure here, allowing people to say it's "air-gapped" even though that's not directly relevant (since Wi-Fi is also "air-gapped"). A simple gate or level shifter can also enforce unidirectional data flow as you say.
- In many industrial applications, the concern is mostly an attacker gaining control of the isolated side, since that could physically destroy stuff. Exfiltration is a smaller or nonexistent concern, since you're already sending most data out deliberately.
So there's still an attack surface, but it's a lot smaller. Any side channel exploit would need to work (at least in some initial form) without changes to the software on the isolated side, since you otherwise can't bootstrap your way to installing it.
- I'm not aware of any evidence that he was using testosterone enanthate (or any other particular steroid), though he certainly looked like he was using something.
Those are already controlled substances, though. His drug dealer is presumably aware of that, and the threat of a lawsuit doesn't add much to the existing threat of prison. OpenAI's conduct is untested in court, so that's the new and notable question.
- > in the context of a civil law dispute
The squatters are very frequently committing criminal fraud, by showing a fake lease to the police to portray themselves as legitimate tenants. Leases aren't recorded like deeds, and landlords' signatures often appear in public records. So it's easy to make a good enough fake that the police will take the squatters' side. I don't know why this article doesn't mention that, but a web search ("fake lease squatter") will show this is routine.
The squatters don't expect to win their dispute in court, just to take advantage of the extended time to trial. Oakland's eviction moratorium lasted for literally years, and they're still working through the backlog. When the case finally reaches court, the squatters will get evicted but the fraud is almost never charged. So from the squatter's perspective, it makes sense to fake the lease.
From a small landlord's perspective, the tradeoff may thus be certain financial ruin waiting for the judicial process vs. a slight chance of ruin if the "nightmare cotenant" approach goes wrong. So it's no surprise the sword guy has business. The risk that his services would be used against a real tenant is partially mitigated by the risk that that tenant would sue. The fake tenants prefer to stay out of court, since the judge (and opposing counsel) will look more carefully at their fake lease than the police did.
Georgia recently created an accelerated judicial review for cases where the landlord is alleging that the lease is fraudulent, separate from default on a non-fraudulent lease. That seems like the right approach to me.
- Yeah. I think the additional trick is that squatters often have a fraudulent lease. That makes it owner vs. tenant, and the police have orders to err on the side of not facilitating an illegal eviction. The owner could attempt to owner-occupy the property, but there's no document for that and there is a lease. So when the police show up, the owner is very likely to be the one removed or arrested.
The sword guy makes it tenant vs. tenant, so neither party has that formal advantage. Of course the police know the game, but they're generally happy with the workaround.
- I spent some time looking for sources for the various "railroad investment as % of GDP" numbers floating around, and I don't think they're very good. The modern concept of GDP didn't even exist back then, so the denominator is calculated in retrospect from the limited contemporary data. The numerator is known with more confidence, but the papers I found mostly showed closer to 3%. A pretty wide range is at least defensible though, and I guess VCs are comparing against the high end for obvious reasons.
https://www.hackerneue.com/item?id=44805979
This AI investment is interesting because it's mostly not in durable goods, unlike the railroad's rails and (most importantly) land. The buildings and power infrastructure for the datacenters could retain value for decades, but the servers won't unless something goes badly wrong. I believe this is the largest investment in human history justified primarily by the anticipated value of intellectual property.
- > Sure, Chomsky's work doesn't have practical applications. Most scientific work doesn't.
> Has geology accomplished something considered difficult outside of geology?
Ask an oilfield services company? A structural engineer who needs a foundation? If that work were easy, then their geologists wouldn't get paid.
I could have just said "economically important", but that seemed too limiting to me. For example, computer-aided proofs were a controversial subfield of math, but I'd take their success on the four-color theorem (which came from outside their subfield and had resisted proof by other means) as evidence of their value, despite the lack of practical application for the result. I think that broader kind of success could justify further investment, but I also don't see that here.
> As a former syntactician who's constructed lots of theories that turned out to be false
I should clarify that I do see a concept of falsifiability at that level, of whether a grammar fits a set of examples of a language. That seems pretty close to math or CS to me. I don't see how that small number of examples is supposed to scale to an entire natural language or to anything about the human brain's capability for language, and I don't see any falsifiable attempt to make that connection. (I don't see much progress towards the loftiest goals from the statistical approach either, but their spectacular engineering results break that tie for me.)
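Concretely, that's the kind of falsifiability I do see at that level (toy grammar and sentences invented here, using NLTK):

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'dogs' | 'cats'
    VP -> V NP
    V -> 'chase'
    """)
    parser = nltk.ChartParser(grammar)

    # The grammar either fits each example sentence or it doesn't.
    for sentence in ["dogs chase cats", "cats chase dogs", "chase dogs cats"]:
        trees = list(parser.parse(sentence.split()))
        print(sentence, "->", "fits" if trees else "falsified")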
Anyways, Merry Christmas if you're celebrating. I guess we're unlikely to be the ones to settle this dispute, but I appreciate the insight into the worldview.
- > So, essentially, you have decided not to engage with Chomsky’s work. That is a perfectly legitimate thing to do, but it does mean that you cannot make informed criticisms of it.
Any criticism that I'd make of homeopathy would be uninformed by the standards of a homeopath--I don't know which poison to use, or how many times to strike the bottle while I'm diluting it, or whatever else they think is important. But to their credit they're often willing to put their ideas to the external test (like with an RCT), and I know that evidence in aggregate shows no benefit. I'm therefore comfortable criticizing homeopathy despite my unfamiliarity with its internals.
I don't claim any qualifications to criticize the internals of Chomsky's linguistics, but I do feel qualified to observe that the whole thing appears to be externally useless. It seems to reject the idea of falsifiable predictions entirely, and when one does get made and then falsified, "the implications for generative linguistics are pretty minor". After dominating academic linguistics for fifty years, it has never accomplished anything considered difficult outside the newly-created field. So why is this a place where society should expend more of its finite resources?
Hardy wrote his "Mathematician's Apology" to answer the corresponding question for his more ancient field, explicitly acknowledging the uselessness of many subfields but still defending them. He did that with a certain unease though, and his promises of uselessness also turned out to be mistaken--he repeatedly took number theory as his example, not knowing that in thirty years it would underlie modern cryptography. Chomsky's linguists seem to me like the opposite of that, shouting down anyone who questions them (he called Everett a "charlatan") while proudly delivering nothing to the society funding their work. So why would I want to join them?
- > It's very unlikely that Everett's key claims about Pirahã are true
Everett achieved something unequivocally difficult--after twenty years of failed attempts by other missionaries, he was the first Westerner to learn Pirahã, living among the people and conversing with them in their language. In my view, that gives him significantly greater credibility than academics with no practical exposure to the language (and I assume you're aware of his response to the paper you linked).
I understand that to Chomsky's followers, Everett's achievement is meaningless, in the same way that LLMs saturating almost every prior benchmark in NLP is meaningless. But what achievements outside the "self-referential parlor game" are meaningful then? You must ground yourself in outside reality somehow, right?
> Then when we finally get to see the concrete alternative proposal, it turns out to be nothing more than a promissory note.
I'm certainly not claiming that statistical modeling has already achieved any significant insight into how physical structures in the brain map to an ability to generate language, and I don't think anyone else is either. We're just speculating that it might in future.
That seems a lot less grandiose to me than anything Chomsky has promised. In the present, that statistical modeling has delivered some pretty significant, strictly falsifiable, different but related achievements. Again, what does Chomsky's side have?
> I don't see how we can discuss this question without getting into specifics, so let me try to push things in that direction. Here is a famous syntax paper by Chomsky: https://babel.ucsc.edu/~hank/On_WH-Movement.pdf
And when I asked that before, you linked a sixty-page paper, with no further indication ("various things"?) of what you want to talk about. If you're trying to argue that Chomsky's theories are anything but a tarpit for a certain kind of intellectual curiosity, then I don't think that's helping.
- > Andrew Wiles has not produced any result evaluable by me or by almost anyone else.
Fermat wrote the theorem in the margin long before Wiles was born. There is no question that many people tried and failed to prove it. There is no question that Wiles succeeded, because the skill required to verify a proof is much less than the skill required to generate it. I haven't done so myself; but lots of other people have, and there is no dispute by any skilled person that his proof is correct. So I believe that Wiles has accomplished something significant.
I don't think Chomsky has any similar accomplishment. I roughly understand the grandiose final goal; I just see no evidence that he has made any progress towards it. Everything that I'd see as an interesting intermediate goal is dismissed as out of scope, especially when others achieve it. On the rare occasion that Chomsky has made externally intelligible predictions on the range of human language, they've been falsified anthropologically. I assume you followed the dispute on Pirahã, which I believe clarified that features like recursion were in fact optional, rendering the theory safely non-falsifiable again.
So what's his progress? Everything that I see turns inward, valuable only within the framework that he himself constructed. Anyone can build such a framework, so that's not an accomplishment. Convincing others to spend years of their lives on that framework is a sort of an achievement, but it's not a scientific one--homeopathy has many practitioners.
> I expect anyone learning Japanese as a second language will get a chuckle out of this one. It’s in fact a common scenario.
I think this view is just as wrong applied to a human as to a model. A beginning language student probably knows a lot more grammar rules than a native speaker, but their inability to converse doesn't come from their inability to quickly apply them. It comes from the fact that those rules capture only a small amount of the structure of natural language. You seem to acknowledge this yourself--if nothing Chomsky is working on would help a machine generate language, then it wouldn't help a human either. This also explains my teachers' usual advice to stop studying and converse as best I could, watch movies, etc.
Humans clearly learn language in a more structured way than LLMs do (since they don't need trillions of tokens), but they learn primarily from exposure, with partial structure but many exceptions. I don't think that's surprising, since most other things "designed" in an evolutionary manner have that same messy form. LLMs have succeeded spectacularly in modeling that, under the usual definition of "modeling" in ML or other math.
It's thus strange to me to see them dismissed as a source of insight into natural language. I guess most experts in LLMs are busy becoming billionaires right now; but if anything resembling Chomsky's universal grammar ever does get found to exist, then I'd guess it will be extracted computationally from models trained on corpora of different languages and not any human insight, in the same way that the Big Five personality traits fall out of a PCA.
- > LLMs make no prediction at all as to whether or not natural languages should have wh-islands: they’ll happily learn languages with or without such constraints.
The human-designed architecture of an LLM makes no such prediction; but after training, the overall system including the learned weights absolutely does, or else it couldn't generate valid language. If you'd prefer to run in the opposite direction, then you can feed in sentences with correct and incorrect wh-movement, and you'll find the incorrect ones are much less probable.
That prediction is commingled with billions of other predictions, which collectively model natural language better than any machine ever constructed before. It seems like you're discounting it because it wasn't made by and can't be understood by an unaided human; but it's not like the physicists at the LHC are analyzing with paper and pencil, right?
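For what it's worth, that minimal-pair test is easy to run; a sketch with a small public model (model choice and sentences are mine, purely for illustration):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def avg_logprob(sentence):
        ids = tok(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = lm(ids, labels=ids).loss   # mean cross-entropy per token
        return -loss.item()

    good = "What did you claim that she bought?"
    bad = "What did you meet the man who bought?"   # island violation
    print(avg_logprob(good), avg_logprob(bad))      # expect good > bad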
> There is no reason to think that a perfect theory in this domain would be of any particular help in generating plausible-looking text.
Imagine that claim in human form--I'm an expert in the structure of the Japanese language, but I'm unable to hold a basic conversation. Would you not feel some doubt? So why aren't you doubting the model here? Of course it would have been outlandish to expect that of a model five years ago, but it isn't today.
I see your statement that Chomsky isn't attempting to model the "many non-linguistic cognitive systems", but those don't seem to cause the LLM any trouble. The statistical modelers have solved problem after problem that was previously considered impossible, and the practical applications of that are (for better or mostly worse) reshaping major aspects of society. Meanwhile, every conversation I've had with a Chomsky supporter seems to reduce to "he is deliberately choosing not to produce any result evaluable by a person who hasn't spent years studying his theories". I guess that's true, but that mostly just makes me regret what time I've already spent.
- "What should or shouldn’t be a wh-island" is literally a statement of "what words might come after some other words"! An LLM encodes billions of such statements, just unfortunately in a quantity and form that makes them incomprehensible to an unaided human. That part is strictly worse; but the LLM's statements model language well enough to generate it, and that part is strictly better.
As I read Norvig's essay, it's about that tradeoff: whether a simple and comprehensible but inaccurate model shows more promise than one that's far more accurate but incomprehensible except in statistical terms with the aid of a computer. I understand there's a large group of people who think Norvig is wrong or incoherent; but when those people have no accomplishments except within the framework they themselves have constructed, what am I supposed to think?
Beyond that, if I have a model that tells me whether a sentence is valid, then I can always try different words until I find one that makes it valid. Any sufficiently good model is thus capable of generation. Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.
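A toy version of that argument (the bigram table here is a stand-in for whatever model judges validity):

    BIGRAMS = {("the", "cat"): 1.0, ("cat", "sat"): 1.0, ("sat", "on"): 1.0,
               ("on", "the"): 1.0, ("the", "mat"): 0.5}
    VOCAB = ["the", "cat", "sat", "on", "mat"]

    def score(words):
        # Product of bigram scores; unseen pairs get a tiny smoothing value.
        p = 1.0
        for a, b in zip(words, words[1:]):
            p *= BIGRAMS.get((a, b), 1e-6)
        return p

    def generate(prefix, length):
        # Any validity scorer becomes a (crude) generator: try every word,
        # keep whichever one the scorer likes best.
        words = list(prefix)
        for _ in range(length):
            words.append(max(VOCAB, key=lambda w: score(words + [w])))
        return words

    print(generate(["the"], 4))   # ['the', 'cat', 'sat', 'on', 'the']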
As to the relationship between signals from biological neurons and ANN activations, I mean something like the paper linked below, whose authors write:
> Thus, even though the goal of contemporary AI is to improve model performance and not necessarily to build models of brain processing, this endeavor appears to be rapidly converging on architectures that might capture key aspects of language processing in the human mind and brain.
https://www.biorxiv.org/content/10.1101/2020.06.26.174482v3....
I emphasize again that I believe these results have been oversold in the popular press, but the idea that an ANN trained on brain output (including written language) might provide insight into the physical, causal structure of the brain is pretty mainstream now.
- I could iterate with an LLM and Lean, and generate an unlimited amount of logic (or any other kind of math). This math would be correct, but it would almost surely be useless. For this reason, neither computer programs nor grad students are rewarded simply for generating logically correct math. They're instead expected to prove a theorem that other people have tried and failed to prove, or perhaps to make a conjecture with a form not obvious to others. The former is clearly an achievement, and the latter is a falsifiable prediction.
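To make "correct but useless" concrete, here's the kind of thing Lean will happily certify all day:

    -- Machine-checked and perfectly correct, but of no interest to anyone:
    theorem useless (p q : Prop) (hp : p) (hq : q) : q ∧ (p ∧ q) :=
      ⟨hq, ⟨hp, hq⟩⟩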
I feel like Norvig is coming from that standpoint of solving problems well-known to be difficult. This has the benefit that it's relatively easy to reach consensus on what's difficult--you can't claim something's easy if you can't do it, and you can't claim it's hard if someone else can. This makes it harder to waste your life on an internally consistent but useless sidetrack, as you might even agree (?) Chomsky has.
You, Chomsky, and Pearl seem to reject that worldview, instead believing the path to an important truth lies entirely within your and your collaborators' own minds. I believe that's consistent with the ancient philosophers. Such beliefs seem to me halfway to religious faith, accepting external feedback on logical consistency, but rejecting external evidence on the utility of the path. That doesn't make them necessarily bad--lots of people have done things I consider good in service of religions I don't believe in--but it makes them pretty hard to argue with.
- The generation of natural language is an aspect of human cognition, and I'm not aware of any better model for that than current statistical LLMs. The papers mapping between EEG/fMRI/etc. and LLM activations have been generally oversold so far, but it's an active area of research for good reason.
I'm not saying LLMs are a particularly good model, just that everything else is currently worse. This includes Chomsky's formal grammars, which fail to capture the ways humans actually use language per Norvig's many examples. Do you disagree? If so, what model is better and why?
- Unless and until neurologists find evidence of a universal grammar unit (or a biological Transformer, or whatever else) in the human connectome, I don't see how any of these models can be argued to be "causal" in the sense that they map closely to what's physically happening in the brain. That question seems so far beyond current human knowledge that any attempt at it now has about as much value as the ancient Greek philosophers' ideas on the subatomic structure of matter.
So in the meantime, Norvig et al. have built statistical models that can do stuff like predicting whether a given sequence of words is a valid English sentence. I can invent hundreds of novel sentences and run their model, checking each time whether their prediction agrees with my human judgement. If it doesn't, then their prediction has been falsified; but these models turned out to be quite accurate. That seems to me like clear evidence of some kind of progress.
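That check is mechanical enough to sketch (model, sentences, and threshold are all my own illustrative choices):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def avg_logprob(s):
        ids = tok(s, return_tensors="pt").input_ids
        with torch.no_grad():
            return -lm(ids, labels=ids).loss.item()

    THRESHOLD = -6.0   # "valid" cutoff, tuned by hand on other sentences
    examples = [       # (novel sentence, my human judgment)
        ("The cat sat quietly on the mat.", True),
        ("Mat the on quietly sat cat the.", False),
    ]
    for s, judgment in examples:
        prediction = avg_logprob(s) > THRESHOLD
        print(s, "->", "agrees" if prediction == judgment else "falsified")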
You seem unimpressed with that work. So what do you think is better, and what falsifiable predictions has it made? If it doesn't make falsifiable predictions, then what makes you think it has value?
I feel like there's a significant contingent of quasi-scientists that have somehow managed to excuse their work from any objective metric by which to evaluate it. I believe that both Chomsky and Judea Pearl are among them. I don't think every human endeavor needs to make falsifiable predictions; but without that feedback, it's much easier to become untethered from any useful concept of reality.
- I agree that his contributions to proto-computer-science were real and significant, though I think they're also overstated. Note the link to the Wikipedia page for BNF elsewhere in these comments. There's no evidence that Backus or Naur were aware of Chomsky's ideas vs. simply reinventing them, and Knuth argues that an ancient Indian Sanskrit grammarian deserves priority anyways.
I think Chomsky's political views were pretty terrible, especially before 1990. He spoke favorably of the Khmer Rouge. He dismissed "Murder of a Gentle Land", one of the first Western reports of their mass killing, as a "third rate propaganda tract". As the killing became impossible to completely deny, he downplayed its scale. Concern for human rights in distant lands tends to be a left-leaning concept in the West, but Chomsky's influence neutralized that here. This contributed significantly to the West's indifference, and the killing continued. (The Vietnamese communists ultimately stopped it.)
Anyone who thinks Chomsky had good political ideas should read the opinions of Westerners in Cambodia during that time. I'm not saying he didn't have other good ideas; but how many good ideas does it take to offset 1.5-2M deaths?
- Norvig's textbook surely appears on the bookshelves of researchers, including those building today's top LLMs. So it's odd to say that such an approach "may not even provide a good predictive model". As of today, it is unquestionably the best known predictive model for natural language, by a huge margin. I don't think that's for lack of trying, with billions of dollars or more at stake.
Whether that model provides "insight" (or a "cause"; I still don't know if that's supposed to mean something different) is a deeper question, and e.g. the topic of countless papers trying to make sense of LLM activations. I don't think the answer is obvious, but I found Norvig's discussion to be thoughtful. I'm surprised to see it viewed so negatively here, dismissed with no engagement with his specific arguments and examples.
- Shannon first proposed Markov processes to generate natural language in 1948. That's inadequate for the reasons discussed extensively in this essay, but it seems like a pretty significant hint that methods beyond simply counting n-grams in the corpus could output useful probabilities.
In any case, do you see evidence that Chomsky changed his view? The quote from 2011 ("some successes, but a lot of failures") is softer but still quite negative.
- I'm not sure what you mean? As the length of a sequence increases (from word to n-gram to sentence to paragraph to ...), the probability that it actually ever appeared (in any corpus, whether that's a training set on disk, or every word ever spoken by any human even if not recorded, or anything else) quickly goes to exactly zero. That makes it computationally useless.
If we instead score sequences with a smoothed model, the way perplexity is defined in the usual NLP sense, then the probability assigned to a sequence still approaches zero as its length increases, but it does so smoothly and never reaches exactly zero. That makes it useful for sequences of arbitrary length. This latter metric seems so obviously better that it seems ridiculous to me to reject all statistical approaches based on the former. That's with the benefit of hindsight for me; but enough of Chomsky's less famous contemporaries did judge correctly that I get that benefit, that LLMs exist, etc.
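A toy contrast between the two metrics (corpus and sentence invented purely for illustration):

    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    V = len(set(corpus))
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p_counted(sent):
        # Empirical corpus probability: hits exactly zero at the first
        # unseen bigram, which happens almost immediately for novel text.
        p = 1.0
        for a, b in zip(sent, sent[1:]):
            p *= bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0
        return p

    def p_smoothed(sent):
        # Add-one smoothing: tiny for novel text, but never exactly zero.
        p = 1.0
        for a, b in zip(sent, sent[1:]):
            p *= (bigrams[(a, b)] + 1) / (unigrams[a] + V)
        return p

    novel = "the mat sat on the cat .".split()
    print(p_counted(novel), p_smoothed(novel))   # 0.0 vs. a small positive number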
- If Chomsky were known only as a mathematician and computer scientist, then my view of him would be favorable for the reasons you note. His formal grammars are good models for languages that machines can easily use, and that many humans can use with modest effort (i.e., computer programming languages).
The problem is that they're weak models for the languages that humans prefer to use with each other (i.e., natural languages). He seems to have convinced enough academic linguists otherwise to doom most of that field to uselessness for his entire working life, while the useful approach moved to the CS department as NLP.
- As to politics, I don't think it's hard to find critics of the West's atrocities with less history of denying or excusing the West's enemies' atrocities. He's certainly not always wrong, but on net he's an unfortunate choice of figurehead.
- Here's Chomsky quoted in the article, from 1969:
> But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
He was impressively early to the concept, but I think even those skeptical of the ultimate value of LLMs must agree that his position has aged terribly. That seems to have been a fundamental theoretical failing rather than the computational limits of the time, if he couldn't imagine any framework in which a novel sentence had probability other than zero.
I guess that position hasn't aged worse than his judgment of the Khmer Rouge (or Hugo Chavez, or Epstein, or ...) though. There's a cult of personality around Chomsky that's in no way justified by any scientific, political, or other achievements that I can see.
- The acute pain paper they cited (linked in my other comment) said "low-quality evidence [...] for a small but significant reduction", which seems clear and correct to me. If these authors think that's too favorable, then the paper I linked above suggests "insufficient evidence to confirm or exclude an important difference".
Either of those distinguishes "strong evidence this doesn't work, and more studies are probably wasted effort" vs. "weak evidence, more studies required". I don't see any benefit to a single phrase covering both cases unless the goal is to deliberately mislead.
- In this context, "does not support" means "the evidence is of low quality", not "the evidence says it probably doesn't work". Per the quotations in my other comment here, the paper and its references conclude that the best available RCT evidence is favorable to cannabis for those conditions. They're just not impressed with the statistical power and methodological rigor of those studies.
It's unfortunately common to report that situation of favorable but low-quality evidence as "does not support", despite the confusion that invariably results. This confusion has been noted for literally decades, for example in
https://pmc.ncbi.nlm.nih.gov/articles/PMC351831/
I'm sad to see it repeated here, and I hope we can avoid propagating it further.
- That's true, but I believe the authors' complaint here is efficacy rather than safety. (I also think they're using terms of art from evidence-based medicine to make a statement the general public is likely to misinterpret, per my other comment here.)
Safety is barely discussed in this paper, probably because the available RCT evidence is favorable to cannabis. I'm not sure that means it's actually safe, since RCTs of tobacco cigarettes over the same study periods probably wouldn't show signal either. This again shows the downside of ignoring all scientific knowledge except RCT outcomes, just in the other direction.
- Acute pain isn't discussed in detail in this paper, but here's a paper they cited:
> Conclusions: There is low-quality evidence indicating that cannabinoids may be a safe alternative for a small but significant reduction in subjective pain score when treating acute pain, with intramuscular administration resulting in a greater reduction relative to oral.
https://dx.doi.org/10.1089/can.2019.0079
For insomnia, this paper itself says:
> meta-analysis of 39 RCTs, 38 of which evaluated oral cannabinoids and 1 administered inhaled cannabis, that included 5100 adult participants with chronic pain reported that cannabis and cannabinoid use, compared with placebo, resulted in a small improvement in sleep quality [...]
It goes on to criticize those studies, but we again see low-quality evidence in favor.
In the context of evidence-based medicine, "does not support" can mean the RCTs establish with reasonable confidence that the treatment doesn't work. It can also mean the RCTs show an effect in the good direction but with insufficient statistical power, so that an identical study with more participants would probably--but not certainly--reach our significance threshold. The failure to distinguish between those two quite different situations seems willful and unfortunate here.
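To make the power point concrete (effect size and thresholds are illustrative, not taken from this paper):

    from statsmodels.stats.power import TTestIndPower

    # Participants needed per group for a two-sample t-test to detect a
    # small effect (Cohen's d = 0.2) at alpha = 0.05 with 80% power.
    n = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
    print(round(n))   # ~394 per group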
- The biggest downside is all the dark patterns at Merrill trying to sell you advisory services. That seems to be only upon account opening, though.