I'm just a bit skeptical about this quote:
> Harper takes advantage of decades of natural language research to analyze exactly how your words come together.
But it's just a rather small collection of hard-coded rules:
https://docs.rs/harper-core/latest/harper_core/linting/trait...
Where did the decades of classical NLP go? No gold-standard resources like WordNet? No statistical methods?
There's nothing wrong with this, the solution is a good pragmatic choice. It's just interesting how our collective consciousness of expansive scientific fields can be so thoroughly purged when a new paradigm arises.
LLMs have completely overshadowed ML NLP methods from 10 years ago, which themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic grammar-based NLP work.
Progress is good, but it's important not to forget all those hard-earned lessons; it can sometimes be a real superpower to be able to leverage that old toolbox in modern contexts. In many ways, we had much more advanced methods in the 60s for solving this problem than what Harper is doing here by naively reinventing the wheel.
chilipepperhott
I'll admit it's something of a bold label, but there is truth in it.
Before our rule engine has a chance to touch the document, we run several pre-processing steps that imbue semantic meaning to the words it reads.
> LLMs have completely overshadowed ML NLP methods from 10 years ago, which themselves replaced decades of statistical NLP work, which in turn replaced another few decades of symbolic grammar-based NLP work.
This is a drastic oversimplification. I'll admit that transformer-based approaches are indeed quite prevalent, but I do not believe that "LLMs" in the conventional sense are "replacing" a significant fraction of NLP research.
I appreciate your skepticism and attention to detail.
1. https://jalammar.github.io/illustrated-word2vec/
2. https://jalammar.github.io/visualizing-neural-machine-transl...
3. https://jalammar.github.io/illustrated-transformer/
4. https://jalammar.github.io/illustrated-bert/
5. https://jalammar.github.io/illustrated-gpt2/
And from there it's mostly work on improving optimization (both at training and inference time), training techniques (many stages), data (quality and modality), and scale.
---
There are also state space models, but I don't believe they've gone mainstream yet.
https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
And diffusion models - but I'm struggling to find a good resource, so https://ml-gsai.github.io/LLaDA-demo/
---
All this being said, many tasks are solved very well using a linear model and TF-IDF. And those are actually interpretable.
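To make "a linear model and TF-IDF" concrete, here's a minimal std-only Rust sketch of the featurization half (the toy corpus and whitespace tokenization are invented for illustration). A real pipeline would learn per-term weights with something like logistic regression; those weights over named terms are exactly what makes the model interpretable:

    use std::collections::HashMap;

    // Toy TF-IDF featurizer: term frequency scaled by inverse document frequency.
    fn tf_idf(docs: &[&str]) -> Vec<HashMap<String, f64>> {
        let n = docs.len() as f64;
        let tokenized: Vec<Vec<String>> = docs
            .iter()
            .map(|d| d.split_whitespace().map(|w| w.to_lowercase()).collect())
            .collect();
        // Document frequency: in how many documents does each term appear?
        let mut df: HashMap<String, f64> = HashMap::new();
        for tokens in &tokenized {
            let mut seen: Vec<&String> = tokens.iter().collect();
            seen.sort();
            seen.dedup();
            for t in seen {
                *df.entry(t.clone()).or_insert(0.0) += 1.0;
            }
        }
        tokenized
            .iter()
            .map(|tokens| {
                let mut tf: HashMap<String, f64> = HashMap::new();
                for t in tokens {
                    *tf.entry(t.clone()).or_insert(0.0) += 1.0;
                }
                let len = tokens.len() as f64;
                tf.into_iter()
                    .map(|(t, c)| {
                        // Rare terms get boosted, ubiquitous ones damped.
                        let idf = (n / df[&t]).ln() + 1.0;
                        (t, (c / len) * idf)
                    })
                    .collect()
            })
            .collect()
    }

    fn main() {
        let docs = ["the cat sat", "the dog barked", "the cat purred"];
        for (doc, weights) in docs.iter().zip(tf_idf(&docs)) {
            println!("{doc:?} -> {weights:?}");
        }
    }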
oersted
This is indeed the previous generation, but it's not even that old. When I was coming out of undergrad, word2vec was the brand-new thing that was eating up the whole field.
Indeed, before that there was a lot of work on applying classical ML classifiers (Naive Bayes, Decision Trees, SVM, Logistic Regression...) and clustering algorithms (fancily referred to as unsupervised ML) to bag-of-words vectors. This was a big field, with some overlap with Information Retrieval, lending to fancier weightings and normalizations of bag-of-words vectors (TF-IDF, BM25). There was also the whole field of Topic Modeling.
Before that there was a ton of statistical NLP modeling (Markov chains and such), primarily focused around machine translation before neural-networks got good enough (like the early version of Google Translate).
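For flavor, the simplest building block of that statistical era, a bigram Markov model, fits in a few lines of Rust (tiny invented corpus; real systems added smoothing and far larger n-gram tables):

    use std::collections::HashMap;

    fn main() {
        let corpus = "the cat sat on the mat the cat ate";
        let tokens: Vec<&str> = corpus.split_whitespace().collect();

        // Count bigram transitions: counts[prev][next] += 1.
        let mut counts: HashMap<&str, HashMap<&str, u32>> = HashMap::new();
        for pair in tokens.windows(2) {
            *counts.entry(pair[0]).or_default().entry(pair[1]).or_insert(0) += 1;
        }

        // Maximum-likelihood estimate of P(next | "the").
        if let Some(next) = counts.get("the") {
            let total: u32 = next.values().sum();
            for (word, c) in next {
                println!("P({word} | the) = {:.2}", *c as f64 / total as f64);
            }
        }
    }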
And before that there were a few decades of research on grammars (starting with Chomsky), with a lot of overlap with compilers, theoretical CS (state-machines and such) and symbolic AI (lisps, logic programming, expert systems...).
I myself don't have a very clear picture of all of this. I learned some in undergrad and read a few ancient NLP books (60s - 90s) out of curiosity. I started around the time when NLP, and AI in general, had been rather stagnant for a decade or two. It was rather boring and niche, believe it or not, but it was starting to be revitalized by the new wave of ML and then word2vec with DNNs.
tolerance
I would much rather check my writing against grammatical rules that are hard coded in an open source program—meaning that I can change them—than ones that I imagine would be subject to prompt fiddling or, worse, implicitly hard coded in a tangle of training data that the LLM would draw from.
The Neovim configuration for the LSP looks neat: https://writewithharper.com/docs/integrations/neovim
The whole thing seems cool. Automattic should mention this on their homepage. Tools like this are the future of something.
triknomeister
You would lose out on evolution of language.
phoe-krk
Natural languages evolve so slowly that writing and editing rules for them is easily achievable even this way. Think years versus minutes.
fakedang
Aight you win fam, I was trippin fr. You're absolutely bussin, no cap. Harvard should be taking notes.
(^^ alien language that was developed in less than a decade)
notahacker
The existence of common slang which isn't used in the sort of formal writing that grammar linting tools are typically designed to promote is more of a weakness of learning grammar by a weighted model of the internet vs formal grammatical rules than a strength.
Not an insurmountable problem (ChatGPT will use "aight fam" only in context-sensitive ways and will remove it if you ask it to rephrase to sound more like a professor), but RLHFing slang into predictable use is likely a bigger potential challenge than simply ensuring that the word list of an open source program is sufficiently up to date to include slang whose etymology dates back to the noughties or nineties, if phrasing things in that particular vernacular is even a target for your grammar linting tool...
aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).
afeuerstein
I don't think anyone has the need to check such a message for grammar or spelling mistakes.
Even then, I would not rely on an LLM to accurately track this "evolution of language".
Yes, precisely. This "less than a decade" is orders of magnitude above the hours or days that it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accommodate things like dropping the "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.
Also, it takes at most a few developers to write those rules into a grammar checking system, compared to the millions and more who need to learn a given piece of "evolved" language as it becomes impossible to avoid. Not only is doing this manually fast enough, it is also much less work-intensive and more scalable.
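To put a number on "work-intensive": in a phrase-correction table like the harper-core excerpt quoted further down this thread, supporting a new idiom is roughly a five-line entry. This one is invented purely for illustration:

    // Hypothetical entry, mirroring the format of the real table:
    "NoCap" => (
        ["know cap"],
        ["no cap"],
        "Did you mean `no cap`?",
        "Typo: the slang `no cap` (meaning no lie) is spelled with `no`, not `know`."
    ),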
Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
phoe-krk
> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".
Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".
It takes time for a language feature to become widespread and de-facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass so that even people who do not like using that feature need to start respecting its presence. This inertia is the main source of the slowness I mention, and also a requirement for any kind of grammar-checking software. From the point of view of such software, a language feature that (almost) nobody understands is not a language feature, but an error.
> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?
Yes, that set of patterns is called a language grammar. Even dialects and slang have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.
I don't need grammar to evolve in real time. In fact, having a stabilizing function is probably preferable to the alternative.
eadmund
If a language changes, there are only three possible options: either it becomes more expressive; or it becomes less expressive; or it remains as expressive as before.
Certainly we would never want our language to be less expressive. There’s no point to that.
And what would be the point of changing for the sake of change? Sure, we blop use the word ‘blop’ instead of the word ‘could’ without losing or gaining anything, but we’d incur the cost of changing books and schooling for … no gain.
Ah, but it’d be great to increase expressiveness, right? The thing is, as far as I am aware all human languages are about equal in terms of expressiveness. Changes don’t really move the needle.
So, what would the point of evolution be? If technology impedes it … fine.
canjobear
The world that we need to be expressive about is changing.
dragonwriter
> So, what would the point of evolution be?
Being equally as expressive overall but being more focussed where current needs are.
OTOH, I don't think anything is going to stop language from evolving in that way.
Polarity
Why did you use ChatGPT for this text then?
acidburnNSA
I can write em-dashes on my keyboard in one second using the compose key: right alt + ---
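For anyone curious, on X11 that's the standard Compose sequence; a minimal sketch assuming right Alt is set as the Compose key (e.g. via `setxkbmap -option compose:ralt`):

    # ~/.XCompose
    include "%L"
    <Multi_key> <minus> <minus> <minus> : "—"   # em dash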
Freak_NL
Same here — the compose key is so convenient that you forget most people have never heard of it. This em-dashes mean LLM output thing is getting annoying though.
johnisgood
> This em-dashes mean LLM output thing is getting annoying though.
Agreed. Same with those non-ASCII single and double quotes.
shortformblog
LanguageTool (a Grammarly competitor) is also open source and can be managed locally:
https://github.com/languagetool-org/languagetool
I generally run it in a Docker container on my local machine:
https://hub.docker.com/r/erikvl87/languagetool
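(For reference, a typical invocation of that image looks something like this; the port below is the image's documented default, so double-check against the image docs:)

    docker run --rm -p 8010:8010 erikvl87/languagetool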
I haven't messed with Harper closely but I am aware of its existence. It's nice to have options, though.
It would sure be nice if the Harper website made clear that one of the two competitors it compares itself to can also be run locally.
akazantsev
There are two versions of LanguageTool: open source and cloud-based. The open source one checks individual words against a dictionary, just like the system's spell checker. Maybe there is something more to it, but in my tests, it did not fix even obvious errors. It's not an alternative to Grammarly or this tool.
shortformblog
There is. It can be heavily customized to your needs and built to leverage a large ngram data set:
https://dev.languagetool.org/finding-errors-using-n-gram-dat...
I would suggest diving into it more because it seems like you missed how customizable it is.
IMO not using LLMs is a big plus in my book. Grammarly has been going downhill since they've been larding it with "AI features"; it has become remarkably inconsistent. It will tell me to remove a comma one hour, and then tell me to add it back the next.
tiew9Vii
Being dyslexic, I was an avid Grammarly user. Once it started adding "AI features" the deterioration was noticeable, I cancelled my subscription and stopped using it a year ago.
I also only ever used the web app, so copy+pasting, as installing the app is for all intents and purposes installing a key logger.
Grammar works on rules, so I'm not sure why it needs an LLM. Grammarly certainly worked better for me when it was dumber and rule-based.
InsideOutSanta
Grammarly sometimes gets stuck in a loop, where it suggests changing from A to B. It then immediately suggests changing from B to A again, continuing to suggest the opposite change every time I accept the suggestion.
It's not a problem; I just decide which option I like better, but it is funny.
boplicity
General purpose LLMs seem to get very confused about punctuation, in my experience. It's one of their big areas of obvious failing. I'm surprised Grammarly would allow this to happen.
jethro_tell
The internet, especially post phone keyboards, is extremely inconsistent about punctuation. I'm not sure how anyone could think an LLM wouldn't be.
raincole
So is there a similar tool but based on an LLM?
Not that I think LLM is always better, but it would be interesting to compare these two approaches.
Given LISP was supposed to build "The AI" ... pretty sad that a dumb LLM is taking its place now.
7thaccount
Grammarly came out before the LLMs. I'm not sure what approach it took, but they're likely feeling a squeeze as LLMs can tell you how to rewrite a sentence to remove passive voice and all that. I doubt the LLMs are as consistent (some comments below show some big issues), but they're free (for now).
harvey9
'imo' and 'in my book' are redundant in the same sentence. Are there rules-based techniques to catch things like that? Btw I loved the use of 'larding' outside the context of food.
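There are. A crude rule-based sketch: keep groups of interchangeable hedges and flag any sentence containing more than one from the same group (the group below is invented, and a real rule would tokenize rather than substring-match):

    // Flag a sentence that contains two hedges from the same group.
    fn main() {
        let hedge_group = ["imo", "in my book", "i think", "personally"];
        let sentence = "IMO not using LLMs is a big plus in my book.";
        let lower = sentence.to_lowercase();
        // Naive substring matching; good enough to show the idea.
        let hits: Vec<&str> = hedge_group
            .iter()
            .copied()
            .filter(|h| lower.contains(*h))
            .collect();
        if hits.len() > 1 {
            println!("Redundant hedging in one sentence: {hits:?}");
        }
    }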
chneu
Thank you. In general, my Grammarly and Gboard predictions have become so, so bad over the last year.
Alex-Programs
DeepL Write was pretty good in the post-LLM, pre-ChatGPT era.
Dr4kn
DeepL is different in my opinion. They always focused on machine learning for languages.
They must have acquired fantastic data for their models, especially the business language and professional translations they focus on.
They keep your intended message intact and just refine it, like post-editing a book. Grammarly and other tools force you to sound like what they think is best.
DeepL shows, in my opinion, how much more useful a model trained for specific uses is.
monkeywork
Any suggestions for models people can run locally that are close to DeepL?
attendant3446
If you are talking about the current status of DeepL, that would be a low bar.
raverbashing
> It will tell me to remove a comma one hour, and then tell me to add it back the next.
So just like English teachers I see
aDyslecticCrow
Harper is decent.
I've relied on Grammarly to spellcheck all my writing for a few years (dyslexia prevents me from seeing the errors even when reading it 10 times). However, I find its increasing focus on LLMs and its insistence on rewriting sentences in more verbose ways bothers me a lot. (It removes personality and makes human-written text read like AI text.)
So I've tried out alternatives, and Harper is the closest I've found at the moment... but I still feel like Grammarly does a better job at the basic word suggestions.
Really, all I wish for is a spellcheck that can use the context of the sentence to suggest words. Most ordinary dictionary spellchecks can pick the wrong word because it's syntactically closer. They may replace "though" with "thought" because I wrote "thougt" when the sentence clearly indicates "though" is correct; and I see no difference visually between any of the three words.
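What's being wished for here is essentially a noisy-channel spellchecker: rank candidates by edit distance and by how well they fit the neighboring words. A toy Rust sketch (the bigram counts are invented; a real checker would use a proper language model):

    use std::collections::HashMap;

    // Classic dynamic-programming Levenshtein distance.
    fn edit_distance(a: &str, b: &str) -> usize {
        let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
        let mut prev: Vec<usize> = (0..=b.len()).collect();
        for (i, ca) in a.iter().enumerate() {
            let mut cur = vec![i + 1];
            for (j, cb) in b.iter().enumerate() {
                let cost = if ca == cb { 0 } else { 1 };
                cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
            }
            prev = cur;
        }
        prev[b.len()]
    }

    fn main() {
        // Correcting "thougt" in "even thougt it rained".
        let typo = "thougt";
        let candidates = ["though", "thought", "tough"];
        // Invented counts for the bigram ("even", w): context favors "though".
        let context: HashMap<&str, f64> =
            HashMap::from([("though", 120.0), ("thought", 15.0), ("tough", 40.0)]);
        let mut ranked: Vec<(f64, &str)> = candidates
            .iter()
            .map(|&c| {
                // Higher score = closer spelling plus better contextual fit.
                let score = context[c].ln() - edit_distance(typo, c) as f64;
                (score, c)
            })
            .collect();
        ranked.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap());
        println!("suggestions, best first: {ranked:?}");
    }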
Breza
What's wild is that OpenAI's earlier models were trained to guess the next word in a sentence. I wonder if GPT-2 would get "though" correct more often than the latest AI-assisted writing tools like Grammarly.
There are some areas where it seems like LLMs (or even SLMs) should be way more capable. For example, when I touch a word on my Kindle, I'd think Amazon would know how to pick the most relevant definition. Yet it just grabs the most common one. Consider the proper definition of "toilet" in this passage: "He passed ten hours out of the twenty-four in Saville Row, either in sleeping or making his toilet."
demarq
"Me and Jennifer went to have seen the ducks cousin."
No errors detected. So this needs a lot of rule contributions to get to Grammarly level.
alpb
Similarly 0 grammatical errors flagged: "My name John. What your name? What day today?"
Tsarp
I was initially impressed. But then I tested a bunch of sentences, and it wasn't catching some really basic things. Mostly hit or miss.
wellthisisgreat
What the duck is that test
canyp
Nominative vs objective
thfuran
There's a little more going on than that.
rdlw
In addition to case, it's testing tense (went to have seen) and plural vs. possessive (ducks cousin)
canyp
Yeah, I stopped parsing after "Me and Jennifer".
marginalia_nu
Goes the other way around too. For
> In large, this is _how_ anything crawler-adjacent tends to be
It suggests
> In large, this is how _to_ anything crawler-adjacent tends to be
https://imgur.com/a/RQZ2wXA
healsdata
Given this is an Automattic product, I'm hesitant to use it. If it gets remotely successful, Matt will ruin it in the name of profit.
josephcsible
It's FOSS, so even if the worst happens, anyone could just fork the last good version and continue development there.
jantissler
Oh, that’s a big no from me then.
icapybara
Why wouldn't you want an LLM for a language learning tool? Language is one of the things I would trust an LLM completely on. Have you ever seen ChatGPT make an English mistake?
healsdata
Grammarly is all in on AI and recently started recommending splitting "wasn't" and attaching the contraction to the word it modifies. Example: "truly wasn't" becomes "was trulyn't".
Hm ... I wonder, is Grammarly also responsible for the flood of contracted lexical "have" over the last few years? It's standard in British English, but outside of poetry it is proscribed in almost all other dialects (which only permit contraction of auxiliary "have").
Even in British I'm not sure how widely they actually use it - do they say "I've a car" and "I haven't a car"?
filterfish
"they" say "I haven't got a car".
Contractions are common in Australian English too, though becoming less so due to the influence of US English.
NoboruWataya
In my experience "I've a car" is much more common than "I haven't a car" (I've never heard the latter construct used, but regularly hear the former in casual speech). "I haven't got a car" or "I've no car" would be relatively common though.
akdev1l
This is what peak innovation looks like
Destiner
I don't think an LLM would recommend an edit like that.
Has to be a bug in their rule-based system?
healsdata
Gemini: "Was trulyn't" is a contraction that follows the rules of forming contractions, but it is not a widely used or accepted form in standard English. It is considered grammatically correct in a technical sense, but it's not common usage and can sound awkward or incorrect to native speakers.
marginalia_nu
I wonder how much memes like whomst'd might skew the training set.
InsideOutSanta
Yeah, I agree. An open-source LLM-based grammar checker with a user interface similar to Grammarly is probably what I'm looking for. It doesn't need to be perfect (none of the options are); it just needs to help me become a better writer by pointing out issues in my text. I can ignore the false positives, and as long as it helps improve my text, I don't mind if it doesn't catch every single issue.
Using an LLM would also help make it multilingual. Both Grammarly and Harper only support English and will likely never support more than a few dozen very popular languages. LLMs could help cover a much wider range of languages.
Szpadel
I tried to use one LLM-based tool to rewrite a sentence in a more official corporate form, and it rewrote something like "we are having issues with xyz" into "please provide more information and I'll do my best to help".
LLMs are trained so hard to be helpful that it's really hard to confine them to other tasks.
Groxx
uh. yes? it's far from uncommon, and sometimes it's ludicrously wrong. Grammarly has been getting quite a lot of meme-content lately showing stuff like that.
it is of course mostly very good at it, but it's very far from "trustworthy", and it tends to mirror mistakes you make.
perching_aix
Do you have any examples? The only time I noticed an LLM make a language mistake was when using a quantized model (gemma) with my native language (so much smaller training data pool).
Breza
Not GP, but I've definitely seen cutting edge LLMs make language mistakes. The most head scratching one I've seen in the past few weeks is when Gemini Pro decided to use <em> and </em> tags to emphasize something that was not code.
dartharva
Because this "language learning tool" will be dominantly used to avoid actually learning the language.
VTimofeenko
Comes with a great LSP server capable of checking grammar in code comments:
https://writewithharper.com/docs/integrations/language-serve...
Would be nice if they had a website where you could demo/test it before downloading extensions and stuff. Their firefox extension opens to this page https://writewithharper.com/install-browser-extension but when you paste in anything more than a few paragraphs the highlighting is all messed up.
ErrorNoBrain
Great to hear
I honestly don't trust Grammarly ... I mean, it's essentially a keylogger.
I did try it a bit once, and it never seemed to work that well for me. But I am multilingual, so maybe that's part of my hurdle.
SZJX
I was wondering about grammar checking tools in the era of LLMs, especially for grammar checks beyond English, and Sapling https://sapling.ai seemed decent. Nobody seems to have mentioned it here?
DavideNL
Interesting, curious to try this.
I wonder whether it will impact the performance (Firefox) and things will become noticeably slower...
Recently I noticed highlighting extensions in Firefox were slowing things down significantly, not just during loading but also while scrolling up and down web pages.
ibobev
I'm a long-time Grammarly user. I just tried Harper, and it simply performs very poorly. It is a good initiative, but I don't find the current state of the software worthwhile.
IceWreck
Slightly controversial compared to other comments here, but I haven't used Grammarly at all since LLMs came out. Even a 4B local LLM is good enough to rephrase all forms of text and fix most grammar mistakes.
gglanzani
I think a lot of the value comes from integrating with a language server and/or browser extensions.
Do you have a setup where this is possible or do you copy paste between text fields? (Genuine question. I’d love to use a local LLM integrating with an LSP).
loughnane
Surprised coming into this that I don't see anyone mentioning vale[0]. I've been using it for ~4 years now and love it.
I used Grammarly briefly when it came out and liked the idea. Admittedly, it has more polish than Vale for people writing in Google Docs, &c. Still, I stick with Vale. Is there any case for moving to Harper?
[0] https://vale.sh/
Looks interesting for linting and cleaning markdown documentation, but it doesn't seem like a very competent "spellcheck". I'll check it out... but it doesn't actually do the same thing as Grammarly or Harper.
WhyNotHugo
Vale requires a lot of tweaking, and I’ve never been able to get a rule set with which I’m happy.
It’s missing a default rule set with rules that are generally okay without being too opinionated.
I wish it had keyboard shortcuts. As a Vim user, I find it tedious in Chrome to click on every suggestion given by the app. Also, maybe add a "delay" so it doesn't think the currently-being-typed word is a mistake (let me finish typing first!).
Otherwise, it's great work. There should be an option to import/export the correction rules though.
cAtte_
this solution is just fundamentally insufficient. in the age of LLMs it's pretty insane to imagine programmers manually hard-coding an arbitrary subset of grammatical corrections (sure: it's faster, it's local first, but it's not enough). on top of that, English (like any other natural language) is such a complicated beast that you will never write a classic deterministic parser that's sophisticated enough to allow you to reliably implement even the most basic of grammatical corrections (check the other comments for examples). it's just not gonna happen.
i guess it's a nice and lightweight enhancement on top of the good old spellchecker, though
novoreorx
Seeing Harper as an implementation of natural language's LSP brings me great joy, as it proves an idea I've had for a long time—natural language and programming languages are interconnected. Many concepts and techniques from programming languages can also be applied to natural language, making our lives more convenient. The development of LLMs and vibe coding has further blurred the boundary between natural language and programming languages, offering similar insights.
jacooper
I think that if you can self-host LanguageTool, it would still be the better option.
dartharva
I never understood the appeal of grammar tools. If you have reached the minimum professional/academic level needed to be designated to write something, shouldn't you at least be capable of verifying its semantic "correctness" just by reading through it once yourself?
Why would you pass a writing job to someone who isn't 100% fluent in the language and then make up for it by buying complex tools?
facundo_olano
As a non native English speaker/writer there are a bunch of errors I miss, no matter how much attention I pay and how much I proofread, and these tools are useful to catch those.
jordanpg
I'm a lawyer. I write 10s of pages of text every day. "Reading through it once yourself" is obviously an imperfect solution. See, e.g., Poisson statistics. It's also slow and I bill in 6-minute increments. There is significant value in a grammar tool that protects confidentiality and is more effective than my wetware.
Veen
People are bad at proofreading their own work. Professional writers often use third-party copy editors and proofreaders for that reason.
victorbjorklund
I know, for example, that David Sparks (MacSparky https://www.macsparky.com ) uses it (or at least used it). He was an American lawyer, and he says writing has been his passion his whole life, so I assume his English is better than the average person's.
Semaphor
I use it (well, LanguageTool) in the free version for comments on sites like this. It directly catches mistakes that I'd normally only catch on re-reads: from typos, to my brain doing weird stuff, to things I simply didn't (actively) know.
speedgoose
Have you considered that some people aren’t 100% fluent in English but still competent?
Finnucane
I’m a production editor at a uni press, and I can tell you there’s not a strong correlation between professional/academic level and writing well.
JPLeRouzic
It is available in Automattic's GitHub repository:
https://github.com/Automattic/harper
How big is English in "English grammar checker"? Is it plausible to add other languages to it, or is the underlying framework so English-specific that it doesn't make sense to even bother building anything other than an English grammar checker upon it?
0xjunhao
In a world of LLMs, it's great to see classic NLP work like Harper. Both definitely have their own use cases.
klabetron
Odd choice that the example text on the homepage is almost all obvious typos that a standard spell check would pick up.
b0a04gl
this is the right direction. rule-based, local, transparent. not perfect yet, but that's not the point. getting something lightweight and tweakable matters more than catching every edge case out of the box. if it misses, you add rules. simple as that. if you expect it to match grammarly day one, then maybe we're missing the tradeoff
jimaek
I don't understand why we even need such services. Why don't browsers, and maybe even the OS, just improve their included grammar checkers?
The Chrome enhanced grammar checker is still awful after decades.
Maybe the AI hype will finally fix this? I'm still surprised this wasn't the first thing they did.
paxys
Looks cool, but it's weird to constantly make comparisons to Grammarly (in the post title, description section of the site, benchmarks) when this is clearly a rule-based spellcheck and very different from what Grammarly offers.
Instead tell me how it compares to the built-in spellcheck in my browser/IDE/word processor/OS.
msravi
Looks very good. I was looking to replace ltex (which is really slow), but for some reason the nvim-lspconfig filetype setting for harper doesn't seem to have (la)tex listed as a default, although markdown and typst are listed. Does anyone know why?
chilipepperhott
Harper maintainer here
We've had some contributors have a go at adding LaTeX support in the past, but they've yet to succeed with a truly polished option. The irregularity of LaTeX makes it somewhat difficult to parse.
We accept contributions, if anyone is interested in getting us across the finish line.
lurk2
Who is the target market for Grammarly? Working professionals who speak English as a second language?
victorbjorklund
I think it is anyone who wants to make sure they write correctly. I know, for example, that David Sparks (MacSparky https://www.macsparky.com ) uses it (or at least used it). He was an American lawyer, and he says writing has been his passion his whole life, so I assume his English is better than the average person's.
InsideOutSanta
Adam Engst from TidBITS, a person whose job has been writing for all his life, also uses Grammarly:
https://tidbits.com/2025/01/30/why-grammarly-beats-apples-wr...
I use it as a proofreader, not to improve my writing. It’s difficult to proofread your own work, and Grammarly is a useful assistant. Plus, I’m British and I often write on behalf of American clients. I’m pretty good at following US English standards because I’ve been doing it for a long time, but the odd Britishism slips through and Grammarly usually catches it (although a standard spell checker would too, I suppose).
InsideOutSanta
“Think of how poorly the average person writes, and realize half of them write worse than that.”
(George Carlin or something, quote's veracity depends on what you mean by “average.”)
I think everybody could benefit from having something like Grammarly on their computer. None of us writes perfectly, and it's always beneficial to strive for improvement.
m00dy
People who haven't heard of LLMs
akazantsev
LLMs are not nice to use for spell checking. I do not want to read a wall of text from an LLM just to find a missed article somewhere, and I want to receive feedback as I type.
Also, I once asked an LLM to check a message. It said everything looked fine and made a copy of the message in its response with one sentence in the middle removed.
SilverSlash
I haven't used Grammarly but for simple things like spelling mistakes, missed articles, or punctuation, wouldn't even Google Docs be enough?
heldrida
I've been a paying Grammarly customer for at least 8 years. Nice to have an alternative :)
AbstractH24
My biggest problem with Grammarly has always been how buggy the product is. From not checking random sites to messing up formatting to not updating text with the selected changes.
If Harper does better at this I’d change in a minute.
victorbjorklund
Very cool. Has anyone integrated this into their own app? How was your experience?
The-Ludwig
Looks awesome! I’ll give it a try over LanguageTool.
Is there any reason why there is no Firefox extension?
https://addons.mozilla.org/en-US/firefox/addon/private-gramm...
Unfortunately, the last time I tested Harper inside Neovim, it alone used more than 1 GB of RAM for just the LSP! However, the concept is nice, open source, no AI, and easy to integrate.
pragmatick
"For most documents, Harper can serve up suggestions in under 10ms." 10l is OK. 10kg as well. Why is 10ms wrong?
orliesaurus
Very buggy, but great start!!
E.g. if you write a "MISTAEK" and then scroll, the highlight follows you around the page.
crimputer
Good start. But it still has bugs, I guess.
I tried with the following phrase: "This should can't logic be done me."
No errors.
cchance
Any chance to get it working in Word? My wife would most likely love to use it.
yablak
Any chance to make the obsidian plugin work in mobile/Android?
mpaepper
Are languages other than English also supported? Or is this for English only?
ssernikk
From their FAQ:
> We currently only support English and its dialects British, American, Canadian, and Australian. Other languages are on the horizon, but we want our English support to be truly amazing before we diversify.
Finnucane
No serial comma? Screw that.
v5v3
I used to see ads for Grammarly and wondered if anyone was using it.
Then, post-COVID, with the increase in screen-sharing video calls, I soon realised nearly every non-native English speaker from countries around the world relied heavily on it in their jobs, as I could see it installed when people shared their screens.
Huge market, good luck.
EugeneOZ
Great! Please create an iOS keyboard with Harper
harper
nice name!
sharkjacobs
This seems to use a hard coded list of explicit rules, not an LLM:
https://writewithharper.com/docs/rules
https://github.com/Automattic/harper/blob/0c04291bfec25d0e93...
"PointIsMoot" => (
["your point is mute"],
["your point is moot"],
"Did you mean `your point is moot`?",
"Typo: `moot` (meaning debatable) is correct rather than `mute`."
),
a2128
From a quick look, phrase corrections are just one type of rule. There are many other rules, some dynamic, like when to use "your" vs "you're", Oxford commas, etc.
That it doesn't use LLMs is its advantage: it runs in under 10ms, can be easily embedded in software, and still provides useful grammar checking even if it's not exhaustive.