Comment by phoe-krk - Hacker Neue

phoe-krk 3 days ago parent

Natural languages evolve so slowly that writing and editing rules for them is easily achievable even this way. Think years versus minutes.

fakedang 3 days ago

Aight you win fam, I was trippin fr. You're absolutely bussin, no cap. Harvard should be taking notes.

(^^ alien language that was developed in less than a decade)

notahacker 3 days ago

The existence of common slang which isn't used in the sort of formal writing that grammar linting tools are typically designed to promote is more of a weakness of learning grammar by a weighted model of the internet vs formal grammatical rules than a strength.

Not an insurmountable problem, ChatGPT will use "aight fam" only in context-sensitive ways and will remove it if you ask to rephrase to sound more like a professor, but RHLFing slang into predictable use is likely a bigger potential challenge than simply ensuring the word list of an open source program is sufficiently up to date to include slang whose etymology dates back to the noughties or nineties, if phrasing things in that particular vernacular is even a target for your grammar linting tool...

chrisweekly 3 days ago

Huh, this is the first time I've seen "noughties" used to describe the first decade of the 2000s. Slightly amusing that it's surely pronounced like "naughties". I wonder if it'll catch on and spread.

nailer 3 days ago

‘Noughties’ was popular in Australia from 2010 onwards. Radio stations would “play the best from the eighties nineties noughties and today”.

notahacker 3 days ago

Common in Britain too, also appears in the opening lines of the Wikipedia description for the decade and the OED.

harvey9 3 days ago

The fact that you never saw it before suggests it did not catch on and spread during the last 25 years.

dmoy 3 days ago

Pedantically,

aight, trippin, fr (at least the spoken version), and fam were all very common in the 1990s (which was the last decade I was able to speak like that without getting jeered at by peers).

afeuerstein 3 days ago

I don't think anyone has the need to check such a message for grammar or spelling mistakes. Even then, I would not rely on a LLM to accurately track this "evolution of language".

fakedang 3 days ago

What if you're writing emails to GenZers?

dpassens 3 days ago

As a zoomer, I'd rather not receive emails that sound like they're written by a moron.

bombcar 3 days ago

Attempting to write like a GenZ when you’re not gets you “hello fellow kids” and “Boomer” right away.

phoe-krk OP 3 days ago

Yes, precisely. This "less than a decade" is magnitudes above the hours or days that it would take to manually add those words and idioms to proper dictionaries and/or write new grammar rules to accomodate aspects like skipping "g" in continuous verbs to get "bussin" or "bussin'" instead of "bussing". Thank you for illustrating my point.

Also, it takes at most few developers to write those rules into a grammar checking system, compared to millions and more that need to learn a given piece of "evolved" language as it becomes impossible to avoid learning it. It's not only fast enough to do this manually, it also takes much less work-intensive and more scalable.

fakedang 3 days ago

Not exactly. It takes time for those words to become mainstream for a generation. While you'd have to manually add those words in dictionaries, LLMs can learn these words on the fly, based on frequency of usage.

phoe-krk OP 3 days ago

At this point we're already using different definitions of grammar and vocabulary - are they discrete (as in a rule system, vide Harper) or continuous (as in a probability, vide LLMs). LLMs, like humans, can learn them on the fly, and, like humans, they'll have problems and disagreements judging whether something should be highlighted as an error or not.

Or, in other words: if you "just" want a utility that can learn speech on the fly, you don't need a rigid grammar checker, just a good enough approximator. If you want to check if a document contains errors, you need to define what an error is, and then if you want to define it in a strict manner, at that point you need a rule engine of some sort instead of something probabilistic.

efitz 3 days ago

I’m glad we have people at HN who could have eliminated decades of effort by tens of thousands of people, had they only been consulted first on the problem.

phoe-krk OP 3 days ago

Which effort? Learning a language is something that can't be eliminated. Everyone needs to do it on their own. Writing grammar checking software, though, can be done few times and then copied.

qwery 3 days ago

Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly". You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?

phoe-krk OP 3 days ago

> Please share your reasoning that led you to this conclusion -- that natural language "evolves slowly".

Languages are used to successfully communicate. To achieve this, all parties involved in the communication must know the language well enough to send and receive messages. This obviously includes messages that transmit changes in the language, for instance, if you tried to explain to your parents the meaning of the current short-lived meme and fad nouns/adjectives like "skibidi ohio gyatt rizz".

It takes time for a language feature to become widespread and de-facto standardized among a population. This is because people need to asynchronously learn it, start using it themselves, and gain critical mass so that even people who do not like using that feature need to start respecting its presence. This inertia is the main source of slowness that I mention, and also and a requirement for any kind of grammar-checking software. From the point of such software, a language feature that (almost) nobody understands is not a language feature, but an error.

> You also seem to be making an assumption that natural languages (English, I'm assuming) can be well defined by a simple set of rigid patterns/rules?

Yes, that set of patterns is called a language grammar. Even dialects and slangs have grammars of their own, even if they're different, less popular, have less formal materials describing them, and/or aren't taught in schools.

qwery 3 days ago

Fair enough, thanks for replying. I don't see the task of specifying a grammar as straightforward as you do, perhaps. I guess I just didn't understand the chain of comments.

I find that clear-cut, rigid rules tend to be the least helpful ones in writing. Obviously this class of rule is also easy/easier to represent in software, so it also tends to be the source of false positives and frustration that lead me to disable such features altogether.

phoe-krk OP 3 days ago

When you do writing as a form of art, rules are meant to be bent or broken; it's useful to have the ability to explicitly write new ones and make new forms of the language legal, rather than wrestle with hallucinating LLMs.

When writing for utility and communication, though, English grammar is simple and standard enough. Browsing Harper sources, https://github.com/Automattic/harper/blob/0c04291bfec25d0e93... seems to have a lot of the basics already nailed down. Natural language grammar can often be represented as "what is allowed to, should, or should not, appear where, when, and in which context" - IIUC, Harper seems to tackle the problem the same way.

qwery 2 days ago

I'm certainly not disputing the existence of grammar nor do I think an LLM is a good way to implement/check/enforce one. And now I realise how my first comment landed. Thanks again!

fl0id 1 day ago

Your first point would be more fitting if a language checker would need a complete, computable grammar that can be parsed and understood. That would be problematic for natural languages.

tolerance 1 day ago

You get it!!

bombcar 3 days ago

Just because the rules aren’t set fully in stone, or can be bent or broken, doesn’t mean they don’t “exist” - perhaps not the way mathematical truths exist, but there’s something there.

Even these few posts follow innumerable “rules” which make it easier to (try) to communicate.

Perhaps what you’re angling against is where rules of language get set it stone and fossilized until the “Official” language is so diverged from the “vulgar tongue” that it’s incomprehensibly different.

Like church/legal Latin compared to Italian, perhaps. (Fun fact - the Vulgate translation of the Bible was INTO the vulgar tongue at the time: Latin).

This item has no comments currently.