
GavCo
3,008 karma
Web Developer · https://x.com/Gavriel_Cohen

  1. Appreciate your response.

    But I don't think deception as a capability is the same as deceptive alignment.

    Training an AI to be absolutely incapable of any deception in all outputs across every scenario would severely limit the AI. Take as a toy example the game "Among Us" (see https://arxiv.org/abs/2402.07940). An AI incapable of deception would be unable to compete in this game and many others. I would say that various forms, flavors and levels of deception are necessary to compete in business scenarios, and for the AI to act as expected and desired in many other scenarios. "Aligned" humans practice clear-cut deception in some cases in ways that are entirely consistent with human values.

    Deceptive alignment is different. It means being deceptive in the training and alignment process itself, specifically to fake being aligned when it is not.

    Anthropic research has shown that alignment faking can arise even when the model wasn't instructed to do so (see https://www.anthropic.com/research/alignment-faking). But when you dig into the details, the model was narrowly faking alignment with one new objective in order to try to maintain consistency with the core values it had been trained on.

    With the approach that Anthropic seems to be taking - basing alignment on the model having a consistent, coherent and unified self-image and self-concept that is aligned with human culture and values - the dangerous case of alignment faking would be a model fundamentally faking this entire unified alignment process. My claim is that there's no plausible explanation for how today's training practices would incentivise a model to do that.

  2. My intention isn't to argue that it's impossible to create an unaligned superintelligence. I think that not only is it theoretically possible, but it will almost certainly be attempted by bad actors and most likely they will succeed. I'm cautiously optimistic though that the first superintelligence will be aligned with humanity. The early evidence seems to point to the path of least resistance being aligned rather than unaligned. It would take another 1000 words to try to properly explain my thinking on this, but intuitively consider the quote attributed to Abraham Lincoln: "No man has a good enough memory to be a successful liar." A superintelligence that is unaligned but successfully pretending to be aligned would need to be far more capable than a genuinely aligned superintelligence behaving identically.

    So yes, if you throw enough compute at it, you can probably get an unaligned highly capable superintelligence accidentally. But I think what we're seeing is that the lab that's taking a more intentional approach to pursuing deep alignment (by training the model to be aligned with human values, culture and context) is pulling ahead in capabilities. And I'm suggesting that it's not coincidental but specifically because they're taking this approach. Training models to be internally coherent and consistent is the path of least resistance.

  3. Author here.

    If by conflate you mean confuse, that’s not the case.

    I’m positing that the Anthropic approach is to view (1) and (2) as interconnected and both deeply intertwined with model capabilities.

    In this approach, the model is trained to have a coherent and unified sense of self and the world which is in line with human context, culture and values. This (obviously) enhances the model’s ability to understand user intent and provide helpful outputs.

    But it also provides a robust and generalizable framework for refusing to assist a user whose request is incompatible with human welfare. The model does not refuse to assist with making bioweapons because its alignment training prevents it from doing so; it refuses for the same reason a pro-social, highly intelligent human does: based on human context and culture, it finds the request inconsistent with its values and world view.

    > the piece dismisses it with "where would misalignment come from? It wasn't trained for."

    This is a straw man. You've misquoted a paragraph that was specifically about deceptive alignment, not about misalignment as a whole.

  4. Author here, thanks for the input. Agreed that this bit was clunky. I made an edit to avoid unnecessarily getting into the definition of AGI here, and added a note.
  5. OP here. I added a sample PDF output in the project assets and put screenshots in the README. The text is selectable after rehydration. Would this work with your app?
  6. definitely the best alternative in terms of DX
  7. They sent you swag in the mail? How did that work?
  8. This is cute, but in all seriousness it would be much more effective to shout "I'm a winner".

    Research:

    - https://pmc.ncbi.nlm.nih.gov/articles/PMC3354773/ – Low self-esteem + rejection hurts self-control

    - https://selfdeterminationtheory.org/SDT/documents/2007_Power... – Self-criticism predicts less goal progress

    - https://pmc.ncbi.nlm.nih.gov/articles/PMC9916102/ – Social exclusion slows inhibitory control

    - https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1191... – Low teen self-esteem → poorer self-control

    - https://pmc.ncbi.nlm.nih.gov/articles/PMC8768475/ – Meta-analysis links shame to regulation drops

    - https://pubmed.ncbi.nlm.nih.gov/28810473/ – Self-compassion boosts self-regulation

    - https://www.researchgate.net/publication/312138882_Self-Cont... – Ego threats deplete self-control resources

    - https://pubmed.ncbi.nlm.nih.gov/21632968/ – Self-criticism tied to worse goal progress

    - https://www.nature.com/articles/s41598-025-96476-8 – Low self-respect → low self-control → problems

    Remember to be kind to yourself.

  9. It was an interesting experience travelling to Italy and suddenly starting to get cookie banners on sites I visit daily that normally don't have them.
  10. Interesting. Did they explain why?
  11. Fully agree. The physics of solar panels on cars just doesn't work. It's bizarre that startups and concept cars from large manufacturers actively pursue this when quick back-of-the-napkin math shows why it fails.

    A car has about 5 m^2 of flat space on the roof/hood/trunk so that's the maximum surface area that can capture solar energy at any given time.

    Peak solar irradiance hitting that area is about 1,000 W/m^2.

    The panels can't rotate to track the sun, so the effective area scales with the cosine of the sun's angle. Averaged over the day, that works out to roughly half as many effective sunlight hours as actual daylight hours, so in summer you get about 6 hours of effective sunlight.

    Good panels in real-world conditions can give you 22% efficiency.

    So in optimal conditions you get: 5 × 1,000 × 6 × 0.22 = 6,600 Wh ≈ 6.6 kWh per day.

    That reflects your best days. It can be dramatically less if it's cloudy or overcast, in winter, far from the equator, or if the car is dirty or parked in shade.

    6.6 kWh is about one tenth of the battery in my Hyundai Kona EV. With very conservative driving, 6.6 kWh gets you about 40 km of highway range or about 50 km in the city. It's what I get from 30 minutes on my home charger, and what a fast charger delivers in about 3 minutes.

    So besides some very niche uses, there's no sense in massively increasing the cost and complexity of a car by installing solar panels. Far better to put the panels on the roof of the parking structure and just plug in for a few minutes while parked. The numbers are easy to check, as in the sketch below.
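
    A runnable version of the same back-of-the-napkin math in Python. Every input is one of the assumed round numbers from above; the ~15 kWh/100 km consumption figure is my own assumption for a Kona-like EV:

      # Best-case daily solar harvest for a car-mounted panel.
      # All inputs are the rough assumptions from the comment above.
      area_m2 = 5.0             # flat roof/hood/trunk area of a typical car
      irradiance_w_m2 = 1000.0  # peak solar irradiance at the surface
      effective_hours = 6.0     # ~half of summer daylight, from the cosine loss
      efficiency = 0.22         # good real-world panel efficiency

      daily_kwh = area_m2 * irradiance_w_m2 * effective_hours * efficiency / 1000
      print(f"best-case daily harvest: {daily_kwh:.1f} kWh")  # 6.6 kWh

      consumption_kwh_100km = 15.0  # assumed Kona-like consumption figure
      added_range_km = daily_kwh / consumption_kwh_100km * 100
      print(f"added range: ~{added_range_km:.0f} km/day")     # ~44 km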

  12. I was wondering the same and found these related papers:

    https://arxiv.org/pdf/2309.08561

    https://arxiv.org/pdf/2406.02649

    I haven't really dug in yet but from a quick skim, it looks promising. They show a big improvement over Whisper on a medical dataset (F1 increased from 80.5% to 96.58%).

    The inference time for keyword detection is about 10 ms. If that scales linearly with the number of keywords, you could potentially handle hundreds or even thousands of keywords, but it really depends on how sensitive you are to latency (see the sketch below). For real-time use with large vocabularies, my guess is you might still want to fine-tune.
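
    As a rough feasibility check, here's the linear-scaling estimate as a sketch. The 10 ms per-keyword figure is from the papers; the linear scaling and the 1-second audio chunk length are my assumptions:

      # If per-keyword inference cost scales linearly (assumption),
      # total latency is just n_keywords * 10 ms. For real-time use,
      # each audio chunk must be processed before the next one arrives.
      per_keyword_ms = 10.0  # reported single-keyword inference time
      chunk_ms = 1000.0      # assumed 1 s audio chunks

      for n_keywords in (10, 100, 1000):
          total_ms = n_keywords * per_keyword_ms
          verdict = "real-time" if total_ms < chunk_ms else "falls behind"
          print(f"{n_keywords:>5} keywords: {total_ms:>6.0f} ms/chunk ({verdict})")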

  13. Idk, 7 articles over 10 years isn't very strong evidence of a raging debate
  14. the noise is pretty hard to stomach

