- The baked-in assumptions observation is basically the opposite of the impression I get after watching Gemini 3's CoT. With maximum reasoning effort it's able to break out of a wrong route by rethinking its strategy. For example, I gave it an onion address without the .onion part and told it to figure out what the string means. All reasoning models, including Gemini 2.5 and 3, assume it's a puzzle or a cipher (because they're trained on those) and start endlessly applying different algorithms to no avail. Gemini 3 Pro is the only model that can break the initial assumption after running out of ideas ("Wait, the user said it's just a string, what if it's NOT obfuscated") and correctly identify the string as an onion address. My guess is they trained it on simulations to enforce the anti-jailbreaking commands injected by Model Armor, as its CoT is incredibly paranoid at times. I could be wrong, of course.
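For the curious, the check the model eventually had to make is easy to state in code. A minimal sketch, assuming the standard Tor v3 format (the layout and checksum constant are from the rend-spec; the function name is mine):

```python
import base64
import hashlib

def looks_like_v3_onion(s: str) -> bool:
    """Check whether a bare string (no .onion suffix) is a valid Tor v3
    onion address: 56 base32 chars decoding to
    pubkey(32 bytes) | checksum(2 bytes) | version(1 byte, 0x03)."""
    if len(s) != 56:
        return False
    try:
        raw = base64.b32decode(s.upper())  # b32decode expects uppercase
    except Exception:
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != b"\x03":
        return False
    # checksum = SHA3-256(".onion checksum" || pubkey || version)[:2]
    expected = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    return checksum == expected
```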
- It's pretty slow to converge though, as it needs enough data points for them to cross some certainty threshold. Especially in the context of VPN exit points, since the traffic comes from all over the world.
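Roughly the shape of the problem, as a toy sequential test; every probability and threshold below is a made-up placeholder, not anyone's real numbers:

```python
import math

def sequential_llr(observations, p_match=0.9, p_random=0.3, threshold=5.0):
    """Toy SPRT-style accumulator: add log-likelihood ratios per
    observation until the evidence crosses a decision threshold.
    p_match/p_random are hypothetical hit rates under 'same user'
    vs 'background traffic'; all numbers are illustrative."""
    llr = 0.0
    for i, hit in enumerate(observations, 1):
        p1 = p_match if hit else 1 - p_match
        p0 = p_random if hit else 1 - p_random
        llr += math.log(p1 / p0)
        if llr >= threshold:
            return i, llr  # enough data points to decide
    return None, llr  # still below the certainty threshold
```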
- I'm surprised that sampling bias is not in the list. Is it possible that these fossils simply haven't been found yet?
- As someone who always avoids four-wheeled cages if possible, I can say highways are more or less safe if you have a fast enough motorcycle, compared to slow lane filtering with big vehicles on the road. Tiny spaces, poor visibility, and things that can crush you at the slightest mistake are a dangerous combination.
If you look at the "motorcycling countries" (SEA, India/Pakistan, Africa, Latin America, etc.), most terrible accidents happen because trucks share the road with an army of scooters. I've ridden through a few of those countries on a motorcycle, and it's a nightmare. For the same reason, big cities in China introduced dedicated scooter lanes separated by concrete barriers.
- Convincing AND useful procedural terrain is usually hard-simulated along some manually placed guides, which is typically faster and more versatile than a diffusion model. I don't see any model being used in practice for this, at least not until it has good ControlNets trained specifically for the task. However, something like this can be useful for texture generation, especially with geometry/camera position/lighting as additional inputs.
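To make the "guides" idea concrete, here's a toy sketch of the non-ML approach: smoothed noise plus a ridge raised along a hand-placed polyline. All names and constants are illustrative, not from any particular engine:

```python
import numpy as np

def guided_heightmap(size=256, guide_points=((40, 40), (128, 200), (220, 90)),
                     ridge_height=1.0, falloff=30.0, seed=0):
    """Blend a noise base with a ridge following manually placed guide points."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((size, size))
    # cheap box-blur passes to make raw noise usable as terrain
    for _ in range(8):
        base = (base + np.roll(base, 1, 0) + np.roll(base, -1, 0)
                + np.roll(base, 1, 1) + np.roll(base, -1, 1)) / 5.0
    ys, xs = np.mgrid[0:size, 0:size]
    ridge = np.zeros((size, size))
    pts = list(guide_points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for t in np.linspace(0, 1, 200):  # sample along the guide segment
            cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            d2 = (xs - cx) ** 2 + (ys - cy) ** 2
            ridge = np.maximum(ridge, ridge_height * np.exp(-d2 / falloff**2))
    return 0.2 * base + ridge  # noise detail plus the guided ridge
```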
- In other words, if you ever need to install anything on your device, you do need to worry. What could even be trusted, a random app from the Play Store?
- I'm living in a reality very different from yours; I don't think you can understand. In my country, actual journalism and speaking about certain things are crimes that will put you in jail for the rest of your life, get you tortured, and likely murdered. Access to knowledge about certain things is blocked. To be able to do journalism or to circumvent the censorship, one essentially has to commit crimes in another (supposedly free) country as well, because there it's considered sanctions evasion and/or illegal money laundering.
So yeah, of course you can frame it your way, and that would be valid. That was the original ideal of cryptocurrencies - to have a financial system not controlled by governments. Of course it can be used for fraud and other things we probably both consider bad, by design. Just like gold.
- >criminals, sanction evaders
The definition of both may vary depending on where you are. Not being controlled by governments is the original purpose of cryptocurrencies.
- >Your comment seems to imply "these views aren't valid" without any evidence for that claim.
No, your comment seems to be a deflection. You made an extraordinary claim, that DS stole some IP, and were asked for extraordinary evidence, or at least some evidence. You need to provide it if you want to be taken seriously.
>Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
Where's the evidence for that? I also have a claim that I can't back up with anything more than XLab's report: before the release of R1, there were multiple attempts to hack DS's systems, which nobody noticed. [1]
You really seem to have no idea what you're talking about. R1 was an experiment in teaching the model to reason on its own, precisely to avoid needing large amounts of data in post-training. It also partially failed; they published the pure-RL snapshot, with its readability problems, as R1-Zero. And it's pretty different from any OpenAI or Anthropic model.
>DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
DeepSeek published a lot more about their models than any top-tier US lab before them, including their production code, and they've continued doing so. All their findings in R1 are highly plausible, and most have been replicated to some degree and adopted in research and industry. Moonshot AI trained their K2 on DeepSeek's architecture with minor tweaks (not to diminish their own novel findings); that's a really solid model.
Moreover, they released DeepSeek-Math-7B-RL back in April 2024. [2] It was a tiny model that outperformed huge then-SOTA LLMs like Claude 3 Opus at math, and it validated their training technique (GRPO). Basically, they made the first reasoning model worth talking about. Their other optimizations (MLA) can be traced back to DeepSeek-V2.
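For reference, the core trick of GRPO is small enough to sketch. A toy version of the group-relative advantage only; the clipped policy-gradient update and KL penalty are omitted:

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO's key idea: no value network. Advantages come from
    normalizing rewards within a group of G completions sampled
    for the same prompt (a group-relative baseline)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy example: 4 completions for one math prompt, scored 0/1 for correctness.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```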
>Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...
That's n=1 nonsense, not evidence. GPT contamination was everywhere; even Claude used to occasionally claim to be GPT-3, or the Reddit Anti-Evil Team (yes, really). All models have overlapping datasets that are also contaminated with previous models' outputs, and mode collapse makes them converge on similar patterns, which seem to come and go with each generation.
- All modern models have their default looks. Meaningful variety of outputs for the same input in finetuned models is still an open technical problem. It's not impossible, but it's not solved either.
- Does it do anything that Sidebery doesn't?
- These are their own decisions, made long before the controls and pressure. Besides being in bed with the US gov, people who run big AI shops tend to be fervently nationalistic and politically ambitious on their own. Leopold Aschenbrenner's dystopian rant [1] and Dario Amodei's essays [2][3] are pretty representative.
[1] https://situational-awareness.ai/
[2] https://www.darioamodei.com/essay/machines-of-loving-grace
[3] https://www.darioamodei.com/post/on-deepseek-and-export-cont...
- All of this is hard but solvable, at least for some genres, with the help of the platform and game devs. A decade-plus ago, I ran servers for several PvP sandbox games; we were one of the major EU hubs. This was pretty complicated - tons of custom observability tooling, a community with 24/7 mods, investigations, mod transparency, etc. The identity problem was partially solved by Steam; the rest was handled by player behavior tracking and reputation (a sketch of the idea below).
We also had skill-separated servers. Casual ones had votebans for teams and players that were too organized/skilled/abusive, with case-by-case mod approval. Anarchy servers had nearly zero rules and were absolutely cutthroat and toxic, but fair: you always knew what you were signing up for. They even had mods temporarily banning players for whining.
Cheaters never lasted long on our servers, including returning ones. Running a dedicated server takes some dedication. Know the game and the people you're doing this for.
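The reputation part doesn't have to be fancy, by the way. A toy sketch of the general shape; the event types and weights are made up, not our actual config:

```python
from collections import defaultdict

# Toy reputation ledger: events push a player's score toward a review threshold.
EVENT_WEIGHTS = {"voteban": -5.0, "mod_warning": -10.0, "clean_session": +1.0}
REVIEW_THRESHOLD = -25.0

scores = defaultdict(float)

def record(player_id: str, event: str) -> bool:
    """Apply an event to a player's score; return True when a mod
    should review the player for a ban (case-by-case approval)."""
    scores[player_id] += EVENT_WEIGHTS[event]
    return scores[player_id] <= REVIEW_THRESHOLD
```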
- >situational and temporary alignment of self-interest with the public good
That's how it's supposed to work.
- As a matter of fact, commercial passenger service started almost immediately once the tech was out of the fiction phase. The airships were large, highly experimental, barely controllable, hydrogen-filled death traps that were marketed as luxurious and safe. The first airliners also appeared as soon as big engines and large airframes did (WWI disrupted this a bit). None of that was built on solid ground. Adoption was constrained only by industrial capacity and cost. Most large aircraft were more or less experimental up until the '50s, and aviation in general was unreliable until about the '80s.
I would say that right from the start everyone was pretty well aware of the unreliability of LLM-assisted coding, and nobody was experimenting on unwitting people or forcing them to adopt it.
>Engineering at it's most basic is tricking physics into doing what you want.
Very well, then Mr Tinkleberry also passes the bar because it's exactly such a trick. That it irks you as a cheap hack that lacks rigor (which it does) is another matter.
- Looking at the accessibility alternatives they suggest, they were probably detecting XIM users, not the much nastier PC stuff like DMA cards.
- We're not going to advance the discussion this way. I also hate the kind of HN comment that makes grand sweeping statements like "LLMs are like having a fictional friend in a text file for the token predictor", because there's no way to tell whether you're just pulling these things out of your... to get internet points, or actually have insightful parallels to make.
Yes, during the Wright era aeronautics was absolutely dominated by tinkering, before aerodynamics was figured out. It wouldn't have passed the high standard of Real Engineering.
- This reads like you either have an idealized view of Real Engineering™ or used to work in a stable, extremely regulated area (e.g. civil engineering). I used to work in aerospace, and we had a lot of silly Mr Tinkleberry canaries. We didn't strictly rely on them, because our job was "extremely regulated" to put it mildly, but they did save us some time.
There's a ton of pretty stable engineering subfields that involve a lot more intuition than rigor. A lot of things in EE are like that. Anything novel as well. That's how steam in the 19th century or aeronautics in the early 20th century felt. Or rocketry in the 1950s, for that matter. There's no need to be upset that some people want to hack explosive stuff together before the field becomes a predictable glacier of Real Engineering.
Any normal pre-total-surveillance store would've had zero issues selling me something for cash if I walked in wearing a ski mask.