ImprobableTruth
1,623 karma

  1. They're not making money because they blow ungodly amounts on R&D, not because inference itself loses money. On inference alone it'd be a very profitable business.
  2. > These games are the starting point, but the bulk of the game is new puzzles combining mechanics from different games together

    Seems like the puzzles are novel, but the mechanics are not?

  3. An almost 50% price increase. Benchmarks look nice, but are they 50% nicer...?
  4. Unfortunately not - this model is noticeably worse. I imagine Horizon is either GPT-5 nano or mini.
  5. Even near-perfect LLMs would benefit from the compiler optimizations that types allow.

    However, perfect LLMs would just replace compilers and programming languages above assembly completely.

  6. This is the fault of sloppy language. In Lean, _proofs_ (equivalent to functions) and _proof objects/certificates_ (values) need to be distinguished. You can't compute proofs, only proof objects. In the above quote, replace "proof" with "certificate" and you'll see that it's a perfectly valid (if trivial - it essentially just applies a lemma) proof.
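
    A minimal Lean 4 sketch of that distinction as I read it (the proposition and names are made up for illustration): the Prop-level proof exists only for the type checker, while the decision procedure produces a certificate you can actually compute.

      -- A proof: a term the kernel checks and then erases at runtime,
      -- so there is nothing to "compute" with it.
      theorem two_lt_five : 2 < 5 := by decide

      -- A certificate: `decide` runs the `Decidable` instance for the
      -- proposition and returns a Boolean you can evaluate.
      #eval decide (2 < 5)  -- true
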
  7. Caveat: Coercions exist in Lean, so subtypes actually can be used like the supertype, similar to other languages. This works by essentially inserting an implicit cast wherever such a usage is encountered.
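
    A small Lean 4 sketch of that (Pos, three, and the instance are illustrative; depending on the Lean/Mathlib version a subtype coercion like this may already be provided):

      -- A subtype of Nat: values carry a proof that they are positive.
      abbrev Pos := {n : Nat // 0 < n}

      -- Registering the coercion lets a Pos be used where a Nat is expected.
      instance : Coe Pos Nat := ⟨Subtype.val⟩

      def three : Pos := ⟨3, by decide⟩

      -- The implicit cast (really just `.val`) is inserted automatically here.
      #eval (three : Nat) + 1  -- 4
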
  8. I think the concept of a game DSL is cool, but it just feels so undercooked to me.

    Like, I'm a huge fan of gradual typing, especially TypeScript's, but gdscript's is just so primitive. Never mind something like intersection or union types - even something basic like an interface mechanism is missing. has_method is an awful substitute: in general way too much relies on strings, which makes even simple refactoring a headache and breaks autocompletion. Lots of things also just aren't typable, e.g. because generics are missing, pushing you toward Variant. These aren't deal breakers, especially for the small-ish projects I've done, but it just feels bad.

    A 'fully realized' version of gdscript would probably be great, but as is I'm just really not very fond of it and progress currently isn't exactly happening at a rapid pace (which is of course understandable).

    Also - and this is definitely a lot more subjective - I find its C++ FFI pretty ugly, even for basic stuff like working with structs. In theory using gdscript as glue and C++ for the core parts would be a great approach (like Unreal with its Blueprints), but in practice I just want to avoid it as much as possible.
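
    As a rough analogy in Python (made-up names, not GDScript), the gap between a string-based check and an actual interface mechanism looks like this - the first is invisible to refactoring tools and autocompletion, the second isn't:

      from typing import Protocol

      class Damageable(Protocol):
          # A structural interface: anything with this method matches.
          def take_damage(self, amount: int) -> None: ...

      def hit_stringly(target) -> None:
          # Roughly the has_method style: the method name is just a string,
          # so renames and typos slip past the tooling.
          if hasattr(target, "take_damage"):
              target.take_damage(10)

      def hit_typed(target: Damageable) -> None:
          # With an interface the call is type-checked and autocompletes.
          target.take_damage(10)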

  9. What's the issue with the remix -> react-router transition? As far as I can tell it's just a branding thing.
  10. I think the fact that all (good) LLM datasets are full of licensed/pirated material means we'll never really see a decent open source model under the strict definition. Open weights + open source code is really the best we're going to get, so I'm fine with it co-opting the term open source even if it doesn't fully apply.
  11. No, that's a different scenario. In the one I gave there's explicitly a dependency between requests. If you use gather, the network requests would be executed in parallel. If you have dependencies they're sequential by nature because later ones depend on values of former ones.

    The 'trick' for CUDA is that you declare all this using buffers as inputs/outputs rather than values and that there's automatic ordering enforcement through CUDA's stream mechanism. Marrying that with the coroutine mechanism just doesn't really make sense.
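
    A quick asyncio sketch of the difference (the fetch functions are made-up stand-ins for network calls): with a dependency chain the awaits are forced to run one after another, while independent requests can all be put in flight at once with gather.

      import asyncio

      # Hypothetical stand-ins for network requests.
      async def fetch_user(uid):
          await asyncio.sleep(0.1)
          return {"id": uid}

      async def fetch_orders(user):
          await asyncio.sleep(0.1)
          return [1, 2, 3]

      async def fetch_total(orders):
          await asyncio.sleep(0.1)
          return sum(orders)

      async def dependent(uid):
          # Each call needs the previous result, so these are sequential by nature.
          user = await fetch_user(uid)
          orders = await fetch_orders(user)
          return await fetch_total(orders)

      async def independent(uids):
          # No dependencies between the requests, so gather runs them concurrently.
          return await asyncio.gather(*(fetch_user(u) for u in uids))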

  12. The reason is that the usage is completely different from coroutine-based async. With GPUs you want to queue _as many async operations as possible_ and only then synchronize. That is, you would have a program like this (pseudocode):

      b = foo(a)
      c = bar(b)
      d = baz(c)
      synchronize()
    
    With coroutines/async await, something like this

      b = await foo(a)
      c = await bar(b)
      d = await baz(c)
    
    would synchronize after every step, which is much less efficient.
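
    To make the first pattern concrete, here is roughly what it looks like with CuPy (my choice of library for illustration; the comment isn't tied to it): each call only enqueues a kernel on the GPU's stream and returns immediately, and the host blocks once at the end.

      import cupy as cp

      a = cp.arange(1 << 20, dtype=cp.float32)

      # Each of these is asynchronous: it enqueues work on the current
      # CUDA stream and returns to the host right away.
      b = cp.sin(a)
      c = b * 2
      d = cp.sqrt(c)

      # A single synchronization point instead of waiting after every step.
      cp.cuda.Stream.null.synchronize()
      print(float(d[0]))
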
  13. On their subscriptions, specifically the Pro subscription, because it's a flat rate for their most expensive model. The API prices are all much higher. It's unclear whether they're losing money on the normal subscriptions, but if so, probably not by much. Though it's definitely closer to what you described, subsidizing it to gain 'mindshare' or whatever.
  14. If you compare with e.g. DeepSeek and other hosting providers, you'll find that OpenAI is actually almost certainly charging very high margins (DeepSeek has an 80% profit margin and they're 10x cheaper than OpenAI).

    The training/R&D might make OpenAI burn VC cash, but that isn't comparable to companies like WeWork, whose products actively burn cash.
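
    Back-of-the-envelope with those numbers (and the big assumption that serving costs are comparable): if DeepSeek sells at price p with an 80% margin, its cost is about 0.2p, so a provider charging ~10p for similar work would be sitting at a margin of roughly (10p - 0.2p) / 10p ≈ 98%.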

  15. Do you take issue with the 'purely empirical' approach (just trying out variants and seeing which sticks) or only with its insufficient documentation?

    I don't know how you'd improve on the former. For a lot of it there simply isn't any sound theoretical foundation, so you just end up with flimsy post-hoc rationalizations.

    While I agree that it's unfortunate that people often just present magic numbers without explaining where they come from, in my experience documenting how one arrives at them often enough gets punished, because it draws more attention to them. That is, reviewers will e.g. complain about preliminary experiments, ask for theoretical analysis, or question why only certain variants were tried, whereas bare magic numbers are just kind of accepted.

    There are other services, much cheaper than OpenAI, that host it for only slightly more than DeepSeek itself [1]. I'm now very certain that DeepSeek is not offering the API at a loss, so either OpenAI has absurd margins or their model is much more expensive to run.

    [1] The cheapest I've found, which also happens to run in the EU, is https://studio.nebius.ai/ at $0.80 per million input tokens.

    Edit: I just saw that OpenRouter also now has Nebius.

  17. The "97.3%" match is probably just the confidence value - I don't think a frequentist interpretation makes sense for this. I'm not an expert in face recognition, but these systems are very accurate, typically >99.5% accuracy, with most of the errors being misses (recall) rather than false matches (precision). They're also not _that_ expensive. Real-time detection on embedded devices has been possible for around a decade, and costs for high-quality detection have come down a lot in recent years.

    Still, you're right that at those scales these systems will invariably slip once in a while, and it's scary to think that this might be enough to be considered a criminal, especially because people often treat these systems as infallible.
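
    To put a purely illustrative number on that: even a false-match rate of 1 in 100,000 means about 10 spurious hits per million faces scanned, so at city scale someone gets wrongly flagged all the time.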

  18. Yes, that's how it works.

    I think in Zig for new types you'd use enums for ints and packed structs for more complex types.

  19. Latency is orders of magnitude more critical when it's something you have to react to.
  20. Transmission speeds aren't fast enough for this, unless you crank up the batch size ridiculously high.

This user hasn’t submitted anything.