
kiratp
917 karma
Co-Founder @ osmos.io

  1. This is missing a key part of the picture - Nvidia just announced that partners will need to source RAM themselves.

    OpenAI is basically ensuring that they can actually get the chips they need for the DCs they are building.

    I can’t guess which move came first (the Nvidia policy change or these DRAM deals), but I would bet this is as large a factor here, if not larger, than “block my competitors”.

  2. A loop either never halts or has a conditional. I suppose a compiler could lower a “while True:” into a single unconditional jump instruction.

    One hack would be to use recursion and let stack exhaustion stop you.
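
    A toy sketch of that hack (Python, purely illustrative): recurse with no stop condition anywhere and let the interpreter’s recursion limit end the “loop”.

    ```python
    def descend(n=0):
        # No conditional anywhere in this "loop": just recurse forever.
        return descend(n + 1)

    try:
        descend()
    except RecursionError:
        stopped = True  # the loop ended via stack exhaustion, not a condition
    ```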

  3. A for loop has a conditional in it.

    Unless by conditionals we mean “no if/else” and not “no branch instructions”.

  4. A for loop has an implicit conditional in its stop condition check.
  5. This only applies to large employers. Smaller ones are just presented a limited list of plans to pick from, and the plans change every year. Most of the time, as a startup, you can’t buy a Mag7-equivalent health plan for any amount of money off the marketplace.
  6. Should the app builder’s ability to “trust” that the hardware will protect them from the user supersede the user’s ability to be able to trust that the hardware will protect them from the app?

    In other words, should the device be responsible for enforcing DRM (and more) against its owner?

  7. The kind of people in these small teams are not ones to think "work is just work".
  8. You can put the AI on rails just by prompting it. The latest models are very steerable.

    System prompt: “stick to steps 1-n. Step 1 is…”

    I can say this confidently because our company does this, and we have F500 customers in production.
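
    For instance, a minimal sketch of such a rails-style prompt (the step text is invented for illustration; this only builds the message list, no API call):

    ```python
    # Hypothetical "on rails" system prompt: the model is told to follow
    # a fixed, numbered procedure and nothing else.
    SYSTEM_PROMPT = """You are a data-intake assistant. Stick to steps 1-3:
    1. Ask the user for the source file format.
    2. Map the source columns to the target schema.
    3. Emit the mapping as JSON, then stop. Do not improvise extra steps."""

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Here is my CSV export..."},
    ]
    ```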

  9. I see no evidence of that. It seems like they tried to put the AI “on rails” with predefined steps and things went wrong.
  10. So much negativity.

    I’m just excited that our industry is led by optimists and that our culture enables our corporations to invest huge sums into taking us forward technologically.

    Meta could have just done a stock buyback but instead they made a computer that can talk, see, solve problems and paint virtual things into the real world in front of your eyes!

    I commend them on attempting a live demo.

  11. This is due to RoPE scaling.

    > All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required. It is also recommended to modify the factor as needed. For example, if the typical context length for your application is 524,288 tokens, it would be better to set factor as 2.0.

    https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking
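
    Concretely, static YaRN is enabled via the model’s config.json. A sketch for the 524,288-token example above (field names per Qwen’s model card; the original_max_position_embeddings value assumes the model’s native 262,144-token context):

    ```json
    {
      "rope_scaling": {
        "rope_type": "yarn",
        "factor": 2.0,
        "original_max_position_embeddings": 262144
      }
    }
    ```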

  12. Hardware can be the same but scheduling is a whole different beast.

    Also, if you pull too many resources from training your next model to make inference revenue today, you’ll fall behind in the larger race.

  13. > Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

    Things they could do that would not technically contradict that:

    - Quantize KV cache

    - Data-aware model quantization, where their own evals will show "equivalent perf" but the overall model quality suffers.

    The simple fact is that deploying physical compute takes time, yet somehow they are able to serve more and more inference from a slowly growing pool of hardware. Something has to give...
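
    As a toy illustration of why, e.g., naive quantization is not free (pure Python, made-up values): round-tripping values through int8 always loses some precision, even when benchmark-level evals look "equivalent".

    ```python
    import random

    random.seed(0)
    kv = [random.gauss(0.0, 1.0) for _ in range(1024)]  # stand-in for KV-cache values

    # Naive symmetric int8 quantization: scale into [-127, 127], round, dequantize.
    scale = max(abs(v) for v in kv) / 127.0
    kv_int8 = [round(v / scale) for v in kv]
    kv_restored = [q * scale for q in kv_int8]

    # Worst-case rounding error is bounded by half a quantization step.
    max_err = max(abs(a - b) for a, b in zip(kv, kv_restored))
    ```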

  14. lol look up Civil Asset Forfeiture.
  15. > Edit: Letter frequency apparently has just become another scripted output, like doing arithmetic. LLMs don't have the ability to do this sort of work inherently, so they're trained to offload the task.

    Mechanistic interpretability research at the leading labs has shown that LLMs actually do math in token form up to a certain scale of difficulty.

    > This is a real-time, unedited research walkthrough investigating how GPT-J (a 6 billion parameter LLM) can do addition.

    https://youtu.be/OI1we2bUseI

  16. The web browsers that the AI companies are about to ship will make requests that are indistinguishable from user requests. The ship has sailed on trying to tell the two apart.
  17. So a sequence of characters that is a python program is “neurosymbolic” but a sequence (of the same domain) in English (a different ruleset) that says “reverse this string” is not?
  18. That will play out exactly like the "Do not track" bit did.
  19. How do you launch a dev tool with a “contact us” call to action?

    It’s like Mistral is choosing to fail here.

    Edit: I can't even tell if it's a CLI tool, an IDE plugin, or a standalone IDE!

    Edit 2: oh man! it's at the bottom of the page

    Edit 3: "Mistral Code Enterprise is currently only available with an enterprise license." :D

  20. The productivity boost can be so massive that this amount of fiddling to control costs is counterproductive.

    Developers tend to seriously underestimate the opportunity cost of their own time.

    Hint: it’s many multiples of your total compensation divided into 40-hour work weeks.
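
    Back-of-the-envelope version (all numbers hypothetical):

    ```python
    # Hypothetical figures for illustration only.
    total_comp = 300_000      # annual total compensation, USD
    work_hours = 48 * 40      # ~48 working weeks of 40 hours
    value_multiple = 3        # assumed multiple of comp an employer expects back

    hourly_comp = total_comp / work_hours            # comp per working hour
    opportunity_cost = hourly_comp * value_multiple  # what an hour actually costs
    ```

    An hour spent shaving pennies off token spend has to save more than that to break even.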

  21. You walk into your nearest Chase (or similar) branch. Out in 10 mins.
  22. I would urge you to not think this way: https://www.osmos.io/fabric
  23. https://www.osmos.io/fabric

    Practical, real-world application.

  24. Things have been moving so fast that it’s honestly hard for a small team to do that in parallel.

    I got to present at GCP Next about a part of this last year: https://www.youtube.com/watch?v=5QsM1K9ahtw

    I’m presenting in one (and maybe two) sessions with more info on the training side this year.

  25. We use multiple post-trained models in production, at scale at https://osmos.io
  26. The entire wrapped package of tested prompts, context management etc. is a whole step change from what you can build yourself.

    There is a reason Cursor is the fastest startup to $100M in revenue, ever.

  27. You’re not using the best tools.

    Claude Code, Cline, Cursor… all of them with Claude 3.7.

  28. Our EU customers use our technology to deal with all the invoices etc. they get sent as PDFs.

This user hasn’t submitted anything.
