- Should the app builder’s ability to “trust” that the hardware will protect them from the user supersede the user’s ability to trust that the hardware will protect them from the app?
In other words, should the device be responsible for enforcing DRM (and more) against its owner?
- So much negativity.
I’m just excited that our industry is led by optimists and our culture enables our corporations to invest huge sums into taking us forward technologically.
Meta could have just done a stock buyback but instead they made a computer that can talk, see, solve problems and paint virtual things into the real world in front of your eyes!
I commend them on attempting a live demo.
- This is due to RoPE scaling.
> All the notable open-source frameworks implement static YaRN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required. It is also recommended to modify the factor as needed. For example, if the typical context length for your application is 524,288 tokens, it would be better to set factor as 2.0.
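For reference, a minimal sketch of what enabling static YaRN scaling can look like with Hugging Face transformers. The model id is hypothetical, and the exact rope_scaling keys and the 262,144-token original context length are assumptions based on Qwen-style configs (524,288 / 262,144 = 2.0); check the model card you actually use.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical checkpoint; substitute the model you actually serve.
model_id = "Qwen/Qwen3-example"

config = AutoConfig.from_pretrained(model_id)

# Static YaRN: the scaling factor stays constant regardless of input length.
# factor = target_context / original_context, e.g. 524288 / 262144 = 2.0
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 262144,  # assumed original context
}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```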
- > Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.
Things they could do that would not technically contradict that:
- Quantize the KV cache (see the sketch below)
- Data-aware model quantization, where their own evals will show "equivalent perf" but the overall model quality suffers.
The simple fact is that it takes longer to deploy physical compute, yet somehow they are able to serve more and more inference from a slowly growing pool of hardware. Something has to give...
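To make the KV-cache bullet concrete, here is a minimal sketch of symmetric per-head int8 quantization of a cached KV tensor in plain PyTorch. The shapes, function names, and scaling scheme are illustrative assumptions, not anyone's actual serving code; the point is only that the memory savings come at a precision cost that broad evals may not surface.

```python
import torch

def quantize_kv_int8(kv: torch.Tensor):
    """Symmetric per-head int8 quantization of a KV-cache tensor.

    kv: [batch, heads, seq_len, head_dim] in fp16/bf16.
    Returns int8 values plus per-head scales needed to dequantize.
    """
    # One scale per (batch, head): largest |value| maps to 127.
    scale = kv.float().abs().amax(dim=(-2, -1), keepdim=True) / 127.0
    scale = scale.clamp(min=1e-8)
    q = torch.clamp(torch.round(kv / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(scale.dtype) * scale

# Roughly half the memory per cached token versus fp16, at the cost of some
# precision in attention scores -- "quality" in a way evals may not catch.
kv = torch.randn(1, 8, 4096, 128, dtype=torch.float16)
q, scale = quantize_kv_int8(kv)
approx = dequantize_kv_int8(q, scale)
```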
- It's an arms race.
- > Edit: Letter frequency apparently has just become another scripted output, like doing arithmetic. LLMs don't have the ability to do this sort of work inherently, so they're trained to offload the task.
Mechanistic research at the leading labs has shown that LLMs actually do math in token form up to a certain scale of difficulty.
> This is a real-time, unedited research walkthrough investigating how GPT-J (a 6 billion parameter LLM) can do addition.
- How do you launch a dev tool with a “contact us” call to action?
It’s like Mistral is choosing to fail here.
Edit: I can't even tell if it's a CLI tool, an IDE plugin or a standalone IDE!
Edit 2: oh man! it's at the bottom of the page
Edit 3: "Mistral Code Enterprise is currently only available with an enterprise license." :D
- I would urge you to not think this way: https://www.osmos.io/fabric
- https://www.osmos.io/fabric
Practical, real-world application.
- Things have been moving so fast that it’s honestly hard for a small team to do that in parallel.
I got to present at GCP Next about a part of this last year: https://www.youtube.com/watch?v=5QsM1K9ahtw
I’m presenting in one (and maybe two) sessions with more info on the training side this year.
- We use multiple post-trained models in production, at scale, at https://osmos.io
OpenAI is basically ensuring that they can actually get the chips they need for the DCs they are building.
I can’t guess which move came first (the Nvidia policy change or these DRAM deals), but I would bet this is as large a factor here, if not larger, as “block my competitors.”