- I'm something of a data scientist for a community college. Most of the problems are social, not technical, but I'm still writing code often enough.
- It’s refreshing to see a single optimistic take in this thread
- Prosecution for insider trading in 2026? I highly doubt that
- I suspect that 2026 will be the year we see a big breakthrough in the use of LLM agent systems. I don’t know what that will look like but I suspect the agents will be doing meaningful research (probably on AI).
- You do know this reads the same as every pessimistic commentary on technology ever, right? So many people were convinced that television was going to fry our brains.
- Nvidia released Nemotron 3 Nano recently and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...
It's extremely fast on good hardware, quite smart, and supports up to 1M context with reasonable accuracy.
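If anyone wants to try it locally, here's a minimal sketch using the standard transformers chat API. The model id is assumed from the truncated URL above, so double-check the model card for the official id, chat template, and recommended generation settings:

```python
# Minimal local-inference sketch, assuming the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B"  # assumed from the truncated URL

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",  # use the checkpoint's native dtype
    device_map="auto",   # spread layers across available GPUs/CPU
    # some NVIDIA checkpoints may also need trust_remote_code=True
)

messages = [{"role": "user", "content": "Give me a one-line summary of MoE models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```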
- Did you see this HN submission? https://www.hackerneue.com/item?id=46242838
It seems similar to what you're describing.
- Yes, and I'm a little ashamed to admit my morning routine wasn't the same without it.
- The claim that a small, fast, and decently accurate model makes a good foundation for agentic workloads seems reasonable.
However, is cost the biggest limiting factor for agent adoption at this point? I would suspect that the much harder part is just creating an agent that yields meaningful results.
- I love how detailed and transparent the data set statistics are on the huggingface pages. https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...
I've noticed that open models have made huge efficiency gains in the past several months. Some of that is explained by architectural improvements, but it seems quite obvious that a large portion of the gains come from heavy use of synthetic training data.
In this case roughly 33% of the training tokens are synthetically generated by a mix of other open-weight models. I wonder if this trend is sustainable or if it might lead to model collapse, as some have predicted. I suspect that the proliferation of synthetic data throughout open-weight models has led to a lot of the ChatGPT writing-style replication (many bullet points, em dashes, "it's not X but actually Y", etc.).
- It’s all subjective. Personally, I think it would border on useless for local inference, but maybe some people are happy with low-quality models at slow speeds.
- Even if they fire him he'll still have a huge amount of ownership in the company...
- Thank you for sharing this. Their point about the 128GB desktop mainboard being a bargain while prices remain low rings true. I bought one a couple of weeks ago because I've been wanting to build a beefy, efficient home server, and I think this might be the last window of affordability for quite a while.
- Yes, a sufficiently advanced marriage of STT, an LLM, and TTS could pass a lot of these tests. That kind of blurs the line between native voice model and not, though.
You would need:
* An STT (ASR) model that outputs phonetics, not just words
* An LLM fine-tuned to understand that markup and also output the proper tokens for prosody control, non-speech vocalizations, etc.
* A TTS model that understands those tokens and properly generates the matching voice
At that point I would probably argue that you've created a native voice model, even if it's still less nuanced than the proper voice-to-voice of something like 4o. The latency would likely be quite high, though, since each stage waits on the previous one. I'm pretty sure I've seen a couple of open source projects that have done this type of setup, but I've not tried testing them; a rough sketch of the shape is below.
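For concreteness, here's how that cascade might be wired up. Every function here is a hypothetical placeholder, not any real project's API; it just shows where the annotations flow and why the latency stacks up:

```python
# Hypothetical STT -> LLM -> TTS cascade; all three stages are placeholders.

def transcribe_with_phonetics(audio: bytes) -> str:
    """STT/ASR stage: text plus phonetic/prosody markup, e.g.
    'I need to record <verb/> this record <noun/>'."""
    raise NotImplementedError  # placeholder, no real model behind it

def llm_respond(annotated_text: str) -> str:
    """LLM stage: a model fine-tuned to read that markup and emit its own
    control tokens for prosody, laughter, pauses, etc."""
    raise NotImplementedError  # placeholder

def synthesize(annotated_reply: str) -> bytes:
    """TTS stage: renders audio while honoring the LLM's control tokens."""
    raise NotImplementedError  # placeholder

def voice_turn(audio_in: bytes) -> bytes:
    # Three sequential model calls per turn: this is where the latency
    # penalty versus a native voice-to-voice model comes from.
    return synthesize(llm_respond(transcribe_with_phonetics(audio_in)))
```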
- It specifically says in the architecture docs for the agents platform that it's STT (ASR) -> LLM -> TTS
https://elevenlabs.io/docs/agents-platform/overview#architec...
- You can test it by asking it to: change the pitch of its voice, make specific sounds (like laughter), differentiate between words that are spelled the same but pronounced differently (record and record), etc.
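As a throwaway sketch, those probes could look like this (just my informal tests, not any standard benchmark):

```python
# Informal probes for distinguishing a native voice model from a cascade.
PROBES = [
    "Raise the pitch of your voice for the next sentence.",
    "Laugh out loud before you answer.",
    "Use 'record' as both a noun and a verb, pronounced correctly each time.",
]
```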
- Does ElevenLabs have a real-time conversational voice model? It seems like their focus is largely on text to speech and speech to text, which can approximate that type of thing but isn't at all the same as the native voice-to-voice that 4o does.
- Qwen's voice chat is nowhere near as good as ChatGPT's.
- Weirdly, I just tried it again and it seems to understand the difference between record and record just fine. Perhaps when there's heavy demand for voice chat, like after a new release, they load shed by falling back to a TTS pipeline over a smaller model.
However, it still doesn't seem capable of producing any of the sounds, like laughter, that I would expect from a native voice model.
- I just got my Framework mainboard today. I haven't had a chance to set it up yet, but from the research I've been doing it seems like MiniMax M2 might be the best coding model for it at the moment: similar performance to Devstral2 with only 10B active params.