- > which does pose interesting questions about Nvidia's throne...
> Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that...
Hmmm
- It's definitely cool and, engineering-wise, close to SOTA given Lovable and all of the app generators.
But assuming you are trying to sit between Lovable and Google, how are you not going to be steamrolled by Google or Perplexity etc. the moment you get solid traction? If your insight for v3 was that the model should make its own tools, so even less is hardcoded, then I just don't see a moat or any vertical direction. What really is the difference?
- All VCs have preferred shares, meaning in case of a liquidation like now, they get their investment back first, and then the remainder gets shared.
Additionally, depending on the round, they may also have multiples, like 2x, meaning they get at least 2x their investment before anyone else gets anything.
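To make the preference math concrete, here is a toy sketch of participating preferred with a liquidation multiple; all figures and the "remainder shared pro rata" assumption are invented for illustration:

```python
# Toy model of participating preferred: the VC takes its preference
# (multiple x investment) off the top, then the remainder is shared
# pro rata. All numbers are hypothetical.

def payout(exit_value, invested, multiple, vc_ownership):
    preference = min(exit_value, multiple * invested)
    remainder = exit_value - preference
    vc_total = preference + remainder * vc_ownership
    founders = remainder * (1 - vc_ownership)
    return vc_total, founders

# $10M invested for 25%, company liquidates at $15M:
print(payout(15_000_000, 10_000_000, 1, 0.25))  # 1x preference
print(payout(15_000_000, 10_000_000, 2, 0.25))  # 2x multiple: founders get nothing
```

With the 2x multiple the entire $15M goes to the investor, which is exactly the point above.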
- Because the secret is that the web runs on advertising/targeted recommendations. Brezos(tm) wants you to actively browse Ramazon so he can harvest your data, search patterns, etc. Amazon and most sites like it are very crawl-unfriendly for this reason. Why would Brezos let Saltman get all the juicy preference data?
- On the foundational level: test-time compute (reasoning), heavy RL post-training, 1M+ context length, etc.
On the application layer, connecting with sandboxes/VMs is one of the biggest shifts (Cloudflare's Code Mode, etc.). Giving an LLM a sandbox unlocks on-the-fly computation, calculations, RPA, anything really.
MCPs, or rather standardized function calling, are another one.
Also, local LLMs are becoming almost viable thanks to better and better distillation, relying on quick web search for facts, etc.
- A very obvious AI review with 80 points(?), plus a couple more comments. Discussion also here: https://old.reddit.com/r/MachineLearning/comments/1oyce03/d_...
- Their page itself looks classic v0/AI-generated: that yellow/orange warning box plus the general shadows/borders screams LLM slop. Is it too hard these days to spend 30 minutes thinking about UI/user experience?
I actually like the idea, just not sure about monetization.
It also requires access to all the data?? And it's not even open source.
- They introduced pay-as-you-go recently. The limits on that are similar to the plans, 1 million tokens per minute, so if you stack a few keys and do simple load balancing with Redis, you can cover a decent amount of traffic with no upfront cost. Eventually we would have to go enterprise though, yes!
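A minimal sketch of the key-stacking idea; a plain dict stands in for Redis here (in production you would use shared Redis INCR/EXPIRE counters so all workers see the same state), and the key names and the 1M tokens/min limit are just illustrative:

```python
# Sketch: round-robin over several API keys, skipping any key whose
# per-minute token budget is already spent. A dict fakes the Redis
# counters; keys and limits are hypothetical.
import itertools
import time

API_KEYS = ["key_a", "key_b", "key_c"]
TOKENS_PER_MIN = 1_000_000
_rr = itertools.cycle(range(len(API_KEYS)))
_used = {}  # (key, minute) -> tokens consumed this minute

def pick_key(estimated_tokens, now=None):
    """Return an API key with enough budget left this minute, or raise."""
    minute = int((now if now is not None else time.time()) // 60)
    for _ in API_KEYS:
        key = API_KEYS[next(_rr)]
        used = _used.get((key, minute), 0)
        if used + estimated_tokens <= TOKENS_PER_MIN:
            _used[(key, minute)] = used + estimated_tokens
            return key
    raise RuntimeError("all keys at their limit this minute")
```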
- Great feedback, thanks! We have added a synthetic e-commerce dataset as an example when you sign up, so you can test it without your own data first. Will also add a demo video ASAP.
- I'm working on Flavia, an ultra-low-latency voice AI data analyst that can join your meetings. You can throw in data (CSVs, Postgres DBs, BigQuery, PostHog analytics for now) and just talk and ask questions. Using Cerebras (2,000 tokens per second) and very low-latency sandboxes spun up on the fly, you get back charts/tables/analysis in under 1 second (excluding the time of the actual SQL query if you are using BigQuery).
She can also join your Google Meet or Teams meetings, share her screen, and then everyone in the meeting can ask questions and see live results. Currently being used by product managers and executives, mainly for analytics and data science use cases.
We plan to open-source it soon if there is demand. Very fast voice+actions is the future imo.
- Breaking news: For profit company chases profit, briefly pretends it's not while it is
- This has nothing to do with what I said. I said they are addicted. The free limits are designed this way. If OpenAI suddenly removed the free plan, I guarantee you a lot of people would buy. They don't have an alternative; they cannot think independently anymore.
- Everyone. People are becoming dependent on ChatGPT. They literally cannot function professionally or even socially without it. They will pay their last 20-30 dollars if needed. It's literally like a drug, especially when it asks you if you want to follow up/continue.
- If my mom gives me 1000 dollars for 1% of my lemonade stand, that doesn't mean my stand is worth 100k. Tether is in talks with investors to maybe raise 20b at a 500b valuation. Keep in mind also that crypto investors overvalue companies to create hype and then lobby for better regulations etc. It doesn't mean at all that someone would be interested in buying 100% of Tether for 500b. Now, if they were public, that's a different story, like Tesla etc.
- The whole point of embeddings and tokens is that they are a compressed version of text, a lower dimensionality. Now, how low depends on performance; fewer dimensions = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard
You can train your own that is very, very compressed; I mean, you could even go down to each token = just 2 floats. It will train, but it will be terrible, because it can essentially only capture distance.
Funnily enough, prompting a good LLM to summarize the context is probably the best way of actually "compressing" context.
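As a toy illustration of how little 2 floats per token can distinguish (not a real embedding experiment, just a capacity count with invented numbers): quantize random vectors and watch how many tokens collide as the dimensionality shrinks:

```python
# Toy capacity check: random `dims`-float vectors for a 10k-token
# vocabulary, coarsely quantized to `levels` buckets per axis; fewer
# floats per token = more tokens forced into the same cell (lossier).
import random

random.seed(0)
VOCAB = 10_000

def collisions(dims, levels=8):
    seen, clashes = set(), 0
    for _ in range(VOCAB):
        cell = tuple(int(random.random() * levels) for _ in range(dims))
        if cell in seen:
            clashes += 1
        seen.add(cell)
    return clashes

print(collisions(2))  # 2 floats: only 64 cells, nearly every token collides
print(collisions(8))  # 8 floats: ~17M cells, collisions near zero
```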
- Context editing is interesting because most agents work on the assumption that the KV cache is the most important thing to optimize, and are very hesitant to remove parts of the context during work. It also sometimes introduces hallucinations, because parts of the context assume that e.g. tool results are still there, but they're not. Example: Manus [0]. Say you read file A, make changes to A, then prompt for some more changes. If you now remove the "read file A" tool results, not only do you break the cache, but in my own agent implementations (on GPT-5 at least) the model can hallucinate, since my prompt etc. all naturally point to the tool content still being there.
Plus, the model was trained and RLed with a continuous context, unless they now also tune it with edited contexts.
https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...
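The cache-breaking half of this can be sketched in a few lines: a prefix cache is only valid up to the first changed token, so deleting an earlier tool result forces recomputation of everything after it (tokenization is faked with str.split, and the message contents are invented):

```python
# Sketch: KV-cache reuse is limited to the longest unchanged token prefix.

def shared_prefix_len(old_tokens, new_tokens):
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

history = "system prompt | read file A | <contents of A> | edit A | done".split()
edited  = "system prompt | read file A | [removed] | edit A | done".split()

# Everything from the removed tool result onward must be recomputed.
print(f"cache reusable for {shared_prefix_len(history, edited)}/{len(history)} tokens")
```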
- Eh, I mean, innovation is often made just by letting a lot of fragmented, small teams of cracked nerds try stuff out. It's way too early in the game. I mean, Qwen's release statements have anime in them, etc. IBM, Bell, Google, Dell, many did it similarly, letting small focused teams take many attempts at cracking the same problem. All modern quant firms are doing basically the same as well. Anthropic is actually an exception, more like Apple.
- The Chinese are doing what they have been doing to the manufacturing industry as well: take the core technology and just optimize, optimize, optimize for 10x the cost efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, I see so many that it might as well be the most impressive benchmaxxing today, if not just a genuinely SOTA open-source model. They even released a closed-source 1-trillion-parameter model today as well that is sitting at no. 3 (!) on LM Arena. Even their 80B model is 17th; gpt-oss 120b is 52nd. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...
- 1. I was not talking about official MCP servers; those are often even free. I'm talking about the pricing of other devtools for aggregating tools/MCPs. I agree this is an obvious space to build in; I just worry about differentiation. It's a search space that is not as big as web search, nor as complex (order doesn't matter).
2. Yes, I see, this is what I meant by agentic search. It's essentially a tiny subagent that takes the list of tools in and returns the relevant ones. Still implementable in 5 minutes. But I guess if the experience is very smooth, enterprise might pay?
- 1. Oh okay, great; maybe clarify on the pricing page that an MCP server call just means execute. But it's still 10x more expensive, right?
2. From what I understand, it's just nested search, right? It's not anything different; whether you do flat, embedding, or fuzzy/agentic nested search is a choice for sure, but I'm just saying I'm not sure how defensible this is if all other MCP competitors, or even users themselves, put in a nested search tool.
- 1. Interesting approach, but the pricing seems 1-2 orders of magnitude too expensive. Your Slack example contains 4 calls for one action. The pricing page shows 100 dollars per 10k calls, so 1 cent per call. That means for an agent that does, say, 4 actions, with your examples showing at least 3-4 API calls per action, it's already 12-16 cents. Similar tools like composio.dev offer 200k calls for 29 dollars, so around 70x cheaper (both on the cheapest tier). Even if subsequent actions needed only 1 call each, 1 cent per single API call sounds wrong; at least for our use case it makes no economic sense to pay 5-10 cents on top of LLM costs for every user query. Apologies if I'm missing something!
2. Couldn't this be replicated by others just by handmaking a fuzzy search tool over the tools? I think that's the approach that will win, maybe even with RAG for, say, 10k+ tools in the future, but I'm not sure how much differentiation this is in the long term; I've made this search tool myself a couple of times already.
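For what it's worth, the handmade version really is small; here's a keyword-overlap sketch over an invented tool list (the tool names and descriptions are made up for illustration):

```python
# Sketch of a handmade "search over tools" tool: rank tools by how many
# query words appear in their name/description. Tool list is hypothetical.
TOOLS = [
    {"name": "slack_send_message", "description": "Send a message to a Slack channel"},
    {"name": "slack_list_channels", "description": "List channels in a workspace"},
    {"name": "gmail_send_email", "description": "Send an email via Gmail"},
]

def search_tools(query, tools=TOOLS, top_k=2):
    q = set(query.lower().split())
    def score(tool):
        text = f"{tool['name'].replace('_', ' ')} {tool['description']}".lower()
        return len(q & set(text.split()))
    return sorted(tools, key=score, reverse=True)[:top_k]

print([t["name"] for t in search_tools("send a slack message")])
# → ['slack_send_message', 'slack_list_channels']
```

A real version would use fuzzy matching or embeddings instead of word overlap, but the point stands: it's a few minutes of work, which is why differentiation is the question.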
- Of course not! But usually you can quantify metrics for quality, like uptime, lost transactions, response time, throughput, etc. Then you can have accountability, and remediate. Even for other bugs, you can often reproduce them and show the impact clearly. But in this case, other than internal benchmarks, you cannot really prove it. There is no accountability yet.
- Wow. Sneaky. They do not even state the rate of impact for the XLA bug afaik, which affected everyone, not just Claude Code users; very vague. Interesting.
Claude Code has made almost half a billion so far [1] (>500m in ARR, and it's like 9 months old), and 30% of all users were impacted at least once, just by the first routing bug. Scary stuff.
Their post-mortem is basically "evaluations are hard, we relied on vibe checking, now we are going to do even more frequent vibe checking". I believe it was indeed unintentional, but in a future where investors' money won't come down from the skies, serving distilled models will be very tempting. And they cannot be held to any SLA currently; it's just vibes. I wonder how enterprise vendors are going to deal with this going forward; you cannot just degrade quality without the client or vendor even being able to really prove it.
[1][https://www.anthropic.com/news/anthropic-raises-series-f-at-...]
- Looks like it’s time to go outside and touch some grass again
- They will probably also release Sonnet 4.2 or something soon, to make people jump back to try it and hopefully re-stick.
- Congrats! Doesn't Replit have an integrated database as well? Lovable has Supabase, and I'm pretty sure Base44 does too, plus other agent integrations.
- Agree, and it's a nice reflection of the individual companies' goals. OpenAI is about AGI, and they have insane pressure from investors to show that that is still the goal; hence with Codex, when it works, they can say "look, it worked for 5 hours!", disregarding that 90% of the time it's just pure trash.
Anthropic/Boris, meanwhile, is more about value now, more grounded/realistic, providing a more consistent, hence trustable/intuitive, experience that you can steer (even if Dario says the opposite). The ceiling/best-case scenario of a Claude Code session is maybe a bit lower than Codex's, but with less variance.