- > which does pose interesting questions about Nvidia's throne...
> Zebra-Llama is a family of hybrid large language models (LLMs) proposed by AMD that...
Hmmm
- It's definitely cool and, engineering-wise, close to SOTA given Lovable and all of the app generators.
But assuming you are trying to sit between Lovable and Google, how are you not going to be steamrolled by Google or Perplexity etc. the moment you get solid traction? If your insight for v3 was that the model should make its own tools, so even less is hardcoded, then I just don't see a moat or any vertical direction. What really is the difference?
- All VCs have preferred shares, meaning in case of a liquidation like now, they get their investment back first, and then the remainder gets shared.
Additionally, depending on the round, they may also have multiples, like 2x, meaning they get at least 2x their investment before anyone else gets anything.
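To make the preference math concrete, here is a toy sketch of participating preferred with a liquidation multiple; all figures and the "remainder shared pro rata" assumption are invented for illustration:

```python
# Toy model of participating preferred: the VC takes its preference
# (multiple x investment) off the top, then the remainder is shared
# pro rata. All numbers are hypothetical.

def payout(exit_value, invested, multiple, vc_ownership):
    preference = min(exit_value, multiple * invested)
    remainder = exit_value - preference
    vc_total = preference + remainder * vc_ownership
    founders = remainder * (1 - vc_ownership)
    return vc_total, founders

# $10M invested for 25%, company liquidates at $15M:
print(payout(15_000_000, 10_000_000, 1, 0.25))  # 1x preference
print(payout(15_000_000, 10_000_000, 2, 0.25))  # 2x multiple: founders get nothing
```

With the 2x multiple the entire $15M goes to the investor, which is exactly the point above.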
- Because the secret is that the web runs on advertising/targeted recommendations. Brezos(tm) wants you to actively browse Ramazon so he can harvest your data, search patterns, etc. Amazon and most sites like it are very crawl-unfriendly for this reason. Why would Brezos let Saltman get all the juicy preference data?
- On the foundational level: test-time compute (reasoning), heavy RL post-training, 1M+ context length, etc.
On the application layer, connecting with sandboxes/VMs is one of the biggest shifts (Cloudflare's Code Mode, etc.). Giving an LLM a sandbox unlocks on-the-fly computation, calculations, RPA, anything really.
MCPs, or rather standardized function calling, are another one.
Also, local LLMs are becoming almost viable thanks to better and better distillation, relying on quick web search for facts, etc.
- A very obvious AI review with 80 points(?), plus a couple more comments. Discussion also here: https://old.reddit.com/r/MachineLearning/comments/1oyce03/d_...
- Their page itself looks classic v0/AI-generated: that yellow/orange warning box plus the general shadows/borders screams LLM slop. Is it too hard these days to spend 30 minutes thinking about UI/user experience?
I actually like the idea, just not sure about monetization.
It also requires access to all the data?? And it's not even open source.
- They introduced pay-as-you-go recently. The limits on that are similar to the plans, 1 million tokens per minute, so if you stack a few keys and do simple load balancing with Redis, you can cover a decent amount of traffic with no upfront cost. Eventually we would have to go enterprise though, yes!
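A minimal sketch of the key-stacking idea; a plain dict stands in for Redis here (in production you would use shared Redis INCR/EXPIRE counters so all workers see the same state), and the key names and the 1M tokens/min limit are just illustrative:

```python
# Sketch: round-robin over several API keys, skipping any key whose
# per-minute token budget is already spent. A dict fakes the Redis
# counters; keys and limits are hypothetical.
import itertools
import time

API_KEYS = ["key_a", "key_b", "key_c"]
TOKENS_PER_MIN = 1_000_000
_rr = itertools.cycle(range(len(API_KEYS)))
_used = {}  # (key, minute) -> tokens consumed this minute

def pick_key(estimated_tokens, now=None):
    """Return an API key with enough budget left this minute, or raise."""
    minute = int((now if now is not None else time.time()) // 60)
    for _ in API_KEYS:
        key = API_KEYS[next(_rr)]
        used = _used.get((key, minute), 0)
        if used + estimated_tokens <= TOKENS_PER_MIN:
            _used[(key, minute)] = used + estimated_tokens
            return key
    raise RuntimeError("all keys at their limit this minute")
```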
- Great feedback, thanks! We have added a synthetic e-commerce dataset as an example when you sign up, so you can test it without your own data first. Will also add a demo video ASAP.
- I'm working on Flavia, an ultra-low-latency voice AI data analyst that can join your meetings. You can throw in data (CSVs, Postgres DBs, BigQuery, PostHog analytics for now) and just talk and ask questions. Using Cerebras (2,000 tokens per second) and very low-latency sandboxes spun up on the fly, you get back charts/tables/analysis in under 1 second (excluding the time of the actual SQL query if you are using BigQuery).
She can also join your Google Meet or Teams meetings, share her screen, and then everyone in the meeting can ask questions and see live results. Currently being used by product managers and executives, mainly for analytics and data science use cases.
We plan to open-source it soon if there is demand. Very fast voice+actions is the future imo.
- Breaking news: For profit company chases profit, briefly pretends it's not while it is
- This has nothing to do with what I said. I said they are addicted. The free limits are designed this way. If OpenAI suddenly removed the free plan, I guarantee you a lot of people would buy. They don't have an alternative; they cannot think independently anymore.
- Everyone. People are becoming dependent on ChatGPT. They literally cannot function professionally or even socially without it. They will pay their last 20-30 dollars if needed. It's literally like a drug, especially when it asks you if you want to follow up/continue.
- If my mom gives me 1000 dollars for 1% of my lemonade stand, that doesn't mean my stand is worth 100k. Tether is in talks with investors to maybe raise 20b at a 500b valuation. Keep in mind also that crypto investors overvalue companies to create hype and then lobby for better regulations etc. It doesn't mean at all that someone would be interested in buying 100% of Tether for 500b. Now, if they were public, that's a different story, like Tesla etc.
- The whole point of embeddings and tokens is that they are a compressed version of text, a lower dimensionality. Now, how low depends on performance; fewer dimensions = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard
You can train your own that is very, very compressed; I mean, you could even go down to each token = just 2 floats. It will train, but it will be terrible, because it can essentially only capture distance.
Funnily enough, prompting a good LLM to summarize the context is probably the best way of actually "compressing" context.
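As a toy illustration of how little 2 floats per token can distinguish (not a real embedding experiment, just a capacity count with invented numbers): quantize random vectors and watch how many tokens collide as the dimensionality shrinks:

```python
# Toy capacity check: random `dims`-float vectors for a 10k-token
# vocabulary, coarsely quantized to `levels` buckets per axis; fewer
# floats per token = more tokens forced into the same cell (lossier).
import random

random.seed(0)
VOCAB = 10_000

def collisions(dims, levels=8):
    seen, clashes = set(), 0
    for _ in range(VOCAB):
        cell = tuple(int(random.random() * levels) for _ in range(dims))
        if cell in seen:
            clashes += 1
        seen.add(cell)
    return clashes

print(collisions(2))  # 2 floats: only 64 cells, nearly every token collides
print(collisions(8))  # 8 floats: ~17M cells, collisions near zero
```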
- Context editing is interesting because most agents work on the assumption that the KV cache is the most important thing to optimize, and are very hesitant to remove parts of the context during work. It also sometimes introduces hallucinations, because parts of the context assume that e.g. tool results are still there, but they're not. Example: Manus [0]. Say you read file A, make changes to A, then prompt for some more changes. If you now remove the "read file A" tool results, not only do you break the cache, but in my own agent implementations (on GPT-5 at least) the model can hallucinate, since my prompt etc. all naturally point to the tool content still being there.
Plus, the model was trained and RLed with a continuous context, unless they now also tune it with edited contexts.
https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...
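The cache-breaking half of this can be sketched in a few lines: a prefix cache is only valid up to the first changed token, so deleting an earlier tool result forces recomputation of everything after it (tokenization is faked with str.split, and the message contents are invented):

```python
# Sketch: KV-cache reuse is limited to the longest unchanged token prefix.

def shared_prefix_len(old_tokens, new_tokens):
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

history = "system prompt | read file A | <contents of A> | edit A | done".split()
edited  = "system prompt | read file A | [removed] | edit A | done".split()

# Everything from the removed tool result onward must be recomputed.
print(f"cache reusable for {shared_prefix_len(history, edited)}/{len(history)} tokens")
```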
- Eh, I mean, innovation is often made just by letting a lot of fragmented, small teams of cracked nerds try stuff out. It's way too early in the game. I mean, Qwen's release statements have anime in them, etc. IBM, Bell, Google, Dell, many did it similarly, letting small focused teams take many attempts at cracking the same problem. All modern quant firms are doing basically the same as well. Anthropic is actually an exception, more like Apple.
- The Chinese are doing what they have been doing to the manufacturing industry as well: take the core technology and just optimize, optimize, optimize for 10x the cost efficiency. As simple as that. Super impressive. These models might be benchmaxxed, but as another comment said, I see so many that it might as well be the most impressive benchmaxxing today, if not just a genuinely SOTA open-source model. They even released a closed-source 1-trillion-parameter model today as well that is sitting at no. 3 (!) on LM Arena. Even their 80B model is 17th; gpt-oss 120b is 52nd. https://qwen.ai/blog?id=241398b9cd6353de490b0f82806c7848c5d2...
- 1. I was not talking about official MCP servers; those are often even free. I'm talking about the pricing of other devtools for aggregating tools/MCPs. I agree this is an obvious space to build in; I just worry about differentiation. It's a search space that is not as big as web search, nor as complex (order doesn't matter).
2. Yes, I see, this is what I meant by agentic search. It's essentially a tiny subagent that takes the list of tools in and returns the relevant ones. Still implementable in 5 minutes. But I guess if the experience is very smooth, enterprise might pay?
- 1. Oh okay, great; maybe clarify on the pricing page that an MCP server call just means execute. But it's still 10x more expensive, right?
2. From what I understand, it's just nested search, right? It's not anything different; whether you do flat, embedding, or fuzzy/agentic nested search is a choice for sure, but I'm just saying I'm not sure how defensible this is if all other MCP competitors, or even users themselves, put in a nested search tool.
- 1. Interesting approach, but the pricing seems 1-2 orders of magnitude too expensive. Your Slack example contains 4 calls for one action. The pricing page shows 100 dollars per 10k calls, so 1 cent per call. That means for an agent that does, say, 4 actions, with your examples showing at least 3-4 API calls per action, it's already 12-16 cents. Similar tools like composio.dev offer 200k calls for 29 dollars, so around 70x cheaper (both on the cheapest tier). Even if subsequent actions needed only 1 call each, 1 cent per single API call sounds wrong; at least for our use case it makes no economic sense to pay 5-10 cents on top of LLM costs for every user query. Apologies if I'm missing something!
2. Couldn't this be replicated by others just by handmaking a fuzzy search tool over the tools? I think that's the approach that will win, maybe even with RAG for, say, 10k+ tools in the future, but I'm not sure how much differentiation this is in the long term; I've made this search tool myself a couple of times already.
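For what it's worth, the handmade version really is small; here's a keyword-overlap sketch over an invented tool list (the tool names and descriptions are made up for illustration):

```python
# Sketch of a handmade "search over tools" tool: rank tools by how many
# query words appear in their name/description. Tool list is hypothetical.
TOOLS = [
    {"name": "slack_send_message", "description": "Send a message to a Slack channel"},
    {"name": "slack_list_channels", "description": "List channels in a workspace"},
    {"name": "gmail_send_email", "description": "Send an email via Gmail"},
]

def search_tools(query, tools=TOOLS, top_k=2):
    q = set(query.lower().split())
    def score(tool):
        text = f"{tool['name'].replace('_', ' ')} {tool['description']}".lower()
        return len(q & set(text.split()))
    return sorted(tools, key=score, reverse=True)[:top_k]

print([t["name"] for t in search_tools("send a slack message")])
# → ['slack_send_message', 'slack_list_channels']
```

A real version would use fuzzy matching or embeddings instead of word overlap, but the point stands: it's a few minutes of work, which is why differentiation is the question.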
- Of course not! But usually you can quantify metrics for quality, like uptime, lost transactions, response time, throughput, etc. Then you can have accountability, and remediate. Even for other bugs, you can often reproduce them and show the impact clearly. But in this case, other than internal benchmarks, you cannot really prove it. There is no accountability yet.
- Wow. Sneaky. They do not even state the rate of impact for the XLA bug afaik, which affected everyone, not just Claude Code users; very vague. Interesting.
Claude Code has made almost half a billion so far [1] (>500m in ARR, and it's like 9 months old), and 30% of all users were impacted at least once, just by the first routing bug. Scary stuff.
Their post-mortem is basically "evaluations are hard, we relied on vibe checking, now we are going to do even more frequent vibe checking". I believe it was indeed unintentional, but in a future where investors' money won't come down from the skies, serving distilled models will be very tempting. And they cannot be held to any SLA currently; it's just vibes. I wonder how enterprise vendors are going to deal with this going forward; you cannot just degrade quality without the client or vendor even being able to really prove it.
[1][https://www.anthropic.com/news/anthropic-raises-series-f-at-...]
- Looks like it’s time to go outside and touch some grass again
- They will probably also release Sonnet 4.2 or something soon, to make people jump back to try it and hopefully re-stick.
- Congrats! Doesn't Replit have an integrated database as well? Lovable has Supabase, and I'm pretty sure Base44 does too, plus other agent integrations.
- Agree, and it's a nice reflection of the individual companies' goals. OpenAI is about AGI, and they have insane pressure from investors to show that that is still the goal; hence with Codex, when it works, they can say "look, it worked for 5 hours!", disregarding that 90% of the time it's just pure trash.
Anthropic/Boris, meanwhile, is more about value now, more grounded/realistic, providing a more consistent, hence trustable/intuitive, experience that you can steer (even if Dario says the opposite). The ceiling/best-case scenario of a Claude Code session is maybe a bit lower than Codex's, but with less variance.