- Ever heard of Debian or Linux?
- Excited to put this through its paces. It seems most directly comparable to GPT-OSS-20B. Comparing their numbers on the Together API: Trinity Mini is slightly less expensive ($0.045/$0.15 vs $0.05/$0.20) and seems to have better latency and throughput numbers.
- Why would that undermine its integrity? AFAICT there is a selection of "open" US-based LLMs to choose from: Google's Gemma, Microsoft's Phi, Meta's Llama, and OpenAI's GPT-OSS, with Phi licensed under MIT and GPT-OSS under Apache 2.0.
- I find the existence of opennext convincing proof of lock-in: https://blog.logrocket.com/opennext-next-js-portability/
Personally, I don’t bother with nextjs at all.
- But to determine its merit a maintainer must first donate their time and read through the PR.
LLMs reduce the effort to create a plausible PR to virtually zero. Requiring a human to write the code is a good indicator that (a) the PR has at least some technical merit and (b) the human cares enough about the code to bother writing a PR in the first place.
- Not wanting to review and maintain code that someone didn't even bother to write themselves is childish?
- > works flawlessly
> intermittent outages
Those seem like conflicting statements to me. The last outage was only 13 days ago: https://www.hackerneue.com/item?id=45915731.
Also, there have been increasing reports of open source maintainers dealing with LLM-generated PRs: https://www.hackerneue.com/item?id=46039274. GitHub seems perfectly positioned to help manage that issue, but in all likelihood will do nothing about it: '"Either you have to embrace the AI, or you get out of your career," Dohmke wrote, citing one of the developers who GitHub interviewed.'
I used to help maintain a popular open source library and I do not envy what open source maintainers are now up against.
- AFAICT, Kimi K2 was the first to apply this technique [1]. I wonder if Anthropic came up with it independently or if they trained a model in 5 months after seeing Kimi's performance.
1: https://www.decodingdiscontinuity.com/p/open-source-inflecti...
- I never claimed that it was being done in secrecy. Here is another example: https://groq.com/blog/inside-the-lpu-deconstructing-groq-spe....
I have seen openrouter mentioned multiple times here on HN: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Again, I'm not claiming malicious intent. But model performance depends on a number of factors and the end-user just sees benchmarks for a specific configuration. For me to have a high degree of confidence in a provider I would need to see open and continuous benchmarking of the end-user API.
- There are well-documented cases of performance degradation: https://www.anthropic.com/engineering/a-postmortem-of-three-....
The real issue is that the end user currently has no reliable way to detect changes in performance, other than being willing to burn the cash and run their own benchmarks regularly.
It feels to me like a perfect storm: the combination of high inference costs, extreme competition, and the statistical nature of LLMs makes it very tempting for a provider to tune their infrastructure to squeeze more volume out of their hardware. I don't mean to imply bad faith: things are moving at breakneck speed and people are trying anything that sticks. But the problem persists, and people are building on systems that are in constant flux (for better or for worse).
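For what it's worth, a bare-bones version of that kind of continuous check isn't much code. A minimal sketch is below; the endpoint, model name, and prompts are placeholders (not any specific provider's API), and it assumes an OpenAI-style chat completions response shape:

```python
# Replay a fixed prompt set against a provider's API on a schedule and
# append latency + output to a log, so drift becomes visible over time.
import json
import time

import requests

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "sk-..."  # load from an env var in practice
MODEL = "some-model"  # placeholder model name

PROMPTS = [
    "Summarize the following paragraph: ...",
    "Write a function that parses a CSV line: ...",
]

def run_once():
    for prompt in PROMPTS:
        start = time.monotonic()
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        latency = time.monotonic() - start
        record = {
            "ts": time.time(),
            "prompt": prompt,
            "latency_s": round(latency, 3),
            "status": resp.status_code,
            # Assumes an OpenAI-style response body.
            "output": resp.json()["choices"][0]["message"]["content"],
        }
        # Append-only log; diff outputs and latencies across days to spot regressions.
        with open("bench_log.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    run_once()  # run from cron/CI on a schedule
```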
- Thanks for sharing. I hear people make extraordinary claims about LLMs (not saying that is what you are doing), but it's hard to evaluate exactly what they mean without seeing the results. I've been working on a similar project (a static analysis tool) and I've been using Sonnet 4.5 to help me build it. On cursory review it produces acceptable results, but closer inspection reveals obvious performance or architectural mistakes. In its current state, one-shotted LLM code feels like wood filler: very useful in many cases, but I would not trust it to be load bearing.
- Access to virtually infinite cash had more to do with Android's success than the source being proprietary.
- I think Golang (mostly) successfully resisted this temptation.
- Doesn't have to be an in-house system; just basic redundancy is fine, e.g. a simple hook that pushes to both GitHub and GitLab (see the sketch below).
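In fact, you don't even need a hook script: git can fan out a single push natively. A minimal sketch, assuming SSH remotes (the repo URLs are placeholders):

```
# One-time setup: make `git push origin` push to both hosts.
git remote set-url --add --push origin git@github.com:user/repo.git
git remote set-url --add --push origin git@gitlab.com:user/repo.git
```

After that, every ordinary push mirrors the repo to both hosts with no extra workflow.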
- Sure, we are still closer to alchemy than materials science, but it's still early days. Consider this blog post that was on the front page today: https://www.levs.fyi/blog/2-years-of-ml-vs-1-month-of-prompt.... The table at the bottom shows a generally steady increase in performance just by iterating on prompts. It feels like we are on the path to true engineering.
- Thanks for sharing! I am working on a RAG engine and that document provides great guidance.
And, agreed, each individual technique seems marginal, but they really add up. What seems to be missing is some automated layer that determines the best way to chunk documents before embedding. My use case is mostly normalized, technical documents, so I have a pretty clear idea of how to chunk while preserving semantics (a sketch below). But I imagine that for general documents it is a lot trickier.
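Concretely, for normalized technical docs the chunking layer can stay simple. A minimal sketch, assuming markdown-style headings (real pipelines would add size caps and overlap on top of this):

```python
# Split a document on markdown headings so each chunk is a self-contained
# section, then prefix the heading so the embedding keeps its context.
import re

def chunk_by_headings(text: str) -> list[str]:
    chunks = []
    current_heading = ""
    current_lines = []
    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            # Flush the previous section before starting a new one.
            if current_lines:
                chunks.append(current_heading + "\n" + "\n".join(current_lines))
            current_heading = m.group(2)
            current_lines = []
        else:
            current_lines.append(line)
    if current_lines:
        chunks.append(current_heading + "\n" + "\n".join(current_lines))
    return [c.strip() for c in chunks if c.strip()]

# Each chunk is then embedded individually, e.g.:
# vectors = [embed(chunk) for chunk in chunk_by_headings(doc)]
```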
- > We tried multiple vectorization and classification approaches. Our data was heavily imbalanced and skewed towards negative cases. We found that TF-IDF with 1-gram features paired with XGBoost consistently emerged as the winner.
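For illustration, a minimal version of that pipeline might look like the sketch below; the toy data and hyperparameters are mine, not the authors':

```python
# TF-IDF with 1-gram features feeding XGBoost, with scale_pos_weight
# compensating for a corpus skewed toward negative cases.
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

# Toy stand-in corpus, heavily imbalanced toward negatives (label 0).
texts = ["great product", "broken on arrival", "works fine",
         "meh", "terrible", "ok at best"]
labels = [1, 0, 1, 0, 0, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 1))  # unigram features only
X = vectorizer.fit_transform(texts)

# Weight positives by the negative/positive ratio so the boosted trees
# don't just predict the majority class.
clf = XGBClassifier(scale_pos_weight=labels.count(0) / labels.count(1),
                    eval_metric="logloss")
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["stopped working"])))
```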
- The web version of Sonnet is down for me as well: https://status.claude.com/