
crazylogger · 151 karma

  1. Proper vibe coding should involve tons of vibe refactoring.

    I'd say spending at least a quarter of my vibe coding time on refactoring + documentation refreshes to keep the codebase looking impeccable is the only way my projects can work at all long term. We don't want to confuse the coding agent.

  2. From a couple hours of usage in the CLI, 5.2-codex seems to burn through my plan's limit noticeably faster than 5.1-codex. So I guess the usage limit is a set dollar amount of API credits under the hood.
  3. The way you got structured output with Claude prior to this was via tool use.

    IMO this was the more elegant design if you think about it: tool calling is really just structured output and structured output is tool calling. The "do not provide multiple ways of doing the same thing" philosophy.
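
    For reference, here's a minimal sketch of that older pattern with the Anthropic Python SDK: define a tool whose input schema is the structure you want, then force the model to call it (the model id and schema are placeholders):

    ```
    import anthropic

    client = anthropic.Anthropic()

    # The "tool" is never executed - its input schema *is* the structured output.
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any tool-capable model
        max_tokens=1024,
        tools=[{
            "name": "record_person",
            "description": "Record a person mentioned in the text.",
            "input_schema": {
                "type": "object",
                "properties": {"name": {"type": "string"},
                               "age": {"type": "integer"}},
                "required": ["name", "age"],
            },
        }],
        tool_choice={"type": "tool", "name": "record_person"},  # force the call
        messages=[{"role": "user", "content": "Alice is a 30-year-old engineer."}],
    )
    structured = resp.content[0].input  # {"name": "Alice", "age": 30}
    ```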

  4. This is just personal experience + reddit anecdotes. I've been using CC since day one (when API pricing was the only way to pay for it), then moved to the $20 Pro plan, where I get a solid $5+ worth of usage in each 5h session, times 5-10 sessions per week (so an overall 5-10x subsidy over one month). From that I extrapolated that $200 subscribers must be getting roughly 10x Pro's usage. I do feel the actual limit fluctuates each week as Claude Code engages in this new subsidy war with OAI Codex though.

    My theory is this:

    - we know from benchmarks that the capabilities of open-weight models like DeepSeek R1 and Kimi K2 are not far behind SOTA GPT/Claude

    - open-weight API pricing (e.g. on openrouter) is roughly 1/10~1/5 that of GPT/Claude

    - users can more or less choose to hook their agent CLI/IDEs to either closed or open models

    If these points are true, then the only reason people are primarily on CC & Codex plans is that they are subsidized by at least 5~10x. When confronted with true costs, users will quickly switch to the lowest-cost inference vendor, and we get perfect competition + zero margin for all vendors.

  5. Anecdotally, a Max subscriber gets something like $100 worth of usage per day. The more people use Claude Code, the more Anthropic loses, so it sounds like a classic "selling a dollar for 85 cents" business to me.

    As soon as users are confronted with their true API cost, the appearance of this being a good business falls apart. At the end of the day, there is no moat around large language models - OpenAI, Anthropic, Google, DeepSeek, Alibaba, Moonshot... any company can make a SOTA model if they wish, so in the long run it's guaranteed to be a race to the bottom where nobody can turn a profit.

  6. https://pure.md is exactly what you're looking for.

    But stripping complex formats like HTML & PDF down to simple markdown is a hard problem. It's nearly impossible to infer what the rendered page looks like from the raw HTML / PDF source alone. https://github.com/mozilla/readability helps, but it often breaks down on unconventional div structures. I hear the state-of-the-art solution is to use multimodal LLM OCR to really look at the rendered page and rewrite the thing in markdown (sketched below).

    Which makes me wonder: how did OpenAI make their model read pdf, docx and images at all?
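
    To make the OCR idea concrete, here's a minimal sketch, assuming Playwright for rendering and a vision-capable Claude model (the model id and prompt are placeholders):

    ```
    import base64

    import anthropic
    from playwright.sync_api import sync_playwright

    def page_to_markdown(url: str) -> str:
        # Render the page the way a browser would, then screenshot it.
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            png = page.screenshot(full_page=True)
            browser.close()
        # Let a multimodal model "look" at the rendered page and transcribe it.
        client = anthropic.Anthropic()
        resp = client.messages.create(
            model="claude-sonnet-4-20250514",  # assumption: any vision model
            max_tokens=4096,
            messages=[{"role": "user", "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": "image/png",
                    "data": base64.b64encode(png).decode()}},
                {"type": "text", "text": "Transcribe this page into clean "
                 "markdown. Keep headings, lists and links; drop nav and ads."},
            ]}],
        )
        return resp.content[0].text
    ```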

  7. I think the OP's point is that all those requirements are to be implemented outside the LLM layer, i.e. we don't need to conceive of any new model architecture. Even if LLMs don't progress any further beyond GPT-5 & Claude 4, we'll still get there.

    Take memory for example: give the LLM a persistent computer and ask it to jot down its long-term memory as hierarchical directories of markdown documents. Recalling a piece of memory means a bunch of `tree` and `grep` commands. It's very, very rudimentary, but it kinda works, today. We just have to think of incrementally smarter ways to query & maintain this type of memory repo, which is a pure engineering problem.
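
    A minimal sketch of what those two recall "tools" could look like (the memory root path is an assumption):

    ```
    import os
    import subprocess

    MEMORY_ROOT = os.path.expanduser("~/memory")  # assumption: the agent's notes dir

    def memory_overview() -> str:
        """Show the directory hierarchy so the model can decide where to look."""
        return subprocess.run(["tree", MEMORY_ROOT],
                              capture_output=True, text=True).stdout

    def memory_search(keyword: str) -> str:
        """Full-text search across all notes; returns file:line matches."""
        return subprocess.run(["grep", "-rin", keyword, MEMORY_ROOT],
                              capture_output=True, text=True).stdout
    ```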

  8. We have CONTRIBUTING.md for that. Seems to me the author just doesn't know about it?
  9. Today’s AI systems probably won’t excel, but they won’t completely fail either.

    Basically give the LLM a computer to do all kinds of stuff against the real world, and kick it off with a high-level goal like “build a startup”.

    The key is to instruct it to manage its own memory on its computer, and when the context limit inevitably approaches, programmatically interrupt the LLM loop and instruct it to jot down everything it has for its future self (a sketch of this loop follows the links below).

    It already kinda works today, and I believe AI systems a year from now will excel at this:

    https://dwyer.co.za/static/claude-code-is-all-you-need.html

    https://www.anthropic.com/research/project-vend-1
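
    A minimal sketch of that interrupt-and-hand-off loop (run_agent_step and count_tokens are hypothetical helpers; the threshold is arbitrary):

    ```
    CONTEXT_LIMIT = 200_000  # tokens; assumption
    HANDOFF_PROMPT = ("You are close to your context limit. Write everything "
                      "your future self needs - goal, progress, open threads, "
                      "lessons learned - to memory/handoff.md.")

    def agent_loop(goal: str):
        messages = [{"role": "user", "content": goal}]
        while True:
            if count_tokens(messages) > 0.9 * CONTEXT_LIMIT:  # hypothetical helper
                # Interrupt: force a memory dump, then restart with fresh context.
                run_agent_step(messages + [{"role": "user", "content": HANDOFF_PROMPT}])
                messages = [{"role": "user", "content":
                             goal + "\n\nFirst, read memory/handoff.md."}]
                continue
            # run_agent_step = one LLM call plus tool executions (hypothetical)
            if run_agent_step(messages):
                break
    ```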

  10. Yes, number-wise the wealth gap between the top and median is bigger than ever, but the actual quality-of-life difference has never been smaller — Elon and I probably both use an iPhone, wear similar T-shirts, mostly eat the same kind of food, and get our information & entertainment from Google/ChatGPT/YouTube/X.

    I fully expect the distribution to be even more extreme in an ultra-productive AI future, yet nonetheless, the bottom 50% would have their every need met in the same manner that Elon has his. If you ever want anything or have something more ambitious in mind, say, start a company to build something no one’s thought of — you’d just call a robot to do it. And because the robots are themselves developed and maintained by an all-robot company, it costs nobody anything to provide this AGI robot service to everyone.

    A Google-like information query would have been unimaginably costly to execute a hundred years ago, and here we are, it’s totally free because running Google is so automated. Rich people don't even get a better Google just because they are willing to pay - everybody gets the best stuff when the best stuff costs 0 anyway.

  11. As long as we have a free market, nobody gets to say, “No, you shouldn’t have robots freeing you from work.”

    Individual people will decide what they want to build, with whatever tools they have. If AI tools become powerful enough that one-person companies can build serious products, I bet there will be thousands of those companies taking a swing at the “next big thing” like humanoid robots. It’s only a matter of time before those problems all get solved.

  12. AI services are widely available, and humans have agency. If my boss can outsource everything to AI and run a one-person company, soon everyone will be running their own one-person companies to compete. If OpenAI refuses to sell me AI, I’ll turn to Anthropic, DeepSeek, etc.

    AI is raising individual capability to a level that once required a full team. I believe it’s fundamentally a democratizing force rather than a monopolizing one. Everybody will try to get the most value out of AI; nobody holds the power to decide whether to share it or not.

  13. My view is that AGI will dramatically reduce the cost of R&D in general, and then developing humanoid robots will be an easy task - since it's AI systems that will be doing the development.
  14. I wouldn’t worry about job safety when we have such a utopian vision as the elimination of all human labor in sight.

    Not only will AI run the company, it will run the world. Remember: a product/service only costs money because somewhere down the assembly line or in some office, there are human workers who need to feed their families. If AI can gradually reduce human involvement to 0, with good market competition (AI can help with this too - if AI can be a capable CEO, starting your own business will be insanely easy), we’ll get near-absolute abundance. Then humanity will basically be printing any product & service on demand at 0 cost, like how we print money today.

    I wouldn’t even worry about unequal distribution of wealth, because with absolute abundance, any slice of the pie is itself an infinite pie. Still think the world isn’t perfect in that future? Just one prompt, and the robot army will do whatever it takes to fix it for you.

  15. > When it’s able to create code that compiles, the code is invariably inefficient and ugly.

    At the end of the day this is a trivial problem. When Claude Code finishes a commit, just spin up another Claude Code instance and say "run a git diff, find and fix inefficient and ugly code, and make sure it still compiles."
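
    A sketch of that wiring, assuming the `claude` CLI's `-p` flag for a one-shot, non-interactive run:

    ```
    import subprocess

    REVIEW_PROMPT = ("Run a git diff, find and fix inefficient and ugly code, "
                     "and make sure it still compiles.")

    # Assumption: `claude -p` runs Claude Code headlessly on a single prompt.
    subprocess.run(["claude", "-p", REVIEW_PROMPT], check=True)
    ```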

  16. If payments are always approved by the payer with their PIN / FaceID, then the idea of a fraudulent charge is just undefined.

    Like when you hand cash to someone: the transaction is done the moment money changes hands. You don't get to call someone to snatch the money back against the payee's will.

    For online purchases, for example: buyer pays the marketplace (e.g. taobao.com) to temporarily hold the money -> seller ships the goods -> buyer confirms goods are received -> marketplace pays seller. If there is a dispute, you take it to the marketplace to sort things out according to marketplace & seller policy. Either way, the payment provider doesn't concern itself with any of this - it only routes money according to the payer's requests.
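
    A toy model of that flow - the marketplace owns the dispute logic as a small state machine, while the payment rail only executes payer-authorized transfers (all names here are illustrative):

    ```
    from enum import Enum, auto

    class Order(Enum):
        HELD_IN_ESCROW = auto()  # buyer paid; marketplace holds the money
        SHIPPED = auto()         # seller shipped the goods
        CONFIRMED = auto()       # buyer confirmed receipt
        SETTLED = auto()         # marketplace released funds to the seller

    # Disputes are settled by the marketplace choosing the next transition;
    # the payment provider below it never claws anything back on its own.
    NEXT = {Order.HELD_IN_ESCROW: Order.SHIPPED,
            Order.SHIPPED: Order.CONFIRMED,
            Order.CONFIRMED: Order.SETTLED}

    def advance(state: Order) -> Order:
        return NEXT[state]
    ```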

  17. Circumvention means you're following the law after all, i.e. you're circumventing, not breaking it.

    Every other YouTuber these days thanks NordVPN for sponsoring their channel and then proceeds to walk through exactly how to use it to view geoblocked Netflix content, and they (and NordVPN!) are fine.

  18. https://cubox.cc

    The greatest feature is that it limits you to 200 saved items on the free tier.

    I also use https://github.com/yfzhou0904/go-to-kindle to email articles to kindle for reading on the go.

  19. Depends a lot on the way people use them.

    If you discuss a plan with CC well upfront, covering all the integration points where things might go off the rails, and perhaps checkpoint the plan in a file before starting a fresh CC session for coding, then CC will usually one-shot a 2k-LoC feature uninterrupted, which is very token efficient.

    If the plan is not crystal clear, people end up arguing with CC over this and that. Token usage will be bad.

  20. This is like asking me "how much of your software is built by the compiler?" -> the answer is 100%.

    Ask "how much did you build then?" -> also 100%.

    The compiler and I operate on different layers.

  21. This is exactly how human society scaled from the caveman era to today. We didn't need to make our brains bigger in order to get to the modern industrial age - increasingly sophisticated tool use and organization was all we did.

    It only mattered that human brains were just big enough to enable tool use and organization; past a certain threshold, brain size ceased to matter. I believe LLMs are past this threshold as well (they haven't 100% matched the human brain and maybe never will, but this doesn't matter.)

    An individual LLM call might lack domain knowledge or context, and might hallucinate. The solution is not to scale the individual LLM and hope the problems are solved, but to direct your query to a team of LLMs each playing a different role: planner, designer, coder, reviewer, customer rep, ... each working with their unique perspective & context.
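
    A minimal sketch of that team idea, with each role getting its own system prompt and seeing the previous roles' work (the roles, prompts, and model id are illustrative):

    ```
    import anthropic

    client = anthropic.Anthropic()

    ROLES = {
        "planner":  "Break the request into a concrete, ordered plan.",
        "coder":    "Implement the plan. Output code only.",
        "reviewer": "Review the code for bugs and style; suggest fixes.",
    }

    def run_team(task: str) -> dict:
        outputs, context = {}, task
        for role, system_prompt in ROLES.items():
            resp = client.messages.create(
                model="claude-sonnet-4-20250514",  # assumption
                max_tokens=2048,
                system=system_prompt,  # each role gets its own perspective
                messages=[{"role": "user", "content": context}],
            )
            outputs[role] = resp.content[0].text
            context += f"\n\n[{role}]\n{outputs[role]}"  # later roles see earlier work
        return outputs
    ```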

  22. I think next year's AI benchmarks are going to be like this project: https://www.anthropic.com/research/project-vend-1

    Give the AI tools and let it do real stuff in the world:

    "FounderBench": Ask the AI to build a successful business, whatever that business may be - the AI decides. Maybe try to get funded by YC - hiring a human presenter for Demo Day is allowed. They will be graded on profit / loss, and valuation.

    Testing a plain LLM on whiteboard-style questions is meaningless now. Going forward, it will all be multi-agent systems with computer use, long-term memory & goals, and delegation.

  23. Me and my LLM buddy together understand exactly how computers work!
  24. Humans are checked against various rules and laws (with enforcement often carried out by other humans.) So this is how it's going to be implemented in an "AI organization" as well. Nothing strange about this really.

    An LLM is easier to work with because you can stop a bad behavior before it happens. This can be done either with deterministic programs or with another LLM. Claude Code uses an LLM to review every bash command before it runs - simple prefix matching has loopholes.
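
    A minimal sketch of such a guard (I don't know how Claude Code implements its reviewer internally; the model id and policy prompt here are assumptions):

    ```
    import subprocess

    import anthropic

    client = anthropic.Anthropic()

    def command_is_safe(cmd: str) -> bool:
        """Ask a small, fast model to flag destructive or exfiltrating commands."""
        resp = client.messages.create(
            model="claude-3-5-haiku-20241022",  # assumption: any small model
            max_tokens=5,
            system=("Answer SAFE or UNSAFE only. UNSAFE means the command "
                    "deletes data, touches credentials, or sends data over "
                    "the network unexpectedly."),
            messages=[{"role": "user", "content": cmd}],
        )
        return resp.content[0].text.strip() == "SAFE"

    def guarded_bash(cmd: str) -> str:
        if not command_is_safe(cmd):
            return "blocked by policy"
        return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    ```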

  25. I can see this makes sense for simple { user_query -> search -> llm_answer } usage, where tool use is only a means to retrieve background info.

    For complex real-world agent flows though, tool use is often the only thing that the LLM is expected to do. Like in a coding agent:

    ```
    User: Develop a program to ...
    Agent: Bash("touch main.py") > 0, ""
    Agent: Edit("main.py", initial_patch) > 0, ""
    Agent: Bash("python main.py") > 1, "SyntaxError: ..."
    Agent: Edit("main.py", fix_patch) > 0, ""
    Agent: Bash("python main.py") > 0, "OK"
    Agent: FINISH
    ```

    Here, tool selection (+ writing the arguments) is actually the whole job. It's also easy to see that if you omit even one of the tool use records in the middle, the agent wouldn't work at all.

  26. US I-94: https://i94.cbp.dhs.gov/search/history-search

    Knowing the passport number + name + birthday gives you access to someone's US travel history.

  27. Ask the AI to document each module in a 100-line markdown file. These should be very high level, containing no detail - just pointers to relevant files for the AI to explore by itself. With such a doc as the starting point, the AI will have enough context to work on any module.

    If a module just can't be documented this way in under 100 lines, it's a good time to refactor. Chances are, if Claude's context window is not enough to work with a particular module, a human dev can't either. It's all about pointing your LLM precisely at the context that matters.

  28. I imagine Iran will just pick a 1000-meter mountain to dig under then?
  29. A sub-agent is another LLM loop that you simply import and provide as a tool to your orchestrator LLM. For example, in Claude Code the sub-agent is a tool called "Task(<description>)" made available to the main LLM (the one that you chat with) along with other tools like patch_file and web_search.

    A concurrent tool call is when the LLM writes multiple tool calls instead of one, and you can program your app to execute them sequentially or concurrently. This is a trivial concept.

    The "agent framework" layer here is so thin it might as well don't exist, and you can use Anthropic/OAI's sdk directly. I don't see a need for fancy graphs with circles here.

  30. It’s not a clear line though. Compilers have been writing programs for us. The plaintext programming language code that we talk about is but a spec for the actual program.

    From this perspective, English-as-spec is a natural progression in the direction we’ve been going all along.
