ehsanu1
1,342 karma
ehsanul@ehsanul.com

I'm actually ehsanul (http://news.ycombinator.com/user?id=ehsanul) here on HN, but I switched Google accounts.


  1. But I doubt you can opt in to them training on that data coming in via OpenCode.
  2. I understand them not wanting to allow non-coding agents to use the subscription, but why specifically block another coding agent? Is the value Anthropic gets from users specifically using Claude Code that high? Is it about the training data opt-ins?
  3. What exactly do you mean by custom tools here? Just CLI tools accessible to the agent?
  4. Doing something merely requires I/O. Brains wouldn't be doing much without that. A sufficiently accurate simulation of a fundamentally computational process is really just the same process.
  5. The DB specifically, or the concept of event sourcing? Event sourcing is not a new approach, and it has a lot of similarities with Temporal's approach, though Temporal events are not necessarily business events, and deterministic event replay is required with Temporal. In the general case of event sourcing, arbitrary processing might be done on the event stream to produce some final state, or to do whatever needs to happen for your use case. As long as you're persisting the events and using them as the basis for your business logic and state, you're doing event sourcing.

    I don't know anything about this specific DB, though, if that was what you were wondering about; that's more of an implementation-level detail. The Temporal server just uses regular MySQL and supports multiple storage backends.
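
    To make the concept concrete, here's a minimal event-sourcing sketch (event types and amounts made up): state is never stored directly, it's derived by folding over the persisted event log.

    ```python
    # Minimal event-sourcing sketch: the append-only event log is the source
    # of truth; current state is derived by replaying (folding over) events.
    from dataclasses import dataclass

    @dataclass
    class Deposited:
        amount: int

    @dataclass
    class Withdrew:
        amount: int

    def apply(balance: int, event) -> int:
        # Pure transition function: old state + event -> new state.
        if isinstance(event, Deposited):
            return balance + event.amount
        if isinstance(event, Withdrew):
            return balance - event.amount
        return balance

    # The persisted log (in practice, an append-only table or stream).
    log = [Deposited(100), Withdrew(30), Deposited(5)]

    # Rebuild current state at any time by replaying from scratch.
    balance = 0
    for event in log:
        balance = apply(balance, event)
    print(balance)  # 75
    ```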

  6. This seems like a good template to generate synthetic data, with positive/negative examples, allowing an embedding model to be aligned more semantically to underlying concepts.

    Anyways, I'd hope reranking models do better; have you tried those?
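
    As a sketch of what I mean, assuming the classic sentence-transformers training API and entirely made-up example texts: generate (anchor, positive, negative) triplets from the template, then fine-tune the embedding model on them.

    ```python
    # Sketch: fine-tune an embedding model on synthetic triplets generated
    # from a template. Texts here are invented for illustration.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Each example is (anchor, positive, negative).
    train_examples = [
        InputExample(texts=[
            "reset a user's password",
            "help, I can't log in to my account",  # same underlying concept
            "how do I delete my account?",         # superficially similar, different concept
        ]),
    ]

    loader = DataLoader(train_examples, shuffle=True, batch_size=16)
    loss = losses.TripletLoss(model)
    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
    ```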

  7. Do you assign different responsibilities to different LSP servers when there are multiple, I suppose?
  8. Using a Research->Plan->Implement flow is orthogonal, though I notice parts of it do exist as skills too. But you sometimes need to do other things as well, e.g. debugging in the course of implementing, or specific techniques to improve brainstorming/researching.

    Some of these skills are probably better as programmed workflows that the LLM is forced to go through, to improve reliability/consistency, rather than using English to guide the LLM and trusting it to follow the prescribed set of steps; that's what I've found in my own agents. Some mix of LLMs (choosing skills, executing the fuzzy parts of them) and plain code (orchestrating the skills) seems like the best bet to me, and is what I'm pursuing.
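
    A rough sketch of that split (all names hypothetical): plain code owns the control flow, so the LLM can't skip or reorder stages, and it is only called for the fuzzy parts.

    ```python
    # Hypothetical sketch: deterministic code enforces the workflow;
    # the LLM only fills in the fuzzy steps within each stage.
    def llm(prompt: str) -> str:
        # Stub: swap in a real model call (OpenAI, Anthropic, local, ...).
        return f"<llm output for: {prompt[:40]}>"

    def research(task: str) -> str:
        return llm(f"List the files, APIs and constraints relevant to: {task}")

    def plan(task: str, notes: str) -> str:
        return llm(f"Write a step-by-step plan for: {task}\nNotes:\n{notes}")

    def implement(task: str, steps: str) -> str:
        return llm(f"Carry out this plan for {task}:\n{steps}")

    def run(task: str) -> str:
        # Orchestration is ordinary code: stages always run, in this order.
        notes = research(task)
        steps = plan(task, notes)
        return implement(task, steps)

    print(run("fix the flaky integration test"))
    ```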

  9. Seeing some stats would be fun. I wonder what the amount of data is here. And the distribution would be interesting too, especially since some pages are archived at multiple points in time, and pages have been getting heavier these days.
  10. I see no conflict between AGPL and SaaS: https://opensource.stackexchange.com/a/12988
  11. Are these actually different models vs just different names from the open weights releases?
  12. I'm reading: the difference is that this is an agent-as-a-judge rather than an LLM-as-a-judge, paired with more structured judging parameters. Is that right? Is the agent just a loop over each criterion, or is it also somehow reflecting on its own judging, or similar?
  13. I believe that's exactly the point: it's too easy to violate constraints like not allowing multiple mutable references. Unsafe is meant for cases where the validity of the code is difficult to prove with rust's lifetime analysis, but can be abused to do much more than that.
  14. It's hard to attribute PR merge rate to higher tool quality here. Another likely factor is the complexity of the task. Just looking at the first PR I saw from the GitHub search for Codex PRs, it was a one-line change that any tool, even years ago, could have easily accomplished: https://github.com/maruyamamasaya/yasukaribike/pull/20/files
  15. Where I work, our legal department requires making use of LLMs only through our own contractual relationships with model providers. Given that, BYOK is table stakes for me at least.

    LiteLLM is what we use internally, so we can support any LLM backend with any open-source tool, and create virtual keys for each developer to monitor and manage usage limits, etc.
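
    Concretely, since the LiteLLM proxy exposes an OpenAI-compatible endpoint, any client that takes a base URL and key can point at it, and the virtual key is what carries each developer's budget and rate limits. Roughly (URL, key, and model alias made up):

    ```python
    from openai import OpenAI

    # Any OpenAI-compatible client can talk to the LiteLLM proxy. The api_key
    # here is a per-developer LiteLLM "virtual key" with its own limits.
    client = OpenAI(
        base_url="https://llm-proxy.internal.example.com",
        api_key="sk-litellm-virtual-key",
    )

    resp = client.chat.completions.create(
        model="claude-sonnet",  # alias mapped to a real backend in the proxy config
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)
    ```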

  16. There seem to be a couple of field-specific journals of negative results for similar purposes. It seems like there should be value in citing negative results to inform current research. Perhaps if there were more journals dedicated to this, or a single one not limited to specific fields, there would still be some incentive to publish there, provided the effort required was low enough (another area where AI might be applied: writing it up).
  17. It's the other way around on their new SWE-Lancer benchmark, which is pretty interesting: GPT-4.5 scores 32.6%, while o3-mini scores 10.8%.
  18. IMO just a rolling message history works for only the simplest of AI tools. Useful agents will tend towards much more complex state that extends into specific verticals/domains.
  19. Essentially, you don't need to think about time and space. You just write more or less normal looking code, using the Temporal SDK. Except it actually can resume from arbitrarily long pauses, waiting as long as it needs to for some signal, without any special effort beyond using the SDK. You also automatically get great observability into all running workflows, seeing inputs and outputs at each step, etc.

    The cost of this is that you have to be careful to make new versions of a workflow backwards compatible, and the backcompat requirements are easy to misunderstand and easy to mess up. There's also additional infra you need to run: the Temporal server. Temporal Cloud isn't cheap at scale, but it does reduce that burden.
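
    A rough sketch of what this looks like with Temporal's Python SDK (simplified; the workflow, activity, and signal names are made up, and worker/activity registration is omitted):

    ```python
    from datetime import timedelta
    from temporalio import workflow

    @workflow.defn
    class ApprovalWorkflow:
        def __init__(self) -> None:
            self.approved = False

        @workflow.signal
        def approve(self) -> None:
            self.approved = True

        @workflow.run
        async def run(self, request_id: str) -> str:
            # Looks like normal code, but each step is durably recorded:
            # the process can crash or be redeployed and resume right here.
            await workflow.execute_activity(
                "notify_reviewers",  # activity implemented/registered elsewhere
                request_id,
                start_to_close_timeout=timedelta(minutes=5),
            )
            # This can wait for days or months; nothing has to stay alive.
            await workflow.wait_condition(lambda: self.approved)
            return f"{request_id} approved"
    ```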

  20. Temporal makes this easy and works great for such use cases. It's what I'm using for my own AI agents.
  21. What does Jetstream lack wrt queues/persistence?
  22. That was my initial position too, but I think there is a search-efficiency story here as well. CoT comes in many flavors and improves when tailored to the problem domain. If the LLM can instead figure out the right problem-solving strategy for a given problem on its own, this may improve performance per unit of compute vs. discovering the strategy at inference time.

    Tailoring prompts is likely still the best way to maximize performance when you can, but in broader domains you'd work around this through strategies like asking the LLM to combine predefined reasoning modules, creating multiple reasoning chains and merging/comparing them, explicit MCTS, etc. I think those strategies will still be useful for a good while, but pieces of that search process, especially directing the search more efficiently, will move into the LLMs over time as they get trained on this kind of data.
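
    The "multiple reasoning chains" strategy, for example, can be as simple as self-consistency-style sampling plus a vote (a sketch with a stubbed model call; swap in a real sampler):

    ```python
    from collections import Counter

    def llm(prompt: str, temperature: float = 0.8) -> str:
        # Stub: sample one reasoning chain ending in "ANSWER: <x>".
        return "...reasoning...\nANSWER: 42"

    def extract_answer(chain: str) -> str:
        return chain.rsplit("ANSWER:", 1)[-1].strip()

    def self_consistency(question: str, n: int = 5) -> str:
        # Sample several independent chains, then let them vote.
        answers = [extract_answer(llm(question)) for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]

    print(self_consistency("What is 6 * 7?"))
    ```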

  23. It's due to how the RLHF and instruction tuning were done. IIRC, even the built-in system prompt works this way in ChatGPT.
  24. How much extra state and computation is it per token exactly? Can we account for the improvement in just those terms?
  25. I've only read the abstract, but also find this strange. I wonder if this is just tapping into the computational chains that are already available when tokens are further away, due to the positional encodings being trained that way. If so, that makes the reasoning/modeling powers of LLMs even more impressive and inscrutable.
  26. I've used usearch successfully for a small project: https://github.com/unum-cloud/usearch/
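
    Basic usage from Python looks roughly like this (dimensions and data made up):

    ```python
    import numpy as np
    from usearch.index import Index

    # 256-dim cosine-similarity index; keys are arbitrary integer ids.
    index = Index(ndim=256, metric="cos")

    vectors = np.random.rand(100, 256).astype(np.float32)
    index.add(np.arange(100), vectors)      # batch insert

    matches = index.search(vectors[0], 10)  # top-10 nearest neighbors
    print(matches.keys, matches.distances)
    ```
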
  27. What kind of best case are you imagining? I don't quite understand why the very best case would be dystopian.
  28. Has the title of the paper changed from what it was initially? It says "Have we built machines that think like people?" now, whereas the HN title is "Large language models lack deep insights or a theory of mind".
  29. If it works, and it's a one-off script, why do I care?
