
brynary
523 karma
Founder of Qlty Software

  1. We're at 100k LOC between the tests and code so far, running in about 500-600ms. We have a few CPU intensive tests (e.g. cryptography) which I recently moved over to the integration test suite.

    With no contention for shared resources and no async/IO, it's just function calls running on Bun (JavaScriptCore), where function call latency is measured in nanoseconds. I haven't measured this myself, but the internet seems to suggest JavaScriptCore function calls can run in 2 to 5 nanoseconds.

    On a computer with 10 cores, fully concurrent, that would imply 10 billion nanoseconds of CPU time in one wall clock second. At 5 nanoseconds per function call, that would imply a theoretical maximum of 2 billion function calls per second.

    Real world is not going to be anywhere close to that performance, but where is the time going otherwise?
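
    As a rough sanity check on those numbers, here is a minimal sketch of how one might measure raw function call overhead on Bun; the `add` function and iteration count are illustrative assumptions, not a rigorous benchmark.

    ```ts
    // Sketch: estimate per-call overhead of a trivial function on Bun.
    // The function and iteration count are illustrative assumptions.
    function add(a: number, b: number): number {
      return a + b;
    }

    const iterations = 100_000_000;
    let sum = 0;

    const start = Bun.nanoseconds();
    for (let i = 0; i < iterations; i++) {
      sum = add(sum, 1);
    }
    const elapsed = Bun.nanoseconds() - start;

    console.log(`per call: ${(elapsed / iterations).toFixed(2)} ns (sum=${sum})`);
    ```

    Keep in mind the JIT will almost certainly inline a call this trivial, so treat whatever this prints as a lower bound rather than a realistic per-call cost.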

  2. Strong agreement with everything in this post.

    At Qlty, we are going so far as to rewrite hundreds of thousands of lines of code to ensure full test coverage and end-to-end type checking (including database-generated types).

    I’ll add a few more:

    1. Zero thrown errors. These effectively disable the type checker and act as goto statements. We use neverthrow for Rust-like Result types in TypeScript (see the sketch after this list).

    2. Fast auto-formatting and linting. An AI code review is not a substitute for a deterministic, sub-100ms result that guarantees consistency. The auto-formatter is set up as a post-tool-use Claude Code hook.

    3. Side-effect-free imports and construction. You should be able to load every code file and construct an instance of every class in your app without opening a network connection. This is harder than it sounds, and without it you run into all sorts of trouble with the rest.

    4. Zero mocks and shared global state. By mocks, I mean mocking frameworks which override functions on existing types or globals. These effectively inject lies into the type checker.
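
    Here is a minimal sketch of the Result style from item 1, using neverthrow's ok/err API; the parsePort function is just an illustration, not code from our codebase.

    ```ts
    import { ok, err, Result } from "neverthrow";

    // A fallible parser that returns a Result instead of throwing,
    // so the error path stays visible to the type checker.
    function parsePort(input: string): Result<number, string> {
      const port = Number(input);
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        return err(`invalid port: ${input}`);
      }
      return ok(port);
    }

    // Callers are forced to handle both branches explicitly.
    const message = parsePort("8080").match(
      (port) => `listening on ${port}`,
      (error) => `config error: ${error}`,
    );
    console.log(message);
    ```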

    Shout out to tsgo, which has dramatically lowered our type checking latency. As the tok/sec of models keeps going up, all the time is going to get bottlenecked on tool calls (read: type checking and tests).

    With this approach we now have near 100% coverage with a test suite that runs in under 1,000ms.

  3. The most interesting parts of this to me are somewhat buried:

    - Claude Code has been added to iOS

    - Claude Code on the Web allows for seamless switching to Claude Code CLI

    - They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers

    However, I find the emphasis on limiting outbound network access somewhat puzzling, because the allowlists invariably include domains like gist.github.com and dozens of others which effectively act as public CMSes and would still permit exfiltration with just a bit of extra effort.

  4. What benefits do you see from having the agent call a CLI like this via MCP as opposed to just executing the CLI as a shell command and taking action on the stdout?
  5. This looks great! Duplication and dead code are especially tricky to catch because they are not visible in diffs.

    Since you mentioned the implementation details, a couple questions come to mind:

    1. Are there any research papers you found helpful or influential when building this? For example, I need to read up on using tree edit distance for code duplication.

    2. How hard do you think this would be to generalize to support other programming languages?

    I see you are using tree-sitter, which supports many languages, but I imagine a challenge might be CFGs and dependencies.

    I’ll add a Qlty plugin for this (https://github.com/qltysh/qlty) so it can be run with other code quality tools and reported back to GitHub as pass/fail commit statuses and comments. That way, the AI coding agents can take action based on the issues that pyscn finds directly in a cloud dev env.

  6. Historically, this kind of test optimization was done with static analysis to understand dependency graphs, with runtime data collected from executing the app, or both.

    However, those methods are tightly bound to programming languages, frameworks, and interpreters so they are difficult to support across technology stacks.

    This approach substitutes the intelligence of the LLM, making educated guesses about which tests to execute, to achieve the same goal: running all of the tests that could fail and none of the rest (balancing a precision/recall tradeoff). What's especially interesting to me is that the same technique could be applied to any language or stack with minimal modification.
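
    As a rough illustration of the idea (not how the tool discussed here is actually implemented), a selector could hand the diff and the candidate test files to a model and ask for the subset to run. The prompt, JSON contract, and injected model client below are all assumptions.

    ```ts
    // Hypothetical sketch of LLM-based test selection.
    type ModelCaller = (prompt: string) => Promise<string>;

    async function selectTests(
      diff: string,
      testFiles: string[],
      callModel: ModelCaller, // inject whatever LLM client you use
    ): Promise<string[]> {
      const prompt = [
        "Given this diff, list the test files that could plausibly fail.",
        "Answer with a JSON array of paths drawn only from the candidates.",
        `Diff:\n${diff}`,
        `Candidates:\n${testFiles.join("\n")}`,
      ].join("\n\n");

      const selected: string[] = JSON.parse(await callModel(prompt));
      // Guard against hallucinated paths: keep only known test files.
      return selected.filter((file) => testFiles.includes(file));
    }
    ```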

    Has anyone seen LLMs in other contexts being substituted for traditional analysis to achieve language agnostic results?

  7. This is similar to a recent post that was on the front page about red team vs. blue team.

    Before running LLM-generated code through yet more LLMs, you can run it through traditional static analysis (linters, SAST, auto-formatters). They aren’t flashy but they produce the same results 100% of the time.

    Consistency is critical if you want to pass/fail a build on the results. Nobody wants a flaky code reviewer robot, just like flaky tests are the worst.

    I imagine code review will evolve into a three tier pyramid:

    1. Static analysis (instant, consistent) — e.g. using Qlty CLI (https://github.com/qltysh/qlty) as a Claude Code or Git hook (example below)

    2. LLMs — Has the advantage of being able to catch semantic issues

    3. Human

    We make sure commits pass each level in succession before moving on to the next.
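
    For tier 1, a Git pre-commit hook can be as small as the sketch below. It assumes a Bun runtime and a `qlty check` style command; substitute whatever linter or formatter you actually run.

    ```ts
    #!/usr/bin/env bun
    // Sketch of a .git/hooks/pre-commit script that blocks the commit
    // when static analysis fails. The `qlty check` invocation is an
    // assumption; swap in your own tool's command.
    const result = Bun.spawnSync(["qlty", "check"], {
      stdout: "inherit",
      stderr: "inherit",
    });

    if (result.exitCode !== 0) {
      console.error("pre-commit: static analysis failed, aborting commit");
      process.exit(1);
    }
    ```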

  8. As an early June customer, I find this a big disappointment. We specifically selected June over Mixpanel and Amplitude and were happy with it.

    I wish there was more honesty in the post about what happened. When you boil down the details, it basically just seems to say the founders decided they would rather become (the X-hundredth) engineers at Amplitude.

    Unless they were running out of money, I don’t see how they’ll have a “bigger impact” doing that instead of building a fresh take on the B2B analytics space.

  9. When using Claude Code cloud, Claude creates commits through the GitHub API rather than the git CLI so that the commits can be signed.
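
    For context, a rough sketch of what committing through the REST API looks like is below; OWNER, REPO, the SHAs, and the token are placeholders, and the flow (tree, commit, ref update) is just the standard Git data API, not anything Claude-specific.

    ```ts
    // Hypothetical sketch of creating a commit via the GitHub REST API
    // instead of the git CLI. All identifiers here are placeholders.
    const API = "https://api.github.com/repos/OWNER/REPO/git";
    const headers = {
      Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
      Accept: "application/vnd.github+json",
      "Content-Type": "application/json",
    };

    async function post(path: string, body: unknown) {
      const res = await fetch(`${API}${path}`, {
        method: "POST",
        headers,
        body: JSON.stringify(body),
      });
      return res.json();
    }

    // Build a tree with the changed file, create a commit pointing at it,
    // then move the branch ref. Commits created this way can come back
    // signed/verified by GitHub, which is the point described above.
    const tree = await post("/trees", {
      base_tree: "PARENT_TREE_SHA",
      tree: [{ path: "README.md", mode: "100644", type: "blob", content: "hello" }],
    });
    const commit = await post("/commits", {
      message: "Update README",
      tree: tree.sha,
      parents: ["PARENT_COMMIT_SHA"],
    });
    await fetch(`${API}/refs/heads/main`, {
      method: "PATCH",
      headers,
      body: JSON.stringify({ sha: commit.sha }),
    });
    ```
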
  10. This can be implemented at the line level if the linter is Git aware
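
    A minimal sketch of what "Git aware" could mean here: diff the staged changes, collect the added line ranges, and only report issues that fall inside them. The Issue shape and the `git diff -U0` parsing are simplified assumptions.

    ```ts
    // Sketch: filter lint issues down to lines actually touched in a diff.
    interface Issue { file: string; line: number; message: string; }

    // Parse unified diff output (e.g. `git diff -U0`) into file -> added lines.
    function changedLines(diff: string): Map<string, Set<number>> {
      const changed = new Map<string, Set<number>>();
      let file = "";
      for (const line of diff.split("\n")) {
        const fileMatch = line.match(/^\+\+\+ b\/(.+)$/);
        if (fileMatch) { file = fileMatch[1]; continue; }
        const hunk = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/);
        if (hunk && file) {
          const start = Number(hunk[1]);
          const count = hunk[2] ? Number(hunk[2]) : 1;
          const lines = changed.get(file) ?? new Set<number>();
          for (let i = 0; i < count; i++) lines.add(start + i);
          changed.set(file, lines);
        }
      }
      return changed;
    }

    // Keep only issues on lines added in the diff.
    function filterToDiff(issues: Issue[], diff: string): Issue[] {
      const changed = changedLines(diff);
      return issues.filter((i) => changed.get(i.file)?.has(i.line) ?? false);
    }
    ```
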
  11. This closes a big feature gap. One thing that may not be obvious is that because of the way Claude Code generates commits, regular Git hooks won’t work. (At least, in most configurations.)

    We've been using CLAUDE.md instructions to tell Claude to auto-format code with the Qlty CLI (https://github.com/qltysh/qlty), but Claude is a bit hit and miss in following them. The determinism here is a win.

    It looks like the events that can be hooked are somewhat limited to start, and I wonder if they will make it easy to hook Git commit and Git push.

  12. @jaimefjorge — Congrats on the launch!

    How would you compare this to the Qlty CLI (https://github.com/qltysh/qlty)?

    Do you plan to support CLI-based workflows for tools like Claude Code and linting?

  13. It's great to see auto-formatting continuing to become universal across all languages. As LLMs write more code, full auto-formatting helps keep diffs clean.

    For anyone looking to try dockerfmt, I just added a plugin to Qlty CLI, which is available in v0.508.0. The plugin took about ten minutes to add: https://github.com/qltysh/qlty/blob/main/qlty-plugins/plugin...

    Full disclosure: I'm the founder of Qlty, which produces a universal code linter and formatter, Qlty CLI (https://github.com/qltysh/qlty). It is completely free and published under a Fair Source license.

  14. This HBR article "Many Strategies Fail Because They’re Not Actually Strategies", while not entirely about metrics, has some great recommendations for how leaders can avoid these pitfalls:

    https://hbr.org/2017/11/many-strategies-fail-because-theyre-...

    Their top recommendations are: A) Communicate the logic behind what you are trying to achieve; B) Make strategy execution a two-way process, not top-down; C) Let selection happen organically, through systems that cause strong initiatives to rise to the top; D) Find ways to make change the default, to help move beyond the status quo and existing habits.

  15. CEO of Code Climate here. We have a product, Velocity (https://codeclimate.com/), which offers what we call Engineering Intelligence. There's some great discussion here about the value and appropriate use of data in software engineering, so I thought I'd chime in.

    What we've seen is that engineers inherently want to be productive, and are happiest when they can work friction-free. Unfortunately, it can be quite difficult to get visibility into roadblocks that slow down developers (e.g. overly nitpicky code review, late changing product requirements, slow/flaky CI), especially for managers who are one or two levels removed from programming. These are situations where data-backed insights can be helpful for diagnosis.

    After diagnosing issues, with data or simply qualitative insights from a retrospective or 1:1, we also see teams sometimes struggle to set goals and achieve desired improvements. A common theme is the recurring retrospective item that people agree is important but doesn't seem to be resolved. When it comes to implementing improvements, data can be useful to make objectives concrete and make progress visible to the entire team.

    It’s important that metrics do not become the objectives themselves, but rather serve as a way to demonstrate the true outcome was achieved. Metrics also are not a strategy, and quantitative data cannot be used alone to understand performance of teams.

    When quantitative data is used properly in combination with qualitative information, strong communication, and trust, we’ve found the results can go beyond what can be achieved without metrics.

  16. (Founder of Code Climate here.)

    This is really good feedback, which we are addressing. We're going to change things up so that by default TODO issues are emitted as "Info" severity instead of "Minor", and we are going to change our PR integration so it does not fail PRs on "Info" issues.

    As an aside, on Tuesday we launched a Grep engine, which is much more powerful than FIXME: https://codeclimate.com/changelog/58ecfa297705a149790008b2

    It allows full customization of the emitted issues.

  17. Glad to hear you're thinking about this. Understanding the complexity provides power, but the tradeoff in this instance doesn't feel quite right.

    For example, etcd is a powerful primitive, and then more complex/sophisticated systems can be built on top of it.

    I wonder if an 80/20 solution that is simpler than fleet/systemd for pushing work into a CoreOS cluster would be a win, and then more complex systems (e.g. Kubernetes-esque orchestration) could live on top of that.

  18. I'm very excited about this release. CoreOS, Docker and etcd are a great fit for one another. I love the separation of concerns that is provided.

    IMHO, the weakest part of CoreOS is fleet (https://github.com/coreos/fleet). Compared to the other components in the stack, it just feels very inelegant. The systemd configuration syntax is complex and ugly. I wonder if there will be work invested to upgrade fleet to something that is as elegant as e.g. etcd/Docker/CoreOS itself.
