
pcwelder
348 karma

  1. ```python
    from anthropic.types import MessageParam

    data: list[MessageParam] = [{"role": "user", "content": [{"type": "text", "text": ""}]}]
    ```

    This, for example, works in both mypy and pyright. (Also, the autocompletion of TypedDict keys / literals that Pylance offers is missing.)

  2. Displaying inferred types inline is a killer feature (inspired by the Rust language server?). It was a pleasant surprise!

    It's fast too, as promised.

    However, it doesn't work well with TypedDicts, and that's a show-stopper for us. Hoping to see that support soon.
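
    For anyone who hasn't seen the feature, a rough illustration (the exact hint text varies by editor):

    ```python
    # With inlay hints enabled, the editor annotates inferred types inline:
    nums = [x * x for x in range(5)]      # hint: nums: list[int]
    table = {str(n): n for n in nums}     # hint: table: dict[str, int]
    ```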

  3. To those who are not deterred and feel yolo mode is worth the risk, there are a few patterns that should make your ears perk up.

    - Cleanup or deletion tasks. Be ready to hit Ctrl-C at any time. This has led to disastrous nukes in two Reddit threads.

    - Errors impacting the whole repo, especially ones that are difficult to solve. In such cases, if it decides to reset and redo, it may remove sensitive paths as well.

    It removed my repo once because "it had multiple problems and it was better to write it from scratch".

    - Any weird behavior ("this doesn't seem right", "looks like the shell isn't working correctly") indicative of an application bug. It might employ dangerous workarounds.

  4. It just fetched the HTML and replicated it. The use of a table is a giveaway.

    Any LLM with a browser tool can do it (Kombai one-shots it too, for example), because it's just cheating.

  5. But that's cheating, because it then has the source code containing the table and its styles.

    I can confirm that this is what it does.

    And if you ask it not to use tables, it cleverly uses divs with the same layout as the table instead.

  6. In RNNs and Transformers we obtain the probability distribution of the target variable directly and sample from it using methods like top-k or temperature sampling.

    I don't see the equivalence to MCMC. It's not like we have a complex probability function that we are trying to sample from using a chain.

    It's just logistic regression at each step.
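
    For concreteness, a minimal sketch of that direct sampling step (my own toy code, not any particular model's):

    ```python
    import numpy as np

    def sample_next_token(logits: np.ndarray, temperature: float = 0.8, k: int = 50) -> int:
        # Temperature reshapes the distribution, top-k truncates it;
        # one draw from the normalized result, no Markov chain involved.
        scaled = logits / temperature
        top = np.argsort(scaled)[-k:]  # indices of the k most likely tokens
        probs = np.exp(scaled[top] - scaled[top].max())
        probs /= probs.sum()
        return int(np.random.choice(top, p=probs))
    ```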

  7. I ϲаn guаrаntее thаt thе ОСR ϲаn't rеаd thіs sеntеnсе ϲоrrесtlу.
  8. There are many Unicode characters that look alike. There are also those zero-width characters.
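
    A quick demonstration (with explicit escapes so nothing gets normalized in transit):

    ```python
    latin, cyrillic = "a", "\u0430"   # Latin 'a' vs Cyrillic 'а': same glyph, different codepoint
    print(latin == cyrillic)          # False
    print(ord(latin), ord(cyrillic))  # 97 1072

    s = "pay\u200bpal"                # zero-width space hidden inside
    print(s, len(s))                  # renders like "paypal", but the length is 7
    ```
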
  9. It doesn't throw an error in the REPL, though. Surely you meant to share some other example?
  10. >if they don't whatever happens, happens

    What happens is you get an error. So you immediately know something is wrong.

    JavaScript goes the extra mile to avoid throwing errors.

    So you have `3 > "2"` succeeding in JavaScript while it's an exception in Python. This behavior leads to hard-to-catch bugs in the former.

    Standard operators and methods have runtime type checks in Python, and that's what the examples in the article are replicating.
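
    For instance:

    ```python
    # Python's comparison operators type-check their operands at runtime:
    try:
        3 > "2"
    except TypeError as e:
        print(e)  # '>' not supported between instances of 'int' and 'str'
    ```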

  11. Navigate to `https://claude.ai/settings/data-privacy-controls` and disable it before Sept 28. It isn't applicable to the Team plan.
  12. Agree. To reduce costs:

    1. Precompute frequently used knowledge and surface it early: for example, repository structure, OS information, system time.

    2. Anticipate the next tool calls. If a match is not found while editing, instead of simply failing, return the closest matching snippet (see the sketch after this list). If the read-file tool gets a directory, return the directory contents.

    3. Parallel tool calls. Claude needs either a batch tool or special scaffolding to encourage parallel tool calls; a single tool call per turn is very expensive.

    Are there any other such general ideas?
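
    For idea 2, a sketch of the closest-match fallback (`edit_file` and its signature are hypothetical, and line-by-line matching is a simplification):

    ```python
    import difflib

    def edit_file(path: str, old: str, new: str) -> str:
        text = open(path).read()
        if old in text:
            open(path, "w").write(text.replace(old, new, 1))
            return "OK"
        # Instead of failing outright, surface the closest snippet so the
        # model can correct itself in a single extra turn.
        close = difflib.get_close_matches(old, text.splitlines(), n=1, cutoff=0.3)
        return f"No exact match. Closest line: {close[0]!r}" if close else "No match found."
    ```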

  13. >I found, a required sample size for just one thousand people would be 278

    It's interesting to note that for a billion people this number changes to a whopping ... 385. Doesn't change much.

    I was curious: with a sample size of 22 (assuming an unbiased sample, yada yada), estimating the proportion of people satisfying a criterion gives a margin of error of about 22%.

    While bad, if done properly, it may still be insightful.
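
    The arithmetic, for anyone who wants to check (standard sample size formula for a proportion at 95% confidence, worst case p = 0.5, with the finite population correction):

    ```python
    import math

    def sample_size(population: int, moe: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
        n0 = (z * math.sqrt(p * (1 - p)) / moe) ** 2        # infinite-population size
        return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite population correction

    print(sample_size(1_000))          # 278
    print(sample_size(1_000_000_000))  # 385
    ```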

  14. I get similar accuracy to Claude Code using the Claude desktop app with a file+bash MCP (different tools, same performance).

    My guess for why GPT-5 scores higher on benchmarks is that they evaluate on well-defined tasks with all instructions given at the start.

    Real life is multi-turn, with multiple sets of prompts to adhere to. This is where Claude is likely better.

  15. To be absolutely honest, this wasn't a very conscious choice :-)

    A direct similarity with domain-specific languages isn't evident to me. I rather find the messaging similar to that of some "agents" from other domains, e.g. https://www.harvey.ai/

  16. PSA: don't generate code using tools (and MCPs) if you're using Gemini or OpenAI; both ask LLMs to generate JSON directly for function calling. Claude uses XML, so it escapes the issue.
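
    To illustrate the escaping burden (my own example):

    ```python
    import json

    snippet = 'if x > 0:\n    print("ok")\n'
    # JSON-based function calling makes the model emit code in this escaped
    # form, one stray backslash away from a parse error:
    print(json.dumps({"code": snippet}))
    # {"code": "if x > 0:\n    print(\"ok\")\n"}
    ```
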
  17. Losing the sense of the cwd is why I append it to the output of each command run in the wcgw MCP [1]; a toy sketch of the trick is at the end of this comment.

    It rarely gets the directory wrong after that.

    I won't be surprised if Claude Code does the same soon.

    However, they do have an env flag called CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1

    This should also fix the wrong-directory behavior.

    [1] https://github.com/rusiaaman/wcgw
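
    A toy version of the trick (not wcgw's actual implementation):

    ```python
    import os
    import subprocess

    class ToyShell:
        def __init__(self) -> None:
            self.cwd = os.getcwd()

        def run(self, cmd: str) -> str:
            # Run the command, then have the shell print its final working
            # directory on its own line; track it and append it to the output
            # so the model is reminded of the cwd after every call.
            proc = subprocess.run(cmd + "; echo; pwd", shell=True, cwd=self.cwd,
                                  capture_output=True, text=True)
            *output, self.cwd = proc.stdout.splitlines()
            return "\n".join(output) + proc.stderr + f"\n---\ncwd: {self.cwd}"
    ```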
