aethelyon
Joined 13 karma
Co-Founder, https://klu.ai

  1. "some" or a single file?
  2. this is fake news, the xml tags break the output when the model output is the system prompt with the example tags, see screenshot: https://x.com/0xSMW/status/1944624089597137214

    same as what happens with claude

  3. comparing o3-pro reasoning to gemini 2.5 pro and claude 4 opus on a speculative, open-ended prompt
  4. No, I’ve seen this pattern as well. It will apologize, and then when you ask it to continue it changes its mind and refuses again. It’s a bad RLHF/AIF loop that it gets stuck in.
  5. Bloop is amazing. Once you use it you stop building your own DIY codebase QA setups.
  6. This is cool, but the data collection is the hard part, right?
  7. Spoiler: it's fast, cheap, overly protective, and has Kafkaesque DX
  8. Spoiler: it's fast, cheap, overly protective, and has Kafkaesque DX
  9. This is awesome, but there were a couple of great laptop interfaces from that movie too. Spent some quality time in the 90s getting AfterStep/Litestep to look like them.
  10. I used to be worried about face scanning. But sometimes I wonder if it's an inevitable evolution of technology.

    Which – to be clear – is not support for it, but a question about what is emergent from the new things we create.

  11. 100% agree, I think the 26% will greatly increase over time... or the ones that don't will decline as a business over time.

    the 13.4% is likely leaders in ML for some specific use case like fraud or recommendations. it would be great to have access to raw data with anonymized demographics.

    we are very, very early.

  12. great data – wish they provided the raw information to slice the respondent audience more, but aligns with what I've seen in the market re: concerns and models.
  13. We benchmarked retrieval, GPT-4 turbo vs GPT-4, and fine-tuned several models: https://klu.ai/blog/openai-devday-2023

    You can use the result of one here https://huberman.klu.ai/

  14. We benchmarked retrieval, GPT-4 turbo vs GPT-4, and fine-tuned several models. You can use the result of one here https://huberman.klu.ai/
  15. We benchmarked retrieval, GPT-4 turbo vs GPT-4, and fine-tuned several models. You can use the result of one here https://huberman.klu.ai/
  16. check out https://klu.ai – we built it for this reason – sign up, book some time, and I'll help you however I can
  17. Microsoft brought GPT-4 to GA for all customers on Azure OpenAI this week. This removes the endless waitlist for some. Wrote up a few notes from our experience with it.
  18. we built https://klu.ai/ for this

    ======

    outside of us, here's what I see happening

    80% of folks aren't building in prod

    if you pull apart the 20% that are building, I've seen this from largest to smallest population:

    1. most people are not monitoring, followed by 2. home-grown solutions logged into existing observe/analytics platforms, followed by 3. LLMOps tooling like Klu

    the 2 cents on the unfortunate truth: I think that many of the AI bolt-on features are living the classic feature lifecycle in that they are launched, no one is monitoring them for improvement, and the feature retention sucks so there's no top-down push to prioritize. the people measuring and improving are exceptional builders regardless of LLMs/RAG.

  19. I started compiling all of the known public information (à la geohotz, semianalysis, et al.) in an attempt to build a model card for GPT-4. Am I missing anything?
  20. seems like it, but no one is talking about it – everyone I ask IRL says performance is bad, but I'm not seeing it in benchmarks

