- This is really nice and a very original take. It feels good on mobile / other touch devices.
I'd love to see it feel a bit more polished on desktop (maybe I'll give that a shot if I find some spare time!). A few simple things, like adding up/down arrows to the picked item and wiring them to up/down arrow key presses, could go a long way to making it work really well there too.
Genuinely, thank you for sharing this, it's something different and interesting.
- Following this logic, why write anything at all? Shakespeare's sonnets are arrangements of existing words that were possible before he wrote them. Every mathematical proof, novel, piece of journalism is simply a configuration of symbols that existed in the space of all possible configurations. The fact that something could be generated doesn't negate its value when it is generated for a specific purpose, context, and audience.
- Consider this (possibly very bad) take:
RAG could largely be replaced with tool use against a search engine. You could keep some of the approach around indexing/embeddings/semantic search, but it just becomes another tool call to a separate system.
How would you feel about becoming an expert in something that is so in flux and might disappear? That might help give you your answer.
That said, there's a lot of comparatively low hanging fruit in LLM adjacent areas atm.
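To make the "RAG becomes just another tool call" idea concrete, here's a minimal sketch. Everything here is illustrative: `search_index` stands in for whatever external search service (keyword or semantic) you'd call, and the tool schema shape is generic rather than tied to any particular LLM provider's API.

```python
# Hedged sketch: RAG reduced to a plain tool call against a search system.
# search_index stands in for any external search engine / vector store;
# all names here are illustrative, not from a specific framework.

def search_index(query: str, top_k: int = 3) -> list[str]:
    """Stand-in for a call to a separate search service (keyword or semantic)."""
    corpus = {
        "umbraco": "Umbraco is a .NET CMS.",
        "rag": "RAG augments prompts with retrieved documents.",
        "xslt": "XSLT transforms XML documents.",
    }
    hits = [text for key, text in corpus.items() if key in query.lower()]
    return hits[:top_k]

# The tool definition the LLM would be handed instead of a bespoke RAG pipeline.
SEARCH_TOOL = {
    "name": "search",
    "description": "Full-text/semantic search over the document index.",
    "parameters": {"query": {"type": "string"}, "top_k": {"type": "integer"}},
}

def handle_tool_call(name: str, args: dict) -> list[str]:
    """Dispatch a model-issued tool call to the search backend."""
    if name == "search":
        return search_index(args["query"], args.get("top_k", 3))
    raise ValueError(f"unknown tool: {name}")

results = handle_tool_call("search", {"query": "what is RAG?"})
```

The retrieval pipeline shrinks to one dispatch function; all the indexing/embedding complexity lives behind the search system's own API.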
- The Umbraco CMS was amazing during the time it used and supported XSLT.
It evaluated the XSLT server-side, which made for a really neat and simple approach.
- I expect it will wind up like search engines where you either submit urls for indexing/inclusion or wait for a crawl to pick your information up.
Until the tech catches up, it will have a stifling effect on progress toward, and adoption of, new things (which imo is pretty common with new/immature tech, eg how culture has more generally kind of stagnated since the early 2000s).
- Except value isn't polarised like that.
In a research context, it provides pointers and keywords for further investigation. In a report-writing context, it provides textual content.
Neither of these, nor the thousand other uses, is worthless. It's when you expect a working and complete work product that it's (subjectively, maybe) worthless, but frankly, aiming for that with current-gen technology is a fool's errand.
- devoir de désobéissance is _duty_ of disobedience.
If they choose to follow orders they know are illegal they can be personally liable.
- AMD's offer was more than fair. Hotz was throwing a tantrum.
- The business model doesn't matter.
I can write something with Microsoft tech and expect it with reasonable likelihood to work in 10 years (even their service-based stuff), but can't say the same about anything from Google.
That alone stops me/my org buying stuff from Google.
- Imo the con is picking the metric that makes others look artificially bad when the underlying performance doesn't seem to be all that different (at least on the surface).
> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one
This surely makes the other models post smaller numbers. I'd be curious how it stacks up when using, eg, 1/1 or 1/4 attempts.
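A quick probability sketch shows how much the choice of metric matters. Assuming independent attempts with a hypothetical per-attempt accuracy p on a given question (an assumption, not a measured figure), 4/4 reliability and 1-of-4 diverge sharply:

```python
# Why "4/4 reliability" posts smaller numbers than a 1-of-4 criterion:
# assume independent attempts with per-attempt success probability p.

def p_all_of(p: float, n: int) -> float:
    """P(correct on all n attempts) -- the stricter 4/4-style criterion."""
    return p ** n

def p_any_of(p: float, n: int) -> float:
    """P(correct on at least one of n attempts) -- the looser criterion."""
    return 1 - (1 - p) ** n

p = 0.7  # hypothetical per-attempt accuracy on a question
print(round(p_all_of(p, 4), 3))  # 4/4 reliability: 0.24
print(round(p_any_of(p, 4), 3))  # 1-of-4: 0.992
```

Same underlying model, same questions, and the headline score moves by a factor of four depending on which criterion you report, which is why comparing numbers across different criteria is misleading.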
- Specifically within the last week, I have used Claude and Claude via cursor to:
- write some moderately complex powershell to perform a one-off process
- add typescript annotations to a random file in my org's codebase
- land a minor feature quickly in another codebase
- suggest libraries and write sample(ish) code to see what their rough use would look like to help choose between them for a future feature design
- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG
- generate some very domain-specific, realistic-sounding test data (just naming)
- scaffold out some PowerPoint slides for a training session
There are likely others (LLMs have helped with research and in my personal life too).
All of these are things that I could do (and probably do better), but I have a young baby at the moment, which means my focus windows are small and I'm time-poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.
- This is great, but I wish there was a shorter and more to the point version for me to link folks to.
Each of the ideas in here is solid, but there's too much writing around the core idea -- a sentence or two for each point and then a tldr like "put in some basic level of effort if you're going to ask for others' valuable time." would do it for me personally.
- The aftermarket for these things means that the cost winds up being split between multiple parties in a lot of cases.
Anecdotally, most parents within my circle bought their Snoo used and sold it after use. I bought an unopened Snoo from Facebook Marketplace for $X and sold it after 6 months for $X-200.
I was a little annoyed that Happiest Baby is meddling with the resale value (because I was expecting to be able to sell it on after a few months of use)
IMO even though the product is overpriced, I'd have happily paid 5k for the extra sleep I believe it gave me.
- My org uses codegen as a starting point for one of our test layers.
It works for us probably because we sidestep the pain points you list - the environments we run in are pristine complete copies of known datasets, we remove as many sources of randomness as possible, and our environment flakiness level is very low.
They still break but usually because the locators in use have been chosen poorly (or we've made planned changes to a page/component)
We're a web-based B2B SaaS that runs an instance of the entire environment for each of our customers. Our non-prod setup consists of a bajillion static test environments, but more importantly we use testcontainers to spin up the transient test environments from database snapshots. Using the recorder on the static environments (before the transient ones existed) _was_ a pain.
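The snapshot-seeded transient environment pattern can be sketched with stdlib code. The key trick with the official postgres image is that any SQL file mounted under `/docker-entrypoint-initdb.d/` runs on first start, restoring the snapshot; the snapshot path and image tag below are illustrative assumptions, not the commenter's actual setup.

```python
from pathlib import Path

# Hedged sketch: seeding a transient test environment from a database snapshot.
# Paths and the image tag are illustrative assumptions.

POSTGRES_INITDB_DIR = "/docker-entrypoint-initdb.d"  # official postgres image convention

def initdb_mount(snapshot_sql: Path) -> tuple[str, str]:
    """Host->container volume mapping that makes the postgres image restore
    the snapshot on first start (scripts in initdb.d run once, in order)."""
    return (str(snapshot_sql.resolve()), f"{POSTGRES_INITDB_DIR}/{snapshot_sql.name}")

host_path, container_path = initdb_mount(Path("snapshots/customer_baseline.sql"))
# With the Testcontainers library this would become roughly:
#   PostgresContainer("postgres:16").with_volume_mapping(host_path, container_path)
# so each test session gets a pristine, disposable copy of the known dataset.
```

Because every run starts from the same dump, the environment is a "pristine complete copy of a known dataset" by construction, which is what keeps the generated tests from flaking on data drift.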
- A level of fear allows the introduction of regulatory moats that protect the organisations who are currently building and deploying these models at scale.
"It's dangerous" is a beneficial lie for eg openai to push because they can afford any compliance/certification process that's introduced (hell, they'd probably be heavily involved in designing the process)
- > Price discrimination is not illegal
The Play store operates in many jurisdictions, including some where this is borderline or could be deemed to be illegal
I hope that some of those jurisdictions start showing some teeth on this sort of anticompetitive behavior
- But it's free and you didn't really answer the question.
Were you looking for a soapbox to stand on?
- More interesting than the original statement is how many people seem to have a chip on their shoulder / take the statement as a personal insult.
The professor is correct in that the majority of Web developers could get by without much theoretical/academic background (and anecdotally, do get by without using those skills/knowledge much)
Maybe there's an industry-wide Dunning-Kruger effect?
- Do you have customers who have faced/solved this problem? If so, how did they do it? It seems like it could be a killer for the approach.