- IIRC, AvatarMUD (avatar.outland.org) has 20,000+ rooms in it. It's been a long time since I played, but it's absolutely massive!
- That's exactly what the pin-pen merger is! As you know, it's not limited to pin/pen, and hearing ability (in my case, profound hearing loss) is not related to the ability to hear the difference. I don't understand the linguistics, but my very bad understanding is that there's actual brain chemistry here that means that you _can't_ hear the difference because you never learned it, never spoke it, and you pronounce them the same.
My partner is from the PNW and she pronounces "egg" as "ayg" (like "ayyyy-g") but when I say "egg" she can't hear the difference between what I'm saying and what she says. And she has perfect hearing. But she CAN hear the difference between "pin" and "pen", and she gets upset when I say them the same way. lol
But yeah, that's one of the things that makes accents accents. It's not just the sounds that come out of our mouths but the way we hear things, too. Kinda crazy. :)
- I'm also deaf, and I took 14 years of speech therapy. I grew up in Alabama. The only way you would know I'm from the South is because of the pin-pen merger[1]. Otherwise, you'd think I grew up in the American Midwest, due to how my speech therapy went. Almost nobody picks up on it, unless they are linguists that already knew about the pin-pen merger.
[1]https://www.acelinguist.com/2020/01/the-pin-pen-merger.html
- This is WILD. I love it. Congrats on shipping!
- Oh, interesting. Well, TIL.
- Does that even exist? It's basically what they described but with some additional installation? Once you install it, you can select the LLM on disk and run it? That's what they asked for.
Maybe I'm misunderstanding something.
- You should install it, because it's exactly what you just described.
Edit: From a UI perspective, it's exactly what you described. There's a dropdown where you select the LLM, and there's a ChatGPT-style chatbox. You just docker-up and go to town.
Maybe I don't understand the rest of the request, but I can't imagine software where a webpage exists and it just magically has LLMs available in the browser with no installation?
- Probably because it's intentional. There are many theories why, but one might be that by saying "You're absolutely right," they are priming the LLM to agree with you and be more likely to continue with your solution than to try something else that might not be what you want.
- Sort of in the same vein as "you don't need to understand gravity to recognize that it's important," you not understanding what the answer is doesn't mean there isn't an answer and that the answer isn't important.
I don't have "the answer" (I have _an_ answer, see below), and I also don't need to know "the answer" in order to understand that the managerial class doesn't exist for shits and giggles. There's value there.
If Bezos thought getting rid of managers at Amazon would make him another half a billion dollars, you bet your ass he'd do it.
My answer? It's exactly what that group of seniorish people would do: make decisions. But the seniorish people can't make decisions all day and ALSO do the things they're senior at. You may not like that answer -- and you don't have to! -- but "making decisions" is something that needs to get done at scale without sacrificing the actual productive work that ICs do.
But again, I think you're asking a great question, and I think there's room to say "Is the current paradigm the best paradigm?" and explore other alternatives.
But the very clear answer from all research in addition to basic intuition is "As far as we know, yes."
Why? Doesn't really matter. We just know that if we didn't have managers, the world as we know it wouldn't exist. (For better or for worse!)
- But the larger point is still true: For every "why don't they just do X? It's so obvious!" you can look around and note that [almost] /nobody/ is doing that, and that should be a pretty big signal about the idea.
My original post is about the intuition behind how to approach questions like that. Whenever anyone says "Why don't they just do $OBVIOUS_THING?" the answer is "because nobody is doing it."
Now, with respect to that particular feature, I can provide some personal experience to explain /why/ [almost] nobody has that feature. As a disclaimer, I don't know HEB's website. They were just someone we dealt with, and I don't live in their service area, so it's interesting that they have the feature.
What I can tell you from experience is that it would not be a significant driver of revenue, certainly not enough to be a majorly supported feature by a major company that has other value props out there. By far the biggest revenue drivers for a grocery company are the staples that people buy every single week: the same milk, the same bread, the same cereal, the same ground beef, the same mac 'n cheese.
People, as a general population, are not adventurous at home. When you want something new and interesting, you go out to a restaurant. When you want something familiar and comfortable and, most importantly, easy, you make it at home. I would hazard that the number of times that the average American family of four would cook a brand new recipe they've never had before is probably less than a dozen times /per year/.
So if you're a company that gets >90% of its revenue from weekly recurring users and staples, and <1% of its revenue from recipe-driven results, and you have limited resources, which of those do you think you should focus on? Obviously, you focus on the former. A 10% increase in staples sales is worth millions and millions of dollars, whereas a 10% increase in recipe sales is worth, maybe, a few hundred thousand dollars. It's not nothing, but it's not really worth it from an ROI perspective. Maybe if it's a set-it-and-forget-it kind of feature, it might work?
But over the long run you'll have to come back and upgrade dependencies and migrate to the newest framework du jour, and blah blah blah, and the next thing you know you have a team of four full-time engineers working on a feature that brings in half their salary.
HEB doing it is, well, interesting. I am supremely confident it is not a significant revenue driver. It might be something that increases NPS scores or something to that effect, but it's not going to move the needle on revenue very much.
So if you take all of that into account -- you'll just have to trust me that I know what I'm talking about, I'm sorry about that -- then you can see that someone saying "What if we just got senior people in a room to see what they think?" is a question that doesn't deserve much attention. Not because it's a dumb idea -- it's actually an interesting idea -- but because it has no merit once you dig down and look at it.
If it worked, as the intuition goes, the industry at-large would be doing it. And the evidence bears that out.
- I think you should ask yourself that last question again, and this time really think about it. Why /do/ all companies seem to have managers?
(Tone clarification: I'm not approaching this in a condescending manner, but more of a "let's talk through this problem out loud and see where it gets us." So please don't take this as condescension.)
One way to think about "obvious" solutions to problems, such as a "no manager" solution, is this: if it's so obvious, why is no one doing it? For example, I worked for a grocery delivery startup for a while. Every single new hire, without fail, would show up at the end of the first week and say "I have a great idea, why don't we let users shop by recipe?"
On its face, it sounds like a brilliant idea! One intuition-based shortcut to find the answer is: if that's such an obvious thing, why doesn't Amazon or Kroger or Safeway or HEB or any of the major grocery chains let you do that?
And of course, the answer is: that's not how users shop. If it worked, the big players would be doing it. They're not. Are you smarter than Amazon? Probably not. That's not to say a smaller group can't innovate past Amazon, but Amazon has some /really fucking smart people/ working for them, and the odds are fantastically small that you'll out-think them. (You can certainly out-_pivot_ them by doing something faster than they can, but if it turns out to be valuable, in the long run, they'll do it too.)
So when you approach a conversation like this and say, "Maybe just see what a group of seniorish people think?", one way to do a quick sanity check on it is: can you think of successful companies that are run that way?
You probably can't. I certainly can't.
There's a similar problem in the theatre world. It is universally understood that someone doing a 60 second monologue for an audition is _the worst way to evaluate theatrical performance_... except for everything else.
And similarly, it appears, based on scanning the successful companies, that having managers is possibly also the worst way to ensure performance... except for everything else.
So... managers it is. It's unlikely that there's a better way to do this at scale. Many people have tried. Management chains always win.
- I'll second this. It's fantastic.
- I don't think I'm making the case that you shouldn't test things or care about the results, but rather that it's a matter of what degree of risk should be acceptable. In medicine, if you get it wrong, people /die/. In software, if you get it wrong, /you sell fewer widgets/. That's a pretty major difference. You can't get it wrong in medicine, but you /can/ get it wrong in software without it being a catastrophic failure.
I'm basically making the case that "Your startup deserves the same rigor [as medical testing]" is making a pretty bold assertion, and that the reality is that most of us can get away with much less rigor and still get ahead in terms of improving our outcomes.
In other words, it's still A/B testing if your p-value is 0.10 instead of 0.05. There's nothing magical about the 0.05 number. Most startups could probably get away with a 20% chance of being wrong on any particular test and still come out ahead. (Note: this assumes that the thing you're testing is good science -- one thing we aren't talking about is how many tests are actually changing many variables at once, and maybe that's not great!)
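To make the tradeoff concrete, here's a back-of-envelope sketch of how much a looser significance threshold shrinks the sample you need. It uses the standard two-proportion z-test approximation (the baseline rate, lift, and thresholds are illustrative numbers, not anything from a real test):

```python
from statistics import NormalDist

def samples_per_arm(p_base: float, lift: float, alpha: float, power: float = 0.8) -> int:
    """Rough per-arm sample size for detecting `lift` over a baseline
    conversion rate `p_base` with a two-sided z-test. Back-of-envelope
    only -- a real power calculation has more nuance."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = nd.inv_cdf(power)           # quantile for desired power
    p_var = p_base * (1 - p_base)        # variance approximated at baseline
    return int(2 * (z_alpha + z_beta) ** 2 * p_var / lift ** 2)

# Detecting a 1-point lift on a 10% baseline conversion rate:
strict = samples_per_arm(0.10, 0.01, alpha=0.05)
loose = samples_per_arm(0.10, 0.01, alpha=0.20)
print(strict, loose)  # the looser threshold needs roughly 40% fewer samples
```

Same test, same lift -- accepting a 20% false-positive risk instead of 5% gets you an answer in roughly half the traffic, which for a low-volume startup can be the difference between deciding this week and deciding next quarter.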
- Absolutely. If you're the business owner, selling fewer widgets is Very Bad!
But in my post, I specifically called out a line in OP's article that I disagreed with: (paraphrasing) "Your startup deserves the same rigor as medical testing."
To clarify -- and support your point -- we're shipping software, not irreversible medical procedures. If you get it wrong, you sell fewer widgets /temporarily/ and you revert back to a known better solution. With medicine, there aren't necessarily take-backsies -- but there absolutely are in software. Reverting deploys is something all of us do quite regularly!
Is it A/B testing? Maybe, maybe not. I'm not a data scientist. But I think saying that your startup deserves the same rigor as a medical test is misleading at best and harmful at worst.
I just think companies should be more okay with educated risks, rather than waiting days, weeks, months for statistical significance on a feature that has little chance of actually having a negative impact. As you said elsewhere in the thread, for startups, stasis is death.
(BTW, I've read a lot of your other comments in the thread. I think we're pretty well aligned!)
- I think you're being overly pedantic here. I'm not a data scientist, just an engineering manager who is frustrated with data scientists ;)
That said, I do appreciate your corrections, but I don't think anything you said fundamentally changes my philosophical approach to these problems.
- Excellent response. Thank you!
- I think there's a pretty big difference between QA (letting bugs go by) and A/B testing, and your post appears to me to be conflating the two. I would argue that you are better off spending your time QAing a feature that you have high confidence is positive ROI, than spending weeks waiting for an A/B test to reach stat sig.
I don't disagree with your statement, I just think you are addressing a different problem from A/B testing and statistical significance.
- > This isn't academic nit-picking. It's how medical research works when lives are on the line. Your startup's growth deserves the same rigor.
But does it, really? A lot of companies sell... well, let's say "not important" stuff. Most companies don't cost peoples' lives when you get it wrong. If you A/B test user signups for a startup that sells widgets, people aren't living or dying based on the results. The consequences of getting it wrong are... you sell fewer widgets?
While I understand the overall point of the post -- and agree with it! -- I do take issue with this particular point. A lot of companies are, arguably, _too rigorous_ when it comes to testing.
At my last company, we spent 6 weeks waiting for stat sig. But within 48 hours, we had a positive signal. Conversion was up! Not statistically significant, but trending in the direction we wanted. But to "maintain rigor," we waited 6 weeks before ending the test... and the final numbers were virtually the same as the 48-hour numbers.
Note: I'm not advocating stopping tests as soon as something shows trending in the right direction. The third scenario on the post points this out as a flaw! I do like their proposal for "peeking" and subsequent testing.
But, really, let's just be realistic about what level of "rigor" is required to make decisions. We aren't shooting rockets into space. We're shipping software. We can change things if we get them wrong. It's okay. The world won't end.
IMO, the right framing here is: your startup deserves to be as rigorous as is necessary to achieve its goals. If its goals are "stat sig on every test," then sure, treat it like someone might die if you're wrong. (I would argue that you have the wrong goals, in this case, but I digress...)
But if your goals are "do no harm, see if we're heading in the right vector, and trust that you can pivot if it turns out you got a false positive," then you kind of explicitly don't need to treat it with the same rigor as a medical test.
- Certainly. There are many paths to victory here.
One thing to consider is whether you _want_ your producers to be aware of the clients or not. If you use SQS, then your producer needs to be aware of where it's sending the message. In event-driven architecture, ideally producers don't care who's listening. They just broadcast a message: "Hey, this thing just happened." And anyone who wants to subscribe can subscribe. The analogy is a radio tower -- the radio broadcaster has no idea who's listening, but thousands and thousands of people can tune in and listen.
Contrast to making a phone call, where you have to know who it is that you're dialing and you can only talk to one person at a time.
There are pros and cons to both, but there's tremendous value in large applications for making the producer responsible for producing, but not having to worry about who is consuming. Particularly in organizations with large teams where coordinating that kind of thing can be a big pain.
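The radio-tower idea above can be sketched in a few lines. This is a toy in-process broker, not any particular product's API -- the topic name and handlers are made up for illustration:

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Toy pub/sub broker: producers publish to a topic without knowing
    who (if anyone) is subscribed -- the 'radio tower' model."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer's only job: announce that the thing happened.
        for handler in self._subscribers[topic]:
            handler(event)

broker = Broker()
received = []
# Two teams subscribe independently; the producer never names either one.
broker.subscribe("order.created", lambda e: received.append(("email", e)))
broker.subscribe("order.created", lambda e: received.append(("analytics", e)))
broker.publish("order.created", {"order_id": 123})
print(received)  # both handlers saw the same event
```

In the point-to-point (phone call) model, `publish` would instead deliver to exactly one known queue, and adding a second consumer means changing the producer. Here, a new team just calls `subscribe` and the producer's code never moves.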
But you're absolutely right: queues/topics are basically free, and you can have as many as you want! I've certainly done it the SQS way that you describe many times!
As I mentioned, there are many paths to victory. Mine works really well for me, and it sounds like yours works really well for you. That's fantastic :)
We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).
I think the "human readable" part is often an overlooked pro by hardcore protobuf fans. One of my fundamental philosophies of engineering historically has been "clarity over cleverness." Perhaps the corollary to this is "...and simplicity over complexity." And I think protobuf, generally speaking, falls in the cleverness part, and certainly into the complexity part (with regards to dependencies).
JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).
I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like using bitwise values and protobufs and things like that to make things faster or to be clever or whatever. And then I've seen those same engineers, or perhaps their successors, find great difficulty in navigating years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, upon a glance.
I write MUDs for fun, and one of the things that older MUD codebases do is that they use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc), you do some bit manipulation and you wind up with something like 31 that represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great, if you're optimizing for integer compression, but it's really bad when you want a human to look at it. You have to do a bunch of math to sort of de-compress that integer into something meaningful for humans.
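Here's what that de-compression looks like in practice. The flag values match the ones above; the function names are just illustrative, not from any real MUD codebase:

```python
# Condition flags packed into one integer, as in older MUD codebases.
# Values match the example above: 31 = all five conditions at once.
CONDITIONS = {
    1: "thirsty",
    2: "hungry",
    4: "cursed",
    8: "haste",
    16: "shield",
}

def decode_conditions(packed: int) -> list[str]:
    """Expand a packed flag integer into human-readable condition names."""
    return [name for flag, name in CONDITIONS.items() if packed & flag]

print(decode_conditions(31))  # ['thirsty', 'hungry', 'cursed', 'haste', 'shield']
print(decode_conditions(5))   # ['thirsty', 'cursed']
```

Compact on disk, sure -- but nobody looking at a `31` in a log or a database row knows what it means without running exactly this kind of helper, which is the clarity cost I'm talking about.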
Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.
This is not to say that protobufs are useless. It's great for enforcing API contracts at the code level, and it provides those speed improvements OP mentions. There are certain high-throughput use-cases where this complexity and relative opaqueness is not only an acceptable trade off, but the right one to make. But I've found that it's not particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again, clarity over cleverness and simplicity over complexity.
I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.
I'm not a protobuf hater, I just think that the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves somewhere that eking out that extra performance is truly worth the hassle.