- Just wanted to say, thanks for doing this! Now the old rant...
I started my career when on-prem was the norm and remember so much trouble. When you have long-lived hardware, eventually, no matter how hard you try, you just start to treat it as a pet and state naturally accumulates. Then, as the hardware starts to be not good enough, you need to upgrade. There's an internal team that presents the "commodity" interface, so you have to pick out your new hardware from their list and get the cost approved (it's a lot harder to just spend a little more and get a little more). Then your projects are delayed by them racking the new hardware and you properly "un-petting" your pets so they can respawn on the new devices, etc.
Anyways, when cloud came along, I was like, yeah we're switching and never going back. Buuut, come to find out that's part of the master plan: it's a no-brainer good deal until you and everyone in your org/company/industry forgets HTF to rack their own hardware, and then it starts to go from no-brainer to brainer. And basically unless you start to pull back and rebuild that muscle, it will go from brainer to no-brainer bad deal. So thanks for building this muscle!
- It is just as "vibe-ish" as vector search and notably does require chunking (document chunks are fed to the indexer to build the table of contents). That said, I don't find vector search any less "vibey". While "mathematical similarity" is a structured operation, the "conversion to high-dimensional vectors" part is predicated on the encoder, which can be trained towards any objective.
IIUC, retrieval is based on traversing a tree structure, so only the root nodes have to fit in the context window. I find that kinda cool about this approach.
> scaling will become problematic as the doc structure approaches the context limit of the LLM doing the retrieval
But yes, still "vibe retrieval".
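To make the tree-traversal idea concrete, here's a rough sketch of how I picture it, not the project's actual code. `Node`, `is_relevant`, and `retrieve` are names I made up, and `is_relevant` is a crude word-overlap stand-in for the retriever LLM's judgment. The point is just that each step only looks at one node's child summaries, never the whole structure at once.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    summary: str                      # what the retriever sees at this level
    text: str = ""                    # leaf content (empty for inner nodes)
    children: list["Node"] = field(default_factory=list)


def words(text: str) -> set[str]:
    # Lowercased alphabetic tokens; short words dropped as a crude stopword filter.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def is_relevant(summary: str, question: str) -> bool:
    # Hypothetical stand-in for asking an LLM "is this branch relevant?".
    return bool(words(summary) & words(question))


def retrieve(node: Node, question: str) -> list[str]:
    """Descend only into children whose summaries look relevant."""
    if not node.children:
        return [node.text]
    hits: list[str] = []
    for child in node.children:
        if is_relevant(child.summary, question):
            hits.extend(retrieve(child, question))
    return hits


tree = Node("root", children=[
    Node("billing and payments", children=[
        Node("refund policy", text="Refunds take 5 days."),
        Node("invoice format", text="Invoices are PDF."),
    ]),
    Node("shipping and delivery", text="Ships in 2 days."),
])
result = retrieve(tree, "billing refund timing")
```

Swap `is_relevant` for a real LLM call and it's exactly as "vibey" as before, but the prompt size at each step is bounded by the fan-out of one node rather than the size of the document.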
- That’s because every business is “scaled” to the point that the edge employees (i.e., the people who interact with paying customers) don’t own anything, and are 12 levels of management away from anyone who does.
My grandparents owned a grocery store. Their name was on the sign. If you brought home spoiled meat, it was their name, and you as a member of their community, that were put out.
When my mom brings home spoiled meat from Stop & Shop, she goes back there not just to exchange it, but to complain to someone about how it messed up her barbecue plans, etc. And I’m like seriously, why would anyone working at Stop & Shop give a rat’s ass about your family gathering? Stop & Shop is owned by a Dutch multinational “food retail” company.
But that’s not the capitalism she grew up with. She actually thinks capitalism is great because it allowed her parents to come over on a boat as teenagers and make lives for themselves, and have extra to send back home. But she hates it when she calls her cable company and ends up chatting with a girl in Singapore. Go figure.
- The company with the lowest median pay, Aptiv, I had to look up, and what a darling of a company this is:
  - guilty of systemic accounting fraud from 1999-2004
  - manufactured an (allegedly known-to-be) faulty ignition switch that led to the deaths of 124 people and injured 275 others
  - currently forcing 20,000 former employees to fight a decades-long legal battle for their earned pension benefits
  - gleefully dumps massive quantities of carcinogens and poisons into the environment, including: lead compounds, chromium compounds, sulfuric and hydrochloride acid (lol), and glycol ethers

Simply a masterclass in corporate irresponsibility, exactly why their CEO is so well comped.
Citations here: https://en.m.wikipedia.org/wiki/Aptiv
- > That’s why we use a tree structure rather than just a flat list of sections. This is what makes it different from traditional RAG
Ah ok, that’s a key piece I was missing. That’s really cool, thanks!
- So if I understand this correctly, this works on a single large document whose size exceeds what you can or want to put into a single context frame for answering a question? It first "indexes" the document by feeding successive "proto-chunks" to an LLM, along with an accumulator, which is like a running table of contents into the document with "sections" that the indexer LLM decides on and summarizes, until the table of contents is complete. (What we're calling "sections" here - these are still "chunks", they're just not a fixed size and are decided on by the indexer at build time?)
Then for the retrieval stage, it presents the table of contents to a "retriever" LLM, which decides which sections are relevant to the question based on the summaries the indexer LLM created. Then for the answer generation stage, it just presents those relevant sections along with the question.
That's pretty clever - does it work with a corpus of documents as well, or just a single large document? Does the "indexer" know the question ahead of time, or is the creation of sections and section summarization supposed to be question-agnostic? What if your table of contents gets too big? Seems like then it just becomes normal RAG, where you have to store the summaries and document-chunk pointers in some vector or lexical database?
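Here's a minimal sketch of the flow as I understand it, under heavy assumptions. Every name in it (`Section`, `build_toc`, `select_sections`, `build_context`) is mine, not the project's, and the two "LLM" decisions are replaced with dumb heuristics so it runs on its own: a heading check stands in for the indexer deciding where sections start, and word overlap stands in for the retriever picking sections.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    summary: str
    start: int  # index of the first proto-chunk in this section
    end: int    # one past the last proto-chunk


def summarize(text: str, max_words: int = 16) -> str:
    # Stand-in for the indexer LLM's summary: just the first few words.
    return " ".join(text.split()[:max_words])


def starts_new_section(chunk: str) -> bool:
    # Stand-in for the indexer LLM's decision; here, a markdown-style heading.
    return chunk.startswith("#")


def build_toc(chunks: list[str]) -> list[Section]:
    """Feed proto-chunks one at a time, growing a running table of contents."""
    toc: list[Section] = []
    for i, chunk in enumerate(chunks):
        if not toc or starts_new_section(chunk):
            toc.append(Section(chunk.lstrip("# ").strip(), summarize(chunk), i, i + 1))
        else:
            last = toc[-1]  # extend the current section and refresh its summary
            last.end = i + 1
            last.summary = summarize(last.summary + " " + chunk)
    return toc


def words(text: str) -> set[str]:
    # Lowercased alphabetic tokens; short words dropped as a crude stopword filter.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def select_sections(toc: list[Section], question: str) -> list[Section]:
    # Stand-in for the retriever LLM: it only sees titles and summaries.
    q = words(question)
    return [s for s in toc if q & words(s.title + " " + s.summary)]


def build_context(chunks: list[str], sections: list[Section]) -> str:
    # Only the selected sections' raw text goes to the answering model.
    return "\n".join("\n".join(chunks[s.start:s.end]) for s in sections)


chunks = ["# Install", "Run the installer script.", "# Usage", "Call the tool with a path."]
toc = build_toc(chunks)
picked = select_sections(toc, "how do I call the tool")
context = build_context(chunks, picked)
```

Note that `build_toc` never sees the question, which is the question-agnostic reading of the design; whether the real indexer works that way is exactly what I'm asking.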
- To save you a click, 19% is actually not a lot (I thought it was):
> 19% of California houses were owned by investors, ranking No. 36 among the states and just below the 20% national norm.
States with the highest share of investor-owned houses:
> Hawaii at 40%, Alaska at 35%, Vermont at 31%, West Virginia at 30%, and Wyoming at 30%.
States with the lowest are all in the Mid-Atlantic and lower New England:
> Connecticut at 10%, Rhode Island and Massachusetts at 12%, and Delaware at 13%.
Why so low in California (again, I'm baffled that this is "low")?
> the sky-high price tag for single-family homes, the third-highest nationally at $866,100
- Pretty amazing this all started as MLB Advanced Media more than 20 years ago streaming baseball games on the internet before YouTube existed! When I joined, they were streaming NHL, MLS, WWE, HBO, and many others. Then they spun off and sold to Disney and became Disney+. I wonder if any lines of code from those very early days still get executed.
Regardless, this is terrible for sports fans. Disney will chop this up into “baskets” with one watchable thing padded with lots of other unwatchable-to-mildly-interesting stuff (college hockey from a single camera angle anyone?). Then, when you want to watch some event, you’ll open your ESPN Premium +- whatever app, get excited when you see the event on the home screen, only to be upsold to start your free trial of ESPN NFL++ Hulu Fans Only Bundle. The only solution is to boycott the whole damn thing, which is what I am doing, and I love having the extra cash for tickets to local minor league games, etc.
- Why not do it before you need a job? While you're comfortable, submit your application for open roles and reject the AI interviewer.
- The system’s brightness decreases when the companion star swings around behind Betelgeuse. It also dips when Betelgeuse goes behind the companion star but much less so because Betelgeuse is so much larger.
- Don’t look at the concrete tiles in the foreground. Focus on the background. The world moves.
Awesome and terrible. I keep scrubbing back and forth and seeing new things.
- Ad free (for now): https://www.reddit.com/answers/D0BEF0B5-803A-4FD1-BCF4-DC358...
- Reminds me of the scene in Blood Meridian where they find a dog in an abandoned Apache camp and Brown says, “you won’t man that thing,” and Captain Glanton says, “I can man anything that eats.”
- Worked for Dun & Bradstreet for a bit in high school. Weird job, looking back. Sitting in a cubicle, a terminal would "randomly" pull up information about a small business in a kinda CRUD-like interface that a 14-year-old could spin up in a few minutes in RoR. You'd have to cold-call the business owner, and then ask all sorts of personal questions, like their address, how many customers they have. My "favorite" question, "What was your total revenue last year?" And some people would actually give it out! But most would spit in your face or hang up. The coup de grace was if they didn't hang up, you had to then upsell them on one of your data products. I remember one of them was just, here's the information we have on you (that you just gave me).
It was soooo soul-crushing, after every call I'd click away from the CRUD terminal to a browser and read a little "John Baez's stuff" for a few minutes. And then I'd hear the phone ring and it'd be my boss saying, "Let's take another call, okay?"
They used to tell us Abraham Lincoln worked there, I guess because it's hard to imagine Abe toiling at a fucking boring and embarrassing job like this, so maybe it's actually not that bad.
- Do feed and ranking algorithms have "sub-rational" influence over people's thoughts and opinions? If feeds are mind control, shouldn't courts and lawmakers just ban mind control? I'm glad we're taking it out of the hands of a foreign government, but why put it in the hands of domestic tech companies?
- Learning LISP, Fortran, APL, Perl, or really any language that is different from what you’re used to, will also do this for you.