
bbor
2,768 karma
Self-unemployed R&D specializing in Unified AI approaches backed by philosophy. HMU -- always looking to collaborate, contract, or chat!

robb@doering.ai


  1. I’m glad someone else noticed the time frames — turns out the lead author here has published 28 distinct preprints in the past 60 days, almost all of which are marked as being officially published already/soon.

    Certainly, some scientists are just absurdly efficient and all 28 preprints involved teams, but that’s still a lot.

    Personally speaking, this gives me second thoughts about their dedication to accurately measuring something as notoriously tricky as corporate SWE performance. Any number of cut corners in a novel & empirical study like this would be hard to notice from the final product, especially for casual readers… TBH, the clickbait title doesn’t help either!

    I don’t have a specific critique on why 4 months is definitely too short to do it right tho. Just vibe-reviewing, I guess ;)

  2. Ah, indeed I did! It's a searchable term in general, but I'm particularly fond of this write-up: https://plato.stanford.edu/entries/frame-problem/

    yet again I am begging, weeping, pleading for markdown support on HN, my lord

  3.   Aside, why not link the original video instead of a reddit post?
    
    It's a compilation, but regardless, Reddit seems about as "original" as any other platform. I'd certainly rather see Reddit links here than YouTube links, all else being equal!

      the vast majority of equity of companies are held privately
    
    That's a good intuition, but it turns out to be false globally (TIL!): "There are nearly 25x more PE- and VC-backed companies than public markets [globally], but the total capitalization of private equity and venture capital is just 12% of public equity markets." per https://www.harbourvest.com/insights-news/insights/cpm-how-d...

      vulture PE firms do exist but are not as prevalent as people make it seem online. It's a meme that many people seem to have latched on when the vast majority of PE firms and companies work perfectly fine
    
    ...source? It's certainly possible that I'm suffering from confirmation bias, but "company goes through PE acquisition" headlines seem to be followed by "brand dissolved" headlines in way too many cases. Even if it's not a literal majority, the problem seems A) widespread, and B) behind many of the most harmful symptoms of the rot beneath the American(/global?) economy!
  4. This is just patently absurd:

      "AI" is just the vehicle (the excuse) - it's not the root of the problem nor is it the ultimate goal. 
    
    People are investing in AI because they believe the scientists' warnings that the Frame Problem[1] has been solved (or, in other words, "AGI is suddenly within reach").

    You can say they're fools if you want - you might even be right! But pretending like hundreds (thousands?) of board members across the world are conspiring to build a buyer's cartel (monopsony?) in order to starve out the PC Gaming market of all things is just myopic.

    I hope I'm not too vitriolic, especially if the guy in the video is here -- I certainly share a lot of politics with him, and absolutely share his priors regarding PE. I just think it's extremely clear that this particular subreddit has "lost the plot" as the ~~kids~~ mid-30s nerds say. If anyone's not familiar, I highly recommend a perusal through the top posts of the past week/month...

  5.    large LLM-generated texts just get in the way of reading real text from real humans
    
    In terms of reasons for platform-level censorship, "I have to scroll sometimes" seems like a bad one.
  6. Search is quite the undertaking, so I'm not really hoping that Mozilla takes that on in particular. I'm just pointing out the odd reality that I tend to trust Kagi (a for-profit) to fight the general good fight in a way I agree with more than I trust Mozilla (a non-profit).
  7. That reminds me that their new tab grouping feature is the first one to really impress me and immediately enter my workflow in… years? Probably since either reader mode or auto-translate first dropped.

    Highly recommend everyone check it out. Handily trounces all the tab management extensions I’ve tried over the years on FF and Chrome

  8. Random thought, but Kagi is acting like I wish Mozilla would. Their main product is a search engine, but they’ve been trying out a slew of other initiatives, all of which seem well thought out and integrate LLMs in an exclusively thoughtful, opt-in way. Surely many of them will end up being failures, but I can’t help but be impressed.

    Maybe it’s because I’m a power user and they tend to cater to power users, idk — that’s definitely what the comment above yours is hinting at.

    But at this point, I think we can all agree that whatever Mozilla is doing now isn’t working… so maybe power users are worth a shot again?

  9. I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely (rough sketch of what that looks like at the end of this comment).

    Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but matching them seems like an inherently impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

    FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LMArena[3].

    [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

    [2] https://huggingface.co/google/gemma-3n-E4B-it

    [3] https://lmarena.ai/leaderboard/text/overall
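
    For anyone curious what "quantization" buys you in practice, here's a minimal sketch of loading a model in 4-bit on a single consumer GPU. It assumes the Hugging Face transformers + bitsandbytes stack, and "your-org/your-32b-model" is a placeholder, not a recommendation -- swap in whatever checkpoint actually fits your VRAM:

      # Rough sketch: 4-bit weights cut VRAM to roughly a quarter of the fp16 footprint.
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "your-org/your-32b-model"  # placeholder id, not a real checkpoint
      quant = BitsAndBytesConfig(load_in_4bit=True)  # 4-bit quantization via bitsandbytes

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=quant,
          device_map="auto",  # spill layers to CPU RAM if the GPU runs out
      )

      prompt = "Explain the frame problem in one paragraph."
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      output = model.generate(**inputs, max_new_tokens=200)
      print(tokenizer.decode(output[0], skip_special_tokens=True))

    (llama.cpp with GGUF quantizations is the other common route, and tends to be even lighter on memory.)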

  10. Over the past few months, I've switched a few decently-sized Python codebases from MyPy (which I used for years) to Pyrefly (because the MyPy LSP ecosystem is somewhere between crumbling and deprecated at this point), and finally to ty after it left beta this week. I'm now running a fully Astral-ized (Rust-ized!) setup:

    1. packaging with uv (instead of pip or poetry),

    2. type checking with ty (instead of the default MyPy or Meta's Pyrefly),

    3. linting with ruff (instead of Jedi),

    4. building with uv build (instead of the default setuptools or poetry build),

    5. and publishing with uv publish (instead of the default twine)

    ...and I'm just here to just say that I highly recommend it!

    Obviously obsessing over type checking libraries can quickly become bikeshedding for the typical project, but I think the cohesive setup ends up adding a surprising amount of value. That goes double if you're running containers.[1]

    TBH I see Astral and Pydantic as being in a league of their own in terms of advancing Python, for one simple reason: I can trust them to almost always make opinionated decisions that I agree with. The FastAPI/SQLModel guy is close, but there are still some head-scratchers -- not the case with the former two. Whether it's docs, source code, or the actual interfaces, I feel like I'm in good hands.

    TL;DR: This newly-minted fanboy recommends you try out ty w/ uv & ruff! (There's a toy example of what the type checker catches at the end of this comment.)

    [1] https://docs.astral.sh/uv/guides/integration/docker/#availab...
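
    If you've never run a static type checker on Python before, here's a toy of the kind of annotation-driven mistake ty (or MyPy/Pyrefly) flags before the code ever runs -- the function and values are made up purely for illustration:

      # toy_stats.py -- illustrative only
      def mean(xs: list[float]) -> float:
          """Average of a non-empty list of floats."""
          return sum(xs) / len(xs)

      print(mean([1.0, 2.0, 3.5]))  # fine: matches the annotation
      # mean(["1.0", "2.0"])  # a `ty check` run flags this call: list[str] isn't list[float]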

  11. Great article that I haven't finished, but if the author ends up reading this: any good dictionary of terms needs an index!
  12. Well, expert systems aren’t machine learning, they’re symbolic. You mention perceptrons, but that timeline is proof of the power of scaling, not against it — they didn’t start to really work until we built giant computers in the ~90s, and they’ve been revolutionizing the field ever since.
  13. Maybe true in general, but Gary Marcus is an experienced researcher and entrepreneur who’s been writing about AI for literally decades.

    I’m quite critical, but I think we have to grant that he has plenty of credentials and understands the technical nature of what he’s critiquing quite well!

  14. I always love a Marcus hot take, but this one is more infuriating than usual. He’s taking all these prominent engineers saying “we need new techniques to build upon the massive, unexpected success we’ve had”, twisting it into “LLMs were never a success and sucked all along”, and listing them alongside people that no one should be taking seriously — namely, Emily Bender and Ed Zitron.

    Of course, he includes enough weasel phrases that you could never nail him down on any particular negative sentiment; LLMs aren’t bad, they just need to be “complemented”. But even if we didn’t have context, the whole thesis of the piece runs completely counter to this — you don’t “waste” a trillion dollars on something that just needs to be complemented!

    FWIW, I totally agree with his more mundane philosophical points about the need to finally unify the work of the Scruffies and the Neats. The problem is that he frames it like some rare insight that he and his fellow rebels found, rather than something that was being articulated in depth by one of the field’s main leaders 35 years ago[1]. Every one of the tens of thousands of people currently working on “agential” AI knows it too, even if they don’t have the academic background to articulate it.

    I look forward to the day when Mr. Marcus can feel like he’s sufficiently won, and thus get back to collaborating with the rest of us… This level of vitriolic, sustained cynicism is just antithetical to the scientific method at this point. It is a social practice, after all!

    [1] https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa...

  15. Yup, exactly this! To clarify a bit more for the lurkers:

    Obviously the line can be hard to draw for most (intentionally so, even!), but at the end of the day there are people who work for their living and people who invest for their living. Besides not having to work, investors are very intentionally & explicitly tasked with directing society.

    Being raised in the US, I often assumed that “capitalism” meant “a system that involves markets”, or perhaps even “a system with personal freedom”. In reality, it’s much drier and more obvious: capitalism is a system where the capitalists rule, just like the monarchs of monarchism or the theocrats of theocracy. There are many possible market-based systems that don’t have the same notion of personal property and investment that we do.

  16. I see where you're coming from on a methodological level, but

    1. Capitalists control our society, and live completely different lives than the rest. A typical CEO is certainly quite privileged, and may even work their way up to true wealth eventually! But at the end of the day, they're still clocking in for at least 40 hours a week to do something they'd rather not do, and their life would be completely upended if they had to stop working for some reason. The difference between Pichai and Bezos dwarfs the difference between Pichai and me for these reasons, IMO.

    2. Capitalists directly control ~50% of the capital in the US last time I checked. It makes sense to split any given pie in half IMO, at least to start!

  17.   Traditionally when talking about money as it relates to social class, people refer to an income bracket
    
    I think this article is worth the read for the interesting data it highlights, even with its arbitrary framework, but it's hard to ignore the elephant in the room: the author's "traditional" experience here excludes a huge part of the economic thought of the last 200 years.

    I know this isn't exactly a forum predisposed to Marx, but I would encourage even the most fervent anti-communists to take some time to appreciate his economic work on a scientific level. Wealth is absolutely more important than income when analyzing society, because a certain amount of wealth makes one a "capitalist" (in a literal sense, not an ideological one). Capitalists live a life of luxury without working, and they are explicitly+intentionally tasked with the lion's share of social responsibility (or, more pejoratively, social power).

    TL;DR: You don't need to be a Marxist to appreciate the utility of labor-based class analysis in our society! Given that the traditional SV goal is to become a capitalist as quickly as possible ("FIRE"), we'd do better to discuss this stuff more frequently...

  18. So the story behind the title is that the UK gov claims that the IP block wasn't working? And the author agrees that IP blocks can't really work, even?

    Separate from the free speech debate, the international law part of this seems pretty cut and dry. Here's the bolded parts:

      So, it appears, as with 4chan, Ofcom has elected to proceed with a mock execution... Ofcom is trying to set the precedent that... you have to follow its rules – even if you’re American and you’re engaged in constitutionally protected speech and conduct. To that end, Ofcom has renewed its previous threats of fines, arrest, and imprisonment, against SaSu and its operators – all Americans.
    
    Isn't that how laws work...? Like, it's illegal to be gay in some countries. Theoretically, those countries could open proceedings against every openly-gay person in the world, and try them in absentia. That would be evil and silly of course, but I don't understand what legal principle it would be violating?

    More pointedly: what is this lawyer actually "representing" these "clients" for? I don't see any mention of any US legal action, and presumably you need to be British to represent people in a UK court. Isn't this just activism, not representation?

  19. I share your general emotional reaction, but to be fair, heart disease is far and away more important than any other type of disease. More people die of it in the US than die of all cancers combined: https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm
  20. I'm already quite put off by the title (it's science -- if you have a better benchmark, publish it!), but the contents aren't great either. It keeps citing numbers about "445 LLM benchmarks" without confirming whether any of the ones they deem insufficiently statistical are used by any of the major players. I've seen a lot of benchmarks, but maybe 20 are used regularly by large labs, max.

      "For example, if a benchmark reuses questions from a calculator-free exam such as AIME," the study says, "numbers in each problem will have been chosen to facilitate basic arithmetic. Testing only on these problems would not predict performance on larger numbers, where LLMs struggle."
    
    For a math-based critique, this seems to ignore a glaring problem: is it even possible to randomly sample all natural numbers? (Short note on that at the end of this comment.) As another comment pointed out, we wouldn't even want to ("LLMs can't accurately multiply 6-digit numbers" isn't something anyone cares about/expected them to do in the first place), but regardless: this seems like a vacuous critique dressed up in a costume of mathematical rigor.

      At least some of those who design benchmark tests are aware of these concerns.
    
    In related news, at least some scientists studying climate change are aware that their methods are imperfect. More at 11!

    If anyone doubts my concerns and thinks this article is in good faith, just check out this site's "AI+ML" section: https://www.theregister.com/software/ai_ml/
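
    As promised, the short note: "sample all natural numbers uniformly" isn't even a coherent ask, because no uniform probability distribution on the naturals exists, so any benchmark has to commit to some weighting over number sizes. Here's the two-line argument, in LaTeX (which HN won't render, but alas):

      % No uniform probability measure exists on \mathbb{N}:
      % suppose \mu(\{n\}) = c for every n; countable additivity then forces
      \[
      1 = \mu(\mathbb{N}) = \sum_{n \in \mathbb{N}} c =
      \begin{cases}
        0      & \text{if } c = 0,\\
        \infty & \text{if } c > 0,
      \end{cases}
      \]
      % a contradiction either way, so "a uniformly random natural number" is undefined.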
