Profile: botro - Hacker Neue

botro

Joined Jan 5, 2023 162 karma

https://aimodelreview.com/

botro Dec 5, 2025 parent

Is it known or suspected whether Shingrix offers the same benefits as Zostavax for dementia?
botro Nov 8, 2025 parent

This is something I've struggled with for my site. I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side by side comparison between them. I ran each prompt 4 times for each model, with different temperature values available as toggles.
My thinking was to just make the responses available to users and let them see how models perform. But from some feedback, turns out users don't want to have to evaluate the answers and would rather see a leaderboard and rankings.
The scalable solution to that would be LLM as judge that some benchmarks already use, but that just feels wrong to me.
LM Arena tries to solve this with the crowd sourced solution, but I think the right method would have to be domain expert human reviewers, so like Wirecutter VS IMDb, but that is expensive to pull off.
botro Oct 3, 2025 parent

I've suffered from dry eyes for many years and have tried all the over the counter options available in the US with no success, especially for overnight dryness. Could you please share a hint for the Irish pharmacy delivering to the US?
botro Sep 9, 2025 parent

I think that Statista chart shows month to month revisions, while the 900K figure is year over year, March 2024 to March 2025.
botro Jul 24, 2025 parent

Yes, they put this in footnote 1: "Throughout this article “training” can refer to either pre-training, or fine-tuning." But the article is just talking about fine-tuning.
botro Jul 19, 2025 parent

I posted this on HN back in 2023, reposting now because I don't think this article goes far enough:
I’ll make the bold claim that the following industries / companies would not exist without the USPS:
The Airline Industry: In the early days of American aviation, air transportation was unproven and not financially viable, until the USPS built the necessary infrastructure and gave contracts to airlines to allow them financial feasibility… starting in 1918! [1]
Machine Learning: In 1989 Yann LeCun wrote his seminal paper “Backpropagation Applied to Handwritten ZIP Code Recognition”, which used the USPS’s data set and has today become the hello world of machine learning tasks. More importantly this is the first commercial or industrial application of machine learning. [2]
Netflix: Before Streaming became a thing, Netflix was shipping DVDs via the USPS. The Postal Service adapted its processes and equipment to make this financially feasible, supporting Netflix through its transition to streaming. [3]
Amazon: Early Amazon was only a book vendor, the USPS offered special rates for books that made it possible for Bezos to be profitable from his garage … in 1994, thus birthing the behemoth it is today. [4]
Chickens: okay, not really. But the USPS ships millions of pounds of live chickens and other animals each year! [5]
[1] https://www.history.com/news/us-aviation-airmail-passenger-f...
[2] http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
[3] https://www.zdnet.com/article/u-s-postal-service-to-netflix-...
[4] https://faq.usps.com/s/article/What-is-Media-Mail-Book-Rate
[5] https://pe.usps.com/text/pub52/pub52c5_008.htm
botro Jul 2, 2025 parent

The International Space Station was an international effort by multiple (of the richest) countries specializing in different areas. There is no one country that does it all.
China built their own.
botro Jun 26, 2025 parent

Thanks for sharing your real world experience, it helps in seeing how regular folk are affected by policy decisions.
I understand from your post that you are a business person, buying product, performing value added services and selling for profit. Although I know little about business, I would guess that if one of your suppliers raised the prices on one of the inputs to your finished goods, you would likely increase the price of your product to preserve your profits and continue your business as a venture. I would guess that you would not pay the additional cost out of your own pocket.
My question is: why did you not expect the same logic to play out in the tariffs situation? That any country would pay the additional cost of doing business out of their own pocket and not pass it on to the consumer?
botro Apr 3, 2025 parent

This is damn near prescient, I'm having a hard time believing it was written in 2021.
He did get this part wrong though, we ended up calling them 'Mixture of Experts' instead of 'AI bureaucracies'.
botro Mar 29, 2025 parent

I read the article on archive and figured there was a big chunk missing. It really does not make any sense.
Sutskever and Murati were methodical, they waited until the board was favorable to the outcome they wanted, engaged with board members individually laying the groundwork... and then just changed their mind when it actually happened!?
botro Mar 22, 2025 parent

Thanks for writing this out, it's helpful for me as a layman.
Isn't part of the prohibition on trades among officers and directors also because of the inside knowledge they have? Public companies generally report quarterly but the insiders presumably have up to the minute information on sales etc.
And while we wait on the quarterly data, consistent insider selling is indicative of ... something.
botro Dec 20, 2024 parent

The LLM community has come up with tests they call 'Misguided Attention'[1] where they prompt the LLM with a slightly altered version of common riddles / tests etc. This often causes the LLM to fail.
For example I used the prompt "As an astronaut in China, would I be able to see the great wall?" and since the training data for all LLMs is full of text dispelling the common myth that the great wall is visible from space, LLMs do not notice the slight variation that the astronaut is IN China. This has been a sobering reminder to me as discussion of AGI heats up.
[1] https://github.com/cpldcpu/MisguidedAttention
botro Dec 7, 2024 parent

I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side by side comparison between them. I ran each prompt 4 times for different temperature values and that's available as a toggle.
I was going to add reviews on each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful to them in getting a sense of how different models respond to the same prompt and how temperature affects the same model's output on the same prompt.
botro Oct 22, 2024 parent

And to take a historic analogy, cars today are as wide as they are because that's about how wide a single lane roadway is. And a single lane roadway is as wide as it is because that's about the width of two horses drawing a carriage.
botro Oct 1, 2024 parent

Which LLM is used for the book summaries and which TTS for the audio?
botro Oct 1, 2024 parent

This is a great video, thank you for sharing. My favorite part:
"...next we have this rubber sheet, which is very clever, and very patented!"
botro Sep 27, 2024 parent

"Suddenly, the chat window on Sequoia’s side of the Zoom lights up with partners freaking out.
“I LOVE THIS FOUNDER,” typed one partner.
“I am a 10 out of 10,” pinged another.
“YES!!!” exclaimed a third.
What Sequoia was reacting to was the scale of SBF’s vision....“We were incredibly impressed,” Bailhe says. “It was one of those your-hair-is-blown-back type of meetings.”
This is 'smart money' in reference to Sam Bankman Fried.
botro Sep 5, 2024 parent

"The task consists of going from English-language specifications to Wolfram Language code. The test cases are exercises from Stephen Wolfram's An Elementary Introduction to the Wolfram Language."
I think this benchmark would really only tell me whether Wolfram's book was in the training data.
botro Aug 29, 2024 parent

I have found that laptops with cracked or scratched screens offer a much better value in terms of newer hardware. The battery acts as a built-in UPS.
For example this laptop:
Dell Latitude 7400 Intel i7-1165G7 16GB 256GB NVMe SSD
Is $180 shipped.
https://www.ebay.com/itm/266969891671?mkcid=16&mkevt=1&mkrid...
botro Aug 20, 2024 parent

In the model testing I've conducted, I've seen that LLMs from competing companies including GPT-4o, Gemini Flash 1.5, Llama 3.1 and Phi-3 all converge on the exact same joke. For a test of creativity this was alarming. They all tell slight variations of the same joke about ladders.
I've posted about it here: https://www.hackerneue.com/item?id=41125309
botro Aug 14, 2024 parent

That's fair. I'm trying to point out that the degree to which your statement elicits an emotional response from someone, by any means, has no bearing on the validity of that statement.
botro Aug 14, 2024 parent

>The degree of the provocation proves the value of the statement.
This is a great heuristic to practice often. I just told my wife she looks fat in her new dress, and the degree of her provocation proves the value of my statement.
botro Aug 5, 2024 parent

In this scenario, who are the institutions selling to?
12 points Aug 1, 2024

Consider the Ladder: Why do different LLMs tell the same joke?

1 comment botro aimodelreview.com
botro Jul 25, 2024 parent

Can you recommend any resources for this?
botro Jul 1, 2024 parent

Thanks for sharing this, it's well written and informative. I noticed you used 'temperature=1' in the GPT test for the example in the post. Is this best practice for a task requiring structured output? Have you tested other temperature settings? My casual understanding was that a temperature of 0 is best for these types of workloads, while higher temperatures would be more effective for more 'creative' workloads.
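To make that intuition concrete, here is a rough sketch of what temperature does at sampling time, assuming the usual softmax-with-temperature formulation. The logits are made up, the function name is my own and not any vendor's API, and treating a temperature of 0 as a pure greedy pick is an assumption; hosted models may handle it differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding,
        # putting all probability on the highest-scoring token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why they tend to suit structured output, while higher temperatures flatten the distribution and let less likely tokens through.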
botro May 20, 2024 parent

I think if we follow your logic exactly, and make mathematically optimal decisions in every instance, leaving no space for the human spirit - we're robots anyway and may as well go to space!
