I believe OP's point is that, for a given model quality, inference cost decreases dramatically over time. The article you linked talks about effective total inference costs, which seem to be increasing.

Those are not contradictory: a company's inference costs can increase due to deploying more models (Sora), deploying larger models, doing more reasoning, and an increase in demand.
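To make that concrete, here's a rough sketch of how total spend can rise while the unit cost falls. Every number below is made up for illustration (nothing here comes from the linked article), and `total_cost` is just a hypothetical helper that multiplies request volume, tokens per request, and price per token.

```python
# Hypothetical numbers for illustration only; none come from the article or this thread.
def total_cost(requests: int, tokens_per_request: int, cost_per_million_tokens: float) -> float:
    """Total inference spend = requests x tokens per request x unit token cost."""
    return requests * tokens_per_request * cost_per_million_tokens / 1_000_000

# Year 1: lower demand, short answers, expensive tokens.
year1 = total_cost(requests=1_000_000, tokens_per_request=500, cost_per_million_tokens=10.0)

# Year 2: tokens are 5x cheaper, but demand grew 10x and
# reasoning makes each answer 4x longer.
year2 = total_cost(requests=10_000_000, tokens_per_request=2_000, cost_per_million_tokens=2.0)

print(f"year 1: ${year1:,.0f}  year 2: ${year2:,.0f}")
```

With these assumed inputs the bill goes up 8x even though each token got 5x cheaper, which is the sense in which the two claims don't conflict.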

However, if we look purely at how much it costs to run inference on a fixed number of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a model from late 2025 (see the Model performance section) [1] with benchmarks comparing a 72B-parameter model (Qwen2.5) from early 2025 to the late-2025 8B Qwen3 model.

The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were evaluated on, which is just astounding.
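For anyone who wants to check the arithmetic, here's a quick sketch using only the figures quoted above. Treating parameter count as a stand-in for inference cost is my simplification; real serving cost also depends on hardware, batching, and context length.

```python
large_params_b = 72  # Qwen2.5 model size in billions of parameters, as quoted above
small_params_b = 8   # Qwen3 model size in billions of parameters, as quoted above
wins, benchmarks = 27, 40  # benchmark counts quoted above

size_ratio = large_params_b / small_params_b  # how many times smaller the newer model is
win_rate = wins / benchmarks                  # share of benchmarks the smaller model wins

print(f"{size_ratio:.0f}x smaller, ahead on {win_rate:.1%} of benchmarks")
```

So the smaller model is ahead on roughly two thirds of the benchmarks at one ninth of the parameter count.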

[1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct

