qeternity
8,693 karma

  1. It has been a thing. Within a single request, the same KV cache is reused for each forward pass (see the sketch below).

    It took a while for companies to start metering it and charging accordingly.

    Companies have also invested in hierarchical caches that allow longer-term and cross-cluster caching.
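
    A minimal sketch of that reuse, with toy single-head attention (the dimensions and the Wq/Wk/Wv projections are made up for illustration, not taken from any real model):

    ```python
    import torch

    d = 64
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))  # toy projections
    cache_k, cache_v = [], []  # the per-request KV cache

    def decode_step(x_new):  # x_new: (1, d) embedding of the newest token
        cache_k.append(x_new @ Wk)  # project only the new token...
        cache_v.append(x_new @ Wv)  # ...and append it to the cache
        K, V = torch.cat(cache_k), torch.cat(cache_v)
        attn = torch.softmax((x_new @ Wq) @ K.T / d**0.5, dim=-1)
        return attn @ V  # keys/values for earlier tokens come from the cache

    for _ in range(5):  # five decode steps, each reusing the prior steps' K/V
        y = decode_step(torch.randn(1, d))
    ```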

  2. In a chat setting you hit the cache every time you add a new prompt: all historical question/answer pairs are part of the context and don’t need to be prefilled again.

    On the API side, imagine you are doing document processing and have a 50k-token instruction prompt that you reuse for every document (see the sketch below).

    It’s extremely viable and used all the time.
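
    A toy sketch of the idea, with a dict standing in for the server's real KV-cache store (every name here is illustrative):

    ```python
    import hashlib

    prefix_cache = {}  # stand-in for the real KV-cache store

    def prefill(prefix: str):
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key not in prefix_cache:
            # Computed once; a real server would keep the KV tensors here.
            prefix_cache[key] = f"kv-state-for-{len(prefix)}-chars"
        return prefix_cache[key]

    instructions = "...the 50k-token instruction prompt..."
    for doc in ["doc A", "doc B", "doc C"]:
        state = prefill(instructions)  # cache hit on every document after the first
        # only the (much shorter) document itself still needs prefilling
    ```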

  3. > It’s much easier to tax the general population than businesses, as they don’t push back as much.

    Businesses don't pay taxes; people do. Every dime a corporation pays is a reduction in capital returns to shareholders or a reduction in investment in business activity, both of which are taxed again in the hands of the people who ultimately receive the capital.

  4. Quantization is not some magical dial you can just turn. In practice you basically have three choices: FP16, FP8, and FP4 (toy comparison below).

    Also, thinking time means more tokens, which cost more, especially at the API level, where you are paying per token and it would be trivially observable.

    There is basically no evidence that either of these is occurring in the way you suggest (boosting up and down).
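
    As a rough illustration of why those are the realistic stops, a toy round-trip-error comparison (assumes a PyTorch build recent enough to ship the float8 dtypes; FP4 has no native torch dtype at all, which is part of the point):

    ```python
    import torch

    x = torch.randn(4096)  # stand-in for a weight tensor
    for dtype in (torch.float16, torch.float8_e4m3fn):
        err = (x - x.to(dtype).float()).abs().mean().item()
        print(f"{dtype}: mean abs round-trip error {err:.5f}")
    # FP4 needs dedicated packing and kernels, which is why precision
    # comes as a few discrete choices rather than a dial.
    ```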

  5. This does not prove, at all, what you are claiming.
  6. Or, perhaps more likely, a large corporation simply has economies of scale that smaller retailers cannot compete with.
  7. > Reminder that infrastructure is ~3% of the budget, the military is ~13%. Almost all of the rest are benefits, either health or money, for old people or various poverty reduction schemes. Or debt.

    The US Government is an enormous welfare program with a military on the side.

    Whenever people talk about the rich paying their fair share, they simply fail to grasp the enormity of the problem. There is no taxing your way out of it.

    Society's expectations have far outpaced our fiscal strength.

  8. How about you cite something.
  9. It would help people to consider your point if you made even a modest attempt to explain and justify what you mean.
  10. > life is genuinely worse today than it was 20 years ago, mostly because of technology

    Extraordinary claims require extraordinary evidence. Almost everything today in absolute terms is better than 20 years ago, even more so outside the developed world.

    What specifically today is worse than 20 years ago?

  11. These are completely different. Agents (aside from the model inference) are not CPU-bound. You gain much more from a wider user base than from whatever marginal CPU cycles Rust/Go would save.

    Video games are of course a different story.

  12. > but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

    No. They released a distilled version of R1 based on a Qwen 32B model. This is not V3, and it's not remotely close to R1 or V3.2.

  13. > DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

    Uh, DeepSeek will not (unless you are referring to one of their older R1-finetuned variants). Any flagship DeepSeek model will require 16x A100/H100 or better with NVLink to run in FP8 (back-of-envelope below).
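
    Back-of-envelope, assuming the ~671B total parameters of the flagship V3/R1 family:

    ```python
    params = 671e9           # approx. total parameter count
    bytes_per_param = 1      # FP8
    weights_gb = params * bytes_per_param / 1e9   # ~671 GB for weights alone
    gpus = weights_gb / 80   # 80GB-class A100/H100 cards
    print(f"{weights_gb:.0f} GB of weights -> {gpus:.1f}+ GPUs before KV cache or activations")
    ```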

  14. Yes, absolutely in deep learning. Custom fused CUDA kernels everywhere.
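
    For a flavor of what fusion buys, a toy Triton kernel (Python-embedded; needs a CUDA GPU and the triton package) that does an add and a ReLU in a single pass over memory instead of two kernel launches:

    ```python
    import torch, triton
    import triton.language as tl

    @triton.jit
    def fused_add_relu(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        # add + relu in one trip through memory
        tl.store(out_ptr + offs, tl.maximum(x + y, 0.0), mask=mask)

    x = torch.randn(1 << 20, device="cuda")
    y = torch.randn_like(x)
    out = torch.empty_like(x)
    fused_add_relu[(triton.cdiv(x.numel(), 1024),)](x, y, out, x.numel(), BLOCK=1024)
    ```
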
  15. This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common, though labs are already moving to FP8 training and even FP4 (sketch of the standard recipe below).
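
    A minimal sketch of that standard recipe (BF16 autocast over FP32 master weights), with a throwaway model and a CUDA device assumed:

    ```python
    import torch

    model = torch.nn.Linear(512, 512, device="cuda")  # weights stay FP32
    opt = torch.optim.AdamW(model.parameters())

    x = torch.randn(8, 512, device="cuda")
    with torch.autocast("cuda", dtype=torch.bfloat16):  # matmuls run in BF16
        loss = model(x).pow(2).mean()
    loss.backward()  # gradients land back in FP32
    opt.step()
    ```
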
  16. PyTorch is only part of it. There is still a huge amount of CUDA that isn’t just wrapped by PyTorch and isn’t easily portable.
  17. > Also, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).

    People who say this really have not thought it through, or simply don't understand what the use cases for vector search are.

    But even if you had infinite context with perfect attention, attention isn't free, even if it were linear. It's much, much cheaper to index your data than to reprocess everything. You don't go around scanning entire databases when you're just interested in the row where id=X (see the sketch below).
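
    A toy illustration of the asymmetry, with random stand-in embeddings: the index answers a query with one matmul over precomputed vectors instead of pushing the whole corpus back through a model:

    ```python
    import numpy as np

    # 100k docs embedded once, offline (all sizes made up)
    docs = np.random.randn(100_000, 384).astype(np.float32)
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)

    query = docs[42] + 0.01 * np.random.randn(384).astype(np.float32)
    query /= np.linalg.norm(query)

    scores = docs @ query            # one cheap pass over the index
    top5 = np.argsort(-scores)[:5]   # vs. re-attending over every document per query
    print(top5)                      # doc 42 comes out on top
    ```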

  18. People really just go on the internet and say stuff.

    Code is speech. Speech is protected (at least in the US).

  19. If the client can generate a uuid4, they can also reuse a known uuid4.
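
    A toy example of the pitfall, with illustrative names: a server that dedupes on a client-supplied uuid4 can simply be fed the same id twice:

    ```python
    import uuid

    seen = set()

    def create_order(idempotency_key: str) -> str:
        # naive dedup that trusts the client-supplied uuid4
        if idempotency_key in seen:
            return "duplicate"
        seen.add(idempotency_key)
        return "created"

    key = str(uuid.uuid4())
    print(create_order(key))  # created
    print(create_order(key))  # the client can replay any key it has seen
    ```
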
  20. > Are they trying to be a full cloud platform like everyone else?

    Yes.
