
Gregaros
Curious if anyone has thoughts on going even further: eschewing software-based inference in favor of a purely ASIC approach to a static LLM. What would the cost benefits look like? Could additional software-level, fine-tunable layers allow some degree of improvement and flexibility? We are quickly approaching "good enough" for some tasks; at what point does that mean we're comfortable locking a model in for the ~2-4 year lifespan of a device, if there _were_ advantages offered by a hyper-specialized chip?

Gregaros OP
Some further questions:

1. For tasks like autocomplete, keyword routing, or voice transcription, what would the latency and power savings look like on an ASIC vs. even a megakernel GPU setup? Would that justify a fixed-function approach in edge devices or embedded systems? (Rough back-of-envelope math after this list.)

2. ASICs obviously kill retraining, but could we envision a hybrid setup where a base model is hardwired and a small, learnable software module (e.g., LoRA-style residual layers) runs on a general-purpose co-processor? (Sketched in code below.)

3. Would the transformer's fixed topology lend itself to spatial reuse in ASIC design, or is the model's size (e.g., GPT-3-class) still prohibitive without aggressive weight pruning or quantization? (Quick memory math below.)
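
On question 1, a crude first-order way to reason about it: batch-1 autoregressive decoding is memory-bandwidth bound, so tokens/sec is roughly bandwidth divided by the bytes of weights touched per token. Every number below is an illustrative assumption, not a measurement:

```python
# Back-of-envelope: batch-1 decode is roughly bandwidth bound,
# so tokens/s ~= bandwidth / bytes-of-weights-read-per-token.
# All figures here are assumed for illustration.

def tokens_per_sec(params: float, bytes_per_param: float, bw_gbps: float) -> float:
    """Upper-bound decode rate: each token reads every weight once."""
    model_bytes = params * bytes_per_param
    return (bw_gbps * 1e9) / model_bytes

# Hypothetical small edge model: 100M params, 8-bit weights.
dram = tokens_per_sec(params=100e6, bytes_per_param=1, bw_gbps=10)    # ~10 GB/s LPDDR
print(f"edge DRAM:   ~{dram:.0f} tok/s")

# Same model with weights held in on-chip SRAM on an ASIC (~1 TB/s assumed).
sram = tokens_per_sec(params=100e6, bytes_per_param=1, bw_gbps=1000)
print(f"on-die SRAM: ~{sram:.0f} tok/s")
```

The power story follows the same logic: off-chip DRAM traffic dominates energy per token, so keeping a fixed weight set on-die is where an ASIC would plausibly win, which is exactly why fixed-function looks tempting for the small tasks listed above.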
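On question 2, the hybrid described is basically a frozen base plus a low-rank residual. A minimal PyTorch sketch (class and parameter names are mine; the frozen layer stands in for the hardwired ASIC path):

```python
import torch
import torch.nn as nn

class LoRAResidual(nn.Module):
    """Low-rank residual adapter: y = frozen(x) + (x @ A) @ B * scale.

    The frozen path stands in for the hardwired ASIC matmul; only the
    tiny A/B matrices would live on the general-purpose co-processor
    and receive gradient updates.
    """
    def __init__(self, frozen_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.frozen = frozen_layer
        for p in self.frozen.parameters():
            p.requires_grad = False          # baked into silicon: untouchable
        d_in, d_out = frozen_layer.in_features, frozen_layer.out_features
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))  # zero-init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) + (x @ self.A) @ self.B * self.scale

layer = LoRAResidual(nn.Linear(512, 512))
y = layer(torch.randn(4, 512))               # only A and B are trainable
```

The appeal is that the trainable parameter count is tiny (2 x d x r per layer, ~8K vs. ~262K above), so the co-processor's compute and memory budget stays small even if the hardwired base is large.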
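On question 3, the raw weight footprint is easy to pin down, and it makes the "prohibitive without quantization" point concrete (175B is the published GPT-3 parameter count; the rest is arithmetic):

```python
# Weight storage for a GPT-3-class model (175B params) at various bit widths.
PARAMS = 175e9
for bits in (16, 8, 4, 2):
    gb = PARAMS * bits / 8 / 1e9
    print(f"{bits:2d}-bit weights: {gb:6.1f} GB")   # 350.0, 175.0, 87.5, 43.8
```

Even at 4 bits that is close to 90 GB, far beyond the on-die SRAM of any conventional chip, so a fully spatial design where every weight sits next to its MAC unit only seems thinkable for models orders of magnitude smaller, or with very aggressive pruning on top of quantization.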
