nialse
In five to ten years, once LLM architectures have stabilized, mapping them directly onto hardware will probably make sense. With today’s process nodes, a hundred billion parameters might fit on a single silicon wafer at ~1.5-bit (ternary) precision, implemented directly in logic gates. Higher precision raises the gate count sharply, so for now it makes more sense to keep the weights in memory and reuse shared compute blocks for the math. For that future to work, though, we need to get ultra-low-precision LLMs working well.
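Ternary weights in {-1, 0, +1} suit hardwired logic because each multiply reduces to add, subtract, or skip, so no multiplier circuits are needed. The "~1.5 bits" comes from log2(3) ≈ 1.585 bits per ternary weight. Below is a minimal Python sketch of that idea. It assumes absmean scaling (the scheme described for BitNet b1.58; other quantization schemes exist), and the names `ternary_quantize` and `ternary_dot` are hypothetical, chosen for illustration only:

```python
import math

# Information content of one ternary weight: log2(3) ≈ 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def ternary_quantize(weights):
    """Quantize floats to {-1, 0, +1} with one per-tensor scale.

    Assumption: absmean scaling (divide by the mean absolute weight,
    round, then clip to [-1, 1]). Other schemes are possible.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def ternary_dot(q, x, scale):
    """Dot product with ternary weights using only adds and subtracts."""
    acc = 0.0
    for w, xi in zip(q, x):
        if w == 1:
            acc += xi   # +1: the input is added
        elif w == -1:
            acc -= xi   # -1: the input is subtracted
        # 0: the input is skipped; in hardwired logic it needs no gates at all
    return acc * scale  # a single rescale per output, not per weight


q, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4])
print(q)  # [1, 0, -1, 1]
print(ternary_dot(q, [2.0, 3.0, 1.0, 0.5], scale))
```

With every weight fixed at fabrication time, each output neuron becomes a tree of adders whose wiring encodes the weights. Moving to higher precision would bring real multipliers back, which is why the gate count grows so quickly.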