To answer my own question, I looked up the specs of Lattice Semiconductor's iCE40 LM1K FPGA. It's very small, just 1k LUTs. But a lot of this "matrix multiplication" and Galois-field stuff simplifies down into absurdly small linear-feedback shift registers (LFSRs) in practice (!!). At least for encoding (decoding is far more difficult).
With that in mind, these iCE40 low-power devices claim to be in the ~10mA class, which puts them in small-microcontroller territory. (Ex: RP2040 is 20mA, so we're already undercutting the RP2040, let alone a proper Cortex-A level chip).
So... yeah. Okay, I see the use. But that's still a _lot_ of extra work compared to grabbing an off-the-shelf Cortex-A5, lol. Given the right power constraints, though, I can imagine that a $6 to $20 iCE40 FPGA would be more useful than adding a full-size Cortex-A5 (or better) with SIMD / other such advanced computational instruction sets.
Ex: I think I'd be able to program an LFSR for 8-bit Reed Solomon encoding (Galois add/multiply) that'd pair up with a standard microcontroller (think any ARM Cortex-M4 here), all for a total solution power consumption under 20mA going full tilt.
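For what it's worth, here's a rough Python model of the kind of LFSR I have in mind. It's a sketch, not a verified design: the field polynomial 0x11D, the generator roots (alpha^0 upward, with alpha = 2), and the names `gf_mul`, `generator_poly` and `rs_encode_lfsr` are all my own assumptions, not taken from any datasheet or standard.

```python
PRIM_POLY = 0x11D  # assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-XOR with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # Galois "add" is just XOR
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY       # reduce back into 8 bits
        b >>= 1
    return result


def generator_poly(nsym):
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first."""
    g = [1]
    root = 1
    for _ in range(nsym):
        nxt = g + [0]            # g(x) * x
        for i, coeff in enumerate(g):
            nxt[i + 1] ^= gf_mul(coeff, root)  # + g(x) * root
        g = nxt
        root = gf_mul(root, 2)
    return g


def rs_encode_lfsr(message, nsym):
    """Systematic encode: returns message bytes followed by nsym parity bytes."""
    g = generator_poly(nsym)
    reg = [0] * nsym             # the shift register, one byte per stage
    for byte in message:         # one iteration == one clock in hardware
        feedback = byte ^ reg[0]
        reg = reg[1:] + [0]      # shift
        for i in range(nsym):
            reg[i] ^= gf_mul(feedback, g[i + 1])  # fixed-coefficient taps
    return list(message) + reg


codeword = rs_encode_lfsr([0x12, 0x34, 0x56, 0x78], nsym=4)
```

The point is that every multiply inside the loop is by a constant generator coefficient, and multiplying by a constant in GF(2^8) is just a fixed XOR network. So in hardware the whole thing is `nsym` byte-wide registers plus XOR gates, clocked once per input byte.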
Since DDR2 RAM (which a Cortex-A class chip would typically need) starts at like 100mA power consumption, there's a lot of FPGA + microcontroller that you can fit into the power budget before even the smallest microprocessors (e.g. Cortex-A5) make sense.
----------
So I'm thinking that a small microcontroller that needs to communicate write-only (transmit only) over a noisy channel could, in practice, require a Reed Solomon encoder (or turbocodes or whatever modern crap exists. I'm not up-to-date with the latest techniques). A Reed Solomon encoder is 100% better on an FPGA since it's just a linear-feedback shift register.
Or heck, the matrix multiplication to decode a Reed Solomon error correction scheme is surprisingly compute heavy, and might also be better suited to an FPGA than to a 10mA-class uC.
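To give a feel for the "matrix multiplication" on the decode side: the first step is usually computing syndromes, which amounts to a Vandermonde matrix times the received word over GF(2^8). Rough sketch below, with assumptions I picked myself (field polynomial 0x11D, alpha = 2, roots alpha^0 upward; the function names are hypothetical). This is only the first and easiest stage of decoding; actually locating and fixing the errors takes further steps not shown here.

```python
PRIM_POLY = 0x11D  # assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-XOR with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """a**n in GF(2^8) by repeated multiplication (slow but obvious)."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(received, nsym):
    """S_j = sum_i r_i * (alpha^j)^degree_i, i.e. a matrix-vector product."""
    n = len(received)
    out = []
    for j in range(nsym):              # one matrix row per syndrome
        alpha_j = gf_pow(2, j)
        s = 0
        for i, byte in enumerate(received):
            degree = n - 1 - i         # first byte is the highest-degree term
            s ^= gf_mul(byte, gf_pow(alpha_j, degree))
        out.append(s)
    return out


# A clean (all-zero) codeword gives all-zero syndromes; any nonzero
# syndrome means the block was corrupted in transit.
clean = syndromes([0] * 8, nsym=4)
```

Written this way it's `n * nsym` Galois multiply-accumulates per block just to find out whether anything is wrong, before any correction happens. A uC grinds through those one at a time, while an FPGA could in principle run all `nsym` rows in parallel as the bytes arrive.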