I honestly don't think it's possible to implement SHA256 on the 1k-LUT FPGAs on the dev boards discussed in this post (let alone an implementation that's going to beat traditional CPUs or GPUs).
Like, seriously: 1k 4-input LUTs means these iCE40 FPGAs have 4096 total inputs to all of their logic. SHA256 has, ya know, 256 bits of state (and takes its input in 512-bit blocks), which works out to 16 LUT inputs per state bit, and it probably takes more than 16 "steps" to implement even with perfect routing. (But if anyone proves me wrong, consider me happy.)
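Back-of-envelope version of that, counting flip-flops as well as LUT inputs. This is a rough sketch under stated assumptions, not a synthesis result: a "1k" part with about 1,024 logic cells, each holding one 4-input LUT and one flip-flop, and a naive one-round-per-clock design that keeps all SHA-256 state in flip-flops rather than block RAM. The variable names are just mine.

```python
# Back-of-envelope resource budget for a naive SHA-256 core on a "1k" iCE40.
# ASSUMPTIONS: ~1,024 logic cells, each with one 4-input LUT and one flip-flop;
# one SHA-256 round per clock; all state kept in flip-flops (no block RAM).
WORD_BITS = 32
LOGIC_CELLS = 1024  # assumed "1k"
LUT_INPUTS = 4

# Total LUT inputs across the whole device (the 4096 figure above).
total_lut_inputs = LOGIC_CELLS * LUT_INPUTS

# Registers SHA-256 must hold between rounds (variable names as in FIPS 180-4).
state_bits = {
    "working variables a..h": 8 * WORD_BITS,
    "message schedule window (16 words)": 16 * WORD_BITS,
    "chaining value H0..H7": 8 * WORD_BITS,
}
register_bits = sum(state_bits.values())

# 32-bit modular additions per round: T1 needs 4, T2 needs 1, new a and e need 2,
# and expanding the next message schedule word W[t] needs 3 more.
adders_per_round = 4 + 1 + 2 + 3
adder_bits = adders_per_round * WORD_BITS

if __name__ == "__main__":
    print(f"LUT inputs on device:    {total_lut_inputs}")
    for name, bits in state_bits.items():
        print(f"  {name}: {bits} bits")
    print(f"State flip-flops needed: {register_bits} of {LOGIC_CELLS}")
    print(f"Adder bits per round:    {adder_bits}")
```

Under those assumptions the state registers alone (1024 bits) eat every flip-flop on the chip before a single adder or Σ function is placed, so fitting at all would mean getting clever (bit-serial datapath, state in block RAM), which seems unlikely to win on throughput.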
You're thinking orders of magnitude too big here. The FPGAs described in this post are much, much, much smaller.
Any Boolean-logic-heavy workload, such as password cracking or SHA256 mining (Bitcoin), is perfectly suited to FPGA platforms and will outperform any microprocessor or GPU in terms of performance per watt. For example, in the early days of Bitcoin, FPGAs such as the Xilinx Spartan-6 XC6SLX150 ruled mining, and many of those implementations were developed by hobbyists.
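To make the "Boolean-logic heavy" part concrete, here's a minimal pure-Python reference model of SHA-256 written from the FIPS 180-4 definitions. It's slow and only for illustration, and the helper names are my own. Everything inside the 64-round loop is 32-bit AND/XOR/NOT, fixed rotations and shifts (which are just wiring in hardware), and modular addition, with no multipliers and no data-dependent memory access. The constants are derived from the primes instead of being pasted in.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep everything in 32-bit words


def first_primes(n):
    """Return the first n primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Integer cube root of n > 0: largest x with x**3 <= n (Newton's method)."""
    x = 1 << ((n.bit_length() + 2) // 3)  # starting guess >= cbrt(n)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


_PRIMES = first_primes(64)
# Initial hash value: first 32 fractional bits of sqrt of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]
# Round constants: first 32 fractional bits of cbrt of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in _PRIMES]


def rotr(x, n):
    """Rotate a 32-bit word right by n bits (pure wiring in hardware)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(e, f, g):
    # "Choose": per bit, e ? f : g. A 3-input function, so one 4-LUT per bit.
    return (e & f) ^ (~e & g)


def maj(a, b, c):
    # Per-bit majority of three inputs, also one 4-LUT per bit.
    return (a & b) ^ (a & c) ^ (b & c)


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def compress(state, block):
    """Run the 64 SHA-256 rounds over one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # message schedule expansion
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + big_sigma1(e) + ch(e, f, g) + K[t] + w[t]) & MASK
        t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message: bytes) -> bytes:
    # Pad: 0x80, zeros, then the 64-bit big-endian bit length, to a multiple of 64 bytes.
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + struct.pack(">Q", len(message) * 8))
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    print(sha256(b"abc").hex())  # FIPS 180-4 test vector: ba7816bf...f20015ad
    assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

A miner just runs `compress` over and over with a different nonce in the block, which is why unrolling those rounds into a pipeline of LUTs and carry chains paid off so well on the big Spartan-6 parts.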