An open source toolchain for developing FPGAs from multiple vendors. It currently targets the Xilinx 7-Series, Lattice iCE40, Lattice ECP5, and QuickLogic EOS S3, and is gradually being expanded to provide a comprehensive end-to-end FPGA synthesis flow.
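For a sense of scale, the whole open flow for a small iCE40 part fits in three commands. A minimal sketch (the board, package and pin-constraint file names here are my assumptions, not from the thread):

  // blink.v -- a minimal design to exercise the open flow.
  // Build steps (sketch; flags assume an iCE40 HX8K in a CT256 package):
  //   yosys -p 'synth_ice40 -top blink -json blink.json' blink.v
  //   nextpnr-ice40 --hx8k --package ct256 --json blink.json --pcf blink.pcf --asc blink.asc
  //   icepack blink.asc blink.bin
  module blink (
      input  wire clk,   // 12 MHz oscillator, mapped in blink.pcf
      output wire led
  );
      reg [23:0] counter = 24'd0;
      always @(posedge clk)
          counter <= counter + 1'b1;
      assign led = counter[23];  // ~0.7 Hz blink at 12 MHz
  endmodule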
> There’s already IP cores for DRAM, PCI express, ethernet, video, a softcore CPU (your choice of or1k or lm32) and more.. LiteX produces a design that uses about 20% of an XC7A50 FPGA with a runtime of about 10 minutes, whereas Vivado produces a design that consumes 85% of the same FPGA with a runtime of about 30-45 minutes.
https://www.hackerneue.com/item?id=39836745#39922534
> you can.. get [FPGA] parts for significant discounts in 1-off quantities through legit Chinese distributors like LCSC. For example, a XC7A35T-2FGG484I is 90$ on Digikey and 20$ at LCSC. I think a personalized deal for that part would be cheaper than 20$ though...
hardolaf
You're conflating open source IP cores such as LiteX with open source FPGA tooling to try to make the latter look better by using the former. Everyone knows that vendor IP is pretty terrible if you don't use it in the very narrow window in which it's validated and tested. That's why big defense contractors all use either the Northwest Logic or Rambus PCI-e cores on everything prior to Versal.
But at the same time, those cores are big and powerful, and optimize horribly because the customers who actually use them need all of those features. Those customers aren't really concerned with area but rather with meeting performance requirements. Using the Xilinx provided QDMA core, I've been able to achieve line rate performance on PCI-e 4.0 x16 for large DMA transactions with a setup time of about 3 total days of work. I'd like to see an open source solution that could even do that with just ACKing raw TLPs because I haven't found one yet.
As for pricing, AMD/Xilinx and Altera don't want you as a customer. They want to sign $10M+/yr accounts or accounts which push the envelope of what's possible in terms of frequency (HFT). And they price their products accordingly for the public. If you actually end up as a direct customer, the prices are significantly cheaper to the point where those cheaper Chinese vendors don't make sense to use.
phendrenad2
How well does this actually work on the larger Kintex chips? Are there any projects that show it in action?
> There's been some interesting recent work to get the QMTech Kintex7-325 board (among others) supported under yosys/nextpnr.. It works well enough now to build a RISC-V SoC capable of running Linux
32-bit MMU/No-MMU Linux-capable RISC-V softcore with rich peripherals is implemented in pure Verilog, and supported by OpenXC7, the FOSS FPGA toolchain.. These are still modern devices: 7-Series lifetime extended to 2035.
https://riscv.or.jp/wp-content/uploads/RV-Days_Tokyo_2024_Su...
https://github.com/regymm/quasiSoC
Do you think it's possible for someone to enter the industry through this open source solution? I have always wanted to play around with FPGAs but have no idea where to even begin.
transpute
Some contributors to the open hardware community (https://fossi-foundation.org/events/archive) can be followed on social media. See videos from FOSSI conferences and comments in these HN threads:
2023, "FPGA Dev Boards for $150 or Less", 80 comments, https://www.hackerneue.com/item?id=38161215
2021, "FPGA dev board that's cheap, simple and supported by OSS toolchain", 70 comments, https://www.hackerneue.com/item?id=25720531
Not an FPGA, but if you already have a recent Ryzen device, the AMD NPU might be worth a look, with Xilinx lineage and current AI/LLM market frenzy, https://www.hackerneue.com/item?id=43671940
> The Versal AI Engine is the NPU. And the Ryzen CPUs NPU is almost exactly a Versal AI Engine IP block to the point that in the Linux kernel they share the same driver (amdxdna) and the reference material the kernel docs link to for the Ryzen NPUs is the Versal SoC's AI Engine architecture reference manual.
At one point, cheap ex-miner FPGAs were on eBay, https://hackaday.com/2020/12/10/a-xilinx-zynq-linux-fpga-boa.... The Zynq (Arm + Xilinx FPGA) dev board is around $200, https://www.avnet.com/americas/products/avnet-boards/avnet-b.... There was an M.2 Xilinx FPGA (PicoEVB) that conveniently fit into a laptop for portable development, but it's not sold anymore. PCIe FPGAs are used for DMA security testing, some of those boards are available, https://github.com/ufrisk/pcileech-fpga
No. It's not competitive. You'll spend precious time (which should be spent on prototyping a design) on solving bugs and writing infrastructure code. Reverse engineering has not been a viable effort for a long time now.
burnt-resistor
You need fundamentals of combinational and sequential logic, followed by perhaps a course on hardware/software interfacing to grok timing parameters like setup and hold time, propagation delay, fan-in, and fan-out.
FPGAs can be developed using CAE-like systems or SystemVerilog, VHDL, or something modern like Veryl. Real FPGAs include acceleration blocks like DRAM, SRAM, CAM, shifters, ALU elements, and/or ARM cores.
At the end of the day though, the best teacher is to learn by doing and finding venues to ask questions.
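If it helps anyone starting out, the combinational/sequential split those fundamentals drill into you is only a few lines of Verilog (my example, not from the thread):

  // Combinational: the output follows the inputs after propagation
  // delay alone; there is no state and no clock.
  module mux2 (input wire a, b, sel, output wire y);
      assign y = sel ? b : a;
  endmodule

  // Sequential: the output changes only at the clock edge. The input
  // must be stable for the setup time before the edge and the hold
  // time after it, or the flop can go metastable.
  module dff (input wire clk, input wire d, output reg q);
      always @(posedge clk)
          q <= d;
  endmodule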
FPGAs are an amazing product that almost shouldn't exist if you think about the business and marketing concerns. They are a product that is too expensive at scale. If an application takes off, it is eventually cheaper and more performant to switch to ASICs, which is obvious when you see the 4-digit prices of the most sophisticated FPGAs.
Given how ruinously expensive silicon products are to bring to market, it's amazing that there are multiple companies competing (albeit in distinct segments).
FPGAs also seem like a largely untapped domain in general purpose computing, a bit like GPUs used to be. The ability to reprogram an FPGA to implement a new digital circuit in milliseconds would be a game changer for many workloads, except that current CPUs and GPUs are already very capable.
inamberclad
The problem is that the tools are still weak. The languages are difficult to use; nobody has made something more widely adopted than Verilog or VHDL. In addition, the IDEs are proprietary and the tools are fragile and not reproducible. Synthesis results can vary from run to run on the exact same code with the same parameters, with real world impacts on performance. This all conspires to make FPGA development only suitable for bespoke products with narrow use cases.
I would love to see the open source world come to the rescue here. There are some very nice open source tools for Lattice FPGAs and Lattice's lawyers have essentially agreed to let the open source tools continue unimpeded (they're undoubtedly driving sales), but the chips themselves can't compete with the likes of Xilinx.
JoachimS
SystemVerilog (SV) is the dominant language for both ASIC and FPGA development. SV is evolving, and the tools are updated quite fast. SV allows you to build abstractions through interfaces, enums, types, etc. The verification part of the language contains a lot of modern-ish constructs, plus support for formal verification. The important thing is really to understand that what is being described is hardware: your design is supposed to be possible to implement on a die, with physical wires, gates, registers, I/Os, etc. There will be clocks and wire delays. This is actually one of the problems one encounters when more SWE-minded people try to implement FPGAs and ASICs. The language and tools may help you, but you also need to understand that it is not programming, but design, that you are doing.
https://en.wikipedia.org/wiki/SystemVerilog
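As a flavor of those abstractions, an interface plus an enum looks something like this (a hedged sketch, not production code):

  typedef enum logic [1:0] { IDLE, BUSY, DONE } state_t;

  interface stream_if #(parameter WIDTH = 8);
      logic             valid;
      logic             ready;
      logic [WIDTH-1:0] data;
      modport src  (output valid, data, input  ready);
      modport sink (input  valid, data, output ready);
  endinterface

  // The interface bundles the handshake at both ends, but what gets
  // synthesized is still just wires and registers on a die.
  module consumer (input logic clk, stream_if.sink s);
      state_t state;
      assign s.ready = (state != BUSY);
      always_ff @(posedge clk)
          if (s.valid && s.ready) state <= BUSY;
  endmodule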
SV requires a linter for literally every single line change that you do because the language is rotten to the core by being based on Verilog. Heck, it has an entire chapter of its LRM dedicated to the non-deterministic behavior inherent to its description of the hardware. VHDL has no such section because it is deterministic.
Both languages suck for different reasons but no one has figured out how to make a better language and output a netlist from it (yes, there is an open interchange standard that almost every proprietary tool supports).
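The canonical example of that LRM non-determinism, for anyone who hasn't been bitten yet (a minimal sketch):

  module race (input wire clk);
      reg a = 1'b0;
      reg b = 1'b0;
      // Two processes with blocking assignments triggered by the same
      // edge: the LRM leaves their ordering undefined, so b may capture
      // either the old or the new value of a, simulator's choice.
      always @(posedge clk) a = ~a;
      always @(posedge clk) b = a;
      // Non-blocking (a <= ~a; b <= a;) removes the race.
  endmodule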
sweetjuly
I don't disagree that the tools are rough, but to the person you're replying to's point, would perfect tools and languages actually solve the underlying problem?
As much as I love FPGAs, GPUs really ate their lunch in the acceleration sphere (trying to leverage the FPGA's parallelism to overcome a >20x clock speed disadvantage is REALLY hard, especially if power is a concern) and so it seems the only niche left for them is circuit emulation. Of course, circuit emulation is a sizable market (low volume designs which don't make sense as ASICs, verification, research, etc.) and so it's not exactly a death sentence.
hardolaf
The FPGA market has been growing in size despite GPGPU taking off. And the clock speed difference is closer to 4-5x, not 20x. Despite that and the lower area efficiency of FPGAs, there have been price- and power-competitive FPGA accelerator cards released over the last 5 years. Sure, you're not going to get an A100's performance, but you can get deterministic latency below 5us for something that the A100 would take a minimum of 50us to process. GPGPU isn't ideal for its current use case either, so FPGA based designs have a lot of room to work in to get better, application-specific accelerators.
JoachimS
The non-deterministic part of the toolchain is not a universal truth. Most, if not all, tools allow you to set and control the seeds, and you can get deterministic results. Tillitis uses this fact to allow you to verify that the FPGA bitstream used is the exact one you get from the source. Just clone the design repo, install the tkey-builder docker image for the release and run 'make run-make'. And of course all tools in tkey-builder are open source with known versions, so that you can verify the integrity of the tools.
https://github.com/tillitis/tillitis-key1 https://github.com/tillitis/tillitis-key1/pkgs/container/tke...
And all this is due to the actually very good open source toolchain, including synthesis (Yosys), P&R (nextpnr, Trellis, etc.), Verilator, Icarus, Surfer and many more. Lattice, being more friendly than other vendors, has seen an uptick in sales because of this. They make money on the devices, not their tools.
And even if you move to ASICs, open source tools are being used more and more, especially for simulation and front-end design. As an ASIC and FPGA designer for 25-odd years, I spend most of my time in open source tools.
I never understood why FPGA vendors think the tools should do placement and routing and not the designer. Most do a terrible job at it too. E.g., Quartus doing place and route in a single thread and then bailing out after X hours/days with a cryptic error message... As a designer I would be much happier to tell it exactly where to put my adder, where to place the SRAM, and where to run the wires connecting the ports. You'd build your design by making larger and larger components.
variadix
As I understand it, the physical FPGA layout and timing information used for placement and routing is proprietary, and the vendors don’t want to share it. They’ll let you specify constraints for connections, but it has to go through their opaque solver. And to be fair, they do have to try to solve an NP-complete problem, so the slowness isn’t unjustified compared to all the other slow buggy software people have to deal with nowadays.
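To be fair to the vendors, they do let you hand-place things on top of the opaque solver. In Vivado, for example, placement can be pinned per cell (a sketch; the slice coordinates are made-up assumptions, and while I believe the RTL attribute form works, the XDC form is the documented path):

  // Pin one register to a specific slice so the placer cannot move it.
  (* DONT_TOUCH = "true", LOC = "SLICE_X10Y42" *)
  reg critical_bit;

  // Equivalent XDC constraint:
  //   set_property LOC SLICE_X10Y42 [get_cells critical_bit_reg]

The catch is exactly what the parent says: you can pin cells, but the routes between them still go through the vendor's solver.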
JoachimS
The competitiveness between Lattice and Xilinx is also not a universal truth; it totally depends on the application. For small to medium designs Lattice has very competitive offerings. Hard ARM cores, not as much. Very large designs, not at all. But if you need internal config memory (on some devices), a small footprint, etc., Lattice is really a good choice. And then there's the support in open source tools to boot.
zxexz
I’m sure the Lattice open source situation is driving sales in a more than substantial way. I’ve definitely spent >2k in the past five years on lattice boards and chips (not counting used or otherwise aftermarket). Most people I’ve met with any disposable income and an interest in hardware or DSP have done similar. I know 2k USD is nothing compared to even a single high end chip from another vendor, but it sticks. I’ve known more amazing devs than I can count on two hands that ended up actually using Lattice FPGAs in their job (several times changing industries to do so!). I honestly feel that if Lattice embraced open source further, they could be a leading player in the space. Hell, I’m sure they could do that and still find a way to make money on software. Something, something, enterprise.
fake-name
> Synthesis results can vary from run to run on the exact same code with the same parameters, with real world impacts on performance.
This is because some of the challenges in the synthesis/routing process are effectively NP-hard. Instead of searching for the best possible solution, the compiler uses heuristics and a random process to try to find a valid solution that meets the timing constraints.
I believe you can control the synthesis seed to make things repeatable, but the stochastic nature of the process means that any change to the input can substantially change the output.
JoachimS
Yes, you can control the seeds and get deterministic bitstreams. Depending on the device and tools, you can also assist the tools by providing floorplanning constraints. And one can of course try out seeds to get designs that meet the results you need. Tillitis uses this to find seeds that generate implementations that meet the timing requirements. It's in their custom tool flow.
kev009
For a while in the 2000s Cisco was one of the biggest users of FPGAs. If you consider how complicated digital designs have been for many decades, and the costs of associated failures, FPGAs can certainly be cost neutral at scale in production lines, especially accounting for risk and reputational damage.
Also, there is, and pretty much always has been, a large gamut of programmable logic.. some useful parts are not much more than a mid-range microcontroller. The top end is for DoD, system emulation, novel frontier/capture regimes (like "AI", autonomous vehicles).. few people ever work on those compared to the cheaper parts.
duskwuff
FPGAs are still quite common in niche hardware like oscilloscopes or cell towers, where the manufacturer needs some sophisticated hardware capabilities but isn't manufacturing enough units to make the NRE for an ASIC worthwhile.
stephen_g
Also time to market: I have a friend who worked for Alcatel-Lucent, and they would use FPGAs while Nokia would use ASICs. They saw it as a big advantage, since if there was a problem in part of the ASIC, or if you needed new features that were outside the original scope, the time and cost to respin was massive compared to fixing problems or implementing new standards in the FPGA bitstream!
Eventually Nokia ended up buying Alcatel-Lucent, and he left not too long after; not sure what their current strategy is.
//If an application takes off, it is eventually cheaper and more performant to switch to ASICs,
That's part of the FPGA business model: they have an automated way to take an FPGA design and turn it into a validated semi-custom ASIC, at low NRE, at silicon nodes (10nm?) you wouldn't have access to otherwise.
And all of that at a much lower risk. This is a strong rational but also emotional appeal. And people are highly influenced by that.
duskwuff
Is this still an active thing? My understanding is that both Xilinx and Altera/Intel have effectively discontinued their ASIC programs (Xilinx EasyPath, Altera HardCopy); they aren't available for modern part families.
For what it's worth, Xilinx EasyPath was never actually an ASIC. The parts delivered were still FPGAs; they were just FPGAs with a reduced testing program focusing on the functionality used by the customer's design.
https://www.intel.com/content/www/us/en/products/details/eas...
I'd be amazed if that were still possible, in fact. Real-world FPGA designs lean heavily on the vendor's proprietary IP, which won't port straight across to ASICs any more than the LUT-based FPGA fabric will.
Anyone who claims to turn a modern FPGA design into an ASIC "automatically" is selling snake oil.
duskwuff
Oh, these programs were always in-house. The offering was essentially "if you pay an up-front fee and give us your FPGA design, we'll sell you some chips that run that design for cheaper than the FPGAs". If there was ever any custom silicon involved - which there may have been for Altera, but probably not for Xilinx - the design files for it were never made available to the customer.
15155
> Real-world FPGA designs lean heavily on the vendor's proprietary IP
No, not always - I use no vendor IP whatsoever for extremely large designs.
For ASICs it is basically required to use fab IP (for physical production/electrical/verification reasons,) but that's absolutely not the case for FPGAs.
esseph
It is very possible and many vendors are still doing this. One of them was fairly recently acquired by Cisco.
> The ability to reprogram an FPGA to implement a new digital circuit in milliseconds would be a game changer for many workloads
Someone has to design each of those reconfigurable digital circuits and take them through an implementation flow.
Only certain problems map well to easy FPGA implementation: anything involving memory access is quite tedious.
JoachimS
The ability to optimize the memory access and memory configuration is sometimes a game changer. And modern FPGA tools have functionality to make memory access quite easy. Not as easy as on an MCU/CPU, but basically the same as for an ASIC.
I would also question the premise that memory access is less tedious or easier on MCUs/CPUs, especially if you need deterministic performance and response times. Most CPUs have memory hierarchies.
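For what it's worth, the RTL side of that claim is easy to show: the same template that an ASIC flow maps onto a compiled SRAM macro infers block RAM on an FPGA (a generic sketch; vendors document this inference pattern):

  module spram #(parameter AW = 10, DW = 32) (
      input  wire          clk,
      input  wire          we,
      input  wire [AW-1:0] addr,
      input  wire [DW-1:0] wdata,
      output reg  [DW-1:0] rdata
  );
      reg [DW-1:0] mem [0:(1<<AW)-1];
      always @(posedge clk) begin
          if (we) mem[addr] <= wdata;
          rdata <= mem[addr];  // synchronous read: one-cycle latency, maps to BRAM
      end
  endmodule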
The more practical attempts at dynamic partial reconfiguration involve swapping out accelerators for specific functions: encoders and decoders for different wireless standards, or different curves in crypto, for example. And yes, somebody has to implement those.
15155
> modern FPGA tools have functionality
HLS is not good, so I don't know what you are referring to as "modern." I am primarily experienced with large UltraScale+ and Versal chips - nothing has changed in 15 years here.
> basically the same as for an ASIC
What does this even mean, specifically? Use RTL examples. ASIC memory access isn't "easy," either (though it is basically the "same.")
> partial reconfiguration involves swapping out accelerators for specific functions
Tell me you've never used PR without telling me. Current vendor implementations of this are terrible (with Xilinx leading the pack.)
bigfatkitten
> which is obvious when you see the 4-digit prices of the most sophisticated FPGAs.
6-digit at the high end.
https://www.digikey.com/en/products/detail/amd/XCVU29P-3FSGA...
These chips are <$3000 new.
I mean, it's not like people producing products with those parts actually pay that for production though, except for some really tiny volume ones (such as some defence projects).
Companies make products based around FPGAs and can sell the whole thing for less than you could buy just the single FPGA part for on a place like Digi-key. It's just part of the FPGA companies' business models. In volume the price will be far smaller.
bigfatkitten
At that end of the market they cost an astronomical amount of money, no matter what.
The $140,000 device doesn’t become a $400 device in any volume; it might become a $90,000 device.
YakBizzarro
these prices are like airplanes: no one with volume pays list prices, it's something else. moreover, this FPGA is very peculiar. it's used to simulate ASIC during validation, so it's not really the typical FPGA that gets used in a project
stephen_g
No, I expect you could get it under $20K with not that much volume and potentially in the single digit thousands in high volume. The FPGA vendors' business models are weird, the price breaks are unlike what we see with most other parts.
15155
> The $140,000 device doesn’t become a $400 device in any volume; it might become a $90,000 device.
VU13Ps are quoted $300/ea at tray quantities from Xilinx, yet are $89k on DigiKey with no price breaks.
bluGill
There are a large number of products that will never sell enough to be worth going to an ASIC.
Aromasin
Part of why Lattice Semi has been so successful in recent years is that they've broken the paradigm slightly, in that their FPGAs are much more cost-effective while still coming with all of the things we expect of FPGAs: lots of high-speed IO, half-decent software, and a pretty broad IP portfolio. Something like the Certus-NX comes in at ~$5 at the 17k LUT count, and Avant at only ~$550 at the 600k LUT mark. That can be as little as a quarter of what the equivalent Xilinx or Altera device goes for. There's very little licensing cost too, which makes them really appealing. I see them going into so many designs now because they can scale. You'd have to be making 100Ks+ of boards to justify the ASIC expense when there's a commodity product like this.
tverbeure
At volume, $550 for 600k LUTs is outrageously high and much more than what you’d pay Xilinx or Altera.
Aromasin
Absolutely, but I quoted shelf price. Volume pricing can be anywhere between 30-80% less than that. The Agilex 5 equivalent is ~$900 and much the same for the UltraScale+ equivalent, so Lattice is the most competitive on cost.
15155
And then you have to use Lattice's toolchain, which you couldn't pay me to use.
kvemkon
> The ability to reprogram an FPGA to implement a new digital circuit in milliseconds would be a game changer for many workloads,..
Only 47 milliseconds from power-on to operational.
Lattice Avant™-G FPGA: Boot Up Time Demo (12.12.2023)
https://www.youtube.com/watch?v=s4NUVYyLUxc
CertusPro-NX does I/O configuration in about 4 ms and full fabric configuration within 30 ms (for a ~100k logic cell device); Certus does full-device configuration within ~8 ms.
Lattice make some really cool devices. Not the fastest fmax speeds, but hell if the time to config and tiny power draw don't half make up for it.
pjc50
> Only 47 milliseconds from power-on to operational.
Absolute eternity by modern computer standards. GPU will be a trillion operations ahead of you before you even start. Or for another view, that's a whole seven frames at 144Hz.
People say FPGAs will be great for many workloads, but then don't give examples. In my experience the only real ones are those requiring low-latency hardware comms. ADC->FPGA->DAC is a powerful combo. Everything else gets run over by either CPU doing integer work or GPU doing FP.
KeplerBoy
That's completely beside the point. How long does an embedded Linux box need to get its GPU up and ready for number crunching? But yes, FPGAs are best suited for deterministic low-latency stuff.
With the Jetsons (AGX Orin) I have on my desk, it would take a bit of tinkering to even get it under a minute.
JoachimS
You also need to bring time to market, product lifetime, the need for upgrades, fixes and flexibility, risks, and R&D cost (including skillset and NRE) into the comparison between FPGAs and ASICs. Most, basically all, ASICs start out as FPGAs, either in labs or in real products.
Another aspect where FPGAs are an interesting alternative is security. Open up a fairly competent HSM and you will find FPGAs. FPGAs, especially ones that can be locked to a bitstream (for example anti-fuse or flash-based FPGAs from Microchip), are used in high-security systems. The machines can be built in a less secure setting, and the injection, provisioning of a machine can be done in a high-security setting.
Dynamically reconfigurable systems were a very interesting idea. Support for partial reconfiguration, which allowed you to change accelerator cores connected to a CPU platform, seemed to bring a lot of promise. Xilinx was an early provider, with the XC6200 family IIRC, through a company they bought. AMD also provided devices with support for partial reconfiguration. There were also some research devices and startups for this in the early 2000s. I planned to do a PhD around this topic. But tool and language support and the added cost in the devices seem to have killed this. At least for now.
Today, in for example mobile phone systems, FPGAs provide the compute power CPUs can't, with the added ability to add new features as the standards evolve and regional market requirements affect the HW. But this is more like FW upgrades.
esseph
I know many network vendors that have been selling FPGA driven products for decades, and I have contributed to some of the product development.
ASICs require a certain scale and a very high up-front cost.
SlowTao
There has been an idea for a very long time of building an OS and software that targets FPGA systems so it can dynamically change its function for each task. The idea being that it would potentially be faster than a general purpose processor.
Still practically theory, as I have never seen anything come of it. It is going up against ASIC design, which is a great middle ground for those things even if it means you are not free to do it yourself.
artiscode
The military loves FPGAs. They can do what ASICs can, but without involving extra people.
15155
Except analog (save for very recently with devices such as Xilinx RFSoC).
mrheosuper
> They are a product that is too expensive at scale. If an application takes off, it is eventually cheaper and more performant to switch to ASICs
Isn't that the same thing: too expensive to scale, so you switch to ASIC?
maxdamantus
> Too expensive to scale, so you switch to ASIC ?
I think it's not so much about too expensive, but once you've got the resources it will always be better to switch to an ASIC.
Not a hardware engineer, but it seems obvious to me that any circuitry implemented using an FPGA will be physically bigger with more "wiring" (more resistance, more energy, more heat) than the equivalent ASIC, and accordingly the tolerances will need to be larger so clock speeds will be lower.
Basically, at scale an ASIC will always win out over an FPGA, unless your application is basically "give the user an FPGA" (but this is begging the question—unless your users are hardware engineers this can't be a goal).
esseph
ASICs require scale that doesn't always make sense. Many things using FPGAs aren't necessarily mass-market / consumer devices.
geerlingguy
One area I see almost exclusively FPGA designs is for high power broadcast equipment like transmitters and exciters. The polar opposite of mass-market, and the price is high enough the FPGA is just one of many expensive components.
esseph
You'll also find them in tons and tons of ISP equipment. Radios, optical gear, QoE equipment, etc.
mrheosuper
Better in what? Performance? Maybe. Profit? That heavily depends on your market.
maxdamantus
Yes, performance (per watt, or per mass of silicon).
Profit is dependent on scale. FPGAs are useful if the scale is so small that an ASIC production line is more expensive than buying a couple of FPGAs.
If the scale is large enough that ASIC production is cheaper, you reap the performance improvements.
Think of it this way: FPGAs are programmed using ASIC circuitry. If you programmed an FPGA using an FPGA (using ASIC circuitry), do you think you'll achieve the same performance as the underlying FPGA? Of course not (assuming you're not cheating with some "identity" compilation). Same thing applies with any other ASIC.
Each layer of FPGA abstraction incurs a cost: more silicon/circuitry/resistance/heat/energy and lower clock speeds.
FPGA vs ASIC is a boring and tired comparison. Yeah obviously. For a fixed configuration an ASIC is basically just an FPGA, but without any of the parts that make an FPGA programmable.
If you don't need programmability, then all that flexibility represents pure waste. But then again, we can make the same argument with ASIC vs CPUs and GPUs. The ASIC always wins, because CPUs and GPUs come with unnecessary flexibility.
The real problem with FPGAs isn't even that they get beaten by ASICs, because you can always come up with a low volume market for them, especially as modern process nodes get more and more expensive to the point where bleeding edge FPGAs are becoming more and more viable. You can now have FPGAs on 7nm with better performance than ASICs with older but more affordable process nodes that fit in your budget.
The real problem is that the vast majority of FPGA manufacturers don't even play the same game as GPUs and CPUs. You can have fast single and double precision floats on a CPU and really really fast single precision floats on GPUs, but on FPGAs? Those are reserved for the elite Versal series (or Intel's equivalent). Every other FPGA manufacturer? Fixed point arithmetic plus bfloat16 if you are lucky.
Now let me tell you: for AI this doesn't really matter. The FPGAs that do AI focus primarily on supporting a truckload of simultaneous camera inputs. There is no real competition here. No CPU or GPU will let you connect as many cameras as an FPGA, unless it's an SoC specifically built for VR headsets.
Meanwhile for everything else, not having single precision floats is a curse. Porting an algorithm from floating point to fixed point arithmetic is non-trivial and requires extensive engineering effort. You not only need to know how to work with hardware, but also need to understand the algorithm in its entirety and all the numerical consequences that entails. You go from dropping someone's algorithm into your code and having it work from the get go, to needing to understand every single line and having it break anyway.
These problems aren't impossible to fix, but they are guaranteed to go away the very instant you get your hands on floating point arithmetic. This leads to a paradox. FPGAs are extremely flexible, but simultaneously extremely constricting. The appeal is lost.
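To make the porting pain concrete: a one-line float MAC becomes something like this in fixed point, where the Q4.12 format is an assumption you have to justify against the algorithm's dynamic range (my sketch, not from the thread):

  // y += a * x in Q4.12 (16 bits: 4 integer, 12 fractional).
  module fixed_mac (
      input  wire               clk, rst,
      input  wire signed [15:0] a,    // Q4.12
      input  wire signed [15:0] x,    // Q4.12
      output reg  signed [15:0] y     // Q4.12
  );
      wire signed [31:0] prod   = a * x;        // full product is Q8.24
      wire signed [15:0] scaled = prod[27:12];  // back to Q4.12: truncates low
                                                // bits, silently drops headroom;
                                                // overflow handling is on you
      always @(posedge clk)
          if (rst) y <= 16'sd0;
          else     y <= y + scaled;
  endmodule

Every one of those comments is a numerical decision a float version never asks you to make.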
15155
Floating point arithmetic isn't a "basic element of logic" and likely will never become one in FPGA world: floating point multipliers take up a lot of area and require specific binary implementation details.
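Right, and even bfloat16, the cheap option, already needs an exponent adder, an 8x8 mantissa multiplier, and normalization muxes. A deliberately stripped-down sketch of my own (no rounding, no subnormals, no inf/NaN, exponent overflow ignored):

  module bf16_mul (
      input  wire [15:0] a, b,  // 1 sign, 8 exponent (bias 127), 7 mantissa
      output wire [15:0] y
  );
      wire        sy   = a[15] ^ b[15];
      wire [8:0]  esum = a[14:7] + b[14:7];                // add biased exponents
      wire [15:0] prod = {1'b1, a[6:0]} * {1'b1, b[6:0]};  // hidden-bit mantissas
      wire        norm = prod[15];                         // product >= 2.0?
      wire [6:0]  mant = norm ? prod[14:8] : prod[13:7];   // renormalize, truncate
      wire [7:0]  eout = esum - 9'd127 + norm;             // remove double bias
      assign y = {sy, eout, mant};
  endmodule

A DSP block absorbs the 8x8 multiply, but the shifting and exception logic lands in fabric, which is why full FP32 everywhere is expensive.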
checker659
I think FPGAs (or CGRAs really) will make a comeback once LLMs can directly generate FPGA bitstreams.
throwawayabcdef
No need. I gave ChatGPT this prompt: "Write a data mover in Xilinx HLS with Vitis flow that takes in a stream of bytes, swaps pairs of bytes, then streams the bytes out"
And it did a good job. The code it made probably works fine and will run on most Xilinx FPGAs.
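For comparison, the whole task is also tiny in plain RTL; a byte-pair swapper over a valid/ready stream is roughly this (my sketch, taking the stream 16 bits at a time and ignoring TLAST/TKEEP):

  module byte_swap (
      input  wire        clk, rst,
      input  wire        s_valid,
      output wire        s_ready,
      input  wire [15:0] s_data,
      output reg         m_valid,
      input  wire        m_ready,
      output reg  [15:0] m_data
  );
      // Single register stage: accept when empty or when downstream consumes.
      assign s_ready = !m_valid || m_ready;
      always @(posedge clk) begin
          if (rst) m_valid <= 1'b0;
          else if (s_ready) begin
              m_valid <= s_valid;
              m_data  <= {s_data[7:0], s_data[15:8]};  // the actual work
          end
      end
  endmodule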
pjc50
> The code it made probably works fine
Solve your silicon verification workflow with this one weird trick: "looks good to me"!
throwawayabcdef
It's how I saved cost and schedule on this project.
ben_w
I don't even work in hardware, and yet even I have still heard of the Pentium FDIV bug, which happened despite people looking a lot more closely than "probably works fine".
15155
What does "directly generate FPGA bitstreams" mean?
Placement and routing is an NP-Complete problem.
duskwuff
And I certainly can't imagine how a language model would be of any use here, in a problem which doesn't involve language.
15155
They are "okay" at generating RTL, but are likely never going to be able to generate actual bitstreams without some classical implementation flow in there.
For me, the promise of computation is not in the current CPU (or now GPU) world, but one in which the hardware can dynamically run the most optimized logic for a given operation.
Sadly, this has yet to materialize (with some exceptions[0][1][2]).
Hopefully in an LLM-first world, we can start to utilize FPGAs more effectively at (small) scale :)
[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...
[1] https://novasparks.com/
[2] https://www.alibabacloud.com/blog/deep-dive-into-alibaba-clo...
For some additional context, I believe most cloud providers who started on FPGAs for network devices have switched to “smart” ASICs, where some limited portion of the device is programmable.
The issue with FPGAs is that if you have enough scale, an ASIC starts to make more sense.
Still, I believe they provide a glimpse into the hardware of the future!
zackmorris
Same here. I believe that the lack of programmable hardware is akin to the lack of high-core-count CPUs. As in, either would be too disruptive to the status quo.
GPU/TPU etc are just domain-specific hardware, not much help for exploring new paradigms.
To really solve this, we'll likely need someone who's won the internet lottery and is willing to invest serious capital ($1 billion plus) to solve real problems and get real work done. Until then, it's going to be tech bros playing with VR and LLMs.
I'll see you all in 10 years when the situation still hasn't changed.
amelius
40 years and the industry still doesn't understand that they should embrace open source for tooling.
> OpenFPGA.. aims to automate the design, verification and layout of highly versatile FPGA architectures. OpenFPGA offers a high-level architecture description language for users to customize their FPGA architectures down to circuit-level details. Based on the architecture modeling, OpenFPGA can auto-generate Verilog netlists, with which users can perform verification as well as generate production-ready layouts using modern EDA tools. OpenFPGA includes a generic Verilog-to-Bitstream generator, as a native EDA toolchain for any FPGAs that are prototyped by OpenFPGA.
2020 DARPA ERI video on open-source accelerated chip design, https://www.youtube.com/watch?v=xKxv7Bdm7Do
2022-2024 presentations on open-source computer architecture, https://oscar-workshop.github.io/Archive.html
> workshop on open-source hardware which addresses the wide variety of challenges encountered by both hardware and software engineers in dealing with the increasing heterogeneity of next-generation computer architectures. By providing a venue which brings together researchers from academia, industry and government labs, OSCAR promotes a collaborative approach to foster the efforts of the open-source hardware community
monocasa
I'm reminded of Ken Shirriff's reverse engineering of the original FPGA, the Xilinx XC2064:
https://www.righto.com/2020/09/reverse-engineering-first-fpg...
If you look at the XC2064 datasheet[1], you'll see there are 12,038 configuration bits; only 1,024 of them actually program the LUTs, and 90% are for routing. For general purpose logic replacement, that offers great flexibility.
If you want to do general purpose computing, it's my strong (and minority) opinion that routing fabrics are a premature optimization. The trend has been in the wrong direction.
If you were to go the other way, and just build a systolic array of look up tables, as I have hypothesized for years with my BitGrid, you could save 90% of the silicon, and still get almost all of the compute. It gets better when you consider the active logic would only be between neighboring cells, thus capacitive power would be much lower, and speeds could be higher.
[1] https://downloads.reactivemicro.com/Electronics/FPGA/Xilinx%...
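For anyone trying to picture it, a single cell of such a grid might be no more than this (my sketch of the idea as described, not an actual BitGrid implementation):

  // One cell: one input bit from each of the four neighbors, four
  // registered output bits, each driven by its own 16-entry LUT over
  // the same four inputs. No routing fabric, only nearest neighbors.
  module bitgrid_cell (
      input  wire        clk,
      input  wire [3:0]  in,    // N, E, S, W neighbor bits
      input  wire [63:0] cfg,   // 4 x 16-bit truth tables
      output reg  [3:0]  out    // to the four neighbors
  );
      integer i;
      always @(posedge clk)
          for (i = 0; i < 4; i = i + 1)
              out[i] <= cfg[16*i + in];  // LUT i indexed by the inputs
  endmodule

Registering every output is what makes it systolic: data marches one cell per clock, so there are no long combinational paths to drag the clock down.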
> If you were to go the other way, and just build a systolic array of look up tables, as I have hypothesized for years with my BitGrid, you could save 90% of the silicon, and still get almost all of the compute.
Do you have a link where I could read more about "systolic array of lookup tables" and "BitGrid"?
I've no idea what those two things are but it sure sounds interesting.
National Instruments, via LabVIEW, made programming FPGAs accessible to engineers like me who are not EEs. It was really cool to have loops running at tens of kHz up to 100 MHz speeds.
It's really too bad that it was locked to NI products and they've kind of faded away.
I sometimes like to think of what could have been, if the ability to program FPGAs so easily would have become more popular.
lnsru
It's my second decade in FPGA development, and almost every year someone has promised an easy development method. High-level synthesis works, with limitations, but often the only working solution is hand-written VHDL. ChatGPT is really bad at VHDL. Maybe Verilog works better; I haven't tried it yet.
LarsKrimi
I can confirm that ChatGPT is laughably bad at Verilog
Waterluvian
On the 40th anniversary I will confess that for most of my career I thought the “field” in FPGA meant like a field of gates.
b0a04gl
fpgas are underrated. insane control, hard realtime, tweak logic without touching hardware. and still holding ground in hft, telecom basebands, satellite payloads, broadcast gear, lab instruments, even some cloud accelerators like aws f1. anywhere latency's tight but logic keeps changing. most move to asic when design's stable but if your workload evolves or margins depend on latency, fpga's the only sane option
EarlKing
Just a note: At the bottom of the page they mention a book on Embedded System Design that they contributed to but there's no link. It took a little digging but I found the book they're talking about: https://us.artechhouse.com/A-Hands-On-Guide-to-Designing-Emb...
burnt-resistor
I'm surprised there's still an (now AMD) Xilinx campus at Union Ave & Hwy 85.
And pour one out for Altera, which was actually founded a year earlier but took 5 years to arrive at something similar. It was eventually acqui-crushed by Intel.
kvemkon
Recently posted:
The FPGA turns 40. Where does it go from here? (11.06.2025)
https://www.hackerneue.com/item?id=44246700