Architecture All Access: Modern FPGA Architecture | Intel Technology


If you've talked on a cell phone or browsed the Internet, you've more than likely benefited from an FPGA. While not commonly known outside of technology circles, FPGAs, or Field Programmable Gate Arrays, are most often used in applications where huge amounts of data must be processed and routed in real time. Invented in the mid-1980s, FPGAs have rapidly added capabilities over the past decade and are being used in an expanding range of applications: everything from medical imaging equipment, to factory automation, to AI speech recognition systems, to broadband cellular networks. And what sets the FPGA apart from other electronic devices is that it can be programmed at any time to update or even completely change its functionality. It's truly the chameleon of the electronics world.

Thanks for joining me here on Architecture All Access FPGA. First, a bit about myself. My name is Prakash Iyer, and I've worked for close to three decades in the computing industry as a technology designer and implementer, almost all of it at Intel after my graduate studies. I've experienced three major epochs during this time, each energizing the kid-in-a-candy-store inside of me to make an impact.

My first decade was all about the compute spiral: more software begets more and faster CPUs, which in turn beget more software. The second decade was about creating mobile, connected-everywhere devices and unifying the experiences around data, voice, and media. The third decade, building on those, has been about Big Data, intelligence, and the acceleration of data processing in industries all around us, which is precisely the setup that was tailor-made for FPGAs. In my talk today, we'll look at the basic building blocks of the FPGA, and how those building blocks are combined to create the FPGA as we know it today.

We'll also talk about how FPGAs are programmed, and review the cutting edge applications that FPGAs are enabling. Ready? Let's get started. Field programmable gate arrays, or FPGAs as they're called by the community of engineers that use them, are truly a modern marvel.

As the graphic shows, at their most basic level they let designers build a computer chip from scratch. The job of an FPGA developer is more like that of someone who designs CPUs than of someone who writes an application that runs on a CPU. When most software developers code in a language like Python or C++, they're describing what operations they want to happen on the data. Say they want to add two numbers: in a CPU, those two numbers are fetched from memory, they're fed into a digital circuit along with an instruction that says to add the two numbers together, and the result comes out and can be used however the programmer needs.
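To make the software side of that contrast concrete, here is a minimal C++ sketch (the numbers and variable names are purely illustrative): we describe an operation on data and rely on an adder circuit the CPU already contains.

```cpp
#include <cstdio>

int main() {
    // The software view: describe an operation on data and let the
    // CPU's existing adder circuit carry it out.
    int a = 3, b = 4;
    int sum = a + b;  // compiles down to an add instruction on the CPU
    std::printf("%d + %d = %d\n", a, b, sum);
    return 0;
}
```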

FPGA developers are writing code that describes a physical structure to be built, compared to a programmer who already has a structure and is controlling how that structure operates or is used. The FPGA developer, in the case of adding two numbers, would write code that describes the digital circuit that adds the two numbers together. But maybe we are getting ahead of ourselves here. Let's take a step back and first talk about digital logic design, what it is, and how it relates to an FPGA.

Before we can truly understand FPGAs, we need to understand the basic building blocks of these devices. The first thing to understand is that hardware designers use digital logic design to create all of the structures that make up a modern chip, like a CPU, from three basic gates. These are the AND gate, the OR gate, and the inverter, or NOT gate.

The AND gate's output is a 1 only when both of its inputs are a 1. The OR gate's output is a 1 when either of its inputs is a 1, and the inverter simply transforms a 0 to a 1, or a 1 to a 0. These three gates are enough to build all digital logic circuits; DeMorgan's Theorem even lets us swap between AND-based and OR-based forms using inverters. Now, underlying these gates are transistors created out of semiconductor material.
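As a quick illustration (a plain C++ simulation of the gates, not hardware description code), here are the three basic gates and their truth tables:

```cpp
#include <cstdio>

// The three basic gates, modeled as boolean functions.
bool AND_gate(bool a, bool b) { return a && b; }
bool OR_gate (bool a, bool b) { return a || b; }
bool NOT_gate(bool a)         { return !a;     }

int main() {
    std::puts("a b | AND OR | NOT a");
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            std::printf("%d %d |  %d   %d |    %d\n",
                        a, b, AND_gate(a, b), OR_gate(a, b), NOT_gate(a));
    return 0;
}
```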

While that is beyond the scope of what we want to get into here, those details are abstracted away to make the FPGA designer's life much easier. Now, there are a couple more concepts we need to talk about before we delve into FPGAs. The first is the concept of a clock and a flip-flop.

A clock is a voltage that rises and falls at a set frequency. The time between the rising edges of the clock is the period, which is measured in seconds or some fraction thereof. Once we know the period, we can figure out how many times the clock rises and falls in some unit of time.

You'll often see this number called out as megahertz or gigahertz when you buy a new computer. FPGAs typically operate at hundreds of megahertz, meaning the clock rises hundreds of millions of times in a second. The second concept, and the other key component of a logic design, is the flip-flop.
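As a small worked example (the 2.5 nanosecond period here is just an illustrative number), frequency is simply the reciprocal of the period:

```cpp
#include <cstdio>

int main() {
    // Frequency is the reciprocal of the clock period.
    double period_ns = 2.5;                       // example period: 2.5 ns per cycle
    double freq_hz   = 1.0 / (period_ns * 1e-9);  // 400,000,000 rising edges per second
    std::printf("period %.1f ns -> %.0f MHz\n", period_ns, freq_hz / 1e6);
    return 0;
}
```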

The flip-flop, also called a flop for short, is a storage device that can be created using the gates we just described. But what makes a flip-flop useful is that its input, which is called the D input, is only captured when the clock goes from a 0 to a 1, and that value is then held on the output of the flip-flop. Even if the D input changes, the value on the Q output won't be replaced until the clock rises again.
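Here is a minimal software model of that behavior (again a C++ simulation for illustration, not hardware description code): the Q output only changes when the clock goes from 0 to 1.

```cpp
#include <cstdio>

// A software model of a D flip-flop: Q takes the value of D
// only on a rising clock edge (clock going from 0 to 1).
struct DFlipFlop {
    bool q = false;        // stored value, visible on the Q output
    bool prev_clk = false; // previous clock level, used to detect a rising edge

    void tick(bool clk, bool d) {
        if (clk && !prev_clk)  // rising edge: capture D onto Q
            q = d;
        prev_clk = clk;        // otherwise Q holds its old value
    }
};

int main() {
    DFlipFlop ff;
    bool d_values[] = {true, false, true, true};
    for (bool d : d_values) {
        ff.tick(false, d);  // clock low: Q unchanged, even if D changes
        ff.tick(true,  d);  // rising edge: Q captures D
        std::printf("D=%d -> Q=%d\n", d, ff.q);
    }
    return 0;
}
```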

Let's look at how we can use logic gates to build a simple function that adds two numbers together. Since transistors can only handle ones and zeros, all of our math must also only use ones and zeros. Here is what happens when you add two single digit binary numbers together. If I add a 0 and a 0, I get a 0. If I add a 1 and a 0 or a 0 and a 1, I get a 1. But if I add a 1 and a 1, I should get a 0 because there is no 2 in the binary system.

Instead, I would assert the carry bit, saying that the addition overflowed. This is very similar to what you do in decimal math when you add from right to left. If you now add flip-flops to the output of the adder circuit, then every time the clock rises our circuit will add the two numbers and the carry on the input, and store the result on the output of the flip-flops to be used by some other part of our circuit.
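Putting those pieces together, here is a sketch of the one-bit adder built only from the three basic gates (a plain C++ simulation rather than HDL, with XOR composed from AND, OR, and NOT):

```cpp
#include <cstdio>

// The three basic gates again.
bool AND_gate(bool a, bool b) { return a && b; }
bool OR_gate (bool a, bool b) { return a || b; }
bool NOT_gate(bool a)         { return !a;     }

// XOR expressed in terms of the three basic gates.
bool XOR_gate(bool a, bool b) {
    return OR_gate(AND_gate(a, NOT_gate(b)), AND_gate(NOT_gate(a), b));
}

// One-bit full adder: sum and carry-out for inputs a, b and a carry-in.
void full_adder(bool a, bool b, bool carry_in, bool& sum, bool& carry_out) {
    sum       = XOR_gate(XOR_gate(a, b), carry_in);
    carry_out = OR_gate(AND_gate(a, b), AND_gate(carry_in, XOR_gate(a, b)));
}

int main() {
    // Reproduce the single-digit addition table described above (carry-in = 0).
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b) {
            bool sum = false, carry = false;
            full_adder(a, b, /*carry_in=*/false, sum, carry);
            std::printf("%d + %d -> sum %d, carry %d\n", a, b, sum, carry);
        }
    return 0;
}
```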

Congratulations. You've now taken your first step to becoming a logic designer. If you were to now go on and build that structure in a silicon device using the latest fab process, you would need to spend tens if not hundreds of millions of dollars. But the FPGA, which is also built on these latest fab processes, lets designers build a circuit at a fraction of the cost. Now that you understand basic logic gates, flip-flops, and clocks, you know the basics of what it takes to make an ASIC.

An ASIC, or Application-Specific Integrated Circuit, is a custom-built chip that only a single company uses. These chips are all around you, from your microwave, to your smartphone, to your car. The semiconductor material the chip is built from might be tweaked for the application.

For instance, a low-power wireless camera compared to a high-end gaming console. The benefit of designing these chips is that they're extremely fast and power efficient for the target application. The downside is that they cost millions of dollars to design, and if you mess up the design, say you build a subtractor instead of an adder, it can cost millions of dollars to fix and take months and months to build a new device. And finally, they only really work for a specific set of applications. This is where the FPGA can really shine. While it is not quite as fast or quite as low power as what you could build with an ASIC, an FPGA can be reprogrammed in a matter of seconds with a new feature, a new application, or a fix for a current application, even being reprogrammed after it's deployed in the field.

While the per-unit cost is quite a bit higher than that of a purpose-built ASIC, mid-volume applications are often more cost-effective when built with an FPGA because of the ability to modify the implementation to adapt to evolving solution requirements. Now let's take a look at the basic building blocks of an FPGA. The first fundamental building block that started this whole FPGA revolution was the Lookup Table, or LUT, which is connected to a flip-flop. As this picture shows, the lookup table allows logic functions to be programmed.

Now, how do we do this? We do it by populating the outputs of a logic function, for some number of input variables, into a specific set of memory locations, which we call the LUT mask. We then use the logic input values to select, through a set of multiplexers, which of those stored outputs appears at the output. Suppose we have a four-input logic circuit whose output is a 1 only when exactly two of its inputs are a 1; that's what's shown in the truth table alongside the graphic.

The outputs are loaded into the LUT's RAM mask bits; then, when an input is applied (in my example I've chosen 0 1 0 1 for A, B, C, and D), the appropriate value is selected from the mask and presented at the output Y. These LUTs and flip-flops, as you can see in this graphic, are then put together into larger structures called Adaptive Logic Modules, or ALMs. As FPGAs have evolved, FPGA architects have added additional helper logic to make common functions like addition easier to implement.
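Here is a small C++ sketch of that idea (the ordering used to pack A, B, C, and D into the mask address is an assumption for illustration): programming the LUT means filling in its 16 mask bits, and evaluating it means using the inputs as an address into the mask.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // A 4-input LUT is just 16 stored output bits: the LUT mask.
    uint16_t mask = 0;

    // Program the mask for the example function: the output is a 1
    // only when exactly two of the four inputs are a 1.
    for (int addr = 0; addr < 16; ++addr) {
        int ones = 0;
        for (int bit = 0; bit < 4; ++bit)
            ones += (addr >> bit) & 1;
        if (ones == 2)
            mask |= static_cast<uint16_t>(1u << addr);
    }

    // Apply the inputs from the example: A=0, B=1, C=0, D=1.
    int A = 0, B = 1, C = 0, D = 1;
    int addr = (A << 3) | (B << 2) | (C << 1) | D;  // assumed bit ordering
    int Y = (mask >> addr) & 1;
    std::printf("mask = 0x%04X, inputs ABCD = %d%d%d%d -> Y = %d\n",
                static_cast<unsigned>(mask), A, B, C, D, Y);
    return 0;
}
```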

The basic LUT has also been enhanced so that it can act as a single 8-input LUT or be split into two smaller LUTs, depending on the logic that needs to be implemented. The other fundamental building block of an FPGA is what we call programmable routing and interconnect. As the saying goes, you buy the FPGA for the logic, but you pay for the routing. In modern-day designs, circuits can be incredibly complex.

In ASIC designs, the routes are physically built as wires that connect the flip-flops and logic gates. In FPGAs, these same routing wires are designed as a huge array that can be programmed to connect the LUTs and flops. That programmability is what makes the FPGA so flexible compared to the ASIC, though it comes at the cost of making the wires slower in comparison. Imagine a top-down view of a city. The roads would be our routing and interconnect.

The buildings would be our logic modules. A typical city has roads that allow traffic to flow between lots of different buildings, so there isn't really a programmed route. Empty buildings also don't really have a purpose, so they don't have a specific task that they perform. This is the state an FPGA starts in. The act of programming an FPGA is like configuring each building for a specific task, such as a restaurant or an office building, and setting the streets as a series of one-way routes.

This lets us route the traffic, the data, from one building, or logic module, to another in a predefined way. It's like extreme city planning. As FPGAs have evolved, more and more hardware features have been added to make them both easier to use and better suited to a larger variety of applications. The first hardware feature to be added to FPGAs was larger on-chip writable memory blocks.

These memory blocks are in the range of tens of thousands of bits, with multiple read and write interfaces, and hundreds or even thousands of them are dispersed throughout the FPGA. Along with memory, Digital Signal Processors, or DSPs, were added to increase the computational power and speed up floating-point and other matrix math functions that are resource intensive to implement in discrete logic gates. Finally, many recent FPGAs have added complete processing subsystems, with CPUs and CPU peripherals, built right next to the FPGA fabric. All of these building blocks are then paired with a rich set of programmable IOs to serve a broad range of applications, from video and media, to Ethernet, to platform connectivity. Modern FPGAs implement these IOs using chiplets, or tiles, that are then connected to the main FPGA die. This not only allows different combinations of IOs to be introduced into our products, but it also allows IO features to evolve on a different time scale from the main FPGA.

To go back and extend our city analogy: memory blocks are like warehouses spread throughout the city, where we can store data close to where it needs to be processed at a later time. DSPs and the CPU blocks are specialized buildings that can process the data with a fraction of the power, and the programmable IOs are the highways that connect our city to cities off our chip. The dilemma that FPGA designers face is how to balance programmable and hardened features.

Since FPGA developers buy the entire chip, they are still paying for hardened features that they may not need or use. On the other hand, if there are too few hardened features, then the device may not be competitive in the market, because too much of the precious programmable logic has to be used to implement features that could have been hardened, and hardened features are typically faster and lower power than what can be implemented in the FPGA's programmable area. So what you end up with in a modern FPGA is a mixture of programmable logic and hardened functionality such as CPUs, networking and storage protocol logic, security processors, cryptographic algorithms, or AI-specialized DSPs. FPGAs are no longer a sea of gates. Rather, they are a complex mixture of programmable logic and hardened, optimized digital logic blocks, often targeted at specific markets but still very useful in a wide variety of general markets.

A discussion of FPGAs would not be complete without a discussion of how one programs the FPGA. When FPGA developers write code, they are writing code to build the logic inside the FPGA, just like someone building an ASIC would do, and this is often done in one of two popular languages, VHDL or Verilog. Once a hardware design is complete, though, there is some amount of software that needs to be designed to interface with the FPGA, and in the cases where the FPGA has an embedded CPU, software has to be developed for that CPU as well. More recently, there has been a push in the industry to abstract the programming of the FPGA even further. In this case, the developer writes in a high-level language like Data Parallel C++ using a framework like oneAPI.
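As a rough sketch of what that style of design entry can look like (this is my own illustrative example using the SYCL/DPC++ API, not code from the video; it runs on whatever default device is available, whereas a real FPGA flow would select an FPGA device and compile ahead of time):

```cpp
#include <sycl/sycl.hpp>
#include <vector>
#include <cstdio>

int main() {
    constexpr size_t N = 8;
    std::vector<int> a(N, 1), b(N, 2), c(N, 0);

    sycl::queue q;  // default device; an FPGA flow would pick an FPGA device here
    {
        // Buffers wrap the host data for the duration of the computation.
        sycl::buffer<int, 1> A(a.data(), sycl::range<1>(N));
        sycl::buffer<int, 1> B(b.data(), sycl::range<1>(N));
        sycl::buffer<int, 1> C(c.data(), sycl::range<1>(N));

        q.submit([&](sycl::handler& h) {
            sycl::accessor inA(A, h, sycl::read_only);
            sycl::accessor inB(B, h, sycl::read_only);
            sycl::accessor out(C, h, sycl::write_only);
            // Describe the transform on the data; the toolchain decides
            // what hardware structure to build from it.
            h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                out[i] = inA[i] + inB[i];
            });
        });
    }  // buffers go out of scope here, so results are copied back into c

    for (size_t i = 0; i < N; ++i)
        std::printf("c[%zu] = %d\n", i, c[i]);
    return 0;
}
```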

In this design entry method, the designer describes the transforms that they want to happen to the data, and the toolchain figures out the right structure to build in the targeted FPGA. You'll often see FPGAs in applications that are leading edge and require real-time, low-latency decisions to be made. We see these types of applications across a variety of use cases, from real-time Internet of Things and embedded edge applications, to network security, to wireless networks, to cloud and enterprise data centers. The other application area we're very excited about is AI.

AI is still in its infancy, and we see AI in just about every industry. It's an interesting area in that the technology continues to evolve, which means there are newer and newer algorithms and newer and newer protocols being defined and deployed, and those lend themselves very well to FPGAs, because FPGAs have those programmable blocks on which you can actually implement those AI algorithms and deploy them. The capabilities of the programmable fabric and the hardened logic in FPGAs will continue to evolve as the industry evolves. In the bike race of technology, the FPGA is a sprinter that will jump out ahead of the pack. If the market lasts long enough, or has enough value, an ASIC will eventually be made that will overtake the FPGA.

But the number of races that the FPGA can compete in is greater than ever, and it keeps expanding as FPGAs add more and more hardened functions. We expect to see a very broad range of FPGAs find increasing applicability. Whether or how FPGAs may converge with ASICs, only time will tell.

I hope you found this enlightening, and thanks for joining me here on Architecture All Access FPGA.
