How Chips Are Made Worse to Get Better
How can you make a chip better by making it worse? It has to do with the height of transistors, the height of cells and the question of how we can continue to scale chips when transistors just don't get smaller anymore. A special technique called "fin depopulation" is one of the reasons why modern chips still continue to scale, even though new process nodes struggle to keep up with Moore's Law. It's a less talked about and often misunderstood technology, especially when it comes to its most advanced implementation in the form of TSMC's FinFlex, a technology that even experts sometimes seem to get wrong. Let's take a look at the tricks of modern semiconductor manufacturing and the secret behind TSMC's FinFlex, which, I promise, you will be able to understand by the end of this video. In order to figure out what fin depopulation really is, how FinFlex works, and why it's so important for Moore's Law, we have to set the stage with three basic, but very important concepts of semiconductors. The first one is the electrical properties of
transistors. At the most basic level, a transistor is an electrical switch that can control the flow of current between two areas, called "source" and "drain". This control is enabled by physically placing a so-called "gate" between source and drain. If the gate is active, it emits an electric field that affects the flow of current between source and drain.
Now you might ask yourself, when does the current actually flow? When the gate is on or when it's off? Does the electric field stop the flow of the current or does it facilitate it? The answer is: it depends. There are actually two different types of transistors, "N"-type and "P"-type. Since we are talking about electricity, "N" stands for negative, and "P" for positive. An N-type, or NMOS transistor, uses n-type material in the source and drain regions of the transistor, while the body of the transistor is made out of the opposite, p-type material. If you apply a positive voltage to the gate, it emits an electric field that attracts electrons, which then form a channel through the p-type material and let the current flow between source and drain. If you turn the gate off, the electric field stops and the flow of current stops too. A P-type, or PMOS transistor, uses p-type materials for source and drain and n-type for the body of the transistor. A PMOS transistor works when you apply a negative voltage to the gate, which repels the electrons in the n-type body and attracts electron holes, so that a channel forms between the positive source and drain regions. NMOS and PMOS are very similar,
they just function in electrically opposite ways. An NMOS transistor needs a positive voltage to be turned on, a PMOS transistor needs a negative voltage. As such they also have their own advantages and disadvantages. NMOS switches faster and PMOS is more power efficient. For a while, NMOS and PMOS transistors fought over which one is the best to use in semiconductors. Today, almost all semiconductors use so-called CMOS transistors, where the "C" stands for complementary. CMOS is a transistor type which combines one NMOS and one PMOS transistor. It's really just two transistors next to each other. CMOS offers the best combination of switching
speed for high performance and energy efficiency. But it also means that CMOS transistors are larger, because they are quite literally made out of two transistors, instead of only one. That's the first concept we need to understand. CMOS means double the transistors. The second concept we have to understand in order to make sense of fin depopulation are the actual fins that are being depopulated. What are those? When we talked about NMOS, PMOS and CMOS, we talked about the electrical properties of the transistor. But there's another aspect to a transistor: its physical form. How source and drain and gate are manufactured. Up until around 10 years ago, transistors were manufactured on a planar space
and fittingly called planar transistors. In this layout, the gate sits above the channel between source and drain and the electric field affects the channel from the top. But this 2D approach has a big flaw. If you want to reduce the size of a transistor, in order to increase transistor density, which is what new process nodes are supposed to do, you have to shrink the size of all its components. This means that source and drain are getting
closer and closer to each other and at the same time, the gate itself also gets smaller. At some point, source and drain are physically so close to each other that not even the strongest gate can control the flow of the current anymore. So-called "short channel effects" ruin everything and the electrons just do whatever they want, no matter if the gate is turned on or off. The planar transistor couldn't scale any further. In 2012, Intel pioneered a new type of 3D transistor, called a FinFET. In FinFETs, source, drain, channel and gate are built as elevated 3D
structures. This means the gate wraps around the channel from three sides instead of just one, which allows the electric field of the gate to exert greater control over the channel. 3D fin-based transistors enable stronger gates, so source and drain can be physically closer without negative short channel effects. Which means you can continue to make transistors smaller. If we look at the cross-section of a FinFET, we can actually see the physical height of a FinFET transistor, as the fins are built upwards from the silicon wafer. They are called fins, because they look somewhat like the fin of a fish. And while early FinFET nodes had less pronounced fins, their structure and height started to improve with each following process node. In this comparison
we can see Intel's first 22nm FinFET node next to 14nm and 10nm fins. The 14nm process already shows a great improvement and the 10nm node, now called Intel 7, has much taller fins. But this isn't just for show, there's a reason why fins are being built taller with each new generation. In a FinFET, the height of the fin is pretty much proportional to the transistor's conductive strength. By increasing the height of the fin,
the transistor can transport more current. So with the evolution of the FinFET, the transistors not only got smaller, as in they were more tightly packed, but also better at the same time. Right now we are on the cusp of a new generation of transistors, the successor to the FinFET. I've
talked about it in detail in my last video, go check that out if you are interested. But we are not quite there yet. Almost all current process nodes still use FinFETs, with TSMC's N3 process node family being the most widespread one. That's why further optimizing fin-based transistors is important to improve transistor scaling, even in the years to come. So far, we have covered two of the three basic concepts we need in order to understand what fin depopulation is and how FinFlex works. We know that current transistors are CMOS based,
meaning they combine an NMOS and a PMOS transistor. And we know that these transistors are built as so-called FinFETs, with elevated 3D structures. The higher the fins, the better the conductive performance of the transistor. The third and last concept has to do with how these transistors are combined to actually serve a purpose. Because an electrical switch alone
doesn't do anything besides turning on and off. Even if it's super fancy with CMOS and FinFET. If you want to build a chip, let's say we're designing a new CPU, you need to create structures that can perform calculations, store and read data, and much more. That's where the so-called logic gate comes in. A logic gate takes an input in the form of voltage and produces an output, also in the form of voltage. Perfect for a transistor,
which works with voltages as input and output. Logic gates always act in a predictable manner. For example, a logic gate might have two inputs and only if both inputs are the same, as in the same voltage, there is an output. If the input voltages are different, there is no output.
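To make that concrete, here is a minimal Python sketch of the equality-checking gate just described (in logic terms, an XNOR). The 1.0 V supply and the half-supply threshold are assumptions picked for illustration, not values from any real process node.

```python
# Illustrative model only: VDD and THRESHOLD are assumed values.
VDD = 1.0              # assumed supply voltage, read as logic "1"
THRESHOLD = VDD / 2    # assumed point where a voltage counts as "on"


def to_bit(voltage: float) -> int:
    """Interpret an input voltage as a logic level (0 or 1)."""
    return 1 if voltage >= THRESHOLD else 0


def equality_gate(v_a: float, v_b: float) -> float:
    """Output VDD only if both inputs carry the same logic level (XNOR)."""
    return VDD if to_bit(v_a) == to_bit(v_b) else 0.0


if __name__ == "__main__":
    # Walk through all four input combinations.
    for v_a in (0.0, VDD):
        for v_b in (0.0, VDD):
            print(f"in: {v_a} V, {v_b} V -> out: {equality_gate(v_a, v_b)} V")
```

The gate is modeled as a pure function of its input voltages, which is exactly the "predictable manner" part: the same inputs always give the same output.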
This logic gate could be used to check if outputs from other logic gates are the same or not. There are many different types of logic gates. The most common ones perform operations like AND, OR, NOT-AND (which is also called NAND) and many more. Don't worry, you don't need to understand how they work in order to understand FinFlex. Logic gates are first described as a technical diagram, but in order to do actual work, they have to be built by using transistors. And that's where the CMOS FinFET transistors come in. Based on the electrical diagram of the logic gate,
transistors are combined in such a way that they perform the same function as described by the electrical drawing. For example, only generate an output when two input voltages are the same. Let's get back to our chip design example. We are still working on our new CPU, which means we need to combine a lot of different logic gates into even larger structures, so that these combined logic gates can actually perform calculations. But we are in luck, because we don't have to worry about creating the logic gates that we need to build our CPU. Our foundry partner, such as TSMC, Samsung or Intel Foundry, already did that for us. When you design a chip on a specific process node, your foundry partner has already prepared a large variety of different logic gates for you to use. And I'm not talking about the electrical drawings,
I mean real-world implementations based on transistors of the specific node you selected. They are all part of the process design kit (PDK) and are commonly called "standard cells". There are so many of them that there is even a "standard cell library". When someone talks about libraries related to process nodes, that's what they are talking about. It's literally
a huge library of pre-designed logic gates to choose from. What a great customer service. I'm guessing that's why foundries are so expensive. But it gets even better. Because not only did our foundry partner already pre-design every logic gate we could possibly need as a standard cell, they also offer different versions of the same logic gate for us to choose from. But why would they do that? What's the use for different versions of the same logic gate? Every time I'm approaching a complex topic in semiconductors I try to decrease complexity by looking back in time. It's often easier to understand a concept in a chip that was designed 20 or even 30 years ago. A fantastic resource for that is Ken Shirriff's blog, which I not
only highly recommend, but it also provides the perfect explanation for our question regarding different implementations of the same logic gate. Ken Shirriff has a blog post about reverse engineering an Intel 386 processor from 1985, which already used CMOS transistors and a standard cell approach. Let's take a look at how Intel was designing CPUs back then. Like every CPU, Intel's 386 also made heavy use of the NAND logic gate. Ken Shirriff went the
extra mile and produced high-res die shots that allow us to take a look at real photos of the NAND standard cells inside the 386. As you can see, it's a CMOS based standard cell, as it uses both NMOS and PMOS transistors. This specific combination of transistors is able to perform a NAND logic operation, that's why it's a NAND standard cell. But there's more. The Intel 386 also implemented much smaller and much larger NAND standard cells. All three cell types perform the same function. But they do so with different electrical characteristics. The small NAND cell uses the same number of transistors as the normal one,
but each transistor is smaller. The gates are only about half as wide as on the normal sized NAND cell, which reduces the output current. The large NAND cell, although performing the same function as both the normal and the small cell, uses more transistors. Ken Shirriff notes that it's essentially two standard NAND gates in parallel. This allows
the large cell to provide double the current. Even back in 1985, the same logic gates were implemented in the form of different standard cells. Each with their own unique electrical and density characteristics. Smaller cells take up less space and use less power, but also provide lower current. Larger cells provide higher currents, but use more power
and need more space. It's always a trade-off. The reason why engineers are creating these different versions of the same logic gates is that different areas of a chip have different clock speed and current requirements. If we are selecting a standard cell to use inside our high performance CPU cores, we want to select the extra beefy implementation that can handle very high switching speeds. Our low-power efficiency cores, on the other hand, can make do with the normal cells, so we save power and silicon area at the same time. Less important areas of the chip might even be best suited for the half-size cells, because we don't need a lot of current there. Even if
we had infinite die space available, using only large cells would lead to a very high power consumption, something we want to avoid. Once we have chosen the standard cells we want to use, we have to place them on the chip next. Something that of course doesn't happen at random, there are always rules to follow. Standard cells are placed in so-called "cell rows", which is exactly what it sounds like: a row of standard cells next to each other. And many of these rows combined form a so-called logic block. With chip design, it's all about efficiency. You start with the smallest parts, transistors, and build up from there. Transistors are
used to create standard cells, which go into rows, which build the logic blocks. And many logic blocks combined create a chip. Actually pretty simple, if you look at it that way. Now we have all the ingredients. We know that CMOS transistors are a combination of N- and PMOS, that current-gen transistors are built with elevated 3D fins and that these transistors are then used to design various types of standard cells which are placed in neat little rows. And the standard cell is exactly where "fin depopulation" is starting to work its magic.
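The hierarchy and the idea of several versions of one gate can be sketched as a small data model. This is only an illustration: the cell names, the four-transistor two-input NAND and the relative area numbers are my assumptions, while the drive ratios (half, normal, double) follow the 386 example from earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StandardCell:
    name: str              # hypothetical naming scheme
    function: str          # logic function, e.g. "NAND2"
    transistors: int
    relative_drive: float  # output current relative to the normal cell
    relative_area: float   # assumed area relative to the normal cell


# Three versions of the same gate. The small cell keeps the transistor
# count but has about half the drive; the large cell is modeled as two
# normal NAND gates in parallel. Area values are placeholders.
LIBRARY = [
    StandardCell("NAND2_SMALL", "NAND2", 4, 0.5, 0.7),
    StandardCell("NAND2_NORMAL", "NAND2", 4, 1.0, 1.0),
    StandardCell("NAND2_LARGE", "NAND2", 8, 2.0, 2.0),
]


def pick_cell(function: str, min_drive: float) -> StandardCell:
    """Return the smallest cell of a function that still meets the drive target."""
    candidates = [c for c in LIBRARY
                  if c.function == function and c.relative_drive >= min_drive]
    if not candidates:
        raise ValueError(f"no {function} cell with drive >= {min_drive}")
    return min(candidates, key=lambda c: c.relative_area)


@dataclass
class CellRow:
    cells: List[StandardCell] = field(default_factory=list)


@dataclass
class LogicBlock:
    rows: List[CellRow] = field(default_factory=list)

    def transistor_count(self) -> int:
        """Transistors -> cells -> rows -> block: sum everything up."""
        return sum(c.transistors for row in self.rows for c in row.cells)


if __name__ == "__main__":
    fast = pick_cell("NAND2", 1.5)   # high-performance path
    slow = pick_cell("NAND2", 0.4)   # low-current corner of the chip
    block = LogicBlock([CellRow([fast, fast]), CellRow([slow])])
    print(fast.name, slow.name, block.transistor_count())
```

Picking the smallest cell that still meets the drive target mirrors the trade-off: you only pay for area and power where the current is actually needed.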
Remember the reverse engineered Intel 386 from Ken Shirriff's blog? While it was a CMOS design, it was long before the invention of FinFETs, so it used planar transistors. Today, standard cells are based on FinFET transistors. And FinFET transistors don't just have a single fin, they usually come in flavors of between two and four fins per transistor. The reason for that is performance, drive strength and reliability. As you saw before, early FinFET nodes didn't have very pronounced fins. Multiple fins were needed to achieve the required transistor performance. That's why FinFETs use multiple fins. And since we are talking about CMOS transistors, which contain two transistors, the actual number of fins is doubled. Now we are talking about between four and eight fins
per CMOS FinFET. That's a lot of fins. And these fins take up space. Not because of their height, but because these fins are sitting next to each other, which takes up space on the silicon wafer. In the past, FinFET process node scaling was achieved by reducing the so-called Fin Pitch, how close the individual fins are to each other, and the Gate Pitch, which determines how close the gates are to each other. If we look at a 3D representation of FinFETs, we can see that Fin and Gate Pitch are related to the size of the structure. If we reduce these, we can pack more transistors into the same area, thus increasing transistor density. When it comes to the size of cells, the next step up from transistors, the most important variable is the "cell height". Though the term "cell height" is a bit confusing,
because unlike transistor height for FinFETs, we are not actually talking about height as in going upwards from the silicon wafer. Instead, the layout of most cells is depicted by looking at the cell from above. In this view, cell "height" is more akin to cell "length". Reducing the fin pitch reduces the length of the cell and thus its area. But the miniaturization has slowed down. New process nodes are not able to achieve significant reductions in fin pitch and gate pitch. And that's where fin depopulation comes in. On the left we can see an Intel 7 standard cell. That's the
process node used for Alder Lake and Raptor Lake CPUs. We can see the CMOS character of the cell, with PMOS transistors on top and NMOS transistors below. And we can see that each area is using a 4-fin implementation. The whole cell has eight fins in total. Now let's compare that to a cell from the new Intel 4 process node. That's
Intel's first EUV node and it's used to manufacture parts of Meteor Lake. As you can see, the Intel 4 cell is a lot smaller than the Intel 7 cell, and that's due to multiple factors. First of all, Intel was able to reduce the space between the P- and NMOS areas of the cell. Next, Intel 4 also offers a slightly reduced Fin Pitch, so the node still does some traditional scaling. But the biggest change is the reduction in fin count. Instead of four fins, the Intel 4 based standard cell is designed with only 3 fins. And since it's
a CMOS transistor, both P- and NMOS transistors get a reduction in fins. The Intel 7 cell on the left has 8 fins, while the Intel 4 cell on the right has only 6 fins. This reduction in area isn't due to shrinking features with a more advanced process node, it's simply based on removing features. The fins are being depopulated. On paper, it's a worse transistor. But it doesn't have to negatively affect the performance, because as you remember, the height of the fin is proportional to its conductive strength. It's possible that a 3-fin transistor based on Intel 4 could offer better performance than a 4-fin Intel 7 transistor, if the increased fin height adds more conductive strength than what is lost by removing one fin. This example is based on an Intel node, but it's the same for all other process nodes, no matter if TSMC, Samsung or any other FinFET design. At the beginning of FinFETs,
designs with four or even more fins were the standard. Nowadays, even three-fin cells are usually reserved for high-performance cells, while two-fin designs are becoming the new standard. And 1-fin cells are not too far away. That's why recent FinFET process nodes still offer a decent increase in transistor density. Because once the traditional scaling of FinFETs slowed down, the foundries started to reduce the number of fins used in their standard cell libraries.
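The trade between fin count and fin height can be put into numbers with a rough first-order sketch. It assumes drive strength scales with the effective channel width, approximated as fins × (2 × fin height + fin width), since the gate covers two sidewalls and the top of each fin. All dimensions below are placeholders I made up for illustration, not measured Intel or TSMC data.

```python
def effective_width_nm(n_fins: int, fin_height_nm: float, fin_width_nm: float) -> float:
    """First-order effective channel width of a multi-fin FinFET."""
    return n_fins * (2 * fin_height_nm + fin_width_nm)


def break_even_height_nm(old_fins: int, new_fins: int,
                         old_height_nm: float, fin_width_nm: float) -> float:
    """Fin height at which new_fins fins match the width of old_fins fins."""
    target = effective_width_nm(old_fins, old_height_nm, fin_width_nm)
    # Solve new_fins * (2 * h + w) = target for h.
    return (target / new_fins - fin_width_nm) / 2


def fin_span_nm(total_fins: int, fin_pitch_nm: float) -> float:
    """Space the fins of one cell occupy side by side (part of the cell height)."""
    return total_fins * fin_pitch_nm


if __name__ == "__main__":
    # Placeholder geometry, chosen only to show the arithmetic.
    OLD_HEIGHT_NM = 45.0
    FIN_WIDTH_NM = 7.0

    needed = break_even_height_nm(4, 3, OLD_HEIGHT_NM, FIN_WIDTH_NM)
    print(f"3 fins match 4 fins at a fin height of {needed:.1f} nm")

    # 8 fins at a wider pitch versus 6 fins at a slightly tighter pitch.
    print(fin_span_nm(8, 34.0), "nm vs", fin_span_nm(6, 30.0), "nm")
```

With these placeholder inputs the three remaining fins would have to be around 36% taller to fully make up for the removed one. Real drive current depends on much more than geometry, so treat this as the shape of the argument rather than a prediction.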
Modern FinFET process nodes are intentionally made worse, by removing features fundamental to the performance of the transistors. But because the ever increasing height of the fins can somewhat compensate for the removal, and removing them achieves higher density, the result of a worse transistor is a better chip. At least until you have removed all the fins. Because just like scaling the planar transistor, there is a finite number of fins to remove. You will always need at least one per transistor and two per CMOS FinFET. Now that we know what fin depopulation is, how it works and what it actually achieves - what about FinFlex? TSMC has been hyping their FinFlex technology and at the same time, many people still misunderstand what it actually does.
At its core, FinFlex is a type of fin depopulation. But it has one advantage that normal fin depopulation techniques don't have. It's a lot more flexible, hence the name. To understand the difference, we have to circle back to standard cells and how they are placed. And here Ken Shirriff's Intel 386 deep dive comes in handy again. As I mentioned before, we can't just place the standard cells we selected from the cell library anywhere we want. We have to follow certain design rules. And one of them is that standard cells are always placed in rows. These cell rows are exactly what it sounds like,
just different cells next to each other. On the Intel 386 die shot we can actually see the cell rows with the naked eye. The combination of multiple cell rows then creates a logic block. But not only do we have to place our cells in rows, we also have to follow another important guideline: only cells with the same cell height can be placed together, and by extension this also applies to the whole logic block. Remember, cell height was that term that refers to
the length of a cell and not its actual height. The reasons for this rule are simple. First of all, it simplifies placing the cell with automated processes. Second, placing the cells is just the first step of manufacturing. In a second step, all the cells have to be connected with tiny wires,
so they can actually input and output voltage. That's the so-called metal layer. Manufacturing the metal layer is already a very complex production step, as you have to connect billions of individual transistors with super small wires. If the cells placed in a row had different cell heights, planning and placing the metal layer would be even more complex. It's not impossible, but because of automation and workflow optimizations, cells inside the same row and the same logic block usually use the same cell height. And since height is determined by the number of fins, that means the cells in a logic block use the same number of fins.
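As a rough sketch, this placement rule is easy to express as a check. The `Cell` type and the cell names are hypothetical, and real design rules cover far more than fin counts; this only captures "same fin count per row, and traditionally per logic block".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    name: str   # hypothetical cell name
    fins: int   # fins per transistor, which sets the cell height


def row_height(row: List[Cell]) -> int:
    """Return the shared fin count of a row, or raise if the row mixes heights."""
    if not row:
        raise ValueError("a cell row needs at least one cell")
    heights = {cell.fins for cell in row}
    if len(heights) != 1:
        raise ValueError(f"row mixes cell heights: {sorted(heights)}")
    return heights.pop()


def block_is_uniform(rows: List[List[Cell]]) -> bool:
    """Traditional rule: every row in a logic block uses the same cell height."""
    return len({row_height(row) for row in rows}) == 1


if __name__ == "__main__":
    two_fin_row = [Cell("NAND2", 2), Cell("INV", 2)]
    three_fin_row = [Cell("NAND2_HP", 3), Cell("INV_HP", 3)]
    print(block_is_uniform([two_fin_row, two_fin_row]))    # uniform block
    print(block_is_uniform([two_fin_row, three_fin_row]))  # mixed block
```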
And that's where the unique properties of TSMC's FinFlex come into play. With the N5 process node, the standard cell library was based on a 2-fin design. The N3 process node also offers a 2-fin standard cell library, but adds two new options: a 3-2 fin and a 2-1 fin based library. And no, this doesn't indicate different fin counts for N- and PMOS transistors. Instead, TSMC is adding the option of so-called alternating row heights. You still have to use cells with the
same height for a cell row, but with FinFlex, the very next row can now use a different cell height. The 3-2 FinFlex library for example alternates rows with three fins and rows with two fins. That means, a single logic block can now contain cell rows with different cell heights. As a result, chip engineers have more options to fine-tune the process to achieve the highest transistor density. Before FinFlex, you could choose if you want to use cells based on a 3-fin or on a 2-fin design. That was the entire amount of customization. With FinFlex, you now have two more in-between steps.
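The alternating-row idea fits in a few lines of Python. The function names are mine, and the patterns (3, 2), (2,) and (2, 1) simply stand for the three library options mentioned above; nothing here reflects TSMC's actual tooling.

```python
from itertools import cycle, islice
from typing import List, Sequence


def alternating_rows(pattern: Sequence[int], n_rows: int) -> List[int]:
    """Fin count of each row when the pattern repeats down the logic block."""
    return list(islice(cycle(pattern), n_rows))


def average_fins(rows: Sequence[int]) -> float:
    """Average fin count per row across the block."""
    return sum(rows) / len(rows)


if __name__ == "__main__":
    for pattern in [(3, 2), (2,), (2, 1)]:
        # An even row count keeps the average exactly between the two values.
        rows = alternating_rows(pattern, 6)
        print(pattern, rows, average_fins(rows))
```

Each row still has a single cell height; only the block as a whole mixes them, which is why the average lands between the two fin counts.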
A combination of 3-fin and 2-fin cells inside a single logic block results in an average of 2.5 fins. And the combination of 2-fin and 1-fin rows has an average fin count of only 1.5. FinFlex is explicitly not the use of different libraries in different areas of a chip, like some claim. This has always been possible, even before FinFlex. FinFlex brings this flexibility into the logic blocks themselves. With FinFlex, TSMC offers a 1.5-, a 2- and a 2.5-fin cell library. Something that no other foundry is able to offer at this point. But it's not all sunshine. Placing alternating rows with different fin counts increases design complexity. You have to make sure that very important logic gates and so-called critical
paths are not placed in the rows with the lower fin count, where they would suffer from reduced performance. Instead, it's important to use the alternating rows in such a way that less important cells are placed on rows with lower fin counts, while important cells can benefit from the rows with more fins. Of course this is all done in software. No modern chip can be designed without Electronic Design Automation. Companies like Cadence and Synopsys have already updated their tools for FinFlex designs. As process nodes and their design rules become more complex, EDA providers will continue to be on the winning side. SemiAnalysis has a very interesting blog
post about the use of FinFlex and other ways to increase transistor density outside of the process node itself. It's very in-depth, but they give real-world examples and explain how it affects different companies and products. And if you understand the basics of fin depopulation, it might be up your alley. I've put a link in the video description.
Chip design and semiconductor manufacturing have always been a complicated subject. But when we break it down into little pieces, even complex technologies become tangible. And I hope I didn't promise too much at the start of the video and you now actually understand the concept behind fin depopulation and what makes FinFlex so special. Fin depopulation and FinFlex are not a very hot topic. But considering their importance, I think they should be. They are one of the major reasons why FinFET process nodes still continue to scale. Without fin depopulation and other design technology co-optimization, Moore's Law would have died long ago. Cutting
the fins off of transistors has been a lifeline for semiconductor manufacturing. Even though, at face value, the idea looks like a brute force approach. But at some point, you just have to start cutting things off, if you want to stay competitive. Even in the semiconductor industry. I hope you found this video interesting and see you in the next one.
2024-09-01 18:05