The nomenclature for microchip manufacturing left reality a couple of generations ago. Intel's 14A process is not a true 14A half-pitch. It's kind of like how they started naming CPUs after "performance equivalents" instead of raw clock speed. And this isn't just Intel: TSMC, Samsung, everyone does half-pitch-equivalent naming nowadays.

This is the industry roadmap from 2022: https://irds.ieee.org/images/files/pdf/2022/2022IRDS_Litho.p... If you look at page 6 there is a nice table that explains it. Certain feature sizes have hit a point of diminishing returns, so they are finding new ways to increase performance. Each generation is better than the last, but we have moved beyond simple shrinkage.

Comparing Intel's 14A label to TSMC's A16 is meaningless without performance benchmarks; they are both just marketing terms, like the Intel/AMD CPU wars. You can't say one is better because the label says it's faster. There's so much other stuff to consider.
When comparing fab processes, you wouldn't want the performance of a whole processor, but rather the voltage vs. frequency curves for the different transistor libraries offered by each fab process.
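To make the voltage/frequency comparison concrete, here is a minimal sketch using the standard dynamic-power approximation P ≈ α·C·V²·f. All capacitance, voltage, and frequency values below are invented for illustration, not real library data:

```python
# Sketch: comparing fab processes via transistor-library operating points
# rather than whole-processor benchmarks. Dynamic switching power scales
# roughly as P ~ alpha * C * V^2 * f. Every number here is hypothetical.

def dynamic_power(c_eff_farads, v_volts, f_hertz, activity=0.1):
    """Approximate dynamic power of a block of logic (watts)."""
    return activity * c_eff_farads * v_volts**2 * f_hertz

# Two hypothetical transistor libraries hitting the same frequency target:
lib_a = dynamic_power(c_eff_farads=1e-9, v_volts=0.75, f_hertz=3e9)
lib_b = dynamic_power(c_eff_farads=1e-9, v_volts=0.65, f_hertz=3e9)

print(f"lib A: {lib_a:.4f} W, lib B: {lib_b:.4f} W")
# The library that reaches the same frequency at lower voltage wins on
# power, which a whole-chip benchmark (with its caches, memory system,
# and microarchitecture) would obscure.
```

The quadratic dependence on voltage is why a few tens of millivolts on the V/f curve matters more than most headline specs.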
Well, "extreme UV" these days. 13.5nm, larger than the "feature size". And even that required heroic effort in development of light sources and optics.
> This 1.6nm process will put them around 230 MTr/mm2

Would that be x2 (for front and back)? E.g., 230 MTr/mm2 on the front side and another 230 on the back side, for 460 MTr/mm2 total?
Not understanding chip design here, but is it possible to get more computational bang with fewer transistors? Are there optimizations to be had, i.e. better designs that could compensate for bigger nodes?
A JavaScript accelerator would probably halve the power consumption of the world.
The problem is that as soon as it reached widespread usage, it would probably already be too old.
Reminiscent of the Java CPUs: not even used for embedded (ironically, the reason Java was created?), and not used at all by the massive Java compute needed for corporate software worldwide.
> For fab processes we should just switch to transistor density.

Indeed. May I propose transistor density divided by the (ergodic) average transistor switching power?
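The proposed metric is easy to sketch: transistors per mm² divided by average switching power per transistor. The node names and all numbers below are made up purely to show how the figure of merit would rank processes:

```python
# Sketch of the proposed figure of merit: transistor density divided by
# average per-transistor switching power. Higher is better. All values
# are invented for illustration; real numbers would come from fab data.

def figure_of_merit(density_mtr_per_mm2, avg_switch_power_watts):
    """Transistors per mm^2 per watt of average switching power."""
    return density_mtr_per_mm2 / avg_switch_power_watts

# Two hypothetical nodes:
node_x = figure_of_merit(230.0, 1.0e-9)  # denser, slightly hungrier
node_y = figure_of_merit(180.0, 0.6e-9)  # less dense, more efficient

print(f"node X: {node_x:.3g}, node Y: {node_y:.3g}")
# Under this metric the less dense but more efficient node can still
# come out ahead, which raw MTr/mm2 alone would hide.
```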
It stopped being an actual measure of size long ago. The "nm" isn't a nanometer of anything; it's just a vague marketing term attempting some sort of measure of generations.
Intel is not the inventor of backside power; they are the first planning to commercialize it. It's similar to FinFETs and GAA, where Intel or Samsung may be first to commercialize an implementation of those technologies, but the actual conceptual origin and first demonstrations are at universities or research consortiums like imec. For example, imec demonstrated backside power in 2019 (https://spectrum.ieee.org/buried-power-lines-make-memory-fas...), far before PowerVia was announced.
There's still the question, though, of why they didn't do this decades ago. It seems very obvious that this layout is better. What changed that made it possible only now and not earlier?
My knowledge isn't current enough to offer more than speculation. However, something like an 80286 didn't even require a heatsink, while my 80486 had a dinky heat sink similar to what you might find on a modern motherboard chipset. At the same time, on a micron node, wires were huge. A few special cases aside (DEC Alpha comes to mind), power distribution didn't require anything special beyond what you'd see on your signal wires, and wasn't a major part of the interconnect space.

Mapping out to 2024:

1) Signal wires became smaller than ever.

2) Power density is higher than ever, requiring bigger power wires.

So there is a growing disparity between the needs of the two. At the same time, there is continued progress in figuring out how to make through-wafer vias more practical (see https://en.wikipedia.org/wiki/Three-dimensional_integrated_c...). I suspect in 2000 this would have been basically restricted to $$$$ military-grade special processes and similarly very expensive applications. In 2024, this can be practically done for consumer devices. As costs go down and utility goes up, at some point the two cross, leading to practical devices.

I suspect a lot of this is driven by progress in imagers. There, the gains are huge. You want a top wafer which is as close as possible to 100% sensor, but you need non-sensor area if you want any kind of realtime processing, full-frame readout (e.g. avoiding rolling shutter), or rapid readout (e.g. high framerate). The first time I saw 3D IC technology in mainstream consumer use was in prosumer-/professional-grade Sony cameras.

I have strong fundamentals, but again, I stopped following this closely maybe 15 years ago, so much of the above is speculative.
> why they didn't do this decades ago

You might as well ask why, since we can do it now, Shockley didn't simply start at 3nm. It's all a very long road of individual process techniques.

> You need very tiny wires through very tiny holes in locations very precisely aligned on both sides.

Key phrase here is "both sides". It has challenges similar to solder reflow on double-sided boards: you need to ensure that work done on the first side isn't ruined by (or ruining) work on the second side. https://semiwiki.com/semiconductor-services/techinsights/288... seems to be a good description:

"The challenges with BPR are that you need a low resistance and reliable metal line that does not contaminate the Front End Of Line (FEOL). BPR is inserted early in the process flow and must stand up to all the heat of the device formation steps."

Contamination = the metals used mustn't "poison" the front-side chemistry. So they end up using tungsten rather than the more usual aluminium. (Copper is forbidden for similar chemistry reasons.) It also (obviously) adds a bunch of processing steps, each of which adds to the cost, more so than putting the rails on the front side.
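The "each added step adds to the cost" point compounds: if each process step has some independent chance of spoiling a die, overall yield decays geometrically with step count. A minimal sketch, with per-step yields and step counts invented purely for illustration:

```python
# Sketch: cumulative yield vs. number of process steps. If every step
# independently succeeds with probability y, overall yield is y**n.
# The 0.999 per-step yield and step counts are made-up numbers.

def overall_yield(per_step_yield, n_steps):
    """Fraction of dies surviving n independent process steps."""
    return per_step_yield ** n_steps

baseline = overall_yield(0.999, 500)    # hypothetical frontside-only flow
with_bspd = overall_yield(0.999, 550)   # ~50 extra backside-power steps

print(f"baseline: {baseline:.3f}, with backside power: {with_bspd:.3f}")
# Even a 0.1% loss per step means ~50 extra steps shave several points
# off yield, which is cost that the frontside-rail flow doesn't pay.
```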
Ergo, the TL;DR :) Even so, I oversimplified things a lot (a lot of the processes leverage the silicon wafer, but some don't): https://en.wikipedia.org/wiki/Silicon_on_insulator

One of the things to keep in mind is that a silicon wafer starts with a near-perfect silicon ingot crystal: https://en.wikipedia.org/wiki/Monocrystalline_silicon The level of purity and perfection there is a little bit crazy to conceive.

It's also worth noting how insanely tiny devices are. A virus is ≈100nm. DNA is 2nm in diameter. We're at << 10nm for a device. That's really quite close to atomic scale. There are something like ≈100 billion transistors per IC for something like a high-end GPU, and a single failed transistor can destroy that fancy GPU. That's literally just a few atoms out of place or a few atoms of some pollutant. The level of perfection needed is insane, and the processes which go into that are equally insane. We are making things on glass, but the glass has to be nearly perfect glass.
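The "a single failed transistor can kill the die" point is why yield models are so unforgiving at GPU sizes. A common first approximation is the Poisson yield model Y = exp(−D·A); the defect density and die areas below are illustrative guesses, not real fab data:

```python
import math

# Poisson yield model: Y = exp(-D * A), with D = defects per mm^2 and
# A = die area in mm^2. Both numbers below are invented for illustration.

def poisson_yield(defect_density_per_mm2, die_area_mm2):
    """Probability a die has zero killer defects."""
    return math.exp(-defect_density_per_mm2 * die_area_mm2)

small_die = poisson_yield(0.001, 100)  # ~100 mm^2 mobile-class die
big_die = poisson_yield(0.001, 800)    # ~800 mm^2 GPU-class die

print(f"small die: {small_die:.1%}, big die: {big_die:.1%}")
# At the same defect density, the huge die loses drastically more yield,
# which is why GPU vendors rely so heavily on masking off bad units.
```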
*Can* is important there. Not all failures can be masked off. And this only makes the slightest dent in the level of reliability you need when making any given transistor.
How many power layers are there? And how do the power "wires" cross over each other? Are they sparse, like wires? Or solid, like the ground plane of a PCB? Are there "buried vias"?
Thanks, I've fixed them.

BTW, you appear to be shadow banned, probably on account of being downvoted for making short comments like "good" or "I don't get it". Short and simple responses may be adequate at times, but usually people will view them as not adding anything. Perhaps reviewing the guidelines (https://news.ycombinator.com/newsguidelines.html) will give an idea of the ideal being yearned for (granted, it's an ideal): in general, trying to enhance the conversation with relevant information and ideas. "Your links are 404" was definitely relevant here.
Instead of signals and power both going through the same side (the frontside), causing all sorts of issues and inefficiency, they're decoupling where power comes from (the other, backside, err, side).

More importantly, Intel saw it as one of the two key technologies for moving into the angstrom era, and was touting that it would be the first to bring it to life (not sure they did), so this seems to be something of a business power move too. More on all of it from AnandTech: https://www.anandtech.com/show/18894/intel-details-powervia-...
A16 in 2027 vs. Intel's 18A in full swing by 2026 feels like a miss on TSMC's behalf. This looks like an open door for fabless companies to try Intel's foundry service.
> This technology is tailored specifically for AI and HPC processors that tend to have both complex signal wiring and dense power delivery networks

Uh?
I imagine it's because AI and HPC processors are typically utilized much more fully than your regular desktop processor.

A typical desktop CPU is designed to execute very varied and branch-heavy code. As such it has a lot of cache and a lot of logic transistors sitting idle at any given time, either waiting for memory or because the code is adding, not multiplying, for example. You can see that in die shots like this one[1]. I imagine the caches are relatively regular and uniform and as such have less complex signal wiring, and idle transistors mean lower power requirements.

AI and HPC processors are more stream-oriented, and as such contain relatively small caches and a lot of highly utilized logic transistors. Compare the desktop CPU with the NVIDIA A100[2], for example. Thus you get both complex wiring (all those execution units need to be able to very quickly access the register file) and, due to the stream-oriented nature, the ability to fully utilize most of the chip, so a more complex power delivery network is required.

edit: Power delivery tracks can affect signal tracks through parasitic coupling if they're close enough, potentially causing signals to be misinterpreted by the recipient when power usage fluctuates, which it will do during normal operation (if, say, an execution unit goes from being idle to working on an instruction, or vice versa). Thus it can be challenging to fit both power and signal tracks in close proximity.

[1]: https://wccftech.com/amd-ryzen-5000-zen-3-vermeer-undressed-...

[2]: https://www.tomshardware.com/news/nvidia-ampere-A100-gpu-7nm
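The parasitic-coupling point can be sketched with the simple capacitive-divider approximation for crosstalk, V_noise ≈ V_step · Cc / (Cc + Cg). The capacitance values below are invented to show the trend, not taken from any real process:

```python
# Sketch: capacitive crosstalk from a noisy power/aggressor track onto a
# quiet signal track, using the first-order capacitive-divider estimate
# V_noise ~= V_step * Cc / (Cc + Cg). All capacitances are hypothetical.

def crosstalk_noise(v_step, c_coupling, c_ground):
    """Voltage induced on a victim line by a v_step swing on a neighbor."""
    return v_step * c_coupling / (c_coupling + c_ground)

# A 0.75 V swing on a nearby track; coupling grows as tracks get closer:
far = crosstalk_noise(0.75, c_coupling=0.05e-15, c_ground=1.0e-15)
near = crosstalk_noise(0.75, c_coupling=0.30e-15, c_ground=1.0e-15)

print(f"far track: {far*1000:.1f} mV, near track: {near*1000:.1f} mV")
# Packing big power tracks next to signal tracks multiplies the coupled
# noise; moving power delivery to the backside removes the aggressors
# from the signal layers entirely.
```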