Expanding PCOLA for 3D Die Stacks

Testing 3D chips | JTAG, IJTAG and IEEE P1838 - Part 3 of a three-part series

The first and second blogs in this series discussed the PCOLA/SOQ/FAM test methodology for two-dimensional (2D) circuit boards and described how 3D stacked die devices are currently being tested. From a test perspective, 3D die stacks resemble circuit boards in many respects. This blog, the final in this three-part series, brings the discussion full circle by describing where the PCOLA methodology can help test 3D devices and where it needs to be supplemented.

Previously, I have mentioned that test and debug standards should be able to test three aspects of a 3D die stack:

  • Each die in the stack
  • The stack itself
  • The stacking process

More than one standard will likely be needed to accomplish all of this testing.

Before we get into how to apply PCOLA/SOQ/FAM to a 3D stack, let's take a quick look at 3D die stacks and 2D circuit boards to identify what is the same about each of them and what is different. But first, I need to define a few things.

To begin with, a 3D die stack has a bottom or foundational die, which is meant to connect to the pins of the package, which in turn is connected to a circuit board. On the surface that interfaces to the package, this die looks like a standard 2D die with bond pads to connect to the chip package. On the upper surface, the lowest die in a stack will have microbumps for connecting to the die above it. Some of these foundational die may be true 3D die in that they have microbumps on both their upper and lower surfaces because they will be connected to an interposer that will be connected by bond wires to a chip package. (Figure 1)

Figure 1: The stacking process

Aside from the bottom die, each die in a stack has a bottom surface, which the IEEE P1838 standard for 3D chip test calls the primary side. The upper or top surface is referred to as the secondary side. These die will have microbumps on both surfaces. Some will include through-silicon vias (TSVs) running from the primary to the secondary surface. Other such die may integrate logic for capturing and driving signals from/to the microbumps on the two surfaces.

Some of the microbumps on the surfaces of a 3D die are dedicated to clocks and data/control signals, while others are for power and ground. Of course, the microbumps on a die will be configured differently to meet the needs of a particular stack. So, the size of the bumps will differ, as well as their pitch or spacing and their location on the die. For example, the microbumps could be located in the center of a square die, be sized at 10 microns and have a 20-micron pitch. Another somewhat common example involves a larger die which is meant to connect to four smaller memory die above it. In this case, the microbumps on the larger die would be located in the four corners. They could be 20-micron bumps with a 30-micron pitch.
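To put numbers on the two examples above, here is a minimal sketch of the spacing arithmetic. It assumes that "pitch" means the center-to-center distance between neighboring bumps and that the quoted size is the bump diameter; the function name `edge_gap_um` is hypothetical and used only for illustration.

```python
def edge_gap_um(bump_diameter_um: float, pitch_um: float) -> float:
    """Edge-to-edge spacing between two neighboring bumps on the same die.

    Assumes pitch is measured center to center (an assumption, not stated
    in the article).
    """
    if pitch_um < bump_diameter_um:
        raise ValueError("pitch smaller than bump diameter: bumps would overlap")
    return pitch_um - bump_diameter_um


# The two example configurations from the text.
center_array_gap = edge_gap_um(10, 20)  # 10-micron bumps, 20-micron pitch
corner_array_gap = edge_gap_um(20, 30)  # 20-micron bumps, 30-micron pitch
print(center_array_gap, corner_array_gap)
```

Under these assumptions both configurations leave the same 10 microns of clearance between neighboring bumps, even though the bumps themselves differ in size.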

In many cases, die stacks are similar to circuit boards, which are populated with chips with JTAG interfaces that are connected in a daisy-chain scan path. Each chip on this JTAG scan path receives power and ground in addition to signal traces connecting one chip to another as needed. At the board level, some chips are often connected to a bus structure so the chips on the bus can communicate. In a die stack, each or several die could have a JTAG interface and be connected serially in a daisy chain of sorts. (Note that the die in a stack are 'parked' when they are not being accessed via JTAG. They are not placed in a serial bypass mode.)

3D die stacks may also have interconnects between abutting die as well as other interconnects that travel through die without interacting with the die that they travel through. In this way, any die in the stack may connect directly to any other die in the stack. In addition, bus structures may traverse many die such that some or all of the die through which the bus passes are able to access it.

One of the subtle differences between circuit boards and die stacks is the limitation placed on power delivery. The entire stack must be provided power through the limited number of power pins on the package that connects to the board. So, there may be some slight differences, but in my mind, a 3D stack is both a chip and a board-in-a-package.

As for the stacking strategy itself, the stack should follow simple geometric rules so that the die with the largest 2D footprint is on the bottom and smaller die are stacked on top. This would make a stack slightly tapered toward the top, resembling a four-sided pyramid of sorts. Inverting the pyramid so that slightly smaller die are on the bottom could generate stresses that would result in cracked or broken die when they are placed in a package. Stacking the largest die on the bottom also provides the greatest number of signal interfaces for the device because there will be more pins on the package.

Not all die stacks are made up of similarly sized die. Some stacks are complex hybrids. For example, a stack could have a large die on the bottom level in slot 1. This might be a microprocessor with four cores. Slot 2 could be made up of four die, each consisting of cache memory for one of the four cores below. Over these four die, a single large flash memory die could be placed in slot 3, and so on. As mentioned previously, the die making up a stack can be homogeneous (all the same die and from the same provider) or heterogeneous (different die from different providers).

Back to PCOLA...

Now let's consider what can be appropriated from the PCOLA methodology to test 3D devices as well as a few critical security issues.

So, if we wanted to apply PCOLA to 3D devices, the question would become which features should be designed into die so that PCOLA could be applied not only to each die individually, but also to the die stack as an entity in itself. I wholeheartedly support this line of thought. Test features should be designed into die so that the PCOLA method can be applied to test, debug and diagnose problems, errors, or defects in die and 3D stacks.

Let's take a closer look at each of the aspects that make up the PCOLA test method so we can determine how the methodology might be applied to 3D stacked devices or how it might be modified to offer even better test coverage.

1) Presence (P) - When applied to a 3D stack, this board-level metric resolves into three verification/test questions: (a) Are the correct die present? (b) Are the die in the stack in the correct order or right position? (c) Is the right number of die present in the stack? Determining the presence/position of die would require some sort of documenting process. That is, each die would need to have a unique identification (ID) code that could be accessed by the test method. If we were to express the 'P' in PCOLA in shorthand, we could say it now consists of P2+N, or presence/position plus number.

2) Correctness (C) - At the board level, this metric verifies whether the correct chip is in the correct socket on the board. This is required to test a 3D stack as well, but C is subsumed into the P metric described above when the ID code for each die is accessed to determine whether the die in the stack are correct. Again, this capability would require documenting the stack's make-up and a test to verify that each die in the stack is the proper die in the proper order.

3) Orientation (O) - The orientation of a die in a 3D stack can go wrong in more ways than that of a chip on a circuit board. On a board, the orientation of a chip could be off by 90, 180, 270 degrees or some other angle. In a 3D stack, a die can be off-angle or upside down. In our shorthand, we might refer to this as 'F' for flipped or flipped-over. The worst-case scenario usually occurs when the top and bottom sides of certain die are visually indistinguishable. For example, the microbumps on both sides of a die could have the identical size, pitch and locations. Another example might be when the die's TSVs are located right in the middle of the die. When this is the case, it is certainly easier for a die to be mistakenly flipped, mis-oriented or off-angle in the stack. A flipped or mis-oriented die would cause the test process for the stack to fail immediately because the JTAG interface signals would be linked to the wrong TSVs, rendering the connection discontinuous. The O in PCOLA for 3D devices then becomes O+F.

4) Live (L) - At the board level, L represents whether a chip is live or receiving adequate power. Applying the L metric to a 3D device is actually very subtle. One of the difficulties with 3D TSV configurations is the fact that power rails must be delivered vertically to each die. Some power rails may occupy dedicated TSVs which provide power directly from the circuit board to the target die. Other power rails may be distributed through shared TSVs where the TSV is routed through several die while providing power to each. That a die is not receiving any power may be attributed to any of the P2, C, O, F or A metrics.

However, in some cases a die may not be receiving enough power. To ascertain this would require a test to verify max power on each die as well as testing power distribution vias, including shared vias. For example, one such problem could arise when a lower die is leaching the power intended for an upper die. Another important metric for a 3D device relating to power consumption is heat generation or dissipation. Heat handling requirements are often referred to as the device's thermal response or thermal limit. In our shorthand then, the L in PCOLA becomes L+T, where T represents thermal or thermal limiting.

5) Alignment (A) - Verifying die alignment means confirming that the bumps on any two die in a stack are connected center-to-center. Many believe that verifying alignment on a 3D stacked device will require the same types of technologies that are employed on 2D boards, such as X-ray inspection or other structural inspection technologies that may include sonogram, sonar or thermal imaging. Misalignment can be a real problem for device yield rates. Bumps that are off-center by 1/3, 1/2 or 3/4 of a bump can result in open TSVs or shorted/bridged TSVs, which relates to the S and O in SOQ.

6) Security (S) - Security experts have expressed concern that 3D chip fabricators, or stackers as they are called, may unknowingly or knowingly build die stacks from counterfeit die. The stacker is a new member of the supply chain. They obtain die from different sources, such as distributors, chip manufacturers and others. It is very difficult for anyone to know whether die are counterfeits or copies without explicitly verifying the trustworthiness of the die.

In addition, the presence of a test port such as JTAG, IJTAG, SPI, I2C, or a debug port that accesses embedded instruments and internal content raises the question of whether the chip's content can be protected from unwarranted snooping or operation. Protecting this content will require data security (encryption), hardware locks and keys, challenge-and-response engines, and other known security techniques. Some security experts recommend that security measures be in place for each die in a stack, for portions of a stack and for the whole stack as a unit. Security can be further subdivided into specific areas of concern, such as counterfeit detection (Co); Trojan or malicious content (Tr) detection; data protection (Dp); reverse engineering (Re) protection; embedded instrument (Ei) or embedded content protection; denial of service (Ds) protection and others.
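Two of the checks described above lend themselves to a short sketch: the ID-based presence/position/number check (P2+N, which also covers C) and the bump-offset arithmetic behind the Alignment metric. Everything here is an illustrative assumption rather than anything defined by PCOLA or IEEE P1838: the names `check_stack` and `classify_offset`, the idea that ID codes are read bottom-to-top into a list, the one-dimensional bump model with same-size bumps, and the 50% minimum-overlap threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StackReport:
    number_ok: bool         # N: the right number of die was found
    wrong_slots: List[int]  # P2/C: 1-based slots whose ID differs from the manifest


def check_stack(expected_ids: List[str], read_ids: List[str]) -> StackReport:
    """Compare die IDs read bottom-to-top against the documented stack make-up."""
    number_ok = len(expected_ids) == len(read_ids)
    wrong_slots = [
        slot
        for slot, (want, got) in enumerate(zip(expected_ids, read_ids), start=1)
        if want != got
    ]
    return StackReport(number_ok, wrong_slots)


def classify_offset(offset_um: float, bump_um: float, pitch_um: float,
                    min_overlap_fraction: float = 0.5) -> str:
    """Simplified 1D model of a lateral shift toward the neighboring bump.

    min_overlap_fraction is a hypothetical threshold, not a published limit.
    """
    offset = abs(offset_um)
    overlap = bump_um - offset                  # contact left with the intended bump
    neighbor_gap = pitch_um - bump_um - offset  # clearance to the neighboring bump
    if neighbor_gap <= 0:
        return "bridged"
    if overlap < min_overlap_fraction * bump_um:
        return "open_risk"
    return "ok"


# Hypothetical manifest: two die swapped during stacking.
report = check_stack(["cpu", "cache", "flash"], ["cpu", "flash", "cache"])
print(report)

# 10-micron bumps on a 20-micron pitch, shifted by 3/4 of a bump.
print(classify_offset(7.5, 10, 20))
```

In this model the swapped stack has the right number of die but reports slots 2 and 3 as wrong, and a 3/4-bump shift is flagged as an open risk because only a quarter of the bump still overlaps its target. A real flow would obtain the ID codes through the test access port and would need a 2D, process-specific alignment model.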

So, in my mind, there are a few more letters we need to add to the PCOLA/SOQ/FAM/I methodology so that it can be applied to 3D die stacks. At the very least, it should be more like P2NCOFLATS (Presence, Position, Number, Correctness, Orientation, Flipped, Live, Alignment, Thermal, Security). (Table 1)

Some may suggest moving Thermal to the Functional Connections category (FAM in Table 1) along with verification of Features, At-Speed Operation and the ability to take a Measurement because verifying Thermal limits may involve a measurement or metric of sorts. In addition, Security is not limited to the Structural Device or die category. It may also extend to the Structural Connections category since any of the many TSVs in a typical 3D device may be hijacked and provide a security leak.

Table 1: Expansion of PCOLA for testing 3D die stacks.

I hope this three-blog series has shown that verifying, validating, characterizing and testing 3D stacked die devices is complex. The industry will certainly be challenged in the years ahead to develop cost-efficient standards which overcome the difficulties I've pointed out.

NOTE: Al Crouch is a member of ASSET's consultation team.