Introduction to Failure Analysis

Today’s electronic chips are a complex composition of devices intricately connected to form a highly functioning unit. These integrated circuits combine simple devices such as resistors, capacitors, inductors, and diodes with more complicated transistors. To give an idea of the complexity, the chips running today’s advanced processors contain billions of transistors at a density of nearly 300 million transistors per square millimeter. At these densities, the heat density starts to approach that of the sun. The performance of integrated circuits at an ever-shrinking scale is truly amazing.
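To put the density figure in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes a hypothetical chip with 10 billion transistors (an illustrative count, not a figure for any particular product) and uses the density of 300 million transistors per square millimeter quoted above.

```python
# Back-of-the-envelope estimate of the area occupied by a chip's transistors.
# TRANSISTOR_COUNT is a hypothetical value chosen only for illustration.
TRANSISTOR_COUNT = 10_000_000_000   # assumed: 10 billion transistors
DENSITY_PER_MM2 = 300_000_000       # ~300 million transistors per mm^2 (from the text)


def transistor_area_mm2(transistors: int, density_per_mm2: int) -> float:
    """Return the area in mm^2 needed to hold the transistors at the given density."""
    if density_per_mm2 <= 0:
        raise ValueError("density must be positive")
    return transistors / density_per_mm2


area = transistor_area_mm2(TRANSISTOR_COUNT, DENSITY_PER_MM2)
print(f"Approximate transistor area: {area:.1f} mm^2")
```

Under these assumptions, the transistors fit in roughly 33 square millimeters, a square a little under 6 mm on a side.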

The fabrication of these integrated circuits (ICs) involves multiple layers of materials that are often controlled in three dimensions down to atomic dimensions. Those advanced processors containing billions of transistors may involve hundreds of processing steps and many different carefully crafted materials. A completed wafer may be worth hundreds of thousands of dollars, so it is no wonder that the yield of working chips is a key metric. Equally important is understanding how these chips fail, as they control critical systems in areas such as aviation, automation, medicine, and communications.
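The link between yield and wafer value can be made concrete with a little arithmetic. The sketch below is illustrative only: the wafer value and the number of dies per wafer are assumed figures, not data from any real process.

```python
# Illustrative yield arithmetic. Both constants are assumptions for this sketch.
WAFER_VALUE_USD = 200_000   # assumed value of a wafer if every die works
DIES_PER_WAFER = 500        # assumed number of dies on the wafer


def yield_fraction(good_dies: int, total_dies: int) -> float:
    """Fraction of dies on the wafer that work."""
    if total_dies <= 0 or not 0 <= good_dies <= total_dies:
        raise ValueError("need 0 <= good_dies <= total_dies and total_dies > 0")
    return good_dies / total_dies


def value_lost_usd(good_dies: int, total_dies: int, wafer_value: float) -> float:
    """Value of the failed dies, assuming every die is worth the same."""
    yield_fraction(good_dies, total_dies)  # reuse the input validation
    return wafer_value * (total_dies - good_dies) / total_dies


for good in (475, 450, 400):
    y = yield_fraction(good, DIES_PER_WAFER)
    lost = value_lost_usd(good, DIES_PER_WAFER, WAFER_VALUE_USD)
    print(f"{good}/{DIES_PER_WAFER} good dies: yield {y:.0%}, value lost ${lost:,.0f}")
```

With these assumed numbers, each percentage point of yield corresponds to $2,000 of wafer value, which is why even small yield losses attract attention.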

Between the required critical dimensions, the many materials (e.g., dielectrics, metals, semiconductors, polymers), and the countless fabrication processes (e.g., etching, deposition, oxidation, heat treatment, implantation, cleaning), there are ample opportunities for failure. Trying to identify the source of a failure in this webbed, highly complex electrical network requires specific strategies, equipment, and expertise.

So, how do you identify the source of a failure? Failure analysis is how semiconductor manufacturers peel away the complex layers of the electrical “onion” that is an IC chip to identify and remediate device failures.

Understanding Failure Analysis

The first step in any failure analysis project is to partition or isolate the "general" source of the failure. If the failure is happening in one specific system, start by isolating that system. Later, using a variety of failure testing equipment, you can identify the particular IC causing the fault. Integrated circuits can range from only a few device components up to the billions mentioned earlier, and thus failure analysis efforts can be simple or very complicated.

Due to the complex and intricate nature of semiconductors, failures in ICs occur for a variety of reasons. Some may be as simple as short or open circuits, but most are more difficult to detect. Failures may be subtle and manifest themselves as differences in device performance that simply do not meet specifications, for example, operation at the wrong voltage, current, or frequency. Others are even more indirect and occur only during certain operations or in specific environments (e.g., hot, cold, vibration). There is also the issue of determining whether the failure is the result of a design issue or a fabrication issue. In all these cases, failure analysis is the investigative technique used to uncover the reasons for the problems.
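A failure that shows up as a parameter drifting out of specification can be flagged with a simple limit check. The sketch below is a minimal illustration, not a real test program: the parameter names, limits, and measured values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecLimit:
    """Allowed range for one measured parameter (hypothetical values)."""
    name: str
    low: float
    high: float
    unit: str


def out_of_spec(measurements: dict[str, float], limits: list[SpecLimit]) -> list[str]:
    """Return one message per parameter that is missing or outside its limits."""
    violations = []
    for limit in limits:
        value = measurements.get(limit.name)
        if value is None:
            violations.append(f"{limit.name}: not measured")
        elif not (limit.low <= value <= limit.high):
            violations.append(
                f"{limit.name}: {value} {limit.unit} outside "
                f"[{limit.low}, {limit.high}] {limit.unit}"
            )
    return violations


# Hypothetical specification and one hypothetical device measurement.
LIMITS = [
    SpecLimit("supply_voltage", 1.71, 1.89, "V"),
    SpecLimit("supply_current", 0.0, 0.25, "A"),
    SpecLimit("clock_frequency", 95.0, 105.0, "MHz"),
]
measured = {"supply_voltage": 1.80, "supply_current": 0.31, "clock_frequency": 100.0}

violations = out_of_spec(measured, LIMITS)
for message in violations:
    print(message)
```

In this example only the supply current is flagged. To catch the environment-dependent failures described above, the same check would have to be repeated for each operating condition of interest.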

Fabrication of more advanced ICs typically uses what are known as process control monitors (PCMs), test structures that are used to evaluate various process steps during chip manufacturing. While helpful, these PCMs can neither represent everything that has happened to individual devices nor predict the performance of the completed chip. The delicate and complex nature of both the devices and the chip prevents thorough testing. The final chip can be probed as part of quality control, but chips that fail, either at this stage or later when they have been operating, will need a failure analysis.

Failure analysis can be broken down into three core steps:

  • Identifying the failure type
  • Identifying the failure source
  • Identifying the failure remediation

Identifying the Failure Type

Searching for the failure type may start with electrical probing and looking for specific results. The results of probing may provide some evidence of what has gone wrong and then help narrow the search for the specific cause. That pursuit may require multiple types of tools or identification strategies.

Some failures may be clearly visible, such as a crack in the chip package. But simply discovering a crack leads to more questions, such as how or why the package cracked. For example, did a cracked encapsulation layer cause the problem, or was there another root cause that resulted in the encapsulation cracking? Was the encapsulation material the problem, or perhaps abnormal heating during chip operation? This is where the detailed detective work begins, using a variety of techniques. Some of those techniques are noninvasive, such as scanning acoustic microscopy (SAM) and X-ray imaging, but others may require more invasive exploration to find the failure origins.

Identifying the Failure Source

Some methods of failure identification may require a form of reverse engineering, using wet chemistry or plasma-based processes to expose the chip's layers one by one. The goal is to uncover the layers gently, without inducing new damage or artifacts. Note that even some testing methods (e.g., bench testing, curve tests) can cause damage to the device, so it's important to approach device testing carefully and methodically.

The first step is to de-encapsulate the chip. As encapsulation is part of the package and is meant to serve as protection, it is usually thick. Often, gentle grinding to remove the bulk of the material is followed by a plasma process to reveal the top active layer of the chip. Once the encapsulation has been removed, the slow, step-by-step, layer-by-layer work begins. Plasma etching is typically the method of choice to uncover the various layers, owing to its controllable etch rate and its ability to selectively etch the many materials used in integrated chips. Focused ion beam (FIB) milling is sometimes used to generate cross-sections that can be analyzed in detail.
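Etch rate control and selectivity lend themselves to simple planning arithmetic: the time to clear a layer is its thickness divided by the etch rate, and selectivity is the ratio of the etch rate of the layer being removed to that of the layer beneath it. The sketch below assumes a hypothetical three-layer stack with made-up thicknesses and etch rates; none of the numbers are process data for any real chip or etching system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of a hypothetical stack, listed from the top down."""
    name: str
    thickness_nm: float
    etch_rate_nm_per_min: float  # assumed rate in the chemistry used for this layer


def etch_time_min(layer: Layer) -> float:
    """Nominal time to clear the layer: thickness divided by etch rate."""
    if layer.etch_rate_nm_per_min <= 0:
        raise ValueError("etch rate must be positive")
    return layer.thickness_nm / layer.etch_rate_nm_per_min


def selectivity(target_rate: float, underlying_rate: float) -> float:
    """Ratio of the target layer's etch rate to the underlying layer's rate."""
    if underlying_rate <= 0:
        raise ValueError("underlying etch rate must be positive")
    return target_rate / underlying_rate


# Hypothetical stack; every value here is an illustrative assumption.
STACK = [
    Layer("passivation nitride", 600.0, 200.0),
    Layer("inter-metal oxide", 1000.0, 250.0),
    Layer("barrier layer", 50.0, 25.0),
]

for layer in STACK:
    print(f"{layer.name}: {etch_time_min(layer):.1f} min")

# Assumed: the nitride chemistry removes the oxide below it at 20 nm/min.
nitride_to_oxide = selectivity(200.0, 20.0)
print(f"nitride:oxide selectivity = {nitride_to_oxide:.0f}:1")
```

A higher selectivity means a modest over-etch removes very little of the layer underneath, which is what makes it possible to stop on each layer in turn.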

Depending on the results from identifying the failure type, different attention will be given to each of the layers. The analysis may draw on microscopy, electrical testing, and a variety of materials analysis techniques such as Auger electron spectroscopy (AES), secondary ion mass spectrometry (SIMS), laser ionization mass spectrometry (LIMS), and Fourier transform infrared spectroscopy (FTIR). It is crucial to select equipment that provides the most damage-free delayering process possible while maintaining structural integrity, so that the metrology is not misled by artifacts.

Identifying the Failure Remediation

Once the failure point is identified, the next step is to ensure that the failure point observed is indeed the root cause of the issue. Identifying the root cause is the single most important part of the failure analysis process. The incredible scale, density, and complexity of integrated circuits make it easy for root causes to escape detection. Failures anywhere in the circuit may cause failures at other points. Once the root cause is detected, the challenge shifts to taking action to mitigate it. Solutions may involve circuit design revisions or changes to the fabrication process flow. Perhaps one of the many fabrication steps has strayed beyond the acceptable range and a more robust process is needed.
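Because a failure in one part of a circuit can cause failures downstream, it can help to separate failing blocks that have a failing block feeding them from those that do not. The sketch below is a deliberately simplified illustration of that idea: the block names and connections are hypothetical, and it inspects only direct upstream connections. The blocks it returns are places to look first, not confirmed root causes.

```python
def root_cause_candidates(feeds: dict[str, list[str]], failing: set[str]) -> set[str]:
    """Return failing blocks that have no failing block directly upstream.

    `feeds` maps each block to the blocks it drives.
    """
    # Invert the graph so each block lists the blocks that drive it.
    upstream: dict[str, set[str]] = {}
    for source, sinks in feeds.items():
        for sink in sinks:
            upstream.setdefault(sink, set()).add(source)

    return {block for block in failing if not (upstream.get(block, set()) & failing)}


# Hypothetical block diagram and hypothetical probing results.
FEEDS = {
    "voltage_regulator": ["pll", "io_bank"],
    "pll": ["cpu_core"],
    "cpu_core": ["io_bank"],
}
FAILING = {"pll", "cpu_core", "io_bank"}

candidates = root_cause_candidates(FEEDS, FAILING)
print(sorted(candidates))
```

Here three blocks misbehave, but only the hypothetical "pll" block has no failing block feeding it, so it is the natural starting point for closer inspection.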

The failure analysis process is more than simply quality control. It becomes fuel for product improvement and increased yield, both of which ideally contribute to the product's success.

CORIAL Fuels Failure Analysis with Best-in-Class Equipment

At CORIAL, we provide world-class plasma processing equipment like the Corial 200FA etching system to help you perform failure analysis. Our etching systems come with the precision, delicacy, and capabilities to de-encapsulate complex ICs while keeping electrical and planar integrity intact. Contact us to learn more about CORIAL's line of plasma etching equipment.
