FTA | Fault Tree Analysis | Quality-One

How to Perform Fault Tree Analysis (FTA)

As previously mentioned, the FTA is a logical breakdown from the Top-level undesired event, cascaded down to the Base-level events (root causes). Each path is assigned a probability. The paths with the highest severity / highest probability combinations are identified and will require mitigation. A Cut Set is a combination of Base-level events that together cause the undesirable Top-level event; it is traced by starting at the Base-level events (at the bottom of the FTA) and working the path up to the Top-level event. There are many cut sets within an FTA, and each is assigned its own probability. The Base-level event is often color coded to indicate its risk level.
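The cut-set arithmetic described above can be sketched in a few lines. This is a minimal illustration, not Quality-One's method: it assumes the Base-level events are statistically independent, so a cut set's probability is the product of its events' probabilities. The event names, cut sets, and probability values are hypothetical, and the ranking uses probability only; severity would be weighed alongside it.

```python
from math import prod

# Hypothetical Base-level event probabilities (the time basis is an assumption).
base_events = {
    "seal_degraded": 1e-3,
    "fitting_loose": 5e-4,
    "spark_from_relay": 2e-3,
    "static_discharge": 1e-4,
}

# Hypothetical cut sets: each set of Base-level events together causes the Top-level event.
cut_sets = [
    {"seal_degraded", "spark_from_relay"},
    {"fitting_loose", "spark_from_relay"},
    {"seal_degraded", "static_discharge"},
]


def cut_set_probability(cut_set, probabilities):
    """Probability that every event in the cut set occurs, assuming independence."""
    return prod(probabilities[event] for event in cut_set)


def rank_cut_sets(cut_sets, probabilities):
    """Return (probability, sorted event names) pairs, highest probability first."""
    scored = [(cut_set_probability(cs, probabilities), sorted(cs)) for cs in cut_sets]
    return sorted(scored, key=lambda item: item[0], reverse=True)


for probability, events in rank_cut_sets(cut_sets, base_events):
    print(f"{probability:.1e}  {' AND '.join(events)}")
```

The highest-ranked cut sets are the first candidates for the mitigation steps that follow.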

The 5 basic steps to perform a Fault Tree Analysis are as follows:

  1. Identify the Hazard
  2. Obtain Understanding of the System Being Analyzed
  3. Create the Fault Tree
  4. Identify the Cut Sets
  5. Mitigate the Risk

Step 1: Identify the Hazard

Knowing the consequence of the failure is useful in defining the Top-level event of the Fault Tree. The Top-level event, or Hazard, should be defined as precisely as possible:

  • How much (magnitude)?
  • How long (duration)?
  • What is the safety impact?
  • What is the environmental impact?
  • What is the regulatory impact?

Step 2: Obtain Understanding of the System Being Analyzed

  • Create or acquire appropriate support information:
    • List of components (Bill of Material)
    • Boundary Diagram
    • Schematic
    • Code Requirements
    • Engineering Noises and Environments
    • Examples of similar products or failures
  • List the potential causes of the hazard down to the next level. This is similar to the 5 Whys process, except that a Fault Tree should be developed one level at a time, completing each level before progressing to the next.
    • Include system design engineers, who have full knowledge of the system and its functions, in the higher levels of the Fault Tree Analysis. This knowledge is very important for cause selection.
    • Include Reliability Engineers who can assist in developing the relationships of causes to a failure or fault.
  • Estimate the probability of each cause at the Base level
  • Label all causes with codes (optional)
  • Prioritize or sequence causes in the order of occurrence or probability
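When failure rate data exists, the probability estimate and prioritization steps above can be sketched as follows. This assumes a constant failure rate, so the probability of at least one failure over an exposure time t is 1 − exp(−λt); the cause codes, descriptions, rates, and exposure time are all hypothetical.

```python
from math import exp

# Hypothetical Base-level causes: code -> (description, failure rate per hour).
CAUSES = {
    "FL-01": ("Seal degraded", 2e-7),
    "FL-02": ("Fitting loose", 1e-7),
    "IG-01": ("Relay spark", 4e-7),
}
EXPOSURE_HOURS = 5_000  # hypothetical operating time under consideration


def probability_from_rate(failure_rate, hours):
    """P(at least one failure within `hours`), assuming a constant failure rate."""
    return 1 - exp(-failure_rate * hours)


def prioritize(causes, hours):
    """Return (code, description, probability) rows, most likely cause first."""
    rows = [
        (code, desc, probability_from_rate(rate, hours))
        for code, (desc, rate) in causes.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for code, desc, p in prioritize(CAUSES, EXPOSURE_HOURS):
    print(f"{code}  {desc:<14} {p:.2e}")
```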

Step 3: Create the Fault Tree

In the example FTA, the team would stop the analysis at “Air Present” because the presence of oxygen is outside the control of the team developing the FTA.

Analysis continues down to the next level on “Fuel Leak”. The team performing the FTA is brought together to focus on the potential causes of fuel leaks. The analysis is not limited to mechanical failures. The inclusion of electronics and software in complex designs brings the opportunity both to create and to mitigate failures. The risks may be prevented through engineering choices or controlled through Quality Control.

The example tree continues to additional, more detailed levels. The Base-level event (depicted as a circle or oval) is the point at which the team can address the risk.
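One way to represent the tree built in this step is as nested gates and events. The sketch below is a hypothetical fire example loosely based on the “Air Present” and “Fuel Leak” branches mentioned above; the gate structure, the extra events, and all probabilities are illustrative assumptions, and events are assumed independent. “Air Present” is modeled as an undeveloped event with a fixed probability because the team stops the analysis there.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Event:
    """A Base-level (or undeveloped) event with an estimated probability."""
    name: str
    probability: float


@dataclass
class Gate:
    """An AND/OR gate combining child events or gates."""
    name: str
    kind: str  # "AND" or "OR"
    children: List[Union["Gate", Event]] = field(default_factory=list)


def probability(node):
    """Evaluate a node's probability, assuming independent child events."""
    if isinstance(node, Event):
        return node.probability
    child_probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        return prod(child_probs)  # every child must occur
    if node.kind == "OR":
        return 1 - prod(1 - p for p in child_probs)  # at least one child occurs
    raise ValueError(f"Unknown gate type: {node.kind}")


# Hypothetical tree: a fire needs air, a fuel leak, and an ignition source.
fire = Gate("Fire", "AND", [
    Event("Air Present", 1.0),  # undeveloped: outside the team's control, assumed always present
    Gate("Fuel Leak", "OR", [
        Event("Seal degraded", 1e-3),
        Event("Fitting loose", 5e-4),
    ]),
    Event("Ignition source", 2e-3),
])

print(f"P(Fire) = {probability(fire):.3e}")
```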

The Base-level event is typically color coded as follows:

  • Red: Critical Risk
  • Orange: High Risk
  • Yellow: Minor Risk
  • Green: Acceptable / Very Low Risk
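A simple way to apply the color coding is a lookup from an estimated probability to a band. The probability thresholds below are illustrative assumptions for a single, fixed severity level, not values from this article; each team would set its own bands from its severity and tolerability criteria.

```python
# Hypothetical probability bands (lower bound, color, label), checked from most to least severe.
# These thresholds are illustrative assumptions for a single fixed severity level.
RISK_BANDS = [
    (1e-4, "Red", "Critical Risk"),
    (1e-5, "Orange", "High Risk"),
    (1e-6, "Yellow", "Minor Risk"),
]


def risk_color(probability):
    """Return the (color, label) for an estimated Base-level event probability."""
    for lower_bound, color, label in RISK_BANDS:
        if probability >= lower_bound:
            return color, label
    return "Green", "Acceptable / Very Low Risk"


for p in (3e-4, 2e-5, 4e-6, 1e-8):
    color, label = risk_color(p)
    print(f"{p:.0e}: {color} ({label})")
```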

Step 4: Identify the Cut Sets

  • Risk is estimated for each event
    • When failure rate data is available, it can be used to calculate the risk of a single chain (cut set) or of many chains combined
    • If there is no data, an estimate is established based on subjective guidelines similar to those used in FMEA development
  • The Cut Sets with risk greater than the system can tolerate (e.g., safety or inoperative conditions) are selected for mitigation
  • Actions are required for Critical (red) and High (orange) Risks
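The cut sets can also be derived directly from the gate structure and then screened against a tolerable level. The sketch below uses a simple recursive (bottom-up) expansion rather than the classic top-down MOCUS algorithm; both yield the minimal cut sets for a tree like this. The tree, the probabilities, and the `TOLERABLE` threshold are hypothetical, and Base-level events are assumed independent.

```python
from math import prod

# Hypothetical tree: gate name -> (gate type, children). Names not in GATES are Base-level events.
GATES = {
    "Fire": ("AND", ["Air Present", "Fuel Leak", "Ignition"]),
    "Fuel Leak": ("OR", ["Seal degraded", "Fitting loose"]),
    "Ignition": ("OR", ["Relay spark", "Static discharge"]),
}
PROBABILITIES = {
    "Air Present": 1.0,
    "Seal degraded": 1e-3,
    "Fitting loose": 5e-4,
    "Relay spark": 2e-3,
    "Static discharge": 1e-4,
}
TOLERABLE = 5e-7  # hypothetical maximum tolerable probability per cut set


def minimize(sets):
    """Drop duplicates and any cut set that contains a smaller cut set."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(node, gates):
    """Expand a node into its minimal cut sets (frozensets of Base-level events)."""
    if node not in gates:
        return [frozenset([node])]
    kind, children = gates[node]
    child_sets = [cut_sets(child, gates) for child in children]
    if kind == "OR":
        # Any single child's cut set is enough to trigger an OR gate.
        combined = [cs for sets in child_sets for cs in sets]
    elif kind == "AND":
        # An AND gate needs one cut set from every child at the same time.
        combined = [frozenset()]
        for sets in child_sets:
            combined = [acc | cs for acc in combined for cs in sets]
    else:
        raise ValueError(f"Unknown gate type: {kind}")
    return minimize(combined)


def select_for_mitigation(top, gates, probabilities, tolerable):
    """Return (probability, events) for cut sets above the tolerable level, worst first."""
    scored = [
        (prod(probabilities[e] for e in cs), sorted(cs))  # independence assumed
        for cs in cut_sets(top, gates)
    ]
    return sorted((item for item in scored if item[0] > tolerable), reverse=True)


for p, events in select_for_mitigation("Fire", GATES, PROBABILITIES, TOLERABLE):
    print(f"{p:.1e}  {' AND '.join(events)}")
```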

Step 5: Mitigate the Risk

Risk Mitigation can take many forms. A popular approach is the criticality method. Other techniques require risk to be mitigated to a level calculated in Defects per Million Opportunities (DPMO). Safety systems may require the resulting risk to be mitigated to:

  • Error Proofing (cannot occur)
  • 1 in 10 million (1 × 10⁻⁷)
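The numeric targets above can be checked with straightforward arithmetic. This sketch uses the conventional DPMO definition (defects divided by total opportunities, times one million) and assumes the residual risk is expressed as a per-opportunity probability; on that basis the 1-in-10-million target corresponds to 0.1 DPMO. The defect counts and residual probabilities in the example calls are hypothetical.

```python
SAFETY_TARGET = 1e-7  # 1 in 10 million, the safety target stated above


def dpmo(defects, units, opportunities_per_unit):
    """Defects per Million Opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def probability_to_dpmo(probability):
    """Express a per-opportunity probability in DPMO."""
    return probability * 1_000_000


def meets_safety_target(residual_probability, target=SAFETY_TARGET):
    """True if the mitigated (residual) risk is at or below the target."""
    return residual_probability <= target


print(probability_to_dpmo(SAFETY_TARGET))  # the 1-in-10-million target expressed in DPMO
print(dpmo(defects=3, units=10_000, opportunities_per_unit=5))
print(meets_safety_target(4e-8), meets_safety_target(2e-6))
```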

Action logs and revision records are kept for follow-up and closure of each undesirable risk. Any risk not mitigated to an acceptable level is a candidate for Mistake Proofing or Quality Control, which protects the consumer from the risk.

Examples of Mitigation Strategies

When a risk is unacceptable, the team may have several options. The following are a few examples:

  • Design change
  • Selection of a component with a higher reliability to replace the Base-level event component
    • This is often expensive unless identified early in Product Development
  • Physical Redundancy of the Component
    • This option places the redundant component in parallel with the original. Both must fail simultaneously for the hazard to be experienced. If a safety issue exists, this option may require non-identical components, so that a single common cause is less likely to disable both.
  • Software Redundancy
    • Adding a sensing circuit that can change the state of the product when a fault condition is detected often reduces the severity of the event, protecting components through duty cycle changes and reduced input stresses.
  • Warning System
    • The circuit may only warn of an event. This requires action by an operator or analyst. It is important to note that if this course of action is taken, Human Factors Reliability must also be included in the evaluation.
  • Quality Control
    • This may include removal of the potential failure through testing or inspection. The inspection effectiveness must match the level of severity that the hazard may impose on the consumer.
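The redundancy, warning system, and Quality Control options above can be compared with some simple probability arithmetic. This is a sketch under stated assumptions: events are independent unless noted, all input values are hypothetical, and the function names are illustrative. The common-cause case uses a simple beta-factor model, a standard reliability technique swapped in here (not taken from this article) to show why identical redundant components may not deliver the expected benefit.

```python
def parallel_independent(p_a, p_b):
    """Physical redundancy: both components fail, assuming independent failures."""
    return p_a * p_b


def parallel_beta_factor(p, beta):
    """Two identical redundant components under a simple beta-factor common-cause model."""
    independent = (1 - beta) * p  # failures unique to one component
    common_cause = beta * p       # failures that disable both components at once
    return independent ** 2 + common_cause  # rare-event approximation


def warning_path(p_event, p_warning_fails, p_operator_error):
    """Warning system: hazard occurs if the warning fails, or it works but the operator does not respond."""
    return p_event * (p_warning_fails + (1 - p_warning_fails) * p_operator_error)


def inspection_escape(p_defect, effectiveness):
    """Quality Control: probability a defect reaches the consumer after one inspection."""
    return p_defect * (1 - effectiveness)


def required_effectiveness(p_defect, target):
    """Inspection effectiveness needed for escapes to stay at or below `target`."""
    return max(0.0, 1 - target / p_defect)


# Hypothetical inputs
print(parallel_independent(1e-3, 1e-3))
print(parallel_beta_factor(1e-3, beta=0.05))
print(warning_path(1e-3, p_warning_fails=1e-3, p_operator_error=1e-2))
print(inspection_escape(1e-4, effectiveness=0.99))
print(required_effectiveness(1e-4, target=1e-7))
```

With these hypothetical numbers, a 5% common-cause fraction makes the redundant pair roughly fifty times more likely to fail than the independent estimate, which is one reason safety-related redundancy may call for non-identical components.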