Failure Mode and Effects Analysis (FMEA) – How to Analyze Risks for Better Software Quality & Satisfied Customers!

By Swati


I’m Swati. I accidentally started testing in 2004, and since then have worked with at least 20 clients in 10 cities and 5 countries and am still counting. I am CSTE and CSQA certified. I love my job and the value it adds to software…

Updated June 23, 2024
Edited by Vijay


I'm Vijay, and I've been working on this blog for the past 20+ years! I’ve been in the IT industry for more than 20 years now. I completed my graduation in B.E. Computer Science from a reputed Pune university and then started my career in…

Reviewed by Kamila


Kamila is an AI-based technical expert, author, and trainer with a Master’s degree in CRM. She has over 15 years of work experience in several top-notch IT companies. She has published more than 500 articles on various Software Testing Related Topics, Programming Languages, AI Concepts,…


Failure Mode and Effects Analysis (FMEA) is a Risk Management technique.

If implemented properly, it can be a valuable addition to your Quality Assurance processes. In this article, we introduce this Risk Analysis technique, which ultimately helps improve software quality.

Failure Mode and Effects Analysis (FMEA)


FMEA is mostly used by upper management or stakeholders, and in practice testers get little insight into it. That trend is changing, and I feel that if testers understand this concept properly, they can take their test-case writing to the next level by using this technique to:

  • Understand the stakeholders’ goals for testing the application.
  • Understand the business.
  • Derive the high-level test scenarios based on business and management interests.
  • Derive effective test cases that provide better coverage of the risk-prone areas.
  • Prioritize the test cases.
  • Decide what to test and what to defer at any phase.

Background

RISK ANALYSIS is a crucial aspect of Test Management. The question then arises: what is Risk Analysis, and why is it important? To answer that, we first need to understand what RISK is.



In its literal sense, RISK is the possibility of a negative or undesirable outcome or event. If risks are not handled or managed properly, they may lead to poor quality, dissatisfied customers, and sometimes loss of business.

Risk has two attributes:

  • Probability
  • Impact

Probability is the chance that a particular risk will occur, and impact is the extent of its effect.

What Is Risk Analysis?

Risk Analysis is the process of thoroughly analyzing and studying the identified potential risks to determine their probability and impact. It is advisable to measure both attributes; based on the results, we identify:

  • What to test first?
  • What to test more?
  • What not to test (this time)?

There are many methods of doing Risk Analysis, and they are broadly classified into two types:

  • Informal Techniques: These are based on experience, judgment, and intuition.
  • Formal Techniques: These identify and weigh risk attributes systematically.

Failure Mode and Effects Analysis (FMEA) is a formal method of doing Risk Analysis. In the following sections, I discuss FMEA in more detail and illustrate it with an example.

FMEA is a systematic and quantitative tool, typically maintained as a spreadsheet, that helps the team analyze what might go wrong. Doing FMEA requires the right people at the table, with representatives from all relevant areas, including customers.

Description

FMEA starts and continues with brainstorming sessions. Participants identify all the components, modules, dependencies, and limitations that could fail in a production environment, leading to poor quality or reliability and possibly to loss of business.

During FMEA, we not only identify the extent of the loss but also try to identify the causes of those failures. To quantify the risks, we use three attributes:

  • Severity of the Failure (S)
  • Priority of the Failure (P)
  • Likelihood of the Failure (L)

Each of these attributes is rated on the scales shown below:

Severity Scale:

Description | Class | Scale
Loss of data, hardware or safety issues | Urgent | 1
Loss of functionality without a workaround | High | 2
Loss of functionality with a workaround | Medium | 3
Partial loss of functionality | Low | 4
Cosmetic or trivial | None | 5

Priority scale:

Description | Class | Scale
Complete loss of system value | Urgent | 1
Unacceptable loss of system value | High | 2
Possible reduction in system value | Medium | 3
Acceptable reduction in system value | Low | 4
Negligible reduction in system value | None | 5

Likelihood scale:

Description | Class | Scale
Certain to affect all users | Urgent | 1
Likely to impact some users | Very High | 2
Possible impact on some users | High | 3
Limited impact on a few users | Low | 4
Unimaginable in actual usage | None | 5

Each of the three attributes (Severity, Priority, and Likelihood) is rated individually on its scale, and the three ratings are multiplied to get the Risk Priority Number (RPN).

That is, Risk Priority Number (RPN) = S × P × L

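Based on this RPN value, we determine the extent of testing. The lower the RPN, the higher the risk: because 1 is the most serious rating on every scale, the RPN ranges from 1 (1 × 1 × 1, the highest risk) to 125 (5 × 5 × 5, the lowest risk).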
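To make the formula concrete, here is a minimal Python sketch. The function name `rpn` and the 1-to-5 range check are illustrative assumptions, not part of any FMEA standard or tool; the check simply mirrors the scales defined above.

```python
def rpn(severity: int, priority: int, likelihood: int) -> int:
    """Return the Risk Priority Number (S * P * L) for one failure mode.

    Hypothetical helper: each rating is assumed to use the 1-5 scales
    above, where 1 is the most serious rating, so a LOWER RPN means a
    HIGHER risk.
    """
    for name, value in (("severity", severity),
                        ("priority", priority),
                        ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return severity * priority * likelihood


print(rpn(1, 1, 1))  # 1: the highest possible risk
print(rpn(5, 5, 5))  # 125: the lowest possible risk
print(rpn(2, 2, 3))  # 12
```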

Let’s try to understand it with an example:

Failure Mode and Effects Analysis Example

(This is a hypothetical example for illustration only. Actual implementation and features may vary.)

Let’s consider a simple example of a banking application that has 4 features.

  • Feature 1: Withdraw
  • Feature 2: Deposit
  • Feature 3: Home Loan
  • Feature 4: Fixed Deposits.

A Risk Analysis team is formed, consisting of the Bank Manager, the UAT Test Manager (representing end users), a Technical Architect, a Test Architect, a Network Administrator, a DBA, and the Project Manager.

After a series of brainstorming sessions, the team came up with the following Risks:

  • Complex business logic for calculating the home loan interest rate.
  • The system fails at 200 concurrent users.
  • The system fails to handle documents larger than 6 MB.

Now let’s rate the Severity, Priority, and Likelihood of these identified risks.

Severity:

Feature | Class | Scale
Complex business logic for calculating the home loan interest rate | High | 2
The system fails at 200 concurrent users | Medium | 3
The system fails to handle documents larger than 6 MB | High | 2

Priority:

Feature | Class | Scale
Complex business logic for calculating the home loan interest rate | High | 2
The system fails at 200 concurrent users | Medium | 3
The system fails to handle documents larger than 6 MB | Medium | 3

Likelihood:

Feature | Class | Scale
Complex business logic for calculating the home loan interest rate | High | 3
The system fails at 200 concurrent users | High | 3
The system fails to handle documents larger than 6 MB | Low | 4

Now let’s put all these attributes together:

Feature | Severity | Priority | Likelihood
Complex business logic for calculating the home loan interest rate | 2 | 2 | 3
The system fails at 200 concurrent users | 3 | 3 | 3
The system fails to handle documents larger than 6 MB | 2 | 3 | 4

Now let’s calculate the Risk Priority Number (RPN = Severity * Priority * Likelihood)

Feature | Severity | Priority | Likelihood | RPN
Complex business logic for calculating the home loan interest rate | 2 | 2 | 3 | 12
The system fails at 200 concurrent users | 3 | 3 | 3 | 27
The system fails to handle documents larger than 6 MB | 2 | 3 | 4 | 24
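The same arithmetic can be reproduced in a few lines of Python, as a rough sketch. The ratings are the ones from the example tables; the variable names and shortened risk labels are my own, and sorting by ascending RPN reflects this article's convention that a lower RPN means a higher risk.

```python
# Ratings (severity, priority, likelihood) taken from the example tables.
# The short labels are illustrative abbreviations of the identified risks.
risks = {
    "Complex business logic for home loan interest rate": (2, 2, 3),
    "System fails at 200 concurrent users": (3, 3, 3),
    "System fails to handle documents over 6 MB": (2, 3, 4),
}

# RPN = S * P * L for each identified risk.
rpns = {name: s * p * l for name, (s, p, l) in risks.items()}

# Sort ascending: on these scales the lowest RPN is the highest risk,
# so the first entry is the one to test first and most deeply.
ranked = sorted(rpns.items(), key=lambda item: item[1])
for name, value in ranked:
    print(f"{value:>3}  {name}")
```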

The key point: the lower the RPN, the higher the risk.

So in this example, Feature 1 (complex business logic for calculating the home loan interest rate) has the highest risk, and Feature 2 (the system fails at 200 concurrent users) has the lowest risk.

How to use this to derive test cases?

Since Feature 1 is the riskiest feature, its test cases should be rigorous and in-depth. Write test cases that cover the complete functionality and every module affected by the feature. Use all applicable test design techniques (Equivalence Partitioning and Boundary Value Analysis (BVA), Cause-Effect Graphing, State Transition diagrams) to derive them.

The test cases should cover not only functional but also non-functional aspects (load, stress, volume testing, etc.). In short, this feature needs exhaustive testing, so design your test cases accordingly. Also consider all the modules that depend on this important feature.

Feature 2 is the LEAST RISKY feature, so base your test cases on its major functionality. High-level test cases that validate that the feature works as expected should be sufficient.

Feature 3 is a MODERATE RISK feature, so design your test cases to cover all the major and dependent functionality. Write some BVA test cases to validate a few negative scenarios as well. The depth of testing should fall between that of the high-risk and low-risk features. If required, include a few non-functional test cases as well.

FMEA And Degree Of Testing

Based on the RPN value, we determine the extent or degree of testing to be done.

Typically:

  • If the RPN is between 1 and 10, we do Extensive Testing (covering the feature/module inside and out).
  • If the RPN is between 11 and 30, we do Balanced Testing (covering all the major functionality of the feature/module).
  • If the RPN is between 31 and 70, we do Opportunity Testing (covering the basic functionality of the feature/module).
  • If the RPN is more than 70, we do no testing, or only anomaly reporting when time permits.
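As a sketch, the ranges above can be encoded as a simple lookup. The function `degree_of_testing` and its return labels are hypothetical; the thresholds are the illustrative defaults listed above and should be tuned per project.

```python
def degree_of_testing(rpn: int) -> str:
    """Map an RPN to a degree of testing (hypothetical helper).

    Thresholds follow the example ranges 1-10, 11-30, 31-70 and >70;
    they are defaults to adjust to the nature of the project.
    """
    if rpn < 1:
        raise ValueError("RPN must be at least 1")
    if rpn <= 10:
        return "Extensive testing"
    if rpn <= 30:
        return "Balanced testing"
    if rpn <= 70:
        return "Opportunity testing"
    return "No testing (anomaly reporting only, when time permits)"


# RPNs of the three example risks.
for value in (12, 24, 27):
    print(value, degree_of_testing(value))
```

With these default ranges, all three example risks (RPN 12, 24, and 27) land in the Balanced band, so the relative ranking in the example still matters when deciding how deep to go within that band.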

These ranges are not fixed; they may vary with the nature of the project.


Conclusion

Risk Analysis using FMEA requires time and experience. The desired results come only from equal participation by all the responsible team members. Although the technique is formal, it requires a series of brainstorming sessions, and it is equally important to document all the identified risks.

Since most applications are unique, the scales used to rate the FMEA parameters (Priority, Severity, and Likelihood) also depend on the application. Done properly, FMEA offers many advantages: it helps identify potential risks, and based on them the team can plan an effective mitigation strategy.

About the Author: This is a guest article by Shilpa Chatterjee Roy. She has been working in the software testing field for the past 8.5 years across various domains.

If you have used this technique, please feel free to share your experience in the comments below.


17 thoughts on “Failure Mode and Effects Analysis (FMEA) – How to Analyze Risks for Better Software Quality & Satisfied Customers!”

  1. Hi
    It is a nice article on FMEA in software testing. The author could make a slight tweak in how the rating is done. In a traditional FMEA, Severity, Occurrence, and Detection are rated the opposite way: the higher the score, the higher the Severity, Occurrence, or Detection. That way, the failure modes with the highest RPNs are the ones to prioritize. I feel the rating approach explained in the article may confuse people. Would like to hear the views of others.

  2. You are introducing us to new testing processes that we have never used. These are really helpful technical methods for testing. Thank you so much for sharing, and keep up the good work.

  3. Thought Provoking Article.

    Just One Doubt –

    When to do this FMEA meeting ?

    1. If at the start of development, how do we know what is going to come up during development?
    In your case, how would you know before development that the application fails to handle documents larger than 6 MB?

    2. If after development, won't it delay the release of the application, since we have to fix the urgent issues?

    Please explain when FMEA should be done and why.
    If you can explain with the same example, that would be great.

    Thanks,
    Sachin

  4. Thank you all!

    @Sachin,

    Risk analysis is basically done during the planning stage. In fact, it forms the basis for creating the dev plan and the test plan.

    Understanding and identifying risks is a tricky job, but there is no rocket science involved. It requires experience. We do the risk analysis based on experience and judgement, and that's why I mentioned that it requires lots and lots of brainstorming sessions.

  5. Hello,
    The article is very crisp and gives a good summary. However, I have the following queries:
    1) FMEA guidelines usually followed globally have RPN ranking tables in descending order, i.e., 10 is most severe and 1 is least severe; however, your article uses ascending order. What is the basis and advantage of this method?
    2) The FMEA process also includes identification of causes, preventive controls, etc.; however, these are not included in this article. Kindly share the name of the guideline this method follows, or is this method customized for software testing?

  6. Hi Shilpa,

    Really, it's very useful. I thought it was risky, but now I feel it's very simple if you have a little testing experience.

    Thanks a lot!

  7. Very good article. I am not in software; however, this tool is very useful in my pharmaceutical quality systems.
    It would be a great help if you could give some examples from my area.

  8. Hi,

    It is indeed a great article, Shilpa.

    In addition, I may be able to add a few thoughts on FMEA based on 15 years of hands-on implementation of the tool. FMEA is classified into two types: 1) DFMEA (Design Failure Mode Effects Analysis), which is used during the product design stage. At this point, the designer has to review the historical problems or complaints encountered and perform the necessary analysis before proceeding with a new or improved design. In this manner, the VALUE of the product can be increased while the COST can be tremendously reduced. While adding value, the company would be able to realize a higher profit margin. Remember, the DESIGN stage is crucial, or else the failure will spill into processes/production/services. This may further impact process cost and internal and external customer satisfaction. What is scarier is a shrinking profit margin. It is more costly to resolve a severe failure at the PROCESS stage than at the DESIGN stage. Companies are losing millions in cash because they are unaware of the importance of DFMEA. 2) PFMEA (Process Failure Mode Effects Analysis), which is used as a tool for process improvement by identifying the potential failures/risks surfacing from processes. In most cases, the failures identified are adopted into the process manual or SOP to ensure that operators comply with the day-to-day processes. Bear in mind that both DFMEA and PFMEA are to be developed prior to product design and processing. In most cases, DFMEA and PFMEA documentation is finalised during the prototype or trial stage. I hope this info is helpful! Be blessed!

