Failure Mode and Effects Analysis (FMEA) – How to Analyze Risks for Better Software Quality & Satisfied Customers!

Failure mode and effects analysis (FMEA) is a risk management technique. Implemented properly, it can be a valuable addition to any quality assurance process. This article introduces the technique and shows how it helps improve software quality.

FMEA is mostly used by upper management and stakeholders. In practice, testers get little insight into the technique. That trend is changing, and I feel that testers who understand the concept properly can take their test case writing one level up by using it to:

  • Understand the stakeholders’ goals for testing the application.
  • Understand the business.
  • Derive high-level test scenarios based on business and management interest.
  • Derive effective test cases which provide better coverage of the risk-prone areas.
  • Prioritize the test cases.
  • Decide what to test and what to defer at any phase.

Background:

RISK ANALYSIS is a crucial aspect of Test Management. The question then arises – What is Risk analysis? And why is it important? To understand this, it is vital to understand – what is RISK?

See Also => Types of Risks in Software Projects.

RISK, in its literal meaning, is the possibility of a negative or undesirable outcome or event. Risks that are not handled or managed properly may lead to poor quality, unsatisfied customers and sometimes loss of business.

Risk has 2 attributes – Probability and Impact.

Probability means the chance that a particular risk will occur, and impact means the extent of the effect of that risk.

What is Risk Analysis?

Risk analysis is a mechanism by which the identified potential risks are analyzed and studied thoroughly in order to determine their probability and impact. It is advisable to measure the two attributes, and based on the result we identify:

  • What to test first
  • What to test more
  • What not to test (this time)
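
As a rough illustration of how the two attributes can drive these three decisions, the sketch below rates each risk for probability and impact, multiplies the two ratings, and sorts by the result. The 1 (highest) to 5 (lowest) rating scale, the multiplication, the `defer_above` cut-off and the sample risks are all assumptions made for this sketch; the article does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # assumed rating: 1 = almost certain ... 5 = very unlikely
    impact: int       # assumed rating: 1 = critical ... 5 = negligible

    @property
    def score(self) -> int:
        # Assumption: combine the two attributes by multiplying the ratings.
        return self.probability * self.impact


def triage(risks, defer_above=15):
    """Split risks into (to_test, deferred), each ordered from lowest score up."""
    ordered = sorted(risks, key=lambda r: r.score)
    to_test = [r for r in ordered if r.score <= defer_above]
    deferred = [r for r in ordered if r.score > defer_above]
    return to_test, deferred


# Hypothetical sample risks, invented for this sketch.
sample = [
    Risk("Report export layout", probability=4, impact=5),
    Risk("Login lockout", probability=2, impact=1),
    Risk("Search timeout", probability=2, impact=3),
]

to_test, deferred = triage(sample)
for risk in to_test:
    print(f"test: {risk.name} (score {risk.score})")
for risk in deferred:
    print(f"defer this time: {risk.name} (score {risk.score})")
```

The first entries of `to_test` are what to test first and most; `deferred` holds what not to test this time.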

There are many methods of doing risk analysis, and they are broadly classified into two types:

  1. Informal Techniques – based on experience, judgment and intuition.
  2. Formal Techniques – based on identifying and weighing the risk attributes.

Failure Mode and Effects Analysis (FMEA) is a formal method of risk analysis. In the following sections, I will discuss FMEA in more detail and illustrate it with an example.

FMEA

Introduction:

FMEA is a formal technique for risk analysis. It is a systematic, quantitative tool, usually in the form of a spreadsheet, which helps the team analyze what might go wrong. FMEA needs the right people at the table: representatives from every area involved, including customers.

Description

FMEA starts and continues with brainstorming sessions. Participants need to identify all the components, modules, dependencies and limitations that could fail in the production environment and eventually lead to poor quality or reliability, and possibly to loss of business.

During FMEA we not only identify the extent of the loss but also try to identify the causes of those failures. To quantify the risk, FMEA uses 3 attributes:

  1. Severity of the failure (S)
  2. Priority of the failure (P)
  3. Likelihood of the failure (L)

We rate each of these attributes on the scales shown below:

Severity scale:

| Description | Class | Scale |
|---|---|---|
| Loss of data, hardware or safety issues | Urgent | 1 |
| Loss of functionality without a workaround | High | 2 |
| Loss of functionality with a workaround | Medium | 3 |
| Partial loss of functionality | Low | 4 |
| Cosmetic or trivial | None | 5 |


Priority scale:

| Description | Class | Scale |
|---|---|---|
| Complete loss of system value | Urgent | 1 |
| Unacceptable loss of system value | High | 2 |
| Possible reduction in system value | Medium | 3 |
| Acceptable reduction of system value | Low | 4 |
| Negligible reduction in system value | None | 5 |


Likelihood scale:

| Description | Class | Scale |
|---|---|---|
| Certain to affect all users | Urgent | 1 |
| Likely to impact some users | Very High | 2 |
| Possible impact on some users | High | 3 |
| Limited impact on a few users | Low | 4 |
| Unimaginable in actual usage | None | 5 |


All three attributes (Severity, Priority and Likelihood) are rated individually on their scales and then multiplied to get a Risk Priority Number (RPN).

That is, Risk Priority Number (RPN) = S * P * L

Based on the RPN value, we determine the extent of testing. The lower the RPN, the higher the risk.
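
The formula is simple enough to sketch in a few lines. The function name `risk_priority_number` and the range check are assumptions added for illustration; only the multiplication and the 1 to 5 scales come from the article.

```python
def risk_priority_number(severity: int, priority: int, likelihood: int) -> int:
    """Multiply the three ratings; each is expected on the 1 (worst) to 5 scale."""
    ratings = (("severity", severity), ("priority", priority), ("likelihood", likelihood))
    for label, value in ratings:
        if not 1 <= value <= 5:
            raise ValueError(f"{label} must be between 1 and 5, got {value}")
    return severity * priority * likelihood


# The two ends of the range: 1 is the highest possible risk, 125 the lowest.
print(risk_priority_number(1, 1, 1))
print(risk_priority_number(5, 5, 5))
```

Because 1 is the worst rating on every scale, the product runs from 1 (all three attributes at their worst) to 125 (all three negligible).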

Let’s try to understand it with an example:

Failure Mode Effect Analysis Example:

(This is a hypothetical example for illustration only. Actual implementation and features may vary.)

Let’s consider a simple example of a banking application which has 4 features.

  1. Feature 1 – Withdraw
  2. Feature 2 – Deposit
  3. Feature 3 – Home Loan
  4. Feature 4 – Fixed Deposits.

A risk analysis team is formed, consisting of the bank manager, the UAT test manager (representing the end user), a technical architect, a test architect, a network administrator, a DBA and a project manager.

After a series of brainstorming sessions, the team came up with the following risks:

  1. Complex business logic in case of calculating interest rate of home loan.
  2. System fails at 200 concurrent users.
  3. System fails to handle documents which are more than 6 MB.

Now let’s try to calculate the severity, priority and likelihood of these identified risks.

Severity:

| Risk | Class | Scale |
|---|---|---|
| Complex business logic in case of calculating interest rate of home loan | High | 2 |
| System fails at 200 concurrent users | Medium | 3 |
| System fails to handle documents which are more than 6 MB | High | 2 |


Priority:

| Risk | Class | Scale |
|---|---|---|
| Complex business logic in case of calculating interest rate of home loan | High | 2 |
| System fails at 200 concurrent users | Medium | 3 |
| System fails to handle documents which are more than 6 MB | Medium | 3 |


Likelihood:

| Risk | Class | Scale |
|---|---|---|
| Complex business logic in case of calculating interest rate of home loan | High | 3 |
| System fails at 200 concurrent users | High | 3 |
| System fails to handle documents which are more than 6 MB | Low | 4 |


Now let’s put all these attributes together:

| Risk | Severity | Priority | Likelihood |
|---|---|---|---|
| Complex business logic in case of calculating interest rate of home loan | 2 | 2 | 3 |
| System fails at 200 concurrent users | 3 | 3 | 3 |
| System fails to handle documents which are more than 6 MB | 2 | 3 | 4 |


Now let’s calculate the Risk Priority Number (RPN = Severity * Priority * Likelihood):

| Risk | Severity | Priority | Likelihood | RPN |
|---|---|---|---|---|
| Complex business logic in case of calculating interest rate of home loan | 2 | 2 | 3 | 12 |
| System fails at 200 concurrent users | 3 | 3 | 3 | 27 |
| System fails to handle documents which are more than 6 MB | 2 | 3 | 4 | 24 |


Now the key is: the lower the RPN, the higher the risk.

So in this example, risk 1 (complex business logic in calculating the home loan interest rate) has the lowest RPN, 12, and therefore the highest risk. Risk 2 (system fails at 200 concurrent users) has the highest RPN, 27, and therefore the lowest risk.
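
The ranking can be reproduced with a short script. The ratings are the ones from the example tables; the shortened risk names and the helper `rpn` are just illustrative choices.

```python
# (severity, priority, likelihood) ratings from the example tables.
risks = {
    "Complex home loan interest calculation": (2, 2, 3),
    "System fails at 200 concurrent users": (3, 3, 3),
    "System fails on documents over 6 MB": (2, 3, 4),
}


def rpn(ratings):
    severity, priority, likelihood = ratings
    return severity * priority * likelihood


# Lowest RPN first, which means highest risk first.
ranked = sorted(risks, key=lambda name: rpn(risks[name]))
for position, name in enumerate(ranked, start=1):
    print(f"{position}. {name}: RPN {rpn(risks[name])}")
```

Sorting in ascending order puts the home loan calculation (RPN 12) first, the 6 MB document limit (RPN 24) second and the concurrent user limit (RPN 27) last.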

How to use this to derive test cases?

Since risk 1 (the home loan interest calculation) is the MOST RISKY item, the test cases for that feature should be rigorous and in depth. Write test cases that cover the complete functionality and the modules affected by the feature. Use the full range of test design techniques (Equivalence Partitioning and BVA, cause-effect graphs, state transition diagrams) to derive them.

The test cases should be not only functional but also non-functional (load, stress and volume tests, etc.). Basically, this feature needs exhaustive testing, so plan your test cases accordingly. Also consider all the modules that depend on this feature.

Risk 2 (system fails at 200 concurrent users) is the LEAST RISKY item, so base your test cases on the major functionality. High-level test cases that validate that the system works as expected should be sufficient.

Risk 3 (documents larger than 6 MB) is a MODERATE RISK item, so write test cases that cover all the major and dependent functionality. Write some BVA test cases to validate a few negative scenarios as well. The extent of testing should fall between that of the high-risk and the low-risk items. If required, include a few non-functional test cases as well.
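
As a minimal sketch of the BVA idea applied to the 6 MB document risk, the snippet below lists the file sizes worth probing around the threshold. The helper name `boundary_sizes`, the one-byte step and the assumption that 1 MB means 1,048,576 bytes are illustrative choices, not part of the original example.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def boundary_sizes(limit_bytes, step=1):
    """Return sizes just below, exactly at, and just above the limit."""
    return {
        "just below": limit_bytes - step,
        "at limit": limit_bytes,
        "just above": limit_bytes + step,
    }


# Candidate document sizes for test cases around the 6 MB threshold.
for label, size in boundary_sizes(6 * MB).items():
    print(f"{label}: {size} bytes")
```

Each size would become one test case: upload a document of exactly that size and record whether the system handles it.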

FMEA and Degree of testing

Based on the RPN value, we determine the extent or degree of testing to be done.

Normally if:

  • RPN is between 1 and 10, we do Extensive Testing (covering the feature/module inside and out)
  • RPN is between 11 and 30, we do Balanced Testing (covering all the major functionality of the feature/module)
  • RPN is between 31 and 70, we do Opportunity Testing (covering the basic functionality of the feature/module)
  • RPN is more than 70, we do no testing or, when time permits, only anomaly reporting.

These ranges or numbers are not restricted to the ones I mentioned above. They may vary as per the nature of the project.
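
The ranges listed above translate directly into a lookup function. This is only a sketch using those particular cut-offs; the function name `degree_of_testing` is made up here, and a real project would substitute its own ranges.

```python
def degree_of_testing(rpn: int) -> str:
    """Map an RPN to a degree of testing, using the ranges listed above."""
    if rpn < 1:
        raise ValueError("RPN must be at least 1")
    if rpn <= 10:
        return "Extensive Testing"
    if rpn <= 30:
        return "Balanced Testing"
    if rpn <= 70:
        return "Opportunity Testing"
    return "No testing (anomaly reporting only, when time permits)"


# RPN values from the banking example.
for value in (12, 24, 27):
    print(value, "->", degree_of_testing(value))
```

With these particular ranges, the three RPNs from the banking example (12, 24 and 27) all land in Balanced Testing, which is one reason a team might tune the ranges to suit its own project.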

 Resources: Download FMEA Software and FMEA Template.

Conclusion:

Risk analysis using FMEA requires time and experience. The desired results can be achieved only with equal participation from all the responsible team members. Though the technique is formal, it requires a series of brainstorming sessions, and it is equally important to document all the identified risks.

Since most applications are unique, the scales used to measure the FMEA parameters (i.e. priority, severity and likelihood) also depend on the application. Done appropriately, FMEA has many advantages. It can be used to identify potential risks, and based on these the team can plan an effective mitigation strategy.

About the Author: This is a guest article by Shilpa Chatterjee Roy. She has been working in the software testing field for the past 8.5 years in various domains.

If you have used this technique, please feel free to share your experience in the comments below.



9 comments ↓

#1 Anjali Mone on 01.22.14 at 8:59 pm

you are introducing us to new and new testing processes which we never used. these are really helpful technical methods to perform testing. thank you soooo much for sharing. keep up the good work.

#2 Bibhishan Dhagate on 01.23.14 at 4:36 am

sir,
it is a great article….

#3 Veda on 01.23.14 at 8:02 am

thank you. what are other methods for risk analysis?

#4 Sachin on 01.23.14 at 10:34 am

Thought Provoking Article.

Just One Doubt -

When should this FMEA meeting be held?

1. If at the start of development, how do we know what we are going to face during development?
In your case, how did you know before development that the application fails to handle documents which are more than 6 MB?

2. If after development, won't it definitely delay the release of the application, as we have to fix the urgent issues?

Please explain when FMEA should be done and why.
If you explain it with the same example, that would be great.

Thanks,
Sachin

#5 Sirisha Ch on 01.23.14 at 7:16 pm

Useful article. Thank you !

#6 Shilpa Chatterjee Roy on 01.24.14 at 11:33 am

Thank you all!

@Sachin,

Risk analysis is basically done during the planning stage. In fact this forms the basis of creating the dev plan and test plan.

Understanding and identifying risk is a tricky job, but there is no rocket science involved. It requires experience. The risk analysis is based on experience and judgement, and that's why I mentioned that it requires lots and lots of brainstorming sessions.

#7 Arockiaraj Martin on 01.25.14 at 6:04 am

Good documentation thanks !!

#8 Mugil.k on 01.27.14 at 6:46 am

Hi Shilpa,

Really its very useful, i thought it was risky, now am feeling its very simple if you have little testing experience.

Thanks a lot..!!!!!!1

#9 KK on 04.10.14 at 5:56 pm

Hi Shilpa ,

This is a very good article on FMEA; the way you explained the concept is excellent.

Thanks
KK
