

Junior Member
Reliability measure/calculations
I am looking for how someone who is not a mathematician (high school algebra only) goes about measuring reliability. An example I saw was for a simple SYSTEM made up of:
Computer Reliability 0.95
Terminal Reliability 0.98
System Reliability = 0.931
Trying to reverse engineer, I took the mean of the 2 components, 0.965, and subtracted the mean of the failures, 0.035, and came up with 0.93.
Is this even close to any statistical formula?
I have "Metrics and Models in Software Quality Engineering" by Kan...but I'm afraid my lack of mathematics makes much of it Greek:(
I searched stickyminds for info and can't seem to find what I'm looking for.
Any insight would be greatly appreciated....


Moderator
Re: Reliability measure/calculations
Most likely, they meant 0.95 * 0.98 = 0.931
(Where the SYSTEM reliability is made up of the COMPUTER reliability and TERMINAL reliability)
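The multiplication works if you assume the system needs both parts working at once (a series arrangement) and that the two parts fail independently of each other. Under those assumptions the probabilities simply multiply. A minimal Python sketch; the function name `series_reliability` is made up for illustration:

```python
from math import prod

def series_reliability(component_reliabilities):
    """Reliability of a system that works only if every component works.

    Assumes the components fail independently of one another.
    """
    return prod(component_reliabilities)

system = series_reliability([0.95, 0.98])  # computer, terminal
print(round(system, 3))  # 0.931
```

The same function covers more than two components; each extra part in the chain can only pull the system figure down.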

 Joe (strazzerj@aol.com)

Senior Member
Re: Reliability measure/calculations
Statistically, to do it your way you can only use 2 decimals for your mean values.
So Mean Pass = 0.96, because on .5 you round to the even digit, i.e. 0.955 rounds to 0.96 and 0.945 rounds to 0.94. So your mean pass of 0.965 becomes 0.96 and your mean failure of 0.035 becomes 0.04.
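The round-half-to-even rule described here can be checked with Python's `decimal` module, which avoids binary floating-point surprises on values like 0.965. This is only a sketch; the helper name `round_half_even` is hypothetical:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, places="0.01"):
    """Round a decimal string to two places, sending exact ties to the even digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

for v in ["0.955", "0.945", "0.965", "0.035"]:
    print(v, "->", round_half_even(v))
# 0.955 -> 0.96
# 0.945 -> 0.94
# 0.965 -> 0.96
# 0.035 -> 0.04
```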

[This message has been edited by CCoulter (edited 05-16-2002).]

Junior Member
Re: Reliability measure/calculations
Thanks, you guys! Feeling like a real idiot here!! Any suggestions on a good place to start learning theory? Trying to "get it" through deciphering formulas tain't working for me! I need to back up and do this logically.


Senior Member
Re: Reliability measure/calculations
<BLOCKQUOTE>Originally posted by lprice22:
Trying to "get it" through deciphering formulas tain't working for me! I need to back up and do this logically.</BLOCKQUOTE>
You just need to start from the basics. If you will tolerate a slightly longer post here, I will throw out some things for you to think about.
First off, one good way to think about reliability is in terms of what it is actually looking for. A traditional equation for software reliability is:
Reliability = 1 - (Number of Errors / Total number of lines of executable code)
(Note: the "number of errors" can be based on actual or predicted values, depending on whether you are doing estimation or forecasting. You could also use total modules, total units, or total functions rather than strict lines of executable code.)
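As a quick worked example of that equation, here is a small Python sketch. The function name and the figures (12 errors in 4000 executable lines) are made up purely for illustration:

```python
def error_ratio_reliability(errors, executable_lines):
    """Reliability = 1 - (errors / executable lines), per the equation above."""
    if executable_lines <= 0:
        raise ValueError("executable_lines must be positive")
    return 1 - errors / executable_lines

# Hypothetical numbers: 12 known errors in 4000 executable lines.
print(round(error_ratio_reliability(12, 4000), 4))  # 0.997
```

You could pass a count of modules or functions as the second argument instead of lines, as long as the errors are counted on the same basis.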
Some Basics
You can think of reliability as the probability of failure-free operation for a specified time, in a specified environment, for a given purpose. This means quite different things depending on the system and its users. Formally, software reliability is an estimate of the probability of software failure during a specified exposure period. Informally, reliability is a measure of how well the users of a system think it provides the services they require. Defining reliability usually requires an operational profile, which describes the expected pattern of usage of the system. Reliability must also consider fault consequences: not all faults are equally serious, and a system is perceived as more unreliable if its faults are more serious.
Reliability improves most when faults in the most frequently used parts of the software are removed. Removing a certain percentage of software faults will not necessarily lead to the same percentage of reliability improvement. One often-cited study found that removing sixty percent of software defects led to only a three percent improvement in overall reliability. Removing faults with serious consequences is the most important objective. We all know this, but reliability measures are often not taken in this context.
Something else to keep in mind: time units in reliability measurement must be carefully selected. You should not automatically use the same unit for all systems. For example, you might use raw execution time for non-stop systems, calendar time for systems with a regular usage pattern (such as systems that always run once per day), or the number of transactions for systems that are used on demand.
As for a general reliability specification, which the business will often ask for as a form of requirement to report against, the idea is that for each subsystem (or module, if you prefer) you analyze the consequences of possible system failures. From that failure analysis, you partition failures into appropriate classes. For each failure class identified, you set out the required reliability using an appropriate metric. Different metrics may be used for different reliability requirements, so that is where your creativity can come in. To measure against the specification, there are two major techniques: Reliability Growth Models and Statistical Testing Procedures.
Statistical Testing Procedures
To do this, you determine the operational profile of the software, then generate a set of test data corresponding to that profile. You then apply the tests, measuring the amount of execution time between each failure. After a statistically valid number of tests have been executed, reliability can be measured. Some potential problems with this:
(1) Uncertainty in the operational profile. This is a particular problem for new systems with no operational history. It is less of a problem for replacement systems.
(2) High cost of generating the operational profile. Costs depend heavily on what usage information is collected by the organisation that requires the profile.
(3) Statistical uncertainty when high reliability is specified. It is sometimes difficult to estimate the level of confidence in the operational profile, and the usage pattern of the software may change with time.
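To make the procedure a bit more concrete, here is a rough Python sketch of the first and last steps: drawing test operations in proportion to an operational profile, and averaging the gaps between observed failures. The profile, the operation names, and the failure timestamps are all invented for illustration, and actually running the tests against a system is left out:

```python
import random

# Hypothetical operational profile: operation name -> share of expected usage.
PROFILE = {"search": 0.70, "checkout": 0.25, "admin_report": 0.05}

def generate_test_operations(profile, n, seed=0):
    """Pick n operations at random, weighted by the operational profile."""
    rng = random.Random(seed)  # fixed seed so the test set is repeatable
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)

def mean_time_between_failures(failure_times):
    """Average gap between consecutive failure timestamps (execution time units)."""
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)

tests = generate_test_operations(PROFILE, 1000)
print(tests[:5])

# Made-up execution times at which failures were observed during testing.
observed_failure_times = [120, 310, 650, 1400]
print(mean_time_between_failures(observed_failure_times))
```

Weighting the test set by the profile is what makes the measured failure gaps resemble what real users would see, which is why an uncertain profile is listed as the first problem.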
Reliability Growth Modeling
A growth model is a mathematical model of the system reliability change as it is tested and faults are removed. It is used as a means of reliability prediction by extrapolating from current data. This technique depends on the use of statistical testing to measure the reliability of a given system or of a given product.
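The extrapolation idea can be illustrated with a deliberately crude stand-in: fit a straight line to the times between successive failures and project the next one. This is not one of the established growth models, just an ordinary least-squares trend used to show the shape of the technique, and the data is made up:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: time units between failure 1 and 2, 2 and 3, and so on.
# The gaps grow as faults are found and removed.
inter_failure_times = [10, 14, 19, 22, 30]
failure_index = list(range(1, len(inter_failure_times) + 1))

slope, intercept = fit_line(failure_index, inter_failure_times)
predicted_next_gap = slope * (len(inter_failure_times) + 1) + intercept
print(round(predicted_next_gap, 1))  # 33.4
```

A positive slope suggests reliability is growing as testing proceeds; real growth models use more carefully justified curves than a straight line.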
Some ways to measure...
You might consider the probability of failure on demand (POFOD). This is a measure of the likelihood that the system will fail when a given request is made of it. (Common example: a POFOD of 0.001 means one out of one thousand requests results in failure.) This is relevant for safety-critical or non-stop systems. You might also consider the rate of occurrence of failures (ROCOF). This refers to how frequently unexpected behavior of some sort occurs. (Common example: a ROCOF of 0.02 means two failures are likely in each one hundred operational time units.) This is relevant for operating systems, transaction processing systems, etc.
You might also consider mean time to failure (MTTF). Here you measure the time between observed failures and take the average. So, as an example, an MTTF of 500 means that the average time between failures is five hundred time units. This is relevant for systems with long transactions.
You can even look into the notion of availability and apply that to reliability measures. This is a measure of how likely it is that the system is available for use, and it should take repair/restart time into account. So, an availability of 0.998 means the software is available for 998 out of 1000 time units. This is relevant for continuously running systems. This measure is often given as: MTBF / (MTBF + MTTR), where MTBF is the Mean Time Between Failures (think of it as uptime) and MTTR is the Mean Time To Repair (think of it as downtime). The result is a fraction that is often expressed as a percentage. As MTTR approaches zero, availability increases toward one hundred percent (all other things being equal). As MTBF gets larger, MTTR has less and less impact on availability. All this states the obvious: the key is to minimize downtime. What all of this now means is that we can then make a statement of reliability as such:
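Both measures are simple ratios, so the two common examples can be reproduced in a few lines of Python. The function names here are my own, not from any library:

```python
def pofod(failed_requests, total_requests):
    """Probability of failure on demand: failures per request made."""
    return failed_requests / total_requests

def rocof(failures, operational_time_units):
    """Rate of failure occurrence: failures per unit of operational time."""
    return failures / operational_time_units

print(pofod(1, 1000))  # 0.001
print(rocof(2, 100))   # 0.02
```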
Reliability = MTBF / (1 + MTBF)
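Here is a short Python sketch of the availability ratio and the reliability statement above. The function names and the sample figures are assumptions for illustration, and both MTBF and MTTR must be in the same time units:

```python
def availability(mtbf, mttr):
    """Fraction of time the system is usable: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def reliability_from_mtbf(mtbf):
    """Reliability = MTBF / (1 + MTBF), as stated in the post."""
    return mtbf / (1 + mtbf)

# Hypothetical system: fails on average every 499 hours, takes 1 hour to restore.
print(availability(499, 1))  # 0.998

# Shrinking the repair time pushes availability toward 1.
for mttr in [10, 1, 0.1, 0]:
    print(mttr, availability(499, mttr))

print(reliability_from_mtbf(500))
```

Note that the reliability figure depends on the time unit chosen for MTBF, which is one more reason to pick that unit carefully.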
So the overall idea is that you measure the number of system failures for a given number of system inputs (which is used to compute the POFOD). Then you measure the time (or number of transactions) between system failures. (This is used to compute the ROCOF and MTTF.) Finally, you measure the time to restart after failure. (This is used to compute the availability.)
= = = = = = =
Hopefully this at the very least gives you some different ways of thinking about how reliability is used.

