Effort estimates - FPA ?
Has anyone done practical effort estimates for the test life cycle and been successful on a J2EE testing project? If so, please enlighten us...
FPA (Function Point Analysis) is a technique used by project managers in the development life cycle to produce effort estimates. Has anyone used this technique in the test life cycle?
Re: Effort estimates - FPA ?
I tried to come up with estimates based on functional specifications and arrived at five man-days for a Java project where the total test cases were 500-600, originating from 40-50 core functional requirements. Most of the time the estimate held, barring setup and operational constraints. For bigger projects you may take FPA into account too, but I would suggest sticking to estimation based on either code size or functional requirements, preferably taking both into account.
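For what it's worth, here is a minimal sketch of that kind of test-case-based estimate in Java. The productivity figure (test cases executed per man-day) is a made-up placeholder, not a benchmark; you would calibrate it from your own past projects.

    // Hedged sketch: estimate test effort from a test case count and an
    // assumed execution productivity. All figures are placeholders.
    public class TestEffortEstimate {
        public static void main(String[] args) {
            int testCases = 550;           // midpoint of the 500-600 range above
            double casesPerManDay = 110.0; // assumed productivity; calibrate locally
            double manDays = testCases / casesPerManDay;
            System.out.printf("Estimated effort: %.1f man-days%n", manDays);
        }
    }

With these assumed numbers the sketch reproduces the five man-day figure above; change the productivity and the answer moves with it, which is the whole game in this style of estimation.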
Re: Effort estimates - FPA ?
I have posted a reply in a different discussion on the 'test estimate' topic. I did that based on the little knowledge I had with FPA. Hope it solves your issue.
Kindly follow this URL: <http://www.qaforums.com/cgi-bin/forums/ultimatebb.cgi?ubb=get_topic;f=40;t=000108>
QAForums - Estimation and Planning
Re: Effort estimates - FPA ?
I do not believe that function point analysis is a very reliable basis for test estimating, and I'd like to give a typical example of the difficulties.
I was involved in a major effort to introduce function point (FP) metrics into a large financial services organization. This effort failed for both business and technical reasons. Since the business reasons were specific to that organization, I will mention only three of them briefly before I address the technical ones.
First, the motive underlying the metrics project was questionable. The company's headquarters in New York funded their Phoenix-based Information Systems (IS) operation to the tune of $1.5 billion annually, and in the eyes of the Phoenix people the headquarters staff was always whining about how little a billion dollars buys. Any metric which showed that the Phoenix crowd was hard-working, productive and delivered high quality was a good metric. By contrast, a bad metric ... you get the picture.
Second, the project was allowed to balloon from two part-time consultants (myself and another) to 20-plus full-timers over two years, as we added a team to develop our own metrics visualization software, a team to publish an FP how-to manual, another team to audit FP counts, and so on. Third, the project team failed to build a wide support base, and when the senior executive sponsoring it left the company, the vultures circled as the project fell apart.
On the technical side, the biggest mistake in my opinion was to ground the metrics program in function points. This was my recommendation (naive, with hindsight), and I felt like Johnny Appleseed as I trotted around various senior management meetings with my PowerPoint slides. I quoted Capers Jones, who called FP one of the most significant breakthroughs in the history of software. I even trained about 200 people in the FP method, each attending two days of classes.
The deeper problem was grounding the metrics program on any single measure. This company pursued one sizing measure as its panacea for measurement success. Any measure, no matter its virtues, will be a failure (and a scapegoat) if used indiscriminately, without having been chosen as part of a carefully planned program. Since FP are the equivalent of a building's square feet for software, this is similar to measuring the size of every house in a city because a real estate guru says that square footage is the key to managing community resources like roads.
Training and Certification
The FP technique requires trained and certified counters. FP counters are professionals who know how to apply the method according to the International Function Point Users Group (IFPUG) standards, and can competently count the FP for an application or a modification. If untrained personnel count FP, there will be inconsistencies from counter to counter. (IFPUG and several consulting organizations provide FP classes.)
This is similar to accounting. If people try to do their own books without knowing the accounting rules, their results will be different from those of someone who knows the rules. But most people do not want to become certified accountants in order to manage their accounts. You could argue that this point is a red herring, because the fact that FP requires training and certification has nothing directly to do with the failure of the financial services company's measurement program. Nevertheless, the need for training and certification was a distraction and a complication which contributed to the failure.
Human Misinterpretation and Vagaries
Even with training and certification, there can be variations in how FP counters interpret and apply the rules. Certified counters can routinely produce FP counts within plus or minus 10% of the counts produced by other certified professionals. Typically, though, the variance is higher, because many people are undertrained or mistrained. Unfortunately, there are few alternatives: if you need a sizing measure (and all software metrics programs need one), then LOC (lines of code) and FP are the only two viable choices. The skew is probably worse for LOC, though, and you still have to gather raw data and follow basic counting rules.
To be fair, other measures involved in the company's metrics program were as haphazard as FP, or more so -- in fact FP lent a degree of credibility to an otherwise under-planned and maladjusted program. We saw attempts to correlate and compare work effort figures that were all recorded in hours but were apples-to-oranges comparisons. No one challenged that, because everyone assumed that if efforts are recorded in "hours" then the figures must be measuring the same thing. This was wrong, yet FP counts took the blame when the calculated software productivity and delivery dates significantly misrepresented the situation.
FP counting is labor-intensive and can be horrendously time-consuming, which encourages people to skimp on, short-circuit or bypass the work. The time drain is not in the counting process itself, but in gathering data on the system (inputs, outputs, databases, etc.) and gaining a sufficiently thorough understanding of its functionality that the count will be valid. Automated tools are available which count FP, but they are not as accurate as human counters, they handle a narrower range of systems, and they do not perform the hard part, which is gathering the data.
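To make the counting mechanics concrete, here is a hedged sketch of an unadjusted FP count in Java, using the standard IFPUG average-complexity weights for the five component types. The component counts are invented for illustration; a real count would also classify each component's complexity and apply a value adjustment factor, which is exactly the labor-intensive part described above.

    // Hedged sketch: unadjusted function point count from the five IFPUG
    // component types, using the standard average-complexity weights.
    // The component counts below are invented for illustration.
    public class UnadjustedFpCount {
        public static void main(String[] args) {
            int externalInputs = 25;        // EI, average weight 4
            int externalOutputs = 20;       // EO, average weight 5
            int externalInquiries = 15;     // EQ, average weight 4
            int internalLogicalFiles = 10;  // ILF, average weight 10
            int externalInterfaceFiles = 4; // EIF, average weight 7

            int ufp = externalInputs * 4
                    + externalOutputs * 5
                    + externalInquiries * 4
                    + internalLogicalFiles * 10
                    + externalInterfaceFiles * 7;
            System.out.println("Unadjusted FP: " + ufp); // 388 with these counts
        }
    }

The arithmetic is trivial; the weeks of work go into producing defensible numbers for those five inputs.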
FP counting is becoming more feasible to automate with the spread of repeatable requirements engineering methods, where entity-relationship (ER) diagrams, the Unified Modeling Language (UML), the Rational Unified Process (RUP), use cases and the like are available. Where there is no documentation, FP counting can be difficult, but whether or not it is worthwhile depends on the reasons you are measuring in the first place. This company set out to count their entire application portfolio in FP (similar to sizing a city) with only a weak rationale for doing so. They spent a pile of $$$ on measuring things that they thought would make a difference, but they neglected to follow sound measurement principles and did not adequately apply the goal-question-metric (GQM) approach.
The data gathering difficulty contributed to the FP failure. However, the same accusation can be made for most measures. For example, defect data requires an investigation of the defect numbers and categorization before it can be relied on. Does defect data also fall into the failure-as-a-measurement category because of its data gathering difficulties?
Results Provided in FP
The FP technique in itself delivers only an overall project size in FP, not the project cost or time estimates or early warnings of problems. Many people do not find an estimate of let’s say 337 FP to be much help: they do not work or think in FP. To derive something usable from the FP count, such as the estimated project resource-hours, an estimate of the productivity factor is needed (the number of FP that can be built -- or tested -- per resource-hour). This productivity factor acts as the divisor in converting from the project size in FP to the estimate of resource-hours. The productivity factor may be based on an organization's own prior experience or on industry averages (available from IFPUG and several consulting organizations). However, if the productivity factor is too optimistic or too pessimistic, the FP-derived estimate will correspondingly be incorrect.
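As a concrete illustration of that conversion, here is a hedged sketch in Java using the 337 FP figure from above. The productivity factor is an invented placeholder; in practice it would come from your own project history or from published industry data.

    // Hedged sketch: convert a project size in FP to resource-hours by
    // dividing by a productivity factor (FP per resource-hour). The
    // factor used here is a placeholder, not an industry benchmark.
    public class FpToHours {
        public static void main(String[] args) {
            double projectSizeFp = 337.0; // FP count from the example above
            double fpPerHour = 0.25;      // assumed productivity factor
            double resourceHours = projectSizeFp / fpPerHour;
            System.out.printf("Estimated effort: %.0f resource-hours%n", resourceHours);
        }
    }

With these numbers the estimate is 1,348 resource-hours; an optimistic factor of 0.5 would halve that, and a pessimistic 0.1 would push it past 3,300, which shows how completely the productivity factor dominates the result.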
Many experienced software professionals have been persuaded that FP could be the holy grail of estimation. But square feet cannot predict the cost to build a building; there are many more factors, and size is only one of them (albeit an important one). A software productivity rule of thumb is similar to a cost per square foot to build (I know it would be better stated as hours per square foot). The issue of "build what? where? and how?" comes into play, and many organizations calculate a productivity rate based on a sample of fewer than five projects, some on the ludicrous basis of a single data point. In construction it is easy to see that this is a recipe for failure; in software measurement, some people simply miss the point and are convinced that a small sample set is adequate. This is not a failing of FP per se, but of the misdirection of a measurement team that force-fits a measure and misapplies its results accordingly.
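To see why a tiny sample is a recipe for failure, here is a hedged sketch with invented data: the productivity rates of a handful of projects can vary so widely that the mean of any five of them is close to meaningless.

    // Hedged sketch: a productivity rate averaged over a small, invented
    // sample. The spread dwarfs the mean, which is the point: five data
    // points do not characterize an organization's productivity.
    public class ProductivitySample {
        public static void main(String[] args) {
            double[] hoursPerFp = {4.0, 9.5, 2.5, 14.0, 6.0}; // invented projects
            double sum = 0, min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double h : hoursPerFp) {
                sum += h;
                min = Math.min(min, h);
                max = Math.max(max, h);
            }
            double mean = sum / hoursPerFp.length;
            System.out.printf("Mean: %.1f hours/FP, range: %.1f-%.1f%n", mean, min, max);
        }
    }

An estimate computed with 2.5 hours/FP versus 14.0 hours/FP differs by a factor of more than five, yet both rates came from "real projects."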
Breadth of Applicability
FP works very well with classic IS applications, i.e., systems which are oriented towards transaction processing and are data-based, regardless of whether the platform is a mainframe or a client/server network. (The FP technique was originally developed for business information systems such as payroll.)
FP does not work as well with types of systems other than classic IS, such as orbital calculation software for the space shuttle, real-time embedded systems or a network operating system (NOS). For these types of software there are variations on FP, with names like Feature Points and Function Mass, but these are difficult to apply and have doubtful accuracy. There is also a methodology called Full Function Points (FFP), designed specifically for sizing embedded and real-time applications. Examples of inputs and outputs where the FP method does not work well include: a screen saver that generates graphics from a mathematical model, an encryption program that reads a file and encrypts it into an output file, and a file compression program.
Real-time applications are generally more difficult, requiring stricter quality considerations, algorithmically intensive processing and response time guarantees. (Models such as COCOMO II have changed their estimating equations to take these differences into consideration.) The labor increases due to the non-functional characteristics of these types of systems, and the measure of the functionality changes. To quote Capers Jones, FFP attempts to solve the same problems Feature Points was intended to solve -- that is, to placate the egos of engineers whose work does not appear as productive as other types of work when measured in FP.
We need to use the appropriate estimating equation -- and it needs to account for differences in a lot of things besides the non-functional requirements: the technical approach (RAD, spiral, XP, etc.) as well as the WBS, languages, skills, geography, and so on. It doesn't make a lot of sense to mix the sizing measure with a selected group of non-functional and technical requirements and then throw away the square-foot measure because it doesn't suit your specific purpose. I've seen object points compared to FP as similar and interchangeable measures, when in fact they measure different things. Both have a time and place where they work, but neither is (or was intended to be) a Swiss army knife of metrics. It is like having a toolkit with a hammer and a screwdriver and telling people to toss the hammer away because you've been working mostly with screws lately. What happens when you encounter a nail? A screwdriver doesn't do nailing very well, even if it is a new screwdriver and the hammer is old.
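For reference, the basic form of the COCOMO II effort equation mentioned above looks like the sketch below. The constants A and B are the published COCOMO II.2000 nominal values; the scale factor and effort multiplier values are placeholders, and in a real estimate they come from rating the project on exactly the kinds of non-functional and technical dimensions listed above.

    // Hedged sketch of the COCOMO II post-architecture effort equation:
    //   PM = A * Size^E * product(EM_i), where E = B + 0.01 * sum(SF_j).
    // A and B are the published COCOMO II.2000 nominal constants; the
    // scale factors and effort multipliers below are placeholder ratings
    // (the full model defines 5 scale factors and 17 effort multipliers).
    public class Cocomo2Sketch {
        public static void main(String[] args) {
            final double A = 2.94, B = 0.91;
            double sizeKsloc = 50.0; // assumed size in thousands of lines of code

            double[] scaleFactors = {3.72, 3.04, 4.24, 3.29, 4.68}; // placeholders
            double[] effortMultipliers = {1.00, 1.10, 0.88};        // placeholder subset

            double sumSf = 0;
            for (double sf : scaleFactors) sumSf += sf;
            double exponent = B + 0.01 * sumSf;

            double em = 1.0;
            for (double m : effortMultipliers) em *= m;

            double personMonths = A * Math.pow(sizeKsloc, exponent) * em;
            System.out.printf("Estimated effort: %.0f person-months%n", personMonths);
        }
    }

The point of showing it is the shape of the equation: size is just one input, and the exponent and multipliers -- the "what? where? and how?" -- move the answer at least as much as the size does.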
A great deal of effort has been spent to make FP as objective as possible, but there is still a considerable amount of subjectivity in the method. For example, how the boundary is drawn around an application affects the application's FP count. The way in which a group of interrelated applications is partitioned -- in other words, where the application boundaries are drawn -- can significantly change the total FP count for that group of applications. While guidelines direct how to draw the boundaries, ultimately the choices are subjective.
Using alternatives to FP such as the FFP method can create even more problems with boundaries, by introducing the notion of layers that vary with your physical architecture. In FP the boundary definition is done once, and the consistency of measurement proceeds from there. I've seen cases where there is more variability in the boundaries of defect categories -- the same defect could be classified differently across projects -- based on definitions (even good ones) of categories of defects.
Lack of One Standard
While IFPUG maintains the "official" standard for counting FP, there are several dialects and variations, such as the British Mk II version. The FP count can vary depending on which set of counting rules is used. To be fair, this may be another red herring. Anyone who has an inkling about FP knows that crossing rule sets gives inconsistent results. That's like saying that distance measurement is wrong because both feet and meters can be used.
Combination of Inconsistent Elements
I'm concerned that function points violate measurement theory by combining into one dimension (the point count) data from several other dimensions that are incompatible. Norman Fenton has expounded on the misassumptions people have about implementing measurement programs, and why the programs ultimately fail.
His points are absolutely true, but the notion of measurement theory that you mention was not part of his argument. I'm not going to go into a discussion of measurement theory, because I've been privy to, and a part of, many such discussions, having served on the ISO working group and as a project editor for one of the functional size measurement ISO standards since 1994. I'd simply like to know your opinion about the mixing of dimensions specifically.
Function points cannot readily be visualized by many people, or what people do visualize is wrong. I understand lines of code and what they can and might do. What does a function point do? Because I can't visualize them, I also can't relate them to testing. I can look at 500 (or 10,000) lines of code and think of ideas for testing it, but five (or 100) function points? Most testers would recognize the absurdity of this statement: "This system has 675 FP, give or take a few. How would you test it?" We end up looking merely at the functionality itself. Some functions take more testing and some less, but in a way that is only weakly related to function points.
Some have reported success with people being able to visualize function points. But that won't help testers who are not used to dealing with FP, just as they (black-box testers) are not used to working with lines of code or physical modules.
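For completeness: there is a published rule of thumb, attributed to Capers Jones, that approximates the number of test cases as the FP count raised to the 1.2 power. A hedged sketch follows, and it illustrates the weakness as much as the utility -- it says how many tests, never which ones.

    // Hedged sketch: Capers Jones' rule of thumb, test cases ~= FP^1.2.
    // A gross sizing approximation that says nothing about test content.
    public class FpTestRuleOfThumb {
        public static void main(String[] args) {
            double fp = 675.0; // the FP figure used in the post above
            double approxTestCases = Math.pow(fp, 1.2);
            System.out.printf("~%.0f test cases for %.0f FP%n", approxTestCases, fp);
        }
    }

For 675 FP that gives roughly 2,500 test cases, a number that tells a tester nothing about what any of those tests should actually exercise.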