In the fast-paced world of software development, rigorous and well-organized testing is often an ideal that quickly falls by the wayside. "Just get it done!" is frequently the managerial watchword. Meanwhile, testing is all too often a black-box undertaking performed "over there" by the Software Quality Assurance (SQA) group. In many shops, one group, engineering, writes the code, while another group, SQA, tests it. In essence, the code gets thrown over the wall, and the bugs get thrown back. Because of this "over the fence" mentality, software development and software testing groups are often functionally isolated from one another. And that assumes a dedicated SQA group exists at all. In some development environments, the only rigorous testing ever performed on a block of code is what's run during the actual coding phase, and SQA may be little more than a rubber stamp.
But increasingly, software testing is becoming a formal and well-defined science, with its own metrics, procedures, and tools. When properly applied, such testing offers verifiably improved development cycles and higher product quality. So it shouldn't come as any surprise to find Sun Microsystems, Inc. at the forefront of this sometimes neglected, but all-important, aspect of software development.
This article, with the help of Frank Dibbell, Manager of Software Quality Assurance in Sun's Consumer and Embedded Division, explores the basic theories, principles, phases, categories, and tool types currently used in software testing at Sun, as well as testing issues specific to the world of Java technology. A future article will profile selected testware, along with sample inputs and outputs from these tools.
SQA Early Involvement
There are countless payoffs to involving SQA in a software development project from its earliest stages, and to breaking down the oft-found artificial boundaries between development and SQA. "I tell my managers to get involved with their engineering counterparts, engineer to engineer," says Frank Dibbell, "to learn each other's technical skill sets, and to recognize that they're all working on a given project together."
Dibbell produces a chart showing the various group inputs during the software development cycle (see Figure 1). In it, engineering and SQA are equal partners even at the earliest design stages. "Typically, marketing requirements come to engineering," he says, "and engineering assesses the feasibility. From there, it goes into a design phase. Test QA should ideally get involved at this design phase. And out of that comes the Product Specification."
Having engineering and SQA work in parallel at such early stages has multiple payoffs. "Because you have both teams working together early on," continues Dibbell, "you have them each asking questions of one another. And as a result, your design tends to be more solid. That's yet another way of shortening the development cycle. By thoroughly exploring the testing issues, engineering comes to a better understanding of the requirements."
After the design phase, the two teams split off for a time to perform their separate duties. "During the implementation phase," Dibbell explains, "the engineers go off to code, while SQA is developing the test plan, the testware, and the test cases. But while we're working on our separate duties, we still maintain contact via product team meetings. The beauty of this model is that with SQA involvement from day one, when engineering is done with the code, guess what? We're ready to test!"
"What happens traditionally," adds Debra Hay Creger, Manager of Software Quality Assurance in Dibbell's Java TV group, "is that QA doesn't get involved in a project until the end of the development cycle. But using this model, we manage to compact the cycle significantly, and in so doing, get over the reputation of QA being a development bottleneck. In one of my groups, at the point that development was freezing code and releasing it, we'd already been testing for weeks."
"As a matter of fact," adds Dibbell, "using this model, we're often so far ahead, we have to wait for engineering to fix bugs before we can resume testing."
Testing Metrics
Even with the best-designed system, there will always be bugs. A handy rule of thumb is that about 20 bugs will be generated for every 1,000 lines of code. "I've been using this model for 20 years," explains Dibbell. "It doesn't seem to matter whether it's COBOL, C, C++, or the Java programming language. Regardless of the language or the skill of the engineer, there's one element that's common, and that's design. Nobody does a perfect design."
But once a predicted number of bugs can be determined, it's then a relatively straightforward task to test for, find, and fix those bugs, and in so doing, gain a quantifiable sense of the state of the software. "The first thing I do is model the software and determine how many bugs we should expect to find," says Dibbell. "The testing then becomes a relatively methodical process to detect and remove them. And in doing so, we get objective numbers to measure against our statistical predictions."
Since the easiest and most obvious bugs tend to manifest early, while the more pernicious ones are harder to find and fix, the cumulative number of bugs found rises quickly at first, then levels off, asymptotically approaching the predicted value (see Diagram 2). "I accumulate the number of bugs over time," explains Dibbell. "If the line doesn't curve upward and then level off, then the engineers either have a poor design, or they're introducing new bugs as they fix the old ones. Eventually, as we narrow down the number of bugs and the software becomes more stable, the curve levels out."
More recently, Dibbell has begun tracking what he calls the "bug discovery trend." "It's a ratio of the number of hours it takes to find the next incremental bug," he explains. "Given a fixed (but still unknown) number of bugs, it's relatively easy to find them at first. But it gets progressively harder, so the amount of time and effort to locate them increases." The bug discovery trend is calculated by dividing the person-hours spent during a given period by the number of bugs found in that period. "Intuitively, this number should be going up all the time," says Dibbell.
Another objective gauge of overall software quality and completeness of testing is a code-coverage analyzer. "You can compile your software under one of these environments," explains Dibbell, "and then run all of your test cases. It captures which methods, blocks, and statements are being exercised, and then issues a coverage map. So if our bug counts on a given project are at about 60% of target, and our code coverage is also at about 60%, then we're pretty much right where we want to be."
Code-coverage analysis can also detect "dead code": blocks of code that can never actually be executed. While such detection is obviously a good thing, dead code can artificially lower coverage numbers. "For example," explains Dibbell, "a program with 10,000 lines of code might produce a map of 80% coverage, but with 1,000 lines of it being dead code. In reality, the coverage is closer to 88% of the 9,000 lines of executable code."
Another factor to keep in mind when testing software is the acceptable level of robustness for a particular audience.
What might be fine for a beta release to a developer community would never fly as an off-the-shelf consumer product. Dibbell's internal benchmarks vary accordingly. "If the code is going to a consumer market," he explains, "I want to have detected and fixed 95% of the estimated bugs. If it's code for a commercial application, then 85% is probably OK. And if it's for a developer audience, then even 65% is sometimes acceptable, because we often use them in our beta testing."
Only with such a rigorously applied testing regimen can software quality be truly gauged. "The whole point of testing is to get to a point where we know what the quality level is," says Dibbell, "and then management can make a decision as to whether it's viable to go or not. I have one project that I inherited where we had no QA group. So the engineers were finding and fixing bugs and not bothering to report them. Now I'm trying to estimate the quality level of the code, and I really can't."
Development Phases
Every phase of software development has distinct needs and requirements, not only from engineering but from SQA.
Alpha Phase
The alpha phase is defined as being "feature complete." "We don't enter alpha until all of the features have been coded," says Dibbell. "The purpose of SQA in the alpha phase is to find as many bugs as possible; it's pure nonregression testing. If we predict 800 bugs, and our target for purity is 95%, then we'd better find 760 bugs before we move on."
Beta Phase
The beta phase is defined as a time of product hardening, in preparation for First Customer Ship (FCS). "Beta is full regression testing," explains Dibbell. "Here, we're regressing implemented bug fixes and firming up the product, getting it ready for FCS."
FCS
The last beta build ultimately becomes the FCS build, in preparation for final shipment.
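The arithmetic behind these targets is simple enough to automate. The Java sketch below assumes the 20-bugs-per-1,000-lines rule of thumb cited earlier and the audience percentages Dibbell quotes; the class `QualityTargets`, its methods, and the weekly sample figures are hypothetical, not part of any Sun tool. It also computes the bug discovery trend as person-hours spent per bug found, the quantity Dibbell expects to rise over time.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helpers for the bug-prediction arithmetic described above.
 * Names and sample data are illustrative, not a Sun tool.
 */
public class QualityTargets {
    // Rule of thumb from the article: roughly 20 bugs per 1,000 lines of code.
    static final int BUGS_PER_KLOC = 20;

    // Percentage of predicted bugs to detect and fix before release, by audience.
    enum Audience {
        CONSUMER(95), COMMERCIAL(85), DEVELOPER(65);

        final int targetPercent;

        Audience(int targetPercent) {
            this.targetPercent = targetPercent;
        }
    }

    static int predictedBugs(int linesOfCode) {
        // long arithmetic avoids overflow for very large code bases
        return (int) ((long) linesOfCode * BUGS_PER_KLOC / 1000);
    }

    // Smallest whole number of bugs that meets the target (integer ceiling, no rounding error).
    static int requiredFinds(int predictedBugs, int targetPercent) {
        return (predictedBugs * targetPercent + 99) / 100;
    }

    // Bug discovery trend: person-hours spent per bug found, one value per period.
    static List<Double> hoursPerBug(int[] bugsFound, double[] personHours) {
        if (bugsFound.length != personHours.length) {
            throw new IllegalArgumentException("one hours value is needed per period");
        }
        List<Double> trend = new ArrayList<>();
        for (int i = 0; i < bugsFound.length; i++) {
            // A period with no finds is reported as unbounded effort per bug.
            trend.add(bugsFound[i] == 0
                    ? Double.POSITIVE_INFINITY
                    : personHours[i] / bugsFound[i]);
        }
        return trend;
    }

    public static void main(String[] args) {
        int predicted = predictedBugs(40_000); // 40,000 lines -> 800 predicted bugs
        for (Audience audience : Audience.values()) {
            System.out.printf("%s: find at least %d of %d%n",
                    audience, requiredFinds(predicted, audience.targetPercent), predicted);
        }
        // Made-up weekly figures: the same effort yields fewer bugs each week.
        System.out.println(hoursPerBug(new int[] {120, 80, 40}, new double[] {200, 200, 200}));
    }
}
```

For a 40,000-line program, the consumer target works out to 760 of 800 predicted bugs, matching Dibbell's alpha example; the commercial and developer targets come to 680 and 520. Integer percentages are used so that a target like 95% of 800 cannot be thrown off by floating-point rounding.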
Entry/Exit Criteria
In a properly administered software testing regimen, each phase should also have well-defined entry and exit criteria (see Figure 2). "Here, you start at the code freeze from engineering," explains Dibbell. "Release engineering first promotes the build, then we run an acceptance test based upon the entry criteria for that phase. If it passes, then you go into the phase test cycle. If it doesn't pass, then it goes back to engineering."
During the phase test cycle, new bugs are logged and existing bugs are fixed. Engineering sets the priorities of the bugs, and then either fixes them or schedules them to be fixed in future releases. Finally, the exit criteria for that particular phase are met.
Alpha Criteria
In the alpha phase, the most obvious entry criterion is that the code is complete. "We don't know if it works yet," says Dibbell, "but at least the code is there." Other optional entry criteria might be that the test plan is complete, that the test cases have all been specified, and that the design document and functional specs have been written.
From there, the alpha phase enters a weekly build/test cycle. "We test on build A all week long," says Dibbell, "finding as many bugs as we can. Meanwhile, engineering is fixing bugs and integrating those into build B, which is promoted on the following Monday. It's total nonregression testing. We don't go back and re-run our tests, we just keep moving forward, to get maximum coverage."
At the end of each week's testing, the results are compared with the formal alpha exit criteria. "If we predict 100 bugs," says Dibbell, "and we want to hit 90%, then we'd better have found at least 90 bugs. And we'd also better have a code-coverage analysis showing that we've covered at least 90% of the code." Such well-quantified testing scenarios stand in stark contrast to the hit-and-miss methods of testing days gone by.
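The weekly alpha exit check Dibbell describes can be expressed as a single predicate. This Java sketch is hypothetical (the class and method names are invented for illustration), and it assumes the same percentage target applies to both bug counts and code coverage, as in his 100-bug, 90% example. It also folds in the dead-code adjustment he describes, measuring coverage against executable lines only: 8,000 covered lines out of 9,000 executable ones, rather than 80% of 10,000.

```java
import java.util.Locale;

/**
 * Hypothetical weekly check against alpha exit criteria.
 * Class and method names are illustrative, not part of any Sun tool.
 */
public class AlphaExitCheck {

    // Coverage measured over executable lines only, excluding known dead code.
    static double adjustedCoverage(int totalLines, int coveredLines, int deadLines) {
        int executable = totalLines - deadLines;
        if (executable <= 0 || coveredLines > executable) {
            throw new IllegalArgumentException("inconsistent line counts");
        }
        return 100.0 * coveredLines / executable;
    }

    // Alpha exit: enough predicted bugs found AND coverage at the same percentage target.
    static boolean alphaExitMet(int predictedBugs, int bugsFound,
                                double coveragePercent, int targetPercent) {
        int requiredBugs = (predictedBugs * targetPercent + 99) / 100; // integer ceiling
        return bugsFound >= requiredBugs && coveragePercent >= targetPercent;
    }

    public static void main(String[] args) {
        // Dibbell's example: 80% raw coverage of 10,000 lines, 1,000 of which are dead code.
        double raw = 100.0 * 8_000 / 10_000;
        double adjusted = adjustedCoverage(10_000, 8_000, 1_000);
        // Locale.ROOT keeps the decimal point stable regardless of the default locale.
        System.out.printf(Locale.ROOT, "raw %.1f%%, adjusted %.1f%%%n", raw, adjusted); // raw 80.0%, adjusted 88.9%

        // A 90% target against 100 predicted bugs, with 92 bugs found so far.
        System.out.println(alphaExitMet(100, 92, adjusted, 90)); // false: coverage below 90%
        System.out.println(alphaExitMet(100, 92, 91.0, 90));     // true
    }
}
```

Requiring both conditions reflects Dibbell's point that bug counts and coverage should move together; a build that hits its bug target with low coverage suggests whole areas of the code have not yet been exercised.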
"In the past, we often had no objective measurements, and we had no objective goals," says Dibbell. "We just tested until the scheduled date came along, and then we shipped the product. But as a result, we never really knew what the quality was."
Beta Criteria
Once the alpha exit criteria are met, development then takes several weeks to prioritize bugs and to decide which will be fixed and tested in beta, and which will be deferred to a future release. "Realistically," says Dibbell, "you can't fix every bug."
The software then enters the beta phase, which is one of pure regression testing. The goal of beta is to stabilize and harden the product in preparation for final shipping. A typical entry criterion for beta might be that all documentation is in draft form. From there, the code enters another weekly build/test schedule, similar to that of alpha. "We promote a new build each Monday," says Dibbell. "We verify that bugs have been fixed, and if any regression bugs pop up, we fix those too. It becomes an orderly sifting and refining process."
A typical exit criterion for beta might be that no priority one or two bugs are left unresolved. Depending upon the actual product, the beta exit might also include certain stability measures. "If this is a server product," says Dibbell, "then we might want to state that it has to run for 72 hours non-stop."
With well-defined testing cycles and explicit entry and exit criteria, the entire testing schedule tends to compress significantly. "This way, you know what you're measuring against," says Dibbell. "You know from the completion of one cycle exactly where you stand going into the next one. You have real milestones that you can measure against."
Testing Types
There are three main types of testing:
Test Tools
There are several different categories of testing tools:
Code Coverage Analyzers
As previously discussed, when software is compiled under a code-coverage analyzer, the analyzer captures which methods, blocks, and statements are exercised by a given set of test cases. Dibbell is quick to point out that a code-coverage analyzer is generally a tool used to guide other types of testing efforts. "Just because we've gotten 100% coverage doesn't mean that the code is 100% tested," he says. "There are still various combinational issues to keep in mind. We've executed code in a certain order, but that may or may not manifest a given bug. A code-coverage analyzer is just a tool that helps you to better focus your efforts."
API Testing
An API-level test tool runs through the various methods of a program randomly, exercising as much of the functionality as possible. "If it has a screen," says Creger, "then you're going to push buttons, or scroll down scroll bars, or expand windows, or minimize them. It will just keep running through these things for as long as you want it to."
Stress Testing
A stress-test tool generates large volumes of data, often randomly. "Literally, you can test to destruction," says Dibbell. "With our stress test tool, we spawn threads, we create objects, and we force garbage collection." "And you can stress systems in other ways, as well," adds Creger. "You can say, 'Let's do this with 20% of the normal memory, taking up 80% of the usual memory with something else.'"
"Developers hate stress testing," says Dibbell, "because it manifests memory leaks, which are often difficult to debug. You may not always be able to recreate the exact same thing each time, but using random events over a period of time, you can usually reproduce something similar."
Test Harnesses
A test harness is a software framework used to drive a given test. Sun's JavaTest product is an example of such a test harness, and falls into the class of assertion-level testing tools. "With JavaTest," says Dibbell, "there are no test cases.
You have to code your own test suites and include them in the harness. You specify which test cases you want, and how you want your results to look."
GUI/User-Level Tools
Sun's JavaStar product is used for testing software GUIs. It is a screen capture and playback tool, usable with both Java applets and applications. "With JavaStar," says Ying Zhang, Software Test Engineer in Dibbell's group, "you can either write a script from scratch, or you can record and play one back. You can also start and stop along the way, stepping through execution of a given method."
JavaStar also allows separate testing sessions to be joined together as one. "You can record the login portion of an application, for example," says Zhang, "and then you can stop and record a particular function just after login. Afterward, you can connect the two together and run them as a single test."
But the tool also has its limitations. In order to get reflection back for its screen capture and playback, it has to replace some of the AWT classes with its own. Therefore, if the goal is to test the AWT itself, JavaStar cannot be used.
Testware
Dibbell's group is careful to emphasize its own distinction between test tools and testware. "For us," says Creger, "testware encompasses specific test tools we may write, as well as test suites that are based upon assertions."
With the ever-increasing complexity of today's software, test tools that generate random inputs are proving to be of greater and greater importance to Dibbell's group. "It typically gives us more bang for our buck," says Creger. "It's more like a test case generator, and has come up with things that we wouldn't have ever thought about. With these tools, we're not as dependent on a human being having to sit down and think up all possible test cases."
What to Use: Where, and When
Each type of test tool has its own specific strengths and is best suited to a particular phase of software development.
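The random-input approach Creger describes for API-level testing can be sketched in a few lines of Java. This is a minimal, hypothetical example rather than any Sun tool: it drives the standard `java.util.Deque` interface with a seeded random sequence of calls, using `ArrayDeque` as the class under test and `LinkedList` as a reference model whose answers serve as the oracle. Comparing against a reference implementation is just one common choice of oracle; an assertion-based harness would check results against the specification instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.Objects;
import java.util.Random;

/**
 * Minimal random API-level test sketch (hypothetical, not a Sun tool).
 * Deque methods are invoked in random order on the class under test and on
 * a reference implementation; any difference in results is reported as a failure.
 */
public class RandomApiTest {

    /** Runs the given number of random steps and returns how many completed. */
    static int run(long seed, int steps) {
        Random random = new Random(seed); // a fixed seed makes any failure reproducible
        Deque<Integer> underTest = new ArrayDeque<>();
        Deque<Integer> reference = new LinkedList<>();

        for (int step = 0; step < steps; step++) {
            int value = random.nextInt(100);
            String call;
            Object actual;
            Object expected;
            switch (random.nextInt(5)) {
                case 0:
                    call = "offerFirst(" + value + ")";
                    actual = underTest.offerFirst(value);
                    expected = reference.offerFirst(value);
                    break;
                case 1:
                    call = "offerLast(" + value + ")";
                    actual = underTest.offerLast(value);
                    expected = reference.offerLast(value);
                    break;
                case 2:
                    call = "pollFirst()";
                    actual = underTest.pollFirst();
                    expected = reference.pollFirst();
                    break;
                case 3:
                    call = "pollLast()";
                    actual = underTest.pollLast();
                    expected = reference.pollLast();
                    break;
                default:
                    call = "size()";
                    actual = underTest.size();
                    expected = reference.size();
                    break;
            }
            if (!Objects.equals(actual, expected)) {
                throw new AssertionError("seed " + seed + ", step " + step + ": " + call
                        + " returned " + actual + ", expected " + expected);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : 42L;
        int completed = run(seed, 100_000);
        System.out.println("seed " + seed + ": " + completed + " random calls, no mismatches");
    }
}
```

Including the seed in every failure message addresses the reproducibility problem Dibbell mentions: rerunning with the same seed replays exactly the same sequence of calls. That guarantee holds for a single-threaded sequence like this one; once thread scheduling is involved, a seed alone no longer ensures an exact replay.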
"You generally start off doing your assertion-level testing early, as you write your code," says Dibbell, "to make sure that it's logically consistent with the specification." PersonalJava, for example, has its own conformance compatibility kit (PJCK), a JavaTest framework used to ensure that a given PersonalJava implementation matches the official specification. But Creger and Dibbell point out that such conformance testing is far from complete. "The PJCK is necessary but not sufficient," says Dibbell. "That's simple assertion testing," adds Creger. "It makes sure that all of the input and output match the specification. While that's useful as a tool for conformance testing, it doesn't test how the product behaves under stress, or whether a feature actually functions properly."
Once individual software components have been written, they can then be compiled together for random, API-level testing. "You specify which classes you want to test," Dibbell says, "and start generating random events. And when your code is stable in that mode, you can then take it to the next level and plug and play with different features."
Once the general functionality is firmly in place, stress-level testing can begin. "You start pushing on it at that point," says Dibbell, "dealing with stress and scalability issues." Meanwhile, at each stage along the way, code-coverage analysis ensures that the testing is properly directed and sufficiently comprehensive. "It's something we use at every stage in order to properly focus our efforts," Dibbell says.
Testing Issues in the World of Java Technology
Testing in the world of Java technology poses many unique and diverse challenges. Today's Java SQA engineers face not only multiple platforms, but also the scalability issues (limited memory, nonstandard input, network traffic considerations) encountered with an ever-growing array of Java technology-based computational devices, from Java Rings to telephones to set-top boxes.
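Stress testing of the kind Dibbell describes (spawning threads, creating objects, and forcing garbage collection) matters all the more on memory-constrained devices. The following Java sketch is hypothetical: the `StressSketch` name, thread counts, and allocation sizes are arbitrary choices for illustration, and `System.gc()` is only a request that the JVM is free to ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Minimal stress-test sketch (hypothetical): worker threads churn through
 * short-lived allocations while garbage collection is requested, and heap
 * use is sampled before and after the run.
 */
public class StressSketch {

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocates and discards many 1 KB arrays; the checksum keeps the work observable.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] block = new byte[1024];
            block[i % block.length] = (byte) i;
            checksum += block[i % block.length];
        }
        return checksum;
    }

    // Runs churn() on several threads at once and returns the combined checksum.
    static long run(int threads, int iterationsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> churn(iterationsPerThread);
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(task));
            }
            System.gc(); // a request only; the JVM may ignore it
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for the worker; failures arrive as ExecutionException
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            // Includes errors such as OutOfMemoryError thrown inside a worker.
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        long checksum = run(8, 200_000);
        System.gc();
        long after = usedHeapBytes();
        System.out.printf("checksum=%d, used heap before=%d KB, after=%d KB%n",
                checksum, before / 1024, after / 1024);
    }
}
```

Running it under a deliberately small heap, for example with the JVM's `-Xmx` option, approximates Creger's scenario of testing with only a fraction of the normal memory. A single before/after heap sample proves little on its own; used heap that keeps climbing across repeated runs is the kind of signal that points to a leak.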
"Because we're dealing with many different platforms, and many different types of devices," says Creger, "all our tests are written in the Java programming language." But sometimes the old ways of testing simply no longer apply. "In some of the projects I work on," says Creger, "there's no UI to speak of, so we're definitely not talking about people banging on keyboards."
Memory issues also come into play in many non-PC devices. "With Java TV," she says, "we run on set-top boxes, and there, you have to be very careful about the memory footprint of your testing tools, because you don't want to be working against Java and the OS."
Dibbell's SQA engineers literally never know what will be thrown at them next. "When we were first working on the Java TV project," says Creger, "we were running on PCs, but we were using Windows CE, and it was a special flavor used for TV, which had some rather unique characteristics. For example, it didn't have the concept of a file system. That's a situation where we had to find some very inventive workarounds in terms of porting our tools to a new platform!"
Effective cross-platform testing also entails varying the hardware testbed. "When we moved to the set-top boxes, there were several different hardware configurations," says Creger. "So every two to three months, we changed boxes and upgraded."
Finally, when dealing with consumer-level products and devices, there's simply a much higher standard of reliability to adhere to. "Most consumers simply aren't going to stand for the idea of rebooting their televisions," laughs Dibbell.
Conclusion
As the Java programming language has become more complex, sophisticated, and full-featured, spanning diverse platforms and devices, so too have the tools and facilities necessary to effectively test these systems. Gone are the days of testing labs filled with test-script-laden keyboard bangers.
Today's Java SQA engineers are employing the same levels of automation and sophistication found in the very code they're testing. "At each feature level, we look at what kind of tool is required," says Creger. "If it doesn't exist, we build it. 95% of our tests are now automated."
About the Author
Steve Meloan, a frequent contributor to the JDC, is a writer, journalist, and former software developer. His work has appeared in Wired, Rolling Stone, BUZZ, the San Francisco Examiner, San Francisco Weekly, ZDTV's "The Site," and American Cybercast's "The Pyramid."