Sunday, September 11, 2011

Synergic Exploratory Testing … Part 1

I don't know about you but I hate "Bug Bashes". You know, those large gatherings of Developers, Testers and Program Managers in a Conference Room smelling of cheese and beer where everyone's banging on a software product trying to find bugs for 3 hours. I can safely say that I've never seen a successful Bug Bash (but if you have I'd love to hear your story). I've been pulled into several of them, some where I knew the product very well and some where I had no clue about what I was testing. But without fail, I've come out frustrated and asking myself why I ever went in there and wasted all that time. And if for some untoward reason I was appointed as the organizer, I'd be sleep deprived for days trying to consume the mostly useless data that resulted from it and make sense of it. The people in those bug bashes were almost always very smart engineers, but the process was so wrecked that it didn't matter how many smart people we had.

My gripes with Bug Bashes:

  • They are too long. Testing activities going beyond 60 or 90 minutes are threatened by the possibility of attendees losing focus (given they had decided on one before they started).
  • They are a spaghetti bowl of redundant activities. It is very hard to keep several people from doing the same thing. It is not impossible, but definitely very hard and in my opinion a problem not worth solving. There's a bigger issue here.
  • Their results are hard to measure and track. At the end of a Bug Bash, there are hordes of repeated reports and no way of measuring test coverage.
  • They have no conclusion. There is no way to tell what the next steps should be.
  • They have too many people in them, and even if those people are smart, they are "de-skilled" by the inadequacy of the process.
  • They are ad-hoc. Careless and sloppy.

There is one thing I like about them though. Pizza. I'd like to mention here that Bug Bashes are not Joint Testing Sessions, which consist of a smaller set of testers working on very specific goals and not stepping on each other's toes. Nor are they survey sessions where a diverse group of users is asked to play with a mostly complete and sufficiently tested product to get feedback.

Bashing Bugs with Scripted Testing: one way to keep me from whining about Bug Bashes ... maybe

I will admit, there is at least one way of correcting the ad hoc nature of a Bug Bash and that is having Scripted Tests distributed to attendees. That way we can make sure that there is minimal redundancy, results can be tracked and future steps are clear. There is a tiny assumption here. Okay it's not that tiny. The assumption is that the scripts provided are accurate and provide ample test coverage. That's VERY hard to achieve in practice. Dependence on weak scripts isn't useless, it's dangerous. But even if we had perfect scripts, there are issues. First and foremost, the whole thing becomes a fundamentally mindless activity and smart engineers do not like that. They want to be challenged and they want to feel important. But more than that, we are completely ignoring the creative capacity of folks present in the room. Secondly the effort spent in writing scripts and organizing them in a way that attendees can individually execute them on their own in practice is HUGE.

All problems and no solutions?

It's easy to state problems without solutions. Agreed. But I'm not done yet. I have a proposal that I'd like to share with you in Part 2 of this post. Stay tuned.

Monday, August 29, 2011

Testing what matters

In the past few years I've been very focused on trying to learn what the expressions "testing the right product" and "testing the product right" really entail in a practical scenario - and am still in the process. On August 7, 2011 I wrote a post on one approach of ensuring that you're testing the right product by increasing understanding through modeling. Today, I wanted to share my thoughts on one aspect of "testing the product right". The activities that this 4-word phrase could potentially involve are probably innumerable but one relatively simple and rounded definition that makes a lot of sense to me goes something like this: Test everything that matters most in the least possible time and resources and discard or delay what matters the least.

The problem of what matters most and least isn't just a prioritization problem, but let's begin with a discussion of prioritization anyway and leave the rest for later.

You could collect statistics and empirical data about a billion things but none of that information might be sufficient to help you make the decisions that keep you up at night. So it is imperative that we first learn what we are trying to decide. Usually our purpose is to know whether or not we are ready to ship and whether the quality of the product is on par with what is expected of it.

James Whittaker, in his presentation on Turning Quality on Its Head, talks about focusing on JUST one number. A number that quantifies the risk the product is at. The only activity worth a tester's time, in his opinion, is one that brings that number down. I agree. I'm just not sure how easy it is to assign a number to risk and be confident about it. There are a lot of unknowns that keep us from having a number that accurately quantifies the absolute risk so for now let's just try to look at the "relative risk" of a feature. In his talk, the calculation of risk is not discussed so I'm assuming it's part of his on-going research and may just not be disclosed yet.

Three possible dimensions of risk: Impact of failure, Likelihood of failure, Frequency of Use

We talk about risk because it is in some sense directly proportional to priority. If there is a high risk feature, it needs to be tested first so that if we run out of time we have already addressed our biggest fears about the product. Impact of a feature's failure, its likelihood and the frequency with which it is used are metrics that help us determine its risk. The terms "likelihood of failure" and "frequency of use" are pretty much self explanatory. "Impact of failure" really is about the pain that a customer will have to go through if that feature fails to perform as expected.

If each of the three values for a certain feature of our product is roughly plotted in the 3D graph in Fig. 1, we can get an idea of its relative risk. The red dot, in theory, denotes the maximum risk that a feature could be at.



As an exercise let's assume we have a Personal Finance Management Software to test. One of the features in our product is the End-of-Year Report Generation function, let's call it A. Another is the Retrieve Last 10 Transactions operation, this is our feature B.

Feature A has a high impact of failure, say 8 on a scale of 1-10. An incorrect annual report may lead the customer to incorrect long-term plans. Its likelihood of failure is also high, say 8, based on the assumptions that it interacts with multiple components, performs a large number of operations and operates on a lot of environment input that is not directly provided to the software and has to be inferred using an algorithm. Its frequency of use however is low, just once a year, so let's say it is 3.

Feature B has a moderate impact of failure so let's say it is 5. Its likelihood of failure is low, say 3, because it does not involve merging of data and merely reads the results returned by an API provided by your bank's online service. Its frequency of operation is high, say 8, because historically we've seen that it's the most frequently used operation and is used multiple times a day.

Which feature is more risky and needs to be attacked first? Not too hard to tell now.
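
One way to make the comparison concrete is to collapse the three values into a single relative score. No formula is prescribed here, so the sketch below makes an assumption: it treats each feature as a point in the 3D graph and uses its distance from the origin as the relative risk. The `Feature` class and the `relative_risk` function are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A feature scored on the three risk dimensions, each on a 1-10 scale."""
    name: str
    impact: int       # impact of failure
    likelihood: int   # likelihood of failure
    frequency: int    # frequency of use


def relative_risk(feature: Feature) -> float:
    """Assumed scoring: distance of the feature's point from the origin."""
    return math.sqrt(
        feature.impact ** 2 + feature.likelihood ** 2 + feature.frequency ** 2
    )


# Values taken from the exercise above.
feature_a = Feature("End-of-Year Report Generation", impact=8, likelihood=8, frequency=3)
feature_b = Feature("Retrieve Last 10 Transactions", impact=5, likelihood=3, frequency=8)

# Highest relative risk first.
ranked = sorted([feature_a, feature_b], key=relative_risk, reverse=True)
for f in ranked:
    print(f"{f.name}: {relative_risk(f):.1f}")
```

Under this assumption feature A scores about 11.7 and feature B about 9.9, so A gets attacked first. Multiplying the three values instead (192 versus 120) gives the same ordering for this pair, but the two scoring rules will not always agree, which is one more reason to treat the result as relative rather than absolute.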

It's not just prioritization

The question of how long it will take you to test this product is inherently flawed. Michael Bolton once tweeted in reply to a question I asked: [The question "how long will it take to test feature X" is inherently flawed] in the same sense that the question "How long will it take you to learn Japanese?" is flawed. We need to let the available resources, in this case time, guide our testing strategy rather than trying to come up with fictional numbers that supposedly represent estimates and then making time for them. That means that we need to set our testing up in a way that if at any point we are told that our time is up, the activities that we have performed are the most important ones. Each and every one of them.

The approach of prioritization described above, in theory, is very appealing. You can list your product's features or capabilities and analyze them for their risk values and those risk values could be transformed into priorities with a few additional modifications made to them. But here's the thing, using this approach to determine the priority of everything is like using a foot ruler to measure the length of all physical objects. It's very effective in measuring say the length of a ball pen but very inefficient in trying to measure the ball on the tip of the pen - it just isn't the right tool. What I'm getting at is we don't just need to prioritize things, we need to posteriorize them too. That is our tool to help us delay activities that are of little value to us. Things that are prioritized are done earlier and more of. Things that are posteriorized on the other hand are done less of and later, if time permits.

Why is that so important you may ask? If they matter less, they'll never make the cut right? Turns out that's not so. Human psychology plays an important role here. If you haven't read Brian Tracy's masterpiece on Time Management "Eat That Frog" I would strongly recommend that you do. It has a plethora of actionable instructions about how to manage your time effectively but I'm mentioning it right now because the concept that I just explained was borrowed from his book and is called "Creative Procrastination". We have a tendency to do easier tasks first, whether they are important or not, and to procrastinate on the more difficult ones. The easy tasks give us a false sense of achievement but keep us from achieving our real goals. The trick is to deliberately posteriorize them so they are intentionally delayed in your schedule.

Notice that we are not eliminating them. And why is that? The answer is worth spending some brain cycles on. There's a difference between test points that matter less and test points that are irrelevant. Something that is irrelevant hopefully never appears in your list of things to prioritize or posteriorize. Being able to run the Personal Finance software on a Microwave Oven is an irrelevant test point. Being able to run it on a less popular operating system on your PC, might matter less but is not irrelevant. Such tests should be posteriorized.



Fig. 2 aims to illustrate the two concepts. Before prioritization the perceived importance of all test activities (represented by red and blue dots) is the same. We prioritize to change that but the capabilities of the tools we use are somewhat limited in that they are great at bubbling up what matters most but not necessarily too good at pushing down what matters less. So those less important activities tend to stick around and are not perceived to be as unimportant as they really are. As a consequence they might still end up in our daily tasks. After posteriorization, the perceived importance of the activities denoted by blue dots is drastically reduced and until and unless we're done with the most critical activities we will not attack them.

Here are a few questions (not exhaustive), the answers to which help guide us in posteriorizing activities:

Is this activity a pre-requisite for any of the subsequent activities that I am required to perform?
If this feature never gets tested will we delay our ship date?
Does this activity require specialized skill to perform?
Is this activity a nice to have but not performing it will have no consequences?

One key thing to remember is that these activities might not all be done just once. They could be repeated and their perceived importance could change over time. Another of Brian Tracy's suggestions that could help us revisit and rearrange the perceived importance of test activities is the Zero-Based Thinking strategy. Ask yourself the question: "Knowing what I know now, would I still make the same decision?" If not, time to rearrange.
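
The combination of prioritizing and posteriorizing can be sketched as an ordering rule: posteriorized activities always sort after everything else, and within each group the higher-risk work comes first, so whatever has been done when time runs out is the most important work that was available. The `Activity` fields, the risk numbers, the minute estimates and the tooltip activity below are all made-up illustration values, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    risk: float                  # relative risk, higher means more important
    minutes: int                 # rough cost, only used to simulate a time box
    posteriorized: bool = False  # deliberately delayed, but not eliminated


def ordered(activities):
    """Posteriorized work goes last; within each group, highest risk first."""
    return sorted(activities, key=lambda a: (a.posteriorized, -a.risk))


def run_until_time_is_up(activities, budget_minutes):
    """Return the activities that fit in the time box, in execution order."""
    done, spent = [], 0
    for activity in ordered(activities):
        if spent + activity.minutes > budget_minutes:
            break  # time is up: everything done so far outranks what is left
        done.append(activity)
        spent += activity.minutes
    return done


backlog = [
    Activity("Run on a less popular operating system", risk=4, minutes=60, posteriorized=True),
    Activity("End-of-year report generation", risk=11.7, minutes=90),
    Activity("Retrieve last 10 transactions", risk=9.9, minutes=45),
    Activity("Polish wording of tooltips", risk=5, minutes=15, posteriorized=True),
]

for activity in run_until_time_is_up(backlog, budget_minutes=150):
    print(activity.name)
```

Note that the quick tooltip task only runs after both critical activities are finished, even though it would have been the easiest one to pick up first. Re-running the ordering whenever a risk value or a posteriorized flag changes is a cheap way to apply the "knowing what I know now" question.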

Summary

Making sure that we are spending our time and effort on the right things is critical. In addition to that we must also ensure that things that are less important are not taking our attention away until and unless we have addressed more critical issues first. In this post, we talked about how these two are not the same. We looked at a popular method of prioritizing test activities. They could be testing a feature or a capability or any other activity that we have listed as a to-do during our planning. We discussed that prioritization is great but we need to augment it with posteriorization, a heuristic based strategy to deliberately delay any activities that may be relevant but less important. That gives us a two-prong strategy of focusing on the most important activities.





Thursday, August 18, 2011

Testing without specs: Unblurring the fuzzy corners of Perception-based Design

Got spec?

Yea, me neither.

We all have our favorite flavors of "Agile" (and ice cream). Our fondness for whiteboards, sticky notes, wikis and one-pagers continues to increase as we immerse ourselves in this new paradigm. Over time we have become more willing to incorporate change in our design and code. One of the first things we do in the mornings is our 15 minute stand up meeting. We are getting more comfortable with not having a lot of upfront planning. Actually, we're almost scared of doing too much planning, thinking that we might lose our valuable time if we are required to adapt to something later on. This new approach has in many instances proven to be our vehicle to fast-paced yet high-quality customer focused software projects.

Here's what my office monitor looked like a few months ago:



But there are challenges, especially for testers. One of the big ones being: testing without requirements. For waterfall testers or brand new ones, it can be a nightmarish task. But agilists aren't huge fans of comprehensive documentation; their argument against it is that it keeps the team from producing actual work, where actual work equals working code. The argument is valid for the most part. Documentation does tend to erode over time and is costly to maintain. This is however, unfortunately, sometimes used as an excuse to completely get rid of documentation. Simply abandoning formal planning doesn't make a team Agile. When formal documentation is completely ignored, design and implementation are based more on individual perceptions. Implementation could then really be a manifestation of what the developer thinks the product should be. That is scary.

Any solutions? We could go back to doing minimal documentation to help testers do their job. Maybe a list of capabilities that the team decided to enable in the software and have the testers verify them. But that's not testing, that's checking. If you're in a true Agile team the developer has probably already done that. The real problem here isn't so much with what has been determined as a capability or feature but really what hasn't been determined, omitted or incorrectly assumed as true or false in an attempt to be terse and nimble. How does one uncover those fuzzy corners?

Well, let's start with getting our hands on some information that is relatively easier to get.

12-minute Stakeholder Interviews

Depending on whether a stakeholder is a chicken or a pig, they will either be able to make a claim or have an expectation. A developer or designer for example has the capability to make a statement of how the product will behave or look because they are writing the code or designing the UI. Similarly a customer has the right to make a demand and have an expectation. The purpose of a Stakeholder interview is to discover what those claims or expectations are. Don't worry, we're not going to stop there. There are a few tools that will help us clear the haze that surrounds these statements.

The idea is to keep these interviews short and specific. They can be verbal and pre-planned, but should be changed on the fly if need be. One thing to remember is: specific questions will yield specific answers. To get the most value it's a good idea to include qualifiers like "most important", "worst possible in your opinion", "5 best", "top most" etc. They will guide the stakeholder in their thought process. We're assuming that we are not committing to too much work per iteration, hence a 12-minute interview will be sufficient. General test sources can be used as mental triggers to design Stakeholder interviews e.g. questions around input data, record handling, file processing, security, performance etc.

Here are a few sample questions:

1. Can you give me three examples of input data and corresponding output or resulting behavior?

2. What is the most frequently used feature of this product going to be?

3. Does timing matter for capability X?

4. If error condition Y occurs what would happen?

5. What is the maximum time that Operation M can take?

6. Which feature are you most nervous about/is most stable?

The questions are designed to be somewhat tough and intend to make the stakeholder think and make claims (or state expectations). It is imperative, though, that our stakeholder remains comfortable. We are not trying to grill them, we are learning about their perceptions. It is possible that some questions themselves will make the stakeholder take a step back and rethink a claim they had made earlier.

To decide whether a question is a candidate for a 12-minute Stakeholder interview question ensure that:

1. the question is seeking to get a statement claiming something or expressing an expectation.

2. the question is seeking to get a measurable statistic about the product. Something you can "check" (see above).

Questions about environment setup or background and motivation for building a feature should not be included in Stakeholder interviews. Once we have interviewed a few stakeholders, we can move to the next step i.e. poking holes in their claims and expectations.

The Agile Philosophy: Individuals and interactions over processes and tools

Empowering individuals is a superb idea. Something to be careful about, however, is the fact that individuals are inconsistent and not perfect, not to say that processes and tools are. They are only as perfect as the ones who developed them. We often make arguments that are fallacious. I'm sure there are a few statements in this post that fall perfectly into a certain category of informal fallacies as well. Our goal however is not to prove that someone's claim or expectation is wrong or fallacious; rather, it is to point out that there is some missing information or test case that we need to figure out and validate.

We, the software testers, couldn't be luckier. We appear to be tied very tightly to software and technology yet we have the luxury of employing learnings from other fields of art and science to reach our goals. Actually you could argue we work as much or more with people as we do with software. Today our saviors are scholars of Logic. How? Well, they have given us frameworks to detect fallacies in claims, arguments and general statements. Sometimes without even understanding the context of an argument and based purely on their structure. And we are going to use them to illuminate the shady borders of software.

A lot of what I'll be talking about in the remainder of this post really comes from the study of Logic. For some motivation, here's an example of a fallacy:

If it's raining, then the streets are wet.
It isn't raining.
Therefore, the streets aren't wet.

(from http://fallacyfiles.org/examples.html)

This fallacy is called "Denying the Antecedent", a fallacy of propositional logic. In this particular example, the fallacy is rather obvious but here's a real-world (okay I had to remove some specifics) example of the same:

Excerpt from a One-Pager: If the Global Validation Mode is on, Local Validation Mechanisms are effectively ignored.

From the troubleshooting guide: Turn Global Validation Mode off to make sure Local Validation Mechanisms are invoked. (Wait it's already off!!!)

Incorrect Assumption: The only way Local Validation could be bypassed is by turning Global Validation Mode on. (Turns out it isn't so)
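
Because this fallacy is purely structural, it can be demonstrated mechanically: enumerate every truth assignment and look for one where both premises hold but the conclusion does not. The sketch below does that for the rain example. The same table applies to the Global Validation case if you read `p` as "Global Validation Mode is on" and `q` as "Local Validation is bypassed". The function names are illustrative only.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def counterexamples():
    """Assignments where the premises hold but the conclusion fails.

    Premise 1:  if p then q   (if it's raining, the streets are wet)
    Premise 2:  not p         (it isn't raining)
    Conclusion: not q         (the streets aren't wet)
    """
    found = []
    for p, q in product([True, False], repeat=2):
        premises_hold = implies(p, q) and (not p)
        conclusion_holds = not q
        if premises_hold and not conclusion_holds:
            found.append((p, q))
    return found


for p, q in counterexamples():
    print(f"raining={p}, streets_wet={q}: premises true, conclusion false")
```

The single counterexample (not raining, streets wet anyway) is exactly the test case the troubleshooting guide missed: Global Validation Mode off, Local Validation still bypassed.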

A few ways to attack ambiguity

A great resource on fallacies is the "Fallacy Files". If nothing else, they're just fun to read. But for today let's talk about how to apply what we know from Logic to the claims and statements that we collected about our product. Here are just a few heuristics derived from fallacies: (If you're looking for them, there's a good chance you'll find them)

1. The Dangling Else: This one is a classic. Ex: The system will do operation X when A happens. Sooo … what if A doesn't happen? Is A the only possible thing that could happen?

2. Ambiguity of Reference: Ex: If condition A is true, the default behavior will kick in. And what's the "default behavior" again? Is it well defined?

3. Scope of reference: Ex: For all other cases, the system will ignore the input. Is there a finite set of "other cases"? What makes us so confident about this?

4. Omissions: Important details left out.
   • Causes without effects
   • Effects without causes

5. Random Organization
   • Mixed causes and effects: Ex: If A and B happen, C and D will occur. So do A and B need to happen together for C and D to happen? Is C a result of A, or B or both?
   • Random case sequence

6. Ambiguous precedence relationships: what comes first? Is that always the case?

7. Implicit cases: Ex: The system takes two inputs A and B. Are there any environmental inputs like a path variable or a Windows registry entry?

8. Temporal Ambiguity: Does the statement hold true at all times? If we don't know, we have a case of Temporal Ambiguity.

9. Boundary Ambiguity: Where does the influence of one feature end and where does that of the other begin?

You get the idea. There are a few others that can be listed here but I'll leave it up to you guys to think of more. And if you can I'd love to hear about them.
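
Several of these heuristics (the Dangling Else and Mixed causes and effects in particular) amount to asking which combinations of conditions a claim leaves unaddressed. As a rough sketch, the snippet below takes the claim "If A and B happen, C and D will occur", enumerates every combination of A and B, and lists the ones the claim says nothing about. The `specified` table is a hypothetical encoding of that one statement, not a general-purpose tool.

```python
from itertools import product

# Hypothetical encoding of the claim "If A and B happen, C and D will occur".
# Keys are (A, B) combinations; values are the stated outcome.
specified = {
    (True, True): "C and D occur",
}


def unspecified_cases(spec, condition_count=2):
    """Return every combination of conditions the claim does not address."""
    all_cases = product([True, False], repeat=condition_count)
    return [case for case in all_cases if case not in spec]


for a, b in unspecified_cases(specified):
    print(f"A={a}, B={b}: what should the system do?")
```

Each printed line is a follow-up question for the stakeholder who made the claim. Three of the four combinations turn out to be undefined, which is typical of a terse one-line statement.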

Summary

Let's do a quick recap. Rapid software development necessitates the abandonment of a lot of formal documentation techniques like writing Requirements Specifications or creating UML artifacts that we may be used to. Sometimes documentation is just verbose and hence not very useful but when done right it could prove to be vital for recording claims, responsibilities, edge-cases etc. It could also act as a forcing function for teams to think about things they wouldn't have otherwise. But in any case, it takes time away from writing and testing code. So instead of fighting for more or better documentation, we talked about changing our approach a bit and trying to use our own way to do all the good that documentation can do without doing extensive documentation. We tried to understand stakeholder perceptions by interviewing them and collecting their claims and expectations in one place. Then we looked at a few heuristics that we borrowed from Logicians to poke holes into these statements and find out what's missing. By doing that we're not just verifying claims but actually unblurring the fuzzy corners that no one had seen or realized existed. A thing to note here is that these methods can also be used to improve existing documentation.

If you have encountered similar issues and solved them in your teams, I would love to know. If you can improve my approach I'd be obliged. Till next time dear readers, sayonara!

Sunday, August 7, 2011

Modeling Software for Wider Understanding and Communication: An essential (yet usually missing) ingredient in the discussion of Test Planning



Hello there! What a great time to be part of this amazing community of testers! There's passion in the air. Many people around you strongly believe in the value of Testing now. Some call it an art, some call it a science. There are carefully crafted linguistically delighting definitions of Testing all over the place. Numerous discussions popping up day in and day out on all of your favorite social hangouts about what the best way of testing something is. Inspirational talks being given about how the toughest quality demons were approached and beaten to the knees. Yes, it is all magical and enchanting and you're loving every bit of it. There is also no shortage of smart and useful techniques that you can adopt today to root out those pesky defects and bugs. Beautiful! Now where's my peachy keen stamp? But wait, here's the 234 billion dollar question, where do you start? From all of those one hundred and fifty six testing techniques that you read in that splendid book: "The Best Testing Stuff You'll Ever Read" on your Kindle (iPad?) which one do you apply first? Where do you begin the journey?

I don't claim to be addressing this for the first time, oh no I wouldn't dare. I just have an issue. It is this: the new college hire Jane Q. Tester comes in and she's given the exciting feature X to test. She's introduced to the team. She's given a few papers and wiki links on Exploratory Testing, Test tours, Oracles and Risk-based Testing. And she's asked to change the world. When she asks where she should start she's given a test plan template. She fills it up as if it has all the right questions that need to be answered, bumps into a few walls and eventually finds SOMETHING that looks like a starting point, but that works too right? Well yeah ... but there's a more systematic way of doing this, promise.

The problem of not having a clear starting point has crossed my mind more than once. It's usually when I'm done reading a book on Software Testing full of great advice and techniques that seem very appealing but leave me wondering if I know where or what to begin with. This happened most recently when I was doing some reading on test tours, which are in theory the solution to this very issue, and a brilliant concept. It is essentially a heuristic-based methodology that lets you map tourism to testing and helps trigger your brain by deriving similarities between testing and touring. A quick example is the Landmark Tour, which guides you through your testing by letting you believe that the system under test (SUT) is a city and all its main features are landmarks. Just as you would visit landmarks, you would visit the main features of the SUT. This ALMOST gives you a plan.

Why would I say ALMOST? Well in my mind before you start touring, you need a map. A guide is good but a map is best - if you like more control. Else you won't be touring, you'd be wandering - you need to know WHERE the landmarks are, or at least WHAT they are. The map also tells you how big the city (or SUT) that you are touring is. You could tour on foot, by car, in a bus or in a plane. The difference? When you're traveling by plane you cover more space in less time but you really have only a 12,000 foot view (it's a private plane). When you're on foot you're more likely to get a more up close feel of things but it would take you ages to cover a big city - keep in mind that may not always be what you want. By car, you're somewhere in the middle. You could mix all of them. But I digress. So a map and a vehicle are essential ingredients of a successful tour. A software model is a map of the software and its depth, i.e. the level of detail it conveys about the software, governs whether you'll be flying in an airplane over it, driving in a car through it or walking around it admiring all of its little details and nuances. Here are a few maps - they are all maps but they give you different kinds of information at different levels:


You get the idea. Let's talk more about models, after all that's what this post is supposed to be about.


Lessons from Systems Thinking

Systems Thinking, without going into much formal detail, encourages us to study the behavior of a system by taking a holistic view as opposed to coming up with extrapolations about the system based on our understanding of its sub parts. Here a "system" is anything comprising multiple components that have dependencies on each other - software is a great candidate for a system. In reality, the only complete system of course is the Universe, everything else is a subsystem. So drawing the line is always going to be a challenge but hopefully not too much.

Systems Thinking teaches us that the behavior of individual components cannot be understood in isolation because there are other factors influencing that behavior. A specific strain X of parasites affecting the produce negatively could be completely wiped out by a powerful pesticide, but this may give rise to another, more harmful breed of insects that parasite X had been keeping under control, and cause more damage than before. My point? Having the larger picture is CRITICAL.

Also, systems fall into categories. Systems within the same category have similar characteristics and fall prey to similar defects. You realize what I'm getting to right? If you can figure out the category, you can look for patterns and design more targeted tests. Man, that is awesome! As a tester I get goosebumps just thinking about this. But before we get there we need a quick lesson in Software Architecture.

Models

I re-read a gem of a book sometime last year which I had managed to keep with me for a while but never fully explored. It's called Software Architecture: Foundations, Theory, and Practice and is used quite frequently in graduate level Software Architecture courses. I highly recommend you read it: one, to go back to the basics and refresh your concepts about Software Architecture and two, to understand what goes on behind any kind of software. Chapter 6 of this book, Modeling, is an amazing primer on techniques used to model software when it's in the conception phase. But nothing is stopping you from using those techniques to increase your understanding of something that is already built or partially built. Let me highlight a few of those for you here.

Modeling techniques roughly have three properties: thoroughness, rigidity and ease of use. Thoroughness is the level of detail that models created using a technique convey. Rigidity is the level of formality of a technique, i.e. the number of rules that govern the "validity" of a model, and ease of use is, well, how easy or hard it is to apply. For our purpose, we do not care too much about being overly formal. Our models, at least for now, won't be evaluated by software and don't need to follow standards too precisely, but rigidity does have advantages. The biggest is that it makes communication easier. You speak English because you know I will understand it. When you're writing your diary you could use your secret language that you came up with in 5th grade. What, you didn't? Okay maybe then I'm the only weird one here.

  1. Natural Language

Natural Language could be thorough or not thorough. The rigidity is minimal and ease of use is maximum. But it's obvious where the tradeoffs are. Initial specs are usually written in Natural Language and hence are full of holes but they are nevertheless an effective modeling technique. A quick sample:

The job of feature X is ________.
The job of feature Y is ________.
The job of feature Z is ________.

Simple? Yes. Useful? Sure. Sufficient? Depends on your purpose. I would say for most projects, no. It's hard to illustrate complex interactions between multiple components without being verbose and there is a danger of losing meat by being terse. Yet, it is a great starting point and one that is used most frequently.

  2. PowerPoint Style/Whiteboard Modeling

Similar to natural language, Powerpoint style modeling has minimal rigidity and high ease of use, but it is slightly more thorough. Its visual nature makes interactions and dependencies easier to show. Here's a screenshot from my notebook; I was just too lazy to use Powerpoint.

Here, boxes are components and arrows are invocations.

  3. UML

Here rigidity is high but so is ease of use. There are several UML diagrams that can be employed for effective modeling. A component diagram like the one below shows a dependency, which could be of any form, e.g. an invocation, a reference or a compilation dependency. "Stereotypes" (labels on arrows) can be used to specify what kind of dependency this is.

Sequence diagrams, among other UML modeling techniques, are used to illustrate temporal interactions between components.

  4. Darwin's Modeling Technique

In Darwin's Modeling, components are represented as having two properties: provisions and requirements. E.g.

component DataModel
{
    provide DataPoints;
}

component Calculation
{
    require DataPoints;
    provide MinimumDistance;
}

Here thoroughness and rigidity are high. Ease of use might be slightly low. This technique is well suited for conveying structural information about the software.
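One nice side effect of a model this rigid is that a tester can check it mechanically. Here is a minimal Python sketch of that idea, assuming we only care about whether every requirement is provided by some component. The Component class, the unmet_requirements helper, and the Report component with its UserLocale requirement are all hypothetical names I made up for illustration; they are not part of Darwin or the book.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A component in the provide/require style used above (hypothetical helper)."""
    name: str
    provides: set = field(default_factory=set)
    requires: set = field(default_factory=set)


def unmet_requirements(components):
    """Return {component name: sorted list of requirements no component provides}."""
    provided = set()
    for component in components:
        provided |= component.provides

    gaps = {}
    for component in components:
        missing = sorted(component.requires - provided)
        if missing:
            gaps[component.name] = missing
    return gaps


# The two components from the example above.
data_model = Component("DataModel", provides={"DataPoints"})
calculation = Component(
    "Calculation", requires={"DataPoints"}, provides={"MinimumDistance"}
)

# Hypothetical third component, added only to show what a gap looks like.
report = Component("Report", requires={"MinimumDistance", "UserLocale"})

print(unmet_requirements([data_model, calculation]))          # {}
print(unmet_requirements([data_model, calculation, report]))  # {'Report': ['UserLocale']}
```

Every gap the check reports is a question worth asking the developer: who is supposed to provide this, and what happens when it is not there?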

I would encourage you to read detailed explanations of the above from the book mentioned as this is only a brief account. Artifacts built by developers and program managers can definitely be used as a starting point for a model that a tester can use. But sometimes redoing something that exists gives you a much better understanding than just going over it. It is also important to realize that ALL models no matter how detailed you try to make them will be flawed. They will have errors, ambiguities and abstractions. They are not going to EVER replace playing with the software itself, but they will give you a starting point. And that was the only promise I made.

Summary

All kinds of testing are based on models, so in essence all testing is model-based. (It's interesting that Model-Based Testing has a very specific meaning. I wish I could rename it to State Machine Based Testing, but I'm going to stop here.) The models may not end up being documented but they are there, in your head at least. If the model you are following is inaccurate, chances are you will end up testing the wrong thing. Modeling helps us increase our understanding of the subject matter and get a holistic picture, a map, that we can then use to move around.

In this blog post we recalled that models, like maps, have distinct properties depending on the technique used to create them. Knowing what we are trying to achieve from our experiments helps us decide what kind of modeling technique to use. We visited the following modeling techniques:

Natural Language
Powerpoint Style
UML
Darwin's Technique

This was only an introduction and there are several others worth checking out.

My goal for this blog post, more than talking about well documented modeling techniques, was to raise a point: without a map and a vehicle, exploration is aimless and misguided. Also, we will always have a starting point, and it is not very hard to uncover. We have at our disposal techniques that may not be designed specifically for testing, but we are going to be lazy and use them to get our job done quicker and better, because we're like that and we love it.

Hope this was of value! Stay tuned for more.

Thursday, July 28, 2011

Software Testing, how they do it at NASA

Last week I was having a discussion with a colleague who works as a developer in the MSN team about why testing continues to be in its infancy after all these years and why testers, even after putting in hours and hours of hard work, do not "deliver to expectations". Why is it that our Developer and Program Manager counterparts are sometimes unable to see where exactly the testers bring value and ultimately give them the respect everyone yearns for? Why have some companies or teams deemed the entire Test discipline useless and started focusing on the so-called 'moving Quality upstream' metaphor, i.e. letting the Devs do it? My blog post isn't about how to get that respect (I'm still working on that one), but it sort of lays the backdrop for the rest of my story. Following that discussion my colleague asked me a question: "How do they do it at NASA?". Why NASA? Because we assume that NASA is somewhere mediocrity is not affordable and definitely not an organization that would waste resources. The question was interesting and I thought to myself that I should definitely look into it, but then I went back to my desk and just got back to work. A couple of days later I was having another discussion along the same lines with my soon-to-be Test Manager. He mentioned that we should really try to learn from others and see how we can improve ourselves, and then he said something that made me say, Ok I get it! He told me we should see how NASA does it. I took it as a sign, took a deep breath and set out on my mission. It was slightly less dramatic in reality though. Ok, way less dramatic.

I began my search for NASA and Software Testing and the first thing I ran into was a chapter called "Basic Vista Trouble Shooting" from the book "Microsoft Windows Vista Help Desk" with a huge headline reading "If Microsoft Was NASA". I was intrigued, oh boy this is going to be interesting - I have the answer right here in this unheard of book (at least by me), I've discovered a gem. I hadn't. To my embarrassment the text following that headline read "if Microsoft was NASA, we’d have a lot more dead astronauts." That was it. I ditched the book and I wanted to see if I could find out what it is that they're doing over there.

Anyhow, after looking around a bit in the internal Microsoft library website (which I absolutely love by the way) I got my hands on an IEEE paper called 'Software quality assurance engineering at NASA' from 2003 written by Linda Rosenberg and Albert Gallo, who at least in 2003 worked at the Goddard Space Flight Center in Greenbelt, MD. The title suggested that the content was fairly high-level and not tied to a specific domain and after reading the abstract I gathered this was something that could help me. The paper talks about the philosophy that Quality Assurance engineers adhere to at NASA and a brief account of what the general process is to test anything. It starts with what is expected of QA Engineers, mainly domain knowledge of the piece of software they are testing, familiarity with standards and limitations of software in the context of how it would be used by say a satellite or a system that stays on the ground, ability to match requirements with implementation etc. Fairly standard stuff. Then came the real meat. You might already be doing most of what I'm about to talk about in your day-to-day activities, in fact I'm sure you are - if you are a tester in some capacity - but are you doing it consciously? Because if you're not, your "skill" is not measurable or transferable or describable or teachable. And we all know why that is so important.

How they approach "Quality"

There is no dearth of deep and intense philosophical debates about what Quality really means and whether or not you can ever "assure" it. The approach taken by these guys however is coming to an agreement on what Quality is, and then assuring that the software has whatever that is. Their line of attack is to first admit that Quality is not absolute, and criteria for it are not independent but also realize that until there is some way to measure it, we won't be getting anywhere with our testing. So the idea is to assign measurable metrics to a pre-determined set of criteria. Since the context for a software test project in NASA is well-defined I think it's a fair tactic.

The criteria are derived from a Quality Model named "McCall Software Quality Checklist" (I found a decent explanation of the model here), and each criterion is assigned a measurable metric. To determine whether or not a criterion is met, the tester can examine the metrics. It is clearly stated that not all criteria are applicable to all domains and compromises are going to be part of the game. For example, portability is not worth spending time on for software that runs on a satellite and is not meant to be plugged into any other component, but reliability definitely is.
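To make the criterion-to-metric idea concrete for myself, I sketched it in Python. This is only my own illustration and not something from the paper: the metric names, thresholds and measured values below are invented, and a real project would have to agree on its own.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One quality criterion tied to one measurable metric (illustrative only)."""
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True
    applicable: bool = True  # not every criterion applies to every domain


def evaluate(criteria, measurements):
    """Return {criterion name: 'met' | 'not met' | 'not measured' | 'not applicable'}."""
    results = {}
    for criterion in criteria:
        if not criterion.applicable:
            results[criterion.name] = "not applicable"
            continue
        value = measurements.get(criterion.metric)
        if value is None:
            results[criterion.name] = "not measured"
        elif criterion.higher_is_better:
            results[criterion.name] = "met" if value >= criterion.threshold else "not met"
        else:
            results[criterion.name] = "met" if value <= criterion.threshold else "not met"
    return results


# Hypothetical criteria for satellite software: portability is switched off on purpose.
criteria = [
    Criterion("reliability", "mean_hours_between_failures", 1000.0),
    Criterion("maintainability", "avg_cyclomatic_complexity", 10.0, higher_is_better=False),
    Criterion("portability", "supported_platforms", 2.0, applicable=False),
]

# Hypothetical measurements collected during a test cycle.
measurements = {
    "mean_hours_between_failures": 1450.0,
    "avg_cyclomatic_complexity": 14.2,
}

print(evaluate(criteria, measurements))
# {'reliability': 'met', 'maintainability': 'not met', 'portability': 'not applicable'}
```

The point of the applicable flag is the compromise mentioned above: a criterion you have consciously decided not to chase should show up as a decision, not as a gap.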

This reminds me of one of James Whittaker's talks 'Turning Quality on its Head' where he suggests that instead of testing features, we should be moving on to testing capabilities i.e. things that a product can do. Makes perfect sense in theory, it definitely would make life easier. He mentioned that the first response from anyone would be "there are so many capabilities" but there aren't, according to him. The upside of doing that, if we assume that is possible, is that whenever there's a regression related to a capability you know exactly which tests to rerun. I am unsure what the granularity of a capability would be and I felt that the approach was a bit too theoretical but I'm sure James has kept some details from us and there are answers to these questions.

Responsibilities of a Software Quality Assurance Engineer




Most of the time, it is not understood what the real job of a tester is. I'm ashamed to say that, more often than not, even testers themselves don't fully realize what it is that they're paid for. The one thing you'll hear when you ask a Tester "What is your primary responsibility?" is "finding bugs". Sure, that's ONE of the things. But that's not all. I think James Bach and the other gurus of the context-driven school of testing are spot on when they say that the job of a tester is to provide empirical evidence about a software product. That could be bugs or just plain information, e.g. behaviors, responses to certain conditions, things that help stakeholders decide whether or not the product is ready to be shipped or released or deployed.

I was glad to read a whole section about responsibilities in the paper. In the NASA world the main responsibility of QA engineers, as told by the authors, was to provide feedback on the processes that guide requirement analysis, design, implementation and testing. The feedback would be two-fold: are the processes being followed right, and is following them really improving quality? They talk about how they had enforced the application of ISO standards at some point but later realized that adhering very strictly to a standard hindered their ability to improve their processes. I think it's safe to assume that ALL processes are faulty and need to be tweaked a bit here and there for different projects. I loved the phrase they used about the downside of following ISO blindly: "fossilizing procedures rather than encourage process improvement".


Somewhat linked to this, the authors also talked about moving away from standards created by NASA towards commercial standards to achieve industry standardization and have Testers play a major role in that larger effort. The bigger the body of people having a stake in something, the better chances there are of it improving. Or is that really so? I'll leave it at that.

So finding defects was not the main goal - but a byproduct. Well, that was a learning. In addition to providing feedback, QA engineers are also responsible for providing empirical data about the software. Measurable, quantitative information that can be used to make decisions about where the software stands.

Level of Involvement of Testers in the Various Phases of Development

Okay this one might not be applicable to everyone given that over the past few years all of our products have been deeply penetrated by Agilists pushing for less documentation and code having to speak for itself. Don't get me wrong, I agree with the philosophy - I just think, as do others, sometimes it's used as an excuse to spend little time on design and then waste cycles later on in improving a shoddy piece of software that resembles a bowl of spaghetti.

Regardless, it appears that in 2003, NASA software products were following the typical Waterfall lifecycle and maybe they still do - I don't know. The authors appear to be advocates of Testers being involved in ALL phases of the lifecycle but their activities vary slightly from phase to phase.

In what they call the Concept Phase, they urge Testers to generate concept of operations and scenarios, attend reviews and generate risk reports. I would equate this to the Design Phase in a typical software company.

During the Requirements Phase, Testers are advised to actually participate in the Requirements Gathering and ensure that the Requirements are "Test Ready", i.e. have measurable and observable outcomes. They introduced something called the "Automated Requirement Measurement (ARM) Tool" which I liked so much I'll have a separate section for it.

The Test and Implement Phases are generally thought of as the only phases that Testers should concern themselves with, while the other phases are considered "down-time" which can be used to build tools and invest in infrastructure. I love tools that allow me to be lazy and get my testing done faster. I just think we should have a strong justification for them, better than "I like developing things" or "I use code to break code". First of all, to quote James Bach once again, you're not breaking anything; it's already broken, you just exposed it. Secondly, an apologetic tester really looking to be or pretending to be a developer is not going to get much done. All the time we spend on doing something that does not reduce the risk of the product should be under scrutiny.

In these phases, Testers are required to manage change. Change is expensive - especially in a waterfall style development environment. Reducing code churn and prioritizing defects is the prime purpose of this phase. Is that what we usually do? We find bugs, that's great. But how much time do we spend, as testers, evaluating different solutions for defects and costing each of them? In addition to this, Testers are expected to provide experimental data and suggest how far they are from release. It appeared to me that their QA engineers were expected to be Quality Police in some sense. I've never been comfortable with that concept, not because I don't like responsibility but because sharing responsibility for quality results in a better product.

In the Deployment Phase, Testers are expected to report on whether Acceptance Criteria determined early on are met or not. I hope to write a complete blog entry on this someday.

The Automated Requirement Measurement (ARM) Tool

Nowadays, we are hesitant to spend too much time on formal requirements documents, because they erode fast in an agile world and spending time maintaining documents that are not going to be used long-term isn't a great idea. But not everyone is working on Agile products and that's where this tool might be great. A few years ago when I was a tester in the .Net team, which had a release cycle of 2 years, I came in contact with a lot of well-written requirements documents. They were well-written but they were flawed. We came up with a framework of requirements analysis that helped us poke holes into these documents using heuristics like faulty assumptions, if-present-else-missing etc. As far as my understanding goes this tool does JUST that, only automatically. This tool, as the authors put it, doesn't ensure that you are writing the right requirements but rather that you're writing the requirements right.
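I don't have the ARM tool itself, so here is a rough Python sketch of how I imagine this kind of check could work: scan each requirement for vague phrases and for the absence of a firm imperative. The word lists and the analyze and flag functions are my own assumptions for illustration, not the tool's actual dictionaries or interface.

```python
import re

# Illustrative word lists only; these are assumptions, not the real tool's dictionaries.
WEAK_PHRASES = ["as appropriate", "if possible", "easy to use", "adequate", "timely", "tbd"]
IMPERATIVES = ["shall", "must", "will"]


def analyze(requirement):
    """Report which imperatives and weak phrases appear in one requirement."""
    text = requirement.lower()

    def found(terms):
        # Whole-word match so that "will" does not match inside "willing".
        return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]

    return {"imperatives": found(IMPERATIVES), "weak_phrases": found(WEAK_PHRASES)}


def flag(requirements):
    """Return (requirement, report) pairs that are vague or lack a firm imperative."""
    flagged = []
    for requirement in requirements:
        report = analyze(requirement)
        if report["weak_phrases"] or not report["imperatives"]:
            flagged.append((requirement, report))
    return flagged


# Hypothetical requirements, written only to exercise the check.
requirements = [
    "The system shall respond within 2 seconds.",
    "The UI should be easy to use and respond in a timely manner.",
    "The service shall log errors as appropriate.",
]

for requirement, report in flag(requirements):
    print(requirement, "->", report)
```

A check like this says nothing about whether a requirement is the right one. It only points at sentences that nobody could write an observable test for, which is the "writing the requirements right" half of the quote.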

The IV&V Process

IV&V stands for Independent Verification and Validation. In a nutshell, this process entails having a 3rd party that is technically, managerially and financially independent examine the product. The whole idea is to have someone who has no bias or emotional ties with the product examine it once you think it's done. In a company like Microsoft, this could be simulated by having another feature team do this examination. Love this idea!

I would like to extend this a bit. A fair question would be, if you are the IV&V team what exactly does your examination involve. James Whittaker in his book about Exploratory Testing had a chapter on Test Tours that might help us a bit here. I am not sure who really came up with the metaphor "tours", I have reason to believe it was James Bach but I read about it in Whittaker's book so I'll talk about that. In that chapter, there is mention of a "Garbage Collector Tour". The term Garbage Collector is borrowed not from your CLR or JVM context but an actual Garbage Collector who visits every home, or office if he is responsible for that, in town and collects the garbage. What the Garbage Collector does is this: visit every home, get the trash and move without spending much time. Applying that concept or "tour" as James Whittaker calls it to software would mean moving really fast from feature to feature and getting some empirical data about each of them and maybe express your sentiments about what you like or dislike. The main point here is examining everything but being quick about it. You could also call it a sanity check but the fancy "tour" term makes it more fun.

Summary

Here's a quick recap of what we can learn from Software Testing processes at NASA:




  • Prepare a set of criteria and assign measurable metrics to each of them.
  • Provide feedback about the processes being used: are they being implemented right? Are they actually improving quality?
  • Involve yourself in all phases of the development lifecycle, wear different hats but in all phases strive to achieve "test readiness" (measurability, observability) be it requirements gathering, design or development.
  • Have a group not involved with the product throughout the cycle review the product independently and gather their feedback for more insight.




The paper is almost a decade old, but there were certainly things to learn from it, or at least revisit, so we can internalize them as testers. As I mentioned earlier, most of these are intuitive, but the key is to apply these techniques - whichever ones work for us in our specific context - consciously and then pass them on. Making them teachable and quantifiable is of utmost importance. That, I am certain, is the key to making testing a solid skill.




Stay tuned for some learnings from other great companies, people and fields in my following posts. It's going to be a fun ride.