What a deal! They’ll fix their own bugs “free of charge!”
Reading this advertisement made me realize how clever the software industry has become. Why bother fixing your product prior to shipment when you can sell it on the premise that you will fix the bugs “free of charge” when the users find them for you? Interestingly, anyone who bothers to read their licensing guide will find the following sobering caveat:
“…From an engineering point of view, it is impossible to fix bugs in multiple source code branches. If we would have to do this, we would never be able to implement a major redesign. Major redesigns are required now and then to be able to fix bugs and add features fast.”
Nothing communicates your attitude towards your users better than the way you handle exceptions and error messages. As soon as something goes wrong with your application, the user is in a heightened emotional state and at his most impressionable. Some software products, including market-leading applications, have developed a bad reputation for cryptic error messages that are impossible to resolve, leaving the user feeling helpless and outraged.
The worst offenders include fortune teller style messages that inform you (not without irony) that you are about to lose all of your work because the application has encountered an unknown problem and needs to shut down.
This is even more pronounced in the session-less environment of the internet. It seems that when it comes to web application reliability and robustness, we’ve been steadily taking a step backward in the way we treat our users.
Engineering and Failure Handling
A civil engineer designing a bridge will invest a significant amount of time and resources in predicting potential structural failure scenarios. Failure analysis and safety factoring (i.e. redundancy) are two important cornerstones of the engineering discipline. In the physical world of machines and structures, the ability to identify a potential design flaw and remedy it is a given. Similarly, we should strive to achieve the same in the virtual software world by accounting for critical error conditions and developing robust application code capable of handling those cases.
Software engineering does have certain nuances that differ from classical engineering, which make prioritization of work more arbitrary and less straightforward. For example, a small memory leak in a server component may be considered by the development team to be a critical bug, but a relatively small data validation problem that forces the user to retype a lengthy application could have a bigger user impact and rank higher on the bug-fix priority list.
A 12 Step Program for Error Rehabilitation
Making your application more agile in handling failure and enabling it to degrade gracefully is not a single-step process, and there is no silver bullet technology out there that will fix this problem. If you want to break the cycle of application instability and user frustration, you will have to dedicate time and your best technical talent to solving it. I have found that a phased approach works best. In this approach you first pick the low-hanging fruit (addressing the mechanics of error handling), and then gradually move to higher ground (addressing automated problem resolution and preemptive countermeasures).
The following is my 4-phase program for working out a resolution to application errors. Classification is inclusive, so the 4th phase (the highest level of reliability) also includes the properties of the preceding levels:
Phase-1: Create Unique and Traceable Errors and a way to Record them
If you are under the gun and don’t have time for any other remedy, at least make sure that your error cases are unique. Telling your users that an error has occurred in the application without providing details is a sign of an immature product. When your technical support team receives an error report, they should be able to determine precisely what is causing the problem.
Generic error handling (same message for all errors), or different error causes that return identical messages, are easy to implement, but when it comes to debugging they are useless. Unique error IDs allow us to more efficiently track bugs and translate them to a more stable product.
Error codes should be visible in the error messages but not be the focal point of the message. You should develop a library of descriptive text that provides a human-readable explanation of what each error means. Provide a simple mechanism to either log the message directly from within your app or send it to you via email. Nothing is more annoying to the user than being asked to type in the error message manually.
Establish an Issue Tracking System that allows quick data entry and reporting. At a minimum, record the error code, the error description, the steps to reproduce it, the affected environments, and its frequency.
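As a rough illustration of the unique-ID idea, here is a minimal Python sketch of an error catalog. The codes, the wording, and the names ErrorEntry, build_catalog, and user_message are all made up for this example; the point is only that every code is unique and that the description leads while the code stays visible but secondary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEntry:
    code: str         # unique ID that support staff can search for
    description: str  # human-readable explanation shown to the user


def build_catalog(entries):
    """Index entries by code and refuse duplicates, so every error stays unique."""
    catalog = {}
    for entry in entries:
        if entry.code in catalog:
            raise ValueError(f"duplicate error code: {entry.code}")
        catalog[entry.code] = entry
    return catalog


# Hypothetical catalog entries, invented for illustration only.
CATALOG = build_catalog([
    ErrorEntry("ORD-1001", "We could not save your order because the connection was interrupted."),
    ErrorEntry("ORD-1002", "The shipping zip code does not match the selected state."),
])


def user_message(code):
    """Lead with the description; keep the code visible but not the focal point."""
    entry = CATALOG[code]
    return f"{entry.description} (Reference: {entry.code})"
```

Rejecting duplicate codes when the catalog is built is one way to keep two different causes from ever sharing the same message.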
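To make the minimum record concrete, here is a small Python sketch of what one tracked report might hold. ErrorReport, IssueLog, and the field names are assumptions for illustration, not the schema of any particular tracking product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ErrorReport:
    code: str                    # unique error code
    description: str             # human-readable description
    steps_to_reproduce: list     # ordered steps, one string per step
    affected_environments: list  # e.g. browser or OS names


class IssueLog:
    """Hypothetical in-memory log that also tracks how often each code occurs."""

    def __init__(self):
        self.reports = []
        self.frequency = Counter()

    def record(self, report):
        self.reports.append(report)
        self.frequency[report.code] += 1

    def top_offenders(self, n=3):
        """Return the n most frequently reported codes with their counts."""
        return self.frequency.most_common(n)
```

Counting frequency per code at the moment a report is recorded is what later lets you rank the biggest offenders without extra bookkeeping.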
Phase-2: Keep the User Calm and his Data Safe
Error messages should always carry a mature and responsible tone. Always use supportive, polite language, like a good teacher would when instructing a pupil.
If the user opts to leave a mandatory field empty, or enters data in the wrong format (CC#, zip, etc.), don’t go ballistic. Non-critical errors deserve non-critical messages. Instead, indicate on the entry form where the problem was, place the cursor in the relevant field, and leave the rest of the data intact. This is especially important for long entry forms that require a lot of effort to complete.
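A minimal sketch of this behavior in Python, assuming a form submitted as a plain dictionary. The field names, the messages, and the validate_form function are hypothetical; the zip pattern covers only US-style 5-digit and ZIP+4 codes.

```python
import re

REQUIRED = ("name", "zip")  # hypothetical mandatory fields


def validate_form(data):
    """Check the form without discarding anything the user typed."""
    errors = {}
    for name in REQUIRED:
        if not data.get(name, "").strip():
            errors[name] = "Please fill in this field."
    zip_code = data.get("zip", "").strip()
    if zip_code and not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
        errors["zip"] = "Please enter a 5-digit zip code, for example 10001."
    return {
        "values": dict(data),               # every entry is handed back untouched
        "errors": errors,                   # polite, field-specific messages
        "focus": next(iter(errors), None),  # first problem field gets the cursor
    }
```

The caller can re-render the form from "values", show each message next to its field, and place the cursor in the field named by "focus".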
Don’t force the user to duplicate entry of some previously supplied data for verification purposes (such as billing and shipping information) as it may introduce human error and trigger him to abandon the application altogether.
Phase-3: Good Error Messages are Clear and Provide Remedies
The way the user perceives the error is much different from the way you do. He thinks in business terms and knows nothing about the inner workings of your application, nor does he care. That’s why you should always design the error UI from the user’s perspective.
Here are the seven golden attributes of error messages:
1. Describe the error in user terms and language
2. Instruct the user as to how to complete the task and resolve the error
3. Explain how to prevent the problem in the future
4. Avoid technical mumbo jumbo and acronyms
5. Avoid modal pop-up error messages and instead write the error directly to the page
6. Provide help links that better explain the nature of the error
7. Keep the text formatting simple and avoid bright colors and animations
When providing a solution, give clear step-by-step instructions as to how to fix the problem. Be specific and do not assume any previous user knowledge. If there is a relevant tutorial or a specific solution in your on-line help, link directly to it.
If it’s a critical problem (for example, the Website is not accessible), provide a mechanism for the user to report the problem to you. Immediately acknowledge receipt of his complaint, then provide an explanation and an estimate of the time it will take to resolve the problem.
Phase-4: Handle Errors Internally
Write code to robustly handle all errors. This will eliminate the most severe and common errors (like missing data or failed validation). One way to achieve this is to automate parts of the data entry instead of relying on the user (e.g. deriving the city name from the zip code).
To the extent possible, take corrective action before an error occurs. For example, if the user is in the middle of a lengthy entry form, save the contents as he moves between fields. This will allow you to restore the information if he inadvertently navigates off the page or even closes his browser session.
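The zip code example could look roughly like the Python sketch below. The ZIP_TO_CITY table and the fill_city function are stand-ins invented for illustration; a real application would query a proper postal data source.

```python
# Hypothetical lookup table; a real application would use a postal data source.
ZIP_TO_CITY = {
    "10001": "New York",
    "60601": "Chicago",
    "94105": "San Francisco",
}


def fill_city(form):
    """Return a copy of the form with the city derived from the zip code when known."""
    filled = dict(form)
    city = ZIP_TO_CITY.get(filled.get("zip", "").strip())
    # Only fill the gap; never overwrite something the user typed deliberately.
    if city and not filled.get("city"):
        filled["city"] = city
    return filled
```

Each field the application can derive on its own is one less field the user can leave empty or mistype.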
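One possible shape for this kind of autosave, sketched in Python with drafts kept as JSON files on the server. DraftStore, its methods, and the file layout are assumptions for illustration, and user_id is assumed to be a simple trusted identifier that is safe to use in a file name.

```python
import json
import tempfile
from pathlib import Path


class DraftStore:
    """Hypothetical store that keeps one JSON draft per user on disk."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, user_id):
        # Assumes user_id is a trusted, file-name-safe identifier.
        return self.directory / f"{user_id}.json"

    def load(self, user_id):
        """Return the saved draft, or an empty draft if none exists."""
        path = self._path(user_id)
        if not path.exists():
            return {}
        return json.loads(path.read_text(encoding="utf-8"))

    def save_field(self, user_id, field, value):
        """Called each time the user leaves a field: merge it into the draft."""
        draft = self.load(user_id)
        draft[field] = value
        self._path(user_id).write_text(json.dumps(draft), encoding="utf-8")


# Usage: a fresh store instance (say, after the browser was closed) still sees the draft.
with tempfile.TemporaryDirectory() as folder:
    DraftStore(folder).save_field("user-42", "name", "Ada")
    DraftStore(folder).save_field("user-42", "zip", "10001")
    restored = DraftStore(folder).load("user-42")
```

Because the draft lives outside the browser session, it can be offered back to the user the next time he opens the form.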
It’s often expensive to identify and address all possible failure cases, but if you have been tracking your top bugs, you can start with the biggest offenders first.
The way you handle and communicate application errors directly reflects on your team’s and your company’s reputation. When building new functionality or reworking existing functionality, don’t assume that the old error messages apply to your new logic and boundaries. Building test cases around various error scenarios (missing data, wrong data, bad data, etc.) and dedicating a test cycle to generating all known error messages is also an excellent strategy.
Error handling and messages should be thought of as a required phase of any feature development, and adequate engineering time for it should be budgeted into all SDLC estimates.
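Such a test cycle can be sketched as a small table of inputs in Python. The quantity field, the messages, and the names check_quantity and run_error_cycle are invented for this example; the idea is that the cycle fails if any case misbehaves and reports any known message that no case managed to trigger.

```python
# Hypothetical messages for a single "quantity" field.
MESSAGES = {
    "missing": "Please enter a quantity.",
    "wrong_type": "Quantity must be a whole number, for example 3.",
    "out_of_range": "Quantity must be between 1 and 99.",
}


def check_quantity(raw):
    """Return the key of the error to show, or None when the input is fine."""
    text = (raw or "").strip()
    if not text:
        return "missing"
    if not text.isdecimal():
        return "wrong_type"
    if not 1 <= int(text) <= 99:
        return "out_of_range"
    return None


# Missing data, wrong data, bad data, and one valid entry.
CASES = [
    ("", "missing"),
    ("   ", "missing"),
    ("three", "wrong_type"),
    ("2.5", "wrong_type"),
    ("0", "out_of_range"),
    ("100", "out_of_range"),
    ("5", None),
]


def run_error_cycle():
    """Run every case and return the known messages that were never triggered."""
    triggered = set()
    for raw, expected in CASES:
        actual = check_quantity(raw)
        assert actual == expected, f"{raw!r}: expected {expected}, got {actual}"
        if actual is not None:
            triggered.add(actual)
    return set(MESSAGES) - triggered
```

An empty result from run_error_cycle() means every known message was generated at least once; a non-empty result points to a message that no scenario covers, which may be stale after the logic changed.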
Real quality of service goes beyond just acknowledging your application’s faults. My rule of thumb is that there is no such thing as an “informative error message”. A good error is one that has been eliminated through error-handling code and through superior product design.
© Copyright 2010 Yaacov Apelbaum All Rights Reserved.