Planning for Failure

Posted on June 12, 2015
by Doug Klugh

Quality software is built around the expectation of failure. To deliver reliable software, you must always plan on things breaking. In designing and building software for critical systems, such as air traffic control or nuclear power plants, runtime reliability is absolutely critical. And while human life may not hang in the balance of a business application, the life of the business may.

Exceptional situations that are beyond our control, such as hardware failures, do occur. While we cannot predict when they will happen, we must always plan for them. Building reliable software requires anticipating problems and designing solutions to handle them.

Exceptions

An exception is an event that suspends normal program execution. The event may be a system error (such as a numeric error or a buffer overflow) or a process error (such as a customer being denied credit). Ideally, the program should be able to respond to the exception. Even in the case of a severe error, such as a hardware failure, the program should degrade gracefully and continue running with reduced capability.

When an error occurs, raising (or throwing) an exception alerts the system to the condition. The response to that condition is called handling the exception. System exceptions are usually raised by the operating system or the runtime environment. Process exceptions are defined and raised within the application. Reliable software handles these exceptions before they propagate to the underlying environment.
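As a minimal sketch of the distinction above, the Python below handles one process exception raised by the application and one system exception raised by the runtime. The names `CreditDeniedError`, `approve_credit`, `handle_request`, and `safe_average`, and the 600 threshold, are hypothetical and chosen only for illustration.

```python
class CreditDeniedError(Exception):
    """Hypothetical process exception: defined and raised by the application."""


def approve_credit(score: int, minimum: int = 600) -> str:
    # The application itself decides that this condition is exceptional.
    if score < minimum:
        raise CreditDeniedError(f"score {score} is below minimum {minimum}")
    return "approved"


def handle_request(score: int) -> str:
    # Handle the process exception before it reaches the environment.
    try:
        return approve_credit(score)
    except CreditDeniedError as exc:
        return f"denied: {exc}"


def safe_average(total: float, count: int) -> float:
    # The runtime raises ZeroDivisionError (a system exception) when count is 0.
    try:
        return total / count
    except ZeroDivisionError:
        return 0.0
```

In both cases the handler sits close to the caller that knows what a sensible response is; nothing is left to fall through to the interpreter.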

Handling Exceptions

How we handle exceptions has a tremendous impact on the quality of our solutions. If an error occurs, it is not good practice to simply ignore the exception; some action should always be taken (even if it’s just to log the error). When an exception is raised, there are several courses of action that can be taken:

Abandon the execution of the unit
Retry the operation
Use an alternative approach
Repair the cause of the error

The first course of action (abandonment) is appropriate if the current unit is unable to continue processing. For a system exception, this may mean that a peripheral device has failed; for a process exception within a business application, it may mean that a compliance regulation cannot be satisfied.

Rather than abandon execution, an operation can be repeated after the exception is raised. While it may be appropriate in some cases to retry the operation until it succeeds, in most cases there should be a limit on the number of retries attempted. Examples include a failed attempt to establish communication with a peripheral device or the failure to obtain a valid user password. In each case, repeated attempts may be the best response.
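A bounded retry can be sketched as follows. This is an illustrative helper, not a prescribed implementation; `DeviceUnavailableError`, `retry`, and the default of three attempts are assumptions made for the example.

```python
import time


class DeviceUnavailableError(Exception):
    """Hypothetical system exception: the peripheral did not respond."""


def retry(operation, max_attempts: int = 3, delay_seconds: float = 0.0):
    """Call operation() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeviceUnavailableError as exc:
            last_error = exc
            # Pause between attempts, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # The limit was reached: give up and let the caller decide what to do.
    raise last_error
```

Only the exception type that is known to be transient is retried; anything else propagates immediately, so unrelated bugs are not masked by the loop.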

Another possible response to an exception is to try an alternative approach. If an attempt to establish a communication link with a satellite fails, a redundant path may provide an alternate route. For a business process, obtaining payment from a customer may include trying alternative payment methods.
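The payment case can be sketched as an ordered list of alternatives that are tried in turn. The names `PaymentDeclinedError` and `charge_with_fallback` are hypothetical, and each payment method is assumed to be a plain callable that raises when it declines.

```python
class PaymentDeclinedError(Exception):
    """Hypothetical process exception: a payment method refused the charge."""


def charge_with_fallback(amount, methods):
    """Try each (name, charge) pair in order; return the name that succeeded."""
    failures = []
    for name, charge in methods:
        try:
            charge(amount)
            return name
        except PaymentDeclinedError as exc:
            # Record the failure and move on to the next alternative.
            failures.append(f"{name}: {exc}")
    # Every alternative failed, so the exception is raised to the caller.
    raise PaymentDeclinedError("; ".join(failures))
```

The order of `methods` encodes the business preference, so changing the fallback policy does not require changing the handler.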

The final response to an exception is to repair the cause of the error. In a control system, for example, a valve may indicate an unsafe pressure level. A response to this condition may be to initiate other controls to lower the pressure. Or if invalid user input is captured within a business application, such as letters in a field that expects a number, a routine may be called to strip out the letters.
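The input-repair case might look like the sketch below. The function name `parse_quantity` is hypothetical, and the repair policy (keep only ASCII digits) is an assumption; a real application would take that rule from the business model.

```python
def parse_quantity(raw: str) -> int:
    """Parse a whole number, repairing input that contains stray characters."""
    try:
        return int(raw)
    except ValueError:
        # Repair: keep only the ASCII digits and try again.
        digits = "".join(ch for ch in raw if ch in "0123456789")
        if not digits:
            # Nothing to repair; re-raise the original exception.
            raise
        return int(digits)
```

Repair is only attempted after the normal path fails, and the original exception is re-raised when the repair cannot produce a usable value.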

Defining Exceptions

While exception handling is traditionally left to programmers, allowing this practice is a mistake. Burying it in implementation code hides critical parts of the process and costs agility, reuse, and shared best practices. By identifying exceptions in the business model, then defining the handlers during analysis, risks can be identified early in the process and potential issues can be averted. Not surprisingly, UML and BPMN provide notations for intermediate events, a critical piece missing from traditional flowcharting. If you first consider exceptions while writing code, you’re almost guaranteed to have something break.

Defining Business Transactions

An important part of exception handling is defining appropriate business transactions. A transaction is a collection of activities that must be performed atomically: all activities must complete successfully or the system must return (roll back) to its initial state. Classic system transaction protocols lock all of the resources required to perform each activity until the transaction is either committed or rolled back. After a rollback, it is as if none of the activities had ever occurred. While this works well for computer systems, business transactions can last from a few seconds to a number of days, which is too long to lock resources. This means you can’t truly restore the initial state as if nothing ever happened.
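The classic all-or-nothing behavior can be seen in a small sketch that uses Python's standard `sqlite3` module, where a connection used as a context manager commits on success and rolls back if an exception escapes the block. The `account` table, its `CHECK` constraint, and the `transfer` function are hypothetical and exist only for this example.

```python
import sqlite3


def open_accounts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> None:
    # Both updates are committed together, or neither is applied.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        # Violates the CHECK constraint (sqlite3.IntegrityError) on overdraw,
        # which rolls back the credit applied just above.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))
```

This works because the whole transaction lasts milliseconds. A business transaction that spans hours or days cannot hold a database transaction open for that long, which is why it needs a different recovery mechanism.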

If a business activity cannot complete successfully, completed activities must be reversed and uncompleted activities must be abandoned. You cannot simply remove a debit that is no longer needed from a ledger; you must apply a credit in the same amount. While this is not a rollback (as if the debit had never been entered), it has an equivalent business effect.
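The ledger example can be sketched as an append-only structure in which a reversal is a new entry rather than a deletion. The `Entry` and `Ledger` classes here are hypothetical and deliberately minimal.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    kind: str  # "debit" or "credit"
    amount: Decimal
    memo: str


@dataclass
class Ledger:
    entries: list = field(default_factory=list)

    def debit(self, amount: Decimal, memo: str) -> Entry:
        entry = Entry("debit", amount, memo)
        self.entries.append(entry)
        return entry

    def reverse(self, entry: Entry) -> Entry:
        # Compensate with an opposite entry; the original is never removed.
        opposite = "credit" if entry.kind == "debit" else "debit"
        reversal = Entry(opposite, entry.amount, f"reversal of: {entry.memo}")
        self.entries.append(reversal)
        return reversal

    def balance(self) -> Decimal:
        signed = (e.amount if e.kind == "credit" else -e.amount for e in self.entries)
        return sum(signed, Decimal("0"))
```

After a reversal the balance returns to its earlier value, but the ledger still shows both the debit and the credit, which is the difference between a compensation and a true rollback.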

As an example, when booking a trip online, you would not want to secure a hotel reservation for specific dates if you were unable to secure an airline ticket for the same dates. By booking the airline and hotel together as a single transaction, you are guaranteed that your reservations coincide. If either the hotel or airline reservation cannot be completed for the selected dates, the system should signal an exception which would reverse (or cancel) any reservations already made. This type of scenario should always be accounted for when modeling business processes.
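One way to sketch the trip booking is to pair every reservation with the action that cancels it, and to run the cancellations in reverse order when a later step fails. The names `ReservationError` and `book_trip` are hypothetical, and the reserve and cancel steps are assumed to be plain callables supplied by the caller.

```python
class ReservationError(Exception):
    """Hypothetical process exception: a reservation could not be made."""


def book_trip(steps):
    """Run each (reserve, cancel) pair in order; undo completed ones on failure."""
    cancels = []
    try:
        for reserve, cancel in steps:
            reserve()
            # Remember how to undo this step only once it has succeeded.
            cancels.append(cancel)
    except ReservationError:
        # Reverse the completed reservations, most recent first.
        for cancel in reversed(cancels):
            cancel()
        raise
```

The exception is re-raised after the cancellations so that the caller still learns the trip was not booked.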

Defining Process Exceptions

Exceptions are a powerful construct that can help define and implement business processes. While all processes have alternative paths, exceptions can be used to identify the events that initiate those paths. Exception handlers can then be defined to resolve the conditions that led to those events.

When realizing use cases, exceptions can be used to initiate alternative paths. If a process cannot be completed, or if a condition exists because one or more business rules have been violated, an exception can be raised to alert the system to the condition. For example, when defining an authentication method within a User class, an exception could be raised when authentication fails.
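A minimal sketch of such a User class follows. The `AuthenticationError` type and the method names are assumptions made for illustration, and the password handling (salted PBKDF2 from the standard library) is only there to keep the example self-contained.

```python
import hashlib
import hmac
import os


class AuthenticationError(Exception):
    """Hypothetical process exception: the supplied credentials did not match."""


class User:
    def __init__(self, username: str, password: str):
        self.username = username
        self._salt = os.urandom(16)
        self._digest = self._hash(password)

    def _hash(self, password: str) -> bytes:
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), self._salt, 100_000
        )

    def authenticate(self, password: str) -> None:
        # Raise rather than return False, so the caller must choose a path.
        if not hmac.compare_digest(self._digest, self._hash(password)):
            raise AuthenticationError(f"authentication failed for {self.username}")
```

Raising an exception here marks the start of the alternative path (failed login) in the same way the use case does, instead of hiding it in a return value that callers may forget to check.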

Exception handlers ensure that appropriate steps are taken to recover from an exception. These steps should always be defined within the business model and should be implemented within the control classes. While exceptions should be raised within entity classes, these classes should never implement exception handlers. Exception handlers are defined by business rules and must always be implemented as such.
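The split between entity and control classes might be sketched as follows. `Account`, `WithdrawalController`, and `InsufficientFundsError` are hypothetical names, and the handler's policy (record the event and decline) stands in for whatever the business model actually specifies.

```python
class InsufficientFundsError(Exception):
    """Hypothetical process exception raised by the entity."""


class Account:
    """Entity class: enforces its own rule and raises, but never handles."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise InsufficientFundsError(
                f"requested {amount}, available {self.balance}"
            )
        self.balance -= amount


class WithdrawalController:
    """Control class: implements the handler defined in the business model."""

    def __init__(self, account: Account, audit_log: list):
        self.account = account
        self.audit_log = audit_log

    def withdraw(self, amount: int) -> str:
        try:
            self.account.withdraw(amount)
        except InsufficientFundsError as exc:
            # Assumed recovery step: record the event and decline the request.
            self.audit_log.append(str(exc))
            return "declined"
        return "ok"
```

Keeping the handler out of `Account` means the recovery policy can change without touching the entity.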

Conclusion

Many people mistake exceptions for errors. While an exception may be raised because of an error, an exception is an event that can occur under various conditions within a software system. Those conditions may result from business processes or runtime operations. Delivering reliable software depends on identifying those conditions and defining the corrective actions that resolve them.

Tags: error, exception, exception handling, quality, transaction

Doug Klugh

Doug is an experienced software development leader, engineer, and craftsman having delivered consumer and enterprise firmware/software solutions servicing more than one billion users through 20+ years of leadership.
