Working Toward a Managed, Mature Business Continuity Plan

Author: Sven De Preter, CDPSE, CISSP, CompTIA Cloud+, CompTIA Net+, CompTIA Sec+
Date Published: 30 April 2021

Business continuity is defined as having the right tools in place to make sure that an organization can continue to function during an interruption of one or more of its mission-critical functions.

Consider the example of an earthquake that causes massive damage to the majority of an organization’s infrastructure. As a result, the organization can no longer do business until the infrastructure is restored and new equipment purchased and installed.

In this example, having an accurate and up-to-date offsite copy or backup of the inventory can enable the organization to file an insurance claim, place orders for new hardware and software, and determine the resources needed to rebuild the infrastructure in the cloud. However, infrastructure is not the only thing that may need to be rebuilt. The organization may need to reconstruct its data center and would therefore need to determine which people and skills are required, what processes are in place and how they work, what data are needed, whether the data can be restored, how much rack space and power are needed, and in what order systems should be restored.

The organization also needs to consider what vendors to use and whether it needs alternate work locations. It must also determine whether people were on the premises when the earthquake struck, whether anyone was hurt and how that affects the organization.

These details are important to consider when dealing with business continuity and disaster recovery.

International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) standard ISO/IEC 22301:2019 Security and resilience—Business continuity management systems (BCMS)—Requirements specifies how to design and implement a BCMS.¹ Having the necessary procedures, processes, activities, assets and decision-makers in place to proactively and reactively return to business as usual is key to success.

It is essential for organizations to work toward a well-managed and mature business continuity plan or program to keep mission-critical business functions running. After all, those critical functions are what generate revenue.

The COVID-19 crisis is an important example of the necessity of business continuity. Many organizations have had a hard time reinventing themselves. They had good protection against a variety of threats, did vulnerability management, considered geopolitical issues, and evaluated reduced income and other risk areas. Many thought the odds of a pandemic were quite low, accepted the risk and later found that they were not prepared.

Often, frameworks can be used to help achieve organizational goals. A framework can be thought of as a toolbox accompanied by a manual that describes the tools, their use cases, and how and when to use them.

Not everything can be implemented at the same time due to cost limitations, resource constraints and lack of knowledge. In addition, requirements may change over time.

However, if an organization uses a structured, planned, managed and well-documented approach, things will become easier and more efficient. Different visualizations of the model can be created using a framework.

A framework can help an organization build a successful business continuity and disaster recovery plan based on organizational requirements by providing a methodology, guidance and tools.

Plan-Do-Check-Act

One of the most important concepts regarding business continuity is the Plan-Do-Check-Act (PDCA) Cycle (figure 1), also known as the Deming Cycle. One of the main reasons this cycle is so important is that it covers every aspect of a plan, program, procedure or project, including planning and defining a scope, implementing measures, checking to see if these measures achieve the goals defined during the planning phase, and acting accordingly. This last phase will drive a new planning cycle, resulting in a gradual and steady approach that continuously improves or matures the process.

This basic model can be expanded to fit the specific needs of an organization in the context of business continuity planning.

Corporate Hierarchy

Before creating a business continuity plan, it is important to understand that there are different hierarchical levels within an organization’s structure (figure 2). People at each level are trying to achieve the same corporate goals, but each level has a different specialty.

It is likely that the strategic layer will not contain the most tech-savvy people. The technical layer will likely include people who have no real in-depth knowledge of all the aspects of running and maintaining an enterprise, but they do have the technical skills to get things implemented. The operational layer acts as the bridge between the strategic and the technical layers.

It is important to understand what those at each layer do and for what they are responsible.

Strategic Layer
In general, the main goal of the strategic layer, populated by the board of directors (BoD) and the chief executive officer (CEO), is to provide the direction of the organization. They think and dream about the future and how to take the organization to the next level while keeping risk and budgets within acceptable levels. They have a clear view of what keeps the organization viable and the potential negative events that may strike. They drive the organization’s business continuity efforts, provide guidance, and set requirements for the operational and technical layers.

For example, the CEO of an accounting firm may have done some math indicating that if the organization is not able to service its clients for longer than one month, the organization will need to file for bankruptcy as client claims will become too costly.

At first glance, this may seem like an exaggeration, but the key point here is that there can and probably will be events and risk scenarios that force an organization to close. Therefore, it is necessary to evaluate an organization’s critical business functions in terms of risk and decide how to best avoid, prevent and mitigate the risk. As there can be budget and knowledge constraints, not everything may be implemented immediately. When this is the case, priorities must be defined or additional resources must be acquired.

This also implies that once prioritization occurs, the organization must accept that it will still be vulnerable in terms of risk items with lower priorities.

Operational Layer
The CEO cannot do everything alone; therefore, the people in the operational layer are often asked to help. After all, the C-level executives populating this layer are experts in their fields, and they can provide valuable insights about detecting, mitigating and preventing risk and how to respond when a risk actually materializes.

Another thing C-level executives do is transform the objectives set by those in the strategic layer into something more usable. If the CEO says, “Due to the implications of privacy laws, we must protect customer data in the best way possible to avoid fines that can put us out of business,” the C-level executives take that statement and try to develop a plan. In this example, the chief marketing officer (CMO) would likely look into how client communications are affected, while the chief information officer (CIO) would likely look at how data are stored, where they are stored and what the options are on a technical level. The chief legal officer will probably look at requirements defined by the law or legislation and look at the risk of noncompliance. The chief information security officer (CISO) or data protection officer (DPO) may also use words and phrases such as:

Training—“We must train our users on the acceptable use of the data.”
Awareness—“How are we going to make sure our users understand why they cannot send those data?”
Monitoring—“How can we check that everyone complies?”

Technical Layer
Once the plan is designed, the activities are performed by staff in the technical layer. Based on the hardware, software and applications used, the controls defined in the plan are implemented. A successful business continuity plan cannot be built without understanding the organizational hierarchy and roles and responsibilities of the staff.

From Policy to Implementation

Once organizational hierarchy is established, objectives and restrictions should be analyzed to create an overarching policy, followed by a plan to implement that policy.

It is important to note the different components of the plan at the management/operational layer. If an organization does not identify what it has, it cannot protect itself, nor can it detect unwanted changes, issues, events or incidents and respond appropriately. And if its assets are not identified and documented, how could they be recovered?

The US National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF)² uses different functions and categories of activities that should take place when building a program. The functions are identify, protect, detect, respond and recover. Looking at the functions, it is clear that protect aims to prevent risk from materializing, while detect, respond and recover deal with the issues that come up once the risk does materialize.

For each function, NIST CSF provides documentation on how these functions map to other NIST Special Publications. For example, figure 4 illustrates the identify function.

The CSF also lists informative references, which are

[S]pecific sections of standards, guidelines, and practices common among critical infrastructure sectors that illustrate a method to achieve the outcomes associated with each Subcategory. The Informative References presented in the Framework Core are illustrative and not exhaustive. They are based upon cross-sector guidance most frequently referenced during the Framework development process.³

Reviewing and understanding the listed informative references can provide a deeper insight into what is expected and how these goals can be met.

NIST Special Publication (SP) 800-53 Rev. 5 covers security and privacy controls for information systems and organizations.⁴ It documents every control in terms of control objectives, supplemental guidance and control enhancements, and can be used in conjunction with any framework. Control objectives are goals that need to be achieved. Supplemental guidance provides additional information on the control and sometimes also refers to other controls. Control enhancements strengthen a control or add value to it.

Another combination that is commonly used is ISO/IEC 27001:2013 Information technology—Security techniques—Information security management systems—Requirements⁵ and ISO/IEC 27002:2013 Information technology—Security techniques—Code of practice for information security controls.⁶ ISO 27001 defines the framework, and ISO 27002 defines the controls that can be used.

There are many resources available to help map controls from one framework to another.⁷

Business Continuity and Disaster Recovery

Business continuity focuses on addressing problems before they become real-life issues. It stretches well beyond infrastructure, as it also incorporates alternate ways of working, managing succession and handling the protection of people. Disaster recovery is a part of business continuity that mainly focuses on technology and hardware. The main goal of disaster recovery is to rebuild or recover an organization’s site, infrastructure and data after a disruptive event. When a disaster occurs, resulting in the destruction of mission-critical infrastructure or data, there are two metrics that are important:

The recovery point objective (RPO) defines how much data loss the organization can survive. The RPO is measured in units of time. For example, an organization may be able to afford to lose one day of data. However, if the organization loses more data, it may be at risk of not being able to survive.
The recovery time objective (RTO) defines the amount of time the organization has to perform the restore operation. For example, the impacted service must be restored within four hours or the entire organization is at risk of financial damage, reputational damage, claims from customers and legal consequences.
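
The two metrics can be checked against an incident timeline. The following is a minimal Python sketch, not a prescribed method: the function names and the timeline are hypothetical, and the thresholds are the one-day RPO and four-hour RTO from the examples above.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the data created since the last backup fits within the RPO."""
    return incident - last_backup <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if the service was restored within the RTO."""
    return restored - incident <= rto


# Hypothetical timeline, for illustration only.
incident = datetime(2021, 4, 30, 9, 0)
last_backup = datetime(2021, 4, 29, 23, 0)  # 10 hours before the incident
restored = datetime(2021, 4, 30, 14, 30)    # 5.5 hours after the incident

rpo_ok = meets_rpo(last_backup, incident, rpo=timedelta(days=1))
rto_ok = meets_rto(incident, restored, rto=timedelta(hours=4))
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this timeline the backup is recent enough to satisfy the RPO, but the restore takes longer than the RTO allows, so the recovery procedure would need attention even though the backup schedule does not.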

The simplest plan would have the following steps.

Scope and Initiation (Prepare)
During this phase, decisions need to be made on what will be assessed. In the case of business continuity, the activities that should be prioritized are those that keep the organization running. Dependencies on assets also need to be charted. An asset is defined as anything that brings value to the organization. The reason for mapping assets to the activities is that when something happens to an asset, it can impact the entire organization. For example, consider a generic sales activity that generates 90 percent of an organization’s income by means of an online sales platform. If that platform went offline, it would impact the entire sales process.

Business Impact Analysis (Plan)
This can be seen as a three- or four-step exercise. The first step is the risk assessment. For the risk assessment, the organization should construct some form of risk register. The risk register is used to document threats and vulnerabilities for all the assets involved in the organization’s activity as defined during the scope and initiation phase.

The next step is to add a likelihood and impact to each risk. This can be done using a quantitative or qualitative approach. The qualitative version defines impact and likelihood of risk in terms of high, medium and low, making it a subjective approach.

The quantitative approach uses percentages and monetary values and results in a potential loss projection, making it a more objective approach.

This allows a risk score to be added to the risk matrix that is being developed. The annual loss expectancy (ALE) should be calculated or estimated if possible. The ALE is an important value that defines the amount of money the organization can spend on the controls it would like to implement. There is no use in spending US$200 to protect US$5.
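
The text does not give a formula for the ALE. A commonly used quantitative calculation multiplies the single loss expectancy (asset value times exposure factor) by the annual rate of occurrence. The Python sketch below assumes that formula; the function names and all figures are hypothetical and only illustrate how the ALE can cap spending on a control.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """Expected loss from one occurrence (exposure_factor is between 0 and 1)."""
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, annual_rate_of_occurrence: float) -> float:
    """Expected loss per year for one risk."""
    return sle * annual_rate_of_occurrence


def control_is_justified(ale_before: float, ale_after: float,
                         annual_control_cost: float) -> bool:
    """A control pays off only if it removes more expected loss than it costs."""
    return (ale_before - ale_after) > annual_control_cost


# Hypothetical figures: a US$100,000 asset, 25 percent damaged per event,
# one event expected every two years.
sle = single_loss_expectancy(100_000, 0.25)
ale_before = annual_loss_expectancy(sle, 0.5)
# Assumed effect of the control: one event every eight years instead.
ale_after = annual_loss_expectancy(sle, 0.125)

print(ale_before, ale_after)
print(control_is_justified(ale_before, ale_after, annual_control_cost=3_000))
print(control_is_justified(ale_before, ale_after, annual_control_cost=15_000))
```

With these numbers the control removes US$9,375 of expected annual loss, so a US$3,000 control is worth considering and a US$15,000 control is not.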

When a qualitative assessment is used, it becomes hard to estimate the ALE. The best option is to prioritize risk in terms of the final score.

Once each risk is scored, the risk items can be prioritized by sorting them according to risk score. Once this is done, the risk treatment/strategy can be defined. Some risk is unwanted and must be treated, while other risk may be accepted. This is a strategic decision.
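
A qualitative risk register with scoring and prioritization could look like the Python sketch below. The mapping of low/medium/high to 1/2/3, the likelihood-times-impact score and the register entries are assumptions made for illustration; organizations define their own scales.

```python
from dataclasses import dataclass

# Assumed numeric scale for the qualitative levels.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    asset: str
    threat: str
    likelihood: str  # "low", "medium" or "high"
    impact: str      # "low", "medium" or "high"

    @property
    def score(self) -> int:
        """Risk score as likelihood times impact on the assumed scale."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


# Hypothetical register entries.
register = [
    Risk("office printer", "hardware failure", "high", "low"),
    Risk("online sales platform", "prolonged outage", "medium", "high"),
    Risk("customer database", "data exfiltration", "low", "high"),
]

# Highest score first; entries with equal scores keep their register order.
prioritized = sorted(register, key=lambda risk: risk.score, reverse=True)
for risk in prioritized:
    print(risk.score, risk.asset, "-", risk.threat)
```

The sorted list is only an input to the treatment decision. Which risk is treated and which is accepted remains a strategic choice.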

Once decisions are made on what risk should be accepted and what risk needs to be addressed, business continuity program and security program efforts can begin.

Note that this is a process that is key to the survival of the organization. It is imperative to make the right choices and take the right actions, guaranteeing the continuity of the organization as the term “business continuity” implies. The policy defines the high-level objectives and the direction set by the strategic business layer.

Recovery Strategies and Continuity Development (Do)
A plan must be built with specific actions to take to comply with the policy. In other words: “What do we need to make sure we reduce the likelihood or the impact of the negative outcome?”

Incidents usually occur in three ways as illustrated by figure 5:

Something breaks, but business functions are not impacted.
Something breaks, and business functions are impacted.
Something breaks, and business functions come to a halt.

When discussing the assets that can be damaged, the most commonly mentioned are the triangle of people, products and processes. However, an asset is anything that provides value to the organization, so why are data and information not included?

Data are probably among the most important assets that need to be protected. Two different directions to protecting data are illustrated in the pyramid in figure 6.

The first direction to take is working on securing processes, products and people. The data are protected as well as the assets that work with the data.

The second direction is to start with a data-centric model. In such a model, the people, processes and products should be designed to provide a protection level required by the data.

Privacy law is an example of this. Specific categories of personally identifiable information (PII) require special or additional protective measures in order to protect the data.

Once the views are established, a plan/program can be designed.

When combining the categories defined in figure 3 and the people/process/product/data from figure 6, a matrix can be designed as shown in figure 7.

The matrix uses the categories defined in figure 3 and actions that must be taken for the people, processes, products and data columns. However, data also need to be considered, as illustrated in figure 6.

The generic controls (figure 7) represent a more traditional approach. When looking at these generic controls and comparing them to the NIST CSF, it may seem generic controls allow more granular control. However, the NIST CSF includes the generic controls as well.

For example, the following actions can be prioritized to protect proprietary data from being exfiltrated and sold to a competitor (which could potentially hurt the organization):

Identify people who work with these data.
Protect the data using encryption.
Detect changes to the data or detect unauthorized access.
Respond to security violations.
Recover the data from backups.
Train users on how to work with the data and the products the data are used on.
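
The actions listed above can be recorded as a small control set and checked for coverage of the five NIST CSF functions. This Python sketch is one possible representation, not the author's matrix from figure 7. Filing the training action under protect is an assumption of the sketch, based on the CSF placing awareness and training within its Protect function.

```python
from collections import defaultdict

# The five NIST CSF functions named in the text.
CSF_FUNCTIONS = ("identify", "protect", "detect", "respond", "recover")

# (function, control requirement) pairs taken from the list above.
control_set = [
    ("identify", "Identify people who work with these data"),
    ("protect", "Protect the data using encryption"),
    ("detect", "Detect changes to the data or detect unauthorized access"),
    ("respond", "Respond to security violations"),
    ("recover", "Recover the data from backups"),
    # Assumption: training is grouped under protect in this sketch.
    ("protect", "Train users on how to work with the data and the products"),
]


def group_by_function(controls):
    """Group control requirements by CSF function."""
    grouped = defaultdict(list)
    for function, requirement in controls:
        grouped[function].append(requirement)
    return grouped


def uncovered_functions(controls):
    """Return the CSF functions that have no control requirement yet."""
    covered = {function for function, _ in controls}
    return [f for f in CSF_FUNCTIONS if f not in covered]


grouped = group_by_function(control_set)
gaps = uncovered_functions(control_set)
print(dict(grouped))
print("Functions without controls:", gaps)
```

A gap list like this says only that a function has no requirement written down. It says nothing about whether the requirements that exist are sufficient.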

At this point, the operational layer is not yet telling the technical layer how to perform these actions; they are simply stating what needs to be done. In essence, they are creating the control sets. These control sets will then be reviewed by the technical staff to determine what the implementation options are. Once implemented, the controls can be audited and monitored to prove compliance. Guidance on the control families and controls that can be used can be found in the NIST CSF or NIST SP 800-53. Guidance can also be found in ISO/IEC 27002:2013 Information technology—Security techniques—Code of practice for information security controls.⁸

Implementation and Testing (Do)
As the plan is presented to the technical layer, those stakeholders must think about how they are going to implement the control requirements as set by the operational/management layer.

Testing of the controls must be done. After all, if they are not tested, they do not exist.

Testing allows an organization to verify and validate the implemented controls, which, in turn, provides assurance that everything will work as planned at the time it is most needed.

Monitoring and Maintenance (Check)
Monitoring and maintenance of the controls, as well as the entire risk environment, must be done because new risk scenarios may arise. Also, already detected risk that was unimportant earlier may have become more critical now.

As controls are monitored, key performance indicators (KPIs) can be bound to them. These KPIs could be:

What workforce percentage has received training?
How many incidents occur per day?
How many of those get resolved within 24 hours?
What is the average central processing unit (CPU), disk and memory load?
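
Some of these KPIs can be computed directly from simple records. The Python sketch below uses hypothetical training counts and incident timestamps, and the function names are illustrative rather than taken from any tool.

```python
from datetime import datetime, timedelta


def training_coverage(trained: int, workforce: int) -> float:
    """Percentage of the workforce that has received training."""
    return 100.0 * trained / workforce


def resolved_within(incidents, limit=timedelta(hours=24)) -> float:
    """Percentage of incidents resolved within the limit.

    Each incident is an (opened, resolved) pair; resolved is None
    while the incident is still open.
    """
    if not incidents:
        return 0.0
    on_time = sum(
        1 for opened, resolved in incidents
        if resolved is not None and resolved - opened <= limit
    )
    return 100.0 * on_time / len(incidents)


# Hypothetical data for illustration only.
day = datetime(2021, 4, 30)
incidents = [
    (day.replace(hour=8), day.replace(hour=9)),    # resolved in 1 hour
    (day.replace(hour=10), day.replace(hour=22)),  # resolved in 12 hours
    (day.replace(hour=11), day + timedelta(days=3)),  # took almost 3 days
    (day.replace(hour=15), None),                  # still open
]

coverage = training_coverage(trained=180, workforce=240)
on_time_rate = resolved_within(incidents)
print(f"Training coverage: {coverage:.1f}%")
print(f"Resolved within 24 hours: {on_time_rate:.1f}%")
```

Tracking the same calculation over successive periods is what turns these numbers into indicators of progress.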

These metrics can be indicators of progress and, at times, even precursors to other events, meaning they could be used for many purposes, even planning capacity upgrades.

Feedback (Act)
The feedback loop is there to consolidate all the lessons learned and to provide them as input during the next planning cycle.

Expanding NIST SP 800-37

Business continuity efforts can also be demonstrated using NIST SP 800-37 Risk Management Framework for Information Systems and Organizations.⁹ The basic approach of NIST SP 800-37 can be expanded, as it provides a clear picture of what happens behind the scenes in this risk-based life cycle model. If all the aforementioned items are incorporated, the initial form of the risk management framework can be reshaped as illustrated in figure 8.

Conclusion

The domain of business continuity goes well beyond the protection of infrastructure. It is an organizational effort that must be initiated and led by senior management, as they are ultimately accountable for what happens within the organization. They are responsible for assigning budgets and resources, and they are in the best position to provide high-level guidance on what needs to be done. It is up to security managers, privacy managers and risk professionals to provide them with the best advice possible so they can make well-informed decisions. Using a framework can help by giving the organization the tools for documenting its processes and building a good plan in the best way possible.

A good business continuity plan can help an organization survive in times of trouble. Not having one will most likely cause more trouble, as it can lead to an uncoordinated approach to fixing the problem, which can inadvertently cause additional damage. A business continuity plan must be reviewed, monitored, tested and updated for the best chance of success. It is like creating data backups. The only assurance that a backup is successful comes when the restore process is used to recover the data.

This is why it can be useful to expand on the risk-based approach from NIST SP 800-37. It is a comprehensive framework that allows continuous monitoring and tuning of controls. It is a never-ending process of defining an organization’s current state and desired state to detect and fill potential gaps, as not everything can be implemented at once, resulting in multiple planning and budgeting cycles to bridge gaps.

Endnotes

¹ International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC), ISO/IEC 22301:2019 Security and resilience—Business continuity management systems—Requirements, Switzerland, 2019, http://www.iso.org/standard/75106.html
² National Institute of Standards and Technology (NIST), Cybersecurity Framework, USA, http://www.nist.gov/cyberframework
³ National Institute of Standards and Technology, Cybersecurity Framework Version 1.1, USA, April 2018, http://www.nist.gov/cyberframework/framework
⁴ National Institute of Standards and Technology, Special Publication (SP) 800-53 Security and Privacy Controls for Information Systems and Organizations, USA, September 2020, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r5.pdf
⁵ International Organization for Standardization/International Electrotechnical Commission, ISO/IEC 27001:2013 Information technology—Security techniques—Information security management systems—Requirements, Switzerland, 2013, http://www.iso.org/standard/54534.html
⁶ International Organization for Standardization/International Electrotechnical Commission, ISO/IEC 27002:2013 Information technology—Security techniques—Code of practice for information security controls, Switzerland, 2013, http://www.iso.org/standard/54533.html
⁷ National Institute of Standards and Technology, NIST SP 800-53 Revision 5 Control Mappings to ISO/IEC 27001, USA, http://csrc.nist.gov/CSRC/media/Publications/sp/800-53/rev-5/final/documents/sp800-53r5-to-iso-27001-mapping.docx
⁸ Op cit ISO/IEC 27002:2013
⁹ National Institute of Standards and Technology, SP 800-37 Revision 2 Risk Management Framework for Information Systems and Organizations, USA, December 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-37r2.pdf

Sven De Preter, CDPSE, CISSP, CompTIA Cloud+, CompTIA Net+, CompTIA Sec+

Is a senior network and system administrator for an organization that owns and runs concert venues and theaters in Belgium and a ticket-selling organization with a full customer service center. During his more than 20 years at the organization, he has gained experience in the fields of event and incident management, change management, operational team lead, data center virtualization (using VMware), connectivity, and architecture. He has also worked on different aspects of the corporate privacy program, providing advice on several policies, procedures and guidelines. He is also one of the founders of CertificationStation.org, which is a free platform where people studying for security certifications can discuss topics and training materials.

Published in ISACA Journal, 2021, Volume 3.