Recovery point objective

A recovery point objective, or “RPO”, is defined by business continuity planning. It is the maximum targeted period in which data might be lost from an IT service due to a major incident. The RPO gives systems designers a limit to work to. For instance, if the RPO is set to four hours, then in practice, off-site mirrored backups must be continuously maintained – a daily off-site backup on tape will not suffice. Care must be taken to avoid two common mistakes around the use and definition of RPO. Firstly, business continuity staff use business impact analysis to determine RPO for each service – RPO is not determined by the existent backup regime. Secondly, when any level of preparation of off-site data is required, rather than at the time the backups are offsited, the period during which data is lost very often starts near the time of the beginning of the work to prepare backups which are eventually offsited.

When computers used for normal business services are affected by a "major incident" that cannot be fixed quickly, then the Information Technology Service Continuity (ITSC) Plan is performed, by the ITSC recovery team. This plan will always assume that the production computing equipment and the wider geographic location they normally reside at might become completely out of bounds at an unpredictable time, without any warning. The location chosen to rebuild the service (the recovery site) must be distant (for example, at least 10 miles) from the normal Production site and suffer no threats in common with the production site (e.g. they should not be near the same coastline). The ITSC Plan must also satisfy two measurements- the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO) for any potentially affected services. These measures are determined by a team of people, called the Business Continuity (BC) team, that quantifies what losses might ensue if the services are not available. It is sobering to think that "potential loss of life" appears in far more IT service risk assessments than one might assume. The RTO and RPO are time intervals, typically expressed in number of hours, specified by the BC team to be the longest time the business can allow for without incurring significant risks or significant loss, allowing system designers to specify designs that are as cost effective as the RTO and RPO will permit.

The RTO is the amount of time the business can be without the service, without incurring significant risks or significant losses. The events that mark the start and end of the RTO duration must be pre-agreed between Business Continuity and ITSC staff. It is best to agree to start the RTO clock at the moment when it is decided to proceed with the recovery. Sometimes too much time is taken over the decision to invoke recovery, sometimes Major Incidents do not start at easily definable wall-clock times anyway. The RTO clock should be deemed to stop once the team responsible for testing the service (before it is successfully released to the wider user community) begin work. By defining the RTO in this way it can be set to a very specific time period, which allows better decision making at all levels- accepting that this compromises a little the principle of setting the RTO to be "the amount of time the business can be without the service".

...
Wikipedia