Backup validation

Backup validation is the process whereby owners of computer data examine how their data was backed up in order to understand their risk of data loss. It also encompasses the optimization of backup processes, charging for those processes, and estimating future requirements, sometimes called capacity planning.

History

Over the several decades leading up to 2005, organizations (banks, governments, schools, manufacturers and others) increased their reliance on "Open Systems" and reduced their reliance on "Closed Systems". For example, 25 years earlier, a large bank might have housed most if not all of its critical data in an IBM mainframe computer (a "Closed System"), whereas today that same bank might store a substantially greater portion of its critical data in spreadsheets, databases, or even word processing documents (i.e., "Open Systems"). The primary problem with Open Systems is their unpredictable nature: an Open System is exposed to potentially thousands, if not millions, of variables, ranging from network overloads to computer virus attacks to simple software incompatibility. Any one of these factors, or several in combination, may result in lost data or compromised backup attempts. Such problems do not generally occur on Closed Systems, or at least not in unpredictable ways. Backups were once a tightly contained affair; today, because of the ubiquity of, and dependence upon, Open Systems, an entire industry has developed around data protection. Three key elements of such data protection are validation, optimization and chargeback.

Validation

Validation is the process of determining whether a backup attempt succeeded, or whether the data is sufficiently backed up to be considered "protected". This process usually involves examining log files, the "smoking gun" often left behind after a backup attempt takes place, as well as media databases, data traffic and even magnetic tapes. Patterns can be detected, key error messages identified and statistics extracted in order to determine which backups worked and which did not. According to the 2014 Veeam Availability Report, organizations test their backups for recoverability on average every eight days. However, each quarter organizations test an average of only 5.26 percent of their backups, meaning that the vast majority of backups are never verified and could therefore fail and cause downtime.
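
As a minimal sketch of log-based validation, the following Python example scans a directory of backup logs and classifies each attempt as succeeded, failed, or unknown. The log directory, file naming and success/error message patterns are assumptions made for illustration; real backup products each emit their own log formats.

    import re
    from pathlib import Path

    # Hypothetical message patterns; real backup software defines its own.
    SUCCESS_PATTERN = re.compile(r"backup .* completed successfully", re.IGNORECASE)
    ERROR_PATTERN = re.compile(r"(error|failed|aborted)", re.IGNORECASE)

    def classify_backup_logs(log_dir):
        """Scan each log file and classify the backup attempt it records."""
        results = {}
        for log_file in Path(log_dir).glob("*.log"):
            text = log_file.read_text(errors="replace")
            if SUCCESS_PATTERN.search(text):
                results[log_file.name] = "succeeded"
            elif ERROR_PATTERN.search(text):
                results[log_file.name] = "failed"
            else:
                results[log_file.name] = "unknown"  # needs manual review
        return results

    if __name__ == "__main__":
        # "/var/log/backups" is a placeholder path.
        for name, status in classify_backup_logs("/var/log/backups").items():
            print(f"{name}: {status}")

In practice such a scanner would feed the extracted statistics (success counts, recurring error messages) into reports, which is how the patterns mentioned above are detected.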

For some backup software, validation consists solely of checking whether the backup file can be read by the backup program. That check is a useful part of validation, but as an entire validation process it is effectively useless.

A proper validation process consists of at least two steps. Validating a backup file is of little or no use unless the file's data is compared with the data of the source. In addition, a backup cannot be considered validated unless it is known with certainty that the backup file can actually restore the source's data.
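
A minimal sketch of these two steps, assuming the backup has first been restored to a scratch location (all paths here are placeholders): it compares SHA-256 digests of the source file and the restored copy, so a match demonstrates both that the backup reflects the source and that the backup can actually be restored.

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Compute the SHA-256 digest of a file, reading in 1 MiB chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                digest.update(chunk)
        return digest.hexdigest()

    def backup_matches_source(source_path, restored_path):
        """True if the restored copy is byte-for-byte identical to the source."""
        return sha256_of(source_path) == sha256_of(restored_path)

    if __name__ == "__main__":
        ok = backup_matches_source("data/ledger.db", "restore_test/ledger.db")
        print("validated" if ok else "mismatch: backup cannot reproduce source")

Comparing against a restored copy, rather than reading the backup file in place, is what distinguishes this from the read-only check criticized above.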

Optimization

Optimization is the process of examining productivity patterns in the backup process to determine where improvements can be made and, often, where certain less important backup jobs may be eliminated entirely.
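
As one hedged illustration, the following Python sketch aggregates per-job success rates and throughput from a job history. The records and their fields are invented for this example; in practice they would come from the backup software's media database or job logs. Jobs with low success rates or poor throughput are natural candidates for tuning or elimination.

    from collections import defaultdict

    job_history = [
        # (job name, succeeded, gigabytes, hours) -- illustrative data only
        ("payroll-db", True, 120.0, 2.0),
        ("payroll-db", True, 121.0, 2.1),
        ("temp-scratch", False, 300.0, 6.0),
        ("temp-scratch", True, 310.0, 6.5),
    ]

    stats = defaultdict(lambda: {"runs": 0, "ok": 0, "gb": 0.0, "hours": 0.0})
    for name, ok, gb, hours in job_history:
        s = stats[name]
        s["runs"] += 1
        s["ok"] += ok
        s["gb"] += gb
        s["hours"] += hours

    for name, s in stats.items():
        success_rate = s["ok"] / s["runs"]
        throughput = s["gb"] / s["hours"]  # GB per hour across all runs
        print(f"{name}: {success_rate:.0%} success, {throughput:.1f} GB/h")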

Chargeback

Very often, the backing up of data is performed by one or more people on behalf of others, the owners of the data. Increasingly, those services are charged back to the data owners. A simple fee per backup might be agreed upon or, as is more often the case, a complex charge based on success rates, speed, size, frequency and retention (how long the copy is kept) is put into place. Usually some form of service level agreement (SLA) exists between the backup service provider and the data owner, specifying what is to be done and how the service is to be charged for.
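
The following Python sketch shows how such a charge might combine size, frequency, retention and success rate. The rates and the formula are invented for illustration; a real SLA would define its own pricing terms.

    # Hypothetical rates; an actual SLA would specify these.
    RATE_PER_GB = 0.02        # storage charge per gigabyte backed up
    RATE_PER_JOB = 1.50       # flat fee per backup run
    RETENTION_RATE = 0.005    # per gigabyte per day the copy is kept

    def monthly_charge(jobs_run, gb_backed_up, retention_days, success_rate):
        base = jobs_run * RATE_PER_JOB + gb_backed_up * RATE_PER_GB
        retention = gb_backed_up * retention_days * RETENTION_RATE
        # Example SLA term: charge only for the fraction of successful jobs.
        return (base + retention) * success_rate

    # e.g. 30 nightly jobs, 500 GB per month, kept 90 days, 95% success
    print(f"${monthly_charge(30, 500.0, 90, 0.95):,.2f}")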
