You can blame the Egyptians for our need to make daily or more frequent data backups. When the Egyptian monarchy introduced documents written in ink on papyrus around 3000 B.C., it also invented the need to keep backup copies of those documents.
Stone tablets and cave paintings hadn’t required backups. But fragile papyrus didn’t last long in humid environments, and as the use of papyrus spread through the world (at least until the Chinese invented paper), keeping documents safe became an expensive, time-consuming and difficult necessity. First sealed in clay jars and later encased in wax, backup copies remained a constant chore as documents migrated to paper, then to electronic dots and dashes, and finally to the binary ones and zeros used in today’s computers.
Still, given that this is a 5,000-year-old technology, the hardware choices available to today’s practitioners are remarkably simple. Data is backed up to one of three types of media: magnetic media such as tape, hard drives and magnetic disk; laser optical media such as CDs and DVDs; and “hard copies” printed on paper. As it happens, the key to effective backups isn’t the hardware or the medium, but the software and policies that are used.
Defining Data Storage
The term backup refers to the process of making a storage copy of a file or document. But in reality, this term covers not one but five separate and distinct kinds of data storage:
• Data archiving is the process of retaining business data for future reference. Bank statements and accounting data, legal documents, correspondence and other documentation are retained to comply with legal requirements, or for reference as needed, for a fixed number of years (usually five or more). An archive is a single version of a file or group of files meant for long-term storage. Archiving is usually done manually, but may be managed by software.
• E-mail archiving may serve a similar function, but it is even more important as a tool for keeping electronic mail systems from being overwhelmed by the sheer volume of data. Old messages and attachments may be moved off the primary e-mail server and stored in compressed form until needed.
• Document control is the process of keeping works in progress available until the finished work is complete. Draft documents, for example, may be kept for immediate reference and then deleted once the final draft is approved. This kind of data storage differs in that it is generally done on-site and is short-term in nature.
• Data synchronization is a form of document control in which data is “synchronized,” or kept identical, on two or more devices. Synchronization is used to export data from a desktop to a laptop, for example, or from an e-mail program to a PDA.
• Data backup & restoration is the process of retaining and storing files and documents in a format from which they may be restored to the original format in the event the original data is lost, destroyed or corrupted. Unlike an archive, a backup is often only an incremental record of the changes made to a file over a relatively short time period.
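To make the difference between a full and an incremental backup concrete, here is a minimal Python sketch of an incremental backup. It assumes that a file's modification time is a reliable signal that the file changed; the function name `incremental_backup` and the folder layout are hypothetical, and commercial products may track changes in other ways, such as file checksums.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, last_backup_time: float) -> list:
    """Copy only the files under `source` modified after `last_backup_time`.

    `last_backup_time` is a POSIX timestamp (seconds since 1970). Passing 0
    copies everything, which amounts to a full backup. Returns the relative
    paths of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            rel = path.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)  # recreate subfolders
            shutil.copy2(path, target)  # copy2 also preserves the file's timestamps
            copied.append(rel)
    return copied
```

In use, a scheduler would record the time of each successful run and pass it in on the next run, so each backup captures only what changed since the last one.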
How It’s Done
The first computers stored data as holes punched in standard-sized cards, but storage quickly moved to the first magnetic tape system (1955) and the first “hard drive” (1957). By the time accountants brought about the PC revolution in the early 1980s, tape backup devices and software to back up data on floppy diskettes were in use. And by the late 1980s, hard drives were the device of choice.
All of these systems suffered from three flaws. First, they took a long time to complete a backup. To avoid corrupting the data during the process, all work on the files being backed up had to stop for the hours it took to copy the data. Second, the ability to restore the data was uncertain. Corruption of the data during backups was common, as were failures during the restoration process. Computer users took to keeping several generations of their backups so that if the current backup failed, restoring an earlier version would at least recover some of the lost data.
Third, the backup systems were expensive. Even with the falling prices for data storage and the introduction of data compression algorithms, the cost of a reliable backup system remained relatively high if all of the computers in a business were to be adequately covered.
A variety of solutions emerged to address these flaws. The software became simpler to operate, offering automatic, timed backups that did not require human intervention, so backups could be scheduled overnight when the offices were not open for business. Data compression, backup and restore software became more reliable, so failures were less frequent. And with the continued fall in the price of storage, it became feasible to maintain multiple backups both on site and in remote locations.
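Keeping several generations of backups is easy to automate. The Python sketch below assumes, hypothetically, that each backup set is stored as a folder named `backup-YYYYMMDD-HHMMSS`, so that sorting the folder names also sorts the sets by age; the helper `prune_generations` keeps the newest few sets and deletes the rest.

```python
import shutil
from pathlib import Path


def prune_generations(backup_root: Path, keep: int = 3) -> list:
    """Keep the `keep` newest backup sets under `backup_root`; delete older ones.

    Assumes (hypothetically) that every backup set is a directory named
    backup-YYYYMMDD-HHMMSS, so alphabetical order is also chronological order.
    Returns the names of the deleted sets, oldest first.
    """
    if keep < 1:
        raise ValueError("keep at least one generation")
    generations = sorted(
        p for p in backup_root.iterdir() if p.is_dir() and p.name.startswith("backup-")
    )
    expired = generations[:-keep]  # everything except the newest `keep` sets
    for old in expired:
        shutil.rmtree(old)
    return [p.name for p in expired]
```

Setting `keep` to three, for instance, preserves the current backup plus two fallbacks in case the newest one turns out to be unreadable.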
One gigabyte (GB) of storage, which cost $500,000 in 1980, is less than $1 today.
But the declining cost of storage came with its own set of problems. With data storage relatively cheap, technologists began to look toward a “paperless office” in which all documentation would be communicated, used and stored digitally. Today, the average storage space for a single user (the user’s “data store”) is over 10GB, and the data storage requirements of an average company are growing by 39 percent each year.
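A 39 percent annual growth rate compounds quickly. Assuming, purely for illustration, that the rate holds steady, storage needs double every ln 2 / ln 1.39 years, or about 2.1 years. The short Python sketch below computes that figure and projects a hypothetical 100GB company data store five years out.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for storage needs to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def projected_storage(current_gb: float, annual_growth: float, years: int) -> float:
    """Storage needed after `years` of compound growth from `current_gb`."""
    return current_gb * (1 + annual_growth) ** years


# 0.39 is the article's 39 percent figure; 100GB is a hypothetical starting point.
print(f"{doubling_time(0.39):.1f} years")           # 2.1 years
print(f"{projected_storage(100, 0.39, 5):.0f} GB")  # 519 GB
```

In other words, if that growth rate continued, a backup system sized for today's data could need roughly five times the capacity within five years.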
Worse yet, today’s computers were never meant to juggle the demands of hundreds of megabytes (MB) of data on 200GB drives. Neither the File Allocation Table (FAT) system used to manage files under DOS and Windows nor the NT File System (NTFS) introduced in Windows NT is up to the demands of tomorrow’s storage requirements. FAT32, for example, is limited in practice to 32GB volumes, the largest that Windows will format. NTFS, which can handle volumes measured in terabytes (TB), has other problems when storage devices are shared across multiple systems.
That’s why WinFS (Windows Future Storage, the file system to be included in Microsoft’s forthcoming Longhorn operating system) will manage data in a relational database rather than a flat data file format. Details are still sketchy, but it is safe to say that the file structure shown in Windows Explorer under this new system won’t be how the data is actually stored. And that may mean a complete rethinking of backup and restoration systems.
The just-completed acquisition of Groove Networks by Microsoft was mainly an effort to beef up Microsoft’s collaboration tools so that it can more directly challenge IBM in the collaboration market. But the acquisition will also give Microsoft a formidable set of archiving and backup solutions that could mean tough competition for stand-alone software and hardware vendors.
Finding The Best System
The backup, synchronization and archiving industry has been forced to make major strides in innovation while balancing the needs of large enterprises against those of the small and midsize business (SMB) market into which most accounting firms fall. The result is that these products are better than ever before, but still a little short of perfect.
The primary complaints from users? Too expensive, too complicated and too awkward.
Backup and archiving solutions are most commonly sold as complete systems, with hardware and software matched for optimal performance. While this adds important reliability, it also means the systems must be custom-built in relatively short production runs. They can’t be cheaply mass-produced, which tends to keep prices higher. It’s the cost of reliability.
This custom configuration also leads to some useful features, like internal timers for backups and single-button operation. But many of the systems on the market remain kludgy and complicated, particularly for the majority of accounting firms and companies that don’t have a full-time IT department.
Likewise, many systems remain awkward to use, with manual tape or drive swapping and extensive human intervention that should have been innovated out of the systems long ago.
Nor is it easy to define which of the dozens of systems on the market today qualify as “the best,” because that rating will differ from one kind of firm to another. Small accounting firms have needs that differ widely from those of an enterprise, and firms involved in financial planning or other securities-based services must meet strict data protection requirements that other firms don’t.
The vendor listings that accompany this article on page 70 do a good job of surveying the field and assessing the variety of vendors. Beyond that, there are 10 features to look for as you select a system:
• The right tool for the job. Combination backup/synchronization/archiving software may offer the benefit of simplicity — only one system to purchase and learn. At the same time, however, it is often the case that these “all-in-one” solutions are excellent at one task but mediocre at the others. Tape and multi-drive backup devices, for example, are less attractive as long-term archiving solutions and make almost no sense for synchronization. Look for simple solutions that best fit your needs in each different category.
• Setup by wizard. Setting up the backup system should be almost painless. It should be done via a wizard that installs any necessary software, prompts for each setup decision (or automates them with defaults at a single click) and checks the operation of the system as the final step. The system should operate on a “set and forget” basis that requires no further human intervention.
• Automatic backups & synchronization. Human intervention should not be required when the system is in use. The system should automatically back up or synchronize the data on any schedule set by the system administrator or office manager, compress and encrypt the data if preferred, verify the integrity of the backup copy and store it. If the system does not have this feature, bypass it.
• Single-button backups and synchronization. There are times when a backup or “synch” is needed on the fly, and this process should also be uncomplicated. You should be able to press one button and have it done. If the system doesn’t have this feature, don’t waste time with it.
• Single file accessibility. Backing up a terabyte of data is a useless exercise if you have to weed through the whole backup to find one bit of information. More advanced systems enable the user to find the file(s) needed and restore them individually.
• Archive accessibility. Likewise, the primary purpose of an archive is to let users locate and use a single file or group of files. An archive system that does not do this, or that is complicated to use, should be bypassed in favor of a better system.
• Security of stored data. Data has never been at greater risk, particularly financial data. Strong security, including powerful encryption, restricted access and protection from physical threats, should be the minimum level of security offered.
• Non-proprietary media & exportability. One of the traps used by less-than-benevolent vendors is proprietary storage media or proprietary file formats. These are used not to benefit the user but to let the vendor extract larger sums for the system and to keep users from switching to another system. They also mean that if the vendor goes out of business, the company is stuck. Refuse to use proprietary systems whenever possible.
• Backups during backups. Backups take time, especially for larger companies. In the time it takes to perform the backup, changes in the data may be at risk. For 24-hour operations, the software must have a means to ensure that changes made while the backup is in progress are captured. One way to do this is through “snapshot” technology that takes a “picture” of the system at the beginning of the backup and tracks all the changes made from the start of the snapshot. Other systems perform continuous backups so that the system is protected both at set intervals and between those intervals, though such continuous operation is more expensive.
• Document control. Some systems automatically track drafts of documents through to the final draft. This is handy if the drafts are used by a number of authors, or when a document is cycled for approvals through a number of geographically separated offices.
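Several of the features above (automatic compression, integrity checks and single-file restores) can be illustrated with Python's standard library alone. The sketch below is a minimal, hypothetical backup job, not any vendor's product: it writes a gzip-compressed tar archive, records a SHA-256 checksum for every file, re-reads the archive to verify those checksums, and restores one file by name. Encryption is deliberately left out because the standard library does not include a cipher; a real system would add one through a vetted library, and scheduling would be handled by a tool such as cron or Windows Task Scheduler. All function names here are illustrative.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_member(tar: tarfile.TarFile, name: str) -> Optional[bytes]:
    """Return the bytes of one archived file, or None if it is missing."""
    try:
        member = tar.extractfile(name)
    except KeyError:  # name is not present in the archive
        return None
    return None if member is None else member.read()


def create_backup(source: Path, archive: Path) -> Dict[str, str]:
    """Write a gzip-compressed tar of `source`; return {relative path: SHA-256}.

    `archive` must live outside `source`, or the job would try to back up itself.
    """
    manifest = {}
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                rel = path.relative_to(source).as_posix()
                manifest[rel] = sha256_hex(path.read_bytes())
                tar.add(str(path), arcname=rel)
    return manifest


def verify_backup(archive: Path, manifest: Dict[str, str]) -> bool:
    """Re-read every file from the archive and compare it with its recorded checksum."""
    with tarfile.open(archive, "r:gz") as tar:
        for name, digest in manifest.items():
            data = read_member(tar, name)
            if data is None or sha256_hex(data) != digest:
                return False
    return True


def restore_one(archive: Path, name: str, dest: Path) -> Path:
    """Restore a single file by name instead of extracting the whole archive."""
    with tarfile.open(archive, "r:gz") as tar:
        data = read_member(tar, name)
    if data is None:
        raise FileNotFoundError(f"{name} is not in {archive}")
    target = dest / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Keeping the checksum manifest separate from the archive is what lets the job detect a corrupted backup before it is needed, rather than during a failed restore.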
The Next Generation
Technology is in a state of constant change, and the technologies of backup, archiving and synchronization are no exception. Here are just a few of the ways in which these technologies are evolving in 2005:
• Archiving: Think blue. For archiving, one of the best strategies may be to use a DVD-based storage system. With the new generation of “blue laser” DVD recorders (using Blu-ray or HD DVD technologies) set to enter the market by next year, the storage capacity of a double-sided DVD will be greater than 20GB, or about 10 million pages of text. This provides a low-cost, simple archiving medium that is less susceptible to age, humidity or light.
• Archiving: Remember the Internet. Today’s web browsers let users read almost any document format, including *.PDF and spreadsheet files. This means that you can safely archive documents you know you will want to access on a web site on the company intranet or on the public Internet. Users can refer to the documents but not alter them, much like the tax forms and instructions the IRS posts on its web sites. It is a low-cost solution whose only major drawback is the manpower required to index and post the files. Web sites also provide an excellent way to synchronize data between devices.
• Backup: Remember the Internet. The newest generation of backup-and-restore systems consists of web-based services that store the information on remote servers. With compression and encryption built in, these services can offer an accessible, low-cost, off-site solution.
• Synchronize: Think small. Moving data from one device to another doesn’t need to be a struggle. With new USB-connected drives (or storage devices such as a keychain “jump” drive or SD card), files can be copied from one machine, carried to another and installed in under a minute. It’s easier and faster than burning the files to a CD or DVD, and the amended file can be copied back to the original computer simply by reversing the process. USB jump drives, SD cards and CF cards can offer a gigabyte of storage for synchronization for about $75. External USB hard drives vary in price by capacity, but are inexpensive.
• Synchronize: Walk & talk. Don’t forget that PDAs and “smart” cell phones also make handy synchronization systems for an individual. PDA phones equipped with SD cards can hold documents, presentations and databases for downloading into other devices, and offer automated and one-button synchronization with the desktop.
• Synchronize: Be a collaborator. The concept of synchronizing is oh-so-Nineties. In the new millennium, collaboration is the key. Microsoft is already pounding this message, and so is IBM. Others are quickly following. The idea is that documents should be placed in an electronic “workspace” where they can be accessed, shared, altered and stored by any member of the team. Microsoft Office has these tools built into the Infospace application, as does IBM’s Lotus Notes.
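The jump-drive approach to synchronizing, in which files are carried to another machine and amended files are copied back by reversing the process, can be sketched as a two-way copy in which the newer version of each file wins. This is a minimal Python illustration under stated assumptions: `sync_newer` is a hypothetical helper, modification times are trusted, deletions are not propagated, and a file edited on both machines keeps only the later edit. Dedicated synchronization tools handle such conflicts more carefully.

```python
import shutil
from pathlib import Path


def sync_newer(a: Path, b: Path) -> list:
    """Two-way sync between folders `a` and `b`; the newer copy of each file wins.

    Returns (direction, relative path) pairs for every file copied, where
    direction is "to_a" or "to_b". Deletions are not propagated, and files
    with identical modification times are treated as already in sync.
    """
    names = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    names |= {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    actions = []
    for rel in sorted(names):
        pa, pb = a / rel, b / rel
        if not pb.exists() or (pa.exists() and pa.stat().st_mtime > pb.stat().st_mtime):
            src, dst, direction = pa, pb, "to_b"
        elif not pa.exists() or pb.stat().st_mtime > pa.stat().st_mtime:
            src, dst, direction = pb, pa, "to_a"
        else:
            continue  # same modification time on both sides
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # keeps the timestamp, so the next run sees the copies as equal
        actions.append((direction, rel.as_posix()))
    return actions
```

One design caveat: drives formatted with FAT record modification times only to the nearest two seconds, so a production tool would compare timestamps with some tolerance rather than exactly.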