 
 
1.877.COPAN99 (1.877.267.2699)
 

  SIR-M for MAID

 
 
 

Shattering the Virtual Tape Cost Paradigm.
SIR-M Data De-Duplication for MAID


Recovery Point and Time Objectives are key components of any company’s IT tactical plan. The ability to quickly and accurately recover critical data is necessary, whether the cause is an outage, a legal discovery, or a disaster. Virtual tape solutions like those delivered by COPAN Systems have greatly increased a company’s ability to meet these objectives with lightning-fast backup and restore, more reliable access to data, and seamless deployment with existing infrastructure. However, the amount of data one could afford to store on one of these systems has been limited... until now. With Single Instance Repository for MAID (SIR-M), the virtual tape cost paradigm has been shattered, enabling IT organizations to store up to 20 times the data or more in a single, low environmental impact footprint.

What is Data De-Duplication?

Data de-duplication technologies (also known as single instance, common factoring, or capacity optimized storage) strive to reduce the amount of duplicate data being backed up and then stored. The technologies identify and eliminate common data in and across backup streams. Eliminating the common objects reduces the resulting storage requirement. COPAN believes that data de-duplication can provide significant value to customers if the correct approach is taken.

Customer Benefits
The two major benefits of data de-duplication are:

  • A reduction in the data sent across the network
  • A reduction in the end state storage requirements

Simplified Look At Data De-Dupe

However, if not implemented correctly, data de-duplication can create serious customer issues, including:

  • Network performance issues
  • Creation of a single point of failure
  • Potentially unrecoverable data loss

There are three other approaches to architecting data de-duplication into your backup solution. They are described later on this page.

COPAN Systems'
SIR-M + Enterprise MAID
The No Compromise De-duplication

There are two primary ways for a storage device or appliance to perform de-duplication: offline or inline. There are pros and cons to each, but when it comes to enterprise-class operations, COPAN Systems' offline approach is the only way to meet IT objectives. With SIR-M, the de-duplication (data reduction) process can run whenever the IT organization desires, without impacting critical backup windows. COPAN Systems' SIR-M + Enterprise MAID offers the following benefits:

  • More Effective De-duplication
    COPAN Systems' SIR-M is designed from the ground up to meet the needs of the enterprise. Because SIR servers are clustered, the scan for duplicate data spans up to 672 TB of raw capacity, greatly increasing the opportunity for de-duplication. Smaller appliances create "data silos" that can result in many copies of the same data, even after de-duplication is performed. And because the servers are clustered, more servers can be added to increase performance in high capacity configurations.
  • Affordable, Reliable, Electronic Replication for the Masses
    Leave the tape truck at home. With SIR-M and replication, redundant data elements do not need to be sent across the WAN, reducing bandwidth requirements by up to 95%! And with COPAN Systems' "many to one" replication, multiple remote offices can form a "hub and spoke" to the primary site, so small remote offices need not send duplicate data, even if the duplicate is in another office!
  • Unprecedented Density - Data Center in a Box
    By combining de-duplication with COPAN Systems' revolutionary Enterprise MAID architecture, capacities never before realized can be stored in a single 10-square-foot chassis. Up to 10+ PB [1] at prices that rival physical tape enable IT organizations to re-think backup policies, enabling strategic data protection objectives to be met without compromise.
  • Enterprise-Class Availability
    SIR-M standby enables de-duplication to keep functioning, even in the event of a hardware failure. With this optional feature, backups need not be interrupted, and space-saving de-duplication can keep on processing without missing a beat. In the event of an SIR server failure, the standby unit automatically takes over, replacing any server within the cluster.

    [1] Depending on the de-duplication ratio
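
To make the dependence on the de-duplication ratio concrete, here is a rough arithmetic sketch. It assumes, purely for illustration, that the full 672 TB of raw capacity mentioned above is available for unique data and that 1 PB = 1,000 TB. The function names are hypothetical and are not part of any COPAN product.

```python
def effective_capacity_tb(raw_tb: float, dedup_ratio: float) -> float:
    """Logical data that fits in raw_tb at a given de-duplication ratio (e.g. 20 for 20:1)."""
    if dedup_ratio < 1:
        raise ValueError("a de-duplication ratio below 1:1 is not meaningful")
    return raw_tb * dedup_ratio


def ratio_needed(raw_tb: float, target_tb: float) -> float:
    """De-duplication ratio required to hold target_tb of logical data in raw_tb."""
    return target_tb / raw_tb


if __name__ == "__main__":
    raw = 672  # TB of raw capacity (assumed to be fully usable)
    print(effective_capacity_tb(raw, 20))  # logical TB at a 20:1 ratio
    print(ratio_needed(raw, 10_000))       # ratio needed to reach 10 PB
```

Under these assumptions, a 20:1 ratio over 672 TB corresponds to 13,440 TB of logical data, and reaching 10 PB requires a ratio of a little under 15:1.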
Other Approaches to Data De-Duplication

Client Based Approach

Client Based Approach Diagram

The client based approach has a number of merits. First, if you’re interested in a disk-to-disk (D2D) approach rather than a VTL, Symantec and Tivoli offer SIR or SIR-like capabilities within the backup (BU) application. By implementing SIR at the BU client, the customer will dramatically decrease the amount of BU data sent across the network and then stored. In that case, it does not make sense to add another level of SIR to the solution. The client based approach also gives the customer one throat to choke for data recovery: because the BU application is held responsible for SIR, there is no chance of data loss downstream that could cause catastrophic recovery issues.

Appliance Based Approach

Inline Appliance Based Approach Diagram

In this approach, an appliance de-duplicates the data before it is written to storage. The customer implements a SAN based SIR appliance or a series of them. These appliances de-duplicate the BU data stream after it is sent by the client and before it is written to disk. The appliance uses a VTL image to spoof the BU application and computes the common data hash prior to writing to disk. The benefit of this implementation is simply that processing moves from the client to a single purpose appliance. This may be justified if the client compute platform is architecturally set and it is easier to move the compute function to an appliance. The drawbacks of this solution are more significant. First, there is no network savings between the BU client and the appliance. Second, the appliance has a performance limitation tied to its ability to compute the hash, and overall BU write performance will be significantly reduced. Third, de-duplication is separated from the BU application: if data is lost or corrupted within the de-duplication engine, the BU application will have no way to recover.

Storage Platform Based Approach

Storage Based Platform Approach Diagram

In this approach, de-duplication runs as a post BU write process within a storage platform. The drawback of this solution is the separation of de-duplication from the BU application, and the resulting risk of data loss or corruption that is unknown to the BU application. The benefits of this solution are powerful. First, you avoid the BU performance barrier by moving the SIR actions until after the BU application has streamed its write to the storage platform. This allows the full performance benefits of a storage bandwidth centric system to increase BU performance. Second, the compute engine requirements, and thus the cost of the system, are reduced because this method performs the de-duplication/SIR function as a background task outside of the BU window. Once the SIR is complete, the system has minimized the amount of data that would need to be sent to a DR site.
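
The reduction in data sent to a DR site can be illustrated with a minimal sketch. This page does not describe how replication decides what to send, so the approach below (comparing SHA-1 keys against the remote repository and shipping only the missing blocks) is an assumption, and every name in it is hypothetical.

```python
import hashlib


def blocks_to_send(local_blocks, remote_keys):
    """Return {key: block} for blocks whose SHA-1 key the remote site lacks."""
    missing = {}
    for block in local_blocks:
        key = hashlib.sha1(block).hexdigest()
        # Skip blocks the remote already holds, and duplicates within this batch.
        if key not in remote_keys and key not in missing:
            missing[key] = block
    return missing


def replicate(local_blocks, remote_repository):
    """Copy only missing blocks into the remote repository; return bytes transferred."""
    missing = blocks_to_send(local_blocks, remote_repository.keys())
    remote_repository.update(missing)
    return sum(len(block) for block in missing.values())


if __name__ == "__main__":
    hub = {}  # repository at the primary (hub) site
    office_a = [b"alpha", b"beta", b"alpha"]
    office_b = [b"beta", b"gamma"]
    print(replicate(office_a, hub))  # bytes sent by the first office
    print(replicate(office_b, hub))  # second office skips blocks the hub already has
```

In this toy model, the second office transfers only the block the hub has not yet seen, which is the effect a "many to one" replication topology is meant to achieve.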

All de-duplication products do essentially the same thing: look at data in "chunks" and store only a single copy of each unique chunk. A key attribute of our de-duplication technology is that the process runs offline, after the backup completes.  SIR reads virtual tape cartridges from the library, analyzes the contents, and establishes a repository of unique blocks of data.

The original virtual tape cartridge is then replaced in the VTL with something we call a Virtual Index Tape that is only a fraction of the size of the original.
The space previously occupied by duplicate data in the library is then freed to keep much more data online for longer periods of time.

The de-duplication process itself is fairly straightforward. It begins with a module we call the virtual tape scanner reading data from a virtual tape cartridge.

De-Duplication Process

The scanner analyzes the data in variable-sized blocks and uses the industry standard SHA-1 hashing algorithm to calculate an index value based on the contents of the data. The value is then looked up in an index table, which is pre-allocated and structured for fast lookup, to determine if the data is already stored in the repository. If not, the data is placed into the repository and the index table is updated.

In either event, the index value for the data is returned so that the virtual index tape can be constructed. The virtual index tape will occupy only a fraction of the space required for the original virtual tape cartridge since it only contains metadata and repository index pointers.

SIR is backup tape format-aware for maximum data de-duplication efficiency. SIR is not confused by extra information the backup program puts on the virtual tape cartridge. This format-awareness also allows SIR to examine data using different size blocks for different file types to ensure maximum detection of duplicate data.
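
The scan loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not COPAN's implementation: the block-boundary rule is a toy stand-in for the format-aware, variable-sized blocking that SIR performs, a Python dict stands in for both the pre-allocated index table and the repository, and all function names are hypothetical. Only the use of SHA-1 comes from the description above.

```python
import hashlib


def split_variable_blocks(data: bytes, min_size: int = 4, max_size: int = 64, mask: int = 0x0F):
    """Yield variable-sized blocks using a toy content-defined boundary rule (assumption)."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFF
        length = i - start + 1
        # Cut when the rolling value hits the mask pattern, or when the block is too long.
        if (length >= min_size and (rolling & mask) == 0) or length >= max_size:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]


def scan_cartridge(data: bytes, repository: dict) -> list:
    """Return the list of index values for one cartridge, storing unseen blocks."""
    index_tape = []
    for block in split_variable_blocks(data):
        key = hashlib.sha1(block).hexdigest()  # index value derived from block contents
        if key not in repository:              # lookup in the index table
            repository[key] = block            # new data goes into the repository
        index_tape.append(key)                 # returned whether or not the block was new
    return index_tape


if __name__ == "__main__":
    repo = {}
    cartridge = bytes(range(256)) * 4
    first = scan_cartridge(cartridge, repo)
    second = scan_cartridge(cartridge, repo)  # a second copy adds no new blocks
    print(len(first), len(repo), first == second)
```

Scanning the same cartridge contents a second time yields the same list of index values and leaves the repository unchanged, which is why the resulting index tape can be so much smaller than the original cartridge.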

Single Instance Scan:
The file data is extracted and added to the repository. File data is replaced with links to the extracted data.

Single Instance Scan Diagram

After Single Instance Scan:
The shadow virtual tapes contain only the backup metadata with the links to the repository file data entries. The links are the keys to retrieve the data when needed.
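
A minimal sketch of how such links could be followed at restore time is shown below. The shadow tape layout used here (a list of "meta" and "link" entries) and the SHA-1 integrity check on each retrieved block are assumptions made for illustration; the actual on-tape format is not described on this page.

```python
import hashlib


def restore_cartridge(shadow_tape, repository):
    """Rebuild the original byte stream from a shadow tape.

    Each entry is ("meta", bytes) for backup metadata kept on the tape,
    or ("link", key) for a pointer to file data held in the repository.
    """
    out = bytearray()
    for kind, value in shadow_tape:
        if kind == "meta":
            out += value
        elif kind == "link":
            block = repository[value]  # the link is the key used to retrieve the data
            if hashlib.sha1(block).hexdigest() != value:
                raise ValueError("repository block does not match its link")
            out += block
        else:
            raise ValueError(f"unknown shadow tape entry: {kind!r}")
    return bytes(out)


if __name__ == "__main__":
    file_data = b"quarterly report contents"
    key = hashlib.sha1(file_data).hexdigest()
    repository = {key: file_data}
    shadow = [("meta", b"HDR:report.doc;"), ("link", key), ("meta", b";EOF")]
    print(restore_cartridge(shadow, repository))
```

Because every link must resolve for a restore to succeed, the repository becomes the single place where file data lives, which is the data-loss risk the approaches discussed earlier weigh against the space savings.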

After SIR Scan Diagram

Summary
De-duplication technology is still new to the market, but awareness has been raised by the so-called "hype factor" that surrounds bleeding edge technology. Businesses looking to reap the benefits of Single Instance Repository type technology should consider all the options, and their pros and cons, prior to any technology purchase.

 