While everyone seems to agree that data deduplication is a must-have technology for backup data, there is no consensus on the best way to perform it. There is agreement, however, that some approaches are faster and more efficient than others, so it is important to understand the differences before deciding which method best fits your environment and data protection requirements. The main approaches are:
- In-Line Processing: Data deduplication is performed as data flows into the secondary storage system. The CPU-intensive work is done only once, but because it happens during the backup itself, it requires more processing power at ingest and can therefore be slower than other methods. The speed of in-line processing depends heavily on the design of the deduplication algorithm and the hardware on which it runs, so pay special attention to both. With those considerations addressed, in-line processing makes the most efficient use of available disk space of the three approaches.
- Post Processing: Data deduplication is performed after the data is already stored. This demands less processing power during the backup window, since the deduplication can run during off-peak hours. Be aware, however, that this approach makes much less efficient use of disk space, because the full, undeduplicated backup must be written first, and that it can cause bottlenecks when it overlaps with regular backups, replication or disaster recovery procedures.
- Parallel Processing: I/O and deduplication operations are handled simultaneously on one storage platform. Because both compete for the same processing power, each can slow the other, and backups slow down when processing capacity runs short. This approach can also make inefficient use of disk space.
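To make the disk-space tradeoff between the first two approaches concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: the fixed 4-byte chunk size, the SHA-256 fingerprints, and the function names `inline_peak_bytes` and `post_process_peak_bytes` are all hypothetical choices made for readability.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, hypothetical chunk size to keep the example readable


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def inline_peak_bytes(stream: bytes) -> int:
    """In-line: fingerprint each chunk on arrival and store only new ones."""
    seen = set()
    stored = 0
    for chunk in chunks(stream):
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(chunk)
    return stored  # usage only ever grows, so the final value is the peak


def post_process_peak_bytes(stream: bytes) -> int:
    """Post-processing: land the full stream first, deduplicate later."""
    return len(stream)  # the whole backup sits on disk until the later pass


backup = b"AAAABBBBAAAACCCCAAAA"
print(inline_peak_bytes(backup), post_process_peak_bytes(backup))  # prints: 12 20
```

In this toy stream, three of the five chunks are unique, so the in-line path never needs more than 12 bytes, while the post-processing path must hold all 20 bytes until its deduplication pass runs. Real products use far larger chunks, and often variable-size ones, but the shape of the tradeoff is the same.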
While tradeoffs exist for each method above, in-line processing has an edge when it comes to accelerating backups and optimizing disk space. For small and medium-sized enterprises (SMEs) in particular, fast backups and highly efficient disk use are typically top-ranked priorities. For that reason, Overland Storage has embraced the in-line processing approach for its just-released REO 9500d deduplicating VTL appliance.
Overland has leveraged the deduplication algorithm from Diligent Technologies and optimized the REO platform for use with this technology. Because searches for duplicate data are done against a small and efficient memory-resident index, deduplication operations are very fast when compared with other methodologies. As a result, the REO 9500d delivers best-in-class data deduplication while providing simplified, affordable long-term data retention on disk.
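The general idea of looking up duplicates in a memory-resident index can be sketched in a few lines of Python. This is a generic hash-index illustration only, and it makes no claim about how the Diligent algorithm or the REO 9500d actually works: the `DedupStore` class, its methods, and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy chunk store with an in-memory fingerprint index (hypothetical)."""

    def __init__(self):
        self.index = {}   # fingerprint -> position in self.blocks (kept in RAM)
        self.blocks = []  # stands in for the unique chunks written to disk

    def write(self, chunk: bytes) -> int:
        """Store a chunk if it is new; return the position it can be read from."""
        fingerprint = hashlib.sha256(chunk).digest()
        pos = self.index.get(fingerprint)  # in-memory lookup, no disk search
        if pos is None:
            pos = len(self.blocks)
            self.blocks.append(chunk)
            self.index[fingerprint] = pos
        return pos

    def read(self, pos: int) -> bytes:
        """Return the chunk stored at the given position."""
        return self.blocks[pos]


store = DedupStore()
refs = [store.write(c) for c in (b"alpha", b"beta", b"alpha")]
restored = [store.read(r) for r in refs]
print(refs, len(store.blocks))  # prints: [0, 1, 0] 2
```

The duplicate `b"alpha"` chunk is detected with a single dictionary lookup and stored only once, yet all three chunks can still be restored from their references. The smaller the index entry per chunk, the more backup data a given amount of memory can cover.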
--
Jeffrey Graham
Senior Product Manager