Tuesday, 25 October 2011

Compression with Deduplication


Compression and deduplication have long served two different groups of users: compression mostly at the end-user level (individuals compressing files and other desktop data), and deduplication mostly at the enterprise level (as part of backup workflows on servers).

Compression, when combined with deduplication, can deliver some amazing results in terms of compression ratio and compression speed. Compression reduces the size of data by assigning shorter codes to frequently occurring patterns, and deduplication removes duplicate data from the data set so that each piece is stored only once.

Let's take a typical data sample with a few Word documents, presentations, images and some other project data (code, drawings, designs, etc.). Over the course of a project, the same files are modified many times, and more often than not several versions of the same file are kept, perhaps under different file names, because we don't want to lose the changes made along the way. Redundancy can exist within a single document as well (e.g. the same image used in several places in a presentation or PDF document).
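To make the "same file under different names" case concrete, here is a minimal Python sketch that groups the files in a project folder by a digest of their content. It is only an illustration: the function name find_duplicate_files and the choice of SHA-256 are assumptions made for this example, not a description of how DZO works. It also catches only byte-identical copies, not files that are merely similar.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Group the files under `root` by the SHA-256 digest of their content.

    Returns a list of groups; each group is a list of paths whose
    content is byte-for-byte identical. Files without a duplicate
    are left out.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reading the whole file at once keeps the sketch short;
            # very large files would be hashed in pieces instead.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicate_files("my_project") on a folder that holds plan_v1.txt and an unchanged copy named plan_final.txt would return one group containing both paths, regardless of the file names.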

After the completion of the project, or during its course, we would like to archive it for later reference. To do that, we would most probably compress it and store it somewhere. Normal compression products such as WinZip, WinRAR or 7-Zip would definitely give some compression on the project data. But wait, let us go back a little: the project folder has a lot of similar files because of the multiple versions made during the project. Do any of the above compression products do anything specific about this? The answer is NO.

Is it not wise to remove the duplicate data first and then compress only the unique data? The answer is definitely YES, all the more so when we know that the data set contains duplicate or similar files. To address this deficiency, Essenso Labs has introduced DZO, a new compression-cum-deduplication product. It first applies deduplication to remove duplicate data and then applies compression, achieving 3 times better compression at 3 times better speed on most day-to-day data sets (of course, the improvement depends on the redundancy in the data).
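The "deduplicate first, then compress only the unique data" idea can be sketched in a few lines of Python. This is a simplified illustration, not DZO's actual algorithm: it assumes fixed-size 4096-byte blocks, SHA-256 digests and the standard-library lzma module, and the names dedup_then_compress and restore are made up for this example.

```python
import hashlib
import lzma

BLOCK_SIZE = 4096  # illustrative choice, not a DZO parameter


def dedup_then_compress(data, block_size=BLOCK_SIZE):
    """Split `data` into fixed-size blocks, keep one copy of each
    distinct block, and LZMA-compress only those unique blocks.

    Returns (compressed_unique_blocks, recipe), where `recipe` holds,
    for every block of the input, the index of the unique block that
    carries its content.
    """
    index_of = {}  # digest -> position in `unique`
    unique = []    # distinct blocks in order of first appearance
    recipe = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in index_of:
            index_of[digest] = len(unique)
            unique.append(block)
        recipe.append(index_of[digest])
    return lzma.compress(b"".join(unique)), recipe


def restore(compressed, recipe, block_size=BLOCK_SIZE):
    """Rebuild the original bytes from the compressed unique blocks."""
    stream = lzma.decompress(compressed)
    # A short final block can only be the last unique block, so
    # slicing the stream at block_size recovers every unique block.
    blocks = [stream[i:i + block_size]
              for i in range(0, len(stream), block_size)]
    return b"".join(blocks[i] for i in recipe)
```

For an input made of the same four blocks repeated five times, the recipe has twenty entries but only four unique blocks reach the compressor, and restore returns the original bytes. Fixed-size blocks are the simplest choice; they miss duplicates that are shifted by an insertion, which is one reason real deduplication products use more elaborate, content-aware schemes.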

When the data set is non-redundant (i.e. it contains only unique files with no redundancy), DZO delivers the same compression as LZMA. Thus in the worst case it matches LZMA, and in better cases DZO compresses the data to unprecedented levels and with incredible speed, depending on the amount of redundancy in the data.
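The worst-case reasoning can be checked with a small Python sketch. It assumes a simple block-level deduplication stage (4096-byte blocks, SHA-256 digests) in front of the standard-library lzma module; the helper unique_blocks is hypothetical and only stands in for a deduplication step. A real product also stores some bookkeeping data of its own, so the equality below holds for this sketch only.

```python
import hashlib
import lzma


def unique_blocks(data, block_size=4096):
    """Return `data` with every repeated fixed-size block dropped."""
    seen = set()
    kept = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            kept.append(block)
    return b"".join(kept)


# Eight blocks that all differ: nothing for deduplication to remove.
non_redundant = b"".join(i.to_bytes(4, "big") * 1024 for i in range(8))

# The deduplication stage passes the data through unchanged ...
assert unique_blocks(non_redundant) == non_redundant
# ... so the compressor sees the same input as plain LZMA would.
assert lzma.compress(unique_blocks(non_redundant)) == lzma.compress(non_redundant)
```

With no repeated blocks the deduplication stage is a pass-through, which is why the result cannot be worse than LZMA alone, apart from any bookkeeping overhead.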

Results on a typical architectural data set.



There is a twofold advantage in using DZO. First, the excellent compression saves precious disk space (which is not so precious now, but still not free) and, when such data is transferred over the internet (perhaps to a different office location or into the cloud), it saves very precious bandwidth (which is definitely not cheap and not available all the time).

The other benefit is the resulting time saving: first from compressing the data faster, and then from transferring a compressed data set that is smaller than the original.

DZO is a cutting-edge data management product which incorporates the Positions Encoded Data technique and a new way of calculating digests with a content-aware data model.

Read the DZO whitepaper at http://essensolabs.com/EssensoLabs-Dzo-WhitePaper.pdf

DZO can be downloaded from http://essensolabs.com

