TL;DR summary
- The data deduplication feature in Windows Server is free and easy to turn on.
- It works effectively on unencrypted file backups (SharePoint, OneDrive for Business) made with BackupAssist 365, but not with encrypted file backups or mailbox backups.
- User reports show space savings of typically 20% or more.
Quick primer: What is NTFS data deduplication and how does it work?
Data deduplication is a feature built into Windows Server since Windows Server 2012. It reduces the storage space used by files by storing only a single copy of data that the files have in common.
Several great articles have been written about what it is and how to turn it on. Here is our pick of articles for different OS versions: Windows Server 2019, Windows Server 2016 and Windows Server 2012.
It is important to note:
- It is not real-time. It is a “post-process”, which means deduplication happens as a scheduled task. (This is similar to how you can defragment your disk by running defrag.exe as a scheduled task.)
- After each deduplication run, some space is freed up and reclaimed.
- Deduplication is on a per-volume basis.
- Applications such as BackupAssist 365 do not need to know whether deduplication is turned on. They read and write files as normal.
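To make the points above concrete, here is a minimal Python sketch of one volume where files are first written normally and are only deduplicated later, when a scheduled optimization pass runs. It is a simplified model, not how Windows implements the feature: it assumes fixed-size 4 KiB chunks identified by SHA-256 hashes, whereas Windows uses variable-size chunks and its own chunk store. The class and method names (`Volume`, `optimize`, `bytes_used`) are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks keep the sketch simple


class Volume:
    """Toy model of a single volume with post-process deduplication."""

    def __init__(self):
        self.raw_files = {}    # path -> bytes, stored as ordinary (unoptimized) files
        self.chunk_store = {}  # sha256 digest -> chunk bytes, one copy per unique chunk
        self.optimized = {}    # path -> ordered list of chunk digests

    def write(self, path, data):
        # Not real-time: new writes land as ordinary files until the next pass.
        self.optimized.pop(path, None)
        self.raw_files[path] = data

    def read(self, path):
        # Applications get the same bytes whether or not the file was optimized.
        if path in self.raw_files:
            return self.raw_files[path]
        return b"".join(self.chunk_store[d] for d in self.optimized[path])

    def optimize(self):
        # The "scheduled task": split each raw file into chunks, keep one copy of each.
        for path, data in list(self.raw_files.items()):
            digests = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                digest = hashlib.sha256(chunk).digest()
                self.chunk_store.setdefault(digest, chunk)
                digests.append(digest)
            self.optimized[path] = digests
            del self.raw_files[path]

    def bytes_used(self):
        # Ignores cleanup of chunks that are no longer referenced by any file.
        raw = sum(len(d) for d in self.raw_files.values())
        return raw + sum(len(c) for c in self.chunk_store.values())


# 16 KiB of sample data with no repeated chunks inside the file itself.
report = b"".join(i.to_bytes(4, "big") for i in range(4096))
vol = Volume()
vol.write("backup1/report.docx", report)
vol.write("backup2/report.docx", report)  # identical copy from a later backup
before = vol.bytes_used()
vol.optimize()
after = vol.bytes_used()
print(before, after)  # prints: 32768 16384
```

Space is only reclaimed once `optimize()` runs, and `read()` returns identical bytes before and after, which is why applications such as BackupAssist 365 carry on unaffected.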
How well does it work with BackupAssist 365 backup data?
BackupAssist 365 backs up cloud data, such as Office 365, to local storage. How effective is deduplication on different kinds of backup data?
Type of backup data | Expected effectiveness of deduplication | Why? |
---|---|---|
Mailboxes | Low | Mailbox backups modify the PST files daily, so the files never stay unchanged long enough for Windows to deduplicate them. |
Unencrypted files | High | Most file backups remain unchanged, so deduplication works well. (If an incremental backup changes only 0.5% of your file set, then 99.5% remains untouched and is a candidate for deduplication.) |
Encrypted files | Low | The same file, encrypted multiple times, should produce different encrypted data each time, so it can never be deduplicated. (See the full explanation below.) |
Results “in the field”
The results you see will depend on the nature of your data. We are happy to share feedback from one of our experienced MSPs, who reports an average saving of 22% across mailbox and file data.
We would like to thank Jeff Vandervoort from Crescent IT Systems for sharing his results.
The client’s data is a bit more PST-heavy than normal, and the PSTs are not deduplicated. PSTs can be deduplicated in principle, but because their datestamp changes daily, they are effectively excluded: deduplication here only processes files older than 1 day (the default is 5 days). So the PSTs never get old enough for deduplication to touch. Still, 22% is a welcome saving on a small server where space is tight.
In a normal system with more file data than mail data, you would generally save more. This is compression on their pre-SharePoint file server (no PSTs, all file data):
40-45% is pretty typical for file shares, in my experience.
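The minimum-file-age rule described above explains why PST backups are never touched while file backups are. The Python sketch below models which files an optimization run would pick up. It is an illustration only: the function name `eligible_for_optimization`, the timestamps and the file paths are hypothetical, and it assumes a file's age is measured from its last modification. On Windows Server the real setting is the volume's `MinimumFileAgeDays` value, which can be changed with the `Set-DedupVolume` PowerShell cmdlet.

```python
from datetime import datetime, timedelta


def eligible_for_optimization(files, now, minimum_file_age_days):
    """Return, sorted, the paths old enough to be deduplicated.

    files maps path -> last-modified datetime (assumption: age counts from
    the last modification, which a nightly PST backup resets every day).
    """
    cutoff = now - timedelta(days=minimum_file_age_days)
    return sorted(path for path, modified in files.items() if modified <= cutoff)


now = datetime(2024, 1, 31, 2, 0)  # hypothetical time of the scheduled run
files = {
    # Mailbox backup: the PST is rewritten by every nightly backup.
    "Mailboxes/alice.pst": now - timedelta(hours=20),
    # File backups: unchanged documents keep their original timestamps.
    "SharePoint/Finance/budget.xlsx": now - timedelta(days=40),
    "OneDrive/bob/notes.docx": now - timedelta(days=3),
    # A file changed by last night's incremental backup.
    "SharePoint/Sales/pipeline.xlsx": now - timedelta(hours=20),
}

for age in (1, 5):
    print(age, eligible_for_optimization(files, now, age))
# 1 ['OneDrive/bob/notes.docx', 'SharePoint/Finance/budget.xlsx']
# 5 ['SharePoint/Finance/budget.xlsx']
```

Lowering the threshold from 5 days to 1 day picks up more file data sooner, but as long as a backup rewrites the PST more often than the threshold, it is never eligible.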
Why encrypted data won’t deduplicate
Encryption, when implemented properly, transforms data into something that looks random. If you encrypt the same file multiple times, each encrypted version looks completely different from the others. This means an attacker cannot tell whether multiple messages are the same.
Because of this security feature, encrypted files will never deduplicate, and incidentally, they will never compress either.
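The effect is easy to demonstrate. The Python sketch below encrypts the same plaintext twice under one key, each time with a fresh random nonce, then checks whether the two copies share any chunks and whether they compress. Because the standard library has no block cipher, it uses a toy keystream (SHA-256 over key, nonce and a block counter) that is for illustration only and is NOT secure; real software should use a vetted cipher from an audited library. The helper names and the 4 KiB chunk size are hypothetical.

```python
import hashlib
import secrets
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size for the comparison


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized stream cipher. NOT secure: illustration only."""
    nonce = secrets.token_bytes(16)  # fresh random nonce on every call
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        keystream.extend(block)
        counter += 1
    body = bytes(p ^ k for p, k in zip(plaintext, keystream))
    return nonce + body  # the nonce is stored alongside the ciphertext


def chunk_hashes(data: bytes) -> set:
    """SHA-256 digests of each fixed-size chunk, as a dedup engine might index them."""
    return {hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
            for i in range(0, len(data), CHUNK_SIZE)}


key = secrets.token_bytes(32)
document = b"Quarterly sales report, all regions.\n" * 2000  # repetitive plaintext

copy1 = toy_encrypt(key, document)
copy2 = toy_encrypt(key, document)

# Same file, same key, yet the encrypted copies have no chunks in common.
print("identical ciphertexts:", copy1 == copy2)
print("shared ciphertext chunks:", len(chunk_hashes(copy1) & chunk_hashes(copy2)))

# Plaintext compresses well; ciphertext looks random and does not.
print("plaintext:", len(document), "->", len(zlib.compress(document)))
print("ciphertext:", len(copy1), "->", len(zlib.compress(copy1)))
```

Two unencrypted copies of `document` would share every chunk, while the encrypted copies share none, and `zlib` cannot shrink the random-looking ciphertext.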
Conclusion
It is realistic to expect around 40% space savings for backups of file data (SharePoint, OneDrive for Business).
However, PST mailbox backups will not deduplicate, because each backup changes the files’ timestamps, so they never become old enough to be processed.
Depending on the ratio of mailbox data to file data, many small businesses can expect space savings of 20% or more thanks to file system deduplication.
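The rule of thumb above is a weighted average of the two kinds of data. The short Python sketch below assumes about 40% savings on file data and none on PST mailbox data, the figures quoted in this article; the function name `expected_savings` and the sample sizes are hypothetical, and real savings depend on your data.

```python
def expected_savings(file_gb: float, mailbox_gb: float,
                     file_rate: float = 0.40, mailbox_rate: float = 0.0) -> float:
    """Blended space saving as a fraction of total backup data.

    Assumes file data deduplicates at about file_rate and PST mailbox data
    at mailbox_rate (zero, since PSTs never reach the minimum file age).
    """
    total = file_gb + mailbox_gb
    if total == 0:
        return 0.0
    return (file_gb * file_rate + mailbox_gb * mailbox_rate) / total


for file_gb, mailbox_gb in [(500, 500), (750, 250), (250, 750)]:
    saving = expected_savings(file_gb, mailbox_gb)
    print(f"{file_gb} GB files + {mailbox_gb} GB mail: {saving:.0%}")
# 500 GB files + 500 GB mail: 20%
# 750 GB files + 250 GB mail: 30%
# 250 GB files + 750 GB mail: 10%
```

Under these assumptions, a business whose backups are at least half file data lands at 20% or more, while a very mailbox-heavy backup set saves less.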