How do you back up 5.5 billion U2 songs to a single USB stick?

Tuesday 10th February 2015
Billy Law-Bregan, Communications Officer

If we assume that all 11 tracks are downloaded to each of the 500 million devices, and each track requires an average of 5 megabytes, then disk space allocated to hold the album around the world would total:

27.5 Petabytes   (27,500,000,000 megabytes)   [2]
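For anyone who wants to check the sums, here is a quick sketch of the arithmetic in Python. It uses only the figures quoted above and assumes decimal units (1 petabyte = 1,000,000,000 megabytes); the variable names are purely illustrative.

```python
# Figures from the article; decimal units assumed (1 PB = 10**9 MB).
TRACKS = 11
MB_PER_TRACK = 5
DEVICES = 500_000_000

total_copies = TRACKS * DEVICES            # individual track copies worldwide
total_mb = total_copies * MB_PER_TRACK     # total space in megabytes
total_pb = total_mb / 10**9                # the same figure in petabytes

print(f"{total_copies:,} track copies")
print(f"{total_mb:,} MB = {total_pb} PB")
```

The number of track copies is where the 5.5 billion songs in the title comes from.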

To put that number in perspective, Netflix was reported to have had just over 3 petabytes of video in their library in May 2013[3].

Whilst that is a huge number, it is distributed amongst millions of devices and the cost of the storage is picked up by the consumers who bought those devices. However, what if all of those users back up their iTunes account?

According to [4], raw disk drives cost $81,000 per petabyte. Storing the backups of the album would therefore cost Apple a cool $2.2 million.

And that is just raw disk. If you were to use enterprise-grade storage devices, the midpoint between vendors is $1.7m per petabyte and the total cost is …

$47.1 million.  Now that would be some sales commission.
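The two cost figures can be reproduced with a short sketch, using the per-petabyte prices quoted above. One caveat: multiplying by the rounded $1.7m figure gives roughly $46.8 million rather than $47.1 million. The difference is presumably down to the midpoint having been rounded for publication, but that is an assumption on our part.

```python
# Prices as quoted in the article, in US dollars per petabyte.
TOTAL_PB = 27.5
RAW_DISK_USD_PER_PB = 81_000
ENTERPRISE_USD_PER_PB = 1_700_000  # rounded midpoint; the exact value is not given

raw_cost = TOTAL_PB * RAW_DISK_USD_PER_PB
enterprise_cost = TOTAL_PB * ENTERPRISE_USD_PER_PB

print(f"Raw disk:         ${raw_cost:,.0f}")
print(f"Enterprise grade: ${enterprise_cost:,.0f}")
```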

Now, this is an extreme example and the software developers amongst you will have already started thinking of ways that the iTunes software could avoid backing up the files multiple times.

If you will put that to one side for a moment, it is a great way of demonstrating the value of a technology called de-duplication, which reduces the amount of storage needed on IT systems by removing duplicate copies of data.

De-duplication analyses data and breaks it down into chunks. If these same chunks are seen again (for example, because there are two copies of the file), a small pointer to the original data is put in the place of the duplicate. This means that you only need one copy of the original files, and in our case millions of pointers.
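To make the idea concrete, here is a minimal sketch of a de-duplicating store in Python. It is not how any particular product works: the DedupStore class, the fixed four-byte chunk size and the use of SHA-256 digests as pointers are all assumptions chosen to keep the example short. Real systems use much larger, often variable-sized chunks.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny for illustration


class DedupStore:
    """Hypothetical store that keeps each unique chunk only once."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes (stored once)
        self.files = {}   # file name -> list of digests (the pointers)

    def put(self, name, data):
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only store the chunk if we have never seen it before.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name):
        # Rebuild the file by following its pointers.
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
track = b"stand-in bytes for an audio track"

store.put("alice/track01", track)
unique_after_first = len(store.chunks)

store.put("bob/track01", track)  # a second, identical copy
unique_after_second = len(store.chunks)

print(unique_after_first, unique_after_second)
print(store.get("bob/track01") == track)
```

The second copy adds a new list of pointers but no new chunks, and both files can still be restored in full.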

Thanks to de-duplication, we now only need the original files (11 tracks of 5 MB each) and the pointers, which you could fit on a reasonably sized USB stick. Don’t try to restore them all at once though.
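How big a USB stick? A rough estimate is sketched below. It assumes one 8-byte pointer per duplicated track and ignores all other metadata. Both are simplifications of ours, not figures from any real product, which would keep pointers per chunk along with indexes and catalogues.

```python
# Assumption: one 8-byte pointer per track copy, no other metadata.
POINTER_BYTES = 8
TRACKS = 11
MB_PER_TRACK = 5
DEVICES = 500_000_000

original_bytes = TRACKS * MB_PER_TRACK * 10**6   # the single stored copy
pointer_bytes = TRACKS * DEVICES * POINTER_BYTES  # one pointer per duplicate
total_gb = (original_bytes + pointer_bytes) / 10**9

print(f"Originals: {original_bytes / 10**6:.0f} MB")
print(f"Pointers:  {pointer_bytes / 10**9:.0f} GB")
print(f"Total:     {total_gb:.3f} GB")
```

Under those assumptions the pointers, not the music, take up almost all of the space.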


This is obviously an extreme example, but this simple principle is saving millions of pounds in storage costs every year. Without this principle, it could be argued that we may not be able to keep up with the phenomenal growth in data and virtual machine sprawl.

This principle is perhaps most valuable in server backup, where a single business can be backing up hundreds if not thousands of virtual machines every day. The first backup, called a seed backup, would otherwise copy many identical files from hundreds of Windows operating systems. Modern backup software can spot this duplication and avoid ever sending the duplicated data to the backup server, both saving money and reducing the time taken to perform backups.
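The "avoid ever sending" part can be sketched as follows: the machine being backed up hashes its data, asks the server which hashes it has not seen, and uploads only those. The BackupServer class, the backup_vm function and the sample data are all hypothetical, and treating each file as a single chunk is a simplification.

```python
import hashlib


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class BackupServer:
    """Hypothetical backup target that stores each chunk once."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        # Report which digests the server has not stored yet.
        return {d for d in digests if d not in self.chunks}

    def upload(self, chunks_by_digest):
        self.chunks.update(chunks_by_digest)


def backup_vm(server, chunks):
    """Send only unseen chunks; return the number of bytes transferred."""
    by_digest = {digest(c): c for c in chunks}
    needed = server.missing(by_digest)
    server.upload({d: by_digest[d] for d in needed})
    return sum(len(by_digest[d]) for d in needed)


server = BackupServer()
os_files = [b"kernel image", b"system library", b"default config"]

sent_vm1 = backup_vm(server, os_files + [b"vm1 application data"])
sent_vm2 = backup_vm(server, os_files + [b"vm2 application data"])

print(sent_vm1, sent_vm2)
```

The first machine sends everything. The second sends only its own application data, because the server already holds the operating system files.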


Mark Wilson

Mark Wilson is a technology fanatic who works for Node4. He is focused on helping our customers benefit from innovation and new technology. With a mix of technical and commercial expertise, he has developed innovative IT services for major global outsourcers, mid-market service providers and SME businesses, and is one of 892 IT evangelists recognised worldwide in 2014 as a VMware vExpert. Follow him on Twitter @markwilsontech


[2] 5 MB per track * 11 tracks * 500 million devices

