Last Updated on November 27, 2022 2:08 am
You may have purchased a shiny new fat capacity hard drive to store all your Linux ISOs, cat videos, and the vast multitude of cheesy and just plain bad 20th century sci-fi films. You buy a 10TB drive thinking you can finally combine all your CDs, DVDs, Blu-rays, thumb drives, and old USB hard drives into one convenient drive. You add it all up and see you have a bit over 9TB of data, so 10TB should be fine, right?
So you boot up your fresh new 10TB external WD USB hard drive and go into Windows and check the capacity... 9.09TB!?!? But I paid for a 10TB drive, right? What a rip-off!!!
No, settle down, silly. It’s not. It comes down to simply a matter of convention and conversion.
Seagate offers a great simple explanation on their website if you visit here: https://www.seagate.com/support/kb/why-does-my-hard-drive-report-less-capacity-than-indicated-on-the-drives-label-172191en/
But I’ll break it down in a bit more detail.
For example, using a 10TB drive, if you look carefully at the drive properties (Windows: right click the drive, then Properties) you will see the capacity in actual bytes, approximately (probably a little bigger than) 10,000,000,000,000 (10 Trillion!!!) bytes. But Windows reports it as 9.09 TB. What gives?
In short, it’s simply because hard drive manufacturers market capacity in decimal TB, or 10^12 bytes, whereas Windows, Linux, and at one point MacOS, use binary TB, or 2^40 = 1024^4 bytes. There is a long history of proper convention for storage devices vs other electronic devices like RAM, including many court cases, where in the end hard drive manufacturers were able to market their storage devices as decimal, while everything else computer related was binary.
Somehow drive manufacturers won the battle, so terms like terabyte (TB), gigabyte (GB), megabyte (MB), and kilobyte (KB) mean the decimal versions, whereas what Windows and Linux display are tebibytes (TiB), gibibytes (GiB), mebibytes (MiB), and kibibytes (KiB) respectively. Windows still uses the TB, GB, MB, and KB monikers though, which adds to the confusion. If you look closely, Linux actually uses the proper notation of TiB, GiB, MiB, KiB, so they conceded to the “proper” terminology.
It makes sense if you think about it, however, since the prefixes Tera, Kilo, Mega, Giga, etc have all been defined by the metric system as decimal prefixes. 1 Kilogram is 1000 grams. So it would make sense that 1 Kilobyte is 1000 bytes, not 1024.
To compare, see the conversion below:
| Notation | Symbol | Value |
|---|---|---|
| 1 kilobyte | 1 kB | 10^3 = 1000 bytes |
| 1 megabyte | 1 MB | 10^6 = 1000000 bytes |
| 1 gigabyte | 1 GB | 10^9 = 1000000000 bytes |
| 1 terabyte | 1 TB | 10^12 = 1000000000000 bytes |
| 1 kibibyte | 1 KiB | 2^10 = 1024^1 = 1024 bytes |
| 1 mebibyte | 1 MiB | 2^20 = 1024^2 = 1048576 bytes |
| 1 gibibyte | 1 GiB | 2^30 = 1024^3 = 1073741824 bytes |
| 1 tebibyte | 1 TiB | 2^40 = 1024^4 = 1099511627776 bytes |
CONVERSION Decimal to Binary:
1 Terabyte (TB) = decimal = 10^12 or 1 000 000 000 000 bytes
1 Tebibyte (TiB) = binary = 2^40 or 1 099 511 627 776 bytes
1 Terabyte = 10^12 / 2^40 = 0.909495 Tebibyte
1 Gigabyte (GB) = 10^9 = 1 000 000 000 bytes
1 Gibibyte (GiB) = 2^30 = 1 073 741 824 bytes
1 Gigabyte = 10^9 / 2^30 = 0.931323 Gibibyte
1 Megabyte (MB) = 10^6 = 1 000 000 bytes
1 Mebibyte (MiB) = 2^20 = 1 048 576 bytes
1 Megabyte = 10^6 / 2^20 = 0.953674 Mebibyte
So if you bought a 10TB hard drive, it actually holds 10 000 000 000 000 bytes, and ( 10 000 000 000 000 bytes ) / ( 1 099 511 627 776 bytes per TiB ) = 9.09495 TiB, which is what you’d see displayed in Windows. A simpler way is to multiply the drive’s TB capacity by 0.9095 to get the binary capacity. So 14TB would be 14 x 0.9095 = 12.73 TiB as shown in Windows.
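If you’d rather let the computer do the division, here is a minimal Python sketch of the same arithmetic. The function name `advertised_tb_to_tib` is made up for this example and not part of any library.

```python
TB = 10**12   # decimal terabyte, the unit printed on the drive label
TIB = 2**40   # binary tebibyte, the unit Windows labels as "TB"


def advertised_tb_to_tib(advertised_tb: float) -> float:
    """Convert an advertised (decimal) TB figure to binary TiB."""
    total_bytes = advertised_tb * TB
    return total_bytes / TIB


if __name__ == "__main__":
    for size in (10, 14):
        print(f"{size} TB = {advertised_tb_to_tib(size):.2f} TiB")
```

For the two drives mentioned above this prints 9.09 TiB and 12.73 TiB.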
The one common factor here is bytes. A byte is a byte is a byte. It’s how you convert from bytes to the larger numbers that causes the difference. If we dealt with only actual byte sizes this wouldn’t matter, but then we’d constantly be looking at huge numbers all the time and that wouldn’t be very fun would it?
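To see how the same byte count turns into two different “sizes”, here is a small sketch that scales a raw byte count by either 1000 or 1024. The helper `format_bytes` and its unit lists are my own illustration, not how Windows or any particular tool does it internally.

```python
DECIMAL_UNITS = ["bytes", "kB", "MB", "GB", "TB"]
BINARY_UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB"]


def format_bytes(n: int, binary: bool = False) -> str:
    """Scale a byte count to the largest unit in the chosen system."""
    step = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(n)
    index = 0
    # Divide until the value drops below one step or we run out of units.
    while value >= step and index < len(units) - 1:
        value /= step
        index += 1
    return f"{value:.2f} {units[index]}"


if __name__ == "__main__":
    drive = 10_000_000_000_000  # the same bytes, viewed two ways
    print(format_bytes(drive))               # decimal view
    print(format_bytes(drive, binary=True))  # binary view
```

The byte count never changes; only the divisor and the label do.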
So What About SSD’s (and RAM)?
In my opinion, using binary instead of decimal makes most sense, because well, pretty much everything in a computer is, or at least has been, predicated on its binary nature. Solid state storage components like RAM and SSD’s and USB flash drives must have a binary capacity because of the binary nature of solid state.
For example, you can’t have a 10GB stick of RAM. Every component has to have a base capacity to the power of 2, so 2^0 = 1GB, 2^1 = 2GB, 2^2 = 4GB, 2^3 = 8GB, 2^4 = 16GB, etc. So a 10GB stick of RAM would still have 16GB of capacity, it’s just that 6GB would go unutilized. Not very efficient. Systems can have 10GB of RAM in them, but only with multiple sticks, like 4 + 4 + 2 GB or 8 + 2GB, just not as a single stick of RAM.
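The power-of-2 rule as described above is easy to check in code. This is only a sketch of that rule; the function names are invented for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is 1, 2, 4, 8, 16, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    wanted_gb = 10
    built_gb = next_power_of_two(wanted_gb)
    print(f"{wanted_gb}GB is a power of two: {is_power_of_two(wanted_gb)}")
    print(f"Next size up: {built_gb}GB, leaving {built_gb - wanted_gb}GB unused")
```

For a 10GB request this reports 16GB as the next size up, with 6GB unused, matching the example above.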
Hard drives don’t have this limitation because while they store data as binary 1’s and 0’s, they are not solid state. They are mechanical devices with spinning platters storing billions of 1’s and 0’s. There are slight limitations since most disks these days use 4K sectors (the smallest physical storage unit on a disk), so total capacity just has to be a multiple of 4KB (4096 bytes) to avoid wasting any capacity. So a 10TB disk is ( 10 000 000 000 000 bytes ) / ( 4096 bytes per sector ) = 2 441 406 250 sectors. Even if it weren’t an exact integer, the waste would be minimal: less than one sector, so under 4KB of storage not utilized.
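Here is the sector math as a quick sketch, assuming 4096-byte sectors as in the example above. The function name `sector_layout` is hypothetical.

```python
SECTOR_SIZE = 4096  # bytes per physical sector (a "4K" sector)


def sector_layout(capacity_bytes: int, sector_size: int = SECTOR_SIZE) -> tuple:
    """Return (whole sectors, leftover bytes that cannot fill a sector)."""
    return divmod(capacity_bytes, sector_size)


if __name__ == "__main__":
    sectors, leftover = sector_layout(10_000_000_000_000)
    print(f"{sectors} sectors, {leftover} bytes left over")
```

Because the leftover is a remainder after dividing by the sector size, it can never reach a full sector.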
Now back to SSD’s. SSD’s are still marketed by decimal capacity, even though their actual capacity is equal to that of its binary equivalent (GiB vs GB). If you see a 512GB SSD, it is saying it has 512 000 000 000 bytes of usable space, whereas in reality it has 2^30 x 512 = 549 755 813 888 bytes of flash. That is about a 7% difference in capacity. So are you still being tricked out of 7% of your storage capacity? No, not really.
SSD’s require free space to work optimally. Without going into a lot of detail (that would take another full article to explain), there is a routine on all SSD’s called “Garbage Collection” (GC). Every manufacturer has their own algorithm and it’s proprietary, but the end result is pretty much the same. During SSD idle time, your SSD will initiate its GC routine, which will scour the data and “clean up” data as well as rewrite data so that storage cells are used more or less evenly. This process requires free space, preferably untouched free space, to work optimally.
This is where that 7% free space comes in. It’s 7% that’s untouchable by the user, but fully accessible by the SSD. A 500 GB SSD still has 512 GiB of flash ( 2^30 x 512 = 549 755 813 888 bytes vs 500 000 000 000 bytes ), so it allows even more free space ( ~ 9% ) for the SSD to do its dirty work. You will also frequently see even 240, 480, or 960GB SSD’s, but in reality, they are actually still 256, 512, 1024 GiB SSD’s respectively. It’s just allocating more space for the system to work optimally.
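To put numbers on the reserved space, here is a rough sketch. It assumes the raw flash is exactly a power-of-2 number of GiB, as described above; real drives may differ, and `spare_fraction` is a name I made up for this example.

```python
GIB = 2**30  # binary gibibyte
GB = 10**9   # decimal gigabyte


def spare_fraction(raw_gib: int, advertised_gb: int) -> float:
    """Fraction of the raw flash that is not exposed to the user."""
    raw_bytes = raw_gib * GIB
    usable_bytes = advertised_gb * GB
    return (raw_bytes - usable_bytes) / raw_bytes


if __name__ == "__main__":
    # Assumed pairings of raw flash (GiB) to advertised size (GB).
    for raw, advertised in ((512, 512), (512, 500), (512, 480)):
        pct = 100 * spare_fraction(raw, advertised)
        print(f"{advertised}GB SSD on {raw}GiB of flash: {pct:.1f}% spare")
```

Measured against the raw flash, the 512GB model keeps roughly 7% spare and the 500GB model roughly 9%, in line with the figures above.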
USB Flash Drives
Flash drives are similar to SSD’s. They use NAND flash for storage, but the biggest difference is that an SSD uses a much more complex controller allowing for improved data handling and speeds, and many have DRAM cache to help improve performance even more. Flash drives are basically just “dumb” devices that use a controller for bare minimum control. They typically do not use a “wear leveling” routine, which ensures more or less even wear of all the NAND flash cells.
In any case, like an SSD, a USB flash drive still requires over-provisioning for optimal performance. Most USB flash drives utilize about 10% over-provisioning, so a 64GiB (binary) flash drive would have about 57.6GiB of usable space.
So bottom line, expect 10% less capacity than advertised.
File System
One more thing that may lead to disk capacity differences has to do with the file system. Since most users reading this are likely using Windows, I’ll use NTFS as the example. NTFS has some overhead associated with it, meaning it needs its own space to store data about your files, including file names and attributes.
In Windows, if you right click your drive of interest and select Properties, you will see “Used Space:” and “Free Space:”. Even on an empty drive, as shown in the image above, it shows 394MB of data being used. This is data reserved for the file system. It may grow depending on how many files you end up storing on the drive. Still, 394MB is pretty negligible compared with a capacity of 10 000 000 (ten million, pinky to corner of mouth) MB: less than 0.004%.
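If you want to check these numbers without clicking through dialogs, Python’s standard library can read them for you. This is a minimal sketch: `shutil.disk_usage` is a real standard-library call, while `overhead_percent` and `report` are names invented for this example, and the 394MB figure is simply the one from the example drive above.

```python
import shutil


def overhead_percent(used_bytes: int, total_bytes: int) -> float:
    """Used space as a percentage of total capacity."""
    return 100 * used_bytes / total_bytes


def report(path: str) -> None:
    """Print total, used, and free space for the drive holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = overhead_percent(usage.used, usage.total)
    print(f"total: {usage.total} bytes")
    print(f"used:  {usage.used} bytes ({pct:.4f}%)")
    print(f"free:  {usage.free} bytes")


if __name__ == "__main__":
    # The example drive: 394MB used out of 10 000 000 MB.
    print(f"{overhead_percent(394 * 10**6, 10 * 10**12):.5f}%")
    report(".")  # whatever drive the current directory lives on
```

On a freshly formatted drive, the used figure is a rough indication of what the file system keeps for itself.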
tl;dr
Shown capacity in Windows ( or Linux ) vs advertised disk drive capacity is simply a matter of decimal ( TB, GB, MB, etc ) vs binary ( TiB, GiB, MiB ) conversion. So the advertised capacity of a 10TB drive is 10 x 10^12 bytes, and 10 x ( 1 000 000 000 000 bytes ) ÷ ( 2^40 = 1024^4 = 1 099 511 627 776 bytes ) ~ 9.095 TiB.
Windows inappropriately shows TiB, GiB, MiB as TB, GB, MB adding to confusion. Mac uses conventional decimal TB, GB, MB. Linux uses binary and uses proper nomenclature such as TiB, GiB, MiB.
I hope this cleared things up a bit.
EDIT Nov 26 2022 – added 1024^x notation