"Format (computing)" redirects here. For other uses, see Format#Computing.
Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process, which performs basic medium preparation, is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, making the data storage device visible to an operating system. The third part of the process, usually termed "high-level formatting", most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels,[nb 1] and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files.
As a general rule,[nb 2] formatting a disk leaves most if not all existing data on the disk medium, and some or most of it might be recoverable with special tools. Special tools can also remove user data by a single overwrite of all files and free space.
A block, a contiguous number of bytes, is the minimum unit of storage that is read from and written to a disk by a disk driver. The earliest disk drives had fixed block sizes (e.g., the block size of the IBM 350 disk storage unit of the late 1950s was 100 six-bit characters), but starting with the 1301, IBM marketed subsystems that featured variable block sizes: a particular track could have blocks of different sizes. The disk subsystems on the IBM System/360 expanded this concept in the form of Count Key Data (CKD) and later Extended Count Key Data (ECKD). Variable block sizes in HDDs fell out of use in the 1990s; one of the last HDDs to support them was the IBM 3390 Model 9, announced in May 1993.
Modern hard disk drives, such as Serial Attached SCSI (SAS)[nb 3] and Serial ATA (SATA) drives, appear at their interfaces as a contiguous set of fixed-size blocks. For many years these blocks were 512 bytes long, but beginning in 2009 and accelerating through 2011, all major hard disk drive manufacturers began releasing hard disk drive platforms using the Advanced Format of 4096-byte logical blocks.
Floppy disks generally only used fixed block sizes but these sizes were a function of the host's OS and its interaction with its controller so that a particular type of media (e.g., 5¼-inch DSDD) would have different block sizes depending upon the host OS and controller.
Optical discs generally only use fixed block sizes.
Disk formatting process
Formatting a disk for use by an operating system and its applications typically involves three different processes.[nb 4]
- Low-level formatting (i.e., closest to the hardware) marks the surfaces of the disks with markers indicating the start of a recording block (typically today called sector markers) and other information like block CRC to be used later, in normal operations, by the disk controller to read or write data. This is intended to be the permanent foundation of the disk, and is often completed at the factory.
- Partitioning divides a disk into one or more regions, writing data structures to the disk to indicate the beginning and end of the regions. This level of formatting often includes checking for defective tracks or defective sectors.
- High-level formatting creates the file system format within a disk partition or a logical volume. This formatting includes the data structures used by the OS to identify the logical drive or partition's contents. This may occur during operating system installation, or when adding a new disk. Disk and distributed file systems may specify an optional boot block, and/or various volume and directory information for the operating system.
Low-level formatting of floppy disks
The low-level format of floppy disks (and early hard disks) is performed by the disk drive's controller.
Consider a standard 1.44 MB floppy disk. Low-level formatting of the floppy disk normally writes 18 sectors of 512 bytes to each of 160 tracks (80 on each side), providing 1,474,560 bytes of storage on the disk.
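The capacity figure follows directly from the geometry quoted above. A minimal sketch of the arithmetic is shown here; the function and constant names are illustrative only and do not come from any formatting tool.

```python
# Geometry of a standard high-density 3.5-inch floppy, as quoted in the text.
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512
TRACKS_PER_SIDE = 80
SIDES = 2


def formatted_capacity(sectors_per_track, tracks_per_side,
                       sides=SIDES, bytes_per_sector=BYTES_PER_SECTOR):
    """Return the user-data capacity in bytes (sector overhead is not counted)."""
    return sectors_per_track * bytes_per_sector * tracks_per_side * sides


capacity = formatted_capacity(SECTORS_PER_TRACK, TRACKS_PER_SIDE)
print(capacity)          # 1474560
print(capacity // 1024)  # 1440
```

The second figure shows where the "1.44 MB" label comes from: 1,474,560 bytes is 1440 units of 1024 bytes.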
Physical sectors are actually larger than 512 bytes, as in addition to the 512 byte data field they include a sector identifier field, CRC bytes (in some cases error correction bytes) and gaps between the fields. These additional bytes are not normally included in the quoted figure for overall storage capacity of the disk.
Different low-level formats can be used on the same media; for example, large records can be used to cut down on inter-record gap size.
Several freeware, shareware and free software programs (e.g. GParted, FDFORMAT, NFORMAT and 2M) allowed considerably more control over formatting, allowing the formatting of high-density 3.5" disks with a capacity up to 2 MB.
Techniques used include:
- head/track sector skew (moving the sector numbering forward at side change and track stepping to reduce mechanical delay),
- interleaving sectors (to boost throughput by organizing the sectors on the track),
- increasing the number of sectors per track (while a normal 1.44 MB format uses 18 sectors per track, it is possible to increase this to a maximum of 21), and
- increasing the number of tracks (most drives could tolerate extension to 82 tracks: though some could handle more, others could jam).
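The first two techniques in the list can be sketched in a few lines. The example below is a simplified model, not the algorithm used by any of the programs named above: `interleave_layout` and `skew_track` are hypothetical helper names, and a real formatter also has to account for gap sizes and controller timing.

```python
def interleave_layout(sectors, interleave):
    """Return logical sector numbers (1-based) in physical slot order for one track."""
    layout = [None] * sectors
    slot = 0
    for logical in range(1, sectors + 1):
        while layout[slot] is not None:      # slot already used: slide to the next free one
            slot = (slot + 1) % sectors
        layout[slot] = logical
        slot = (slot + interleave) % sectors  # leave (interleave - 1) slots between neighbours
    return layout


def skew_track(layout, skew):
    """Rotate a track layout so that sector numbering starts `skew` slots later."""
    skew %= len(layout)
    return layout[-skew:] + layout[:-skew] if skew else list(layout)


# A 9-sector track with 2:1 interleave, then the same track skewed by 2 slots.
track = interleave_layout(9, 2)
print(track)                 # [1, 6, 2, 7, 3, 8, 4, 9, 5]
print(skew_track(track, 2))  # [9, 5, 1, 6, 2, 7, 3, 8, 4]
```

With interleave, consecutive logical sectors are separated by other sectors so the host has time to process one before the next passes under the head; with skew, the starting point shifts on each new track or side so that sector 1 arrives just after the head has finished moving.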
Linux supports a variety of sector sizes, and DOS and Windows support a large-record-size DMF-formatted floppy format.
Low-level formatting (LLF) of hard disks
Hard disk drives prior to the 1990s typically had a separate disk controller that defined how data was encoded on the media. With the media, the drive and/or the controller possibly procured from separate vendors, users were often able to perform low-level formatting. Separate procurement also had the potential of incompatibility between the separate components such that the subsystem would not reliably store data.[nb 5]
User instigated low-level formatting (LLF) of hard disk drives was common for minicomputer and personal computer systems until the 1990s. IBM and other mainframe system vendors typically supplied their hard disk drives (or media in the case of removable media HDDs) with a low-level format. Typically this involved subdividing each track on the disk into one or more blocks which would contain the user data and associated control information. Different computers used different block sizes and IBM notably used variable block sizes but the popularity of the IBM PC caused the industry to adopt a standard of 512 user data bytes per block by the middle 1980s.
Depending upon the system, low-level formatting was generally done by an operating system utility. IBM compatible PCs used the BIOS, which is invoked using the MS-DOS debug program, to transfer control to a routine hidden at different addresses in different BIOSes.
Transition away from LLF
Starting in the late 1980s, driven by the volume of IBM compatible PCs, HDDs became routinely available pre-formatted with a compatible low-level format. At the same time, the industry moved from historical (dumb) bit serial interfaces to modern (intelligent) bit serial interfaces and word serial interfaces wherein the low level format was performed at the factory.
Today, an end-user, in most cases, should never perform a low-level formatting of an IDE or ATA hard drive, and in fact it is often not possible to do so on modern hard drives because the formatting is done on a servowriter before the disk is assembled into a drive in the factory.
While it is generally impossible to perform a complete LLF on most modern hard drives (since the mid-1990s) outside the factory, the term "low-level format" is still used for what could be called the reinitialization of a hard drive to its factory configuration (and even these terms may be misunderstood).
The present ambiguity in the term low-level format seems to be due to both inconsistent documentation on web sites and the belief by many users that any process below a high-level (file system) format must be called a low-level format. Since much of the low level formatting process can today only be performed at the factory, various drive manufacturers describe reinitialization software as LLF utilities on their web sites. Since users generally have no way to determine the difference between a complete LLF and reinitialization (they simply observe running the software results in a hard disk that must be high-level formatted), both the misinformed user and mixed signals from various drive manufacturers have perpetuated this error. Note: Whatever possible misuse of such terms may exist (search hard drive manufacturers' web sites for all these terms), many sites do make such reinitialization utilities available (possibly as bootable floppy diskette or CD image files), to both overwrite every byte and check for damaged sectors on the hard disk.
Reinitialization should include identifying (and sparing out if possible) any sectors which cannot be written to and read back from the drive, correctly. The term has, however, been used by some to refer to only a portion of that process, in which every sector of the drive is written to; usually by writing a specific value to every addressable location on the disk.
Traditionally, the physical sectors were initialized with a fill value of as per the INT 1Eh's Disk Parameter Table (DPT) during format on IBM compatible machines. This value is also used on the Atari Portfolio. 8-inch CP/M floppies typically came pre-formatted with a value of , and by way of Digital Research this value was also used on Atari ST and some Amstrad formatted floppies.[nb 6] Amstrad otherwise used as a fill value. Some modern formatters wipe hard disks with a value of instead, sometimes also called zero-filling, whereas a value of is used on flash disks to reduce wear. The latter value is typically also the default value used on ROM disks (which cannot be reformatted). Some advanced formatting tools allow configuring the fill value.[nb 7]
One popular method for performing only the zero-fill operation on a hard disk is by writing zero-value bytes to the drive using the Unix dd utility with the /dev/zero stream as the input file and the drive itself (or a specific partition) as the output file. This command may take many hours to complete, and can erase all files and file systems.
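The same idea can be sketched in Python. This is an illustrative stand-in for the dd approach, not a replacement for it: `zero_fill` is a hypothetical helper, and the demonstration deliberately runs against a throwaway temporary file rather than a real drive. Pointing it at a device node would destroy that device's contents.

```python
import os
import tempfile


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file or device node with zeros."""
    zeros = bytes(chunk_size)
    written = 0
    with open(path, "r+b") as target:
        size = target.seek(0, os.SEEK_END)  # total size in bytes
        target.seek(0)
        while written < size:
            step = min(chunk_size, size - written)
            target.write(zeros[:step])
            written += step
        target.flush()
        os.fsync(target.fileno())           # push the zeros out of the OS cache
    return written


# Demonstration on a scratch image file (an assumption made for safety).
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(b"old file contents " * 1000)
    image_path = image.name

bytes_written = zero_fill(image_path, chunk_size=4096)
with open(image_path, "rb") as image:
    wiped = image.read()
os.remove(image_path)
print(bytes_written, wiped.count(0) == len(wiped))
```

Writing in fixed-size chunks keeps memory use constant regardless of the size of the target, which is the same reason dd is normally run with a block size.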
Another method for SCSI disks may be to use the sg_format command to issue a low-level SCSI Format Unit Command.
Zero-filling a drive is not necessarily a secure method of erasing sensitive data, or of preparing a drive for use with an encrypted filesystem.
Partitioning
Partitioning is the process of writing information into blocks of a storage device or medium that allows access by an operating system. Some operating systems allow the device (or its medium) to appear as multiple devices; i.e. partitioned into multiple devices.
On MS-DOS, Microsoft Windows, and UNIX-based operating systems (such as BSD, Linux and Mac OS X) this is normally done with a partition editor, such as fdisk, GNU Parted, or Disk Utility. These operating systems support multiple partitions.
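As one concrete illustration of the "information written into blocks" that a partition editor produces, the sketch below reads a classic PC master boot record (MBR) partition table. The MBR scheme is not described in the text above and is used here only as an assumed example; `parse_mbr` is a hypothetical helper and the sample sector is built in memory rather than read from a disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446           # the four-entry table follows the boot code area
ENTRY_FORMAT = "<B3xB3xII"   # status, type, first LBA, sector count (CHS fields skipped)
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes
SIGNATURE = b"\x55\xaa"      # marks the end of a valid MBR sector


def parse_mbr(sector):
    """Return the non-empty partition entries found in a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        offset = TABLE_OFFSET + index * ENTRY_SIZE
        status, ptype, first_lba, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        if ptype == 0:       # type 0 means the slot is unused
            continue
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type": ptype,
            "first_lba": first_lba,
            "last_lba": first_lba + count - 1,
        })
    return partitions


# Build a sample sector holding one bootable partition of 204800 sectors.
sample = bytearray(SECTOR_SIZE)
struct.pack_into(ENTRY_FORMAT, sample, TABLE_OFFSET, 0x80, 0x83, 2048, 204800)
sample[510:512] = SIGNATURE
print(parse_mbr(sample))
```

Each entry records where a region begins and how long it is, which is all the operating system needs to present the region as a separate device.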
In current IBM mainframe OSs derived from OS/360 and DOS/360, such as z/OS and z/VSE, this is done by the INIT command of the ICKDSF utility. These OSs support only a single partition per device, called a volume. The ICKDSF functions include creating a volume label and writing a Record 0 on every track.
Floppy disks are not partitioned; however depending upon the OS they may require volume information in order to be accessed by the OS.
Partition editors and ICKDSF today do not handle low level functions for HDDs and optical disc drives such as writing timing marks, and they cannot reinitialize a modern disk that has been degaussed or otherwise lost the factory formatting.
High-level formatting
High-level formatting is the process of setting up an empty file system on a disk partition or logical volume and, for PCs, installing a boot sector. This is a fast operation, and is sometimes referred to as quick formatting.
The entire logical drive or partition may optionally be scanned for defects, which may take considerable time.
In the case of floppy disks, both high- and low-level formatting are customarily performed in one pass by the disk formatting software. 8-inch floppies typically came low-level formatted and were filled with a format filler value of .[nb 6] Since the 1990s, most 5.25-inch and 3.5-inch floppies have been shipped pre-formatted from the factory as DOS FAT12 floppies.
In current IBM mainframe operating systems derived from OS/360 or DOS/360, this may be done as part of allocating a file, by a utility specific to the file system or, in some older access methods, on the fly as new data are written.
Host protected area
The host protected area, sometimes referred to as hidden protected area, is an area of a hard drive that is high level formatted so that the area is not normally visible to its operating system (OS).
Reformatting
Reformatting is a high-level formatting performed on a functioning disk drive to free the medium of its contents. Reformatting is unique to each operating system because what actually is done to existing data varies by OS. The most important aspect of the process is that it frees disk space for use by other data. To actually "erase" everything requires overwriting each block of data on the medium, something that is not done by many high-level formatting utilities.
Reformatting often carries the implication that the operating system and all other software will be reinstalled after the format is complete. Rather than fixing an installation suffering from malfunction or security compromise, it is sometimes judged easier to simply reformat everything and start from scratch. Various colloquialisms exist for this process, such as "wipe and reload", "nuke and pave", "reimage", etc.
DOS, OS/2 and Windows
format command: Under MS-DOS, PC DOS, OS/2 and Microsoft Windows, disk formatting can be performed by the command. The program usually asks for confirmation beforehand to prevent accidental removal of data, but some versions of DOS have an undocumented option; if used, the usual confirmation is skipped and the format begins right away. The WM/FormatC macro virus uses this command to format drive C: as soon as a document is opened.
Unconditional format: There is also the parameter that performs an unconditional format which under most circumstances overwrites the entire partition, preventing the recovery of data through software. Note however that the switch only works reliably with floppy diskettes. Technically because unless is used, floppies are always low level formatted in addition to high-level formatted. Under certain circumstances with hard drive partitions, however, the switch merely prevents the creation of information in the partition to be formatted while otherwise leaving the partition's contents entirely intact (still on disk but marked deleted). In such cases, the user's data remain ripe for recovery with specialist tools such as EnCase or disk editors. Reliance upon for secure overwriting of hard drive partitions is therefore inadvisable, and purpose-built tools such as DBAN should be considered instead.
Overwriting: In Windows Vista and later, a non-quick format overwrites data as it goes. This is not the case in Windows XP and earlier.
OS/2: Under OS/2, if you use the parameter, which specifies a long format, then format will overwrite the entire partition or logical drive. Doing so enhances the ability of CHKDSK to recover files.
Unix-like operating systems
High-level formatting of disks on these systems is traditionally done using the command. On Linux (and potentially other systems as well) is typically a wrapper around filesystem-specific commands which have the name , where fsname is the name of the filesystem with which to format the disk. Some filesystems which are not supported by certain implementations of have their own manipulation tools; for example Ntfsprogs provides a format utility for the NTFS filesystem.
Some Unix and Unix-like operating systems have higher-level formatting tools, usually for the purpose of making disk formatting easier and/or allowing the user to partition the disk with the same tool. Examples include GNU Parted (and its various GUI frontends such as GParted and the KDE Partition Manager) and the Disk Utility application on Mac OS X.
Recovery of data from a formatted disk
As in file deletion by the operating system, data on a disk are not fully erased during every high-level format. Instead, the area on the disk containing the data is merely marked as available, and retains the old data until it is overwritten. If the disk is formatted with a different file system than the one which previously existed on the partition, some data may be overwritten that wouldn't be if the same file system had been used. However, under some file systems (e.g., NTFS, but not FAT), the file indexes (such as $MFTs under NTFS, inodes under ext2/3, etc.) may not be written to the same exact locations. And if the partition size is increased, even FAT file systems will overwrite more data at the beginning of that new partition.
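The difference between marking space as available and actually overwriting it can be shown with a toy model. The sketch below is not a real file system: `ToyDisk` and its methods are hypothetical names, and the "directory" is a plain dictionary standing in for structures such as a FAT or an MFT.

```python
BLOCK_SIZE = 16


class ToyDisk:
    """A tiny in-memory 'disk' with a directory kept separately from the data blocks."""

    def __init__(self, blocks):
        self.blocks = bytearray(blocks * BLOCK_SIZE)
        self.directory = {}   # file name -> (first block, length in bytes)
        self.next_free = 0

    def write_file(self, name, payload):
        start = self.next_free * BLOCK_SIZE
        if start + len(payload) > len(self.blocks):
            raise ValueError("disk full")
        self.blocks[start:start + len(payload)] = payload
        self.directory[name] = (self.next_free, len(payload))
        self.next_free += -(-len(payload) // BLOCK_SIZE)  # round up to whole blocks

    def quick_format(self):
        """Reset the metadata only; the data blocks are left untouched."""
        self.directory = {}
        self.next_free = 0

    def overwrite_format(self, fill=0x00):
        """Fill every block with one byte value, then reset the metadata."""
        self.blocks[:] = bytes([fill]) * len(self.blocks)
        self.quick_format()


disk = ToyDisk(8)
disk.write_file("note.txt", b"account=12345")

disk.quick_format()
recoverable_after_quick = b"account=12345" in disk.blocks

disk.overwrite_format()
recoverable_after_overwrite = b"account=12345" in disk.blocks

print(recoverable_after_quick, recoverable_after_overwrite)  # True False
```

After the quick format the directory is empty, yet a scan of the raw blocks still finds the old contents, which is exactly what recovery tools rely on.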
From the perspective of preventing the recovery of sensitive data through recovery tools, the data must either be completely overwritten (every sector) with random data before the format, or the format program itself must perform this overwriting, as the DOS command did with floppy diskettes, filling every data sector with the format filler byte value (typically ).
However, there are applications and tools, especially used in forensic information technology, that can recover data that has been conventionally erased. To prevent the recovery of sensitive data, government organizations and large companies use information destruction methods such as the Gutmann method. For average users there are also special applications that can perform complete data destruction by overwriting previous information. Although there are applications that perform multiple writes to assure data erasure, a single write over old data is generally all that is needed on modern hard disk drives. ATA Secure Erase can be performed by disk utilities to quickly and thoroughly wipe drives. Degaussing is another option; however, this renders the drive unusable.
- [nb 1] E.g., formatting a volume, formatting a Virtual Storage Access Method Linear Data Set (LDS) on the volume to contain a zFS, and formatting the zFS in UNIX System Services.
- [nb 2] Not true for the CMS file system on a CMS minidisk, a TSS VAM-formatted volume, z/OS Unix file systems or VSAM in IBM mainframes.
- [nb 3] "The LBAs on a logical unit shall begin with zero and shall be contiguous up to the last logical block on the logical unit." Information technology - Serial Attached SCSI - 2 (SAS-2), INCITS 457 Draft 2, May 8, 2009, chapter 4.1 Direct-access block device type model overview.
- [nb 4] Each process may involve multiple steps, and steps of different processes may be interleaved.
- [nb 5] This problem became common in PCs where users used RLL controllers with MFM drives; "MFM drives should not be used on RLL controllers."
- [nb 6] The fact that 8-inch CP/M floppies came pre-formatted with a filler value of is the reason why the value of has a special meaning in directory entries in FAT12, FAT16 and FAT32 file systems. This allowed 86-DOS to use 8-inch floppies out of the box or with only the FAT initialized.
- [nb 7] One utility providing an option to specify the desired fill value for hard disks is DR-DOS' FDISK R2.31 with its optional wipe parameter (for a fill value of ). In contrast to other FDISK utilities, DR-DOS FDISK is not only a partitioning tool, but can also format freshly created partitions as FAT12, FAT16 or FAT32. This reduces the risk of accidentally formatting the wrong volume.
Hard Drive 101
How does a hard drive work?
Figure 1: The inside of a hard disk drive, showing the disk platter and the read/write head.
While the disk platter looks like a mirror, it’s actually composed of up to trillions of tiny magnets standing on end, arrayed in concentric circles. The polarity of each magnet can be “up” or “down,” which indicates whether the bit is a 1 or a 0. The read/write head moves like a record tone arm, and can flip the polarity of the magnet when it’s writing data, or read the polarity when it’s reading data.
The magnets in a hard disk are organized in concentric circles — as many as 250,000 rings on a 3.5-inch platter. The head skims back and forth at up to 10 meters/second and must stop on a line 1/10 the width of a human hair, and then correctly read the polarity of each bit. It’s amazing that this is even possible, and even more amazing that it’s affordable.
A hard disk drive also has electronics to control the mechanism, to translate the data to a format that can be written to the disk and to do error correction and analysis. Hard drives have a power connector that provides juice for the motor that spins the drive and for the controller circuitry. Each drive also has a data interface: IDE/ATA or SATA for desktop drives, and Serial Attached SCSI (SAS) or Fibre Channel for enterprise drives.
Hard drive sizes
Hard drives come in two basic physical sizes: 2.5-inch and 3.5-inch. These sizes refer to the size of the data platters, not the size of the hard drive mechanism. Traditionally, 2.5-inch drives are used for laptops while 3.5-inch drives are used for desktop computers. Some compact desktops also use the smaller drives to enable a smaller form factor for the computer.
Figure 2 shows the two sizes of drives generally in use. 3.5-inch drives, on the right, are used in desktop computers and in freestanding storage devices. 2.5-inch drives are used in laptops and portable storage devices. Newer 2.5-inch drives are also being used in high-performance storage devices.
2.5-inch drives generally spin slower, which means that they have slower data throughput. They also have a smaller data capacity and are more expensive per gigabyte. The smaller drives do have several advantages, depending on the use:
- They are physically smaller so they can fit in laptops and small portable enclosures.
- They may have better “seek” times, since the read head has less distance to travel than with a larger diameter drive.
- They need less power to spin so they can generally be bus-powered, meaning they can draw power from a laptop without the use of an external power supply.
- And since they are designed to be portable, most of them do a better job of “parking the heads” than full-size drives do. This means they are better able to survive being shipped around or used in a moving environment.
Recent developments in 2.5-inch drives are changing how the small drives are used. A new class of 2.5-inch high-speed drives has emerged that can be used in enterprise and server environments. At the moment these drives are very expensive per gigabyte.
Solid-state drives (SSD)
Figure 3: Solid-state drives hold a number of advantages over spinning disks.
A new kind of storage device for computers has shown up in the marketplace over the last several years. Instead of spinning disks, solid-state flash memory is being used as primary storage. It offers a number of advantages, particularly for use in portable computing, and for speeding up certain kinds of data storage and access.
Hard drive capacity
The capacity of a hard drive refers to the amount of data it can hold. These days, capacity is measured in gigabytes or terabytes. Due to marketing reasons, the capacities listed on drive specifications may not be calculated in the same way that your operating system calculates data sizes.
For instance, a drive sold as “500GB” shows up as only about 465GB in some operating systems. (The 500 figure is in gigabytes, counted in powers of 10, while the 465 figure is really in gibibytes, counted in powers of 2. Aren’t you glad you asked?) Windows continues to report sizes the binary way, but Mac OS 10.6 and later changed the way it calculates size to match manufacturers’ practice.
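If you want to check the numbers yourself, the conversion is a one-liner. Here is a small sketch, with names we made up for the example, that turns an advertised decimal capacity into the binary figure an operating system may report.

```python
GB = 10**9    # gigabyte, as used on the drive's box
GIB = 2**30   # gibibyte, as used by operating systems that count in powers of 2


def advertised_gb_to_gib(gigabytes):
    """Convert an advertised capacity in GB to the equivalent number of GiB."""
    return gigabytes * GB / GIB


print(round(advertised_gb_to_gib(500), 2))  # 465.66
```

No storage has gone missing; the same number of bytes is simply being divided by a larger unit.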
For most still photographers, we generally suggest that it’s better to get the largest capacity drives you’re likely to need, at least for the next 6-12 months (if you’re on a RAID system, you’ll want a longer time frame — maybe two years — due to the complexity of upgrade). Running fewer drives saves on space, power draw and heat generation. It’s also easier to manage your drives if there are fewer of them.
For high-volume photographers and videographers, the issue can be significantly more complex. Storage needs for individual projects will easily climb to hundreds of gigabytes or into terabytes. If you are in this situation, acquiring hard drive capacity may resemble the model that was used back in the days of tape or film stock. Instead of keeping a general archive, you may have to factor the cost of storage into the price of each project, and buy the drives/tape/discs on a per-job basis.
It’s also possible that you need to go to a Storage Area Network (SAN) model for storage, where enterprise-class servers manage a large tiered storage pool.
Should I use big drives or small drives?
One question comes up over and over. Is it better to have your primary storage on (fewer) bigger drives or (more) smaller ones? If you chose big drives, a single drive failure can take out a lot of files, so it might seem like you get more protection with a larger number of smaller drives. We don’t agree.
All your digital storage should be configured so that failure of any one drive does not kill the only copy of any files. You must back up the images to an additional device if you want to preserve them.
If you use a smaller number of larger drives for storage, you will simplify the process of keeping track of the drives, as well as the process for periodically checking on the integrity of your data. You’ll also use less energy to keep them spinning and save on storage or desktop space. Additionally, larger drives are likely to be newer and faster.
Hard drive rotation speeds
As part of its specifications, each hard drive has a speed at which the platter rotates, measured in revolutions per minute (RPM). The faster the drive, the faster the throughput, since the head reads and writes the bits at a faster rate.
2.5-inch consumer drives typically spin at 4200, 5200, 5400 and 7200 RPMs. 7200 RPM drives are a good choice at the moment, but sometimes 7200 RPM drives have too large a power draw or generate too much heat for the portable devices in which they are housed. The enterprise-class 2.5-inch drives currently spin at 10,000 or 15,000 RPMs.
3.5-inch drives generally come in 5200, 7200, 10,000 and 15,000 RPM models. The 7200 RPM models are good all-purpose drives and have the largest capacities. The faster drives are generally used for system or scratch disks, where fast disk-swapping speeds up the performance of programs like Photoshop, which often have to work with large files.
You can also purchase variable-speed 3.5-inch drives, typically sold as “energy-saving” or “green” drives, running between 5400 and 7200 RPMs. These drives use less energy and have slower data transfer rates. This makes them a reasonable choice for archives and for off-line backups.
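To get a feel for what the rotation speeds above mean in practice, you can work out how long one turn of the platter takes. The sketch below is simple arithmetic only; the function names are ours, and the half-turn "average wait" is a common rule of thumb we are assuming here, not a figure from any drive's spec sheet.

```python
def rotation_time_ms(rpm):
    """Time for one full turn of the platter, in milliseconds."""
    return 60_000 / rpm


def average_wait_ms(rpm):
    """Rule-of-thumb wait for a sector to come around: half a turn on average."""
    return rotation_time_ms(rpm) / 2


for rpm in (5400, 7200, 10_000, 15_000):
    print(rpm, round(rotation_time_ms(rpm), 2), round(average_wait_ms(rpm), 2))
```

A 15,000 RPM drive completes a turn in 4 ms, while a 5400 RPM drive needs a little over 11 ms, which is part of why the faster drives suit system and scratch disks.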
Hard drive interfaces
Hard drives come with one of several different connectors built in. When you buy a drive, it will specify which one is built into the drive. The five types are ATA/IDE and SATA for consumer-level drives, and SCSI, Serial Attached SCSI (SAS), and Fibre Channel for enterprise-class drives.
For many years, Advanced Technology Attachment (ATA) connections were the favored internal drive connection in PCs. Apple adopted ATA with the Blue and White G3 models. ATA drives must be configured as either a master or a slave when connecting. This is usually accomplished by the use of a hardware jumper or, more recently, through the use of a cable that can tell the drive to act as either a master or slave.
ATA also goes by the name ATAPI, IDE, EIDE and PATA, which stands for Parallel ATA. ATA is still in use in many computers today, but most drive manufacturers are switching over to SATA (Serial ATA). If you have any devices that still use PATA drives, that’s a good clue that you’re in need of an upgrade.
As of 2007, most new computers (Macs and PCs, laptops and desktops) use the newer SATA interface. It has a number of advantages, including longer cables, faster throughput, multidrive support through port multiplier technology, and easier configuration. SATA drives can also be used with eSATA hardware (discussed later) to enable fast, inexpensive configuration as an external drive. Most people investing in new hard drive enclosures for photo storage should be using SATA drives.
SCSI/SAS and Fibre Channel
SCSI, SAS, and Fibre Channel drives are rare in desktop computers, and are typically found in expensive enterprise-level storage systems. You can also find SAS drives (along with the necessary SAS controller cards) in video editing systems where maximum throughput is needed.
Some of the faster drives, such as Western Digital Raptors, come with SAS connectors, so be aware when you mail-order one. Standard SATA drives can be connected to an SAS controller, but SAS drives can't be connected to a standard SATA controller.
Hard drive enclosures
Now that we’ve gone over some characteristics of hard drive mechanisms, let’s consider where the drive can live. The enclosure for your hard drive can be the computer itself (for an internal drive), a single-drive external case, or a multiple-drive external case.
If you are using a tower computer to store your archive, it is likely that you have one or more empty drive bays inside the computer that can hold a new drive. Some advantages of using internal drives are that they are the cheapest way to add storage and they take up the least amount of room. They are also capable of connecting directly to the computer’s logic board, so they provide fast access. One drawback is that they aren’t as easy to swap out as external drives.
Figure 4 Adding a single-drive external enclosure is an easy way to add storage to your computer system.
If you don’t have an empty drive bay, or if installing a new internal drive seems too daunting, it is usually very easy to add external drives to your computer using FireWire (IEEE1394 or IEEE1394b), USB (2 or 3), Thunderbolt, or eSATA connections. External single-drive cases have the advantages of being easily portable and not increasing the demand on your computer’s cooling system. The drawbacks are the higher cost and extra clutter.
You can get single-drive externals in two ways.
- You can purchase an external drive as a ready-made unit. These devices offer a quick and economical way to add storage to your system, but they often come with a shorter warranty than a bare drive, and oftentimes these drives suffer from poor throughput. Manufacturers will often sell their lowest performing drives in external cases.
- You can also purchase a freestanding enclosure and an internal drive and put them together, like the one pictured in Figure 4. We like this option because it offers more control over the components and because we can reuse the case when we outgrow the capacity of the drive.
Multiple-drive cases are an excellent solution for a large archive. Although they are larger, there’s less wiring clutter than with several single-drive cases. And once you have bought a big drive box, you can fill it with less-expensive internal drives, which you can later swap out for higher capacity drives as additional space is required. This is the arrangement that we currently favor.
FIGURE 5 shows a four-bay external drive enclosure. This is a trayless model for SATA drives. These units provide an easy way to add more storage to your computer.
External hard drive interfaces
The hard drive mechanism has its internal interface (PATA, SATA, SAS, or Fibre Channel), and the enclosure has one or more external interfaces as well. The external interface determines how the drive enclosure connects to the computer. There are four principal ones in use, and a few additional ones that are used in high-end systems. Figure 6 shows a drive that has the three most common connection types.
FIGURE 6 This photograph shows an external drive with all the most common interfaces.
USB is the most universal connection method for adding peripheral devices to computers. On the PC, USB 2 (stay away from USB 1 because of its slow speeds) is a good way to connect external drives. Data throughput maxes out at a theoretical 30 megabytes per second per device, in most cases. Due to the USB drivers in the Mac OS, USB is considerably slower on Apple machines. The USB 3.0 version was recently released and offers a tenfold increase in theoretical performance. USB connectors can supply bus power to attached devices.
Multiple USB devices can be connected to a single port by means of an external hub.
FireWire 400 and FireWire 800 (also known as IEEE1394 and IEEE1394b) are more modern connection protocols than USB, with theoretical transfer maximums of 50 and 100 megabytes per second. FireWire devices can be daisy chained, allowing the use of multiple drives on a single port. Like USB, implementations differ between Mac and PC, with Mac generally making greater use of the speed capabilities than PC. FireWire can also offer bus power to run external drives if the FireWire port is a six-pin or nine-pin port. (Many PCs only offer four-pin ports, which do not carry power.)
Multiple devices may be connected to a single FireWire port, by means of “daisy-chain” connection from one FireWire device to another.
eSATA is a configuration that creates a SATA connection in an external enclosure. It’s generally a fast and stable connection, offering up to 150, 300 or 600 megabytes per second. eSATA is relatively common as a built-in external connection on PC, but is not built in to any Apple computers. You can add eSATA to Apple computers and older PCs by means of an expansion card, such as Peripheral Component Interconnect Express (PCIe) for desktops and ExpressCard for some laptops.
Conventional eSATA does not have the capability to bus-power hard drives so you must use an external power source. We are starting to see some Powered eSATA drives on the market, but they are rare.
eSATA is often described as hot-swappable, meaning that you can disconnect and reconnect different drives without restarting the computer, but this is often not the case. The design of the host (the way the eSATA is connected to the logic board) will determine if the connection really is hot-swappable.
Multiple eSATA devices can be connected to a single port if the port supports “Port Multiplication”.
In 2011, Apple released the first computers with a built-in Thunderbolt connection. This interface supports multiple streams of high-resolution video as well as multiple streams of fast data using the Mini DisplayPort connector. The Thunderbolt standard supports external storage devices as well as external monitors. The data connectivity of Thunderbolt is based on the same kind of PCIe connection that is used with expansion cards on tower computers – basically, it offers a direct connection to the logic board for unsurpassed speed.
The standard also supports the use of adapter cables that allow FireWire, USB and eSATA devices to be plugged into Thunderbolt ports. At the time of this writing, Thunderbolt accessories, cables and peripherals are rare, probably due to the low supply of Thunderbolt chipsets that are needed to provide the Thunderbolt connection.
Up to seven devices (including monitors in that count) can be connected by daisy-chain to a Thunderbolt port.
Figure 7 The Thunderbolt connection carries both video and data over a single tiny connector.
Internet Small Computer System Interface (iSCSI) is a connection method that uses existing Ethernet hardware to attach the storage to the computer. An iSCSI device can be attached directly to a computer's network port, or a router or switch can connect it. It's fast and flexible, and offers throughput in the neighborhood of 120 MB/s.
Note that iSCSI needs "initiator" software that manages the connection. Some devices, such as the DroboPro shown in Figure 8 include this software. Other iSCSI device manufacturers suggest you purchase separate iSCSI initiator software.
FIGURE 8 shows the connectors on a DroboPro unit. From left to right they are USB, FireWire 800 and iSCSI.
SAS connections can be internal or external. This fast connection is found mostly on enterprise-level hardware, like dedicated servers, RAID, and tape drive mechanisms. Throughput for SAS devices is similar to SATA 2 or 3, in the neighborhood of 300 or 600 MB/s.
Fibre Channel is a technology that has migrated from supercomputers down to enterprise-level storage (big companies). It offers a high throughput and the potential to be used over distances of several hundred feet. It can be used over copper cable as well as optical fiber. It is rated at up to 3.2 GB/s.
Choosing the right hard drive and connection
When you add external storage to your computer, you’ll want to make sure it’s fast enough for the task at hand. Sometimes, speed won’t be terribly important, such as backup storage for your Archive files. Sometimes speed will be critically important, such as primary storage for video source files. In most cases, it’s not hard to know when your storage speed is the workflow bottleneck. Downloads and transfers will take too long, or Photoshop will seem to stop as you hear the hard drives grinding away.
Choosing the right speed of drive and a drive connection is not terribly hard, but the specifications that are published can be misleading. Sometimes, manufacturers will use the speed of a connection port as the listed speed of the device, when the actual drive is much slower than that. And many connection types don’t actually live up to the listed speed. USB 2, for instance, specifies a transfer rate of 60 MB/s. But that’s really for two devices on the same USB port, and there are almost no single devices that will perform faster than 30 MB/s.
Match the connection speed to the drive speed
There’s no point in paying a lot extra for a fast connection if the drive delivers data at a small fraction of the speed. And there’s no point in setting up fast disks and connecting them with a too-slow connection. The chart in Figure 9 outlines some rough data transfer rates for drive types and for connection types.
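As a rough illustration of why matching matters, the slower of the two components sets the pace for the whole transfer. The sketch below is a simple model rather than a benchmark. The connection figures are the approximate rates quoted in this article; the drive rates (100 MB/s for a hard drive, 250 MB/s for an SSD), the 500 GB archive size, and the function names are assumptions chosen only for illustration.

```python
# Rough connection rates in MB/s, as quoted in this article.
CONNECTION_MB_PER_S = {
    "USB 2": 30,
    "FireWire 400": 49,
    "FireWire 800": 98,
    "iSCSI": 120,
    "eSATA (SATA 2)": 300,
}


def effective_rate(drive_mb_per_s, connection):
    """Return the slower of the drive rate and the connection rate (MB/s)."""
    return min(drive_mb_per_s, CONNECTION_MB_PER_S[connection])


def transfer_minutes(size_gb, drive_mb_per_s, connection):
    """Estimate minutes to move size_gb gigabytes (1 GB = 1000 MB)."""
    seconds = size_gb * 1000 / effective_rate(drive_mb_per_s, connection)
    return seconds / 60


# Assumed drive rates, for illustration only:
# 100 MB/s for a hard drive, 250 MB/s for an SSD.
for connection in CONNECTION_MB_PER_S:
    hdd = transfer_minutes(500, 100, connection)
    ssd = transfer_minutes(500, 250, connection)
    print(f"{connection:15s} HDD {hdd:6.1f} min   SSD {ssd:6.1f} min")
```

In this model the SSD gains nothing over the hard drive on the slower connections, because the connection rather than the drive is the limit.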
Mbps and MB/s
When you look at drive speed figures, you will often see two different notations that look very similar. Megabits per second is written as Mbps, and megabytes per second is typically written as MB/s. There are 8 bits per byte, so the relationship between the two is exactly 8:1. It’s the same with gigabits (Gb) and gigabytes (GB). When the b is lower case, the notation means bits; when it is capitalized, it means bytes. Since most of us think in bytes rather than bits, that’s the one we’ll use for comparison.
For instance, FireWire 400 is named for the number of megabits that can be transferred in a second, which is about 400. Divide that by 8 to get the number of megabytes that can be transferred in a second: 50. (It’s actually just a little bit less: 393 Mbps and 49 MB/s).
Of course, a gigabyte is 1000 megabytes, so once measurements get above 1000 MB/s, we change to GB/s.
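The conversion described above can be sketched in a few lines. This is a minimal example; the function names are made up for illustration, and the 8000 Mbps link in the last line is a hypothetical value chosen only to show the switch to GB/s.

```python
BITS_PER_BYTE = 8


def mbps_to_mb_per_s(megabits_per_second):
    """Convert megabits per second (Mbps) to megabytes per second (MB/s)."""
    return megabits_per_second / BITS_PER_BYTE


def format_rate(mb_per_s):
    """Format a rate in MB/s, switching to GB/s at 1000 MB/s and above."""
    if mb_per_s >= 1000:
        return f"{mb_per_s / 1000:.1f} GB/s"
    return f"{mb_per_s:.0f} MB/s"


print(format_rate(mbps_to_mb_per_s(393)))   # FireWire 400: prints "49 MB/s"
print(format_rate(mbps_to_mb_per_s(480)))   # USB 2: prints "60 MB/s"
print(format_rate(mbps_to_mb_per_s(8000)))  # hypothetical link: prints "1.0 GB/s"
```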
Note that there is a difference between the rated speed and the typical real-world speed. All connections provide slower actual throughput than the rated speeds – some significantly so. Check the chart in Figure 9 to get a better idea of actual speed.
Hard drives almost never achieve maximum throughput
It’s very difficult to outline real-world speed for a drive. Hard drives are slowed down considerably when they read or write small files. Data on the outer rings of a drive platter is read faster than data on the inner rings. And as a drive fills up, things slow down even more.
A single 7200 RPM drive, for instance, should outperform a FireWire 800 connection, since peak data transfer is typically above FireWire 800’s 98 MB/s. But you will only find that happening in rare circumstances – in most cases, the drive will be serving up data at a significantly slower rate.
Bigger files transfer much faster than smaller files
When you transfer a big file, your drive can spend most of its time actually reading or writing data, so it works at its most efficient pace. When you transfer smaller files, the drive spends a lot more time “seeking” the files – moving the head to the part of the data platter that contains the files.
SSDs are able to do a much better job with small files, since no parts need to move to the place the data is stored, but smaller files still slow SSDs down. That’s because there’s a certain amount of administrative overhead associated with each file read or write.
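One way to picture this is to treat every file as costing a small fixed overhead (seek time on a hard drive, bookkeeping on an SSD) on top of the time spent actually moving data. The sketch below uses that simplified model. The sustained rates and per-file overheads are hypothetical numbers picked to show the shape of the effect, not measurements of any real drive.

```python
def transfer_seconds(file_count, file_size_mb, rate_mb_per_s, overhead_s):
    """Estimate seconds to move file_count files of file_size_mb each.

    overhead_s is the assumed fixed cost per file; rate_mb_per_s is the
    sustained rate while data is actually being read or written.
    """
    per_file = overhead_s + file_size_mb / rate_mb_per_s
    return file_count * per_file


def observed_rate(file_count, file_size_mb, rate_mb_per_s, overhead_s):
    """Return the overall MB/s once per-file overhead is included."""
    total_mb = file_count * file_size_mb
    seconds = transfer_seconds(file_count, file_size_mb, rate_mb_per_s, overhead_s)
    return total_mb / seconds


# Hypothetical drives: the same 1000 MB moved as one file or as 10,000 files.
DRIVES = {
    "HDD": {"rate_mb_per_s": 100, "overhead_s": 0.015},
    "SSD": {"rate_mb_per_s": 250, "overhead_s": 0.001},
}
for name, drive in DRIVES.items():
    one_big = observed_rate(1, 1000, **drive)
    many_small = observed_rate(10000, 0.1, **drive)
    print(f"{name}: one big file {one_big:.1f} MB/s, "
          f"many small files {many_small:.1f} MB/s")
```

With these assumed numbers both drives slow down on small files, but the hard drive loses far more of its rated speed than the SSD does.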
Bigger drives are usually faster
There are several reasons that larger-capacity drives are usually faster than smaller-capacity drives of comparable RPM.
- Most important, the larger drives are probably newer and, like most computer components, newer is going to be faster due to general technological development.
- Bigger drives are also more dense, which means the head has to travel a shorter distance between data bits. This speeds up the throughput.
- Bigger drives will have less data fragmentation, since there is more room to write files contiguously. This results in reduced seek time.
Drive and connection speed chart
The following chart lists sample speeds for hard drive devices. It can help you decide which external drive connection is right for you. Note that it’s only a rough guide. It is based on the general speeds of new hard drives of good brand-name quality.
Use this chart to help determine which parts of your storage configuration may be slowing you down. You can also use it to make sure that any new storage devices you buy will match the throughput of the connection type. (For instance, a high performance SSD would be wasted if it is on a slow FireWire 400 connection).
Figure 9 This chart shows the connection speed of storage devices, connection types and network configurations, as measured in megabytes per second. These are typical speeds for maximum throughput when transferring big files. Small file transfer will be significantly slower, particularly for conventional hard drives.
Hard drive power supplies
Which power supply the drive will use depends on the case design. An internal drive added to a tower computer will use the computer’s power supply. This is tidier because you don’t have power cables running all over the place. It does tax the computer’s power supply, however, and that can lead to failure.
The power supply for single-drive external cases is typically a power brick that sits outside of the case. If you are going to use these, try always to buy the same brand so that you have swappable components to test if there is a problem.
The power supply for a multiple-drive enclosure is usually inside the case, and is a lot like the power supply inside your computer. If it fails, you can transfer the drives into another enclosure and keep working. (If the drives are in a RAID configuration, you’ll only want to transfer them to an enclosure with an identical RAID controller.)
Portable drives with 2.5-inch disks inside often use the power in USB or FireWire cables to provide electricity to the drive. This is a real convenience for portable devices, but there are a few caveats. Some drives (particularly faster ones) require more current than is supplied by the port. In these cases, the drive will either not fully mount or might disappear when the power draw gets too large. Unfortunately, the only way to see if a drive works with your computer is to hook it up and give it a try.
There’s another note of caution that you should be aware of when using bus-powered drives. Too high a current draw can burn out the port that the drive is connected to. This seems to be typically associated with running multiple drives daisy chained off a laptop’s FireWire port. If you need to run more than one drive off a single port, you should buy one that will accept an external power adapter.
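The current problem can be sketched as a simple budget check. The port limits below are the nominal figures from the USB specifications (500 mA for USB 2, 900 mA for USB 3.0). The drive figures and names are hypothetical, and real ports and drives vary, so a check like this is no substitute for trying the drive on your own computer.

```python
# Nominal maximum current a port is specified to supply, in milliamps.
PORT_LIMIT_MA = {"USB 2": 500, "USB 3.0": 900}


def can_bus_power(port, drive_peak_ma):
    """Return True if the drive's peak draw fits within the port's limit."""
    return drive_peak_ma <= PORT_LIMIT_MA[port]


# Hypothetical peak (spin-up) draws for two 2.5-inch portable drives.
drives = {"slower portable drive": 450, "faster portable drive": 850}
for name, peak_ma in drives.items():
    for port in PORT_LIMIT_MA:
        if can_bus_power(port, peak_ma):
            verdict = "should run on bus power"
        else:
            verdict = "needs an external power adapter"
        print(f"{name} on {port}: {verdict}")
```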
Self-Monitoring, Analysis and Reporting Technology (SMART) keeps track of status and error information for a drive and can be helpful in predicting drive failure. Most current computers can give you a pass/fail SMART status for internal drives, as well as for some eSATA-connected drives (if the eSATA port will support SMART data). You can also access the raw values, if you would like a more nuanced report on how well the drive is doing.
SMART data is not available for drives connected by FireWire or USB.
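To show how a pass/fail status differs from a look at the raw values, here is a minimal sketch. It assumes the common SMART convention that each attribute carries a normalized value, a vendor-set threshold, and a raw counter, and that an attribute is failing when its normalized value drops to or below a non-zero threshold. The sample readings are invented, the choice of attributes to watch is an assumption, and the code does not read anything from a real drive.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    value: int      # normalized value (higher is healthier)
    threshold: int  # vendor-set failure threshold (0 means never fails)
    raw: int        # raw counter, e.g. a count of sectors


def is_failing(attr):
    """An attribute fails when its value is at or below a non-zero threshold."""
    return attr.threshold > 0 and attr.value <= attr.threshold


def overall_status(attrs):
    """Collapse all attributes into the simple pass/fail status."""
    return "FAIL" if any(is_failing(a) for a in attrs) else "PASS"


def early_warnings(attrs, watched=("Reallocated_Sector_Ct", "Current_Pending_Sector")):
    """List watched attributes whose raw count is non-zero (assumed warning sign)."""
    return [a.name for a in attrs if a.name in watched and a.raw > 0]


# Invented sample readings for illustration only.
readings = [
    SmartAttribute("Reallocated_Sector_Ct", value=98, threshold=36, raw=12),
    SmartAttribute("Current_Pending_Sector", value=100, threshold=0, raw=0),
    SmartAttribute("Spin_Up_Time", value=97, threshold=21, raw=0),
]
print(overall_status(readings))   # prints "PASS"
print(early_warnings(readings))   # prints "['Reallocated_Sector_Ct']"
```

In this made-up example the drive still passes, yet the raw count of reallocated sectors is already non-zero, which is the kind of detail the pass/fail status hides.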
READ MORE IN THE DATA VALIDATION SECTION
FIGURE 8 SMART Utility is a program that can read the raw SMART values from a drive and give you specific information about its status.
Hard drive volume configurations
Now that we know about drives and how they can physically be connected, we need to know about the logical configuration. Does each drive show up as a single volume, as multiple volume partitions, or do multiple drives show up as though they were a single drive?
READ MORE IN DRIVE CONFIGURATIONS