RAID explained

by Mathew .. on 28 July 2005, 00:00

RAID 0 through to RAID 4

RAID 0: Striping

Striping is a method of decreasing the amount of time taken to write data to the drives. Imagine a tank holding 1,000l of water with a tap at the base. Emptying that water into a 5,000l swimming pool will take a set amount of time. If, however, a second tap is installed in the tank and both taps are used to fill two 5,000l swimming pools, the time taken to empty the tank will be halved.

Similarly, a hard drive has a maximum read/write speed, so writing a 1GiB file to a single drive will take a set amount of time. However, when writing that data to a striped array with 2 disks, you are writing 512MiB to each drive, theoretically halving the amount of time taken to write the data. With four disks in the array, 256MiB is written to each disk, again decreasing the total amount of time taken for the data to be written. In this scenario, files are split into stripes of a pre-determined size and spread over the drives in the array, rather than some files going on one drive in their entirety and others on another.
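The splitting of a file into stripes can be sketched in a few lines of Python. This is only an illustration of the idea described above, not how any particular controller works: the function names `stripe` and `unstripe`, the round-robin placement and the in-memory lists standing in for disks are all assumptions made for the example.

```python
def stripe(data, num_disks, stripe_size):
    """Cut data into fixed-size stripes and deal them out to the disks in turn."""
    disks = [[] for _ in range(num_disks)]  # each "disk" is just a list of stripes
    for offset in range(0, len(data), stripe_size):
        stripe_no = offset // stripe_size
        disks[stripe_no % num_disks].append(data[offset:offset + stripe_size])
    return disks


def unstripe(disks):
    """Read the stripes back in the same round-robin order and rejoin them."""
    rows = max(len(disk) for disk in disks)
    pieces = []
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


disks = stripe(bytes(range(10)), num_disks=2, stripe_size=2)
print([len(b"".join(d)) for d in disks])  # bytes held by each disk
```

Each disk ends up holding roughly 1/N of the file, which is where the theoretical speed-up comes from, and it is also why the loss of any one list in `disks` leaves the file unrecoverable.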

Striping, however, does not provide any form of redundancy. Should one disk fail, all the data stored on the array will be lost. If you lose a single disk from a two disk array, you have essentially lost half the data making up every file stored on the array, with no means of reconstructing it. This type of array is best where read and write speeds are paramount, either for an intensive procedure such as video or graphics editing, or where data is infrequently changed but regularly backed up by external means.

RAID 1: Mirroring

Mirroring is one method of improving the reliability of data storage (the other being parity). The idea is to build in redundancy by writing the same data to two or more disks simultaneously, hence the term “mirror”. Not only does mirroring provide 100% redundancy in case of disk failure, but recovery is fairly swift should a disk fail, because all the data is immediately accessible from another disk. Overall performance generally suffers with a mirrored array: some forms of read performance are increased, but write performance is decreased as the data is written more than once.
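As a rough illustration of mirroring, the sketch below writes every block to each disk and reads from the first disk that is still healthy. The `Mirror` class, its method names and the use of dictionaries as disks are hypothetical, chosen purely for the example.

```python
class Mirror:
    """Toy mirrored array: every write goes to all healthy disks."""

    def __init__(self, num_disks=2):
        self.disks = [{} for _ in range(num_disks)]  # block number -> data
        self.failed = set()

    def write(self, block_no, data):
        # The same data is written once per disk, hence the slower writes.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block_no] = data

    def fail(self, index):
        # Simulate a dead drive: its contents are gone.
        self.failed.add(index)
        self.disks[index] = {}

    def read(self, block_no):
        # Any surviving disk holds a complete copy.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")


array = Mirror(num_disks=2)
array.write(0, b"important data")
array.fail(0)
print(array.read(0))  # still readable from the second disk
```

No calculation is needed to recover after the failure; the surviving disk simply carries on serving the data.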

The major drawback of a mirrored array is the disk space it wastes: in a two-disk mirror, half the total capacity of the hard drives is lost in providing redundancy, although with hard drive prices falling all the time, a fairly large mirrored array can be obtained for a reasonable cost. Newer SATA drives and SCSI drives with hotplug ability mean that, should a drive fail under RAID 1, the faulty drive can be removed and replaced without shutting down the entire system, which is very useful where large quantities of data are stored and uptime is important.

RAID 2: Bit-level striping with Hamming code ECC

RAID 2 never caught on commercially because it was cripplingly expensive, and it is no longer used in modern systems. It is of interest only in that it did not fully follow the conventions of the other RAID types, which is perhaps another reason for its demise.

RAID 3: Byte-level striping with dedicated parity drive

RAID 3 is used very rarely. Data is striped across the drives at the byte level rather than the block level. It was generally used for applications where speed is paramount but fault tolerance must be maintained, and has now been superseded by RAID 5.

RAID 4: Block-level striping with dedicated parity drive

RAID 4, a commonly used implementation of RAID before the advent of RAID 5, provides block-level striping (like RAID 0) with a dedicated parity disk. If a disk fails, the parity data is used to rebuild its contents onto a replacement disk. Parity provides redundancy without the overhead costs involved in a mirrored array, where redundancy consumes 50% of the total capacity. The principle of parity is to take “X” pieces of data and use them to compute one extra piece of data; these “X+1” pieces are then stored on “X+1” drives. If any one of these pieces is lost, it can be restored from those that remain. With parity, a 4 disk array gives you the effective space of 3 disks, whereas with mirroring you would get only the equivalent of the space from 2 disks.
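The “X+1” idea can be shown with a short Python sketch. It assumes the parity piece is the byte-wise XOR of the data pieces, which is the usual choice for RAID 4 and RAID 5 although the text above does not name it; the helper names `xor_blocks` and `rebuild` are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# X = 3 data blocks, stored on 3 drives, plus 1 parity block on a 4th drive.
data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data)

# Pretend the second drive has died and rebuild its block.
recovered = rebuild([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR-ing a value in twice cancels it out, combining the surviving blocks with the parity block leaves exactly the missing block. The same routine works whichever single drive is lost, including the parity drive itself.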

Parity protection is used with a RAID 0 (striped) array, and the “X” pieces of data are normally the blocks or bytes distributed across the array. The parity data can be either on a dedicated parity drive, as with RAID 4, or spread amongst the drives, as in RAID 5.

Parity has some obvious advantages over mirroring in overhead costs: mirroring has a 50% overhead for its redundancy, whereas parity has an overhead of 100/D percent, where D is the number of drives in the array. As parity is used with a striped array, the performance benefits of striping are also apparent.
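The overhead comparison is simple arithmetic, sketched below in Python. The function names are invented for the example, the mirror is assumed to be a plain two-way mirror, and the minimum of three drives for a parity array is an assumption of the sketch rather than something stated above.

```python
def parity_overhead_percent(drives):
    """Share of raw capacity spent on parity: 100/D percent."""
    if drives < 3:
        # Assumption: a striped-with-parity array needs at least 3 drives.
        raise ValueError("parity array assumed to need at least 3 drives")
    return 100 / drives


def usable_drives_parity(drives):
    """One drive's worth of space goes to parity."""
    return drives - 1


def usable_drives_mirror(drives):
    """Two-way mirroring: half the drives hold copies."""
    return drives // 2


for d in (3, 4, 8):
    print(d, parity_overhead_percent(d), usable_drives_parity(d))
```

For the 4 disk array used as the example earlier, this gives a 25% overhead and 3 usable disks with parity, against 50% and 2 usable disks with mirroring. The parity overhead keeps shrinking as drives are added, while the mirroring overhead stays at half.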

The millions of calculations that have to be performed every second lead to a major disadvantage with parity: additional processing power is required, which makes a hardware controller necessary for high performance. Software RAID with striping and parity uses a large amount of CPU power and slows the system as a whole. Similarly, should a drive fail, the missing data has to be reconstituted, again requiring millions of calculations, which takes time. A mirrored array, by contrast, is quick and simple to recover, particularly if hotpluggable drives are used. The use of a single parity drive can also create a bottleneck in read/write speeds, particularly for writes, since every write to the array must also update that one drive, slowing overall performance.