You have probably heard the word "RAID" many times in your dealings with computers. Although the word may well remind you of the insecticide, it has nothing to do with spraying your computer against cockroaches. In this article we will try to explain what RAID is and how it is used on computers.
Pretty much everybody has at some point lost data because a precious hard drive decided to give up the ghost.
But beyond the irreplaceable photos you lost, there are computers for which losing data is simply not acceptable. Organizations, banks, state records and many more hold data that, as you can understand, must not be lost. The solution, of course, is backup, even a triple or quadruple backup of the same data, with each copy kept in a different place, even in other cities or countries.
But in addition to backup, a way had to be found to store data on the original computer so that, if a disk fails, the recovery time is not huge. That is, the bank should not have to shut down its systems for a day until the data is copied from a backup to the original computer (after someone has first replaced the failed disk).
Another issue that concerns computer professionals is increasing the speed of writing and reading data on a hard disk. RAID was the answer to both issues. Let's take a look.
The word RAID comes from the initials of the phrase "Redundant Array of Independent Disks". In Greece the English acronym RAID has stuck, simply because the Greek equivalent (PSAD) does not sound pleasant. The term "RAID" was coined by David Patterson, Garth A. Gibson and Randy Katz at the University of California, Berkeley, in 1987. What did these three have in mind? That, for both security and speed, an array of hard drives is preferable to one large single disk. And these disks can be perfectly ordinary, cheap disks, the same ones used in personal computers, which with RAID will perform better than one expensive, high-tech single disk.
What is RAID?
RAID is a technique for making two or more hard drives cooperate so that a stack of disks acts as a single disk, increasing security and speed. And because this can be done in many ways, RAID has several levels.
So we have levels 0, 1, 2, 3, 4, 5 and 6, which for convenience we call RAID 0, RAID 1, RAID 2 and so on. The levels used almost exclusively today are 1, 5 and 6.
We will see the differences between them below, but in general these levels trade off safety against speed: some lean towards safety, others towards speed. You see, in this life you cannot have both, at least not to the maximum extent.
Forgive us for not presenting the levels in numerical order; we do it for educational reasons. We will leave RAID 0 for later and go straight to RAID 1, 5 and 6.
What is RAID 1?
RAID 1 is simply the faithful, simultaneous copying of one disk to another. That is, RAID 1 is mirroring: whatever is written to one disk is automatically written to the other. So if one disk fails, the system continues to operate smoothly from the second disk without interruption.
Of course, it then waits for you to put a new disk in place of the failed one so that it can start copying itself onto it, in other words perform a "rebuild" (reconstruction).
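To make the mirroring idea concrete, here is a minimal sketch in Python. It is only a toy model under our own assumptions: the `Mirror` class and its method names are hypothetical, and each "disk" is just a dictionary in memory, not how a real controller works.

```python
class Mirror:
    """Toy RAID 1 model: every write goes to all healthy disks."""

    def __init__(self, num_disks=2):
        # Each disk is modelled as a dict: block number -> data.
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no, data):
        # The same block is written to every disk that still works.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def read(self, block_no):
        # Any healthy disk can serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise OSError("all mirrors have failed")

    def fail(self, i):
        self.failed.add(i)

    def rebuild(self, i):
        # Copy a surviving disk onto the replacement for disk i.
        source = next(d for j, d in enumerate(self.disks) if j not in self.failed)
        self.disks[i] = dict(source)
        self.failed.discard(i)


array = Mirror()
array.write(0, b"holiday photos")
array.fail(0)          # the first disk dies
print(array.read(0))   # the data is still served from the second disk
array.rebuild(0)       # a new disk is inserted and the mirror is copied onto it
```

The rebuild here is a plain copy, which is why RAID 1 is the simplest level to reason about.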
In short, this is the simplest form of cooperation between two disks, and of course it has its pros and cons. Let's look at them.
- Write speed = RAID 1 writes at the speed of the slowest disk, since the slow disk holds back the simultaneous write to both disks.
- Read speed = Because it can read from both disks simultaneously and independently, in theory the read speed can be the sum of the speeds of the two disks, since it can read half the data from one disk and the other half from the other.
- Security = We have a permanent copy, and we lose our data only if both disks fail at the same time. If one of the two fails, the system does not stop but continues to provide service from the other disk.
- Capacity = Although we have two disks, we can store only as much data as the smaller disk holds. A big disadvantage is that we always lose at least 50% of the total capacity, and if one disk is smaller than the other, say 1 TB and 750 GB, RAID 1 will present a single 750 GB disk.
- Extensibility = Of course, we can add a third and a fourth disk, as many as we want, but all of them will be copies of the first, and the array will have the capacity and write speed of the smallest.
In general, though, RAID 1 is a good home solution for a personal computer: it stores your data on two different drives simultaneously, each a copy of the other, and also increases read speed. If you already have two drives and use the second one for simple backups, consider setting up RAID 1 instead, so that you get both a live copy and faster reads.
RAID 1 on its own is not usually chosen for professional servers, where the requirements are greater.
What is RAID5?
Almost all servers that have RAID use level 5. The reason? RAID 5 is the golden mean between safety, speed and capacity.
RAID 5 needs at least 3 drives to work. It is designed so that if you lose one of them, your data is not lost.
In RAID 5, data is written to the disks using a technique called striping. Striping splits logically sequential data into segments so that consecutive segments are stored on different physical storage devices. Confused? In short, a file is broken into blocks, usually of 64 KB, and these blocks are stored in turn across all the disks: block 1 goes to disk 1, block 2 to disk 2, block 3 to disk 3, and so on. This creates horizontal rows of blocks across the lined-up disks, as in the following drawing.
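The round-robin placement of blocks can be sketched in a few lines of Python. This is an illustration only: the function name `stripe` is our own, parity is ignored for now, and a block size of 1 byte is used in the example so that the result is easy to read.

```python
def stripe(data: bytes, num_disks: int, block_size: int = 64 * 1024) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them out to the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        # Block 0 goes to disk 0, block 1 to disk 1, ... then wrap around.
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


layout = stripe(b"ABCDEFGHI", num_disks=3, block_size=1)
for number, blocks in enumerate(layout):
    print(f"disk {number}: {blocks}")
```

Each inner list is one disk; reading the first element of every disk, then the second, and so on, gives back the original order of the data.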
The key to RAID 5 is that, for each row of blocks stored on the disks, a parity block is computed and stored on one of the disks. And because the parity technique is common to RAID 3 through RAID 6, let's open a long parenthesis and analyze it.
The parity technique is what makes it possible to recover the lost data of a broken disk easily. Once the data has been split into strips, RAID takes care of computing and writing the parity. Let's see exactly what it does.
Although in RAID 5 a file is usually broken into 64 KB pieces, for ease of understanding let's say we have a file that consists of only 9 bits in binary (as you surely know, all data on a computer ultimately consists of groups of 0s and 1s, that is, of binary digits).
So let's say we have the 9-bit file "010101110", which we break into three equal pieces, i.e. 010, 101 and 110, and store on the 3 disks.
Now the parity technique comes in, looks at the 3 blocks and performs the following calculations using the logical XOR function (eXclusive OR).
For those who do not know it, XOR is a logical function that "adds" 0s and 1s in its own way. From here on, forget the arithmetic you know, and do not phone your primary school teacher to complain that you were not taught properly. It is not an arithmetic operation but a logical function of computers (of semiconductors, to be precise).
XOR gives 0 when its two inputs are the same:
XOR (0, 0) = 0
XOR (1, 1) = 0
and gives 1 when its two inputs are different:
XOR (0, 1) = 1
XOR (1, 0) = 1
So parity reads the data from the first two disks, i.e. 010 and 101, and computes the XOR of each pair of corresponding bits: the first bit of the first triple with the first bit of the second, and so on. That is, it performs the following operation:
XOR (010, 101) = 111
Then it takes the result and computes the XOR again, this time with the third piece. That is, it performs:
XOR (111, 110) = 001
So the parity of 010, 101 and 110 comes out as 001, and it is written to the next disk of the RAID 5. The entries across all 4 disks, i.e. 010 | 101 | 110 | 001, together are called a row.
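The same calculation can be checked with a short Python sketch. The function name `parity` is our own choice, and the three-bit pieces are written as binary integers; Python's `^` operator is the bitwise XOR described above.

```python
from functools import reduce


def parity(blocks: list[int]) -> int:
    """XOR all blocks together, bit by bit."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


blocks = [0b010, 0b101, 0b110]   # the three pieces of the file
p = parity(blocks)
print(format(p, "03b"))          # prints 001
```

The order of the pieces does not matter, because XOR gives the same result whichever pair is combined first.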
That closes the parenthesis on striping and parity, so back to RAID 5. In RAID 5 the data is not split into pieces of 3 bits but of 64 KB (64 KB = 65536 bytes * 8 = 524288 bits). The process and the logic remain the same as with 3 bits.
In an example with 4 disks in total, as soon as one of them fails, RAID 5 does exactly the reverse of the above and so reconstructs the data missing from the failed disk. When you put a new disk in place of the old one, RAID 5 uses this reverse process to recover the data of the failed disk and writes it to the new disk, i.e. it performs a rebuild.
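The "reverse process" is again XOR: combining everything that survives in a row gives back the piece that was lost. A minimal sketch, using the row 010 | 101 | 110 | 001 from the example and a hypothetical helper named `recover`:

```python
def recover(surviving_blocks: list[int]) -> int:
    """XOR the surviving data and parity blocks to get the missing block back."""
    result = 0
    for block in surviving_blocks:
        result ^= block
    return result


row = [0b010, 0b101, 0b110, 0b001]   # three data pieces plus their parity
lost = 1                              # pretend the second disk has failed
survivors = row[:lost] + row[lost + 1:]
print(format(recover(survivors), "03b"))  # prints 101, the lost piece
```

The same function works whichever single disk is lost, including the one holding the parity.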
If a second disk fails before the rebuild of the first has finished, then unfortunately you lose all your data.
Note that parity is not written to one dedicated disk; data and parity are distributed across all the disks in rotation. And while the logic and the calculation of the parity block are relatively simple, distributing the parity blocks across all the drives is a more complex matter. There are four different layouts in RAID 5 for where each parity block goes, and if we ever have to change the controller because it has failed, we need to know precisely the type of RAID and the write order in order to retrieve the data with the new controller.
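As a rough illustration of the rotation, the sketch below places the parity block on the last disk for the first row and moves it one disk to the left for each following row. This is just one possible layout, chosen here as an assumption for the example; a real controller may use a different one, and the function names are our own.

```python
def parity_disk(row: int, num_disks: int) -> int:
    """Disk index holding parity for a given row (assumed rotation: right to left)."""
    return (num_disks - 1 - row) % num_disks


def row_layout(row: int, num_disks: int) -> list[str]:
    """Mark each disk in the row as data ("D") or parity ("P")."""
    p = parity_disk(row, num_disks)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


for row in range(5):
    print(row, " ".join(row_layout(row, 4)))
```

With 4 disks the parity position repeats every 4 rows, so no single disk carries all the parity writes.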
Let's look at the pros and cons of RAID 5:
- Write speed = In theory, the total write speed is the sum of the speeds of all disks minus one. That is, in an array of 3 + 1 disks the write speed is tripled. This, of course, is where the quality of the controller running the RAID 5 comes in, since the result depends on how fast it computes and writes the parity.
- Read speed = The same holds as for write speed.
- Security = Very good security, considering that you can have an array of 10 disks without fearing that one of them will fail. That holds provided only one disk fails before you manage to rebuild the array with a new disk. If a second one fails before then, all the data on all 10 disks is lost! Rebuilding a disk requires reading all the data from all the other disks again, which opens up the possibility of a second disk failure and the loss of all data.
- Capacity = Because the data is evenly distributed across all disks, the capacity equals the capacity of the smallest disk multiplied by the number of disks minus one. For example, with one 750 GB disk and three 1 TB disks, the total capacity of the RAID 5 will be (4-1) * 750 GB = 2.25 TB. Likewise, with 3 identical disks in the array, the equivalent of one disk goes to parity, so we lose 33% of the total. The more disks we add, the smaller this percentage becomes.
- Extensibility = Of course, we can add a third and a fourth disk, as many as we want, but if two of them fail, we lose our data. Even with 50 disks, two must not fail together.
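The capacity rule can be written down as a small calculation. The sketch below is only an illustration with function names of our own; sizes are in GB, and the overhead figure assumes identical disks.

```python
def raid5_capacity_gb(disk_sizes_gb: list[int]) -> int:
    """Usable capacity: smallest disk times (number of disks minus one)."""
    if len(disk_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (len(disk_sizes_gb) - 1) * min(disk_sizes_gb)


def parity_overhead(num_disks: int) -> float:
    """Fraction of the array given up to parity, for identical disks."""
    return 1 / num_disks


print(raid5_capacity_gb([750, 1000, 1000, 1000]))   # prints 2250
for n in (3, 4, 10):
    print(n, f"{parity_overhead(n):.0%}")
```

With 3 disks a third of the space goes to parity, with 10 disks only a tenth.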
What is RAID 6?
Security above all. RAID 6 is the same as RAID 5, but instead of one parity block per row it has two. Double parity provides fault tolerance for up to two failed disks. That is, our system remains functional even if we lose 2 disks.
This makes larger RAID groups more practical, especially for high availability systems, since large-capacity drives need more time to rebuild. RAID 6 requires at least four disks. As with RAID 5, a drive failure results in reduced performance of the entire array until the failed drive is replaced. In earlier tests with software RAID 5, a 1 TB disk took about 4-5 hours to rebuild.
With a RAID 6 array that uses drives from multiple sources and manufacturers, it is possible to alleviate most of the problems associated with RAID 5. The probability of 3 disks failing together is far lower than that of 2, which is all it takes to sink a RAID 5.
The second parity is not just a copy of the first. Instead, it is computed with a different method than plain XOR, based on finite field (Galois field) arithmetic, which has to do with field theory and more advanced mathematics. However, since this article is addressed to beginners, it is better not to get into explaining this method. That way you can at least stay clear-headed to the end of the article.
Let us also see the pros and cons:
- Write speed = Because of the extra complexity of computing the parity, write speed depends heavily on the controller. At best it equals the sum of the speeds of all disks minus two.
- Read speed = The read speed of RAID 6 reaches at most the sum of the speeds of all disks minus two.
- Security = Much better security than its main competitor, RAID 5. The chance of three disks failing and taking our data with them is minimal. Besides, RAID 6 is known for working well in arrays of many disks. Consider that the more disks there are, the more likely it is that two of them fail at the same time. In that case RAID 6 is the only way to go.
- Capacity = Total capacity is the size of the smallest disk multiplied by the number of disks minus two. That is, if we have 1 disk of 750 GB, 2 of 1 TB and 1 of 1.5 TB, the total capacity will be 750 GB * (4-2) = 1.5 TB.
- Extensibility = In a RAID 6 array we can add as many discs as we want.
What are RAID 1E, 5E, 5EE, 6E?
These are the RAID levels you already know, except that they also have an empty spare disk. That is why they carry the letter E, from "Enhanced".
When a disk fails in one of these arrays, the system is in danger. The disk should be replaced as soon as possible and the array rebuilt. But until all this is done, you are in real danger of losing a second disk (or a third, for RAID 6) and with it all your data.
That is why an empty spare disk is added to the stack, sitting and waiting; it is called a hot spare. If a disk fails, the spare takes over automatically and the rebuild starts immediately, with no time lost until a technician can be found, which matters especially on public holidays and during vacations.
In RAID 1E you have one regular disk, one mirroring disk and a third disk as a spare. In RAID 5E, the hot-spare space is distributed across the set of disks, in reserved blocks at the end of each disk. In general, RAID 5E requires at least 4 disks in total.
5EE differs from 5E in where the spare sits in the array and in how it operates overall. In 5E the spare is placed in the last position, while in 5EE it is placed in between and takes part in the RAID operation, widening the array by one more disk. This reduces the rebuild time.
Hot spares and automatic rebuilds have many critics. Because, as we said above, rebuilding an array requires reading all the data from all the disks, which opens up the possibility of a second disk failure and the loss of all data, they believe that before a rebuild begins the array should first be checked by a technician.
Knowing Murphy's law, nobody would risk an immediate rebuild after a single disk failure, yet with a hot spare that is exactly what will happen.
What is RAID0?
Maximum speed and no security. RAID 0 is pure striping without any parity. It requires at least two disks, and the data is divided into blocks that are spread over all the disks forming the array. If you have four disks, instead of waiting for the system to write 256 KB of data to one disk, a RAID 0 system can write 64 KB simultaneously to each of the four disks of the array, offering excellent I/O performance.
As you can see, if even one disk is lost, all the data is lost! RAID 0 is recommended only where speed matters and the possibility of losing data does not, for example putting your operating system on RAID 0 while keeping your data on a separate disk outside the RAID.
Pros and cons:
- Write speed = The sum of the write speeds of all disks.
- Read speed = The sum of the read speeds of all disks.
- Security = Zero! If you lose a disk, and at some point you will, you lose all the data and have to rebuild everything from scratch.
- Capacity = Total capacity is the size of the smallest disk multiplied by the number of disks. That is, if we have 1 disk of 750 GB, 2 of 1 TB and 1 of 1.5 TB, the total capacity will be 750 GB * 4 = 3.0 TB.
- Extensibility = In a RAID0 array we can add as many discs as we want.
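To compare the levels side by side, the capacity rules given in this article can be put into one small table-driven sketch. The names `REDUNDANT_DISKS`, `MIN_DISKS` and `usable_capacity_gb` are our own, sizes are in GB, and the disk set is the mixed one used in the examples.

```python
# How many disks' worth of space each striped level gives up to parity.
REDUNDANT_DISKS = {0: 0, 5: 1, 6: 2}
# Minimum number of disks per level, as stated in the article.
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_capacity_gb(level: int, disk_sizes_gb: list[int]) -> int:
    """Usable capacity of an array, limited by its smallest disk."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if len(disk_sizes_gb) < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    smallest = min(disk_sizes_gb)
    if level == 1:
        # Every disk is a copy, so only one disk's worth is usable.
        return smallest
    return smallest * (len(disk_sizes_gb) - REDUNDANT_DISKS[level])


disks = [750, 1000, 1000, 1500]
for level in (0, 1, 5, 6):
    print(f"RAID {level}: {usable_capacity_gb(level, disks)} GB")
```

For this disk set the sketch gives 3000 GB for RAID 0 and 1500 GB for RAID 6, matching the worked examples.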
What you have to remember
- RAID was created out of the need for speed, protection against data loss, and minimal recovery time after a failure.
- The most common RAIDs are 1, 5 and 6.
- Striping is splitting a file into blocks spread over multiple drives.
- Parity is the technique that makes it possible to rebuild lost data.
- XOR is a logical function.
- RAID0 offers only speed, without any security.
- RAID1 is simple mirroring. Good for home computers.
- RAID5 is striping with one parity block. The golden mean of speed, safety and capacity.
- RAID6 is striping with two parity blocks. Focused on security.
- A hot spare, or simply an E at the end of the name, means an extra spare disk.
Making RAID combinations
If you have imagination and creativity, you can combine all of the above, aiming for the best solution for your needs. Since a RAID array ultimately looks like a single disk, nothing prevents you from combining different RAID levels with each other. You can build RAID 10, 01, 50, 60 and 100.
Indicatively, RAID 01 is two RAID 0 arrays, one mirrored onto the other with RAID 1. That is, we have two mirrored RAID 0 arrays. It requires at least 4 disks; each RAID 0 has double the write and read speed of a single disk, and because the two are mirrored with RAID 1, the theoretical read speed reaches up to four times that of a single disk. Security, of course, follows the logic of RAID 1.
RAID 10 is just the reverse of 01, and more popular. We have two RAID 1 arrays that we stripe together as a RAID 0. This layout allows up to two disks to fail, as long as they belong to different RAID 1 arrays. If a third disk fails, wherever it is, all data is lost. Speed is the same as RAID 01, i.e. at most four times that of a single disk.
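The failure rule for RAID 10 (mirror pairs striped together) can be sketched as a simple check: the array survives as long as no mirror pair has lost both of its disks. The function name and the numbering of the disks are assumptions made for this example.

```python
def raid10_survives(mirror_pairs: list[tuple[int, int]], failed: set[int]) -> bool:
    """True if every mirror pair still has at least one working disk."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


pairs = [(0, 1), (2, 3)]                  # two RAID 1 pairs, striped together
print(raid10_survives(pairs, {0, 2}))     # True: one disk lost from each pair
print(raid10_survives(pairs, {0, 1}))     # False: a whole pair is gone
print(raid10_survives(pairs, {0, 2, 3}))  # False: a third failure breaks a pair
```

With only two pairs, any third failure necessarily completes a broken pair, which is why the third disk is fatal wherever it is.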
RAID 50 is two RAID 5 arrays that we join, as if they were two disks, with a RAID 0. It requires at least six disks. It can lose at most one disk from each RAID 5 array, and its speed is double that of a single RAID 5, four times that of a single disk in total.
RAID 60 works the same way: we have two RAID 6 arrays and join them with a RAID 0. It requires at least eight disks.
RAID 100: if you have eight or more disks and you like Lego, you make four RAID 1 arrays, stripe them in pairs into two RAID 0 arrays, and then stripe those together with yet another RAID 0. Your friends, however, will take a long time to figure out what you did.
What are RAID2, 3, and 4?
As you may have guessed, there are three more RAID levels: RAID 2, RAID 3 and RAID 4. We do not think you will come across them anywhere, since they have been abandoned. RAID history has it that at first only RAID 1 and RAID 2 appeared. Later, according to the needs of each developer and company, the rest appeared, not necessarily all together. And yes, RAID 0 did not appear first, but later than 1 and 2.
After various "experiments" only 0, 1, 5 and 6 remained and all the rest were abandoned. However, for the sake of completeness, here they are:
In RAID 2, data is striped at the bit level rather than at the block level as in all the others. Each bit is written to a different disk. Such a scheme requires a Hamming error-correcting code (ECC) to correct errors.
Essentially the first bit is written to the first disk, the second bit to the second, and so on. The number of disks RAID 2 needs for ECC grows with the logarithm of the number of disks that hold the data. Even so, it uses many more disks for ECC than the other levels: for example, 10 data disks need 4 ECC disks, and 4 data disks need 3 ECC disks.
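The two examples agree with the standard rule for Hamming codes: the number of check bits p must satisfy 2^p >= d + p + 1 for d data bits. Assuming RAID 2 uses one disk per bit, a short sketch (with a function name of our own) reproduces the figures:

```python
def hamming_ecc_disks(data_disks: int) -> int:
    """Smallest p with 2**p >= data_disks + p + 1 (Hamming code check bits)."""
    p = 0
    while 2 ** p < data_disks + p + 1:
        p += 1
    return p


for d in (4, 10):
    print(f"{d} data disks -> {hamming_ecc_disks(d)} ECC disks")
```

For 4 data disks this gives 3, and for 10 data disks it gives 4.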
In RAID 2, the controller synchronizes the disks so that they spin in step. RAID 2 is no longer used as a solution. It is costly because it requires extra drives, and its implementation is complicated because it relies on the Hamming code, whereas error correction is now built into modern hard drives themselves.
RAID 3 went one step further than RAID 2: it stripes at the level of 8 bits (i.e. one byte), so it does not need many ECC disks but only one disk dedicated to parity. It requires the disks to rotate at the same speed.
The problem with RAID 3, as with RAID 2, is that because of the required disk synchronization it has good sequential read/write performance, but if many requests arrive together the speed drops dramatically. That is, random read/write performance is poor, and so it is usually not used.
RAID 4 stripes not by byte like RAID 3 but at block level (16, 32, 64 or 128 kB), like RAID 5. For parity, however, it uses a disk dedicated to that job alone, like RAID 3. Dedicating a single disk to parity reduces write speed, and with the appearance of RAID 5, RAID 3 and 4 were eventually abandoned.
Now, as for the controllers that implement RAID, there are hardware and software ones, each with its pros and cons. We will analyze them in a separate article; if you have made it this far, your head probably hurts already.