Sometimes I wonder if we argue with each other for the thrill of combat.  Other times I wonder if the other person will ever shut up.

A while ago I was discussing my plans to construct an affordable NAS (Network Attached Storage) device with around a Terabyte of storage to start, and enough expandability to scale easily over the next few years.  I was discussing the basic design with some people at my local coffee house today when out of nowhere this 20-something guy invites himself to the conversation and starts ripping into my recommendation of a RAID5 array to store the data.

Now, before I get off on yet another rant here, I must say that I’m incredibly surprised by how often this argument comes up in chatrooms, forums, newsgroups, and almost anywhere else geeks and lesser hobbyists get together to talk shop.  I would have thought that with all the hard facts regarding the pros and cons of different RAID levels out on the internet and in various trusted trade magazines, the majority of people would be at least familiar with when to use certain levels, and when to avoid them.

The key ideas behind my custom NAS solution are really quite simple.  The device must be:

  • cheap (under $1000 CDN with initial potential capacity of 1 TB)
  • relatively reliable
  • easily scalable

This is not a very big list of “musts”, and it’s the first point that I tried to stress the most to this man who usurped an otherwise pleasant discussion of potential storage solutions.  But like many of the people who argue about anything and everything on IRC or 4chan, this person refused to listen to the requirements before deeming everyone within earshot who would not agree with him to be a “complete idiot” … if I remember his comment correctly.

It was at that point I stopped listening.

Since a blog post can’t be rudely interrupted (once posted), here’s my reasoning for a Level 5 RAID array on the consumer-grade NAS I hope to build.  If you disagree, feel free to post your opinions and perhaps suggest some alternatives that would keep the cost of the storage server within target.  To keep things even, I’ll be voicing not only the pros behind RAID5 (of which there are a few), but the cons as well.

The biggest selling factor behind RAID5 (or RAID6 for that matter) is that it’s cheap, and has some basic redundancy in the event a drive fails.  Where RAID5 and RAID6 perform poorly is on writes, and random writes in particular, because every small write also means reading and updating parity; reads are generally respectable.  RAID 1+0 or 0+1 has excellent capability in both sequential and random read/write performance, but gives up half its raw capacity to mirroring, so it requires more drives and more expensive hardware to be truly worth the effort of building a NAS.  RAID5 again comes up short in terms of availability as it can only lose 1 drive before the data is unprotected (RAID6 allows you to lose 2 drives).  RAID 0+1 and 1+0 can lose up to half the drives in an array without losing data, provided the failures never take out both copies of the same data.

As an example, if you have two shelves with 12 drives in each shelf in a RAID 0+1 or 1+0 array, with the mirror sets spanning the shelves and the stripe sets contained within each shelf, you can lose an entire shelf without affecting the operations of the server.  RAID 5 or 6 simply cannot survive in this scenario.  How likely is this to happen?  Well … like everything in life, it’s 50/50.  Either it will, or it won’t.
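To make the shelf scenario concrete, here is a rough sketch in Python.  It assumes a simplified layout where each drive is mirrored by the drive in the same slot on the other shelf (1+0 style); a strict 0+1 array would be less forgiving of scattered failures.  The function names and the slot numbering are made up for illustration and don’t correspond to any real controller.

```python
# Hypothetical layout: 2 shelves x 12 drives. Each drive is identified as
# (shelf, slot) and is mirrored by the same slot on the other shelf.
SHELVES = 2
SLOTS = 12


def mirrored_array_survives(failed: set) -> bool:
    """Data survives while every slot still has one healthy copy."""
    for slot in range(SLOTS):
        copies_lost = sum((shelf, slot) in failed for shelf in range(SHELVES))
        if copies_lost == SHELVES:
            return False
    return True


def parity_array_survives(failed: set, parity_drives: int) -> bool:
    """RAID5 (1 parity drive) or RAID6 (2) tolerates at most that many failures."""
    return len(failed) <= parity_drives


whole_shelf = {(0, slot) for slot in range(SLOTS)}  # shelf 0 goes dark
one_mirror_pair = {(0, 3), (1, 3)}                  # both copies of slot 3

print(mirrored_array_survives(whole_shelf))      # True
print(mirrored_array_survives(one_mirror_pair))  # False
print(parity_array_survives(whole_shelf, 1))     # False (RAID5)
print(parity_array_survives(whole_shelf, 2))     # False (RAID6)
```

Note the second case: two failures are enough to sink the mirrored array if they happen to hit both halves of the same pair, which is why “up to half the drives” is a best case rather than a guarantee.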

But who has this kind of money for a home storage server?  How many nines do you need for your data at home?

As it is, everything that’s on my existing NAS is backed up on DVD.  The main reason I aim to use networked storage is so that I don’t need to look through my archive index and then flip through dozens of DVD binders to make use of the files I want.  At the same time, I want my data to be easily available to several machines … some of which have no access to my DVD archives.  Then of course comes the problem of streaming all that media to UPnP devices.

Where would I put my 107 Gig of mp3s?  What about all the other digital media I have?  How annoying would it be to fish out my mp3 archives on DVD just to listen to seven or eight CDs that I don’t want to grab individually?

I’ll admit that this can be chalked up to laziness.  If everything is backed up on an optical disc and properly catalogued, it shouldn’t be that much of a hassle to fish out the appropriate binder.  But that’s not the point.  It’s the principle of the matter.

So I don’t really need a high-availability system.  If one drive fails, it would be nice to hot-swap in a new drive and let the data rebuild itself (this is excruciatingly painful with RAID5, as the server is pretty much unavailable to everyone until it’s done).  I did play with the idea of RAID0 to get the maximum storage capacity out of the system, but I don’t want to re-load more than a Terabyte should one of the drives fail on me.  JBOD could help me get around this little problem, but it would mean that I’d have to know which files were on the failed drive to restore that data.
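For anyone wondering how the data “rebuilds itself”, the idea behind RAID5 parity is plain XOR.  The toy sketch below uses a single stripe across a hypothetical five-drive array with made-up four-byte blocks; a real controller does the same thing for every stripe on the disk, which is why a rebuild has to read every surviving drive from end to end.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks plus one parity block (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# The drive holding data[2] dies. XOR the survivors with the parity block
# to get the missing block back.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[2])  # True
```

Lose a second block from the same stripe and there is no longer enough information to solve for either one, which is the single-drive limit of RAID5 in a nutshell.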

Just because I work with computers for a living does not mean that I want to spend my evenings or weekends reconstructing data.  That’s what computers are for.

As it is, I plan on getting two more 320 Gig Seagates and taking the three existing 320’s I have and putting them to use in this new box.  Under RAID5, one drive’s worth of space goes to parity, so the five drives will give me about 1.28 TB of usable storage.  This will all be controlled through a 3Ware card (on-board RAID might get me by, but I don’t relish the idea of using quasi-software RAID for this box) running under FreeBSD.  The system will also be configured to send emergency SMS messages to my cell phone should anything fail.
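The capacity arithmetic is simple enough to sanity-check in a few lines.  This sketch assumes five identical drives and the drive makers’ definition of a gigabyte (10^9 bytes); the function name is just for illustration.

```python
def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """RAID5 spends one drive's worth of space on parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


DRIVE_BYTES = 320 * 10**9  # a "320 Gig" drive, counted the manufacturer's way

usable = raid5_usable_bytes(5, DRIVE_BYTES)
print(usable / 10**12)  # usable space in decimal terabytes
print(usable / 2**40)   # the same space in binary units (TiB), as many OSes report it
```

Either way it works out to a little more than a terabyte, and the filesystem will take its own cut on top of that.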

So if one drive dies, I can pick up a new one on the way home for $100 (as of this writing) and let the data rebuild.  If two drives die, well … that would suck, but I could get two new drives and spend a weekend re-loading the data.  As for the OS, that’s going to run on something much sexier than a RAID set.  I plan on using a bootable flash card for this in order to optimize the power savings.  Once the OS is written, I will rarely ever need to write to the flash card.  A two gig CF would work perfectly, I think.  As it is, my drives are spun down for 10 to 14 hours a day.  Why waste the power?

So that’s my plan.  I’ve studied RAID quite a bit over the last decade, and while I can’t claim to know it all, I do know where certain levels can be used, and where others should be avoided.  But every level has an application in today’s world.  Some are just better suited than others.

Conclusion:

  • RAID 0: Lack of fault tolerance makes it unsuitable for enterprise applications and risky in most others.
  • RAID 1: OK for OS and application binaries.  If you are considering this for a heavy-load transactional database due to limitations in server capacity, you should consider another server.
  • RAID 5: Poor write performance and limited fault tolerance make it unsuitable for enterprise applications.  Acceptable for small business or consumer storage needs.
  • RAID 6: See RAID 5
  • RAID 10/01 (1+0, 0+1): Excellent performance and availability make this RAID level ideal for enterprise applications, though a bit pricey for the rest of us.
  • RAID 15/51: Excellent availability, but poor performance makes this unsuitable for database applications.  Not widely available.  Not something consumers would ever ask for in their home.
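Most of the “poor performance” verdicts in that list come down to the small-write penalty.  As a back-of-the-envelope sketch, the commonly cited figures are 2 disk operations per random write for mirrored levels, 4 for RAID5 and 6 for RAID6.  The eight-drive array and the 100 IOPS per drive below are assumptions picked purely for illustration, and a controller with a decent write cache can do better than this.

```python
# Commonly cited back-end disk operations per small random write.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def random_write_iops(level: str, drive_count: int, iops_per_drive: float) -> float:
    """Rough ceiling on random-write IOPS for the whole array."""
    return drive_count * iops_per_drive / WRITE_PENALTY[level]


# Assumed for illustration: 8 drives at roughly 100 random IOPS each.
for level in WRITE_PENALTY:
    print(level, random_write_iops(level, 8, 100))
```

For a home box that mostly streams media, reads dominate and this penalty rarely bites, which is a big part of why RAID5 is good enough here.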

Don’t believe me?  Here’s some light reading: