6th November 2009

2.5″ SATA

One interesting twist- when I was doing my table I was mostly thinking of SATA in terms of the high-capacity 3.5″ drives, where you can get a 1TB drive for ~$90 and a 1.5TB drive for $120. So that is actually $.08/GB (which I rounded up to $.1). But a very interesting in-between solution is to go with the commodity 7200RPM 2.5″ drives. The fancy 10k/15k RPM SAS drives are always worse performance/$, but a $60 250GB 7200RPM 2.5″ drive gives you pretty much the same IOPS per drive as the big 3.5″ SATA drives. Plus I’ve seen 2U server designs that can fit more than 25 of them in a single box (as compared to ~8-10 max for 3.5″ drives).

So if you aren’t trying to maximize capacity in a 2U unit you can either do-
10×1.5TB 3.5″ drives, 15TB storage, 750 IOPS, $1200, 0.625 IOPS/$
or
25×250GB 2.5″ drives, 6.25TB storage, 1875 IOPS, $1500, 1.25 IOPS/$

Which represents twice the performance/$ at a cost of only about a third of the capacity/$. But again, if you weren’t going to be able to use that capacity anyway, it’s a great trade-off.

Alternatively, in the same price range you could get two of those SSDs if your data set is really small-
2×64GB SSDs, .13TB storage, 6600 IOPS, $1400, 4.7 IOPS/$

So the SSD performance/$ is still better, but the capacity is so small that it’s unlikely to work for many applications.
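For anyone who wants to plug in their own prices, here is a rough Python sketch of the arithmetic behind the three configurations above. The drive counts, sizes, IOPS and costs are the ones quoted in this post; the function name and tuple layout are just illustrative choices.

```python
# Figures are the ones quoted in the post; names and layout are illustrative.
configs = [
    # (label, drive count, GB per drive, total IOPS, total cost in $)
    ("10 x 1.5TB 3.5in", 10, 1500, 750, 1200),
    ("25 x 250GB 2.5in", 25, 250, 1875, 1500),
    ("2 x 64GB SSD", 2, 64, 6600, 1400),
]

def summarize(count, gb_each, iops, cost):
    """Return (capacity in TB, IOPS per dollar, GB per dollar)."""
    capacity_gb = count * gb_each
    return capacity_gb / 1000, iops / cost, capacity_gb / cost

for label, count, gb_each, iops, cost in configs:
    tb, iops_per_dollar, gb_per_dollar = summarize(count, gb_each, iops, cost)
    print(f"{label:18} {tb:6.2f} TB  {iops_per_dollar:5.3f} IOPS/$  {gb_per_dollar:6.3f} GB/$")
```

The GB/$ column makes the trade-off explicit: the 2.5″ option buys IOPS by giving up capacity per dollar, and the SSDs push that same trade much further.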

posted in Hardware, Storage, Technology | 0 Comments

6th January 2009

Database Design

So here is a classic database design quandary. Let’s say you are developing a service with accounts and each account has a set of parameters associated with it- have they bought the service, how much storage do they have, how many user licenses, etc.

Now, the set of things you want to store is going to change over time for sure. So option 1 is to design a completely flexible system for storing key-value pairs-
-- No PRIMARY KEY on [AcctId] alone: each account has many rows, one per key
CREATE TABLE [AcctParams1] (
[AcctId] [int] NOT NULL,
[Key] [nvarchar] (16) NOT NULL,
[Value] [int] NOT NULL
)
CREATE UNIQUE CLUSTERED INDEX [AcctParamsIdx] ON [AcctParams1] ([AcctId], [Key])

A couple of notes before I move on. First of all it’s very important to create the clustered index with AcctId as the leading column. You want to make sure that all the rows for the given account are grouped together on disk. If not, when the database goes to load just one account it might have to do IOs all over the place just to get the scattered AcctParam values.

As I said above, this approach is completely flexible, ideally involving no database schema changes ever. But the storage is somewhat less efficient and you are going to have to do multiple queries to retrieve everything you need about an account (one for the account itself, another for params).
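To make the retrieval side of this concrete, here is a small Python sketch that pulls one account's rows and folds them into a dictionary. It uses SQLite as a stand-in for SQL Server purely so it runs anywhere, with a composite key on (AcctId, Key) playing the role of the clustered index; the sample values and the `load_params` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the SQL Server table
conn.execute(
    "CREATE TABLE AcctParams1 ("
    " AcctId INTEGER NOT NULL,"
    ' "Key" TEXT NOT NULL,'
    " Value INTEGER NOT NULL,"
    ' PRIMARY KEY (AcctId, "Key"))'
)
# Made-up sample data: three params for account 1, one for account 2.
conn.executemany(
    "INSERT INTO AcctParams1 VALUES (?, ?, ?)",
    [(1, "Purchase", 1), (1, "UserLimit", 25), (1, "StorageLimit", 100), (2, "Purchase", 0)],
)

def load_params(conn, acct_id):
    """Fetch every key-value row for one account and fold them into a dict."""
    rows = conn.execute(
        'SELECT "Key", Value FROM AcctParams1 WHERE AcctId = ?', (acct_id,)
    )
    return dict(rows.fetchall())

print(load_params(conn, 1))
```

Note that the application has to decide what a missing key means, since nothing in the schema says which keys should exist.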

The other way to do it is to just build out a table with explicit columns for these parameters in the classic database way-

CREATE TABLE [AcctParams2] (
[AcctId] [int] NOT NULL IDENTITY,
[Purchase] [int] NOT NULL,
[UserLimit] [int] NOT NULL,
[StorageLimit] [int] NOT NULL
)

With this approach you need to change the database schema every time you want to add some new param. However I think sometimes people get too freaked out by database schema changes (often because of too painful “upgrades” in the past). For example if you just needed to do this-

ALTER TABLE [AcctParams2] ADD [TransferLimit] [int] NOT NULL DEFAULT 5

On modern databases this kind of thing tends to execute pretty quickly, but most importantly, if your code is written carefully you can run this on your SQL box while your service is still online without having to simultaneously update the code of the service. I ran a quick test: I created the above table and put a million rows of random data in it (which took 229 seconds on my test box). The above ALTER TABLE took 7 seconds to execute.

Guess what? We can do much better. Try this one-
ALTER TABLE [AcctParams2] ADD [TransferLimit] [int] NULL

By making it a NULL column it took… 0 seconds to execute. And again, if your app is built the right way, existing code will continue to work unchanged. So you can use a database schema that really defines the parameter names and types (in effect, SQL is handling the key-value stuff for you), but still has minimal upgrade impact on the uptime of your service.
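Here is a rough sketch of what "built the right way" can look like, again using Python with SQLite as a stand-in for SQL Server (so it says nothing about the timings above). The assumptions are that existing code names its columns explicitly instead of using SELECT *, and that new code substitutes the default of 5 itself when the column is still NULL. The helper names and sample row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; SQL Server timings will differ
conn.execute(
    "CREATE TABLE AcctParams2 ("
    " AcctId INTEGER PRIMARY KEY,"
    " Purchase INTEGER NOT NULL,"
    " UserLimit INTEGER NOT NULL,"
    " StorageLimit INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO AcctParams2 (Purchase, UserLimit, StorageLimit) VALUES (1, 25, 100)"
)

def old_read(conn, acct_id):
    # Existing code names its columns, so an extra column does not change its result.
    return conn.execute(
        "SELECT Purchase, UserLimit, StorageLimit FROM AcctParams2 WHERE AcctId = ?",
        (acct_id,),
    ).fetchone()

before = old_read(conn, 1)
# Columns added this way are nullable unless declared otherwise.
conn.execute("ALTER TABLE AcctParams2 ADD COLUMN TransferLimit INTEGER")
after = old_read(conn, 1)

def new_read_transfer_limit(conn, acct_id, default=5):
    # New code treats NULL as "not set yet" and supplies the default itself.
    row = conn.execute(
        "SELECT COALESCE(TransferLimit, ?) FROM AcctParams2 WHERE AcctId = ?",
        (default, acct_id),
    ).fetchone()
    return row[0]

print(before == after, new_read_transfer_limit(conn, 1))
```

The price of the cheap ALTER is that the default now lives in the application (or in every query) rather than in the schema.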

A third approach is to have something like-

CREATE TABLE [AcctParams3] (
[AcctId] [int] NOT NULL IDENTITY,
[Params] [text] NOT NULL
)

And stuff XML in the text column. Again, lots of flexibility, but you need to deal with all the XML parsing and it tends to be difficult to get the database to help you with any queries or analysis of that data. I’ve also seen multiple efforts that went down this path where the XML parsers ended up brittle enough that they actually introduced tons of inflexibility and upgrade hassle. If you need to touch every row to upgrade the XML with this approach you just made yourself a huge problem.

One last note that applies to either approach 2 or 3- you can combine these params into the base [Account] table. Whether you want to do this or not depends on the access patterns. If 90%+ of the time when you want to access one, you want to access the other (i.e., you are always writing SELECT * from [Account] join [AcctParams] on [Account].[AcctId] = [AcctParams].[AcctId] ), you might as well combine them so that it’s always just one I/O. On the other hand if you often want just one or the other AND they start to get large (lots of values, especially to the point where it gets close to the page size), it can make sense to split them out.

As with all optimization, there are no hard and fast rules- just guidelines and good places to test alternatives.

posted in Developers, Storage, Technology | 0 Comments

28th July 2008

Silicon Image SATARAID5 and Reboots

So I have a nice new storage array going for my home using the Sans Digital ESATA case, 4 WD Greenpower drives, and the Silicon Image ESATA card that came with the Sans Digital enclosure. The card comes with software called SATARAID5 that appears to mostly work just fine- it was easy to set up a RAID5 array with my 4 disks, and while I can’t dynamically add more disks to it (it looks like there is some other newer software that supports that kind of thing), overall it works well with one major exception. Before I mention that I should add that I’m just looking for some good reliable mass storage. This isn’t a “backup”, and it doesn’t need the highest performance storage possible (although faster is always better). I’ve tested yanking a hard drive out of the array in mid-file copy and putting it back in later and it rebuilds (takes about 24 hours) but is fine.

The big catch is that every time I reboot my system it does a rebuild- the management app has a nice UI that says that “Group 0 Volume A was not shutdown properly”, and it kicks off the rebuild which takes a very long time. So far those rebuilds have worked great but it’s really annoying to have degraded performance and reliability after every reboot.

Anyone have any experience with this stuff or ideas? Is there some way to manually shut down the volume before I reboot? Is there some bug fix version that fixes this issue? (I’m running SATARAID5 version 1.5.2.1 on Vista 32-bit.)

posted in Storage, Technology, Vista | 58 Comments

29th June 2008

Home RAID Array

On Friday I finally got all the bits together for my new home RAID array. This one is a Sans Digital TR4M 4-bay ESATA enclosure plus 4 Western Digital GP 1TB drives. Building it as RAID5 it looks like the total capacity will be 2.8TB (as the computer measures it, not as the drive companies market it).
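The gap between marketed and measured capacity is easy to work out. Here is a rough Python sketch, assuming each "1TB" drive holds exactly 10^12 bytes (real drives vary a little) and that the OS counts a TB as 2^40 bytes; the function name is just for illustration.

```python
def raid5_usable_bytes(drive_count, bytes_per_drive):
    """RAID5 spends one drive's worth of space on parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drive_count - 1) * bytes_per_drive

marketed_tb = 10**12  # a "1TB" drive as sold: decimal terabytes (assumed exact)
binary_tb = 2**40     # what the OS reports as a TB

usable = raid5_usable_bytes(4, marketed_tb)
print(f"{usable / marketed_tb:.2f} TB marketed, {usable / binary_tb:.2f} TB as the OS counts it")
```

Under those assumptions four drives give 3TB in marketing terms and roughly 2.7TB as the computer measures it, which is in the same ballpark as the figure the array reports.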

I started it formatting, skipping the “quick format” option. That was over 24 hours ago and it’s at 36% right now. Which points out one of the biggest problems with large drive arrays (or any kind of large storage)- if you aren’t careful, managing it can be a total mess. This does make me a bit happy that I decided to go with just the 4-drive array rather than holding out for a full 8- the bigger one would be even more of a mess to manage at times.

posted in Hardware, Storage, Technology | 0 Comments

23rd May 2008

DIY Laptop Solid State Drive

The hard drive in my Dell laptop started acting poorly so I’m trying to replace it with a solid state drive. Being too cheap to go spend $600 on an “off the shelf” SSD I’m trying to make one using a CF->SATA adapter and a 32GB CF card. Total cost $160.

The only catch is it doesn’t work so far. The Vista install dies part way through “uncompressing files”. Same with XP. At this point I’m wondering if the problem is my CF card (RiDATA 32GB 233X) or the adapter?

Any thoughts? Anyone get this working? There is a really cool looking adapter that lets you use 3 CF cards, but it’s $180 and it’s only available from geekstuff4u, where the shipping to the US is another $45. I can’t find that part from any US place.

posted in Hardware, Storage, Technology | 1 Comment

3rd November 2007

1TB Hard Drive Prices

1TB hard drive prices appear to be falling quickly as competition heats up. Western Digital came out with their “green” drive that can spin down from 7200 to 5400rpm to save energy and Fry’s is already selling them for $265. This represents a price drop of 25% in the past month. I suppose it’s not surprising given that the market has gone from a single vendor to three. In any case, it looks like good news for building big storage arrays.

posted in Hardware, Storage, Technology | 0 Comments

10th August 2007

Upgrading a RAID Array

I’ve been giving some thought to upgrading one of my home RAID arrays lately. I currently have two 6-drive arrays attached to a single server via LSI MegaRAID SATA controller cards. One is using 250GB drives for 1.25TB capacity and the other is using 400GB drives for 2TB capacity. So far they have been operating fairly well.

While more capacity is nice, reliability is the most important thing. I bought that first array back in June 2004, which is just over 3 years ago. If I recall from the Google research on hard-drive reliability (ironic note- I couldn’t find the actual study with a quick Google, only lots of articles about it), age is one of the big factors towards failures, with lots of failures starting to happen when drives get to be about 3 years old.

So one question is about the upgrade process. I have lots of practice with simple usage of this array, at least enough to know not to pull multiple drives all at once. But to be honest I have never done anything complicated. Can I upgrade the drive size by just pulling the drives one at a time, replacing each with a bigger drive, and waiting for each rebuild to finish before moving on to the next?

Or is it much safer to copy everything somewhere else? This sounds like a pain since it’s 1TB, but then again with drive sizes now, getting 1TB of free space somewhere else isn’t as hard as it used to be, just a bit slow.

posted in Hardware, Storage, Technology | 5 Comments

26th July 2007

Hard Drive Prices July Updates

Wider availability of the new 1TB drives has had the hoped-for result of lowering hard-drive prices across the board over the last month. Drives larger than 250GB have decreased a bit more than 10% in one month. The 500GB drives remain the price-per-GB leader, costing about $.18 per GB (as low as $88 for a 500GB drive). Heading up to 750GB doubles the price and increasing to 1TB almost doubles the price again. Still it’s nice to see the 750GB drives dip well under $200 and the 1TB drives available for as low as $350 instead of their launch price of $400. The Seagate 1TB drives have not really hit the market in a meaningful way so hopefully things will dip even lower once the Hitachi has some competition.

posted in Storage, Technology | 0 Comments