So here is a classic database design quandary. Lets say you are developing a service with accounts and each account has a set of parameters associated with it- have they bought the service, how much storage do they have, how many user licenses, etc.
Now, the set of things you want to store is going change over time for sure. So option 1 is to design a completely flexible system for storing key-value pairs-
CREATE TABLE [AcctParams1] (
[AcctId] [int] NOT NULL PRIMARY KEY,
[Key] [nvarchar] (16) NOT NULL,
[Value] [int] NOT NULL
)
CREATE UNIQUE CLUSTERED INDEX [AcctParamsIdx] ON [AcctParams1] ([AcctId], [Key])
A couple of notes before I move on. First of all its very important to create the clustered index on AcctId. You want to make sure that all the rows for the given account are grouped together on disk. If not, when the database goes to load just one account it might have to do IOs all over the place just to get the scattered AcctParam values.
As I said above, this approach is completely flexible, ideally involving no database schema changes ever. But the storage is somewhat less efficient and you are going to have to do multiple queries to retrieve everything you need about an account (one for the account itself, another for params).
The other way to do it is to just build out a table of the explicit columns for these parameters in classic database way-
CREATE TABLE [AcctParams2] (
[AcctId] [int] NOT NULL IDENTITY,
[Purchase] [int] NOT NULL,
[UserLimit] [int] NOT NULL,
[StorageLimit] [int] NOT NULL
)
With this approach you need to change the database schema every time you want to add some new param. However I think sometimes people get too freaked out by database schema changes (often because of too painful “upgrades” in the past). For example if you just needed to do this-
ALTER TABLE [AcctParams2] ADD [TransferLimit] [int] NOT NULL DEFAULT 5
On modern databases this kind of thing tends to execute pretty quickly but most importantly if your code is written carefully you can run this on your SQL box while your service is still online without having to simultaneously update the code of the service. I did some examples and create the above table and put a million rows of random data in it (which took 229 seconds on my test box). The above ALTER TABLE took 7 seconds to execute.
Guess what? We can do much better. Try this one-
ALTER TABLE [AcctParams2] ADD [TransferLimit] [int] NULL
By making it a NULL column it took… 0 seconds to execute. And again, if your app is built the right way, existing code will continue to work unchanged. So you can use a database schema that really defines the parameter names and types (in effect, SQL is handling the key-value stuff for you), but still has minimal upgrade impact on the uptime of your service.
A third approach is to have something like-
CREATE TABLE [AcctParams3] (
[AcctId] [int] NOT NULL IDENTITY,
[Params] [text] NOT NULL,
)
And stuff XML in the text column. Again, lots of flexibility, but you need to deal with all the XML parsing and it tends to be difficult to get the database to help you with any queries or analysis of that data. I’ve also seen multiple efforts that went down this path and the XML parsers ended up inflexible enough that they actually introduced tons of inflexibility and upgrade hassle. If you need to touch every row to upgrade the XML on this approach you just made yourself a huge problem.
One last note that applies to either approach 2 or 3- you can combine these params in the base [Account] table. Whether you want to do this or not depends on the access patterns. If 90%+ of the time when you want to access one, you want to access the other (IE, you are always writing SELECT * from [Account] join [AcctParams] on [Account].[AcctId] = [AcctParams].[AcctId] ), you might as well combine them so that its always just one I/O. On the other hand if you often want just one or the other AND they start to get large (lots of values, especially to the point where it gets close to the page size), it can make sense to split them out.
As will all optimization, there are no hard and fast rules- just guidelines and good places to test alternatives.