9th June 2008

Microsoft Velocity and Memcache


Just saw a post about Microsoft Velocity, Microsoft’s answer to memcache. I’m looking forward to checking this out soon. We have had a ton of success using memcache on the LAMP platform, and it was a missing piece in the .NET world. I understand why it was a bit hard to see its importance in that environment: unlike PHP, .NET can persist things in memory between requests. PHP needs memcache early, since you can’t save anything from request to request and so you end up going to the database on almost every request. But as your system grows, memcache fills an even more important role because of how easily you can scale out by adding more caching servers.

Memcache (at least the standard one) has a few problem scenarios, all revolving around how it so easily shares the load between servers. The coolest thing about memcache is how simple a mechanism it is: there is very little configuration to tune (just a few settings), maintain, or debug. You simply configure the same list of N memcache servers on all your web servers, and when you save or retrieve an object, the client takes the ‘key’ (an arbitrary string), hashes it, and uses the hash to pick one of the memcache servers (in effect it does H(key)%N, where H is the hashing function and N is the number of servers). As a result, you automatically get a smooth distribution of your keys across your caching servers.
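The H(key)%N selection can be sketched in a few lines of Python. This is only an illustration: CRC32 stands in for H (real clients may use a different hash), and the host names are made up.

```python
import zlib
from collections import Counter

def pick_server(key, servers):
    """Pick a server for `key` using H(key) % N, with CRC32 standing in for H."""
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]

# Hypothetical server list; every web server would be configured with the same one.
servers = [f"cache{i}.example:11211" for i in range(4)]

# Count how many of 10,000 sample keys land on each server.
keys = [f"user:{i}" for i in range(10_000)]
counts = Counter(pick_server(k, servers) for k in keys)

for server, n in sorted(counts.items()):
    print(server, n)
```

Because the choice depends only on the key and the server list, every web server picks the same memcache server for a given key without any coordination.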

However, if you add or remove memcache servers from the array, N changes, so H(key)%N comes out differently for most keys and all of a sudden your keys are on different servers. Now, if your site is under low load, this isn’t that big a deal: you just dumped your whole cache, and it will build back up and be fine. But if you are running memcache because you really need it, your site just went down as all the web servers started pounding directly on the database with every request. Right now DeepRockDrive has a pretty unique situation where we get huge spikes of traffic (that are mostly predictable: showtime) during which the memcache servers really save our bacon, and most of the time we can clear them out more or less safely. Most big sites are going to have a more consistent traffic pattern and would have a harder time with this.
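To get a feel for how much of the cache is effectively dumped, here is a rough Python sketch that counts how many sample keys change servers when an array grows from 10 to 11 servers. CRC32 is again an assumed stand-in for the client’s hash, and the key names are invented.

```python
import zlib

def slot(key, n_servers):
    """Index of the server that H(key) % N selects (CRC32 stands in for H)."""
    return zlib.crc32(key.encode("utf-8")) % n_servers

keys = [f"session:{i}" for i in range(20_000)]

# A key survives the change only if it maps to the same index before and after.
moved = sum(1 for k in keys if slot(k, 10) != slot(k, 11))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.1%} of keys now map to a different server")
```

For a uniform hash, a key stays put only when its hash modulo 110 is below 10, so roughly 10 out of every 11 keys move. That is why adding a single server behaves almost like flushing the whole cache.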

This also means that if a memcache server goes down, you can’t just pull it out of your configuration (at least under normal load). You really, really need to replace it. The easiest (although resource-intensive) way to do this is to have a hot-spare server or two in your rack. If one of your memcache servers goes down, you map that spare to the same IP and bring it up. You just lost a portion of your cache (10% if you have 10 servers, and so on), but it’s way better than losing the whole thing. A more complicated setup would be to run multiple instances of memcache on every machine. So if you have 10 memcache machines, you run each with 3 IPs, and it looks like your whole array is 30 “servers”. If one machine goes down, you bring its 3 IPs up on 3 of the other machines, distributing its load among them.
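The 30-“server” layout can be sketched as a fixed set of slots (the IPs that clients hash over) plus a separate table of which machine currently owns each slot. This is a simplified model with made-up machine names, not a description of any real fail-over tooling.

```python
import zlib

SLOTS = 30  # fixed number of addresses the clients hash over; this never changes

def slot_for(key):
    """Slot chosen by H(key) % N (CRC32 stands in for H)."""
    return zlib.crc32(key.encode("utf-8")) % SLOTS

# slot -> machine; each of the 10 hypothetical machines starts with 3 slots (IPs).
owner = {s: f"machine{s % 10}" for s in range(SLOTS)}

def fail_over(owner, dead):
    """Return a new slot table with the dead machine's slots moved to survivors."""
    survivors = sorted({m for m in owner.values() if m != dead})
    orphaned = [s for s, m in owner.items() if m == dead]
    new_owner = dict(owner)
    for i, s in enumerate(orphaned):
        new_owner[s] = survivors[i % len(survivors)]
    return new_owner

after = fail_over(owner, "machine3")

# Only keys whose slot changed hands lose their cached value.
keys = [f"item:{i}" for i in range(30_000)]
lost = sum(1 for k in keys if owner[slot_for(k)] != after[slot_for(k)])
lost_fraction = lost / len(keys)
print(f"{lost_fraction:.1%} of keys lost their cached value")
```

Since the clients still see 30 addresses, H(key)%N is unchanged and only the keys on the failed machine’s 3 slots (about a tenth of them) have to be rebuilt.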

We haven’t had a chance to fully work out the scalability of the standard memcached running with multiple instances on the same box so far. We have played with it some, and on our 8-core boxes, even with the right threading libraries, we haven’t gotten close to maxing the machine out with a single instance of memcache. It looks like there may be some I/O limits, but I can’t be sure what is actually going on. Still, running multiple instances on the same box seems like a fairly reasonable approach, both for scalability and for these fail-over flexibility cases.

The other tricky issue is that memcache gives you a balanced distribution of keys but does not necessarily give you a balanced distribution of access. Let’s say you had some runtime configuration information that you wanted to persist on your site. The easy thing to do would be to save it in a key called ‘config’ and just retrieve that key on every request.

What you have accomplished here is to introduce a nice hard scalability limit into your system. Memcache isn’t actually that much faster than MySQL for basic queries. If MySQL can cache the query well (as it would for a query on a simple table that just gets hit over and over), the performance of the two will be pretty similar. Where memcache shines is that, because of how its keys and hashing work, it can transparently distribute the load over multiple servers. So the person building the app with the ‘config’ key will have something that looks great while it’s small and running on one caching server. Then, when they try to apply it to a high-load site with multiple servers, all the traffic still goes to one memcache server (since it’s one key that hashes the same way every time), and they will typically be stumped as to why the performance isn’t better.

The way around this is to generate keys that look something like this (PHP syntax):

'config-' . rand(0, 9)
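A small Python simulation shows the difference between the two key schemes. The server count, request count, and CRC32 hash are assumptions for illustration only.

```python
import random
import zlib
from collections import Counter

N_SERVERS = 10  # assumed size of the memcache array

def server_for(key):
    """Server index chosen by H(key) % N (CRC32 stands in for H)."""
    return zlib.crc32(key.encode("utf-8")) % N_SERVERS

rng = random.Random(0)  # seeded so the run is repeatable

# Every request reads the single 'config' key: one server takes all the traffic.
single = Counter(server_for("config") for _ in range(10_000))

# Every request reads one of 'config-0' .. 'config-9', chosen at random.
spread = Counter(
    server_for(f"config-{rng.randint(0, 9)}") for _ in range(10_000)
)

print("servers hit with one key:     ", len(single))
print("servers hit with suffixed keys:", len(spread))
```

Ten suffixes will not necessarily land on ten different servers, since some of them can hash to the same one. Using more suffixes than servers should even the load out further, at the cost of more copies to refresh.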

At first this is counter-intuitive. I’m storing the same thing in 10 keys? That means that when they expire I’m going to have to do 10x the initial loading of this object (whether from the database or config files or whatnot). However, at the cost of a very small number of extra database queries (they only happen once every 5 or 60 minutes, after all), I’m spreading my keys out across my memcache array, and the result is that the load gets spread smoothly across the whole array. I can even do a couple of slick extra things, like writing to all 10 keys at once every time I refresh the config data. That results in no extra load on the database (except for a race condition I’ll cover in a future post) and just a small amount of occasional load on one web server.
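The “write to all 10 keys at once” trick might look roughly like the Python sketch below. A plain dict stands in for the memcache client (so nothing ever expires here), the loader function and its return value are invented, and the race condition mentioned above is deliberately ignored.

```python
import random

REPLICAS = 10   # number of 'config-N' copies
cache = {}      # stand-in for a memcache client; a dict has no expiry
db_loads = 0    # counts how often we fall back to the database

def load_config_from_db():
    """Hypothetical loader; a real one would query the database or read files."""
    global db_loads
    db_loads += 1
    return {"maintenance_mode": False}

def get_config():
    """Read one random copy; on a miss, reload once and rewrite every copy."""
    key = f"config-{random.randint(0, REPLICAS - 1)}"
    value = cache.get(key)
    if value is None:
        value = load_config_from_db()
        for i in range(REPLICAS):
            cache[f"config-{i}"] = value
    return value

for _ in range(1_000):
    get_config()

print("database loads:", db_loads)
```

With a real memcache client, each of those writes would carry an expiry time, and two web servers that miss at the same moment could both reload the data.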

I started this post mentioning Microsoft Velocity and then went into memcache. Looping back to Velocity: in typical Microsoft fashion, it looks like it’s a much more complicated solution, but it also automatically deals with some of the above issues. As far as I can tell from a few architecture diagrams, the servers maintain knowledge of a cluster (memcache servers have no idea about each other), and I’d assume they automatically deal with some of the fail-over and “add a server” cases. It also has a more explicit concurrency model. Memcache supports building things with much of the same concurrency behavior, but you need to manage a lot more of it yourself.

Looking forward to checking out Velocity more later. In particular, I’m interested in when the protocol for talking to the servers will be published and whether there will be support for PHP/Python clients talking to these servers.

There are currently 5 responses to “Microsoft Velocity and Memcache”


  1. On June 18th, 2008, Rock said:

    I know the guys at Server Intellect can install this on a VPS or dedicated server. We had it installed on our VPS and it is currently running without any hiccups so far.

  2. On July 30th, 2008, Hadi said:

    NCache has been the caching solution of choice for various mission critical applications throughout the world since 2005. Its scalability, high availability and performance are the reasons why NCache has earned the trust of developers, senior IT and management personnel within high profile companies. These companies are involved in a range of endeavors including ecommerce, financial services, health services and airline services.

    The range of features that NCache currently showcases are highly competitive and revolutionary. It supports dynamic clustering, local and remote clients, advanced caching topologies like replicated and mirrored, partitioned and replicated partitioned. It also provides an overflow cache, eviction strategies, read-through and write-through capabilities, cache dependencies, event notifications and object query language facilities. For a complete list of features and details please visit http://www.alachisoft.com/ncache/index.html

    Download a 60 day trial enterprise/developer version or totally free NCache Express from http://www.alachisoft.com/download.html

    Team NCache

  3. On August 1st, 2008, Alex said:

    NCache looks interesting, although I’ve got to say the pricing of $1000/cpu seems stratospheric. I can’t find anywhere on the site that says whether that is per-core or per physical chip, but still $2000-$8000 per server when we typically spec out servers that cost less than $2000 for high-end 8-core systems is way too much (especially given the free availability of memcache).

  4. On September 12th, 2008, Iqbal Khan said:

    Hi Alex,

    NCache pricing is per CPU and it does not count any cores. So, a dual-core CPU is considered 1 CPU.

    NCache is the most featureful .NET distributed cache and offers a lot of very important features that Memcache does not provide. To start with, you get 100% uptime due to dynamic clustering in NCache. Then, you have various caching topologies including Mirrored, Replicated, Partitioned, and Client Cache.

    So, you really need to see the cost to your business of downtime or inability to scale. NCache comes with full support as well (including 24×7 support option).

  5. On April 28th, 2009, Sanjay said:

    I have just deployed Microsoft Velocity. Compared to NCache, when a Velocity server goes down the data is lost. In NCache the data is redistributed to the other nodes in the cluster. This holds for both a sudden crash of the server and a graceful shutdown. Compared to memcached, Velocity causes more traffic on the network and a sudden jump in CPU and network usage. It seems like Velocity needs a stronger architecture to support such advanced features. Except for the price, NCache would still be my choice of in-memory cache.

    I wish NCache would spend some time evaluating Velocity. From all the posts I have seen around, it seems like NCache is just pasting the same blog text everywhere.
