19th June 2008

Recent Interesting Posts- Functional Programming and Dark Launches

Back in March Dare Obasanjo said he was going to stop blogging. Luckily it seems like he didn’t carry through his threat very well and has had a ton of great posts recently.

Dare posted on functional programming, Map/Reduce/Filter in C# 3.0 including some nice background on the topic. I’ve been getting into Python lately which has some really elegant support for anonymous functions, list/collection operations, and specifically things like map/reduce/filter. This is also an interesting approach on some similar techniques in PHP.
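For a flavor of what that looks like in Python, here is a minimal sketch. The list and variable names are made up for illustration and are not taken from Dare’s post.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# map: apply an anonymous function to every element
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements that pass a test
even_squares = list(filter(lambda n: n % 2 == 0, squares))

# reduce: fold the remaining elements down to a single value
total = reduce(lambda acc, n: acc + n, even_squares, 0)
```

The same pipeline can also be written as a list comprehension, which many Python programmers find more readable than chained map/filter calls.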

Dare also did an interesting post on techniques to dark launch / gradually ramp up new features. The idea is that you want to try out a new feature without risking taking down your entire user base at once.

Some other really simple approaches include putting a hidden iframe that accesses the new feature into existing pages. It’s a really easy way to isolate it from the rest of what is going on for your site. One thing to keep in mind is that as you dark-launch a feature it’s really important to figure out how you can throttle it up/down. If you have 20 front end servers, maybe you can deploy it only on a certain set of them, or else your pages can serve it up only a certain random percentage of the time. This way you can start it up on a very low load (say 1% of the requests) and slowly increase it to try it out more. If you start having some performance problems in your data center, just kick it down a couple of notches.
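The “random percentage of the time” throttle can be sketched roughly like this. It is only an illustration: the DARK_LAUNCH_PERCENT knob, the function names and the /new-feature URL are all hypothetical placeholders, not something from a real framework.

```python
import random

# Hypothetical knob: percentage of page views that should exercise the new feature.
DARK_LAUNCH_PERCENT = 1.0

def should_dark_launch(percent=DARK_LAUNCH_PERCENT, rng=random):
    """Return True for roughly `percent` percent of calls."""
    return rng.random() * 100.0 < percent

def render_page(body, percent=DARK_LAUNCH_PERCENT, rng=random):
    """Append a hidden iframe that hits the new feature for a sampled share of page views."""
    if should_dark_launch(percent, rng):
        # "/new-feature" is a placeholder URL for the feature being dark-launched
        body += '<iframe src="/new-feature" style="display:none"></iframe>'
    return body
```

Throttling up or down is then just a matter of changing the one percentage, ideally from configuration that can be changed without a redeploy.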

Deploying to only certain specific servers can be especially interesting because it can (if done right) focus the load on those servers. So those servers behave the way the whole system will once the system gets fully turned on, but don’t have the same risk of taking your whole environment down.

posted in Technology, Developers, Software | 0 Comments

9th June 2008

Microsoft Velocity and Memcache

Just saw a post about Microsoft Velocity, Microsoft’s answer to memcache. I’m looking forward to checking this out soon- we have had a ton of success using memcache on the LAMP platform and it was a missing piece in the .NET world. I understand why it was a bit hard to see its importance in that environment- unlike PHP, .NET can persist things in memory in between requests. PHP needs memcache early on, since you can’t really save anything from request to request and so you end up going to the database very quickly. But as your system grows memcache fills an even more important role because of how easily you can scale out by adding more caching servers.

Memcache (at least the standard one) has a few problem scenarios, all revolving around how it so easily shares the load between servers. The coolest thing about memcache is how simple a mechanism it is- no complex configuration to tune (though there are a few things to tune), maintain, or debug. You simply configure the same N memcache servers on all your web servers, and when you save or retrieve an object, the client takes the ‘key’ (an arbitrary string), hashes it and uses the hash to pick one of the memcache servers (in effect it does H(key)%N where H is the hashing function and N is the number of servers). The result is that you automatically get a smooth distribution of your keys across your caching servers.
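The H(key)%N selection can be sketched in a few lines. This is a simplified illustration, not the code of any particular client: CRC32 is used here as a stand-in for H (real clients differ in which hash they use), and the server addresses are placeholders.

```python
import zlib

# Placeholder addresses; every web server would be configured with the same list.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_server(key, servers=SERVERS):
    """Pick a server as H(key) % N, with CRC32 standing in for H."""
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]
```

Because the choice depends only on the key and the server list, every web server picks the same memcache server for a given key without any coordination.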

However, if you add or remove memcache servers from the array, it changes N and therefore the result of the hash calculation, so all of a sudden your keys are on different servers. Now, if your site is under low load, this isn’t that big a deal- you just dumped your whole cache and it will build back up and be fine. But if you are running memcache because you really need it, your site just went down as all the web-servers just started pounding directly on the database with every request. Right now DeepRockDrive has a pretty unique situation where we get huge spikes of traffic (that are mostly predictable- showtime) during which the memcache servers really save our bacon, and most of the time we can clear them out more or less safely, but most big sites are going to have a more consistent traffic pattern and would have a harder time with this.
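To get a feel for how much of the cache is effectively dumped, here is a rough sketch that measures the share of keys that land on a different server index when the server count changes. It assumes CRC32 as a stand-in for H, and the function names are made up for illustration.

```python
import zlib

def server_index(key, n):
    """Index of the server a key maps to under H(key) % n (CRC32 standing in for H)."""
    return zlib.crc32(key.encode("utf-8")) % n

def fraction_moved(keys, old_n, new_n):
    """Share of keys whose server index changes when going from old_n to new_n servers."""
    moved = sum(1 for k in keys if server_index(k, old_n) != server_index(k, new_n))
    return moved / len(keys)
```

With a well-mixed hash, going from 10 to 11 servers leaves a key in place only when H(key)%10 and H(key)%11 happen to agree, which is about 1 time in 11, so roughly 90% of the cache goes cold.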

This also means that if a memcache server goes down you can’t just pull it out of your configuration (at least under normal load). You really, really need to replace it. The easiest (although resource intensive) way to do this is to just have a hot-spare server or two in your rack. If one of your memcache servers goes down, you map that spare to the same IP and bring it up. You just lost a portion of your cache (10% if you have 10 servers, etc.), but it’s way better than losing the whole thing. A more complicated setup would be to run multiple instances of memcache on every machine. So if you have 10 memcache machines, you run each with 3 IPs and it looks like your whole array is 30 “servers”. If one goes down, you bring those 3 IPs up on 3 of the other machines, distributing some of its load to those machines.
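The multiple-instance idea can be sketched as a table of 30 slots (think: 30 IPs) owned by 10 machines. This is only a toy model: the machine names and helper functions are hypothetical, and CRC32 again stands in for the client’s hash.

```python
import zlib
from itertools import cycle

MACHINES = ["m%d" % i for i in range(10)]  # hypothetical machine names
INSTANCES_PER_MACHINE = 3
NUM_SLOTS = len(MACHINES) * INSTANCES_PER_MACHINE  # clients see 30 "servers"

# slot number -> machine currently hosting that slot's IP
slot_owner = {slot: MACHINES[slot // INSTANCES_PER_MACHINE] for slot in range(NUM_SLOTS)}

def slot_for(key):
    """Clients hash against the fixed slot count, never the machine count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SLOTS

def fail_over(owner, dead, survivors):
    """Return a new table with the dead machine's slots spread over the survivors."""
    new_owner = dict(owner)
    dead_slots = [s for s, m in owner.items() if m == dead]
    for slot, target in zip(dead_slots, cycle(survivors)):
        new_owner[slot] = target
    return new_owner
```

For example, fail_over(slot_owner, "m0", ["m1", "m2", "m3"]) moves the three slots of m0 onto three different machines. The slot count stays at 30, so H(key)%30 gives the same answer as before and only the data held on the failed machine is lost.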

We haven’t had a chance to fully work out the scalability of the standard memcached running with multiple instances on the same box so far. We have played with it some, and on our 8-core boxes, even with the right threading libraries, we haven’t gotten close to maxing it out with a single instance of memcache. It looks like there may be some I/O limits, but I can’t be sure about what is actually going on. Still, the notion of running multiple instances on the same box seems like a fairly reasonable one for scalability and for these fail-over flexibility cases.

The other tricky issue is that memcache gives you a balanced distribution of keys but does not necessarily give you a balanced distribution of access. Let’s say you had some runtime configuration information that you wanted to persist on your site. The easy thing to do would be to save it in a key called ‘config’ and just retrieve that key on every request.

What you have accomplished here is to just introduce a nice hard scalability limit into your system. Memcache isn’t actually that much faster than MySQL is for basic queries. If MySQL can cache the query well (as it would be able to for a query on a simple table that just gets hit over and over), the performance of the two will be pretty similar. Where memcache shines is that, because of how its keys and the hashing work, it can transparently distribute that load over multiple servers. So the person building the app with the ‘config’ key will have something that looks great while it’s small and on one caching server. Then, when they try to apply it to a high load site with multiple servers, all the traffic still just goes to the one memcache server (since it’s one key that gets hashed the same every time) and they will typically be stumped why the performance isn’t better.

The way around this is to generate keys that look something like-

'config-'.rand(0,9)

(php syntax)

At first this is counter-intuitive. I’m storing the same thing in 10 keys? That means that when they expire I’m going to have to go back and do 10x the initial loading of this object (whether from the database or config files or whatnot). However, at the cost of a very small # of those database queries (they only happen once every 5 or 60 minutes right), I’m spreading my keys out across my memcache array and the result is that the load gets spread smoothly across my whole array. I can even do a couple of slick extra things like every time I refresh the config data write to all 10 keys at once, resulting in no extra load on the database (except for a race condition I’ll cover in a future post) and just a small amount of occasional load on one web server.
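Here is a rough Python sketch of the “write all 10 keys, read a random one” idea. A plain dict stands in for the memcache client, and load_config_from_source is a hypothetical placeholder for the database or config-file read; expiry and the race condition mentioned above are deliberately left out.

```python
import random

NUM_COPIES = 10
cache = {}  # stand-in for a memcache client, which would hash each key to a server

def load_config_from_source():
    # Placeholder for the real database or config-file read
    return {"theme": "dark"}

def refresh_config():
    """Load the config once, then write the same value under all ten keys."""
    value = load_config_from_source()
    for i in range(NUM_COPIES):
        cache["config-%d" % i] = value
    return value

def get_config():
    """Read a randomly chosen copy; on a miss, refresh every copy at once."""
    key = "config-%d" % random.randint(0, NUM_COPIES - 1)
    value = cache.get(key)
    if value is None:
        value = refresh_config()
    return value
```

Since the ten keys hash differently, reads of the config get spread across the array instead of all landing on whichever server owns the single ‘config’ key.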

I started this post mentioning Microsoft Velocity and then went into memcache- looping back to Velocity, in typical Microsoft fashion it looks like it’s a much more complicated solution, but it also automatically deals with some of the above issues. As far as I can tell from a few architecture diagrams the servers maintain knowledge of a cluster (memcache servers have no idea about each other) and I’d assume they automatically deal with some of the fail-over and “add a server” cases. It also has a more explicit concurrency model- although memcache supports building things with much of the same concurrency behavior, you need to manage a lot more of it yourself.

Looking forward to checking out Velocity more later. In particular I’m interested in when the protocol to talk to the servers will be published and whether there will be support for PHP/Python clients talking to these servers.

posted in Technology, Microsoft, Developers, Software | 1 Comment

12th May 2008

Twitter, Ruby on Rails and Scalability

Blaine Cook, the former CTO of Twitter, writes about scalability. Twitter has often been pointed to as an example of the kind of problems that a Ruby on Rails application will encounter when trying to really scale big. He points out that languages don’t scale, architectures do.

Which is right. The problem isn’t Ruby. It’s Ruby on Rails. Ruby is just a language. Ruby on Rails is an architecture that makes database interactions sometimes _too_ automatic. Unfortunately Blaine’s post seems to miss this distinction.

It’s possible to build scalable applications with Ruby on Rails, but for all that Rails is pitched as putting the right way of writing web apps “on rails”, it leads you down some poor paths with respect to scalability. I know some great developers who understand enough about how the inner stuff works that I’m sure they can make scalable Rails applications, but most of the Rails applications I’ve seen aren’t.

posted in Technology, Developers, Software | 0 Comments

9th May 2008

Facebook Connect

Facebook just announced Facebook Connect, which lets you use Facebook to authenticate on your own site. Except that, uh, it’s already possible to do that with the existing Facebook APIs, although other than our implementation at DeepRockDrive I haven’t seen many other sites do this. I suppose the Connect stuff makes the approach a bit smoother and supports it more officially, but it’s all there already.

For us supporting Facebook authentication was a no-brainer. We care about getting a ton of people to our site to see our cool interactive shows. If we can skip a whole registration process and all that mess and have them just click the Facebook icon, and it works, perfect.

posted in Technology, Developers, Facebook, Software | 0 Comments

6th May 2008

Visual Studio 2008 Crashes

I’ve been having a ton of issues with Visual Studio 2008 since it came out. I mostly use it as a text editor at the moment for editing my PHP files. I know this is a bit of a wacky scenario, but you can open a web project, get a good view of your directories, and I’m very used to all its shortcuts, etc (I feel lost in Eclipse, other editors).

One of my favorite VS features has been the fairly powerful “find in files” command. I’m sure Eclipse has something equivalent, but I haven’t found it yet. Unfortunately this has been routinely crashing in VS 2008, to the point where I have a copy of VS 2005 open just to search in my project. Then today Visual Studio 2008 crashes just opening one of the PHP files in my project.

I’ve seen a bunch of reports of similar things on the net, although most seem to be issues with 64-bit systems. Searching around a bit I found Scott had posted a link to a hot-fix patch roll-up. The good news is that it looks like it fixed my “find in files” problem, but it doesn’t seem to have fixed the problem opening that one file. I’ll post more as I figure it out later.

Update- After editing that problem file in VS 2005, the crash went away in VS 2008- so far the patch is a big success. Also, Scott Guthrie continues to score points in the “most responsive and helpful Microsoft person ever” category by jumping on my post with an offer to help. Given his 5-gold stars (I’ve been playing too much Rockband) and 1million+ score in that contest, I’m sure no one can catch him anyway.

posted in Technology, Developers, Software | 1 Comment

2nd May 2008

Memcache Append

The Memcache protocol specification defines a command “Append” that looks very useful. You can use it to add data to an existing key without having to do a read/write (and deal with the race conditions that can introduce).

The only catch is that the standard PHP Memcache client implementation I’m using doesn’t seem to support it. So I think I’m out of luck until someone adds it to the standard PHP memcache client library I use.

Of course this is all open source so I’m probably just supposed to grab the PHP sources and add it myself, right? Wow does that sound like a lot of effort…

Update- I discovered that version 3.0 of the Memcache PHP client library supports append. They are on 3.0.1 now but the status is still officially “beta” so I guess I’ll just wait a bit more, but it’s good to see this is coming soon. Also note that you need the memcached server version 1.2.4 for append.

posted in Technology, Developers | 0 Comments

6th February 2008

Antechinus JavaScript Editor v10

A good programmer’s editor is probably one of the most important tools you can have. I’ve been working with PHP and JavaScript a bunch lately and the fact that I mostly use Visual Studio is really quite sad. It doesn’t know anything about PHP and so far I’ve not been that impressed by editing JavaScript in it either (it’s OK for debugging JavaScript).

So I got mail today that the Antechinus JavaScript editor is out with version 10. The big deal in the new version is that they merged the PHP and JavaScript editors. I have played with both before but never stuck with them at all, mostly because it sucked that there were two different ones. The notion of one environment for both sounds great.

First of all, it doesn’t appear to have a “find in files” function. I use this all day long in VS. Find everywhere that calls foo(). Find this variable somewhere in the code-base. Especially when tackling a code-base that you aren’t familiar with yet this is crucial. The editor has a notion of a “project” but it seems limited to providing a file browser and uploading things via FTP.

It does have a handy thing that picks out all the functions in the current file, but again, it doesn’t know anything beyond the current file. So no help to find where foo() is declared.

It doesn’t really support mixing HTML and JavaScript. So debugging JavaScript in stand-alone JS files is fine, but if you put it in your HTML file you can only execute a little bit at a time by selecting it and saying “execute”. It’s also not clear when you do that (or otherwise try out your JavaScript) how it deals with bringing in includes and libraries and what-not.

It does let you run PHP stuff and do a syntax check, but it’s not really integrated. It’s just running PHP.exe to do that, and displaying the results in a text window. You can’t even click on errors to go to the right line and they do nothing to help with the poor error messages in the PHP engine (two examples- you can be missing a close parenthesis but it tells you unexpected ‘{’, or forget to close a string and you get unexpected T_STRING, both referencing the wrong line. And yes, I know why the compiler gives the errors there, but it’s not user friendly and not really 2008-state-of-the-art).

I’ll play with it for a few more days but it feels like it’s still in the “not quite enough to be useful” category.

posted in Technology, Developers, Software | 0 Comments

5th February 2008

Transparent Javascript

There is this fairly recent notion about making “transparent Javascript”. The idea is that you don’t complicate your markup with Javascript and keep it all in include .js files. The JS files identify various spots in your pages via IDs or classes and attach event handlers at some point on page load time.

So far I’m not a fan. I might warm up to it later, but I’ve been working with it so far and it’s really hard for me to tell where the code is for anything. I’ve got some piece of HTML, but finding out what it actually does, or how to fix it when it stops doing what it’s supposed to, is a pain. Just too “transparent”.

It’s similar to various MVC patterns. Good factoring of code can be a really good thing. It can be crucial to build big projects that you can maintain. And I really do like the concept of keeping the HTML clean so that designers can work with it more easily without messing with script all over the place. But abstractions can go too far. I’m also not a fan of the purist MVC models- I’d rather have something that doesn’t abstract so much that it’s hard to trace the actual code execution.

posted in Technology, Developers, Software | 0 Comments

18th September 2007

GacUtil and 64-bits

Another stupid mistake I just made. I’ve been developing some 64-bit .NET code and was trying to call gacutil to register some DLLs. I kept getting the error message-

“Failure adding assembly to the cache: Strong name signature could not be verified. Was the assembly built delay-signed?”

It turns out I was trying to run a 32-bit version of GacUtil on a 64-bit assembly. The error message was unhelpful, but switching to the 64-bit Gacutil solved the problem right away.

posted in Technology, Developers | 0 Comments

12th September 2007

Stupid Javascript Mistakes

I’m working on a project that involves Javascript and I just got stuck for something like 2 hours. For some reason my .js file was mysteriously not running. IE was not showing any errors, just not running anything in the script. I tried using the debugger, but nothing. I tried putting an alert("begin") at the top and alert("end") at the bottom and nothing.

After spending far too long spinning in circles I thought of trying to run the file from the command line. Sure enough it spits out a syntax error. I had written
if(myobj.value ! "")
and left off the =. It should have been
if(myobj.value != "")

Still, it was very annoying that IE wasn’t spitting out any error messages. Even more annoying was how long it took me to catch this. I think I’ll use the cmd-line to syntax check Javascript more in the future.

posted in Technology, Developers | 2 Comments