29th June 2008

Home RAID Array

On Friday I finally got all the bits together for my new home RAID array: a Sans Digital TR4M 4-bay eSATA enclosure plus four Western Digital GP 1TB drives. Built as RAID 5, it looks like the total capacity will be 2.8TB (as the computer measures it, not as the drive companies market it).
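The capacity figure comes from two pieces of arithmetic. RAID 5 gives up one drive's worth of space to parity. Drive makers count a terabyte as 10^12 bytes, while the computer counts it as 2^40 bytes. Here is a minimal PHP sketch that assumes each drive holds exactly 10^12 bytes. The function names are illustrative.

```php
<?php
// RAID 5 stores one drive's worth of parity, so N drives of S bytes
// give (N - 1) * S bytes of usable space.
function raid5UsableBytes(int $drives, float $bytesPerDrive): float
{
    return ($drives - 1) * $bytesPerDrive;
}

// Drive makers count 1TB as 10^12 bytes; the OS reports binary units of 2^40 bytes.
function toBinaryTerabytes(float $bytes): float
{
    return $bytes / 2 ** 40;
}

// Assumption: each "1TB" drive holds exactly 10^12 bytes.
$usable = raid5UsableBytes(4, 1e12);
printf("%.2f TB as the OS reports it\n", toBinaryTerabytes($usable)); // 2.73
```

Under those assumptions the usable space comes to roughly 2.73 binary terabytes.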

I started it formatting, skipping the “quick format” option. That was over 24 hours ago, and it’s at 36% right now. That points out one of the biggest problems with large drive arrays (or any kind of large storage): if you aren’t careful, managing it can be a total mess. It also makes me a bit happier that I went with just the 4-drive array rather than holding out for a full 8. The bigger one would be even more of a mess to manage at times.
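At that pace the full format is a multi-day job. This is a rough projection that assumes the format proceeds at a constant rate. The helper name is illustrative.

```php
<?php
// Project total and remaining time for a long-running job from its progress,
// assuming it proceeds at a roughly constant rate.
function estimateHours(float $elapsedHours, float $fractionDone): array
{
    $total = $elapsedHours / $fractionDone;
    return ['total' => $total, 'remaining' => $total - $elapsedHours];
}

$est = estimateHours(24, 0.36);
printf("total: %.1f h, remaining: %.1f h\n", $est['total'], $est['remaining']);
// total: 66.7 h, remaining: 42.7 h
```

So the whole format would take close to three days.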

posted in Technology, Hardware, Storage | 0 Comments

28th June 2008

Silent Video Cards

For my media PC I purchased a silent video card. Gigabyte makes the SilentPipe series for several of the Nvidia models. It seemed like a good idea to get a card with decent graphics power (not top of the line, but at the time it was better than any of the other cards I had) and no noise.

The catch is that when building a media PC you need to keep the overall system in mind. The SilentPipe GF8600GTS has two problems. First, its form factor is kind of big: the cooling fins stick out about half an inch above the normal height of a PCI Express card. In a full-sized case this would be fine. The Zalman media case I have, though, has just enough room for full-height cards, so the extra half inch means the top of the case doesn’t really fit right.

The second issue is heat. The card works correctly, but it runs pretty hot. Since it’s not blowing the hot air outside the case, it puts extra heat load on the rest of the system and relies on the existing case fans to move air through its fins and out the vent. I suspect this means my case fans run faster more of the time. In the end I fear I’m actually running a noisier system because I tried to use a silent video card. Instead of adding one relatively slow video card fan, I’m pushing the main fans faster, and higher RPMs create a lot more noise.

posted in Technology, Hardware, Graphics | 0 Comments

19th June 2008

Recent Interesting Posts- Functional Programming and Dark Launches

Back in March Dare Obasanjo said he was going to stop blogging. Luckily it seems he didn’t follow through on his threat very well, and he has had a ton of great posts recently.

Dare posted on functional programming, Map/Reduce/Filter in C# 3.0, including some nice background on the topic. I’ve been getting into Python lately, which has some really elegant support for anonymous functions, list/collection operations, and specifically things like map/reduce/filter. This is also an interesting take on some similar techniques in PHP.
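For comparison, here is the same map/filter/reduce idea in PHP. It uses the built-in array_map, array_filter and array_reduce with anonymous functions. This is a minimal sketch with made-up sample data.

```php
<?php
$numbers = [1, 2, 3, 4, 5, 6];

// map: square every element
$squares = array_map(function ($n) { return $n * $n; }, $numbers);

// filter: keep only the even squares (array_filter keeps the original keys,
// so array_values re-indexes the result)
$evenSquares = array_values(array_filter($squares, function ($n) { return $n % 2 === 0; }));

// reduce: sum what is left, starting from 0
$sum = array_reduce($evenSquares, function ($carry, $n) { return $carry + $n; }, 0);

echo implode(', ', $evenSquares), "\n"; // 4, 16, 36
echo $sum, "\n";                        // 56
```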

Dare also did an interesting post on techniques to dark launch / gradually ramp up new features. The idea is that you want to try out a new feature without risking taking down your entire user base at once.

Some other really simple approaches include putting hidden iframes that access the new feature into existing pages. It’s a really easy way to isolate the feature from the rest of what is going on on your site. As you dark-launch a feature, it’s really important to figure out how you can throttle it up and down. If you have 20 front-end servers, maybe you can deploy it only on a certain set of them. Or your pages can serve it only a certain random percentage of the time. This way you can start it at a very low load (say 1% of requests) and slowly increase it to try it out more. If you start having performance problems in your data center, just kick it down a couple of notches.
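A percentage-based throttle can be as small as one random check per request. This minimal sketch assumes the rollout percentage lives in a config value you can change without redeploying. shouldServeNewFeature and the /new-feature URL are hypothetical.

```php
<?php
// Hypothetical throttle: serve the new feature on $percent percent of requests.
// mt_rand(1, 10000) is uniform over 1..10000, which gives 0.01% resolution.
function shouldServeNewFeature(float $percent): bool
{
    return mt_rand(1, 10000) <= $percent * 100;
}

// Assumed config value: start at 1% of requests, raise it as confidence grows.
$rolloutPercent = 1.0;

if (shouldServeNewFeature($rolloutPercent)) {
    // Hypothetical URL: the hidden iframe that exercises the new feature
    echo '<iframe src="/new-feature" style="display:none"></iframe>';
}
```

Raising $rolloutPercent in steps (for example 1, 5, 25, 100) ramps the feature up. Setting it back to 0 turns the feature off.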

Deploying to only certain specific servers can be especially interesting because, if done right, it focuses the load on those servers. Those servers then behave the way the whole system will once the feature is fully turned on, but without the same risk of taking your whole environment down.
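The server-subset variant only needs each front-end machine to know whether it is in the enabled set. This minimal sketch assumes the allow-list is part of each server's configuration. The host names and featureEnabledOnHost are hypothetical.

```php
<?php
// Hypothetical check: only hosts on the allow-list run the new feature, so its
// load is concentrated on those machines.
function featureEnabledOnHost(string $host, array $enabledHosts): bool
{
    return in_array($host, $enabledHosts, true);
}

// Made-up host names, e.g. 2 of 20 front-end servers
$enabledHosts = ['web01', 'web02'];

// gethostname() returns the current machine's host name (or false on failure)
$enabled = featureEnabledOnHost((string) gethostname(), $enabledHosts);
```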

posted in Technology, Developers, Software | 0 Comments

19th June 2008

Installing Vista SP1 Follow Up

Yesterday I wrote about problems installing Vista SP1 on this machine. Running the “download and run it yourself” installer worked fine with one little glitch. The issue was that I first downloaded the SP1 setup that only contained 5 languages, thinking “hey, I only really actually care about English”. The catch is that it won’t run if you have any languages other than those 5 installed. I had previously installed a ton of languages since they were just check boxes in Windows Update.

Uninstalling languages turned out to be a nightmare. First of all, you can’t uninstall from “Programs and Features”, you have to go into “Regional and Language Options”, go into the “Keyboards and Languages” tab, click a button, click another button. Then most of my attempts to remove languages would fail (after sitting doing something to my computer for what seemed like 30 minutes), PLUS it requires a reboot after each one (even though I’m uninstalling languages I’ve never ever used).

So the solution ended up being downloading the “all languages” version of the Vista SP1 package. The 20 minutes the download took was way faster than uninstalling the languages. Upgrade felt like it took a long time- I could be wrong but it feels slower than initial setup (which does make some sense since initial setup can just write an image down on your disk while upgrade is presumably doing more real work).

All is well now and so far the machine is running smoothly.

posted in Technology, Vista, Microsoft | 0 Comments

18th June 2008

Office Hours

Brad Feld posts about office hours during college and how he tries to do something similar now at their TechStars incubator. We have a different take on office hours at DeepRockDrive. So far it works out really well, and I thought it would be interesting to share.

When I started working with DeepRockDrive, the technology folks up here in the Seattle area didn’t have a real office at all yet. Folks just met most days in a coffee shop and would hang out and work on the code. There were a few contractors scattered around various parts of the world, and people would often work at home. Everyone would log in to Skype all day in a common chat room. That gave us something similar to shouting over to the person at the next desk.

We have had an office for several months now, but it’s over in Bellevue. Our staff is spread all over the Puget Sound area, and traffic isn’t so wonderful around here most of the time. I was hoping we could keep some of the culture of not wasting 40-60 minutes a day in traffic. I also wanted to keep the advantages of being able to concentrate in my home office (not to mention reducing the environmental impact of all that driving, especially in stop-and-go traffic). At the same time, to act as a well-oiled agile startup, we need great communication with each other. It was sometimes really difficult to find a time when all the right people were around to discuss a given topic.

What we came up with was the concept of “core office hours”. This is roughly Monday, Wednesday and Friday from 10am to 3pm. During those times people are expected to be in the office (with the obvious exceptions for vacation, travel, important appointments, etc.). Those are great times to schedule a meeting, and you can usually pull together the right people for an impromptu meeting about just about anything. But because the hours are limited, this also keeps our schedules from filling up with constant meetings. That leaves solid time to get code written and tested, write important documents, etc. On Tuesdays and Thursdays I can avoid getting in the car at all. On Monday, Wednesday and Friday, when I do need to go into the office, I can go at a time when traffic is WAY better (20 minutes vs. 40-60). It’s also a nicer time if you want to bike.

So far overall I’d say this system is working great, but I do have a few thoughts about some considerations that are necessary to make it work-

  • It is not going to work for all job roles. Some types of jobs require you to be at a central place where people can be together. And the job needs to be something where the output is pretty measurable- if you can’t tell whether someone is goofing off, it’s going to breed ill will. If the job is something where the results speak for themselves (amount of code written, bugs found, etc.), it is a good fit.
  • It is not going to work for all people. To make this work you need people that are very self-motivated and self-starting.
  • The Skype thing helps us a ton (although any other form of live chat room with presence information also works). It lets us communicate and solve problems with colleagues in real time, and it helps people be visibly “on the job”.
  • It helps to have good network resources. We rely on a combination of Skype, Gmail, Google Docs, Basecamp, and an SVN and Trac server that we have deployed in our data center. All of those services are also accessible without a VPN, so our staff can easily work from home / a cafe / vacation / the road. In theory having to use a VPN shouldn’t matter, but I’ve always found it to be a big barrier to getting real work done.

posted in Technology, Management, Jobs, Business | 0 Comments

18th June 2008

Problems Installing Vista SP1

I’m having a problem installing Vista SP1 on my main workstation. Microsoft Update keeps throwing a Code 80070570 error which has no real description. The best hint I can find searching the net is that it might be related to some disk issues, but running a CHKDSK to fix errors didn’t make the problem go away.

My next attempt is to download the standalone upgrade- its possible the problem I’m experiencing is with Microsoft Update rather than the service pack itself.

posted in Technology, Vista, Microsoft, Software | 2 Comments

13th June 2008

Memcache Race Condition Fun

Dealing with cache expiration in memcache has some subtle gotchas that many people ignore at first. Let’s say you are using memcache to cache an object that you want to refresh at least every 5 minutes. The typical pattern for this would be (in PHP)-

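$myobj = $memcache->get('key');        // false on a miss or once the item expires
if ($myobj === false)
{
  $myobj = LoadMyObjFromMySql();       // your own database loader
  // Memcache::set($key, $var, $flag, $expire): no flags, expire after 300 seconds
  $memcache->set('key', $myobj, 0, 300);
}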

With this code you would expect to see one query on your database every 5 minutes. But implement this on a high-traffic web site and you will probably see your MySQL database get pegged every 5 minutes by a spike of a ton of queries. What is going on?

Notice that there is a time window between the memcache get, the database query and the memcache set. This window isn’t large, but it could easily be 10ms. On a web site handling 1000 requests per second, roughly 10 different requests can arrive while the memcache get is failing (because the object has expired). Each of them hits the database before the first one updates memcache and everything is back to normal.
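The estimate above is just the request rate multiplied by the length of the window: 1000 requests/s × 0.010 s = 10 requests. A toy, deterministic model makes that concrete. It assumes evenly spaced requests and a fixed 10ms load time. countStampedeQueries is an illustrative name, not part of any library.

```php
<?php
// Toy model of the stampede: requests arrive evenly at a fixed rate, and every
// request that finds the key missing issues its own database query. The first
// query's result only lands in the cache $loadMs milliseconds later, so every
// request arriving inside that window misses too.
function countStampedeQueries(int $requestsPerSecond, float $loadMs, float $durationMs = 1000.0): int
{
    $intervalMs = 1000.0 / $requestsPerSecond;
    $cacheFilledAt = null; // when the first reload's memcache set lands
    $queries = 0;

    for ($t = 0.0; $t < $durationMs; $t += $intervalMs) {
        if ($cacheFilledAt === null || $t < $cacheFilledAt) {
            $queries++; // cache miss: this request queries the database
            if ($cacheFilledAt === null) {
                $cacheFilledAt = $t + $loadMs;
            }
        }
    }
    return $queries;
}

echo countStampedeQueries(1000, 10), "\n"; // 10
```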

The solution is to not rely on memcache to manage the expiration. We can rewrite the code above to look more like-

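$myobj = $memcache->get('key');
// Refresh on a miss, or with a growing random chance once the soft expiry has passed
if ($myobj === false || $myobj->expire + rand(0, 120) < time())
{
  $myobj = LoadMyObjFromMySql();
  $myobj->expire = time() + 300;           // soft expiry: our real 5-minute target
  $memcache->set('key', $myobj, 0, 600);   // hard expiry in memcache: 10 minutes
}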

This way memcache still has an expiration for the objects, but it’s longer than our real target. For the first 300 seconds the objects always just get loaded from the memcache server. Starting at 300 seconds there is a small chance that each request will refresh from the database and update memcache. That chance grows until it becomes a certainty at about 420 seconds. If there is a large load on the server, the chances are that only 1 or 2 updates will happen, and that they will happen right away. Under a lighter load, the first request after about 7 minutes is guaranteed to refresh the object. So the busier the site is, the sooner after the 5-minute mark the refresh happens, while the number of requests hitting the database stays small either way.
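The shape of that probability follows directly from the check in the code. time() has one-second resolution, and rand(0, 120) picks one of 121 integers. So a request made d seconds past the soft expiry refreshes with probability min(max(d, 0), 121) / 121. Here is a small sketch; refreshProbability is an illustrative name.

```php
<?php
// Per-request chance of passing the check
//   $myobj->expire + rand(0, 120) < time()
// With $d = time() - $myobj->expire, the check passes for rand values 0 .. $d - 1,
// i.e. for min(max($d, 0), 121) of the 121 equally likely values.
function refreshProbability(int $secondsPastSoftExpiry): float
{
    $passing = min(max($secondsPastSoftExpiry, 0), 121);
    return $passing / 121;
}

foreach ([0, 1, 60, 120, 121] as $d) {
    printf("%3d s past soft expiry: %.4f\n", $d, refreshProbability($d));
}
```

At 1000 requests per second, once the clock reads one second past the soft expiry, the first refresh is expected within about 121 requests (roughly 0.12 seconds). Each of the ~10 requests that arrive during that refresh's 10ms database query has only a 1-in-121 chance of also refreshing.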

The above code of course relies on you storing PHP “objects” in the memcache so it can tag the extra expire property on. If you aren’t storing something that is already an object you can always create a new object that stores your real thing in a property called “data” and still uses expire. You will pay a small overhead for the object marshaling but it shouldn’t be too painful.
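Here is the same pattern as a self-contained sketch, with the real value stored in a data property next to expire as described above. FakeCache is a hypothetical in-memory stand-in that mimics the Memcache extension's get() and set($key, $var, $flag, $expire), so the code runs without a server. cachedFetch and its parameters are an illustrative helper, not a library function.

```php
<?php
// Hypothetical stand-in for the Memcache client: get() returns false on a miss,
// set() takes ($key, $var, $flag, $expire). Values are serialized to imitate
// the marshaling a real client does.
class FakeCache
{
    private $items = [];

    public function get($key)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['until'] < time()) {
            return false;
        }
        return unserialize($this->items[$key]['value']);
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        // $flag is accepted for signature compatibility and ignored here
        $until = $expire > 0 ? time() + $expire : PHP_INT_MAX;
        $this->items[$key] = ['value' => serialize($var), 'until' => $until];
        return true;
    }
}

// Illustrative helper: wraps any value in an object with "data" and "expire".
function cachedFetch($cache, string $key, callable $load, int $softTtl = 300, int $jitter = 120)
{
    $wrapper = $cache->get($key);
    if ($wrapper === false || $wrapper->expire + mt_rand(0, $jitter) < time()) {
        $wrapper = new stdClass();
        $wrapper->data = $load();                     // e.g. the MySQL query
        $wrapper->expire = time() + $softTtl;         // soft expiry, checked above
        $cache->set($key, $wrapper, 0, 2 * $softTtl); // hard expiry left to the cache
    }
    return $wrapper->data;
}

// Usage: the second call is served from the cache, so the loader runs once.
$cache = new FakeCache();
$loads = 0;
$load = function () use (&$loads) {
    $loads++;
    return ['rows' => 3];
};
$first = cachedFetch($cache, 'key', $load);
$second = cachedFetch($cache, 'key', $load);
```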

posted in Technology, Software | 0 Comments

10th June 2008

Cheap SSD Drive for my Laptop

A couple of weeks ago I wrote about problems trying to get my SSD drive working in my laptop. Since then I’ve done some experimentation and figured out the issues.

Initially I bought a RiData 32GB CF card (266x speed) and a SYBA SY-SATA2CF CF-to-SATA adapter. It wasn’t working (it would hang in Windows setup or at boot), but I couldn’t tell which component was at fault. Since then I noticed that Sans Digital has the CS2T CF adapter, which is shaped like a 2.5″ drive and accepts two CF cards. It’s a lot more expensive than the Syba adapter ($99 vs. $18), but it works, and $18 isn’t a bargain for an adapter I just couldn’t get to work right.

Having the Sans Digital be shaped like a normal drive is also a huge help. With laptops you often need to insert the thing way back into the case and they all pretty much assume the standard drive form factor. The Syba was a big pain to get in but the Sans Digital fits into my Dell laptop easily. Once I could tell it worked I bought a second RiData 32GB card and was able to just insert it and expand my volume in Windows- Presto! 64GB SSD drive for $290. It runs Vista great and I’ve installed Voyager (flight planning software) so I should be able to use it in the airplane.

I should mention that I’ve bought several products from Sans Digital so far- I’ve also gotten both of their 4-drive SATA external enclosures, the USB TR4U and the eSATA TR4M. Both work great and are inexpensive and easy to manage ways to add massive storage to your computers.

posted in Technology, Hardware | 0 Comments

9th June 2008

Microsoft Velocity and Memcache

Just saw a post about Microsoft Velocity, Microsoft’s answer to memcache. I’m looking forward to checking this out soon. We have had a ton of success using memcache on the LAMP platform, and it was a missing piece in the .NET world. I understand why its importance was a bit hard to see in that environment: unlike PHP, .NET can persist things in memory between requests. PHP needs memcache early on. Since you can’t save anything from request to request, you end up going to the database very quickly. But as your system grows, memcache fills an even more important role because of how easily you can scale out by adding more caching servers.

Memcache (at least the standard one) has a few problem scenarios, all revolving around how easily it shares the load between servers. The coolest thing about memcache is how simple a mechanism it is. There is no complex configuration to tune (there are a few settings), maintain, or debug. You simply configure the list of N memcache servers on all your web servers. When you save or retrieve an object, the client takes the ‘key’ (an arbitrary string), hashes it, and uses the hash to pick one of the memcache servers. In effect it computes H(key) % N, where H is the hashing function and N is the number of servers. As a result you automatically get a smooth distribution of your keys across your caching servers.
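The server choice can be sketched in a few lines. This sketch assumes crc32() as the hash H and a plain modulo over the server list. Real client libraries may use a different hash or mapping, and the addresses are made up.

```php
<?php
// Sketch of H(key) % N server selection, with crc32() standing in for H.
function pickServer(string $key, array $servers): string
{
    // The mask keeps the hash non-negative on 32-bit builds, where crc32() can be negative
    $h = crc32($key) & 0x7fffffff;
    return $servers[$h % count($servers)];
}

// Made-up server addresses
$servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211'];

// Every web server computes the same answer for the same key.
echo pickServer('user:42', $servers), "\n";
```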

However, if you add or remove memcache servers from the array, N changes. The hash then maps keys to different servers, and all of a sudden most of your keys live somewhere else. If your site is under low load, this isn’t that big a deal: you just dumped your whole cache, and it will build back up and be fine. But if you are running memcache because you really need it, your site just went down as all the web servers started pounding directly on the database with every request. Right now DeepRockDrive has a pretty unique situation. We get huge spikes of traffic (mostly predictable: showtime), and during those spikes the memcache servers really save our bacon. The rest of the time we can clear them out more or less safely. Most big sites, though, have a more consistent traffic pattern and would have a harder time with this.
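To see how much of the cache a resize invalidates, you can count how many keys change servers when a 10-server array grows to 11. This minimal sketch again assumes crc32() modulo N as the mapping.

```php
<?php
// Server index under the assumed H(key) % N mapping, with crc32() as H.
function serverIndex(string $key, int $n): int
{
    return (crc32($key) & 0x7fffffff) % $n;
}

$total = 10000;
$moved = 0;
for ($i = 0; $i < $total; $i++) {
    $key = "key-$i";
    if (serverIndex($key, 10) !== serverIndex($key, 11)) {
        $moved++; // this key now lives on a different server, so its cached value is lost
    }
}
printf("%.1f%% of keys moved\n", 100 * $moved / $total);
```

Expect something close to 91%. A key only stays put when H mod 10 and H mod 11 happen to agree, which is true for about 1 key in 11.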

This also means that if a memcache server goes down, you can’t just pull it out of your configuration (at least under normal load). You really, really need to replace it. The easiest (although resource-intensive) way to do this is to keep a hot-spare server or two in your rack. If one of your memcache servers goes down, you map a spare to the same IP and bring it up. You just lost a portion of your cache (10% if you have 10 servers, etc.), but that’s way better than losing the whole thing. A more complicated setup is to run multiple instances of memcache on every machine. If you have 10 memcache machines, you run each with 3 IPs, and your whole array looks like 30 “servers”. If one machine goes down, you bring its 3 IPs up on 3 of the other machines, spreading its load across them.
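The multiple-IPs-per-machine layout works because clients always hash over the same 30 addresses. Only the owner of an address changes. Here is a hypothetical sketch of that bookkeeping, assuming 10 machines with 3 slots (IPs) each. slotForKey and failOver are illustrative names.

```php
<?php
// Clients are configured with all 30 IPs, so a key always maps to the same slot.
function slotForKey(string $key, int $slots): int
{
    return (crc32($key) & 0x7fffffff) % $slots;
}

// slot => machine; initially machine m owns slots 3m, 3m+1 and 3m+2
$slotOwner = [];
for ($s = 0; $s < 30; $s++) {
    $slotOwner[$s] = intdiv($s, 3);
}

// When a machine dies, bring its IPs up on the surviving machines, one slot each.
function failOver(array $slotOwner, int $deadMachine): array
{
    $survivors = array_values(array_diff(array_unique($slotOwner), [$deadMachine]));
    $next = 0;
    foreach ($slotOwner as $slot => $owner) {
        if ($owner === $deadMachine) {
            $slotOwner[$slot] = $survivors[$next % count($survivors)];
            $next++;
        }
    }
    return $slotOwner;
}

$after = failOver($slotOwner, 4);

// A key keeps its slot; only the machine serving that slot may change. The keys in
// machine 4's three slots start out empty on their new machines.
$slot = slotForKey('user:42', 30);
printf("slot %d: machine %d before, machine %d after\n", $slot, $slotOwner[$slot], $after[$slot]);
```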

We haven’t had a chance to fully work out how the standard memcached scales with multiple instances on the same box. We have played with it some. On our 8-core boxes, even with the right threading libraries, we haven’t come close to maxing out a single instance of memcache. It looks like there may be some I/O limits, but I can’t be sure what is actually going on. Still, running multiple instances on the same box seems like a fairly reasonable approach, both for scalability and for these fail-over cases.

The other tricky issue is that memcache gives you a balanced distribution of keys but does not necessarily give you a balanced distribution of access. Let’s say you had some runtime configuration information that you wanted to persist on your site. The easy thing to do would be to save it in a key called ‘config’ and just retrieve that key on every request.

What you have accomplished here is to introduce a nice hard scalability limit into your system. Memcache isn’t actually that much faster than MySQL for basic queries. If MySQL can cache the query well (as it would for a query on a simple table that gets hit over and over), the performance of the two will be pretty similar. Where memcache shines is that, because of how its keys and hashing work, it can transparently distribute load over multiple servers. So the person building the app with the ‘config’ key will have something that looks great while it’s small and running on one caching server. Then they try to apply it to a high-load site with multiple servers. All the traffic still goes to the one memcache server (since it’s one key that hashes the same way every time), and they are typically stumped about why the performance isn’t better.

The way around this is to generate keys that look something like-

'config-' . rand(0, 9)

(PHP syntax)

At first this is counter-intuitive. I’m storing the same thing in 10 keys? That means that when they expire I’m going to have to go back and do 10x the initial loading of this object (whether from the database or config files or whatnot). However, the cost is only a very small number of extra database queries (they only happen once every 5 or 60 minutes, right?). In exchange, I’m spreading my keys out across my memcache array, so the load gets spread smoothly across the whole array. I can even do a couple of slick extra things, like writing to all 10 keys at once every time I refresh the config data. That results in no extra load on the database (except for a race condition I’ll cover in a future post) and just a small amount of occasional load on one web server.
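The replicated-key read path and the refresh-all-copies write can be sketched together. This sketch assumes 10 copies as in the text. ArrayCache is a hypothetical in-memory stand-in with the Memcache extension's get()/set() shape. getConfig and the sample config value are illustrative.

```php
<?php
// Hypothetical stand-in for the Memcache client; it ignores expiry to stay short.
class ArrayCache
{
    private $items = [];

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->items[$key] = $var;
        return true;
    }
}

const CONFIG_REPLICAS = 10;

// Read one randomly chosen copy so reads hash across the memcache array; on a miss,
// reload once and write every copy. Concurrent misses can still each reload.
function getConfig($cache, callable $loadConfig, int $ttl = 300)
{
    $value = $cache->get('config-' . mt_rand(0, CONFIG_REPLICAS - 1));
    if ($value === false) {
        $value = $loadConfig();
        for ($i = 0; $i < CONFIG_REPLICAS; $i++) {
            $cache->set('config-' . $i, $value, 0, $ttl);
        }
    }
    return $value;
}

$cache = new ArrayCache();
$loads = 0;
$loader = function () use (&$loads) {
    $loads++;
    return ['feature_x' => true]; // made-up config value
};
for ($n = 0; $n < 50; $n++) {
    $config = getConfig($cache, $loader);
}
```

The loader runs once even though the reads are spread over ten different keys.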

I started this post mentioning Microsoft Velocity and then went into memcache. Looping back to Velocity: in typical Microsoft fashion it looks like a much more complicated solution, but it also automatically deals with some of the above issues. From a few architecture diagrams, the servers appear to maintain knowledge of a cluster (memcache servers have no idea about each other). I’d assume they automatically deal with some of the fail-over and “add a server” cases. It also has a more explicit concurrency model. Memcache supports building things with much of the same concurrency behavior, but you need to manage a lot more of it yourself.

Looking forward to checking out Velocity more later. In particular I’m interested in when the protocol for talking to the servers will be published and whether there will be support for PHP/Python clients talking to these servers.

posted in Technology, Microsoft, Developers, Software | 4 Comments

28th May 2008

FBCal- Calendar events from Facebook into your calendar

FBCal is one of the most useful Facebook apps yet. It creates an iCal feed from your friends’ birthdays and/or your events in Facebook. You can subscribe in iCal, Windows Calendar or Outlook. Slick, and very useful.

posted in Technology, Facebook | 0 Comments