Thursday, October 15, 2009

Short solution

No takers? Oh well. Here's the solution for yesterday's short exercise.

The amount of remote data is small. We only have 10K records and only add a handful a day. This will all fit in memory just fine.

Now imagine that the cache is a bucket with a small hole in it. Over time, as the cache entries become stale, the cache slowly empties. We can calculate a long-term rate at which entries expire. (This isn't actually what happens, though. The entries expire en masse, but let's pretend.) If we continue to fill the bucket at the same rate as the bucket empties, it will always be full. Any slower and the bucket will empty. Any faster and it will overflow.

The remote database can deliver one entry in 150ms, but we don't want to saturate that connection (there are other clients and we presumably want to perform work other than cache refresh). So let's dedicate 2% of the client bandwidth to the cache. If we fetch no more than one entry every 50 * 150ms = 7.5 seconds, we'll remain under 2%. Of course this means that we cannot let the records expire at a rate faster than this. If our cache has 10K records and they expire at a rate of one record every 7.5 seconds, the cache will be empty in 75K seconds, or about 20.8 hours. We set the expiration time on an entry at a tad more than that and we're all set. If 20.8 hours is unacceptably stale, we can shorten it by reserving more bandwidth for the cache. There is a limit, though. With a handful of clients each consuming 2%, there would be a small constant load on the server. If we increased each client to consume 10-12%, the server would be spending most of its time servicing client caches.
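The arithmetic above is easy to play with in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the figure of five clients standing in for "a handful" is my assumption, not a number from the exercise.

```python
def refresh_plan(records, fetch_ms, budget_percent):
    """Return (seconds between fetches, hours to cycle through the whole cache)."""
    # Spending budget_percent of the time on fetches that take fetch_ms each
    # means starting one fetch every fetch_ms * 100 / budget_percent milliseconds.
    interval_s = fetch_ms * 100 / budget_percent / 1000
    cycle_hours = records * interval_s / 3600
    return interval_s, cycle_hours


def server_load_percent(clients, budget_percent):
    """Rough share of server time spent on cache refresh, assuming the loads simply add."""
    return clients * budget_percent


interval_s, cycle_hours = refresh_plan(records=10_000, fetch_ms=150, budget_percent=2)
print(f"one fetch every {interval_s:.1f} s, full cycle in {cycle_hours:.1f} h")

# Five clients is an assumed stand-in for "a handful".
for percent in (2, 12):
    print(f"5 clients at {percent}% each: about {server_load_percent(5, percent)}% of the server")
```

Raising the budget shortens the cycle proportionally, so doubling it to 4% halves the staleness bound, while the load on the server grows with both the budget and the number of clients.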

1 comment:

  1. Can all the clients share a cache? memcached would be perfect for this if your clients can talk to it.