Monday, August 3, 2009

How Not to Write Code


It's About Time

What is it about time and calendars that induces people to make the same mistakes over and over and over? Once again I ran into some date handling issues in some code I was debugging. The problem was the same as it always is, but the attempted solution is, shall we say, interesting.

I was going to rant, but I think I'll just describe the code. You'll see some bad ideas here. They aren't mine. I had nothing to do with the design or implementation of the original code or the so-called solution.

The code keeps track of a set of a few hundred items that change over time. There is a browser-based API that allows you to add items, examine items, make changes to items, and examine the history of an item. It's a very simple application that is only slightly more sophisticated than having all the users share and edit a wiki page (or a big whiteboard).

The persistence layer for this application was a simple key-value store. The items being tracked had a bunch of fields, and each field was string. This was convenient because the key-value store didn't have much of a schema management layer. If you wanted to store anything other than a byte string, you had to define your own serialization mechanisms. If you wanted to change your schema, you were pretty much on your own. Most applications just used the system as a BLOB store.

This application was no exception. But rather than using the field names as keys, it simply serialized the entire item as alternating key value strings in a single BLOB and used an ID number as the key in the database. When fetching an item, the code would get the BLOB and separate out the keys and values and put them in a hash table. This, too, was a common practice.

Since items can change, they have a history, so there are times and dates associated with the items. People want to see and manipulate the times and dates. At first, the times and dates were simply entered as text, stored as text strings, and simply written on the page as the original text. This was sufficient for most needs, but then someone wanted the program to act on the time and date information. Some of the times and dates that were recorded were fairly important and people wanted to get email reminders near those times.

The date library came in handy at this point because it could parse text into date objects. The date objects could be compared with each other to develop a chronology, and the components of the date — the day, month, and year — could be easily manipulated. A small change to the database schema was made for convenience. Dates were no longer stored as human-readable text, but instead were stored as the number of milliseconds since the date epoch.

Things were working fine for a while, but then someone wanted a calendar display. As it turns out, there was a calendar display library that was trivial to link in and use.

There was a slight problem. Everything displayed in the calendar was off by seven hours.

Ever since the application was created, the people using it naturally entered times and dates in local time. The application stored these faithfully as strings. When date objects were introduced, everything seemed to be working fine. What no one noticed was that since the time zone wasn't specified, the date objects were constructed with the assumption of Universal Time. Everything “worked” because nothing in the application so far payed attention to the time zone. The calendar library was different.

There were two ways to fix this, neither way was pleasant. Option one was to go through the entire legacy code and make it aware of time zones. There were three drawbacks to this: a lot of changes to the code, increase in the code size because of the tables involved in handling time zones, and, most importantly, the issue of a lot of dates in the database stored with the wrong zone. Option two was to continue to treat time as local time, but to provide a correction to it prior to calling the calendar library. Option two was clearly the easier path. No legacy code needed to change, and even better, no legacy objects needed to change.

So two routines were written. One to convert a year, month, day, hour, minute, second sextuple in local time to a date object with the correct value of milliseconds since the epoch. The second to convert a date object in universal time to one with the incorrect value of milliseconds, but with the correct values of year, month, day, hour, minute, and second in local time.

The first routine worked like this: the six arguments were used to initialize a date object in universal time. Then the time zone correction was applied. This was done in two phases. First, the longitudinal correction of seven hours was applied, then the resulting date was checked to see if a further correction for daylight savings time was needed. If necessary the additional hour was added. The date object returned had the correct number of milliseconds since the epoch and was suitable for using with the calendar library.

The corrected date objects were not suitable for use with legacy code, so the second routine ‘uncorrected’ them. This was done as follows: first the corrected date object was printed as a string in UTC time. The resulting text represented the desired date. That is, the corrected UTC time printed the same as an uncorrected local time. The second step was to then extract and parse the year, month, day, hour, minute and second fields out of the text string. These fields were used to construct a new date object without using the time zone information. This new date object was uncorrected, but was compatible with the legacy code.

Needless to say this code deserved a unit test, so tests were written to ensure that it behaved correctly near the daylight savings time boundaries.

Things were continuing on smoothly. The code worked and the tests passed. But recently another engineer added an expanded test. The unit tests were run regularly on the primary platforms, but they hadn't been tested on the more obscure ones. The expanded tests used a farm of machines to test the code on many more platforms. Errors appeared.

Playing around with ‘decanonicalized’ date objects like we were doing makes some assumptions about the implementation of the date library. The primary assumption is that if you create a date object and don't specify a time zone, you implicitly mean Universal Time. Some implementations assumed you implicitly mean local time. Another assumption is that the components of a date object (year, month, day, etc.) are synchronized with the millisecond value in the same way in each implementation. Some implementations don't support arbitrary mutations to date objects in this way.

This leads to a whole slew of bizarre behavior. Double corrections, ignored corrections, the month value being modified, and so forth.

The story ends here because this is the state I found the code in. There is a replacement tool in development, so there is no real need to fix this code.

As I said before, I was going to rant about this, but for the moment I'm holding my tongue. I'd like to see what other people think.


Hijo de la Luz said...

It seems like a nominee for www,

offby1 said...

This failure to include time zone information along with timestamps is one of my pet peeves. I rant about it annoyingly often. Another example of this failure is EXIF data in digital images -- they record the time, but not the time zone; this means that I need to remember to reset my camera's clock whenever I travel across time zones (and of course I never do remember), or else announce to the world that all my pictures' timestamps are always UTC, and then watch as flickr sticks PST8PDT onto them anyway ...

Unknown said...

Date-time stuff is screwed up so often because, I think, people think that it is easy. "We swim in this stuff every day" != "We can get it right."

One of my favorite books is _Calendrical Calculations_, and should be required reading for anyone doing anything with date-driven data.

grant rettke said...

Hi Joe,

I think that there is no free lunch.