object-map
that is basically a big lookup table that maps logical object ids to the addresses that the durable storage layer uses.The
object-map
will be used a lot, so it is important to come up with an implementation that has decent performance. The object-map
will also have to be stored durably, so it had better not get unwieldy when it is full of objects. The choice of implementation affects the performance of the store as a whole, so there are dozens of variations on several different themes. I'll discuss just a couple.Perhaps the simplest and most obvious implementation would be a simple vector of durable addresses. Object N's address would be stored in the Nth entry in the vector. Naturally, this is very fast. Unfortunately, every time we make a logical change to the table, we have to write the entire table back out to the durable storage. (Our durable storage layer has no mechanism for `update in place'. This is by design, but the advantages are not apparent yet. One particular disadvantage is quite obvious now.) If object IDs are permitted to be removed from the table, the table could become sparse. In that case, a simple vector would be a tremendous waste of time.
Another possible implementation would be to represent the
object-map
as an association list. This would greatly improve the performance of persistent object allocation because only the new association list entry would have to be written to durable storage. Lookup now becomes linear, however. Removing an object from the association list would require writing back every entry between the object and the most recently allocated object.Another possibility is using a vector as before, but rather than saving the entire vector just save the changes to the table. This technique degenerates into the association list case if the table is never written back.
A popular implementation technique is to use a tree of some sort. A binary tree has the advantage of logarithmic lookup time and, if done cleverly, log time and space insertion and deletion. A tree with a larger branching factor would be faster on lookup, although it would require larger update writes. If the durable storage is structured in physical blocks, it can greatly improve performance if a tree-structured object map uses nodes that correspond to the physical block size.
Regardless of the underlying implementation technique, the API is simple:
object-map/add object-map object-id object => new object map Returns a new object-map that contains all the mappings of the argument object-map in addition to a new mapping from object-id to object. If object-id had a previous mapping, that mapping is not accessible in the result. Object will be serialized into the durable store associated with the object-map. The object-map itself will also be serialized into the store. object-map/address object-map => a durable address Returns the durable address at which the object-map was serialized. object-map/create durable-store => new object map with no entries Argument durable-store is forever associated with this object map. object-map/durable-store object-map => a durable store Returns the durable-store is associated with this object map. object-map/lookup object-map object-id => an object previously added Error if object-id was never added. Returns the deserialized object associated with object-id. The object is deserialized from the durable-store associated with the object-map. object-map/recover durable-store address => object-map or Error if address does not conain an object map. Deserialize an object-map from a durable store at the given address. ;; Optional API items. object-map/lookup-address object-map object-id => the durable address of an object previously added Error if object-id was never added. Returns the durable-address of the serialized object associated with object-id. This function will no doubt exist in any implementation, but the question is whether to export it. object-map/remove object-map object-id => new object map Creates a new object map that is like the argument object-map, but does not contain a mapping for object-id.
Exercise 3: Implement the
object-map
API on top of the durable storage and serialization APIs.As an aside. This random excursion into persistent storage has nothing to do with the “Professor and Students A, B, and C” thread. I'm trying to think of the right was to continue that thread and started this unrelated thread as an amusement.
I'm with you so far.
ReplyDeleteHad you noticed the similarity between your development and the principles underlying Git? Specifically, blobs and trees, so far. I look forward to seeing what analogues to refs and commits you come up with :-)