Thursday, January 9, 2020

Micro-services

All the cool kids are doing micro-services these days. They are either writing their new apps as a micro-service architecture, or are in the process of changing their legacy apps to micro-service architectures. The fact that everyone seems to be doing it surely means there are no drawbacks of any note, or that the advantages vastly outweigh them.

I see these advantages to a micro-service architecture:
  • Architectural impedance match — each architectural feature maps to a few obvious micro-services. For example, if you are adding a feature that simply surfaces a data field in the user interface, you probably need to modify little more than the component that generates the UI. If you are adding a richer service, you'll need to add a persistence component and a business-logic component in addition to the UI component. But the reason there is such a good match between the architecture and the micro-services is simply that you draw a circle around each architectural component in your design and declare it to be a micro-service. Basically, if a component is worth drawing separately, it is considered worth implementing as a micro-service.

    This impedance match works well with Agile/SCRUM planning. At the beginning of each sprint, it is fairly easy to assign resources to modify existing components or to write new ones. Each service is usually small enough that it doesn't exceed the capacity of a single team; if more than one team could be assigned to a component, the component is too large and needs to be broken into more micro-services. For a more complex feature, it is possible to plan to bring the micro-service online over more than one sprint without adding much risk.
  • Deployment impedance match — each architectural feature maps to only a handful of obvious micro-services, each one usually implemented as a stand-alone "plug-in" program. Each micro-service typically runs in its own container, where it can crash and be restarted with little ill effect on the program at large: perhaps some functionality is temporarily lost while the micro-service restarts, or a hot backup is ready to take over. The containers are typically something like Docker and are monitored with something like Kubernetes to ensure they are up and running. Structuring the program as a set of micro-services facilitates this kind of monitoring and restarting, and it works well when you can distribute your micro-services over a fleet of virtual and physical machines.
  • Possible bug reduction — from experience, it seems that two 500-line programs have fewer bugs than a single 1000-line program. In theory this would scale, and a program made of dozens of hundred-line subprograms would have far fewer bugs than the same program written as a single subprogram with several thousand lines of code. In practice, however, I think it isn't so simple.
    Two individually bug-free programs might exhibit emergent buggy behavior when they try to co-ordinate with each other. Emergent bugs are very hard to find and fix.
  • Robustness — it is hard to completely knock down a micro-service architecture. No matter how many subprocesses you manage to disable, at least a few will still remain running. If we are satisfied with occasional missing pieces of our program, this is robustness of a sort.
  • Dependency injection — back in the '80s, we'd limit the spread of complexity by structuring larger programs to know only about the smaller subprograms they depended directly upon. This was eventually given a name and declared “a good thing”. You get dependency injection almost for free in a micro-service architecture, not least because the services won't even know about each other unless you tell them. You are virtually forced to follow a reasonable design principle because you have to do something to get the services to know about each other, and that minimal effort also happens to minimize unwanted information sharing. (A minimal sketch follows this list.)
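
Here is a minimal sketch of what constructor-based dependency injection looks like in plain Java, without any framework. The names (InventoryClient, OrderService, and the wiring class) are made up for illustration; the point is only that the consumer declares what it needs and learns about other services solely from whoever wires it up:

```java
// A hypothetical dependency: the consumer below knows only this interface.
interface InventoryClient {
    int unitsInStock(String sku);
}

// One possible implementation; the consumer never names it.
class HttpInventoryClient implements InventoryClient {
    private final String baseUrl;

    HttpInventoryClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    public int unitsInStock(String sku) {
        // Real code would make an HTTP call here; stubbed for the sketch.
        return 0;
    }
}

class OrderService {
    private final InventoryClient inventory;

    // The dependency is handed in ("injected"); OrderService cannot reach
    // out and discover other services on its own.
    OrderService(InventoryClient inventory) {
        this.inventory = inventory;
    }

    boolean canFulfill(String sku, int quantity) {
        return inventory.unitsInStock(sku) >= quantity;
    }
}

public class Wiring {
    public static void main(String[] args) {
        // All knowledge of which services exist lives at the wiring point.
        InventoryClient inventory = new HttpInventoryClient("http://inventory.local");
        OrderService orders = new OrderService(inventory);
        System.out.println(orders.canFulfill("sku-123", 2));
    }
}
```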

It is a time-honored tradition in computer programming to take any idea that offers some advantage, give it a name, elevate it to a “principle” and use it to the exclusion of any prior good idea. “Micro-service architecture” is the latest fad, and it has so rapidly replaced “object-oriented programming” that you can often find both in the same job description. This is almost a contradiction because a micro-service architecture often cuts straight through the objects manipulated by the program. What is conceptually a single object might be represented in several completely different ways in various micro-services in the system. It might be reconstituted from a relational mapping by way of an ORM, serialized into JSON format for RPCs, and returned as part of a nested GraphQL query before having its guts evacuated and spread out through the Document Object Model for display. The identity of the object through the system is completely lost.

Despite the plethora of wonderful ideas that come from a micro-service architecture, there seem to be some downsides.
  • Ubiquitous process barriers — a process barrier is thrown up between every functional component of the program. This helps limit the effect of a crash, but it greatly hinders cooperation and synchronization. Error handling and try/finally clauses don't easily work across process boundaries, and what might once have been handled by a critical section of code now might require a distributed locking protocol (see the first sketch after this list).
  • Ubiquitous RPCs — all communication between the micro-services has to take place over the tin cans and string we call remote procedure calls. RPCs, besides being slow, introduce failure modes that normal procedure calls don't have, and if the programmer doesn't think hard about these new failure modes, the RPC will be far less robust than its direct counterpart (second sketch below).
  • Anti-object-oriented — all objects in the problem domain of a micro-service architecture must be simple enough to marshal into JSON objects so they can be passed as arguments and returned as values. As noted in a previous essay, the more complex the object, the harder it is to marshal across a procedure boundary, and if fields are hard to marshal, methods are virtually impossible. There is a strong incentive to keep "objects" in JSON format as they are passed through the system and to parse the JSON strings only as they are needed by the lowest layer. This is the antithesis of object-oriented design, which tries to make objects as similar to their real-world counterparts as possible and to encapsulate behavior (methods) along with the data (third sketch below).
  • Non-robustness — while it is hard to knock over an entire micro-service application, it can be all too easy to knock over parts of it. A program that is 90% functional 100% of the time is also 10% broken 100% of the time. If that 10% is an often-used part of the program, it can appear very non-robust. So what if the header and footer of each page (each its own micro-service, naturally) are up nearly 100% of the time, when the body of each page is missing half its data? A colleague of mine noted that a certain micro-service architecture he was working on seemed very fragile for something that is supposed to be so robust.
  • Verbosity — the frameworks used to support a micro-service architecture make plain old Java look downright terse. Objects and methods sprout dozens of annotations describing how to marshal the objects and hook up the RPCs (fourth sketch below). Some frameworks exploit the redundancy of Java to construct parts of a micro-service system automatically, but these often rely on reflection and automated code generation that introduce problems of their own. And Java's habit of putting every class in a separate file means you often have to follow a tortuous trail through the code to trace the path from an incoming request to the method that actually implements it.
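
First, a hedged sketch of the process-barrier point: in one process, mutual exclusion is a single keyword; across services it becomes a protocol the client must negotiate. The LockService and Lease interfaces below are hypothetical stand-ins for whatever coordination service (ZooKeeper, etcd, a database row) you actually use:

```java
import java.time.Duration;
import java.util.Optional;

// In one process, mutual exclusion is a language feature:
class LocalCounter {
    private long count = 0;
    synchronized void increment() { count++; }   // a critical section, one keyword
}

// Across micro-services the same exclusion needs a distributed locking
// protocol. These interfaces are hypothetical sketches, not a real library.
interface Lease {
    boolean stillValid();   // leases expire; holders must check
}

interface LockService {
    Optional<Lease> tryAcquire(String lockName, Duration ttl);
    void release(Lease lease);
}

class RemoteCounterClient {
    private final LockService locks;

    RemoteCounterClient(LockService locks) { this.locks = locks; }

    boolean incrementShared() {
        // Acquisition can fail outright, and a held lease can expire
        // mid-operation; neither failure mode exists for `synchronized`.
        Optional<Lease> lease = locks.tryAcquire("counter-lock", Duration.ofSeconds(5));
        if (lease.isEmpty()) return false;   // contention, or the network failed
        try {
            if (!lease.get().stillValid()) return false;
            // ... RPC to read, increment, and write the shared counter ...
            return true;
        } finally {
            locks.release(lease.get());
        }
    }
}
```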
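
Second, the RPC failure modes. A local call either returns or throws; the remote version below, written against the plain JDK 11 HttpClient with a made-up endpoint URL, has to budget for timeouts, connection failures, and non-2xx responses, plus an ambiguity no local call has:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RpcFailureModes {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))   // failure mode: can't even connect
            .build();

    // The remote analogue of a one-line local lookup. The URL is illustrative.
    static int unitsInStock(String sku) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://inventory.local/stock/" + sku))
                .timeout(Duration.ofSeconds(2))      // failure mode: the call may never return
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {          // failure mode: callee is up but unhappy
            throw new IOException("inventory returned " + response.statusCode());
        }
        return Integer.parseInt(response.body());
    }

    public static void main(String[] args) {
        try {
            System.out.println(unitsInStock("sku-123"));
        } catch (IOException | InterruptedException e) {
            // The failure mode with no local analogue: a timeout after the
            // server received the request. Did the operation happen? Blind
            // retries are only safe if the operation is idempotent.
            System.err.println("RPC failed: " + e.getMessage());
        }
    }
}
```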
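
Third, the anti-object-oriented point: the moment an object crosses a service boundary, it degenerates to its data. The sketch below uses Jackson (a common JSON marshaller, chosen for illustration) and a hypothetical BankAccount; the fields travel, the overdraft rule does not:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// A hypothetical domain object. The data marshals into JSON; the behavior
// (the withdraw method and its overdraft check) does not.
public class BankAccount {
    public String owner;
    public long balanceCents;

    public BankAccount() {}                         // marshallers want a no-arg constructor

    public void withdraw(long cents) {              // this method cannot cross the wire
        if (cents > balanceCents) {
            throw new IllegalStateException("overdrawn");
        }
        balanceCents -= cents;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        BankAccount account = new BankAccount();
        account.owner = "alice";
        account.balanceCents = 10_000;

        // What actually travels between micro-services: the fields, nothing else.
        String wireFormat = mapper.writeValueAsString(account);
        System.out.println(wireFormat);             // something like {"owner":"alice","balanceCents":10000}

        // The receiver gets the data back, but every service that touches the
        // JSON must re-implement (or silently skip) the overdraft rule itself.
        BankAccount copy = mapper.readValue(wireFormat, BankAccount.class);
        copy.withdraw(5_000);
    }
}
```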
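
Fourth, the kind of annotation ceremony the verbosity point refers to, shown in Spring Web style as a representative framework (the post doesn't name one; the endpoint and class names are illustrative). The handler body is one line; everything else is wiring:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Roughly the per-endpoint ceremony in a typical Java micro-service
// framework. Each annotation tells the framework how to route the RPC
// and marshal its arguments and result.
@RestController
public class StockController {

    @GetMapping("/stock/{sku}")
    public StockResponse stock(@PathVariable("sku") String sku) {
        return new StockResponse(sku, 0);   // the real logic lives a few more hops away
    }

    // A marshalling-friendly record standing in for the response body.
    public record StockResponse(String sku, int unitsInStock) {}
}
```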

Despite these drawbacks, I expect to see many more micro-service architectures in the future because they match up so well with “factory-style” software production: they make it much easier to manage teams of developers. I expect the enthusiasm to be curbed a bit once it is noticed that rewriting a program as a set of micro-services doesn't really produce a dramatic improvement in its reliability or a dramatic reduction in the complexity of a large body of code.

6 comments:

John Cowan said...

As I was trying to say last time before I got garrulous (as old men tend to do), pipes (local or remote) are a much better foundation for an MSA than RPCs. And of course if you control both ends of the pipe, you aren't restricted to JSON any more than you are restricted to lines of text. As it happens, I've been working on a serialization protocol with both text (S-expression) and binary (ASN.1 DER-ish) flavors that is well-suited to dynamically typed languages, and especially the Lisps.

(ASN.1's bad reputation comes, I think, entirely from its horrible mismatch with statically typed languages. For dynamically typed languages it is obvious and natural; its only disadvantage is that it permits streaming read but not streaming write.)

Joe Marshall said...

Pipes may be a better foundation than RPCs, but how do you plan to deal with the mass of RPC-based APIs already in existence, with thousands more coming every day? The existing API tools such as Spring, Swagger, and Postman are all built around the RPC model, and they aren't going away. RPCs are firmly entrenched as the communication protocol.

Joe Marshall said...

I got around the immediate comment problem by enabling anonymous comments. I've turned on comment moderation so I can filter out the spam, but comments might now be delayed a bit. It appears that Blogger cannot remember that I am logged in when I comment.

Anonymous said...

Can you elaborate on why you think that pipes are a better model?

Anonymous said...

Having a history with ASN.1/DER, I’d highly suggest giving CBOR a real try; it’s a delightfully elegant design with a lot of possibility from a clean, well-defined spec.

John Cowan said...

The main issue I see in a quick look over CBOR is its lack of support for private-use types. ASN.1 allows a distinction between ISO standard, non-ISO standard, and purely local types, and distinguishes between objects that contain bytes and objects that contain sub-objects. So it is possible to add non-ISO types for "list" and "improper list" without treading on either ISO space or truly private-use space, and it is always possible for a decoder that doesn't understand a particular type to return it as a generic bytevector or collection along with the type number without garbling the rest of the stream.

As for pipes vs. RPCs, I was pointing out that you can have an MSA without depending on RPCs; of course, incorporating existing RPC-based components is another story (though you can encapsulate calls to an RPC service in a pipeline component). Pipes are better because they are not spit and chewing gum, and because it's obvious that you can't pretend they are subroutine calls; you have to rethink your design, or better yet think in terms of pipelined components in the first place. Flow-based programming is a black-box coroutine paradigm about as old as Unix but 100% independent of it (it took me a while to convince its inventor that Unix pipes weren't limited to single lines just because that was the most usual unit of operation). All these systems can be thought of as especially coarse-grained dataflow.