Sunday, April 6, 2025

When Laymen Try to be Programmers

When I worked on Google Offers (a Groupon clone), we had a user interaction problem. When a user purchased an item, there were three important steps that we wanted to acknowledge:

  1. When Google receives an order. The user should get positive feedback that the order was recognized. If the user doesn’t get this, they will either re-submit the order, or be surprised when the order is fulfilled. The order is placed on a queue.
  2. When Google processes an order from the queue. The user should get positive feedback that Google is working on fulfilling the order. Usually, an order is fulfilled reasonably quickly, but there are situations where critical systems are unavailable and the order has to remain in the queue for an extended period of time (several hours). The user gets nervous if they get ghosted by the system after getting positive feedback that Google got the order.
  3. Finally, when Google has made the decision whether to fulfill the order or has declined the order. The user needs to be told whether to expect shipment or a refund. If a refund, then the user can take action to re-submit the order.

So in submitting an order to Google Offers, a total of three emails would be sent to the user so they could watch their order proceed through the system. The sending of these emails was controlled by the “commerce” API. The commerce API was a walled-off section of the Google infrastructure that knew how to talk to the credit card companies and charge money. Normal Google code was not allowed to do these things but had to work via the commerce API, and the commerce API would take care of ensuring that the appropriate pop-ups would appear on the user’s screen and that the credit card information was securely obtained. Normal Google code never got its hands on the user’s credit card information; it only got a "go/no-go" signal from the commerce API.

So the commerce API would be the system actually taking the steps of receiving the order, charging the credit card, and returning the go/no-go signal to our system. We would instruct it to send email for each of these steps. So far so good.

The problem was that often the order would go through very quickly. The emails were processed in batches, so the email that acknowledged the receipt of the order could end up being received after the email that acknowledged that the order had been fulfilled. The user would first get an email saying "We charged your card." and only after this would they get an email saying "We got your order." This would confuse the user.

There was no way to add an artificial time lag, nor would we want to. We could not guarantee that the emails would arrive in the right order. (Even if we could, we couldn’t guarantee that the user would read them in the right order.) The solution that I proposed was to explicitly number the emails so that each email would say "This is email number N of 3 expected emails." and perhaps even include a small message that said "Emails can arrive in the wrong order." If a user got email 2 first, then email 1, they could figure it out.
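The numbering idea can be sketched in a few lines of Python. This is only an illustration, not the code that ran at Google: the `OrderEmail` type, the `render` and `in_intended_order` helpers, and the field names are all hypothetical. The point is that the sequence number travels inside the message, so the recipient (or anything else reading the mailbox) can recover the intended order no matter how the mail was delivered.

```python
from dataclasses import dataclass

TOTAL_EMAILS = 3  # received, processing, final decision


@dataclass(frozen=True)
class OrderEmail:
    """Hypothetical message type; the names are illustrative only."""
    order_id: str
    sequence: int  # 1-based position in the intended order
    body: str


def render(email: OrderEmail) -> str:
    """Put the sequence number and the warning at the top of the message."""
    header = (
        f"This is email number {email.sequence} "
        f"of {TOTAL_EMAILS} expected emails."
    )
    note = "Emails can arrive in the wrong order."
    return f"{header}\n{note}\n\n{email.body}"


def in_intended_order(inbox):
    """Recover the intended order from whatever order the mail arrived in."""
    return sorted(inbox, key=lambda e: e.sequence)


# Email 2 arrives before email 1, as in the story.
inbox = [
    OrderEmail("A1", 2, "We are processing your order."),
    OrderEmail("A1", 1, "We got your order."),
]
for email in in_intended_order(inbox):
    print(render(email))
    print("---")
```

Nothing here depends on delivery timing, which is the whole attraction: the sender never has to know what the recipient has already seen.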

But the product managers didn’t see it that way. As far as they were concerned, it was confusing when email 1 arrived after email 2, so they wanted to not send email 1 if email 2 had been received. This is a nice idea, but I pointed out that we had no control over the order of arrival of emails, so there was no way to know if email 2 would be received prior to email 1 at the time we were sending email 1. They were adamant: "Don’t send the second email (that is, email 1, which arrived second in some situations)."

Ok, then. I adjusted the code to suppress the sending of email 1. This solved the problem of email 1 arriving after email 2, sure, but recall that email 1 was the "Google has received your order and has placed it on a queue to be processed" acknowledgement. Now when people placed an order, they would no longer get confirmation that Google had received it. Usually, Google would process the order in a timely manner and they’d quickly get email 2 which said "We are processing your order", but if there were some reason that we could not process the queue for some time, the user would be left in the dark about whether Google had heard them in the first place.

Complaints poured in.

The subsequent conversation was something like this:

“Why aren’t you acknowledging that the order has been received?”

“You explicitly told us to not send that email. See this communication (cite reference).”

“But that was not what we meant. We meant, don’t send it if the user has received the ‘Your order has been processed.’ email.”

“How are we supposed to know if the email system delivered the user’s mail and that they read it in the right order? We’re good, but not that good.”

“You mean emails can arrive or be read in the wrong order?”

“Exactly what we said in this communication (cite reference).”

“...”

“May I suggest we number the emails so that the user can figure it out if they arrive in the wrong order?”

“No, don’t do that. Just put it back the way it was.”

Done, and we are back to the original problem.

Out-of-order packets are an old problem that existed and was solved before computers. Computer programmers are, of course, very aware of the problem and the solutions. Computer programmers are well versed in the problems of “process”, and when laymen try to play at being programmers by interfering with process, they generally screw it up.

9 comments:

Anonymous said...

Is there a statistical solution to this? Approve the first email and wait N minutes to get approximately 99.99% of emails in order?

Joe Marshall said...

We were processing millions of orders every day, so even getting 99.99% of them right means a hundred errors daily.
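As a quick check of that arithmetic, here is a small sketch. The figure of exactly one million orders per day is an assumption made for the example (the comment only says "millions"), and `expected_misordered` is a hypothetical helper. `Fraction` is used so the result is exact rather than a floating-point approximation.

```python
from fractions import Fraction


def expected_misordered(orders_per_day: int, in_order_rate: str) -> Fraction:
    """Expected number of orders per day whose emails arrive out of order."""
    return orders_per_day * (1 - Fraction(in_order_rate))


# Assumed volume: one million orders per day, 99.99% delivered in order.
print(expected_misordered(1_000_000, "0.9999"))
```

At several million orders per day the count scales linearly, so it would be a few hundred confused users daily rather than one hundred.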

Anonymous said...

Have each email contain full list of the "steps" that have happened. If user gets an email with "steps 1, 2, 3 are done" then gets an email with "steps 1, 2 are done" they can pretty easily see what has happened. Especially if the content for each step is identical between emails.
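A minimal sketch of that cumulative approach, with made-up step labels and a hypothetical `render_status` helper: every email renders the same checklist, and only the number of ticked steps differs, so a later email is always a superset of an earlier one.

```python
# Illustrative step labels; the real email wording is not given in the post.
STEPS = ["Order received", "Order being processed", "Order decision made"]


def render_status(steps_done: int) -> str:
    """Render the full checklist, ticking the first `steps_done` steps."""
    lines = [
        f"[{'x' if i < steps_done else ' '}] {i + 1}. {label}"
        for i, label in enumerate(STEPS)
    ]
    return "\n".join(lines)


# The "steps 1, 2, 3" email arrives first, then the "steps 1, 2" email.
print(render_status(3))
print()
print(render_status(2))
```

Because the text for each completed step is identical across emails, a reader comparing the two can see at a glance which one is the older snapshot.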

Anonymous said...

The solution may be the following: set up a relay email server and instruct the API to send emails to user-email-dog-user-domain@relay.server.com instead of the real addresses. A rather simple script on the relay server will just forward emails to the real addresses, but it can hold out-of-order emails until the previous ones have arrived.

Anonymous said...

That sounds like the same problem. How does the relay server know that first email has been delivered to send the second one?

Anonymous said...

You can analyze the mail subject, for example. Anyway, the letter always contains all the information needed to identify it (the order number, the stage of processing, etc). If you see mail #2 for order #XXX, but you have not seen #1 for #XXX, put the mail back in the queue instead of delivering it.
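The hold-back relay described in this comment can be sketched as follows. Everything here is hypothetical: the `HoldBackRelay` class, the `forward` callback, and the assumption that the order number and stage have already been parsed out of each message. It is a sketch of the commenter's proposal, not something the post says was built.

```python
from collections import defaultdict


class HoldBackRelay:
    """Forward each order's emails strictly in stage order, holding early ones."""

    def __init__(self, forward):
        self._forward = forward                      # callable(order_id, stage, message)
        self._next_stage = defaultdict(lambda: 1)    # next stage to release, per order
        self._held = defaultdict(dict)               # order_id -> {stage: message}

    def receive(self, order_id, stage, message):
        """Accept a message from the sender and release whatever is now in order."""
        self._held[order_id][stage] = message
        # Release every consecutive stage that has become available.
        while self._next_stage[order_id] in self._held[order_id]:
            current = self._next_stage[order_id]
            self._forward(order_id, current, self._held[order_id].pop(current))
            self._next_stage[order_id] = current + 1


delivered = []
relay = HoldBackRelay(lambda order_id, stage, msg: delivered.append((order_id, stage)))
relay.receive("XXX", 2, "We are processing your order.")  # held: #1 not seen yet
relay.receive("XXX", 1, "We got your order.")             # releases #1, then #2
print(delivered)
```

Note the limits of the sketch: it only controls the order in which mail leaves the relay, not the order in which it reaches or is read in the user's inbox, and if email #1 is lost, email #2 is held forever unless a timeout is added.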

Joe Marshall said...

Recall that I repeatedly floated the idea of numbering the emails so they could clearly be put in the right order by the recipient. This suggestion was shot down several times. I got tired of suggesting solutions and instead enjoyed watching them flail away at unworkable ideas.

Scott L. Burson said...

This is the kind of thing that worries me about novices "vibe coding" with LLMs. The LLM is unlikely to even warn them of the problem, and certainly won't refuse to implement a wrong solution if they insist on it.

There's more to software engineering than coding.

Anonymous said...

I like when I place an order online and the site sends me an email that links to a status page where I can check the status of my order. And that page will update dynamically, so I always know what's going on even if any notification emails arrived out of order. I think this is a good solution if Google is not doing that already.