Thursday, May 8, 2008

Guess Who

In the Milton Bradley game ‘Guess Who’ each player tries to determine the identity of their opponents ‘hidden face’ by asking a series of yes/no questions. An example question might be ‘Does the person have a beard?’ Depending on the answer, certain faces are eliminated from the set of possibilities. After enough questions, a player can eliminate all incorrect possibilites and deduce the correct one.

To aid in playing, each player has a tableau of the possible faces and can knock over the faces that are eliminated by the answer to a question. When one face remains standing, that must be their opponents ‘hidden face’.

What is the best strategy for playing this game?

A strategy that eliminates as many faces as possible on each turn will quickly get the answer, but there is a balancing act: a question that could eliminate a huge number of characters when answered ‘yes’ will eliminate far fewer when answered ‘no’. The liklihood of a ‘yes’ answer also diminshes with the number of people that would be eliminated: if a ‘yes’ answer would eliminate 90% of the characters, it is the case that a ‘yes’ answer will be obtained only 10% of the time. A question that eliminates exactly half the characters is optimal: regardless of the answer, the same number of candidates is eliminated each time.

How many questions need to be asked?

That depends on the number of faces in the initial pool. Consider the very last question asked. After asking it, there is one candidate left. An ideal question eliminates half the candidates. So before asking the question, there are two candidates. Consider the penultimate question. After asking it, there are the two remaining candidates for the final question. An ideal question eliminates half the candidates, so there must be four candidates before the penultimate question. By assuming more and more questions, we can double the number of candidates we can distinguish.

The Milton Bradley game gives twenty-four faces to each player. Five questions could distinguish between thirty-two faces, but four questions can only distinguish between sixteen, so the number of questions needed is at least four, and sometime five.

For the general case, with a selection of N faces, the number of yes or no questions is log_2(N). For twenty-four faces, this is about 4.6 questions, provided that each question divides the number of candidates exactly in half.

What if you can't find a yes/no question that evenly splits the candidate pool?

Suppose you asked a question like ‘Does the person wear a hat?’ and suppose it were true of one third of the original candidate pool. If the answer is ‘yes’, then you have narrowed the field to the eight characters that have hats. Three more questions will be needed. On the other hand, if the answer is no, you have narrowed the fielt to the sixteen characters that do not have hats and four more questions will be needed.

Everything else being equal, you ought to get a ‘yes’ answer on this initial question one-third of the time and a no answer two-thirds of the time. So after getting the answer, the number of remaning questions on average will be:
         (1/3 * 3) + (2/3 * 4) = 3 2/3
Finding out the answer to a question gives you information. How much information do you get?

We'll measure the amount of information by the change in the number of questions needed to complete the game. If a question divides the candidate pool exactly in half, then we need one fewer questions when we get the answer. This is independent of the total number of questions. In other words, if we start with N faces and we cut this in half by answering the question, we'll have N/2 faces left. Before getting the answer we have log_2(N) questions to ask, and after getting the answer we have log_2(N/2) questions. And
          log_2(N) - log_2(N/2) = log_2 (N/(N/2))
                                = log_2 (2)
                                = 1
Each yes/no question that divides the pool in half gives us one unit of information. In fact, this unit is called a ‘bit’.

What about the ‘hat’ question that didn't divide the pool evenly?

Well, before asking the question we had
    log_2(24) ~= 4.6 questions
and after getting the answer, we had (on average) 3.667 questions remaining, so it gave us 4.6 - 3.667 = .92 units of information.

But since we are working with the change, we need not refer to the total number of questions:
   2/3 * log_2(3/2) + 1/3 * log_2(3/1) = .9182958340544896
.92 units? You mean .92 bits?! How can you have a fraction of a bit?

We're measuring a change in the average number of questions. In each game we can only have an integral number of questions, but on average a game will take 4.6 questions.