Tuesday, May 27, 2008

Ill-formed problem number 2

A and B each pick a biased coin. The bias can be either towards heads or tails and the probability of a particular bias is evenly distributed. Now A and B each flip their coins. Q: What is the probability that one flips heads and the other flips tails?

Friday, May 23, 2008

Ill-formed problem

Person A and person B each pick a random real number in the interval [0, 1.0]

Q: What is the absolute value of the difference of their choices?

Although you can't predict a particular value, you can determine the distribution of values and be able to say “99 times out of a hundred, the absolute value of the difference will be at least .005”

Tuesday, May 20, 2008

When Anton Chigurh comes to town you'd best have your lucky quarter. There could be a lot riding on it when he asks you to call it, and you don't want it to come up tails, friend-o.

You check your pocket. Yep, it's still in there. But wait - there are two quarters in your pocket! You look at them. They are indistinguishable, but you know that one is your lucky quarter that always comes up heads. What do you do?

Obviously, you toss them both. If one comes up tails, the other must be the lucky one. If both come up heads, try again. (If both come up tails, you'd better get out of town, fast!) How many tosses are you likely to have to make?

There are two outcomes: both heads and one heads, one tails. If the second coin is a `fair' coin, then each outcome is equally likely. So with one toss, you ought to have a 50-50 chance of finding out which quarter is `normal'. If you don't, then you ought to have a 50-50 chance of finding out on the *next* flip. Each flip cuts the chances of remaining ignorant by 1/2.

So how many tosses are you likely to have to make? Intuitively, it seems that you shouldn't need more than a few tosses to determine which coin is the lucky one, but it also seems that you might need at least a couple. We can rephrase the question: Do you think that you'll need a million tosses? That's absurd. How about a thousand? Still absurd. How about twenty? Hmmm... 2^20 ~= 1,000,000. One chance in a million? Won't happen. Ten? 2^10 ~= 1,000. Maybe...

We can maps tosses to log-odds (if we use 10log10, we'll get the log-odds in decibels!)

N-tosses     Odds of            
            Getting tails    Log Odds in dB
            at least once 
    0            1/1       0
    1            2/1       3.010299956639812
    2            4/1       6.020599913279623
    3            8/1       9.030899869919434
    4            16/1      12.04119982655925
    5            32/1      15.05149978319906
    6            64/1      18.06179973983887
    7            128/1     21.07209969647868
    8            256/1     24.08239965311849
    9            512/1     27.09269960975831
    10           1024/1    30.10299956639812
    11           2048/1    33.11329952303793
    12           ...       36.12359947967774
    13           ...       39.13389943631755
    14                     42.14419939295736
    15                     45.15449934959717
    16                     48.16479930623699
    17                     51.1750992628768 
    18                     54.18539921951661
    19                     57.19569917615642
    20                     60.20599913279624
.
Why are we tossing both coins? We already know that one coin is going to come up heads every time we toss it, so we don't get any new information. We can just select a coin and start tossing. If it ever comes up tails, the other coin is the one we want, but if it continues to come up heads, it must be the two headed coin. So let's just toss one coin. This changes things slightly. After our first toss, we have three possible outcomes. If it comes up tails, then we know we chose the fair coin. If it comes up heads, we could have chosen the fair coin and gotten lucky, or we could have chosen the unfair coin. The probabilities are this:

        p(Unfair) = 1/2
        p(Fair & Heads) = 1/4
        p(Fair & Tails) = 1/4
.
If it comes up tails, we know we have the wrong coin, but if it comes up heads, what is the probability that we have the right coin? I.e. what is
          p(Unfair|heads)
.
This can be computed with Bayes rule:

                                        p(heads|Unfair) 
        p(Unfair|heads) =  p(Unfair) * ------------------ 
                                           p(heads)      

        p(Unfair) = 1/2
        p(heads|Unfair) = 1
        p(heads) = 1/2 + 1/4
                                      1
        so p(Unfair|heads) = 1/2 * -------- = 1/2 * 4/3 = 2/3
                                     3/4
.
And if we toss it again, the probabilities are these:

        p(Unfair) = 1/2                (unfair is always heads)
        p(Fair & Heads & Heads) = 1/8
        p(Fair & Heads & Tails) = 1/8
        p(Fair & Tails) = 1/4          (we stop flipping!)

.
and we want to compute p(Unfair | heads & heads)


                                                  p(heads & heads|Unfair)
        p(Unfair|heads & heads) =  p(Unfair) * --------------------------- 
                                                   p(heads & heads)      

        p(Unfair) = 1/2
        p(heads & heads | Unfair) = 1
        p(heads & heads) = 1/2 + 1/8 = 5/8


                                      1
        so p(Unfair|heads) = 1/2 * -------- = 1/2 * 8/5 = 4/5
                                     5/8
.
We can continue in this vein as long as we want, but the calculations become *much* easier if we work in log-odds:

  N-tosses    p(Unfair| n-heads)   O(Unfair|n-heads)  log(O(Unfair|n-heads))
     0              1/2                 1/1               0
     1              2/3                 2/1           3.010299956639812
     2              4/5                 4/1           6.020599913279623
     3              8/9                 8/1           9.030899869919434
     4             16/17               16/1          12.04119982655925
     5             32/33               32/1          15.05149978319906
     6             64/65               64/1          18.06179973983887
     7            128/129             128/1          21.07209969647868
     8            256/257             256/1          24.08239965311849
     9            512/513             512/1          27.09269960975831
    10           1024/1025           1024/1          30.10299956639812
.
Each time we toss the coin and it comes up heads, we become more sure that the coin is the unfair one. We can quantify the improvement in the odds:

                       p(heads|Unfair)
         O(Unfair) * ------------------ = O(Unfair | heads)
                       p(heads|Fair)      
.
The odds of it being unfair, given that we just tossed `heads' is the right hand side, the left hand side is the product of the odds of it being unfair *before* we tossed it and the ratio of probabilities of heads depending on the selection of the coin. Rewriting this:


         O(Unfair|heads)      p(heads|Unfair)     2
        ----------------  = ------------------ = ---
          O(Unfair)           p(heads|Fair)       1
.
So every time we flip the coin and get heads, we improve the odds of having the unfair coin by 2:1. In log-odds space, multiplication becomes addition, so flipping the coin and getting heads adds 3.01 to our log odds estimate of the unfairness.

The cool thing about formulating the problem in this way is that you have separated the probability estimation into a number of independent steps. No matter how you arrive at an estimate of the odds, you can flip the coin and add 3.01 to your estimate. Here is an example: suppose you dropped your lucky quarter in a jar full of regular quarters. To find your lucky quarter again, you flip *all* the quarters and set aside the ones that come up tails. If the jar held 128 quarters, about 64 would come up tails and you could filter them out. Another flip discounts 32 more, then 16, etc. It would take you 7 flips just to get back to the point where you had two coins and weren't sure which one was the lucky one. Now the log of 1/128 is -21.07 Each flip adds 3.01 to your log odds estimate, so 7 flips adds 21.07, leaving you at 0 -- 50/50 odds.

Thursday, May 8, 2008

Guess Who

In the Milton Bradley game ‘Guess Who’ each player tries to determine the identity of their opponents ‘hidden face’ by asking a series of yes/no questions. An example question might be ‘Does the person have a beard?’ Depending on the answer, certain faces are eliminated from the set of possibilities. After enough questions, a player can eliminate all incorrect possibilites and deduce the correct one.

To aid in playing, each player has a tableau of the possible faces and can knock over the faces that are eliminated by the answer to a question. When one face remains standing, that must be their opponents ‘hidden face’.

What is the best strategy for playing this game?

A strategy that eliminates as many faces as possible on each turn will quickly get the answer, but there is a balancing act: a question that could eliminate a huge number of characters when answered ‘yes’ will eliminate far fewer when answered ‘no’. The liklihood of a ‘yes’ answer also diminshes with the number of people that would be eliminated: if a ‘yes’ answer would eliminate 90% of the characters, it is the case that a ‘yes’ answer will be obtained only 10% of the time. A question that eliminates exactly half the characters is optimal: regardless of the answer, the same number of candidates is eliminated each time.

How many questions need to be asked?

That depends on the number of faces in the initial pool. Consider the very last question asked. After asking it, there is one candidate left. An ideal question eliminates half the candidates. So before asking the question, there are two candidates. Consider the penultimate question. After asking it, there are the two remaining candidates for the final question. An ideal question eliminates half the candidates, so there must be four candidates before the penultimate question. By assuming more and more questions, we can double the number of candidates we can distinguish.

The Milton Bradley game gives twenty-four faces to each player. Five questions could distinguish between thirty-two faces, but four questions can only distinguish between sixteen, so the number of questions needed is at least four, and sometime five.

For the general case, with a selection of N faces, the number of yes or no questions is log_2(N). For twenty-four faces, this is about 4.6 questions, provided that each question divides the number of candidates exactly in half.

What if you can't find a yes/no question that evenly splits the candidate pool?

Suppose you asked a question like ‘Does the person wear a hat?’ and suppose it were true of one third of the original candidate pool. If the answer is ‘yes’, then you have narrowed the field to the eight characters that have hats. Three more questions will be needed. On the other hand, if the answer is no, you have narrowed the fielt to the sixteen characters that do not have hats and four more questions will be needed.

Everything else being equal, you ought to get a ‘yes’ answer on this initial question one-third of the time and a no answer two-thirds of the time. So after getting the answer, the number of remaning questions on average will be:
         (1/3 * 3) + (2/3 * 4) = 3 2/3
.
Finding out the answer to a question gives you information. How much information do you get?

We'll measure the amount of information by the change in the number of questions needed to complete the game. If a question divides the candidate pool exactly in half, then we need one fewer questions when we get the answer. This is independent of the total number of questions. In other words, if we start with N faces and we cut this in half by answering the question, we'll have N/2 faces left. Before getting the answer we have log_2(N) questions to ask, and after getting the answer we have log_2(N/2) questions. And
          log_2(N) - log_2(N/2) = log_2 (N/(N/2))
                                = log_2 (2)
                                = 1
.
Each yes/no question that divides the pool in half gives us one unit of information. In fact, this unit is called a ‘bit’.

What about the ‘hat’ question that didn't divide the pool evenly?

Well, before asking the question we had
    log_2(24) ~= 4.6 questions
.
and after getting the answer, we had (on average) 3.667 questions remaining, so it gave us 4.6 - 3.667 = .92 units of information.

But since we are working with the change, we need not refer to the total number of questions:
   2/3 * log_2(3/2) + 1/3 * log_2(3/1) = .9182958340544896
.
.92 units? You mean .92 bits?! How can you have a fraction of a bit?

We're measuring a change in the average number of questions. In each game we can only have an integral number of questions, but on average a game will take 4.6 questions.

Thursday, May 1, 2008

There doesn't seem to be too much bitrot in this old code, so I've made a bunch of progress on it. I can fasload and initialize up to ‘record’, but there is some obscure bug with the way I'm handling the record definition.