Anyone experienced with 20 Questions knows that you should ask questions that tend to cut the remaining possibilities roughly in half. Computer hackers know that you can distinguish a maximum of 2^20 (1048576) distinct objects with twenty yes-or-no questions. Given a set of n distinct objects, you need about log2(n) bits to enumerate them.
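As a quick sanity check on the "log2(n) bits" claim, here is a minimal sketch in Common Lisp. The name `bits-to-enumerate` is made up for illustration; it leans on the standard `integer-length` function so the answer is exact rather than a rounded floating-point logarithm.

```lisp
;; Hypothetical helper, not part of the original post.
(defun bits-to-enumerate (n)
  "Smallest number of yes-or-no answers that can tell N distinct objects apart."
  (check-type n (integer 1))
  ;; For N >= 1, the INTEGER-LENGTH of N-1 equals the ceiling of log2(N),
  ;; computed with integer arithmetic only.
  (integer-length (1- n)))

(bits-to-enumerate 1048576) ; => 20
(bits-to-enumerate 1000)    ; => 10
(bits-to-enumerate 1)       ; => 0
```

Note that 1000 objects still need 10 questions, since 9 questions can only separate 512 of them.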
Intuitively, a yes-or-no question that divides your sample space in half gives you 1 bit of information.
But what if your question doesn't divide the space evenly? Again, it is intuitive that if your question has the answer `yes' for every object, then your question doesn't help narrow things down at all. This is true if your question has the answer `no' for every object as well. So if we were to plot the amount of information we get from asking a question based upon the percentage of `yes' answers, we'd have a curve that starts and ends at zero and has a peak of 1 bit at exactly the half-way point.
Ok, so suppose your yes-or-no question is `yes' for 1/4 of the objects. If the answer comes out yes, you have gotten the effect of two `binary' questions, which would be 2 bits of information. So the amount of information you get from a question depends on how much you trim the object space. This would be mathematically

-log2(x) where x is the ratio of the trimmed space to the original space. For example,

-log2(1/2) = 1 bit of information if you trim the space in half.

-log2(1/4) = 2 bits if you trim the space to 1/4 its original size. As a check,

-log2(1/1048576) = 20 bits: if we go from 1048576 down to a single object we have gained 20 bits of information.
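The three examples above can be checked with a one-line function. This is only a sketch: `answer-bits` is a hypothetical name, and it assumes x is a fraction with 0 < x <= 1. The results are floating point, so expect values very close to the round numbers rather than exact integers.

```lisp
;; Hypothetical helper, not part of the original post.
(defun answer-bits (x)
  "Bits gained when an answer leaves fraction X (0 < X <= 1) of the space."
  ;; Convert to double-float first so rational inputs such as 1/4 work
  ;; and the logarithms are taken in double precision.
  (- (/ (log (float x 1d0)) (log 2d0))))

(answer-bits 1/2)       ; approximately 1
(answer-bits 1/4)       ; approximately 2
(answer-bits 1/1048576) ; approximately 20
```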
But the reason we don't ask questions that trim several bits off the space of objects is that they are usually answered in the negative. If our yes-or-no question is `yes' for 1/4 of the objects, and the answer comes out `no', we only gain

-log2(3/4) = 0.415 bits. (OK, fractional bits are weird.) So when we ask a yes-or-no question we need to take into account both possible answers.
Well, since the answer is `no' 3/4 of the time, we expect to get .415 bits 3/4 of the time, and we expect to get 2 bits the other 1/4 of the time. The average is 3/4 * .415 + 1/4 * 2 = .81 bits. We can write a formula for computing this sort of quantity:

(defun log2 (n)
  "Log base 2"
  (/ (log n) (log 2)))

(defun question-bits (p)
  "Given the probability that a question is true, return
the number of bits of information you can expect by
asking the question."
  (let ((p~ (- 1.0 p)))
    (+ (* p (- (log2 p)))
       (* p~ (- (log2 p~))))))

(question-bits .5)  => 1.0                ; a fifty-fifty question gives us one bit
(question-bits .25) => .81                ; a 1/4 question gives us 8/10 of a bit
(question-bits .75) => .81                ; same answer (of course)
(question-bits .1100278644383595) => .5
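One wrinkle: the curve described earlier starts and ends at zero, but feeding a probability of exactly 0 or 1 to a formula like this means taking the log of zero. Below is a hedged sketch of a variant that treats those endpoints as contributing nothing (the usual limiting value) and then prints a few points on the curve. The names `plog2` and `question-bits-safe` are invented here and are not from the code above.

```lisp
;; Hypothetical variant for illustration; not the original QUESTION-BITS.
(defun plog2 (p)
  "Return P * -log2(P), using the limiting value 0 when P is 0."
  (if (zerop p)
      0d0
      (let ((p (float p 1d0)))
        (* p (- (/ (log p) (log 2d0)))))))

(defun question-bits-safe (p)
  "Expected bits from a question that is true with probability P (0 <= P <= 1)."
  (+ (plog2 p) (plog2 (- 1 p))))

;; Print a small table: probability of `yes', then expected bits.
(dolist (p '(0 1/10 1/4 1/2 3/4 9/10 1))
  (format t "~5,2F  ~5,3F~%" p (question-bits-safe p)))
```

The table should rise from zero to a peak of one bit at 1/2 and fall back symmetrically.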

If we could only come up with questions that were true about 1/10th of the time, each question would be worth only about half a bit, and we'd have to play 40 questions.
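To put rough numbers on that, here is a small sketch that divides the 20 bits we need by the expected bits per question. Both function names are hypothetical, and the result is only an average-case estimate, not a guarantee for any particular game.

```lisp
;; Hypothetical helpers for illustration; the names are not from the post.
(defun expected-bits (p)
  "Expected bits from a question that is true with probability P (0 < P < 1)."
  (flet ((term (q)
           (let ((q (float q 1d0)))
             (* q (- (/ (log q) (log 2d0)))))))
    (+ (term p) (term (- 1 p)))))

(defun questions-needed (p &optional (total-bits 20))
  "Rough number of questions needed to accumulate TOTAL-BITS bits of information."
  (/ total-bits (expected-bits p)))

(questions-needed 1/2)    ; about 20
(questions-needed 0.11d0) ; about 40
(questions-needed 1/10)   ; between 42 and 43
```

So "about 1/10th" is doing a little work in the sentence above: at 11% the estimate is almost exactly 40, while at exactly 10% it creeps up toward 43.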
Back to clustering next....