What happens if I try to do in C# what I did in Lisp a few days ago?
Here are the functions we'll need to compute our probabilities in
C#:
double PH1 ()
{
return Utility.Product<string> (this.TermWeight, Globals.AllTerms);
}
double D ()
{
return Globals.TuningAlpha * Utility.Gamma (N ())
+ leftChild.D () * rightChild.D ();
}
double Pi ()
{
return Globals.TuningAlpha * Utility.Gamma (N ()) / D ();
}
double P ()
{
double pi_k = Pi ();
return pi_k * PH1 () + ((1.0 - pi_k) * leftChild.P () * rightChild.P ());
}
double R ()
{
return Pi () * PH1 () / P ();
}
double ProbabilityToOdds (double p)
{
return p / (1.0 - p);
}
double ROdds ()
{
return ProbabilityToOdds (R());
}
The marginal probability of clustering is given by function
R
, and the marginal odds of clustering is given by
function
ROdds
.
In the Scheme version, I
proceeded to convert the program to odds space by substituting the
definitions of
R
,
probability->odds
,
Pi
, etc. and then rewriting and simplifying the resulting
code. This is because in Scheme the name of the function and
the function itself are
interchangable.
Let's see how this works in C#. First let's try ProbabilityToOdds.
In Scheme, we have this:
(define probability->odds
(lambda (p) (/ p (- 1.0 p))))
(define r-odds
(lambda (cluster)
(probability->odds (R cluster))))
And we simply replace the name with the definition:
(define r-odds
(lambda (cluster)
((lambda (p) (/ p (- 1.0 p))) (R cluster))))
But in C#, what exactly is
ProbabilityToOdds
defined as?
ProbabilityToOdds = ????
It's a static method, not a first-class object. But C# has
delegate
objects that can be used to act as if they were
first-class methods. We can do this:
Func<double,double> P2O = ProbabilityToOdds;
And now we have a
delegate
for
ProbabilityToOdds
. The
delegate
is a first-class object. We can use it just
like we used
ProbabilityToOdds
. For example
double ProbabilityToOdds (double p)
{
return p / (1.0 - p);
}
static Func<double,double> P2O = ProbabilityToOdds;
double ROdds ()
{
return P2O (R());
}
That didn't get us very far. We sort of want to go in the reverse
direction. We want to make an unnamed delegate. You can do this in
C#.
static Func<double,double> P2O =
delegate (double p)
{
return p / (1.0 - p);
}
double ROdds ()
{
return P2O (R());
}
This is a little weird because it seems that the
delegate
doesn't have a declared return type. It gets weirder. Let's
substitute
P2O
with the delegate.
double ROdds ()
{
return delegate (double p)
{
return p / (1.0 - p);
} (R());
}
Compiler error: ``Method name expected.''
It seems that the compiler is playing some tricks on us. Anonymous
delegates weren't originally in the language, and they aren't directly
supported by the runtime. However, the compiler understands what it
is we're trying to accomplish and arranges to create a good
simulation. At least in the first case. We confused it with the
second. With a couple of hints to the compiler, we can say what we
mean:
double ROdds ()
{
return ((Func<double,double>)
delegate (double p)
{
return p / (1.0 - p);
}) (R());
}
We can substitute
R
, too.
double ROdds1 ()
{
return ((Func<double, double>)
delegate (double p)
{
return p / (1.0 - p);
}) (((Func<double>) delegate ()
{
return Pi () * PH1() / P ();
}) ());
}
We'd better simplify this. It's getting a little messy. That second
function takes no arguments and is being applied to an empty argument
list. We can η-reduce it.
double ROdds1 ()
{
return ((Func<double, double>)
delegate (double p)
{
return p / (1.0 - p);
}) (Pi () * PH1() / P ());
}
It might not be immediately clear what I just did, so I'll re-set
this.
double ROdds1 ()
{
return ((Func<double, double>)
delegate (double p)
{
return p / (1.0 - p);
}) (((Func<double>) delegate ()
{
return Pi () * PH1() / P ();
}) ());
}
η-reduction is removing the function call apparatus (in green) and
using the computed value (in red) directly:
double ROdds1 ()
{
return ((Func<double, double>)
delegate (double p)
{
return p / (1.0 - p);
}) (Pi () * PH1() / P ());
}
When invoking a function, we compute the value of the operands, bind
them to the formal paramaters, and evaluate the body. In essence, we
do this:
double ROdds1 ()
{
return ((Func<double>)
delegate ()
{
double p = Pi () * PH1 () / P ();
return p / (1.0 - p);
}) ();
}
Now we can η-reduce, or we
could if this were
syntactically allowed:
double ROdds1 ()
{
return
double p = Pi () * PH1 () / P ();
p / (1.0 - p);
}
I removed the function call apparatus just as I did before, but this
time it didn't work. The body of a function in C# consists of a
series of statements, but a call invoking a function is an
expression. You can't use a statement where an expression is
expected. Then why did it work before? There was a single statement
in the function body and it was a return statement. When we removed
the return, we serendipitously converted the return statement to an
expression. Since there were no preceeding expressions, we got away
with it. In the current case, however, have a problem.
We'll have to move any statements that were in the body of our
delegate to a context where statements are permitted. With luck,
there will be a context close enough that it won't change the
semantics of the code. As it turns out, we're in luck this time.
double ROdds1 ()
{
double p = Pi () * PH1 () / P ();
return
p / (1.0 - p);
}
We were lucky because the context of the call to the delegate was the
first subexpression of a statement and we didn't have to move the code
very far. But if it were something more complicated, like, say,
this:
for (int i = 0; i < ((Func<int>)
delegate ()
{
object x = foo();
return bar (x, x);
})(); i++)
We're going to have more problems. The nearest place to hoist the
binding of
x
is outside of loop, and this will likely change the
semantics.
There
is a hack to work around this limitation, but it is
really very nasty. In C#, as in C, assignment is an expression
rather than a statement. We can separate the binding of
p
from
computing the value if we can figure out how to sequence the
expressions.
double ROdds1 ()
{
double p;
return
First
p = Pi () * PH1 () / P ()
then
p / (1.0 - p);
}
In C and C++, you can sequence expressions with a comma. This is one of those
things people use all the time in
for
statements, but
forget that it can be used elsewhere as well. You would write this:
double ROdds1 ()
{
double p;
// In C or C++ this will work.
// NOT IN JAVA OR C#
return (
p = Pi () * PH1 () / P (),
p / (1.0 - p));
}
Unfortunately, Java and C# have overlooked the comma expression.
Fortunately, they did
not overlook the conditional
expression. We can abuse it to force our sequencing because the
predicate in the conditional is required to be evaluated first.
double ROdds1 ()
{
double p;
return ((p = Pi () * PH1 () / P ()) == 0.0 || true)
? p / (1.0 - p)
: 0.0;
}
The assignment is put in the predicate as part of boolean expression
that is always true, then the second expression is put in the true
branch of the conditional. The false branch is a dummy value of the
correct type.
I did say it was very nasty, but sometimes programming is a dirty
job.
Obviously, we don't need to go this far for the problem at hand,
this:
double ROdds1 ()
{
double p = Pi () * PH1 () / P ();
return p / (1.0 - p);
}
will do just fine.
Ok. This is ridiculous. People don't do this sort of stuff when they
hack C, C# or Java. The reason is obvious. It's far too hard and
confusing, even in this toy example. It would be insane to try to do
this in real life code. Instead, of just cutting and pasting code from
here to there, we'd look at the destination context, reason about how
the return value is going to be used and how it could be best computed
in the new context, and then rewrite the code at the destination to have the
desired effect.
More to follow...