Tuesday, June 30, 2009

DTFT

Jason Nyberg suggested a Fourier transform. I did a little programming and came up with this:
;; Given a vector of samples, return the DTFT as a function of
;; omega (radians of rotation per sample).
(define (dtft samples)
  (lambda (omega)
    (sum 0 (vector-length samples)
         (lambda (n)
           (* (vector-ref samples n)
              (make-polar 1.0 (* omega n)))))))
Given a vector of uniformly spaced samples (over time), this returns the discrete-time Fourier transform (DTFT) of the sampled function, as a function of omega. I vaguely remember doing something with these in my signals and systems course, but it was a long time ago and I've forgotten almost all of it.
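Note that sum is not standard Scheme; it's a little helper that adds up (f n) for n from the first bound, inclusive, to the second, exclusive. A minimal version looks like this:
(define (sum start end f)
  (let loop ((n start) (total 0))
    (if (>= n end)
        total
        (loop (+ n 1) (+ total (f n))))))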

The basic idea is fairly simple. Suppose the original data contains a regular signal at some frequency. I imagine an arrow that rotates uniformly at some frequency and multiply it by my samples. If the arrow is rotating at the same frequency as the embedded signal, then it will be pointing in the same direction every time I hit one of the bumps caused by the embedded signal. These will all add up to a really big arrow.
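In symbols, what the code computes is

$$Y(\omega) = \sum_{n=0}^{N-1} x[n]\, e^{i \omega n}$$

where N is the number of samples and x[n] is the nth sample. (The textbook DTFT has e^{-iωn}; the sign of the exponent only changes the direction of rotation, so the magnitudes I care about below come out the same.)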

The omega variable is the amount (in radians) that I rotate the arrow for each sample. I represent my arrow by a complex number 1 unit in length. As I walk through the samples, I rotate the arrow (counter-clockwise) by the appropriate angle, multiply it by the sample value, and add that to the running total. I have 1000 samples per second, so I'll need to do unit conversion if I want to keep on the same time scale (and this is confusing enough as it is).
(define (hertz->radians-per-sample samples-per-second)
  (lambda (hz)
    (/ (* *pi* 2 hz) samples-per-second)))

(define (dtft samples-per-second samples)
  (let ((convert (hertz->radians-per-sample samples-per-second)))
    (lambda (hz)
      (let ((omega (convert hz)))
        (sum 0 (vector-length samples)
             (lambda (n)
               (* (vector-ref samples n)
                  (make-polar 1 (* omega n)))))))))

(define y (dtft 1000 *data*))

(y 10)
;Value: -150.06753930463557+44.941130809509566i

(y 15)
;Value: -213.97389237338032-84.17677317997187i
So far so good, but I'm getting back complex numbers as answers. These encode both the direction and the length of my arrows. Right now, the direction isn't important to me. It tells me the phase of that frequency component, but I just want to know how big it is (the magnitude).
(magnitude (y 15))
;Value: 229.93598187410205

(magnitude (y 20))
;Value: 292.10647664795385
So let's plot the magnitude over a range of frequencies:

[Plot of magnitude vs. frequency omitted.]

The hair on my graph is at exactly 64 Hz.
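You can also hunt for the peak numerically by printing the magnitude over a range of frequencies. A quick sketch (scan-magnitudes is a made-up name; it uses the y defined above):
;; Print the magnitude of (y hz) for hz from start to end,
;; stepping by step, so the peak can be spotted by eye.
(define (scan-magnitudes y start end step)
  (do ((hz start (+ hz step)))
      ((> hz end))
    (display hz)
    (display " ")
    (display (magnitude (y hz)))
    (newline)))

(scan-magnitudes y 0 100 1/2)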

I have a couple of comments. First, you might wonder why I'm using complex numbers. I could have done the arrows by using a two-component vector or even by keeping track of X and Y variables. The answer is that complex numbers have the concept of rotation ‘built in’. If I were explicitly modeling the arrows, I'd have a lot of trigonometric functions kicking around. Since I'm modeling frequencies and such, you'd expect sines and cosines all over the place. Instead, these are all tucked into the complex numbers. There is a 2π visible in the conversion routine, but all the trig is gone.
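For instance, rotating an arrow is just a multiplication (a tiny illustration; rotate is a made-up name, and *pi* is the constant used in the conversion routine above):
;; Rotate arrow counter-clockwise by radians: multiply by a
;; unit-length complex number pointing in that direction.
(define (rotate arrow radians)
  (* arrow (make-polar 1 radians)))

(rotate 3+4i (/ *pi* 2)) ; a quarter turn => approximately -4+3i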

Second (and this confused me a bit), my original data is discrete, but the DTFT is continuous: I can get the magnitude at any frequency, even one whose period is not a whole number of samples. (This confused me because at first I only looked at frequencies that divide evenly into the sampling frequency, and I didn't find the peak. I assumed I had the program wrong.) So I did a lot of poking around like this:
(magnitude (y (+ 40 3/5)))
;Value: 222.50621191745878
In fact, a 64 Hz signal has a period of 15.625 ms, which is not an even multiple of my 1 ms sampling interval.

So what is special about 64 Hz? It seems that this is the rate at which Windows does its time slicing.

Now that I know what the hair is, the next problem is removing it.

Monday, June 29, 2009

Getting rid of stubble

I have a time series:

[Plot of the time series omitted.]

As you can see, my time series has hair. I want to separate out the hair.

My initial idea was to smooth the series by doing some sort of averaging, but I'm not convinced that averaging won't change the underlying curve. My second idea was to simply clip the outliers, but the outliers aren't as extreme down in the tail, so I'd have non-uniform noise in the curve.
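For concreteness, the sort of averaging I mean is a sliding-window mean, something like this sketch (smooth and the window parameter k are made up for illustration):
;; Replace each sample with the mean of the (at most) 2k+1
;; samples centered on it, clamping the window at the ends.
(define (smooth samples k)
  (let ((n (vector-length samples)))
    (make-initialized-vector
     n
     (lambda (i)
       (let ((lo (max 0 (- i k)))
             (hi (min n (+ i k 1))))
         (let loop ((j lo) (total 0))
           (if (>= j hi)
               (/ total (- hi lo))
               (loop (+ j 1)
                     (+ total (vector-ref samples j))))))))))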

My current thought is to apply a band-stop filter of some sort.

There's not much programming in this, yet, but I suspect there will be.

Sunday, June 21, 2009

Serialization, etc.

In order to speed up my interpreter, I'm playing around with specializing certain elements of the SCode. For instance, the form
(+ x 3)
would normally involve three sub-evaluations: one to look up +, one to look up x, and finally one to evaluate the literal constant 3. The generic EvalStep looks like this (the code below happens to be for a one-argument primitive combination, but the two-argument case is analogous):
        public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
        {
            Control unev0 = this.arg0;
            Environment env = environment;
            object ev0;
            while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };

            if (ev0 == Interpreter.UnwindStack) {
                ((UnwinderState) env).AddFrame (new PrimitiveCombination1Frame0 (this, environment));
                answer = Interpreter.UnwindStack;
                environment = env;
                return false;
            }

            // It is expensive to bounce down to invoke the procedure,
            // so we invoke it directly and pass along the ref args.

            if (this.method (out answer, ev0)) {
                TailCallInterpreter tci = answer as TailCallInterpreter;
                if (tci != null) {
                    answer = null; // dispose of the evidence
                    // set up the interpreter for a tail call
                    expression = tci.Expression;
                    environment = tci.Environment;
                    return true;
                }
                else
                    throw new NotImplementedException ();
            }
            else return false;
        }
With the appropriate configuration flags set, the constructor for PrimitiveCombination2 would delegate to PrimitivePlusFixnum; PrimitivePlusFixnum would notice that x is (for example) lexically bound in the current frame and that 3 is a quoted constant. Under these conditions it would construct a PrimitivePlusFixnumA0Q. The EvalStep for this is:
        public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
        {
            answer = (int) environment.Argument0Value + this.rand1Value;
            return false;
        }
This, you can imagine, is a fair amount faster.

Now, when I change the configuration flags, I want to get better or worse performance depending on what I allow to be optimized. So when I turn off the EnablePrimitiveCombination2Optimization flag, I want the slow, generic form of the SCode to be constructed. To make this work correctly, when I serialize the SCode, I always serialize it as if it were the base form. When it is deserialized and the constructor is called, it has the choice of returning the optimized or non-optimized version as called for by the flags.

That's the theory, anyway.

There are a bunch of problems, though. In C#, as in Java, the constructor for an object must return an object of the type constructed. If I call new PrimitiveCombination2, there is no way to return a PrimitivePlusFixnumA0Q instance instead. There are standard workarounds for this problem — ‘factory’ classes are one way — but I decided to use ‘smart constructors’. Each SCode class has a static Make method that is called to instantiate the class. The specialized classes are subclasses of the base, unspecialized version, so Make checks the flags and simply dispatches to the appropriate Make method for the specialized class (or calls new if the flags do not permit optimization).

Of course the default serialization protocol doesn't know about this, so I have to patch it. The base class for each unoptimized SCode class overrides the default deserializer. The new serializer actually serializes a special ‘deserializer’ object. Later, when the SCode is deserialized, an instance of the special deserializer is created. Then, during the fix-up phase of deserialization, the special deserializer calls the Make method of the base class, and at that point the appropriate specialized object is created.

This actually sort of works. But here's a problem. Almost all of the SCode is immutable. There is no reason to mutate the SCode (the exception being adding advice to lambdas), so we can avoid the immense amount of complication that mutable SCode would entail (consider mutating some SCode that is currently being run in another thread).

If the SCode is immutable, then there can be no cycles in the SCode graph. (Another benefit: SCode analyzers can assume the input is a finite directed acyclic graph rather than something more complex.) Since the SCode is immutable, the SCode graph must be constructed from the bottom up. The contents of a node must exist before the node itself is constructed. Under normal conditions, like when reading a file, this is naturally the case. But deserialization isn't a normal case.

The default serialization will in fact traverse the data structure in depth-first order, so when an object is reconstructed its components will already have been deserialized. But I don't want to use the default serialization, because I want to change which nodes are constructed depending on the configuration flags. The specialized deserializers, however, are not necessarily called in the correct order. The subcomponents of the objects ought to exist, as far as I can determine, but they are not necessarily initialized. This causes problems.

When I create a lambda object I do a sanity check to ensure that there are no duplicates in the argument list. This check fails on deserialization. The argument list seems to be a vector of nulls at the point where I deserialize the lambda, and then it gets fixed up before the graph structure is returned to the caller. It is easy enough to turn off the sanity check (which I in fact did), but this throws a monkey wrench into my plan for using smart constructors. How can the smart constructor analyze the body of the lambda expression without knowing what the argument list is?

This makes no sense to me. Why does the deserializer defer fixup of the lambda argument list until after the fixup of the lambda? More importantly, how can I enforce the deserialization order I want (I don't see a means to override the deserialization of built-in objects like vectors)?

And there is a lurking bug having to do with deserializing the entire heap. SCode files are not circular, but the heap most definitely is. And under my current design, the type of a closure is related to the type of the lambda it is closed over, and the type of an environment is related to the type of the closure that created it. If these objects are not instantiated in the right order upon deserialization, I won't be able to keep the SCode consistent with the environment structure.

I guess I'll have to figure that out soon.

Saturday, June 20, 2009

I understand that Erik Naggum recently passed away. I'm sorry to hear that. He was one of the more interesting characters I've encountered online.

Friday, June 19, 2009

Of course I spoke too soon

I just knew if I said my interpreter was self-hosting that I'd have to take it back. It really can self-host, but I need to rebuild from the original MIT Scheme binaries again.

As I mentioned in a far earlier post, my interpreter works by walking a tree-shaped data structure called SCode. SCode is essentially an abstract syntax tree for Scheme. Both MIT Scheme and PLT Scheme interpret code by walking an abstract syntax tree (although PLT Scheme has been using a JIT compiler lately, and I'm not sure where that fits into the picture).

The SCode for MIT Scheme comes from two sources. The runtime system creates SCode on the fly when the user evaluates code at the prompt or loads a Scheme source file. The other source of SCode is a program called ‘sf’ which translates Scheme source code off-line into SCode and dumps the SCode into a binary file. Sf is written in Scheme, so there is a bootstrapping problem if you don't have a previous version of sf.

To solve the bootstrapping problem, I wrote a loader that can read the data format that the standard MIT Scheme distribution uses. The format is basically a memory dump with some base and offset information so you can relocate the pointers. The loader builds C# data structures using the memory dump as sort of a template.

I can read the binaries produced by the standard MIT Scheme, but I cannot write them. I never intended to be bi-directionally compatible in this way; the uni-directional compatibility is just for bootstrapping, anyway. Instead, I'm writing binary files by using the serialization facilities built into C#.

As of yesterday I was able to load the old format binaries of sf and then run it to create new format binaries for sf and the rest of the runtime system. Then I was able to restart my interpreter using the new binaries only and perform the cycle again.

So what's the catch?

MIT Scheme SCode only has about a dozen node types. These are sufficient to describe the language, but they are also fairly generic. By adding more node types we can greatly improve performance by using specialized versions of the evaluation code. My interpreter does this on the fly at SCode creation time. The specializations and optimizations are controlled by configuration flags. In theory, any combination of the configuration flags would work, the only difference being the performance of the interpreter. The Scheme runtime system has primitives to manipulate SCode, and it doesn't expect to see these specialized nodes. But the API to the SCode hides the specialization and Scheme is none the wiser.

The problem comes in serializing the SCode to the binary format. My intent was to serialize in an unspecialized format. This would enable me to toggle the configuration flags before starting Scheme and have the system re-specialize (or not) on the fly as it deserialized the binaries. My plans, however, “gang agley” as they are wont to do, and I ended up with the specialized objects in my serialized binaries. This isn't a big deal, but when I changed how the specialized code is serialized, it broke the deserializer. So now I have to go back to the original MIT distribution and read its binaries.

Thursday, June 18, 2009

“The night was quiet — too quiet.”

Time to restart posting. I have a bunch of half-finished posts. I think I was trying to write too much in one sitting. I'm going to write smaller things, but I intend to write more frequently.

Big news today: my C# version of MIT Scheme can now self-host. (I would say that it is self-hosting, but I don't want to tempt fate that much.) When I started, I was loading the SCode binaries that were created by the native Windows version of MIT-Scheme (with the runtime written in C). My bootstrap fasloader would translate the binaries into .NET data structures on the fly. Unfortunately, I couldn't create these binaries without the original syntaxer from MIT-Scheme. But I have managed to get the syntaxer to re-generate the binaries. The C# binaries have a completely different format from the original binaries. This is because I simply serialize the data structures into the file using the built-in C# serialization protocol. The fasloader now attempts to load binaries by first attempting to deserialize them, and if that fails, falling back on the bootstrap fasloader that understands the original binary format.

So I have a bootstrap runtime built out of the original binaries. I boot from them, and then use them to create a new set of binaries. I booted from the new set, created a second generation set of binaries, and saved them. Now I can boot from the second generation and create the next generation without needing a native MIT Scheme executable. (Again, in theory. I might have second or third generation bugs waiting to prove me wrong, so I'm not discarding the original code quite yet.)

I'll upload the latest code in a short while.