Monday, June 29, 2009

Getting rid of stubble

I have a time series:

[Plot of the time series]

As you can see, my time series has hair. I want to separate out the hair.

My initial idea was to smooth the series by doing some sort of averaging, but I'm not convinced that averaging won't change the underlying curve. My second idea was to simply clip the outliers, but the outliers aren't as extreme down in the tail, so I'd have non-uniform noise in the curve.
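The worry about averaging is easy to see on a toy example. This sketch uses a hypothetical noise-free triangular peak, not the real data. It shows that even a 3-point moving average lowers the peak, so the averaged curve is no longer the underlying one.

```python
import numpy as np

def moving_average(x, width):
    """Moving average; 'valid' mode drops the edge points where the window would overhang."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="valid")

# A clean triangular peak with no noise at all (illustrative data only).
peak = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
smoothed = moving_average(peak, 3)
print(smoothed)   # the peak value drops from 3.0 to 7/3
```

Wider windows flatten the peak further, which is the trade-off the smoothing approaches in the comments try to avoid.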

My current thought is to apply a band-stop filter of some sort.
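One simple way to build a band-stop filter is to zero a band of Fourier coefficients and transform back. The following is a minimal sketch, assuming frequencies are measured in cycles per sample. The function name `band_stop` and the test signal are hypothetical. The hard-edged ("brick-wall") cutoff is exact when the unwanted component falls on whole FFT bins, as it does here. On real data it can cause ringing, so a tapered stop band is usually gentler.

```python
import numpy as np

def band_stop(x, f_lo, f_hi):
    """Zero every Fourier component whose frequency (cycles/sample) lies in [f_lo, f_hi]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))          # 0 .. 0.5 cycles per sample
    spectrum[(freqs >= f_lo) & (freqs <= f_hi)] = 0
    return np.fft.irfft(spectrum, n=len(x))

n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 4 * t / n)          # 4 cycles over the record: the "curve"
fast = 0.3 * np.sin(2 * np.pi * 60 * t / n)   # 60 cycles (about 0.234 cycles/sample): stand-in "hair"
cleaned = band_stop(slow + fast, 0.2, 0.3)
print(np.max(np.abs(cleaned - slow)))         # residual is at round-off level
```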

There's not much programming in this, yet, but I suspect there will be.

6 comments:

Jason Nyberg said...

I'd expect your stubble to stick out like a sore thumb in an FFT of your series...
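If the hair is roughly periodic, it should show up as a strong peak away from the low-frequency part of the spectrum. This sketch lists the strongest non-DC frequencies of a series. `dominant_frequencies` is a hypothetical helper, and the two-sine signal stands in for the real data. Impulsive, non-periodic spikes would instead spread across many bins.

```python
import numpy as np

def dominant_frequencies(x, k=3):
    """Return the k strongest non-DC frequencies (cycles/sample) in x and their magnitudes."""
    mag = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x))
    top = np.argsort(mag[1:])[::-1][:k] + 1   # skip the DC bin at index 0
    return freqs[top], mag[top]

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 4 * t / n) + 0.3 * np.sin(2 * np.pi * 60 * t / n)
freqs, mags = dominant_frequencies(x, k=2)
print(freqs * n)   # expect bins 4 and 60, strongest first
```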

jrm said...

I uploaded the raw data to http://code.google.com/p/jrm-code-project/source/browse/trunk/scheme/signal0.scm so others can play with it.

gyom said...

I would go through the list of numbers and fill out a parallel list that holds the median value within a time window of length L = 10 around each point.

Use the median, not the average from the time window. Do keep track of the standard deviation, though, with another list.

Then you go through the original, and whenever a value exceeds the median in its window by more than 3 times the standard deviation (or whatever threshold you like), you replace that value with the median.

The basic idea is to re-evaluate periodically how spiky the values can be before being considered bad.

There might be a roughly equivalent way to do this with convolutions to make it faster. I don't know much about signal processing.
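gyom's scheme can be sketched in a few lines. Some assumptions go beyond the comment. The window is centered and holds 11 points (half-width 5) rather than exactly L = 10, so every point has a middle. Deviations are tested in both directions, which catches downward spikes too. The synthetic sine-plus-spike series is illustrative only. Note that a big spike inflates the window's standard deviation. Substituting a robust spread estimate such as the median absolute deviation gives the variant usually called a Hampel filter.

```python
import numpy as np

def despike(x, half_width=5, n_sigma=3.0):
    """Replace points that lie more than n_sigma local standard deviations
    away from the local median with that median."""
    x = np.asarray(x, dtype=float)
    cleaned = x.copy()
    for i in range(len(x)):
        # Centered window, truncated at the ends of the series.
        window = x[max(0, i - half_width): i + half_width + 1]
        med = np.median(window)
        sigma = np.std(window)          # a large spike inflates this as well
        if abs(x[i] - med) > n_sigma * sigma:
            cleaned[i] = med
    return cleaned

clean = np.sin(np.linspace(0, 2 * np.pi, 200))
noisy = clean.copy()
noisy[100] += 10.0                      # one synthetic spike
fixed = despike(noisy)
print(np.flatnonzero(fixed != noisy))   # indices that were replaced
```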

Joshua said...

This comment has been removed by the author.

jpc said...

gyom: Taking the median is a nonlinear operation, so you cannot express it as a convolution.
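A three-element counterexample shows why. Convolution is linear, so filtering a sum equals the sum of the filtered parts. The median breaks that rule. The lists here are arbitrary illustrative values.

```python
from statistics import median

a = [1, 0, 0]
b = [0, 0, 1]
s = [x + y for x, y in zip(a, b)]      # elementwise sum: [1, 0, 1]
print(median(a) + median(b), median(s))  # 0 1
```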

Will Farr said...

You might have a look at Numerical Recipes (available online at www.nr.com if you have Adobe Reader); it has some chapters on smoothing data. For example, you can define various moving-window polynomial filters that preserve some chosen number of moments of the data (so that, for example, all peak locations and symmetric widths are preserved, while the antisymmetric skew about the peak is eliminated by the filter). Such filters go by the name Savitzky-Golay. Numerical Recipes also discusses smoothing by filtering in the Fourier domain.
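A Savitzky-Golay filter fits a low-order polynomial by least squares to each window and keeps the fitted value at the center. That fit reduces to a fixed set of convolution weights. SciPy packages this as `scipy.signal.savgol_filter(x, window_length, polyorder)`. The sketch below is a NumPy-only version for illustration. `savgol_coeffs` and `savgol_smooth` are hypothetical helper names. The edge handling (zero padding) is a simplification.

```python
import numpy as np

def savgol_coeffs(half_width, order):
    """Savitzky-Golay smoothing weights: least-squares fit a polynomial of the
    given order to 2*half_width + 1 points and evaluate it at the center."""
    offsets = np.arange(-half_width, half_width + 1)
    A = np.vander(offsets, order + 1, increasing=True)  # columns 1, i, i^2, ...
    # Row 0 of the pseudo-inverse maps the window to the fitted constant term,
    # which is the polynomial's value at offset 0.
    return np.linalg.pinv(A)[0]

def savgol_smooth(x, half_width=2, order=2):
    c = savgol_coeffs(half_width, order)
    # The weights are symmetric, so convolving equals sliding the fit along x.
    # mode="same" zero-pads, so the first and last half_width outputs are unreliable.
    return np.convolve(x, c, mode="same")

print(savgol_coeffs(2, 2) * 35)   # approximately [-3, 12, 17, 12, -3]

t = np.arange(30, dtype=float)
y = 0.5 * t**2 - 3 * t + 1        # a quadratic passes through unchanged
smoothed = savgol_smooth(y)
print(np.allclose(smoothed[2:-2], y[2:-2]))   # True
```

The second check illustrates the moment-preserving property: any polynomial up to the fit order passes through the interior of the filter unchanged, so peaks are not flattened the way a plain moving average flattens them.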