Wednesday, June 6, 2007
A while back I got hit by a spam worm (Sobig) that sent me tens of thousands of emails. I wanted to plot the amount of email I got as a function of time so I could see how the attack evolved over time. The problem with plotting discrete events (the arrival of a spam) is that the closer you zoom in on the problem, the flatter the plot looks. For instance, if I plotted the number of spams arriving with microsecond accuracy, you'd find that in any microsecond interval that you'd have one or zero emails. During the spam attack, I was getting several emails a second, but after the attack was over, I could go for hours without getting a single message. I played around with a number of ways to visualize the rate of email arrival before I hit on this method: along the X axis you plot time, along the Y axis you plot the log of the amount of time since the last message (the log of the interarrival time). This makes a very nice plot which shows rapid-fire events as dark clusters near the X axis and periods of calm as sparse dots above the axis. The scale doesn't flatten out as you zoom in, either: the points plotten remain at the same height as you stretch the X axis. This allows you to `zoom in' on the rapid fire events without causing the graph to `flatten out'. I haven't seen this particular kind of plot used very often, but I have stumbled across it every now and then. For instance, this graph uses the technique. I don't know what it is plotting, but you can see there is definitely more of it on the right-hand side. I'd be interested in hearing if this kind of plot works for others.
Posted by Joe Marshall at 4:51 PM