Tuesday, February 8, 2011

More about shuffle.

In my last post, I asked why
public static void shuffle(List<?> list) was given a ‘B’ grade. A few readers didn't see the problem, so I thought I'd clarify.

I was writing code to perform some operations on the elements of a collection. Multiple copies of the code will be running. I want to avoid having each copy being synchronized with the others, so it makes sense to iterate over the list elements in a non-deterministic order. This will not prevent multiple instances of the code from occasionally ‘colliding’ on a single object, but it will make this case the exception rather than the rule.

I originally wrote the code using a Set<Foo> rather than a List<Foo> because there is no natural ordering of Foo objects. The code looked like this:
    for (Foo foo : fooSet) {
        operateOnFoo (foo);
    }
where fooSet was an instance variable declared as Set<Foo>.

A Set is an unordered abstraction. Obviously, an iterator on a Set will return the objects is some order, but abstractly, there is no meaningful way to ‘change’ the order of the Set itself. A List, on the other hand, is an ordered abstraction that imposes a external order on the elements. (An array imposes the additional unnecessary constraints of compactedness on the indexes and random access.) Changing the instance variable from Set<Foo> to a List<Foo> was trivial.

The problem is now to ‘iterate pseudorandomly’. Pseudorandom iteration is not an uncommon operation, so I assumed there was a way to do it. Pseudorandom iteration is not common enough that a special iteration construct would be defined for it, so I assumed I would simply use a normal iterator and iterate over a permutation of the list. What immediately came to mind was something like:
    for (Foo foo : fooList.permute()) {
        operateOnFoo (foo);
    }
That doesn't work.

So Java doesn't provide a ‘permute’ method on a list. I vaguely recalled that Java had something called ‘shuffle’ rather than ‘permute’. However, the List class doesn't have a ‘shuffle’ method either. None of the methods on a List will permute it. That is strange because permuting a list is a fairly common operation.

I was sure that I had seen ‘shuffle’ somewhere, so I hauled out grep to search for some code that called it. I found some and chased down the documentation. The shuffle method is a static method in the Collections class. This means that I have to use a different syntax:
    for (Foo foo : Collections.shuffle(fooList)) {
        operateOnFoo (foo);
    }
That doesn't work.

The shuffle method mutates its argument but does not return a value. This means I cannot simply call the method as an expression, but I have to hoist the call to the nearest ‘Command’ level:
    Collections.shuffle(fooList);
    for (Foo foo : fooList) {
        operateOnFoo (foo);
    }
That works, but now I have to revisit my original design: is it OK to directly shuffle the fooList, or should I copy it first? Fortunately, I don't need a copy.

To recap, what I wanted to write was this:
    for (Foo foo : fooList.permute()) {
        operateOnFoo (foo);
    }
what I had to write was this:
    Collections.shuffle(fooList);
    for (Foo foo : fooList) {
        operateOnFoo (foo);
    }
By making shuffle ‘return’ void, the author has relegated the method to command contexts only. I happened to want to use it in an expression context.

I'm sure a lot of readers are wondering “So what?” It is in fact a trivial point, and now that I've discovered how to permute a list I doubt I'll encounter this particular problem in the future. But the point is that Java distinguishes between commands and expressions. You are prohibited from using a command where an expression is called for. However, you are permitted to use an expression where a command is called for. The programmer has to keep track of which things are commands because they require a command context. The programmer doesn't have to remember which things are expressions because they can be used in either context. By making shuffle a command, the author has added yet another item to my mental “remember these aren't expressions” list. If shuffle returned the shuffled list, I wouldn't have to think about it at all.

Arcane Sentiment noted that “ Java uses the return type as a way of declaring that the operation has side effects... ”. I guess I'm weird because I use the return type to declare the type of object that will be returned.

8 comments:

John Cowan said...

I don't agree with you here.

If you want to be pure functional and not support mutation, that's one thing. If you do want to support mutation, it's better IMHO to return nothing as Java does, or to return some kind of consistent trash object (as Common Lisp sometimes but not always does, and as most Scheme implementations do) than to return an object which may or may not bear some relationship to the object being mutated.

Worse yet, the return value may be the object mutated under some mutations and not others, leading people to believe that they can safely write "let x = foo x" without worrying about whether foo mutates x or not (that is, when they no longer care about its previous value), and then suffering from nasty bugs when x is no longer what they expect.

Steve Knight said...

Definitely agree with what John Cowan says but I also agree with what you say.

My view is that if the implementation of 'shuffle()' was in my code it shouldn't mutate because I just don't like that. But that's just me.

However, I can see if I was writing a language library that my users might need some constant space shuffling algorithm if the collection was sufficiently large or the constraints were very tight.

Other users might prefer a more functional style. As the implementer of this library function I can give both types of users what they need but at the expense of expressibility for the functional user.

mquander said...
This comment has been removed by the author.
mquander said...

I think that the (much less useful) operation "randomly permute this list, and then iterate over it" should look different than the operation "iterate over a random permutation of this list." After all, the resulting state after one or the other is somewhat different. I don't like an API that makes them look the same.

Unknown said...

Hmmm.

I assume that you have explicitly given each copy of the code its own pseudo-random generator with its own independent seed? Otherwise, I expect every copy of the code to compute exactly the same pseudo-random shuffle.

It seems that you are concerned about making copies of the data. But in that case, isn't each copy of the code using the same data, hence the same shuffle? Every time one copy of the code shuffles, all the other copies see the same shuffle, collisions galore!

I think your purpose is served by a class that implements Iterator, where the constructor takes a length-N collection and a PRNG; a private field stores a permutation of {0,...,N-1}; and its "next" method returns the next item (according to the local permutation) from the collection. The idiom would be something like "for (Foo elt : new RandomIterator(fooList, randomSeed)) {...}"

Actually, now that I think about it more... If two copies of your code can safely "collide", then it seems reasonable to assume that just one copy of the code could safely "collide" with itself --- it could process the same data element more than once. If that is the case, then you can have a really trivial iterator, because there is no need to remember the entire permutation! For the "next" method, just randomly pick an element from the collection. It's not a big deal (and it only happens with probability 1/N^2 in any case) if the same element happens to be picked twice. Easy-peezy.

mquander said...

Francis, I'm not sure I really follow your suggestion, but as to getting the "same shuffle", the Java docs say that Collections.shuffle uses a "default source of randomness", by which it appears to mean "new java.util.Random()" which claims that it sets the "seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor." So I don't believe you're likely to get the same shuffle repeatedly.

mquander said...

Oh, I might be wrong, actually, about the mechanism -- here it appears that there's a static Random() object which persists between shuffles, and that's where the "default source of randomness" comes from. So obviously you wouldn't get the "same shuffle" then, either.

Unknown said...

mquander: It is important that the programmer takes control over the source(s) of pseudo-randomness. Otherwise it is practically impossible to write repeatable tests, and similarly, it is practically impossible to reproduce bugs. Pseudo-randomness is a useful part of the toolkit for CS and software engineering --- randomness is a horrible PITA.