Tuesday, January 14, 2020

Palindromes, redux, and the Sufficiently Smart Compiler

The Sufficiently Smart Compiler is mentioned by authors as shorthand for “a compiler that performs nearly all reasonable optimizations, but in particular this one I want”. Many attempts were made up through the 80's and maybe into the 90's to write a Sufficiently Smart Compiler that would perform all “reasonable” optimizations, and although many impressive results have been obtained, there always seem to be fairly obvious optimizations that remain unoptimized. These days it seems that people realize that there will be good compilers and some very good compilers, but never a Sufficiently Smart Compiler. Nonetheless, it is worth considering a Sufficiently Smart Compiler as a tool for thought experiments.

I was curious what would be necessary for a Sufficiently Smart Compiler to generate optimal code for the palindrome problem given the naive algorithm.

The naive algorithm is inspired by the axioms
  • A zero or one element string is a palindrome.
  • If the first char matches the last char, and the middle is a palindrome, the result is a palindrome.
and gives us this:
(define (palindrome1? string)
  (or (< (string-length string) 2)
      (and (char=? (string-ref string 0)
                   (string-ref string (- (string-length string) 1)))
           (palindrome1? (substring string 1 (- (string-length string) 1))))))

The higher performing algorithm is inspired by the idea of keeping two pointers to each end of a string and comparing the characters at the pointers. If the characters are the same, you move the pointers inward and when they meet, you have seen a palindrome. If at any point the characters differ, you don't have a palindrome:
(define (palindrome2? string)
  (define (scan front-pointer rear-pointer)
    (or (>= front-pointer rear-pointer)
        (and (char=? (string-ref string front-pointer)
                     (string-ref string rear-pointer))
             (scan (+ front-pointer 1) (- rear-pointer 1))))
  (scan 0 (- (string-length string) 1)))
As you can see, these really aren't very different to start with. Both algorithms are iterative and both work their way in from the outside of the string. There are basically two differences. First, access to the rear of the string is either by a rear pointer, or by using the string-length of the string and subtracting 1. Second, the iterative call either uses substring or moves the pointers closer together.

First, let's assume that our processor has can reference through an indexed offset. This would mean we could point at the element one beyond the rear-pointer and not incur overhead. This isn't an unreasonable assumption for a CISC architecture such as an x86, but would probably cause 1 instruction overhead on a RISC architecture. So the second algorithm becomes this:
(define (palindrome2? string)
  (define (scan front-pointer rear-pointer)
    (or (< (- rear-pointer front-pointer) 2)
        (and (char=? (string-ref string front-pointer)
                     (string-ref string (- rear-pointer 1)))
             (scan (+ front-pointer 1) (- rear-pointer 1)))))
  (scan 0 (string-length string)))

Now this next assumption is a bit more of a stretch. The implementation of palindrome1? uses substring on each iteration and that's going to result in a lot of string copying. If our implementation used “slices” instead of copying the string, then there will be a lot less copying going on:
(define (palindrome1? string)
  (or (< (- (slice-end string) (slice-start string)) 2)
      (and (char=? (string-ref string (slice-start string))
                   (string-ref string (- (slice-end string) 1)))
           (palindrome1? 
             (substring string (+ (slice-start string) 1) (- (slice-end string) 1))))))

It is not uncommon for a compiler to introduce internal procedures for looping, so we can do that.
(define (palindrome1? string)
  (define (scan slice)
    (or (< (- (slice-end slice) (slice-start slice)) 2)
        (and (char=? (slice-ref slice (slice-start slice))
                     (slice-ref slice (- (slice-end slice) 1)))
             (scan (subslice slice (+ (slice-start slice) 1) (- (slice-end slice) 1))))))
  (scan (make-slice 0 (string-length string))))

We'll enter fantasy land again and let our compiler be smart enough to “spread” the slice data structure into the argument list of scan. This is no doubt asking too much from our compiler, but the information is available and it could in theory be done:
(define (palindrome1? string)
  (define (scan slice-start slice-end)
    (or (< (- slice-end slice-start) 2)
        (and (char=? (slice-ref string slice-start)
                     (slice-ref string (- slice-end 1)))
             (scan (+ slice-start 1) (- slice-end 1)))))
  (scan 0 (string-length string)))

And now we have palindrome2? (modulo renaming).

This doesn't really prove anything. But with a couple of somewhat unlikely compiler tricks, the naive version could be transformed to the more optimized version. It suggests that a it would be surprising but not a complete shock for an ambitious compiler writer to attempt.

I wish someone would write that Sufficiently Smart Compiler.

6 comments:

Anonymous said...

I've always done:

Palindrome X =
Let y = Reverse X
Return X == y

Joe Marshall said...

That would be even harder to compile efficiently, but if slices were implemented such that if the start pointer were after the end pointer, the contents were considered in reverse order, then maybe...

John Cowan said...

It seems to me that this problem is better solved with a Sufficiently Smart Runtime™. In particular, SRFI 13 provides substring/shared, which is intended to produce just such a slice. The SRFI allows this procedure to fall back to substring for use on Dumb Runtimes, but at least it should work well on Guile.

Alternatively, SRFI 135 provides immutable shareable string-like objects called texts. Although they are disjoint with strings, most of the procedures in the SRFI work on either strings or texts (jointly called textuals). There are three implementations available: one based on strings, for systems that have O(1) string-ref, and two based on bytevectors, one using UTF-8 and the other UTF-16 encoding. Texts are implemented as a special case of ropes with components of bounded size, so textual-ref is O(1) on all three implementations.

Joe Marshall said...

So if a Sufficiently Smart Compiler™ implements all "reasonable" optimizations, does a Sufficiently Smart Runtime™ implement all "reasonable" SRFIs?

Brad Lucier said...

If you munged your (nonempty) string into a SRFI-122/179 array, you could do something like

(let ((s (make-array (make-interval '#(0)
(vector (string-length s)))
(lambda (i)
(string-ref s i)))))
(array-every char=? s (array-reverse s '#(#t))))

array-reverse just sets up the array to be indexed in reverse order.

John Cowan said...

No reason not to, or at least the ones that can be portably implemented.