Tuesday, August 13, 2013

Let's start with a list:
'(a b c d e f g h i j)

We'll add some elements:
'(a b c d e f g h i j k l m n o)

Delete some:
'(a b c d e i j k l m n o)

Add some more:
'(x y z a b c d e i j k l m n o)

Maybe delete one:
'(x y z b c d e i j k l m n o)

We want to compare the original sequence with the end sequence and
summarize the difference with a set of "indels", which specify the
indexes at where elements were inserted or deleted:

(diff '(a b c d e f g h i j) '(x y z b c d e i j k l m n o))

(#S(INDEL
    :LEFT-SUBSEQ-START 0         ;; 'a'
    :LEFT-SUBSEQ-LIMIT 1
    :RIGHT-SUBSEQ-START 0        ;; 'x y z'
    :RIGHT-SUBSEQ-LIMIT 3)
 #S(INDEL
    :LEFT-SUBSEQ-START 5         ;; 'f g h'
    :LEFT-SUBSEQ-LIMIT 8
    :RIGHT-SUBSEQ-START 7        ;; nothing
    :RIGHT-SUBSEQ-LIMIT 7)
 #S(INDEL
    :LEFT-SUBSEQ-START 10        ;; nothing
    :LEFT-SUBSEQ-LIMIT 10
    :RIGHT-SUBSEQ-START 9        ;; 'k l m n o'
    :RIGHT-SUBSEQ-LIMIT 14))
Exercise 6 (hard): Implement this.

For an extreme challenge, implement this in a way that is not hopelessly inefficient.

3 comments:

Jason said...

I suppose it would be hard if you tried to come up with it yourself:

https://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Anonymous said...

@Jason: isn't Edit distance a better starting point for this task?

pjb said...

What about a solution in O(length(a)+length(b))?

(defstruct indel left-subseq-start left-subseq-limit right-subseq-start right-subseq-limit)
(defun diff (a b)
(list
(make-indel :left-subseq-start 0 :left-subseq-limit (length a)
:right-subseq-start 0 :right-subseq-limit (length b))))


The problem is ill-specified.